Your Web News in One Place

Help Webnuz

Referal links:

Sign up for GreenGeeks web hosting
October 19, 2011 08:05 pm

Optimizing Long Lists Of YesNo Values With JavaScript

Very frequently in Web development (and programming in general), you need to store a long list of boolean values (yes/no, true/false, checked/unchecked… you get the idea) into something that accepts only strings. Maybe its because you want to store them in localStorage or in a cookie, or send them through the body of an HTTP request. Ive needed to do this countless times.

The last time I stumbled on such a case wasnt with my own code. It was when Christian Heilmann showed me his then new slide deck, with a cool feature where you could toggle the visibility of individual slides in and out of the presentation. On seeing it, I was impressed. Looking more closely, though, I realized that the checkbox states did not persist after the page reloaded. So, someone could spend a long time carefully tweaking their slides, only to accidentally hit F5 or crash their browser, and then — boom! — all their work would be lost. Christian told me that he was already working on storing the checkbox states in localStorage. Then, naturally, we endlessly debated the storage format. That debate inspired me to write this article, to explore the various approaches in depth.

Using An Array

We have two (reasonable) ways to model our data in an array. One is to store true/false values, like so:

[false, true, true, false, false, true, true]

The other is to store an array of 0s and 1s, like so:

[0, 1, 1, 0, 0, 1, 1]

Whichever solution we go with, we will ultimately have to convert it to a string, and then convert it back to an array when it is read. We have two ways to proceed: either with the old Array#join() (or Array#toString()) and String#split(), or with the fancier JSON.stringify() and JSON.parse().

With the JSON way, the code will be somewhat shorter, although it is the JavaScript equivalent of slicing bread with a chainsaw. The impact on performance doesnt seem to be significant, but youd be cutting down browser support quite a bit.

The main drawback of using array-based strings is their size in bytes. If you go with the number method, you would use almost 2 bytes (or characters) per number (or, more precisely, 2N − 1, since youd need one delimiter per number, except for the last one):

[0, 1, 1, 0, 0, 1, 1].toString().length // 13, for 7 values

So, for 512 numbers, that would be 1023 bytes (or 1 KB). If you go with the boolean method, its even worse:

[false, true, true, false, false, true, true].toString().length // 37, also for 7 values

Thats around 5 to 6 characters per value, so 2560 to 3072 bytes for 512 numbers (which is 2.5 to 3 KB). JSON.stringify() even wastes 2 more characters in each case, for the opening and closing brackets, but its advantage is that you get your original value types back with JSON.parse() instead of strings.

Using A String

Using a string saves some space, because no delimiters are involved. For example, if you go with the number approach and store strings like '01001101010111', you are essentially storing one character per value, which is 100% better than the better of the two previous approaches. You can then get the values into an array by using String#split:

'01001101010111'.split(''); // ['0','1','0','0','1','1','0','1','0','1','0','1','1','1']

Or you could just loop over the string using string.charAt(i) — or even the string indexes (string[i]), if you dont care about older browsers.

Using Bitfields

Did the previous method make you think of binary numbers? Its not just you. The concept of bitfields is quite popular in other programming languages, but not so much in JavaScript. In a nutshell, bitfields are used to pack a lot of boolean values into the bits of the boolean representation of a number. For example, if you have eight values (true, false, false, true, false, true, true, false), the number would be 10010110 in binary; so, 150 in decimal and 96 in hex. Thats 2 characters instead of 8, so 75% saved. In general, 1 digit in the hex representation corresponds to exactly 4 bits. (Thats because 16 = 24. In general, in a base2n system, you can pack n bits into every base2n digit.) So, we werent lucky with that 75%; its always that much.

Thus, instead of storing that string as a string and using 1 character per value, we can be smarter and convert it to a (hex) number first. How do we do that? Its no more than a line of code:

parseInt('10010110', 2).toString(16); // returns '96'

And how do we read it back? Thats just as simple:

parseInt('96', 16).toString(2); // returns  '10010110'

From this point on, we can follow the same process as the previous method to loop over the values and do something useful with them.

Can We Do Better?

In fact, we can! Why convert it to a hex (base 16) number, which uses only 6 of the 26 alphabet letters? The Number#toString() method allows us to go up to base 36 (throwing a RangeError for >= 37), which effectively uses all letters in the alphabet, all the way up to z! This way, we can have a compression of up to 6 characters for 32 values, which means saving up to 81.25% compared to the plain string method! And the code is just as simple:

parseInt( '1001011000', 2).toString(36); // returns 'go' (instead of '258', which would be the hex version)parseInt('go', 36).toString(2); // returns  '1001011000'

Base 64

The previous method allows us to go up only to base 36. But in modern JavaScript, we can use the methods atob() and btoa() to utilize a base 64 encoding system. Thats 64 = 26, which means we can pack 6 yes/no characters into a single character, saving us 83.3% compared to the plain string method. Lets look at the code:

atob(parseInt('1001011000', 2)); // returns 'M'parseInt(btoa('M')).toString(2) // returns '1001011000'

For some of you, this will be enough. But I can almost hear the more inquisitive minds out there shouting, "But we have 255 symbols, we are still not using strings to their full potential! And youd be right. There is a reason why every time you open a binary file in a text editor, you get weird symbols mixed with numbers, uppercase letters, lowercase letters and whatnot. Every character in an ASCII string is a byte (8 bits), which means that if we use the right compression algorithm, we should be able to store 8 yes/no values in it (saving 87.5%).

The problem is that JavaScript doesnt offer a built-in way to do that, so the code becomes a bit more complicated.

Packing 8 Values Into One Character

You can use String.fromCharCode to get the individual characters. It accepts a numerical value of up to 65,535 and returns a character (and for values greater than that, it returns an empty string). In this case, however, were going to use character codes of up to 255 (8 bits).

(Aside: Unicode characters are variable width, so they take up from 1 up to 4 bytes per character. Thus, if you pack your data in characters beyond the first 255, you will be consuming 2 bytes instead of 1, so you might as well have used 2 ASCII characters instead — the file size would be the same. Only if, for some reason, you have a limit that is set in number of (Unicode) characters, then using the full range might be a good idea. But in most cases, the main concern is byte size, so its entirely pointless.)

So, we have to split our string into chunks of 8 characters in size. We can do that through .match(/.{1,8}/g). To sum up, the full solution would look like this:

function pack(/* string */ values) {    var chunks = values.match(/.{1,8}/g), packed = '';    for (var i=0; i < chunks.length; i++) {        packed += String.fromCharCode(parseInt(chunks[i], 2));    }    return packed;}function unpack(/* string */ packed) {    var values = '';    for (var i=0; i < packed.length; i++) {        values += packed.charCodeAt(i).toString(2);    }    return values;}

It wasnt that hard, was it?

With these few lines of code, you can pack the aforementioned 512 values into — drum roll, please — 64 characters!

Quite an improvement over our original 1 KB (with the array method), isnt it?

Limitations

Numbers in JavaScript have limits. For the methods discussed here that involve an intermediate state of converting to a number, the limit appears to be 1023 yes/no values, because parseInt('11111111', 2) returns Infinity when the number of aces is bigger than 1023. This limit does not apply to the last method, because were only converting blocks of bits instead of the whole thing. And, of course, it doesnt apply to the first two methods (array and string) because they dont involve packing the values into an integer.

"I Think You Took It A Bit Too Far

This might be overkill for some cases. But it will definitely come in handy when you want to store a lot of boolean values in any limited space that can only store strings. And no optimization is overkill for things that go through the wire frequently. For example, cookies are sent on every single request, so they should be as tiny as possible. Another use case would be online multiplayer games, for which response times should be lightning-fast, otherwise the games wouldnt be fun.

And even if this kind of optimization isnt your thing, I hope youve found the thought process and the code involved educational.

(al)

Image on front page created by Ruiwen Chua.


Lea Verou for Smashing Magazine, 2011.


Original Link:

Share this article:    Share on Facebook
No Article Link

Smashing Magazine

Smashing Magazine delivers useful and innovative information to Web designers and developers.

More About this Source Visit Smashing Magazine