December 1, 2020 11:45 am GMT

Do You Actually Know What A String In JavaScript Is? Here's What I Found.

We tend to think of a string in JavaScript as an array of characters.

const name = 'Nick'
console.log(name.length) // 4

The variable name holds 4 characters (N, i, c, k), and its length is also 4.

Everything seems logical.

Let's go further and add an emoji to my name.

const name = 'Nick 🐃'
console.log(name.length) // 7

Hmm, strange.

The variable name should have 6 characters: N, i, c, k, (whitespace), and 🐃.

But it has 7.

It seems the bull emoji alone counts as 2 characters.

const emoji = '🐃'
console.log(emoji.length) // 2

Interesting.

Let's figure out why.

Let's go to the official ECMAScript specification (ECMAScript is the language standard that JavaScript implements).

Scroll to 6.1.4 The String Type.

And find this:

The String type is the set of all ordered sequences of zero or more 16-bit unsigned integer values ("elements") up to a maximum length of 2^53 - 1 elements. The String type is generally used to represent textual data in a running ECMAScript program, in which case each element in the String is treated as a UTF-16 code unit value.

So a string in JavaScript is a sequence of UTF-16 code unit values.

What is UTF-16?

A Unicode transformation format (UTF) is an algorithmic mapping from every Unicode code point to a unique byte sequence. UTF-16 is the format that builds that sequence from 16-bit code units.

One UTF-16 code unit value is a number from 0x0000 to 0xFFFF.

What is 0x0000 and 0xFFFF?

The 0x prefix marks a hexadecimal number. Hexadecimal, often shortened to "hex", is a numeral system made up of 16 symbols (base 16). The standard numeral system is called decimal (base 10) and uses ten symbols: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9. Hexadecimal uses the decimal digits plus six extra symbols: A, B, C, D, E, F. So 0x0000 is 0 and 0xFFFF is 65535.
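To make the hex notation concrete, here is a small sketch that moves between hex and decimal in JavaScript. The variable names are only illustrative.

```javascript
// Hex literals are just another way to write numbers.
const minCodeUnit = 0x0000
const maxCodeUnit = 0xffff

console.log(minCodeUnit) // 0
console.log(maxCodeUnit) // 65535

// toString(16) converts a number to its hex digits.
console.log(maxCodeUnit.toString(16)) // ffff

// parseInt with radix 16 goes the other way.
console.log(parseInt('ffff', 16)) // 65535

// 16 bits give 2 ** 16 = 65536 possible code unit values.
console.log(maxCodeUnit - minCodeUnit + 1 === 2 ** 16) // true
```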

If we convert my name Nick to UTF-16 (the way JavaScript sees it), we get 0x004e 0x0069 0x0063 0x006b.

0x004e = N

0x0069 = i

0x0063 = c

0x006b = k
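You can check this mapping yourself. The sketch below uses a hypothetical helper, toCodeUnits (not part of any library), that reads each code unit with charCodeAt and prints it as four hex digits.

```javascript
// Hypothetical helper: list every UTF-16 code unit of a string as hex.
function toCodeUnits(str) {
  const units = []
  for (let i = 0; i < str.length; i++) {
    // charCodeAt returns the 16-bit code unit at index i (0 to 65535).
    const hex = str.charCodeAt(i).toString(16).padStart(4, '0')
    units.push('0x' + hex)
  }
  return units
}

console.log(toCodeUnits('Nick'))
// [ '0x004e', '0x0069', '0x0063', '0x006b' ]
```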

But how does JavaScript treat emojis?

In UTF-16, Unicode characters from the Basic Multilingual Plane (which contains characters for almost all modern languages) are encoded with one code unit.

Characters outside the Basic Multilingual Plane (emojis, musical notation, playing cards, hieroglyphs, etc.) require two code units, known as a surrogate pair.

So UTF-16 represents the 🐃 emoji with two code units (0xd83d 0xdc03).

That's why .length gives 2.
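Where do those two code units come from? A minimal sketch of the standard UTF-16 surrogate calculation is below. The helper name toSurrogatePair is hypothetical, and the emoji is written with escapes so the snippet survives any copy-paste encoding issues.

```javascript
// Hypothetical helper: split a code point above 0xFFFF into its surrogate pair.
function toSurrogatePair(codePoint) {
  const offset = codePoint - 0x10000    // leaves a 20-bit value
  const high = 0xd800 + (offset >> 10)  // top 10 bits
  const low = 0xdc00 + (offset & 0x3ff) // bottom 10 bits
  return [high, low]
}

const bull = '\ud83d\udc03'
const codePoint = bull.codePointAt(0)

console.log(codePoint.toString(16)) // 1f403
console.log(toSurrogatePair(codePoint).map(n => n.toString(16)))
// [ 'd83d', 'dc03' ]

console.log(bull.length)      // 2 (code units)
console.log([...bull].length) // 1 (code points)
```

The spread operator iterates a string by code point, which is why it reports 1 where .length reports 2.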

To consolidate everything we have learned, let's play a little with Unicode and JavaScript.

const name = 'Nick'
const nameInUnicode = '\u004e\u0069\u0063\u006b'
console.log(name === nameInUnicode) // true
console.log(nameInUnicode.length) // 4

const fullName = 'Nick 🐃'
const fullNameInUnicode = '\u004e\u0069\u0063\u006b\u0020\ud83d\udc03'
console.log(fullName === fullNameInUnicode) // true
console.log(fullNameInUnicode.length) // 7

What is \u?

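In a JavaScript string literal, \u followed by exactly four hexadecimal digits is a Unicode escape sequence that represents a single UTF-16 code unit. The newer \u{...} form (ES2015) takes a whole code point instead, so it can name a character outside the BMP in one escape.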
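As a quick sketch of how the escape forms relate, the snippet below builds the bull emoji several ways and compares the results. The variable names are illustrative only.

```javascript
// \uXXXX takes exactly four hex digits, so each escape names one code unit.
const viaCodeUnits = '\ud83d\udc03'

// \u{...} (ES2015) takes a whole code point, even above 0xFFFF.
const viaCodePoint = '\u{1F403}'

console.log(viaCodeUnits === viaCodePoint) // true

// The same two views exist as functions.
console.log(String.fromCharCode(0xd83d, 0xdc03) === viaCodePoint) // true
console.log(String.fromCodePoint(0x1f403) === viaCodePoint) // true
```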

In the end

Knowing that a string in JavaScript is a sequence of UTF-16 code unit values can save you from surprising bugs when you work with characters outside the BMP, such as emojis.
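Here is one example of such a bug, sketched with the name used earlier: reversing a string by code unit tears the surrogate pair apart, while iterating by code point keeps it intact.

```javascript
const fullName = 'Nick \ud83d\udc03'

// split('') cuts by code unit, so the two surrogates end up swapped.
const brokenReverse = fullName.split('').reverse().join('')
console.log(brokenReverse === '\udc03\ud83d kciN') // true, the emoji is lost

// Array.from iterates by code point and keeps the pair together.
const safeReverse = Array.from(fullName).reverse().join('')
console.log(safeReverse === '\ud83d\udc03 kciN') // true

// Counting code points instead of code units.
console.log(fullName.length)             // 7
console.log(Array.from(fullName).length) // 6
```

Note that code points are still not always what a user perceives as one character: some emojis are built from several code points, so this approach only fixes the surrogate pair case.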



Original Link: https://dev.to/nickbulljs/do-you-actually-know-what-string-in-javascript-is-here-s-what-i-found-23l7
