double-byte character

Martin Honnen

tony said:
is it possible to detect any double-byte character in the text? thanks.

Since JavaScript 1.3 (in Netscape 4.06) and JScript 4 (in IE 4), the
strings in JavaScript are sequences of Unicode characters. You can
access any character in a string with
string.charAt(index)
and the Unicode character code of any character in a string with
string.charCodeAt(index)
There is no byte type in JavaScript 1.x, and there is no access to the
internal byte representation of a Unicode character or a complete string.
The internal string representation chosen is usually UTF-16, so in that
sense all characters are double-byte characters. But as said, as a
scripter you deal with sequences of Unicode characters, and the internal
encoding in bytes does not matter for scripting.
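
To make the two accessors above concrete, here is a minimal sketch. The sample string and the helper name codeUnits are made up for illustration; only charAt and charCodeAt come from the post.

```javascript
// Hypothetical sample string: "A", "é" (U+00E9) and "中" (U+4E2D).
var text = "A\u00E9\u4E2D";

// Hypothetical helper: return every character code in the string
// as a four-digit uppercase hex string.
function codeUnits(str) {
  var units = [];
  for (var i = 0; i < str.length; i++) {
    var hex = str.charCodeAt(i).toString(16).toUpperCase();
    while (hex.length < 4) {
      hex = "0" + hex; // left-pad to four hex digits
    }
    units.push(hex);
  }
  return units;
}

console.log(text.charAt(2) === "\u4E2D"); // true
console.log(codeUnits(text).join(" "));   // 0041 00E9 4E2D
```

Note that the script only ever sees 16-bit values here, never individual bytes.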
 
Stephen Chalmers

tony wong said:
is it possible to detect any double-byte character in the text? thanks.

If you mean detecting the presence of any character whose high byte is non-zero:

if( /[\u0100-\uffff]/.test( text ) )
...
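
As a rough sketch of how that test behaves, it can be wrapped in a function and tried on a few inputs. The function name hasHighByteChar and the sample strings are assumptions for illustration, not part of the post.

```javascript
// Hypothetical wrapper around the regular expression above: true if
// any character code in the text is 0x0100 or higher.
function hasHighByteChar(text) {
  return /[\u0100-\uffff]/.test(text);
}

console.log(hasHighByteChar("plain ASCII"));  // false
console.log(hasHighByteChar("caf\u00E9"));    // false, U+00E9 is below 0x0100
console.log(hasHighByteChar("\u4E2D\u6587")); // true
```

Latin-1 characters such as "é" do not match, because their high byte is zero even though they are not ASCII.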
 
Thomas 'PointedEars' Lahn

Martin said:
The internal string representation chosen is usually UTF-16, so
in that sense all characters are double-byte characters.

No, they are not. I thought a similar thing before (about UTF-8),
but this is not how UTF works. Additional code units are used if
needed for a character, i.e. all Unicode characters beyond code
point 0xFFFF are represented in UTF-16 by a surrogate pair, which
is two 16-bit words or four bytes. (UCS-2 has no surrogate pairs
and cannot represent those characters at all.)

<http://www.unicode.org/faq/basic_q.html#19>
<http://en.wikipedia.org/wiki/UTF-16/UCS-2>
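
A small sketch of what this looks like from script, assuming the implementation stores strings as UTF-16 code units. The helper names and the sample character (U+1D11E, MUSICAL SYMBOL G CLEF) are chosen only for illustration.

```javascript
// U+1D11E lies beyond 0xFFFF, so it is written as a surrogate pair.
var clef = "\uD834\uDD1E";

// Hypothetical helpers for the two surrogate ranges.
function isHighSurrogate(unit) { return unit >= 0xD800 && unit <= 0xDBFF; }
function isLowSurrogate(unit) { return unit >= 0xDC00 && unit <= 0xDFFF; }

// Hypothetical helper: count characters, treating each well-formed
// surrogate pair as a single character.
function countCodePoints(str) {
  var count = 0;
  for (var i = 0; i < str.length; i++) {
    count++;
    if (isHighSurrogate(str.charCodeAt(i)) &&
        i + 1 < str.length &&
        isLowSurrogate(str.charCodeAt(i + 1))) {
      i++; // skip the low half of the pair
    }
  }
  return count;
}

// Hypothetical helper: combine a surrogate pair into its code point.
function decodePair(high, low) {
  return (high - 0xD800) * 0x400 + (low - 0xDC00) + 0x10000;
}

console.log(clef.length);                // 2
console.log(countCodePoints("a" + clef)); // 2
console.log(decodePair(clef.charCodeAt(0), clef.charCodeAt(1)).toString(16)); // 1d11e
```

So length and charCodeAt count 16-bit units, and one character outside the Basic Multilingual Plane occupies two of them.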


PointedEars
 
