How to get Unicode attributes of a character?

golubovsky

Hi,

Is there a portable (cross-browser) way to determine the Unicode
attributes of a character in JavaScript? I couldn't even find functions
like isUpper or isDigit, but it would be more desirable to have a full
(or partial) set of Unicode attributes for a character.

Browsers that support Unicode must have this stuff compiled inside; is
this available to Javascript?

Thanks.
 
Bart Van der Donck

I think you're mixing a few things.

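To get the code of a character (charCodeAt returns the UTF-16 code unit,
which equals the Unicode code point for characters in the Basic
Multilingual Plane):

alert('L'.charCodeAt(0))

To find out if a string consists only of digits (\d matches only 0-9):

if (/^\d+$/.test('456')) { alert('is digit') }

To find out if a string consists only of the uppercase letters A-Z:

if (/^[A-Z]+$/.test('ADQ')) { alert('is upper') }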

More info: http://www.merlyn.demon.co.uk/js-valid.htm
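
One caveat with charCodeAt, shown as a rough sketch rather than a definitive fix: for characters outside the Basic Multilingual Plane it returns only one half of a surrogate pair. The helper below combines the pair by hand, so it does not depend on String.prototype.codePointAt (ES2015 and later). The name codePointOf is made up for this example and is not a built-in.

```javascript
// Hypothetical helper (not a built-in): returns the Unicode code point
// that starts at index i of str, combining a surrogate pair if present.
function codePointOf(str, i) {
  var hi = str.charCodeAt(i);
  // A high surrogate followed by a low surrogate encodes one code point.
  if (hi >= 0xD800 && hi <= 0xDBFF && i + 1 < str.length) {
    var lo = str.charCodeAt(i + 1);
    if (lo >= 0xDC00 && lo <= 0xDFFF) {
      return (hi - 0xD800) * 0x400 + (lo - 0xDC00) + 0x10000;
    }
  }
  // BMP character (or an unpaired surrogate): the code unit is returned as-is.
  return hi;
}
```

For 'L' this gives the same value as 'L'.charCodeAt(0); the two only differ for characters that need two UTF-16 code units.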
 
Martin Honnen

Bart said:
To find out if a string is uppercase:

if (/^[A-Z]+$/.test('ADQ')) { alert('is upper') }

The original poster seems to be looking for something different. Unicode
defines character categories and blocks that contain quite a lot more
letters than the Latin A-Z.

Neither the regular expression language in ECMAScript edition 3 nor the
string functions have much support for that, apart from toUpperCase and
toLowerCase (and toLocaleUpperCase and toLocaleLowerCase), which go
beyond a-z/A-Z.

The regular expression languages in Java and .NET have more support for
such Unicode categories (e.g. \p{Lu} for all upper case letters); with
JavaScript you are currently forced to list the ranges you are
interested in yourself.
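
To illustrate both options mentioned above, here is a minimal sketch. The names isUpperApprox and UPPER_SAMPLE are invented for this example. The first relies only on the built-in case mappings and is an approximation of the Lu category, not an exact test (it also accepts a few non-letters that have a lower case mapping, such as Roman numeral characters). The second lists a handful of ranges by hand and is deliberately incomplete.

```javascript
// Approximation of "is upper case" using only built-in case mappings:
// a character counts as upper case if lowercasing changes it and
// uppercasing leaves it unchanged.
function isUpperApprox(ch) {
  return ch.toLowerCase() !== ch && ch.toUpperCase() === ch;
}

// Hand-listed ranges (an incomplete sample, chosen for illustration):
//   A-Z              ASCII capitals
//   \u00C0-\u00D6    Latin-1 capitals (before the multiplication sign U+00D7)
//   \u00D8-\u00DE    Latin-1 capitals (after U+00D7)
//   \u0391-\u03A1    Greek capitals Alpha..Rho
//   \u03A3-\u03A9    Greek capitals Sigma..Omega
//   \u0410-\u042F    Cyrillic capitals A..Ya
var UPPER_SAMPLE = /^[A-Z\u00C0-\u00D6\u00D8-\u00DE\u0391-\u03A1\u03A3-\u03A9\u0410-\u042F]+$/;

// Usage:
//   isUpperApprox('\u00C4')            // A with diaeresis
//   UPPER_SAMPLE.test('\u0414\u0391')  // Cyrillic De + Greek Alpha
```

Engines that implement the ES2018 Unicode property escapes accept /^\p{Lu}+$/u directly; the sketch avoids that syntax so it also runs on older engines.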
 
golubovsky

Hi,

Martin said:
The original poster seems to be looking for something different. Unicode
defines character categories and blocks that contain quite a lot more
letters than the Latin A-Z.

Exactly. I mean the attributes (as well as the simple case mappings)
that are defined in the Unicode Character Database (a large
semicolon-separated text file, UnicodeData.txt, distributed by
Unicode.org).

Martin said:
The regular expression languages in Java and .NET have more support for
such Unicode categories (e.g. \p{Lu} for all upper case letters); with
JavaScript you are currently forced to list the ranges you are
interested in yourself.

Well, to get a category or case mapping for a single character, using
regexps is a bit of overkill (and this type of regexp is not supported
anyway). It looks like I'll have to compile the character database
myself (I did that for C and Haskell, so there shouldn't be any trouble,
just an increase in size).
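
For anyone taking the same route, a rough sketch of what such a compiled table could look like. It assumes the UnicodeData.txt layout of semicolon-separated fields, with the code point in field 0, the general category in field 2, and the simple upper and lower case mappings in fields 12 and 13. The three sample lines are copied from that file; buildTable and categoryOf are names made up for this example, and a real table would be generated from the whole file ahead of time.

```javascript
// Three lines from UnicodeData.txt, used here as a tiny stand-in for the file.
var SAMPLE_LINES = [
  '0030;DIGIT ZERO;Nd;0;EN;;0;0;0;N;;;;;',
  '0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;',
  '0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041'
];

// Hypothetical helper: turns UnicodeData.txt lines into a lookup table
// keyed by code point. Range entries ("<..., First>" / "<..., Last>")
// are not handled in this sketch.
function buildTable(lines) {
  var table = {};
  for (var i = 0; i < lines.length; i++) {
    var f = lines[i].split(';');
    table[parseInt(f[0], 16)] = {
      category: f[2],                                 // general category
      upper: f[12] ? parseInt(f[12], 16) : null,      // simple upper case mapping
      lower: f[13] ? parseInt(f[13], 16) : null       // simple lower case mapping
    };
  }
  return table;
}

// Hypothetical helper: general category of the first UTF-16 code unit of ch,
// or null if the table has no entry for it.
function categoryOf(table, ch) {
  var entry = table[ch.charCodeAt(0)];
  return entry ? entry.category : null;
}

var table = buildTable(SAMPLE_LINES);
// categoryOf(table, 'A') looks up the entry built from the second sample line.
```

A plain object keyed by code point is the simplest layout; since most categories cover long runs of consecutive code points, storing sorted ranges and searching them would keep the size increase much smaller.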

Thanks.
 