How to get Unicode attributes of a character?

golubovsky · Jan 5, 2007

Hi,

Does there exist a portable (cross-browser) way to determine the Unicode
attributes of a character in JavaScript? I couldn't even find functions
like isUpper or isDigit, but it would be more desirable to have the full
(or a partial) set of Unicode attributes for a character.

Browsers that support Unicode must have this information compiled in; is
it available to JavaScript?

Thanks.

Bart Van der Donck · Jan 5, 2007

I think you're mixing up a few things.

To get the Unicode code point (strictly, the UTF-16 code unit) of a character:

alert('L'.charCodeAt(0))
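Note that charCodeAt returns a UTF-16 code unit, which equals the code point only for characters in the Basic Multilingual Plane. A minimal ES3-compatible sketch that combines a surrogate pair into one code point is below; the helper name codePointAtIndex is hypothetical. Engines implementing ES2015 or later provide String.prototype.codePointAt, which does the same job.

```javascript
// charCodeAt returns a UTF-16 code unit (0..0xFFFF), not necessarily a code point.
// Hypothetical helper: combine a surrogate pair into a single code point.
function codePointAtIndex(str, i) {
  var hi = str.charCodeAt(i);
  // A high surrogate followed by a low surrogate encodes one supplementary character.
  if (hi >= 0xD800 && hi <= 0xDBFF && i + 1 < str.length) {
    var lo = str.charCodeAt(i + 1);
    if (lo >= 0xDC00 && lo <= 0xDFFF) {
      return (hi - 0xD800) * 0x400 + (lo - 0xDC00) + 0x10000;
    }
  }
  return hi; // BMP character or lone surrogate
}

console.log(codePointAtIndex("L", 0));            // 76
console.log("\uD83D\uDE00".charCodeAt(0));         // 55357 (high surrogate only)
console.log(codePointAtIndex("\uD83D\uDE00", 0)); // 128512 (U+1F600)
```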

To find out if a string consists only of digits (\d matches only the ASCII digits 0-9):

if (/^\d+$/.test('456')) { alert('is digit') }
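Because \d is ASCII-only, digits from other scripts do not match. The sketch below contrasts it with the Unicode property escape \p{Nd} (decimal digit), which, as far as I know, is only available in engines implementing ECMAScript 2018 or later and only with the u flag; the regexp is built inside try/catch so older engines just skip it. The variable names are illustrative.

```javascript
// \d in JavaScript matches only the ASCII digits 0-9.
var asciiDigits = /^\d+$/;

// \p{Nd} needs the "u" flag and an ES2018+ engine. Building it with new RegExp
// inside try/catch turns a parse-time SyntaxError into a runtime fallback.
var unicodeDigits = null;
try {
  unicodeDigits = new RegExp("^\\p{Nd}+$", "u");
} catch (e) {
  // Unicode property escapes are not supported in this engine.
}

var arabicIndic = "\u0664\u0665\u0666"; // ARABIC-INDIC DIGIT FOUR, FIVE, SIX

console.log(asciiDigits.test("456"));       // true
console.log(asciiDigits.test(arabicIndic)); // false
if (unicodeDigits) {
  console.log(unicodeDigits.test(arabicIndic)); // true
}
```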

To find out if a string is uppercase:

if (/^[A-Z]+$/.test('ADQ')) { alert('is upper') }

More info: http://www.merlyn.demon.co.uk/js-valid.htm

Martin Honnen · Jan 5, 2007

Bart said:

To find out if a string is uppercase:

if (/^[A-Z]+$/.test('ADQ')) { alert('is upper') }

The original poster seems to be looking for something different. Unicode
defines character categories and blocks that contain quite a lot more
letters than the Latin A-Z.

Neither the regular expression language in ECMAScript edition 3 nor the
string functions offer much support for that; only toUpperCase and
toLowerCase, and likewise toLocaleUpperCase and toLocaleLowerCase, go
beyond a-z/A-Z.
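Since toUpperCase and toLowerCase do know about letters beyond A-Z, one common workaround is to compare a character with its case-mapped forms. This is a heuristic sketch with hypothetical function names, not a replacement for the Unicode general category: it reports whether a character has case and which form it is in.

```javascript
// Heuristic: "uppercase" if upper-casing leaves the character unchanged while
// lower-casing changes it. Uses only ES3 string methods. Not equivalent to
// category Lu: e.g. U+24B6 CIRCLED LATIN CAPITAL LETTER A (category So) has a
// lowercase mapping, so it is reported as uppercase here.
function isUpperHeuristic(ch) {
  return ch === ch.toUpperCase() && ch !== ch.toLowerCase();
}

function isLowerHeuristic(ch) {
  return ch === ch.toLowerCase() && ch !== ch.toUpperCase();
}

console.log(isUpperHeuristic("Q"));      // true
console.log(isUpperHeuristic("\u00C4")); // true  (Ä)
console.log(isUpperHeuristic("\u0416")); // true  (Cyrillic Ж)
console.log(isLowerHeuristic("\u00E4")); // true  (ä)
console.log(isUpperHeuristic("7"));      // false (digits have no case)
```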

The regular expression languages in Java and .NET have more support for
Unicode categories (e.g. \p{Lu} for all uppercase letters); with
JavaScript you are currently forced to list the ranges you are
interested in yourself.
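Listing the ranges yourself looks roughly like the sketch below. The ranges are an illustrative, deliberately incomplete subset of uppercase letters, chosen for this example. For comparison, ECMAScript 2018 later added property escapes such as \p{Lu} (with the u flag); the code uses that only if the engine accepts it.

```javascript
// ES3-compatible: list the ranges by hand. Deliberately partial:
// A-Z, Latin-1 capitals (U+00C0-U+00D6 and U+00D8-U+00DE; U+00D7 is the
// multiplication sign) and basic Greek capitals (U+0391-U+03A1 and
// U+03A3-U+03A9; U+03A2 is unassigned).
var upperPartial = /^[A-Z\u00C0-\u00D6\u00D8-\u00DE\u0391-\u03A1\u03A3-\u03A9]+$/;

// ES2018+ engines: the Unicode property escape \p{Lu} with the "u" flag.
// Built with new RegExp inside try/catch so older engines simply skip it.
var upperUnicode = null;
try {
  upperUnicode = new RegExp("^\\p{Lu}+$", "u");
} catch (e) {
  // Unicode property escapes are not supported here.
}

console.log(upperPartial.test("\u00C4\u03A9")); // true  (ÄΩ)
console.log(upperPartial.test("\u00D7"));       // false (× is not a letter)
console.log(upperPartial.test("\u0416"));       // false (Cyrillic Ж is not listed)
if (upperUnicode) {
  console.log(upperUnicode.test("\u0416"));     // true
}
```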

golubovsky · Jan 5, 2007

Hi,

Martin said:
The original poster seems to be looking for something different. Unicode
defines character categories and blocks that contain quite a lot more
letters than the Latin A-Z.

Exactly: the attributes (as well as simple case mappings) defined in the
Unicode Character Database (its main file, UnicodeData.txt, is a large
semicolon-separated text file distributed by Unicode.org).
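Each line of UnicodeData.txt holds one code point with semicolon-separated fields; per the UCD documentation, field 0 is the code point in hex, field 1 the name, field 2 the general category, and fields 12, 13 and 14 the simple uppercase, lowercase and titlecase mappings. A minimal parsing sketch follows, with a hypothetical function name; it does not expand the "<..., First>"/"<..., Last>" pairs the file uses for large ranges such as CJK ideographs.

```javascript
// Parse one line of UnicodeData.txt (fields separated by semicolons).
// Only a few fields are extracted; empty mappings become null.
function parseUnicodeDataLine(line) {
  var f = line.split(";");
  function hexOrNull(s) { return s ? parseInt(s, 16) : null; }
  return {
    code: parseInt(f[0], 16),
    name: f[1],
    category: f[2],
    upper: hexOrNull(f[12]), // simple uppercase mapping
    lower: hexOrNull(f[13]), // simple lowercase mapping
    title: hexOrNull(f[14])  // simple titlecase mapping
  };
}

// Two lines in the UnicodeData.txt format.
var a = parseUnicodeDataLine("0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;");
var b = parseUnicodeDataLine("0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041");
console.log(a.category, a.lower.toString(16)); // Lu 61
console.log(b.category, b.upper.toString(16)); // Ll 41
```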

The regular expression languages in Java and .NET have more support for
Unicode categories (e.g. \p{Lu} for all uppercase letters); with
JavaScript you are currently forced to list the ranges you are
interested in yourself.

Well, using regexps to get the category or case mapping of a character is
a bit of overkill (and this type of regexp is not supported anyway). It
looks like I'll have to compile the character database myself (I did that
for C/Haskell, so there shouldn't be any trouble, just an increase in
size).

Thanks.
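One way to keep a self-compiled table small (an assumption for illustration, not necessarily what was done for C/Haskell) is to store sorted runs of consecutive code points sharing a category and look them up with binary search. The table name, function name, and the handful of rows below are hypothetical samples; a real table would be generated from UnicodeData.txt.

```javascript
// Hypothetical compiled table: sorted, non-overlapping [start, end, category] runs.
// Only a small Latin-1 sample; a generated table would cover all code points.
var CATEGORY_RANGES = [
  [0x0030, 0x0039, "Nd"], // 0..9
  [0x0041, 0x005A, "Lu"], // A..Z
  [0x0061, 0x007A, "Ll"], // a..z
  [0x00C0, 0x00D6, "Lu"], // À..Ö
  [0x00D7, 0x00D7, "Sm"], // ×
  [0x00D8, 0x00DE, "Lu"], // Ø..Þ
  [0x00DF, 0x00F6, "Ll"], // ß..ö
  [0x00F7, 0x00F7, "Sm"], // ÷
  [0x00F8, 0x00FF, "Ll"]  // ø..ÿ
];

// Binary search for the run containing code point cp.
// Returns null when cp is not covered by this sample table.
function categoryOf(cp) {
  var lo = 0, hi = CATEGORY_RANGES.length - 1;
  while (lo <= hi) {
    var mid = (lo + hi) >> 1;
    var r = CATEGORY_RANGES[mid];
    if (cp < r[0]) hi = mid - 1;
    else if (cp > r[1]) lo = mid + 1;
    else return r[2];
  }
  return null;
}

console.log(categoryOf(0x41)); // "Lu"
console.log(categoryOf(0xD7)); // "Sm"
console.log(categoryOf(0x20)); // null (space is not in the sample)
```

Storing runs rather than one entry per code point trades a logarithmic lookup for a much smaller table, which addresses the size concern.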
