str.isnumeric and Cuneiforms


Marco

Is it normal that str.isnumeric() returns False for these Cuneiforms?

'\U00012456'
'\U00012457'
'\U00012432'
'\U00012433'

They are all in the Nl category.

Marco
 

Steven D'Aprano

Is it normal that str.isnumeric() returns False for these Cuneiforms?

'\U00012456'
'\U00012457'
'\U00012432'
'\U00012433'

They are all in the Nl category.

Are you sure about that? Do you have a reference?

It seems to me that they are not:


py> c = '\U00012456'
py> import unicodedata
py> unicodedata.numeric(c)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: not a numeric character


Although it is possible that unicodedata is buggy, or perhaps it just
doesn't support characters outside the Basic Multilingual Plane.
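
One way to check the second possibility is to ask unicodedata about another character from the same supplementary-plane block. The following is only a sketch: the helper name describe is made up for illustration, and the values printed for the four characters in question depend on the Unicode version your interpreter was built with.

```python
import unicodedata

def describe(ch):
    """Return (code point, general category, numeric value or None)."""
    # The second argument to unicodedata.numeric() is a default that is
    # returned instead of raising ValueError when no value is available.
    return (hex(ord(ch)), unicodedata.category(ch), unicodedata.numeric(ch, None))

# U+12400 CUNEIFORM NUMERIC SIGN TWO ASH is in the same block as the
# four characters from the question and has the numeric value 2.
print(describe('\U00012400'))

# The four characters from the question; None means "no numeric value
# known to this build of unicodedata".
for ch in '\U00012456\U00012457\U00012432\U00012433':
    print(describe(ch))
```

If the first line shows a numeric value, unicodedata does handle characters beyond U+FFFF, so a missing value for the others would come from the data rather than from the plane they live in.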
 

jmfauth

Is it normal that str.isnumeric() returns False for these Cuneiforms?

'\U00012456'
'\U00012457'
'\U00012432'
'\U00012433'
They are all in the Nl category.

Indeed they are, but Unicode (ver. 5.0.0) does not assign numeric
values to these code points.

Do not ask me why.

jmf
 

Marco Buttu

Are you sure about that? Do you have a reference?

I was just playing with Unicode on Python 3.3a:

from unicodedata import category, name
from sys import maxunicode
# every code point whose general category is Nl (Number, letter)
nl = [chr(c) for c in range(maxunicode + 1)
      if category(chr(c)) == 'Nl']
# every code point that str.isnumeric() accepts
numerics = [chr(c) for c in range(maxunicode + 1)
            if chr(c).isnumeric()]
# Nl characters that str.isnumeric() rejects
for c in sorted(set(nl) - set(numerics)):
    print(hex(ord(c)), category(c), name(c))

0x12432 Nl CUNEIFORM NUMERIC SIGN SHAR2 TIMES GAL PLUS DISH
0x12433 Nl CUNEIFORM NUMERIC SIGN SHAR2 TIMES GAL PLUS MIN
0x12456 Nl CUNEIFORM NUMERIC SIGN NIGIDAMIN
0x12457 Nl CUNEIFORM NUMERIC SIGN NIGIDAESH

So they are in the Nl category but are not "numerics", and that sounds
strange because other Cuneiforms are "numerics":
(True, False)
It seems to me that they are not:


py> c = '\U00012456'
py> import unicodedata
py> unicodedata.numeric(c)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: not a numeric character

Exactly, as I wrote above, is that right?
 

Marco Buttu

Indeed they are, but Unicode (ver. 5.0.0) does not assign numeric
values to these code points.

help(unicodedata) says Python 3.3a refers to Unicode 6.0.0
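
The same information is available without reading the help text, through the module attribute unicodedata.unidata_version. A small sketch (the variable name ucd_version is just illustrative; the value you see depends on your interpreter):

```python
import sys
import unicodedata

# Version of the Unicode Character Database compiled into this interpreter,
# as a string such as "6.0.0".
ucd_version = unicodedata.unidata_version

# Split it into integers so versions can be compared numerically.
ucd_tuple = tuple(int(part) for part in ucd_version.split('.'))

print('Python', sys.version_info[:2], 'uses Unicode', ucd_version, ucd_tuple)
```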
 

Marco Buttu

Is it normal that str.isnumeric() returns False for these Cuneiforms?

'\U00012456'
'\U00012457'
'\U00012432'
'\U00012433'

They are all in the Nl category.

Marco

It's ok, I found that they don't have a number assigned in the
ftp://ftp.unicode.org/Public/UNIDATA/UnicodeData.txt database.
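
For anyone who wants to repeat the check: UnicodeData.txt is a semicolon-separated file, and the numeric value sits in the ninth field (index 8), which is empty when a character has none. A rough sketch of reading that field follows. The function name numeric_field is made up here, and the sample lines are for three other characters, used only to show the layout. To check the four Cuneiforms you would feed it the matching lines from your own copy of the file.

```python
from fractions import Fraction

# Sample UnicodeData.txt lines for illustration (not the characters
# discussed in this thread).
SAMPLE_LINES = [
    "12400;CUNEIFORM NUMERIC SIGN TWO ASH;Nl;0;L;;;;2;N;;;;;",
    "00BD;VULGAR FRACTION ONE HALF;No;0;ON;<fraction> 0031 2044 0032;;;1/2;N;FRACTION ONE HALF;;;;",
    "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;",
]

def numeric_field(line):
    """Return the numeric value from one UnicodeData.txt line, or None."""
    fields = line.rstrip("\n").split(";")
    value = fields[8]  # ninth field: numeric value, possibly a fraction
    return Fraction(value) if value else None

for line in SAMPLE_LINES:
    code, name = line.split(";")[:2]
    print(code, name, numeric_field(line))
```

Fraction is used because the field may hold values such as 1/2, which float() would not parse.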
 

jmfauth

It's ok, I found that they don't have a number assigned in the
ftp://ftp.unicode.org/Public/UNIDATA/UnicodeData.txt database.

Good. I was about to send this information. I have all this (not
updated) stuff locally on my hd.
 
