Missing unicode data?

  • Thread starter Klaus Alexander Seistrup

Klaus Alexander Seistrup

Hi group,

I just came across the following exception:

#v+

$ python
Python 2.4.2 (#2, Sep 30 2005, 21:19:01)
[GCC 4.0.2 20050808 (prerelease) (Ubuntu 4.0.1-4ubuntu8)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Traceback (most recent call last):
$

#v-

When I check unicodedata.name() against each character in the file
/usr/share/unidata/UnicodeData-4.0.1d1b.txt, which came with the
console-data package on my Ubuntu Linux installation, a total of
1226 Unicode characters seem to be missing from the unicodedata
module (2477 missing characters when checking against the latest
database from unicode.org¹). Is this a deliberate omission?
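
A comparison of this kind can be sketched roughly as follows. This is only an illustration written for a current Python 3, not the script actually used; the helper name missing_names and the two sample lines are assumptions. It relies on the semicolon-separated layout of UnicodeData.txt (code point in hex first, character name second) and skips entries whose name is wrapped in angle brackets, such as <control> or range markers.

```python
import unicodedata

def missing_names(lines):
    """Return code points that the given UnicodeData-style lines name,
    but for which the interpreter's unicodedata module has no name."""
    missing = []
    for line in lines:
        fields = line.strip().split(";")
        if len(fields) < 2 or not fields[0]:
            continue  # blank or malformed line
        name = fields[1]
        if name.startswith("<"):
            continue  # <control>, <..., First>, <..., Last> carry no real name
        codepoint = int(fields[0], 16)
        # With a default argument, name() returns it instead of raising ValueError.
        if unicodedata.name(chr(codepoint), None) is None:
            missing.append(codepoint)
    return missing

# Two illustrative lines in UnicodeData.txt format (hypothetical sample input).
sample = [
    "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;",
    "0000;<control>;Cc;0;BN;;;;;N;NULL;;;;",
]
print(missing_names(sample))
```

To run it against a real file, pass an open file object instead of the sample list, for example missing_names(open(path, encoding="ascii")) with path pointing at a local copy of UnicodeData.txt.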

Cheers,
Klaus.

¹) http://www.unicode.org/Public/UNIDATA/UnicodeData.txt
 
Fredrik Lundh

Klaus said:
When I check unicodedata.name() against each character in the file
/usr/share/unidata/UnicodeData-4.0.1d1b.txt, which came with the
console-data package on my Ubuntu Linux installation, a total of
1226 Unicode characters seem to be missing from the unicodedata
module (2477 missing characters when checking against the latest
database from unicode.org¹). Is this a deliberate omission?

I'm pretty sure unicodedata.name() doesn't look in the UnicodeData file
on your machine, nor in the latest file from unicode.org. in other
words, you get whatever version was used to create the Unicode data
set in your Python distribution.

this is usually the version that was current when that Python version
was originally released (i.e. in your case, when 2.4 was released).

iirc, 2.4 uses Unicode 3.2, and 2.5 uses Unicode 4.1. to update, use
the tools under Tools/unicode.
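
to see which version a given interpreter was built with, something like the following should do. it's a minimal sketch for a current Python 3; the has_name helper is made up for illustration, while unicodedata.unidata_version is the module's own attribute.

```python
import unicodedata

# Version of the Unicode Character Database compiled into this interpreter.
bundled_version = unicodedata.unidata_version
print("unicodedata was built from Unicode", bundled_version)

def has_name(char):
    """Return True if the bundled database has a name for this character."""
    # name() raises ValueError for unknown characters unless a default is given.
    return unicodedata.name(char, None) is not None

print(has_name("A"))
```

any character assigned in a Unicode version newer than bundled_version will come back without a name, whatever UnicodeData file happens to be on disk.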

</F>
 
Klaus Alexander Seistrup

Fredrik Lundh wrote:
I'm pretty sure unicodedata.name() doesn't look in the UnicodeData
file on your machine, nor in the latest file from unicode.org.

I am pretty sure of that, too. I was only using those files as a
reference against the Unicode data that comes with my Python interpreter.

in other words, you get whatever version was used to create
the Unicode data set in your Python distribution.

I see.

iirc, 2.4 uses Unicode 3.2, and 2.5 uses Unicode 4.1. to update,
use the tools under Tools/unicode.

Thanks for the hint.

Kind regards,
 
