Unicode 4.0 updates to unicodedata?


David Opstad

Hi, all! I'm relatively new to Python, but have definitely fallen in
love with it. It reminds me of Mesa (old Xerox development language) and
LISP a bit.

Anyway, on to the question. Now that Unicode 4.0 has been released (just
got my copy today), any guesses on how long before the unicodedata
module will be updated to include all the new names? How do things like
that work, anyway; is there somebody whose task it is to update that, or
are they awaiting volunteers to help out? And once the module is
updated, is it generally usable on earlier Python releases (I'm running
the 2.2 that came with the OS X developer package for Jaguar)?

Cheers!

Dave Opstad
 

Martin v. Löwis

David Opstad said:
Anyway, on to the question. Now that Unicode 4.0 has been released (just
got my copy today), any guesses on how long before the unicodedata
module will be updated to include all the new names?

It might happen for Python 2.4, but by the time Python 2.4 is
released, the Unicode 4.0 database might get skipped, and Python might
incorporate Unicode 4.2 (or some such) instead.

The tricky part is that IDNA specifies Unicode 3.2 as the basis of
internationalized domain names, so some mechanism must be found to
incorporate two versions of the database in Python without adding too
much overhead.
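
For readers on a current interpreter, the two-database idea can be
sketched roughly as follows. This assumes Python 3, where the standard
unicodedata module exposes a frozen Unicode 3.2 view as
unicodedata.ucd_3_2_0 (it arrived in Python 2.5, after this thread was
written). The helper name known_in is made up for illustration.

```python
import unicodedata

# Version of the Unicode Character Database built into this interpreter.
print("current UCD:", unicodedata.unidata_version)

# Frozen Unicode 3.2 view, kept for IDNA/stringprep.
# Assumption: Python 2.5 or later; it did not exist in 2.2.
old_db = unicodedata.ucd_3_2_0
print("frozen UCD: ", old_db.unidata_version)


def known_in(db, ch):
    """Return True if `db` assigns `ch` a category other than 'Cn' (unassigned).

    `db` may be the unicodedata module itself or a frozen UCD object;
    both provide a category() function.
    """
    return db.category(ch) != "Cn"


# U+10000 belongs to the Linear B Syllabary block, which was added in Unicode 4.0.
LINEAR_B = "\U00010000"

print("in current database:", known_in(unicodedata, LINEAR_B))
print("in 3.2 database:    ", known_in(old_db, LINEAR_B))
```

Both objects answer the same queries, so code that must follow the
IDNA rules can be handed the 3.2 view while everything else uses the
current one.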

David Opstad said:
How do things like that work, anyway; is there somebody whose task
it is to update that, or are they awaiting volunteers to help out?

In general, it would be somebody's task (i.e. mine) to incorporate a
new version. However, since this is more than running the generator
again (as actual code changes have to go with it), contributions are
welcome.

David Opstad said:
And once the module is updated, is it generally usable on earlier
Python releases (I'm running the 2.2 that came with the OS X
developer package for Jaguar)?

If you want to backport that database yourself, you could just as well
create your own version of the Unicode 4.0 database. Just run the
generator, and rename the unicodedata module to unicodedata40 (inside
the module's source code). Python then won't use this database
internally (for the string methods .is*, .upper, ...), but you can
readily call the unicodedata40 functions yourself.
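
A minimal sketch of what calling such a side-by-side module might look
like, written for Python 3 rather than 2.2. The module name
unicodedata40 comes from the suggestion above and only exists if you
have built it yourself, so the sketch falls back to the stock
unicodedata module when the import fails. The helper describe is
hypothetical.

```python
try:
    # Hypothetical: the renamed module built from the Unicode 4.0 data files.
    import unicodedata40 as ucd
except ImportError:
    # Fallback so the sketch still runs without the custom build.
    import unicodedata as ucd


def describe(ch, db=ucd):
    """Look up name and general category in the explicitly chosen database."""
    return db.name(ch, "<unnamed>"), db.category(ch)


ch = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE

# Explicit lookup: goes through whichever module was imported as `ucd`.
print(describe(ch))

# String methods always use the database compiled into the interpreter,
# no matter which extra module has been imported.
print(ch.upper(), ch.isalpha())
```

The explicit lookups are the only place where the newer data would show
up; str.upper() and str.isalpha() keep using the interpreter's built-in
tables.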

Regards,
Martin
 
