UTF Questions

Fuzzyman

I have a couple of questions about the UTF encodings.

The codecs module has constants defined for the UTF-32 encoding, yet
this encoding isn't supported as a standard encoding. Why isn't it
supported?

It possibly has something to do with my next question. I know that
Unicode has (recently?) been expanded to include new character sets.
This means that the latest Unicode standard can't be fully supported
with 2 bytes per character. As far as I know, though, Python doesn't
(yet) support the extended version of Unicode anyway. Am I correct?

Best Regards,

Fuzzyman
http://www.voidspace.org.uk/python/index.shtml
 
Serge Orlov

Fuzzyman said:
I have a couple of questions about the UTF encodings.

The codecs module has constants defined for the UTF-32 encoding, yet
this encoding isn't supported as a standard encoding. Why isn't it
supported?

Probably because there is little demand for it. The most widespread
Unicode encodings are UTF-8 and UTF-16.

Fuzzyman said:
It possibly has something to do with my next question. I know that
Unicode has (recently?) been expanded to include new character sets.
This means that the latest Unicode standard can't be fully supported
with 2 bytes per character. As far as I know, though, Python doesn't
(yet) support the extended version of Unicode anyway. Am I correct?

Python does support them. PEP 261 ("Support for 'wide' Unicode
characters") has the answers to your questions.

Serge.
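
To make the "2 bytes per character" point concrete, here is a minimal sketch of how a character outside the 16-bit range is represented. It assumes a modern Python 3 interpreter (3.3 or later), which behaves differently from the 2.4-era builds discussed in this thread; the helper name surrogate_pair and the sample character are illustrative choices, not something taken from PEP 261.

```python
import sys


def surrogate_pair(code_point):
    """Split a code point above U+FFFF into its UTF-16 surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not outside the 16-bit range")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return high, low


# U+1D11E (MUSICAL SYMBOL G CLEF) does not fit in 2 bytes.
clef = "\U0001D11E"
high, low = surrogate_pair(ord(clef))

utf16 = clef.encode("utf-16-be")  # two 16-bit code units, 4 bytes
utf32 = clef.encode("utf-32-be")  # one 32-bit code unit, 4 bytes

print(hex(sys.maxunicode))        # highest code point this build can store
print(hex(high), hex(low))
print(utf16.hex(), utf32.hex())
```

On such an interpreter the string has length 1 even though its UTF-16 form needs two code units, which is the distinction between code points and 2-byte units that the question is circling.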
 
Serge Orlov

Fuzzyman said:
Thanks Serge.

You're welcome. While we're at it, iconvcodec supports UTF-32 and more. I have sent
a Python 2.4 Windows build of the iconvcodec module to the author. He promised to
publish it soon.

Serge.
 
Martin v. Löwis

Fuzzyman said:
The codecs module has constants definded for the UTF32 encoding, yet
this encoding isn't supported as a standard encoding. Why isn't it
supported ?

Because nobody has contributed such an implementation.

Notice that this is really trivial to implement, with very few lines
of pure Python code. In fact, given a Unicode string s (and with the
array and codecs modules imported), the line

codecs.BOM_UTF32+array.array("i",map(ord,s)).tostring()

generates UTF-32 for the string s. Creating a codec on top of this
approach is left as an exercise for the reader.

Regards,
Martin
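
For readers on a current interpreter, the one-liner above can be sketched as follows. This is an adaptation under stated assumptions rather than Martin's code: it targets Python 3, where array.tostring() is no longer available, and it swaps array.array("i", ...) for struct.pack with standard sizes so that every code point is exactly 4 bytes regardless of platform. The function name encode_utf32 is hypothetical. The comparison against the built-in "utf-32" codec relies on that codec having been added to the standard library after this thread was written.

```python
import codecs
import struct


def encode_utf32(s):
    """Encode s as native-endian UTF-32 with a leading BOM."""
    # "=" selects native byte order with standard sizes, so "I" is
    # always a 4-byte unsigned integer.
    body = struct.pack("=%dI" % len(s), *map(ord, s))
    # codecs.BOM_UTF32 is the BOM in the machine's native byte order,
    # matching the byte order used for the body above.
    return codecs.BOM_UTF32 + body


sample = "a\u00e9\U0001D11E"
encoded = encode_utf32(sample)

# 4 bytes of BOM plus 4 bytes for each of the 3 code points.
print(len(encoded))
# Cross-check against the interpreter's own UTF-32 codec.
print(encoded == sample.encode("utf-32"))
print(encoded.decode("utf-32") == sample)
```

The sketch does not validate its input, so a lone surrogate in s would be packed as-is even though it is not valid UTF-32; a real codec would need to reject or handle that case.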
 
