UTF Questions

Fuzzyman · Mar 21, 2005

I have a couple of questions about the UTF encodings.

The codecs module has constants defined for the UTF-32 encoding, yet
this encoding isn't supported as a standard encoding. Why isn't it
supported?

It possibly has something to do with my next question. I know that
Unicode has (recently?) been expanded to include new character sets.
This means that the latest Unicode standard can't be fully supported
with 2 bytes per character. As far as I know, though, Python doesn't
(yet) support the extended version of Unicode anyway. Am I correct?

Best Regards,

Fuzzyman
http://www.voidspace.org.uk/python/index.shtml

Serge Orlov · Mar 21, 2005

Fuzzyman said:
I have a couple of questions about the UTF encodings.

The codecs module has constants defined for the UTF-32 encoding, yet
this encoding isn't supported as a standard encoding. Why isn't it
supported?

Probably because there is little demand for it. The most widespread
Unicode encodings are UTF-8 and UTF-16.

It possibly has something to do with my next question. I know that
Unicode has (recently?) been expanded to include new character sets.
This means that the latest Unicode standard can't be fully supported
with 2 bytes per character. As far as I know, though, Python doesn't
(yet) support the extended version of Unicode anyway. Am I correct?

Python does support them. PEP 261 has the answers for your questions.
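
To make the "2 bytes per character" point concrete, here is a small sketch in present-day Python 3 (not the 2.4 interpreter discussed in this thread). It shows how a code point above U+FFFF is split into a UTF-16 surrogate pair, which is the representation PEP 261 describes for narrow builds. The helper name surrogate_pair is made up for this illustration and is not part of any library.

```python
def surrogate_pair(code_point):
    """Split a code point above U+FFFF into its UTF-16 surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not in a supplementary plane")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return high, low


ch = "\U0001D11E"  # MUSICAL SYMBOL G CLEF, outside the 16-bit range

high, low = surrogate_pair(ord(ch))
print(hex(high), hex(low))          # 0xd834 0xdd1e

# The same character needs 4 bytes in each of the three encodings.
print(len(ch.encode("utf-8")))      # 4
print(len(ch.encode("utf-16-le")))  # 4 (two 16-bit code units)
print(len(ch.encode("utf-32-le")))  # 4 (one 32-bit code unit)
```

So a character outside the 16-bit range is still representable in UTF-16; it simply takes two code units instead of one.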

Serge.

Fuzzyman · Mar 22, 2005

Thanks Serge.

Regards,

Fuzzy
http://www.voidspace.org.uk/python/index.shtml

Serge Orlov · Mar 22, 2005

Fuzzyman said:
Thanks Serge.

You're welcome. While we're at it, iconvcodec supports UTF-32 and more. I have sent
a 2.4 Windows build of the iconvcodec module to the author. He promised to publish it
soon.

Serge.

Martin v. Löwis · Mar 27, 2005

Fuzzyman said:
The codecs module has constants defined for the UTF-32 encoding, yet
this encoding isn't supported as a standard encoding. Why isn't it
supported?

Because nobody has contributed such an implementation.

Notice that this is really trivial to implement, with very few lines
of pure Python code. In fact, given a Unicode string s, the line

codecs.BOM_UTF32+array.array("i",map(ord,s)).tostring()

generates UTF-32 for the string s. Creating a codec on top of this
approach is left as an exercise for the reader.
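
For readers on Python 3, the one-liner above needs small changes: tostring() was later replaced by tobytes(), and the unsigned typecode "I" is a safer fit than "i". The following is a rough adaptation rather than a full codec. The function name encode_utf32 is hypothetical, and the code assumes a platform where a C unsigned int is 4 bytes wide (it raises an error otherwise). Python 2.6 and later also ship a built-in "utf-32" codec, so this is mainly of illustrative value.

```python
import array
import codecs


def encode_utf32(text):
    """Encode text as UTF-32 in native byte order, prefixed with a BOM."""
    # One array item per code point; "I" is a C unsigned int.
    code_points = array.array("I", map(ord, text))
    if code_points.itemsize != 4:
        raise RuntimeError("this sketch assumes a 4-byte unsigned int")
    # codecs.BOM_UTF32 is the BOM for the machine's native byte order,
    # which matches the byte order array.tobytes() produces.
    return codecs.BOM_UTF32 + code_points.tobytes()


data = encode_utf32("abc")
print(len(data))  # 16: a 4-byte BOM plus 4 bytes for each of 3 characters
```

On a typical platform the result should match "abc".encode("utf-32"), which also writes a BOM followed by native-order code units.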

Regards,
Martin
