Non-unicode strings & Python.

Jonathon Blake · Aug 31, 2004

All:

Question

Python is currently Unicode Compliant.

What happens when strings are read in from text files that were
created using GB 2312-1980, or KPS 9566-2003, or other, equally
obscure code ranges?

The idea is to read text in the file format, and replace it with the
appropriate Unicode character, then write it out as a new text file.
[Trivial to program, but incredibly time consuming to actually code]

xan

jonathon

Martin v. Löwis · Aug 31, 2004

Jonathon said:
What happens when strings are read in from text files that were
created using GB 2312-1980, or KPS 9566-2003, or other, equally
obscure code ranges?

Python has two kinds of strings: byte strings, and Unicode strings.
If you read data from a file, you get byte strings - i.e. a sequence
of bytes representing literally the encoded contents of the file.
If you want Unicode strings, you need to use codecs.open.
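A minimal sketch of that distinction, assuming a throwaway temporary file and a made-up two-character sample (neither is from the original post). In Python 3 the two kinds of strings are named `bytes` and `str`; the `u""` prefix is kept so the snippet also reads naturally on Python 2.

```python
import codecs
import os
import tempfile

# Hypothetical sample text: two Chinese characters.
text = u"\u4e2d\u6587"

# Write the sample to a temporary file as GB 2312 bytes.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(text.encode("gb2312"))

# Reading in binary mode returns the raw encoded bytes of the file.
with open(path, "rb") as f:
    raw = f.read()

# codecs.open decodes while reading and returns a Unicode string.
with codecs.open(path, "r", encoding="gb2312") as f:
    decoded = f.read()

os.remove(path)

print(len(raw), len(decoded))
```

The byte string is longer than the Unicode string because GB 2312 stores each of these characters in two bytes.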

Jonathon said:
The idea is to read text in the file format, and replace it with the
appropriate Unicode character, then write it out as a new text file.
[Trivial to program, but incredibly time consuming to actually code]

Not at all:

import codecs
# .read() is needed: codecs.open returns a file object, not the text
data = codecs.open(filename, "r", encoding="gb2312").read()
codecs.open(newfile, "w", encoding="utf-8").write(data)

assuming that by "appropriate Unicode character" you actually mean
"I want to write the file encoded as UTF-8".
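For files too large to hold in memory, the same conversion could be done line by line. The following is only a sketch: `convert_encoding` and its parameter names are hypothetical, not part of the codecs module, and it assumes a codec for the source encoding is available.

```python
import codecs


def convert_encoding(src_path, dst_path, src_encoding, dst_encoding="utf-8"):
    """Re-encode a text file line by line (hypothetical helper)."""
    # codecs.open decodes on read and encodes on write, so the loop
    # below only ever handles Unicode strings.
    with codecs.open(src_path, "r", encoding=src_encoding) as src:
        with codecs.open(dst_path, "w", encoding=dst_encoding) as dst:
            for line in src:
                dst.write(line)
```

A call might look like `convert_encoding("in.txt", "out.txt", "gb2312")`, with both file names made up for illustration. For an obscure encoding such as KPS 9566 it is worth checking first that a codec exists: `codecs.lookup(name)` raises `LookupError` when no codec is registered under that name.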

Regards,
Martin
