Python 2.1 / 2.3: xreadlines not working with codecs.open

Eric Brunel · Jun 23, 2005

Hi all,

I just found a problem in the xreadlines method/module when used with codecs.open: the codec specified in the open call does not seem to be taken into account by xreadlines, which also returns byte strings instead of unicode strings.

For example, if a file foo.txt contains some text encoded in latin1:

import codecs
f = codecs.open('foo.txt', 'r', 'utf-8', 'replace')
[l for l in f.xreadlines()]

['\xe9\xe0\xe7\xf9\n']

But:

import codecs
f = codecs.open('foo.txt', 'r', 'utf-8', 'replace')
f.readlines()

[u'\ufffd\ufffd']

The characters in latin1 are correctly "dumped" with readlines, but are still in latin1 encoding in byte-strings with xreadlines.

I tested with Python 2.1 and 2.3 on Linux and Windows, with the same result (I don't have Python 2.4 installed here).

Can anybody confirm the problem? Is this a bug? I searched this newsgroup and the known Python bugs, but the problem does not seem to have been reported yet.

TIA

Eric Brunel · Jun 28, 2005

Hi all,

I just found a problem in the xreadlines method/module when used with codecs.open: the codec specified in the open call does not seem to be taken into account by xreadlines, which also returns byte strings instead of unicode strings.

For example, if a file foo.txt contains some text encoded in latin1:

import codecs
f = codecs.open('foo.txt', 'r', 'utf-8', 'replace')
[l for l in f.xreadlines()]

['\xe9\xe0\xe7\xf9\n']

But:

import codecs
f = codecs.open('foo.txt', 'r', 'utf-8', 'replace')
f.readlines()

[u'\ufffd\ufffd']

The characters in latin1 are correctly "dumped" with readlines, but are still in latin1 encoding in byte-strings with xreadlines.

Replying to myself. One more funny thing:

import codecs, xreadlines
f = codecs.open('foo.txt', 'r', 'utf-8', 'replace')
[l for l in xreadlines.xreadlines(f)]

[u'\ufffd\ufffd']

So f.xreadlines does not work, but xreadlines.xreadlines(f) does. And this happens in Python 2.3, but also in Python 2.1, where the implementation for f.xreadlines() calls xreadlines.xreadlines(f) (?!?). Something's escaping me here... Reading the source didn't help.

At least, it does provide a workaround...
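
For readers on later Python versions, where the xreadlines module is not available, the same lazy, line-by-line decoding can be sketched by iterating over a text-mode file object. This is only an illustration under assumptions: the helper name decoded_lines is hypothetical, io.open is used instead of codecs.open, and the sample file is created on the fly with the latin-1 bytes shown in the thread.

```python
import io
import os
import tempfile


def decoded_lines(path, encoding, errors="strict"):
    """Yield decoded lines one at a time (hypothetical helper)."""
    with io.open(path, "r", encoding=encoding, errors=errors) as f:
        for line in f:
            yield line


# Write the latin-1 sample bytes from the thread to a temporary file.
fd, path = tempfile.mkstemp()
try:
    with os.fdopen(fd, "wb") as out:
        out.write(b"\xe9\xe0\xe7\xf9\n")
    # Consume the generator fully so the file is closed before removal.
    lines = list(decoded_lines(path, "latin-1"))
finally:
    os.remove(path)
```

Here the decoding happens inside the file object itself, so there is no separate undecoded code path to fall into.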

Peter Otten · Jun 28, 2005

Eric said:
I just found a problem in the xreadlines method/module when used with
codecs.open: the codec specified in the open does not seem to be taken
into account by xreadlines which also returns byte-strings instead of
unicode strings.

So f.xreadlines does not work, but xreadlines.xreadlines(f) does. And this
happens in Python 2.3, but also in Python 2.1, where the implementation
for f.xreadlines() calls xreadlines.xreadlines(f) (?!?). Something's
escaping me here... Reading the source didn't help.

codecs.StreamReaderWriter seems to delegate everything it doesn't implement
itself to the underlying file instance which is ignorant of the encoding.
The culprit:

def __getattr__(self, name,
                getattr=getattr):

    """ Inherit all other methods from the underlying stream.
    """
    return getattr(self.stream, name)
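
The effect of that fallback can be reproduced with a small stand-in class. This is a minimal sketch in modern Python, not the codecs source: DecodingReader is a hypothetical wrapper that decodes in the one method it defines and forwards everything else to the byte stream, which is the pattern described above.

```python
import io


class DecodingReader:
    """Hypothetical stand-in for a codec wrapper around a byte stream."""

    def __init__(self, stream, encoding, errors="strict"):
        self.stream = stream
        self.encoding = encoding
        self.errors = errors

    def readlines(self):
        # Defined on the wrapper, so the lines are decoded.
        return [line.decode(self.encoding, self.errors)
                for line in self.stream.readlines()]

    def __getattr__(self, name):
        # Only called for attributes the wrapper does not define itself;
        # those fall through to the raw, encoding-unaware stream.
        return getattr(self.stream, name)


raw = io.BytesIO(b"\xe9\xe0\xe7\xf9\n")  # latin-1 bytes from the thread
reader = DecodingReader(raw, "latin-1")

decoded = reader.readlines()   # wrapper method: decoded text
raw.seek(0)
undecoded = reader.readline()  # not on the wrapper: raw bytes come back
```

Any method the wrapper forgets to override silently bypasses the codec, which matches the behaviour reported for f.xreadlines().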

At least, it does provide a workaround...

Note that the xreadlines module hasn't made it into Python 2.4.

Peter

Richard Brodie · Jun 28, 2005

Eric Brunel said:
Replying to myself. One more funny thing:

import codecs, xreadlines
f = codecs.open('foo.txt', 'r', 'utf-8', 'replace')
[l for l in xreadlines.xreadlines(f)]

[u'\ufffd\ufffd']

You've specified utf-8 as the encoding instead of iso8859-1,
by the way.
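
The difference between the two codecs can be checked directly on the bytes from the example. A small sketch in modern Python, assuming the file holds exactly the four latin-1 characters and a newline shown earlier in the thread:

```python
data = b"\xe9\xe0\xe7\xf9\n"  # the file contents as raw bytes

# Decoding with the codec the file was actually written in.
as_latin1 = data.decode("iso-8859-1")

# Decoding with utf-8: these bytes are not valid UTF-8, so the
# 'replace' error handler substitutes U+FFFD for the bad input.
as_utf8 = data.decode("utf-8", "replace")
```

With the matching codec the original characters survive; with utf-8 and 'replace' they are lost to replacement characters, which is why the readlines output contains only u'\ufffd'.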

