Python 2.1 / 2.3: xreadlines not working with codecs.open

Discussion in 'Python' started by Eric Brunel, Jun 23, 2005.

  1. Eric Brunel

    Eric Brunel Guest

    Hi all,

    I just found a problem in the xreadlines method/module when used with codecs.open: the codec specified in the open does not seem to be taken into account by xreadlines which also returns byte-strings instead of unicode strings.

    For example, if a file foo.txt contains some text encoded in latin1:

    >>> import codecs
    >>> f = codecs.open('foo.txt', 'r', 'utf-8', 'replace')
    >>> [l for l in f.xreadlines()]

    ['\xe9\xe0\xe7\xf9\n']

    But:

    >>> import codecs
    >>> f = codecs.open('foo.txt', 'r', 'utf-8', 'replace')
    >>> f.readlines()

    [u'\ufffd\ufffd']

    The characters in latin1 are correctly "dumped" with readlines, but are still in latin1 encoding in byte-strings with xreadlines.

    I tested with Python 2.1 and 2.3 on Linux and Windows, with the same result (I don't have Python 2.4 installed here).

    Can anybody confirm the problem? Is this a bug? I searched this newsgroup and the known Python bugs, but the problem does not seem to have been reported yet.

    TIA
    --
    python -c "print ''.join([chr(154 - ord(c)) for c in 'U(17zX(%,5.zmz5(17;8(%,5.Z65\'*9--56l7+-'])"
     
    Eric Brunel, Jun 23, 2005
    #1
  2. Eric Brunel

    Eric Brunel Guest

    On Thu, 23 Jun 2005 14:23:34 +0200, Eric Brunel <> wrote:

    > Hi all,
    >
    > I just found a problem in the xreadlines method/module when used with codecs.open: the codec specified in the open does not seem to be taken into account by xreadlines which also returns byte-strings instead of unicode strings.
    >
    > For example, if a file foo.txt contains some text encoded in latin1:
    >
    > >>> import codecs
    > >>> f = codecs.open('foo.txt', 'r', 'utf-8', 'replace')
    > >>> [l for l in f.xreadlines()]

    > ['\xe9\xe0\xe7\xf9\n']
    >
    > But:
    >
    > >>> import codecs
    > >>> f = codecs.open('foo.txt', 'r', 'utf-8', 'replace')
    > >>> f.readlines()

    > [u'\ufffd\ufffd']
    >
    > The characters in latin1 are correctly "dumped" with readlines, but are still in latin1 encoding in byte-strings with xreadlines.


    Replying to myself. One more funny thing:

    >>> import codecs, xreadlines
    >>> f = codecs.open('foo.txt', 'r', 'utf-8', 'replace')
    >>> [l for l in xreadlines.xreadlines(f)]

    [u'\ufffd\ufffd']

    So f.xreadlines does not work, but xreadlines.xreadlines(f) does. And this happens in Python 2.3, but also in Python 2.1, where the implementation for f.xreadlines() calls xreadlines.xreadlines(f) (?!?). Something's escaping me here... Reading the source didn't help.

    At least, it does provide a workaround...
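
    On an interpreter where neither f.xreadlines() nor the xreadlines module exists (Python 3, for instance), the same effect can be had by iterating over the codec reader itself. The following is only a sketch under that assumption: the temporary file, the sample bytes, and the choice of codecs.getreader are illustrative and are not what was run in this thread.

```python
import codecs
import os
import tempfile

# Illustrative sample: the same four latin1 bytes as in the post above,
# written to a temporary foo.txt.
path = os.path.join(tempfile.mkdtemp(), "foo.txt")
with open(path, "wb") as fh:
    fh.write(b"\xe9\xe0\xe7\xf9\n")

# Wrap the raw byte stream in a latin1 StreamReader and iterate over the
# reader, so every line goes through the codec before we see it.
with open(path, "rb") as raw:
    reader = codecs.getreader("iso8859-1")(raw)
    lines = [line for line in reader]

print(lines)
```

    Because the loop goes through the reader's own iteration rather than a method borrowed from the underlying file, the lines come back as decoded text.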
    --
    python -c "print ''.join([chr(154 - ord(c)) for c in 'U(17zX(%,5.zmz5(17;8(%,5.Z65\'*9--56l7+-'])"
     
    Eric Brunel, Jun 28, 2005
    #2
  3. Peter Otten

    Peter Otten Guest

    Eric Brunel wrote:

    > I just found a problem in the xreadlines method/module when used with
    > codecs.open: the codec specified in the open does not seem to be taken
    > into account by xreadlines which also returns byte-strings instead of
    > unicode strings.


    > So f.xreadlines does not work, but xreadlines.xreadlines(f) does. And this
    > happens in Python 2.3, but also in Python 2.1, where the implementation
    > for f.xreadlines() calls xreadlines.xreadlines(f) (?!?). Something's
    > escaping me here... Reading the source didn't help.


    codecs.StreamReaderWriter seems to delegate everything it doesn't implement
    itself to the underlying file instance which is ignorant of the encoding.
    The culprit:

    def __getattr__(self, name,
                    getattr=getattr):

        """ Inherit all other methods from the underlying stream.
        """
        return getattr(self.stream, name)
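
    The effect of that fallback can be reproduced with a small toy wrapper. This is a simplified sketch, not the codecs source: DecodingReader and RawStream are hypothetical names, and RawStream.xreadlines merely stands in for the old file method, written for Python 3.

```python
import io


class RawStream(io.BytesIO):
    """Hypothetical byte stream that offers an xreadlines() method."""

    def xreadlines(self):
        # Knows nothing about encodings: yields raw byte lines.
        return iter(self.readlines())


class DecodingReader:
    """Toy wrapper that decodes only in the methods it defines itself."""

    def __init__(self, stream, encoding, errors="strict"):
        self.stream = stream
        self.encoding = encoding
        self.errors = errors

    def readlines(self):
        # Implemented by the wrapper, so the codec is applied.
        return [line.decode(self.encoding, self.errors)
                for line in self.stream.readlines()]

    def __getattr__(self, name):
        # Called only for attributes the wrapper lacks; the call is handed
        # to the underlying stream unchanged, bypassing the codec.
        return getattr(self.stream, name)


data = b"\xe9\xe0\xe7\xf9\n"
decoded = DecodingReader(RawStream(data), "iso8859-1").readlines()
undecoded = list(DecodingReader(RawStream(data), "iso8859-1").xreadlines())
print(decoded, undecoded)
```

    readlines() returns decoded text because the wrapper defines it, while xreadlines() falls through __getattr__ and returns the untouched bytes.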

    > At least, it does provide a workaround...


    Note that the xreadlines module hasn't made it into Python 2.4.

    Peter
     
    Peter Otten, Jun 28, 2005
    #3
  4. Richard Brodie

    Richard Brodie Guest

    "Eric Brunel" <> wrote:
    >
    > Replying to myself. One more funny thing:
    >
    > >>> import codecs, xreadlines
    > >>> f = codecs.open('foo.txt', 'r', 'utf-8', 'replace')
    > >>> [l for l in xreadlines.xreadlines(f)]

    > [u'\ufffd\ufffd']


    You've specified utf-8 as the encoding instead of iso8859-1,
    by the way.
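
    The difference is easy to see by decoding the same bytes both ways. A minimal sketch, assuming the file holds exactly the four latin1 bytes shown in the first post and using Python 3 bytes.decode rather than codecs.open:

```python
# The bytes from the first post: four accented latin1 characters and a newline.
raw = b"\xe9\xe0\xe7\xf9\n"

# Decoding as utf-8 with errors='replace': the byte sequence is not valid
# utf-8, so the invalid bytes become U+FFFD replacement characters.
as_utf8 = raw.decode("utf-8", "replace")

# Decoding with the codec that matches the data recovers the characters.
as_latin1 = raw.decode("iso8859-1")

print(repr(as_utf8))
print(repr(as_latin1))
```

    So the replacement characters in the readlines output come from the utf-8/latin1 mismatch, separately from the xreadlines delegation problem.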
     
    Richard Brodie, Jun 28, 2005
    #4