Using codecs.EncodedFile() with Python 2.5

Discussion in 'Python' started by David Hughes, Jan 3, 2007.

  1. David Hughes

    David Hughes Guest

    I used this function successfully with Python 2.4 to alter the encoding
    of a set of database records from latin-1 to utf-8, but the same
    program raises an exception using Python 2.5. This small example shows
    the problem:

    import codecs
    fo = open('test.dat', 'w')
    fo.write('G\xe2teaux')
    fo.close()

    fi = open("test.dat",'r')
    fx = codecs.EncodedFile(fi, 'utf-8', 'latin-1')
    astring = fx.readline()
    print astring
    ustring = unicode(astring, 'utf-8' )
    print repr(ustring)
    print ustring.encode('latin-1')
    print ustring.encode('utf-8')

    Python 2.4 gives:

    Gâteaux
    u'G\xe2teaux'
    Gâteaux
    Gâteaux

    which I believe is correct, while 2.5 produces

    Traceback (most recent call last):
      File "test_codec.py", line 8, in <module>
        astring = fx.readline()
      File "C:\Python25\lib\codecs.py", line 709, in readline
        data = self.reader.readline()
      File "C:\Python25\lib\codecs.py", line 471, in readline
        data = self.read(readsize, firstline=True)
      File "C:\Python25\lib\codecs.py", line 418, in read
        newchars, decodedbytes = self.decode(data, self.errors)
    UnicodeDecodeError: 'utf8' codec can't decode bytes in position 1-3: invalid data

    Is there a genuine problem here, or have I been misusing this function?
    --
    Regards
    David Hughes
    David Hughes, Jan 3, 2007
    #1
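For readers unsure about the argument order in the call above: `codecs.EncodedFile(file, data_encoding, file_encoding)` treats the second argument as the encoding of the data handed back to the caller and the third as the encoding of the bytes stored in the file. The following is only a sketch of the intended behaviour, written for a current Python 3 interpreter rather than the Python 2.4/2.5 discussed in the thread; the in-memory `io.BytesIO` stream is an assumption that stands in for `test.dat`.

```python
import codecs
import io

# Latin-1 bytes for "Gâteaux", standing in for the contents of test.dat
# (assumption: an in-memory stream instead of a real file).
raw = io.BytesIO(b'G\xe2teaux\n')

# data_encoding='utf-8' (what the caller reads),
# file_encoding='latin-1' (what the underlying stream contains).
fx = codecs.EncodedFile(raw, 'utf-8', 'latin-1')

line = fx.readline()            # bytes, re-encoded as UTF-8
text = line.decode('utf-8')     # the same line as a text string

print(repr(line))
print(repr(text))
```

The single Latin-1 byte `\xe2` comes back as the two-byte UTF-8 sequence `\xc3\xa2`, which is the result the original poster expected.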

  2. Peter Otten

    Peter Otten Guest

    David Hughes wrote:

    > Is there a genuine problem here, or have I been misusing this function?


    This is indeed a bug in Python 2.5. Fixed in subversion.

    http://svn.python.org/view/python/trunk/Lib/codecs.py?rev=52517&view=log

    Peter
    Peter Otten, Jan 3, 2007
    #2
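Until a fixed interpreter is available, one possible workaround is to avoid `codecs.EncodedFile` and recode each line explicitly: decode from the file's encoding, then encode to the target encoding. This is a hedged sketch, not something proposed in the thread; it is written for a current Python 3 interpreter, and the helper name `recode_lines` and the temporary file are hypothetical, introduced only for illustration.

```python
import io
import os
import tempfile


def recode_lines(path, src_encoding='latin-1', dst_encoding='utf-8'):
    """Yield each line of `path` as bytes re-encoded in `dst_encoding`.

    Hypothetical helper: the file is decoded as `src_encoding` while
    reading, and every decoded line is encoded again explicitly.
    """
    # newline='' leaves the original line endings untouched.
    with io.open(path, 'r', encoding=src_encoding, newline='') as f:
        for line in f:
            yield line.encode(dst_encoding)


# Build a small Latin-1 test file, similar to test.dat in the thread.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as fo:
    fo.write(b'G\xe2teaux\n')

recoded = list(recode_lines(path))
os.remove(path)

print(recoded)
```

Because decoding and encoding are separate, visible steps here, a failure in either one points directly at the encoding responsible.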