codecs, csv issues

Discussion in 'Python' started by George Sakkis, Aug 22, 2008.

  1. I'm trying to use codecs.open() and I see two issues when I pass
    encoding='utf8':

    1) Newlines are hardcoded to LINEFEED (ascii 10) instead of the
    platform-specific byte(s).

    import codecs
    f = codecs.open('tmp.txt', 'w', encoding='utf8')
    s = u'\u0391\u03b8\u03ae\u03bd\u03b1'
    print >> f, s
    print >> f, s
    f.close()

    This doesn't happen with the default encoding (encoding=None).

    2) csv.writer doesn't seem to work as expected when passed a
    codecs file object; it behaves as if the encoding were ASCII:

    import codecs, csv
    f = codecs.open('tmp.txt', 'w', encoding='utf8')
    s = u'\u0391\u03b8\u03ae\u03bd\u03b1'
    # this works fine
    print >> f, s
    # this doesn't
    csv.writer(f).writerow([s])
    f.close()

    Traceback (most recent call last):
    ...
    csv.writer(f).writerow([s])
    UnicodeEncodeError: 'ascii' codec can't encode character u'\u0391' in
    position 0: ordinal not in range(128)

    Is this the expected behavior or are these bugs?

    George
     
    George Sakkis, Aug 22, 2008

  2. Peter Otten

    George Sakkis wrote:

    > Is this the expected behavior or are these bugs?


    Looking into the documentation

    """
    Note: This version of the csv module doesn't support Unicode input. Also,
    there are currently some issues regarding ASCII NUL characters.
    Accordingly, all input should be UTF-8 or printable ASCII to be safe; see
    the examples in section 9.1.5. These restrictions will be removed in the
    future.
    """
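The note above describes the Python 2 csv module. For comparison, here is a minimal sketch assuming Python 3, where csv.writer accepts text rows and leaves the encoding to the file object: open the file with encoding='utf8' and newline='' and pass unicode rows directly. The file name and the second column are illustrative only.

```python
import csv
import io
import os
import tempfile

# Assumption: Python 3. The csv module there works on str (text) rows and
# relies on the file object for encoding; the Python 2 module quoted above
# only handles byte strings.
s = u'\u0391\u03b8\u03ae\u03bd\u03b1'
path = os.path.join(tempfile.mkdtemp(), 'tmp.csv')

# newline='' is what the csv documentation recommends: the writer emits its
# own line terminator ('\r\n' by default) and the file adds no translation.
with io.open(path, 'w', encoding='utf8', newline='') as f:
    csv.writer(f).writerow([s, u'plain ascii'])

# Inspect the encoded bytes that actually reached the disk.
with io.open(path, 'rb') as f:
    raw = f.read()

# Read the row back through csv.reader with the same settings.
with io.open(path, 'r', encoding='utf8', newline='') as f:
    rows = list(csv.reader(f))

print(raw == s.encode('utf8') + b',plain ascii\r\n')  # prints True
print(rows == [[s, u'plain ascii']])  # prints True
```

In Python 2 the equivalent workaround is to encode each field to UTF-8 bytes before handing the row to the writer, as the examples the note points to do.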

    and into the source code

    if encoding is not None and \
       'b' not in mode:
        # Force opening of the file in binary mode
        mode = mode + 'b'
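Because the file underneath is always binary, nothing translates '\n' on the way out. A minimal sketch of the usual alternative, assuming Python 3 (io.open also exists from Python 2.6): a text-mode file from io.open both encodes and translates newlines, turning '\n' into os.linesep by default, or into whatever the newline argument specifies. The temporary paths are illustrative.

```python
import io
import os
import tempfile

# Assumption: Python 3; io.open is the same object as the built-in open.
s = u'\u0391\u03b8\u03ae\u03bd\u03b1'
tmpdir = tempfile.mkdtemp()

# Default newline=None: each '\n' written is translated to os.linesep,
# the platform-specific behaviour that codecs.open does not provide.
path = os.path.join(tmpdir, 'tmp.txt')
with io.open(path, 'w', encoding='utf8') as f:
    f.write(s + u'\n')
    f.write(s + u'\n')

with io.open(path, 'rb') as f:
    raw = f.read()

expected_line = s.encode('utf8') + os.linesep.encode('ascii')
print(raw == expected_line * 2)  # prints True

# newline='\r\n' forces CRLF line endings whatever the platform.
crlf_path = os.path.join(tmpdir, 'crlf.txt')
with io.open(crlf_path, 'w', encoding='utf8', newline='\r\n') as f:
    f.write(s + u'\n')

with io.open(crlf_path, 'rb') as f:
    crlf_raw = f.read()

print(crlf_raw == s.encode('utf8') + b'\r\n')  # prints True
```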

    I'd be willing to say that both are implementation limitations.

    Peter
     
    Peter Otten, Aug 22, 2008

  3. John Machin

    On Aug 22, 11:52 pm, George Sakkis <> wrote:
    > I'm trying to use codecs.open() and I see two issues when I pass
    > encoding='utf8':
    >
    > 1) Newlines are hardcoded to LINEFEED (ascii 10) instead of the
    > platform-specific byte(s).
    >
    > import codecs
    > f = codecs.open('tmp.txt', 'w', encoding='utf8')
    > s = u'\u0391\u03b8\u03ae\u03bd\u03b1'
    > print >> f, s
    > print >> f, s
    > f.close()


    This is documented behaviour:
    """
    Note
    Files are always opened in binary mode, even if no binary mode was
    specified. This is done to avoid data loss due to encodings using
    8-bit values. This means that no automatic conversion of '\n' is done on
    reading and writing.
    """
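The reading half of that note can be checked the same way. A minimal sketch, assuming Python 3: a UTF-8 file with CRLF endings, decoded straight from its bytes (no newline handling, as with the binary-mode codecs.open), keeps its '\r\n' pairs, while io.open in text mode with the default newline=None applies universal newlines and returns plain '\n'. The file contents are made up for the demonstration.

```python
import io
import os
import tempfile

# Assumption: Python 3. Build a small UTF-8 file with Windows-style
# CRLF line endings directly as bytes.
s = u'\u0391\u03b8\u03ae\u03bd\u03b1'
path = os.path.join(tempfile.mkdtemp(), 'crlf.txt')
with io.open(path, 'wb') as f:
    f.write((s + u'\r\n' + s + u'\r\n').encode('utf8'))

# Decoding the raw bytes yourself performs no newline conversion,
# so both '\r\n' pairs survive.
with io.open(path, 'rb') as f:
    decoded = f.read().decode('utf8')

# Text mode with newline=None (the default) translates '\r\n' and '\r'
# to '\n' while reading.
with io.open(path, 'r', encoding='utf8') as f:
    text = f.read()

print(decoded.count(u'\r\n'), text.count(u'\r\n'))  # prints "2 0"
```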
     
    John Machin, Aug 22, 2008

Similar Threads
  1. Michal Mikolajczyk
    Replies:
    0
    Views:
    931
    Michal Mikolajczyk
    Feb 13, 2004
  2. Tintin92
    Replies:
    1
    Views:
    2,151
    Andrew Thompson
    Feb 14, 2007
  3. jliu66
    Replies:
    0
    Views:
    765
    jliu66
    Oct 19, 2007
  4. sso
    Replies:
    20
    Views:
    3,208
    Martin Gregorie
    Apr 26, 2009
  5. Tim
    Replies:
    1
    Views:
    487
    Peter Otten
    Jul 5, 2010
  6. Li Chen
    Replies:
    18
    Views:
    1,022
    Azmi Farih
    Mar 23, 2010
  7. Karl Knechtel
    Replies:
    2
    Views:
    545
    Walter Dörwald
    Jul 10, 2012
  8. Sacha Rook

    csv read clean up and write out to csv

    Sacha Rook, Nov 2, 2012, in forum: Python
    Replies:
    2
    Views:
    467
    Hans Mulder
    Nov 2, 2012