codecs, csv issues

Discussion in 'Python' started by George Sakkis, Aug 22, 2008.

  1. I'm trying to use codecs.open() and I see two issues when I pass
    encoding='utf8':

    1) Newlines are hardcoded to LINEFEED (ascii 10) instead of the
    platform-specific byte(s).

    import codecs
    f = codecs.open('tmp.txt', 'w', encoding='utf8')
    s = u'\u0391\u03b8\u03ae\u03bd\u03b1'
    print >> f, s
    print >> f, s
    f.close()

    This doesn't happen for the default encoding (=None).

    2) csv.writer doesn't seem to work as expected when being passed a
    codecs object; it treats it as if encoding is ascii:

    import codecs, csv
    f = codecs.open('tmp.txt', 'w', encoding='utf8')
    s = u'\u0391\u03b8\u03ae\u03bd\u03b1'
    # this works fine
    print >> f, s
    # this doesn't
    csv.writer(f).writerow([s])
    f.close()

    Traceback (most recent call last):
    ....
    csv.writer(f).writerow([s])
    UnicodeEncodeError: 'ascii' codec can't encode character u'\u0391' in
    position 0: ordinal not in range(128)

    Is this the expected behavior or are these bugs ?

    George
    George Sakkis, Aug 22, 2008

  2. Peter Otten

    George Sakkis wrote:

    > I'm trying to use codecs.open() and I see two issues when I pass
    > encoding='utf8':
    >
    > 1) Newlines are hardcoded to LINEFEED (ascii 10) instead of the
    > platform-specific byte(s).
    >
    > import codecs
    > f = codecs.open('tmp.txt', 'w', encoding='utf8')
    > s = u'\u0391\u03b8\u03ae\u03bd\u03b1'
    > print >> f, s
    > print >> f, s
    > f.close()
    >
    > This doesn't happen for the default encoding (=None).
    >
    > 2) csv.writer doesn't seem to work as expected when being passed a
    > codecs object; it treats it as if encoding is ascii:
    >
    > import codecs, csv
    > f = codecs.open('tmp.txt', 'w', encoding='utf8')
    > s = u'\u0391\u03b8\u03ae\u03bd\u03b1'
    > # this works fine
    > print >> f, s
    > # this doesn't
    > csv.writer(f).writerow([s])
    > f.close()
    >
    > Traceback (most recent call last):
    > ...
    > csv.writer(f).writerow([s])
    > UnicodeEncodeError: 'ascii' codec can't encode character u'\u0391' in
    > position 0: ordinal not in range(128)
    >
    > Is this the expected behavior or are these bugs ?


    Looking into the documentation of the csv module

    """
    Note: This version of the csv module doesn't support Unicode input. Also,
    there are currently some issues regarding ASCII NUL characters.
    Accordingly, all input should be UTF-8 or printable ASCII to be safe; see
    the examples in section 9.1.5. These restrictions will be removed in the
    future.
    """

    and into the source code of codecs.open()

    if encoding is not None and \
       'b' not in mode:
        # Force opening of the file in binary mode
        mode = mode + 'b'

    I'd be willing to say that both are implementation limitations.
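
    For comparison, here is a minimal sketch of the csv case on Python 3, where the csv module accepts text directly. It assumes a Python 3 interpreter, not the Python 2 one used in the original post, and the scratch path is made up for illustration. On Python 2 the approach suggested by the note quoted above is to encode each value to UTF-8 before handing it to csv.writer.

```python
import csv
import os
import tempfile

s = '\u0391\u03b8\u03ae\u03bd\u03b1'

# Hypothetical scratch location; any writable path would do.
path = os.path.join(tempfile.mkdtemp(), 'tmp.csv')

# newline='' leaves line-ending handling to the csv module itself.
with open(path, 'w', encoding='utf8', newline='') as f:
    csv.writer(f).writerow([s])

# Read the row back to confirm the text survived the round trip.
with open(path, 'r', encoding='utf8', newline='') as f:
    rows = list(csv.reader(f))
```

    Here the file object does the encoding, so the writer never falls back to the ascii codec.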

    Peter
    Peter Otten, Aug 22, 2008

  3. John Machin

    On Aug 22, 11:52 pm, George Sakkis wrote:
    > I'm trying to use codecs.open() and I see two issues when I pass
    > encoding='utf8':
    >
    > 1) Newlines are hardcoded to LINEFEED (ascii 10) instead of the
    > platform-specific byte(s).
    >
    > import codecs
    > f = codecs.open('tmp.txt', 'w', encoding='utf8')
    > s = u'\u0391\u03b8\u03ae\u03bd\u03b1'
    > print >> f, s
    > print >> f, s
    > f.close()


    This is documented behaviour:
    """
    Note
    Files are always opened in binary mode, even if no binary mode was
    specified. This is done to avoid data loss due to encodings using 8-bit
    values. This means that no automatic conversion of '\n' is done on
    reading and writing.
    """
    John Machin, Aug 22, 2008