csv bugs

Discussion in 'Python' started by Magnus Lie Hetland, Mar 2, 2004.

  1. It seems that when a line termination is escaped (using the current
    escape character), csv.reader treats it as a line continuation, which
    is well and good -- but it doesn't discard the escape character;
    instead, it escapes it implicitly. This seems like a bug to me. E.g.

    foo:bar:baz\
    frozz:bozz

    with separator ':' and escape character '\\' is parsed into

    ['foo', 'bar', 'baz\\\nfrozz', 'bozz']

    In my opinion, it *ought* to be parsed into

    ['foo', 'bar', 'baz\nfrozz', 'bozz']

    As far as I know, this is the UNIX convention, as used in (e.g.)
    /etc/passwd.

    Am I off target here? If the current behaviour is desirable (although
    I can't see why it should be) then at least I think there should be a
    way of implementing "normal" line continuations (as in my example),
    which is the standard UNIX behavior, and the behavior of Python
    source, for that matter. Otherwise, csv can't be used to parse (e.g.)
    /etc/passwd...

    And another thing: Perhaps a 'passwd' dialect could be added alongside
    'excel'? Something like:

    class passwd(Dialect):
        delimiter = ':'
        doublequote = False
        escapechar = '\\'
        lineterminator = '\n'
        quotechar = '?'
        quoting = QUOTE_NONE
        skipinitialspace = False

    register_dialect("passwd", passwd)

    For some reason you *have* to supply a quotechar, even if you set
    QUOTE_NONE... I guess that's a bug too, in my book.

    If there are no objections, I might submit some of this as a bug
    report or two (or even a patch).

    --
    Magnus Lie Hetland              "The mind is not a vessel to be filled,
    http://hetland.org               but a fire to be lighted." [Plutarch]
    Magnus Lie Hetland, Mar 2, 2004
    #1
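
A minimal sketch of the parsing behaviour requested in the post above, written in plain Python rather than with the csv module. The helper name `split_escaped` and its parameters are hypothetical, chosen only for illustration; the sketch assumes that an escape character followed by any character (including a newline) should yield that character literally, with the escape character itself discarded.

```python
def split_escaped(text, delimiter=":", escapechar="\\"):
    """Split text into records of fields.

    An escapechar makes the following character literal and is itself
    dropped, so an escaped newline continues the current field instead
    of ending the record.
    """
    records, fields, current = [], [], []
    chars = iter(text)
    for ch in chars:
        if ch == escapechar:
            # Keep the next character as-is; discard the escape character.
            nxt = next(chars, None)
            if nxt is not None:
                current.append(nxt)
        elif ch == delimiter:
            fields.append("".join(current))
            current = []
        elif ch == "\n":
            # An unescaped newline ends the record.
            fields.append("".join(current))
            records.append(fields)
            fields, current = [], []
        else:
            current.append(ch)
    # Flush a final record that has no trailing newline.
    if current or fields:
        fields.append("".join(current))
        records.append(fields)
    return records


# The two-line example from the post, as one string.
sample = "foo:bar:baz\\\nfrozz:bozz\n"
rows = split_escaped(sample)
print(rows)
```

With the sample input, the third field comes out as 'baz\nfrozz', matching the result the post argues for. Quoting is deliberately left out to keep the sketch short.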

  2. (A better place for this discussion would probably be .
    I'm adding it to the cc list.)

    Magnus> It seems that when a line termination is escaped (using the
    Magnus> current escape character), csv.reader treats it as a line
    Magnus> continuation, which is well and good -- but it doesn't discard
    Magnus> the escape character; instead, it escapes it implicitly. This
    Magnus> seems like a bug to me. E.g.

    Magnus> foo:bar:baz\
    Magnus> frozz:bozz

    Magnus> with separator ':' and escape character '\\' is parsed into

    Magnus> ['foo', 'bar', 'baz\\\nfrozz', 'bozz']

    Magnus> In my opinion, it *ought* to be parsed into

    Magnus> ['foo', 'bar', 'baz\nfrozz', 'bozz']

    Magnus> As far as I know, this is the UNIX convention, as used in (e.g.)
    Magnus> /etc/passwd.

    That may be; however, development of the csv module's parser was driven by
    how Microsoft Excel behaves. The assumption was (rightly, I think) that
    Excel reads or writes more CSV files than anything else. I don't believe it
    does anything with backslashes.

    Magnus> Am I off target here? If the current behaviour is desirable
    Magnus> (although I can't see why it should be) then at least I think
    Magnus> there should be a way of implementing "normal" line
    Magnus> continuations (as in my example), which is the standard UNIX
    Magnus> behavior, and the behavior of Python source, for that
    Magnus> matter. Otherwise, csv can't be used to parse (e.g.)
    Magnus> /etc/passwd...

    You're welcome to submit a patch. I don't have time for it.

    Magnus> And another thing: Perhaps a 'passwd' dialect could be added
    Magnus> alongside 'excel'? Something like:

    Magnus> class passwd(Dialect):
    Magnus>     delimiter = ':'
    Magnus>     doublequote = False
    Magnus>     escapechar = '\\'
    Magnus>     lineterminator = '\n'
    Magnus>     quotechar = '?'
    Magnus>     quoting = QUOTE_NONE
    Magnus>     skipinitialspace = False
    Magnus>
    Magnus> register_dialect("passwd", passwd)

    I'll take a look at that.

    Magnus> For some reason you *have* to supply a quotechar, even if you
    Magnus> set QUOTE_NONE... I guess that's a bug too, in my book.

    Maybe. Maybe just a feature.

    Magnus> If there are no objections, I might submit some of this as a bug
    Magnus> report or two (or even a patch).

    Please do.

    Skip
    Skip Montanaro, Mar 2, 2004
    #2

  3. In <> Magnus Lie Hetland wrote:
    > And another thing: Perhaps a 'passwd' dialect could be added alongside
    > 'excel'? Something like:


    I wanted this, and started to write it in Nov-2003, but because of bugs
    in csv, outlined in

    http://groups.google.com/groups?selm=

    it is not possible to implement a passwd dialect, at least as of Python
    2.3.2. Unless I missed something obvious.

    --
    Francis Avila
    Francis Avila, Mar 3, 2004
    #3
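
For readers who want to try the dialect proposed in the first post, here is a small sketch that registers it and reads one colon-separated line. It assumes a current Python 3 interpreter, not the 2.3 releases discussed in this thread, and it only exercises a simple entry with no escaped characters, so it says nothing about the escaped-newline behaviour reported above. The sample line is made up for illustration.

```python
import csv


class passwd(csv.Dialect):
    """Dialect from the first post, with names qualified by the csv module."""
    delimiter = ':'
    doublequote = False
    escapechar = '\\'
    lineterminator = '\n'
    quotechar = '?'  # placeholder value; quoting is disabled below
    quoting = csv.QUOTE_NONE
    skipinitialspace = False


csv.register_dialect("passwd", passwd)

# Made-up sample entry, not read from a real system file.
sample_lines = ["root:x:0:0:root:/root:/bin/bash\n"]
entries = list(csv.reader(sample_lines, dialect="passwd"))
print(entries)
```

Each entry is returned as a list of seven string fields, one per colon-separated column.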
