CSV Parser and unusual (?) linesterminator - misunderstanding?

Tino Lange

Hi!

I'm trying to use the csv Parser included with Python. Field Delimiter is
"|", Line Delimiter is "#". Unfortunately it doesn't work as expected. The
parser seems to just ignore the 'lineterminator'?

Here's some example:
$ cat test.py
#! /usr/bin/env python

import sys, csv, cStringIO

class SpecialCSVDialect(csv.Dialect):
    delimiter = '|'
    lineterminator = '#'
    quotechar = '"'
    doublequote = True
    skipinitialspace = False
    quoting = csv.QUOTE_MINIMAL

csv.register_dialect("SpecialCSV", SpecialCSVDialect)

memfile = cStringIO.StringIO("1a|1b|1c|1d#2a|2b|2c|2d#3a|3b|3c|3d#")
cfile = csv.reader(memfile, dialect="SpecialCSV")

while 1:
    try:
        data = cfile.next()
    except csv.Error, (errmsg):
        print >> sys.stderr, "SpecialCSVError '%s' - aborting...!" % (errmsg)
        sys.exit()
    except StopIteration:
        break
    print data
$ ./test.py
['1a', '1b', '1c', '1d#2a', '2b', '2c', '2d#3a', '3b', '3c', '3d#']
$

I would have expected the parser to return three rows, i.e.
['1a', '1b', '1c', '1d']
['2a', '2b', '2c', '2d']
['3a', '3b', '3c', '3d']

Any hints as to what I'm doing wrong here?

Thanks

Tino

Peter Otten

Tino said:
I'm trying to use the csv Parser included with Python. Field Delimiter is
"|", Line Delimiter is "#". Unfortunately it doesn't work as expected. The
parser seems to just ignore the 'lineterminator'?

The csv reader accepts '\r', '\r\n', or '\n' as line endings, even mixed in
the same file. This behaviour is hardcoded. Only the writer uses the
lineterminator specified in the dialect.
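A small Python 3 sketch of that asymmetry, reusing Tino's dialect (io.StringIO stands in for cStringIO, and the rows are his expected output): the writer ends every row with '#', but feeding the writer's own output back to the reader produces the same single mangled row as in the original post.

```python
import csv
import io

class SpecialCSVDialect(csv.Dialect):
    delimiter = '|'
    lineterminator = '#'
    quotechar = '"'
    doublequote = True
    skipinitialspace = False
    quoting = csv.QUOTE_MINIMAL

rows = [['1a', '1b', '1c', '1d'], ['2a', '2b', '2c', '2d'], ['3a', '3b', '3c', '3d']]

# The writer honours lineterminator and terminates every row with '#'.
out = io.StringIO()
csv.writer(out, dialect=SpecialCSVDialect).writerows(rows)
written = out.getvalue()
print(written)  # 1a|1b|1c|1d#2a|2b|2c|2d#3a|3b|3c|3d#

# The reader ignores lineterminator: the whole string is one physical line,
# so the '#' characters end up inside the fields.
parsed = list(csv.reader(io.StringIO(written), dialect=SpecialCSVDialect))
print(parsed)  # [['1a', '1b', '1c', '1d#2a', '2b', '2c', '2d#3a', '3b', '3c', '3d#']]
```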

The following workaround might suffice:

memfile = (s + "\n" for s in "1a|1b|1c|1d#2a|2b|2c|2d#3a|3b|3c|3d#".split("#") if s)

Note that the reader expects a row iterator, so the split operation would be
necessary even if arbitrary line terminators were recognized.
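For real files that are too large to split in one go, the same idea can be applied lazily. This is a hypothetical Python 3 helper (`iter_records` is an illustrative name, not part of the csv module): it reads the file in chunks, yields one record per terminator, and hands that iterator to csv.reader. Like the one-liner, it splits before any quote handling, so a '#' inside a quoted field would still break a record in two.

```python
import csv
import io

def iter_records(f, terminator="#", chunk_size=8192):
    """Hypothetical helper: yield records from text file f split on terminator.

    Reads chunk_size characters at a time so the whole file never has to be
    in memory. Empty records (e.g. after a trailing terminator) are skipped,
    mirroring the `if s` filter in the one-liner.
    """
    buf = ""
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            break
        buf += chunk
        # Everything before the last terminator is complete; keep the remainder,
        # which also handles a multi-character terminator split across chunks.
        *complete, buf = buf.split(terminator)
        for rec in complete:
            if rec:
                yield rec
    if buf:
        yield buf  # last record had no trailing terminator

f = io.StringIO("1a|1b|1c|1d#2a|2b|2c|2d#3a|3b|3c|3d#")
rows = list(csv.reader(iter_records(f, "#", chunk_size=5), delimiter="|"))
print(rows)  # [['1a', '1b', '1c', '1d'], ['2a', '2b', '2c', '2d'], ['3a', '3b', '3c', '3d']]
```

The small chunk_size in the usage line is only there to exercise records that straddle chunk boundaries; a real file would use the default.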

Peter

Peter Otten

Peter said:
memfile = (s + "\n" for s in "1a|1b|1c|1d#2a|2b|2c|2d#3a|3b|3c|3d#".split("#") if s)

I just found out that you need not add "\n" to the line.
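A quick Python 3 check of that observation: the reader treats a trailing "\n" as the end of the record, and a string without one simply ends there, so both variants of the workaround yield identical rows.

```python
import csv

data = "1a|1b|1c|1d#2a|2b|2c|2d#3a|3b|3c|3d#"

# Workaround as first posted: each record gets an explicit "\n".
with_newline = [s + "\n" for s in data.split("#") if s]
# Simplified workaround: records passed to the reader as-is.
without_newline = [s for s in data.split("#") if s]

rows_a = list(csv.reader(with_newline, delimiter="|"))
rows_b = list(csv.reader(without_newline, delimiter="|"))

expected = [['1a', '1b', '1c', '1d'], ['2a', '2b', '2c', '2d'], ['3a', '3b', '3c', '3d']]
print(rows_a == rows_b == expected)  # True
```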

Peter

Tino Lange

Peter said:
The csv reader accepts '\r', '\r\n', or '\n' as line endings, even mixed in
the same file. This behaviour is hardcoded. Only the writer uses the
lineterminator specified in the dialect.

Boah ... Really?

a) this is not in the documentation ... or did I overlook something?

b) this is really unacceptable, isn't it? Here, at least, we have many CSVs
with line terminators other than '\n'.

Is this going to be changed? Is someone working on it? Or are patches for SF
wanted?

Cheers,

Tino

skip

Tino> Is this going to be changed? Is someone working on it? Or are
Tino> patches for SF wanted?

A patch that removes this constraint would be helpful.

Skip

Peter Otten

Tino said:
Boah ... Really?

a) this is not in the documentation ... or did I overlook something?

It's only in the development version of the documentation, see

http://docs.python.org/dev/lib/csv-fmt-params.html

b) this is really unacceptable, isn't it? Here, at least, we have many CSVs
with line terminators other than '\n'.

Do you need the elaborate (read: odd) escaping techniques of the original
CSV for these? Or would [row.split(fieldsep) for row in
data.split(linesep)] work just as well?
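Whether the plain split is enough depends on the quoting. A Python 3 sketch with a hypothetical record whose second field is quoted because it contains the field separator: the naive split cuts that field apart, while csv.reader, fed the same pre-split records, honours the quotes. An `if row` filter is added in both cases to drop the empty record after the trailing '#'. Neither approach copes with a '#' inside a quoted field, since the split on '#' happens before any quote handling.

```python
import csv

# Hypothetical sample: the second field is quoted because it contains '|'.
data = '1a|"x|y"|1c#2a|2b|2c#'

# Plain split: fine for unquoted data, but breaks the quoted field apart.
naive = [row.split("|") for row in data.split("#") if row]
print(naive)   # [['1a', '"x', 'y"', '1c'], ['2a', '2b', '2c']]

# Same record split, but fields parsed by csv.reader, which honours quotechar.
records = (row for row in data.split("#") if row)
proper = list(csv.reader(records, delimiter="|", quotechar='"'))
print(proper)  # [['1a', 'x|y', '1c'], ['2a', '2b', '2c']]
```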

Is this going to be changed? Is someone working on it? Or are patches for
SF wanted?

I would think so.

Peter