Alternatives for the CSV module

I am going to make a program that reads files with different
csv-dialects. Sometimes the field-separator or line-separator can be
more than one character. The standard CSV module in Python 2.3 is not
a good solution, because it expects single characters.

Example of a file

"ABC"<>"DEF"""<>"GHI"¤¤123<>456<>"XYZ"¤¤

Here the field delimiter is "<>" and the "line" terminator is "¤¤".
Fields can be enclosed in quotes, and a doubled quote inside a quoted
field is treated as a literal quote character.

This is not the only format the parser can expect. The format is given
to the program by the user, so the program should have no problems
parsing the text. An ideal solution would be a similar parser to the
standard CSV-parser, except that it accepts strings as delimiters.
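
As a rough illustration of what such a parser might look like, here is a minimal sketch in Python 3 syntax. The function name parse_delimited and its parameters are hypothetical and not part of any existing module. It assumes the rules described above: fields may be quoted, a doubled quote inside a quoted field stands for one literal quote, and delimiters inside quotes are ordinary text.

```python
def parse_delimited(text, delimiter, terminator, quote='"'):
    """Split text into rows of fields using string delimiters.

    Hypothetical helper: a small state machine, not a full CSV dialect.
    """
    rows, row, field = [], [], []
    in_quotes = False
    i, n = 0, len(text)
    while i < n:
        if in_quotes:
            if text.startswith(quote * 2, i):
                # Doubled quote inside a quoted field: one literal quote.
                field.append(quote)
                i += 2 * len(quote)
            elif text.startswith(quote, i):
                in_quotes = False
                i += len(quote)
            else:
                field.append(text[i])
                i += 1
        elif text.startswith(quote, i):
            in_quotes = True
            i += len(quote)
        elif text.startswith(delimiter, i):
            # End of field.
            row.append(''.join(field))
            field = []
            i += len(delimiter)
        elif text.startswith(terminator, i):
            # End of field and end of record.
            row.append(''.join(field))
            field = []
            rows.append(row)
            row = []
            i += len(terminator)
        else:
            field.append(text[i])
            i += 1
    if field or row:
        # Last record had no trailing terminator.
        row.append(''.join(field))
        rows.append(row)
    return rows


# '\u00a4' is the currency sign used as terminator in the example file.
sample = '"ABC"<>"DEF"""<>"GHI"\u00a4\u00a4123<>456<>"XYZ"\u00a4\u00a4'
print(parse_delimited(sample, '<>', '\u00a4\u00a4'))
# prints [['ABC', 'DEF"', 'GHI'], ['123', '456', 'XYZ']]
```

The sketch reads the whole input into memory and does no error reporting (an unclosed quote simply runs to the end of the input), so it is a starting point rather than a replacement for the csv module.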

I could always manipulate the input file and replace the delimiters by
single characters, but I would like a more generic solution.

SimpleParse (http://simpleparse.sourceforge.net/) looks like a good
alternative. It doesn't support Unicode, but most files can be
converted to ISO-8859-1 first.

Would SimpleParse be suitable for this purpose, or are there better
alternatives out there, like a more flexible CSV-parser?
 

John J. Lee

I am going to make a program that reads files with different
csv-dialects. Sometimes the field-separator or line-separator can be
more than one character. The standard CSV module in Python 2.3 is not
a good solution, because it expects single characters.
[...]

Dunno if it's any good to you, but there's one called DSV.


John
 

Skip Montanaro

Well, I might disagree with you there. By all reasonable accounts,
delimited files containing multi-character delimiters are not CSV files, at
least not as operationally defined by Excel (which I mention only because
it's probably the largest producer and consumer of such files).

That's pretty generic. How about this (untested):

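class DelimitedFile:
    """Iterate over a file's lines, replacing the input delimiter."""

    def __init__(self, fname, mode='rb', ind=',', outd=','):
        self.f = open(fname, mode)
        self.ind = ind    # delimiter as it appears in the file
        self.outd = outd  # single-character delimiter handed to csv

    def __iter__(self):
        return self

    def next(self):
        # Yields newline-terminated lines; StopIteration raised by the
        # file ends the iteration.
        line = self.f.next()
        return line.replace(self.ind, self.outd)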

Use it like so:

import csv

class d(csv.excel):
    delimiter = '\001'
    # Only used when writing; the reader splits records on '\r' or '\n'.
    lineterminator = '¤¤'

reader = csv.reader(DelimitedFile(fname, ind='<>', outd='\001'),
                    dialect=d)

for row in reader:
    print row

The goal is of course to pick a delimiter which won't appear in the file,
hence the Ctrl-A ('\001').
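
The same idea can be taken one step further so that a multi-character line terminator is handled as well. What follows is only a sketch in Python 3 syntax, not the code above: pick_unused_char and read_delimited are hypothetical names, the whole input is held in memory, and records are assumed to contain no raw newlines outside quoted fields. A terminator occurring inside a quoted field would still be split incorrectly.

```python
import csv


def pick_unused_char(text, candidates="\x01\x02\x03\x04"):
    """Return the first candidate character that does not occur in text."""
    for ch in candidates:
        if ch not in text:
            return ch
    raise ValueError("no unused placeholder character available")


def read_delimited(text, delimiter, terminator):
    """Parse text by translating the string delimiter to a single character."""
    placeholder = pick_unused_char(text)
    translated = text.replace(delimiter, placeholder)
    # Split into records on the (possibly multi-character) terminator.
    records = translated.split(terminator)
    if records and records[-1] == "":
        records.pop()  # drop the empty piece after a trailing terminator
    rows = []
    for row in csv.reader(records, delimiter=placeholder):
        # A placeholder left inside a field was a delimiter protected by
        # quotes, so map it back to the original delimiter string.
        rows.append([field.replace(placeholder, delimiter) for field in row])
    return rows


sample = '"ABC"<>"DEF"""<>"GHI"\u00a4\u00a4123<>456<>"XYZ"\u00a4\u00a4'
print(read_delimited(sample, "<>", "\u00a4\u00a4"))
# prints [['ABC', 'DEF"', 'GHI'], ['123', '456', 'XYZ']]
```

Checking the input for the placeholder first avoids having to guess that Ctrl-A never appears in the data.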

Skip
 
