Alternatives for the CSV module

- · Sep 12, 2004

I am going to make a program that reads files with different
csv-dialects. Sometimes the field-separator or line-separator can be
more than one character. The standard CSV module in Python 2.3 is not
a good solution, because it expects single characters.

Example of a file

"ABC"<>"DEF"""<>"GHI"¤¤123<>456<>"XYZ"¤¤

Here the field delimiter is "<>" and the "line" terminator is "¤¤".
Fields can be enclosed in quotes, and a doubled quote inside a quoted
field is treated as normal text.

This is not the only format the parser can expect. The format is given
to the program by the user, so the program should have no problems
parsing the text. An ideal solution would be a similar parser to the
standard CSV-parser, except that it accepts strings as delimiters.
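
For illustration only, here is a minimal sketch of what such a parser might look like. It is written for Python 3 rather than the Python 2.3 mentioned above, and the name parse_delimited and its parameters are hypothetical, not part of any existing module. It assumes the quoting rules described for the example file: fields may be wrapped in quotes, and a doubled quote inside a quoted field stands for one literal quote.

```python
def parse_delimited(text, field_sep, record_sep, quote='"'):
    """Yield records (lists of strings) from text.

    field_sep and record_sep may be strings of any length.
    Hypothetical helper, not part of the csv module.
    """
    record, field = [], []
    in_quotes = False
    started = False  # True once the current record has consumed any input
    i = 0
    while i < len(text):
        if in_quotes:
            if text.startswith(quote + quote, i):
                # A doubled quote inside a quoted field is a literal quote.
                field.append(quote)
                i += 2 * len(quote)
            elif text.startswith(quote, i):
                in_quotes = False
                i += len(quote)
            else:
                field.append(text[i])
                i += 1
        elif text.startswith(record_sep, i):
            record.append("".join(field))
            yield record
            record, field, started = [], [], False
            i += len(record_sep)
        elif text.startswith(field_sep, i):
            record.append("".join(field))
            field = []
            started = True
            i += len(field_sep)
        elif text.startswith(quote, i) and not field:
            # A quote only opens a quoted section at the start of a field.
            in_quotes = True
            started = True
            i += len(quote)
        else:
            field.append(text[i])
            started = True
            i += 1
    if started:
        # Final record without a trailing terminator.
        record.append("".join(field))
        yield record


sample = '"ABC"<>"DEF"""<>"GHI"¤¤123<>456<>"XYZ"¤¤'
rows = list(parse_delimited(sample, "<>", "¤¤"))
```

Because delimiters are only recognised outside quotes, a field such as "a<>b" keeps its embedded "<>". The sketch reads the whole input into memory and makes no attempt to match every corner case of the csv module.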

I could always manipulate the input file and replace the delimiters by
single characters, but I would like a more generic solution.

SimpleParse (http://simpleparse.sourceforge.net/) looks like a good
alternative. It doesn't support Unicode, but most files can be
converted to ISO-8859-1 first.

Would SimpleParse be suitable for this purpose, or are there better
alternatives out there, like a more flexible CSV-parser?

John J. Lee · Sep 12, 2004

I am going to make a program that reads files with different
csv-dialects. Sometimes the field-separator or line-separator can be
more than one character. The standard CSV module in Python 2.3 is not
a good solution, because it expects single characters.

[...]

Dunno if it's any good to you, but there's one called DSV.

John

Skip Montanaro · Sep 12, 2004

Well, I might disagree with you there. By all reasonable accounts,
delimited files containing multi-character delimiters are not CSV files, at
least not as operationally defined by Excel (which I mention only because
it's probably the largest producer and consumer of such files).

That's pretty generic. How about this (untested):

class DelimitedFile:
    def __init__(self, fname, mode='rb', ind=',', outd=','):
        self.f = open(fname, mode)
        self.ind = ind
        self.outd = outd

    def __iter__(self):
        return self

    def next(self):
        line = self.f.next()
        return line.replace(self.ind, self.outd)

Use it like so:

import csv

class d(csv.excel):
    # single-character stand-in for the real "<>" delimiter
    delimiter = '\001'
    lineterminator = '¤¤'

reader = csv.reader(DelimitedFile(fname, ind='<>', outd='\001'),
                    dialect=d)

for row in reader:
    print row

The goal is of course to pick a delimiter which won't appear in the file,
hence the Ctrl-A ('\001').

Skip
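
As a hedged follow-up to the idea above, the sketch below adapts the same substitution trick to Python 3 and to the "¤¤" record terminator from the original example. The function read_delimited and its placeholder parameter are hypothetical names introduced here. It splits the input on the record terminator itself, because csv.reader recognises only '\r' and '\n' as line endings and ignores the dialect's lineterminator. It assumes the record terminator never occurs inside a quoted field and that records contain no raw newlines.

```python
import csv


def read_delimited(text, field_sep, record_sep, placeholder="\x01"):
    """Yield rows from text whose delimiters may be multi-character.

    Hypothetical helper: swaps field_sep for a single placeholder
    character, lets csv.reader handle the quoting, then restores any
    field_sep that was inside a quoted field.
    """
    if placeholder in text:
        raise ValueError("placeholder already occurs in the input")
    # Split into records ourselves; csv.reader ignores lineterminator.
    records = text.split(record_sep)
    if records and records[-1] == "":
        records.pop()  # drop the empty piece after the final terminator
    lines = [r.replace(field_sep, placeholder) for r in records]
    for row in csv.reader(lines, delimiter=placeholder):
        # Undo the substitution for delimiters that were quoted data.
        yield [f.replace(placeholder, field_sep) for f in row]


sample = '"ABC"<>"DEF"""<>"GHI"¤¤123<>456<>"XYZ"¤¤'
rows = list(read_delimited(sample, "<>", "¤¤"))
```

Checking for the placeholder up front keeps the substitution reversible. If Ctrl-A can occur in the data, another unused character would have to be passed in instead.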
