csv module strangeness.

tobiah

I'm trying to create a csv.reader object using a custom dialect.

The docs are a little terse, but I gather that I am supposed
to subclass csv.Dialect:

class dialect(csv.Dialect):
    pass

Now, the docs say that all of the attributes have reasonable
defaults, but instantiating the above gives:

Traceback (most recent call last):
  File "<stdin>", line 15, in ?
  File "/usr/local/lib/python2.4/csv.py", line 39, in __init__
    raise Error, "Dialect did not validate: %s" % ", ".join(errors)
_csv.Error: Dialect did not validate: delimiter character not set, quotechar not set, lineterminator not set, doublequote parameter must be True or False, skipinitialspace parameter must be True or False, quoting parameter not set

So I look at the source. The Dialect class is very simple,
and starts with:

class Dialect:
    _name = ""
    _valid = False
    # placeholders
    delimiter = None
    quotechar = None
    escapechar = None
    doublequote = None
    skipinitialspace = None
    lineterminator = None
    quoting = None

So, it's no wonder that it fails its validate() call.
The only thing that I can think of to do is to set
these on the class itself before instantiation:

###############################################
import csv

class dialect(csv.Dialect):
    pass

dialect.delimiter = "\t"
dialect.quotechar = '"'
dialect.lineterminator = "\n"
dialect.doublequote = True
dialect.skipinitialspace = True
dialect.quoting = csv.QUOTE_MINIMAL

d = dialect()

reader = csv.reader(open('list.csv'))
for row in reader:
    print row
###############################################

This runs, but the delimiter is still the comma.
When list.csv is comma delim, it works correctly,
but when list.csv has tab separated values, I
get back a single field with the entire line in
it.

I suppose I must be doing something horribly wrong.

Thanks,

Tobiah
 
tobiah

Ok, I'm an idiot. I didn't even pass my dialect
object to the reader() call.

So now it works, but it is still strange about
the absent defaults.

Tobiah
 
Marc 'BlackJack' Rintsch

tobiah said:
The only thing that I can think of to do is to set
these on the class itself before instantiation:

class dialect(csv.Dialect):
    pass

dialect.delimiter = "\t"
dialect.quotechar = '"'
dialect.lineterminator = "\n"
dialect.doublequote = True
dialect.skipinitialspace = True
dialect.quoting = csv.QUOTE_MINIMAL

That's possible but why didn't you follow the way `csv.Dialect` sets the
class attributes?

class MyDialect(csv.Dialect):
    delimiter = '\t'
    lineterminator = '\n'
    # and so on…
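
For reference, a filled-in version of that pattern might look like the sketch below. It uses Python 3 syntax (the thread itself is on Python 2.4). The class name TabDialect and the sample lines are made up for illustration, and the attribute values are simply the documented defaults quoted elsewhere in this thread, with the delimiter changed to a tab.

```python
import csv


class TabDialect(csv.Dialect):
    # Every formatting attribute is spelled out here, because
    # csv.Dialect itself only provides None placeholders.
    delimiter = "\t"
    quotechar = '"'
    escapechar = None
    doublequote = True
    skipinitialspace = False
    lineterminator = "\r\n"
    quoting = csv.QUOTE_MINIMAL


# Hypothetical sample data: any iterable of lines works as reader input.
sample_lines = ["alpha\tbeta\tgamma", "one\ttwo\tthree"]

# The dialect has to be handed to the reader explicitly.
rows = list(csv.reader(sample_lines, TabDialect))
print(rows)
```

Leaving any of the required attributes at None is what triggers the "Dialect did not validate" error from the original post.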

Ciao,
Marc 'BlackJack' Rintsch
 
tobiah

Marc 'BlackJack' Rintsch said:
That's possible but why didn't you follow the way `csv.Dialect` sets the
class attributes?

class MyDialect(csv.Dialect):
    delimiter = '\t'
    lineterminator = '\n'
    # and so on…

Because I'm hung over.
 
Fredrik Lundh

tobiah said:
The docs are a little terse, but I gather that I am supposed
to subclass csv.Dialect:

class dialect(csv.Dialect):
    pass

Now, the docs say that all of the attributes have reasonable
defaults, but instantiating the above gives:

you may be misreading the docs; the Dialect has no values at all, and
must be subclassed (and the subclass must provide settings). The
easiest way to get reasonable defaults is to subclass an existing
dialect class, such as csv.excel:

class dialect(csv.excel):
    ...

> The only thing that I can think of to do is to set
> these on the class itself before instantiation:

the source code for the Dialect class that you posted shows how to set
class attributes; simply assign them inside the class statement!

class dialect(csv.excel):
    # like excel, but with a different delimiter
    delimiter = "|"

you must also remember to pass the dialect to the reader:

reader = csv.reader(open('list.csv'), dialect)
for row in reader:
    print row

note that you don't really have to create an instance; the reader
expects an object with a given set of attributes, and the class object
works as well as an instance of the same class.
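
A small sketch of both points, in Python 3 syntax rather than the 2.4 used in this thread: subclassing csv.excel to inherit its settings, and passing either the class or an instance to the reader. The name PipeDialect and the sample lines are invented for the example.

```python
import csv


class PipeDialect(csv.excel):
    # like excel, but with a different delimiter
    delimiter = "|"


# Hypothetical input; a list of strings stands in for an open file.
lines = ["a|b|c", "1|2|3"]

# The class object itself is accepted as a dialect...
rows_from_class = list(csv.reader(lines, PipeDialect))

# ...and so is an instance of it.
rows_from_instance = list(csv.reader(lines, PipeDialect()))

print(rows_from_class == rows_from_instance)
```

Everything not overridden (quotechar, quoting, lineterminator and so on) comes from csv.excel, so only the delimiter needs to be stated.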

</F>
 
tobiah

Fredrik Lundh said:
you may be misreading the docs; the Dialect has no values at all, and
must be subclassed (and the subclass must provide settings).

The docs clearly state what the defaults are, but they are not
in the code. It seems so clumsy to have to specify every one
of these, just to change the delimiter from comma to tab.

http://docs.python.org/lib/csv-fmt-params.html :

delimiter
A one-character string used to separate fields. It defaults to ','.

doublequote
Controls how instances of quotechar appearing inside a field should be themselves be quoted. When True, the character is doubled. When False, the escapechar must be a one-character string which is used as a prefix to the quotechar. It defaults to True.

escapechar
A one-character string used to escape the delimiter if quoting is set to QUOTE_NONE. It defaults to None.

lineterminator
The string used to terminate lines in the CSV file. It defaults to '\r\n'.

quotechar
A one-character string used to quote elements containing the delimiter or which start with the quotechar. It defaults to '"'.

quoting
Controls when quotes should be generated by the writer. It can take on any of the QUOTE_* constants (see section 12.20.1) and defaults to QUOTE_MINIMAL.

skipinitialspace
When True, whitespace immediately following the delimiter is ignored. The default is False.
 
Peter Otten

tobiah said:
The docs clearly state what the defaults are, but they are not
in the code. It seems so clumsy to have to specify every one
of these, just to change the delimiter from comma to tab.

http://docs.python.org/lib/csv-fmt-params.html :

delimiter
A one-character string used to separate fields. It defaults to ','.

Note that you need not bother with a dialect class just to change the
delimiter:
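>>> import csv
>>> from StringIO import StringIO
>>> instream = StringIO(
... "alpha\tbeta\tgamma\r\n"
... "one\ttoo\ttree\r\n")
>>> for row in csv.reader(instream, delimiter="\t"):
...     print row
...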
['alpha', 'beta', 'gamma']
['one', 'too', 'tree']
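
The same keyword should work for writing as well as reading. The sketch below is in Python 3 syntax and uses in-memory data as a stand-in for a real file; the variable names are illustrative only.

```python
import csv
import io

# Hypothetical in-memory data standing in for a tab-separated file.
lines = ["alpha\tbeta\tgamma\r\n", "one\ttoo\ttree\r\n"]

# Reading: override only the delimiter; everything else keeps its default.
rows = list(csv.reader(lines, delimiter="\t"))

# Writing: csv.writer accepts the same formatting keyword.
out = io.StringIO(newline="")
csv.writer(out, delimiter="\t").writerows(rows)

print(rows)
print(repr(out.getvalue()))
```

No Dialect subclass is involved at any point; the keyword argument is applied on top of the default 'excel' dialect.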

Peter
 
skip

tobiah> So now it works, but it is still strange about the absent
tobiah> defaults.

The csv.Dialect class is essentially pure abstract. Most of the time I
subclass csv.excel and just change the one or two things I need.

Skip
 
John Machin

tobiah said:
The docs clearly state what the defaults are, but they are not
in the code. It seems so clumsy to have to specify every one
of these, just to change the delimiter from comma to tab.

That particular case is handled by the built-in (but cunningly
concealed) 'excel-tab' class:
>>> import csv
>>> csv.list_dialects()
['excel-tab', 'excel']
>>> td = csv.get_dialect('excel-tab')
>>> dir(td)
['__doc__', '__init__', '__module__', '_name', '_valid', '_validate',
'delimiter', 'doublequote', 'escapechar', 'lineterminator',
'quotechar', 'quoting', 'skipinitialspace']
>>> td.delimiter
'\t'
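
A registered dialect can also be handed to the reader by name. A minimal Python 3 sketch, with made-up sample lines:

```python
import csv

# 'excel-tab' is registered by the csv module itself.
registered = csv.list_dialects()

# Hypothetical tab-separated input.
lines = ["alpha\tbeta\tgamma", "one\ttwo\tthree"]

# Pass the registered name instead of a class or instance.
rows = list(csv.reader(lines, "excel-tab"))

tab_delimiter = csv.get_dialect("excel-tab").delimiter
print(rows, repr(tab_delimiter))
```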

However, more generally, the docs also clearly state that "In addition
to, or instead of, the dialect parameter, the programmer can also
specify individual formatting parameters, which have the same names as
the attributes defined below for the Dialect class."

In practice, using a Dialect class would be a rather rare occurrence.

E.g. here's the guts of the solution to the "fix a csv file by
rsplitting one column" problem, using the "quoting" attribute on the
assumption that the solution really needs those usually redundant
quotes:

import sys, csv

def fix(inf, outf, fixcol):
    wtr = csv.writer(outf, quoting=csv.QUOTE_ALL)
    for fields in csv.reader(inf):
        fields[fixcol:fixcol+1] = fields[fixcol].rsplit(None, 1)
        wtr.writerow(fields)

if __name__ == "__main__":
    av = sys.argv
    fix(open(av[1], 'rb'), open(av[2], 'wb'), int(av[3]))
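
To try the same idea without files on disk, a Python 3 sketch with in-memory streams could look like this. The sample rows and the column index are invented for the demonstration; the fix function is the one above, unchanged apart from spacing.

```python
import csv
import io


def fix(inf, outf, fixcol):
    # Quote every field on output, whether or not it needs it.
    wtr = csv.writer(outf, quoting=csv.QUOTE_ALL)
    for fields in csv.reader(inf):
        # Split the chosen column once, from the right, on whitespace,
        # and splice the resulting pieces back into the row.
        fields[fixcol:fixcol + 1] = fields[fixcol].rsplit(None, 1)
        wtr.writerow(fields)


# Hypothetical input: the second column holds two values run together.
src = io.StringIO("id,name and code\r\n1,Widget large X1\r\n", newline="")
dst = io.StringIO(newline="")

fix(src, dst, 1)
print(dst.getvalue())
```

In Python 3 the files would be opened in text mode with newline="" rather than in the 'rb'/'wb' modes used above.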

HTH,
John
 
tobiah

John Machin said:
However, more generally, the docs also clearly state that "In addition
to, or instead of, the dialect parameter, the programmer can also
specify individual formatting parameters, which have the same names as
the attributes defined below for the Dialect class."

I definitely missed that. Knowing that, I don't think I will ever need the Dialect
class, but I still think that the docs for the Dialect class are broken.
 
John Machin

tobiah said:
I definitely missed that. Knowing that, I don't think I will ever need the Dialect
class, but I still think that the docs for the Dialect class are broken.

FWIW, I think the whole Dialect class idea is a baroque byzantine
over-elaborated unnecessity that also happens to suffer from poor docs.
[Exit, pursued by a bear]
 
Fredrik Lundh

tobiah said:
The docs clearly state what the defaults are, but they are not
in the code. It seems so clumsy to have to specify every one
of these, just to change the delimiter from comma to tab.

http://docs.python.org/lib/csv-fmt-params.html :

The "it defaults to" clauses should probably be seen in the context of
the two "the programmer can" sentences in the first paragraph on that
page; the default is what's used if you don't do that.

I agree that the first paragraph could need some work. Any volunteers?

</F>
 
tobiah

Fredrik Lundh said:
> The docs clearly state what the defaults are, but they are not
> in the code.

The "it defaults to" clauses should probably be seen in the context of
the two "the programmer can" sentences in the first paragraph on that
page; the default is what's used if you don't do that.

I agree that the first paragraph could need some work. Any volunteers?

</F>

I agree with Henryk's evaluation, but if Dialect is to remain,
why not just fix all of the 'None' assignments in the class
definition to match the sensible defaults that are already in
the docs?
 
tobiah

John said:
Henryk?? Have I missed a message in the thread, or has the effbot
metamorphosed into the aitchbot?

How strange. Either my client was whacked, or I was. I was
actually referring to your "baroque byzantine over-elaborated unnecessity"
comment. Henryk was from a later thread, I guess.
 
