csv module

Laurent Laporte

Hello,

I'm using the standard csv module under Python 2.3 / 2.4 to write a
tab-delimited file, with the "excel-tab" dialect.

To read the CSV file back, I 'sniff' a sample of the data in order to
get the dialect.
The problem is that I get the wrong dialect: the sniffer returns an
empty string as the delimiter. It is probably a bug in the
_guess_delimiter() method.

The message I get is:
TypeError: bad argument type for built-in operation

Do you know a way to sniff tab-delimited data?
Is it a known bug?

Bye.
 
Fredrik Lundh

Laurent said:
I'm using the standard csv module under Python 2.3 / 2.4 to write a
tab-delimited file, with the "excel-tab" dialect.

To read the CSV file back, I 'sniff' a sample of the data in order to
get the dialect.
The problem is that I get the wrong dialect: the sniffer returns an
empty string as the delimiter. It is probably a bug in the
_guess_delimiter() method.

The message I get is:
TypeError: bad argument type for built-in operation

Do you know a way to sniff tab-delimited data?
Is it a known bug?

http://www.python.org/sf/1157169

</F>
 
skip

Laurent> To read the CSV file back, I 'sniff' a sample of the data in
Laurent> order to get the dialect. The problem is that I get the wrong
Laurent> dialect: the sniffer returns an empty string as the delimiter.
Laurent> It is probably a bug in the _guess_delimiter() method.

Laurent> The message I get is:
Laurent> TypeError: bad argument type for built-in operation

Laurent> Do you know a way to sniff tab-delimited data?
Laurent> Is it a known bug?

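Using a file with the following contents:

>>> open("tabber.csv", "rb").read()
'1\t2\tabc\n3\t4\tdef\n'

I get:

>>> sniffer = csv.Sniffer()
>>> d = sniffer.sniff(open("tabber.csv", "rb").read())
>>> d.delimiter
'\t'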

Can you provide a concrete example (preferably in a bug report on SF)?

Skip
 
skip

me> Using a file with the following contents:

me> >>> open("tabber.csv", "rb").read()
me> '1\t2\tabc\n3\t4\tdef\n'

me> I get:

me> >>> sniffer = csv.Sniffer()
me> >>> d = sniffer.sniff(open("tabber.csv", "rb").read())
me> >>> d.delimiter
me> '\t'

BTW, this also seems to work with a Mac-style EOL:
'\t'

Perhaps this has been fixed in CVS.

Skip
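
For readers who want to repeat this check without creating tabber.csv, here is a minimal sketch using the same sample data as an in-memory string. It is written for Python 3 (an assumption; the session above is Python 2), and the variable names are illustrative only.

```python
import csv

# Same sample as in the session above: two rows, three tab-separated fields.
sample = "1\t2\tabc\n3\t4\tdef\n"

sniffer = csv.Sniffer()
dialect = sniffer.sniff(sample)
print(repr(dialect.delimiter))

# Optionally restrict the candidate delimiters the sniffer may pick from.
restricted = sniffer.sniff(sample, delimiters="\t,")
print(repr(restricted.delimiter))

# The sniffed dialect can be passed straight to csv.reader.
rows = list(csv.reader(sample.splitlines(), dialect=dialect))
print(rows)
```

Passing the optional delimiters argument narrows the guess to characters you expect, which may help when the data contains other regularly repeated characters.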
 
Laurent Laporte

Sorry,

Here is my example:

Python 2.3.1 (#1, Sep 29 2003, 15:42:58)
[GCC 2.96 20000731 (Red Hat Linux 7.1 2.96-98)] on linux2
Type "help", "copyright", "credits" or "license" for more information.

In fact, I found the problem (thanks to you): I add a newline '\r\n' to
separate the header from the records...
 
Laurent Laporte

In fact, there is another bug:

In my CSV file, every record except the header ends with a trailing
tab '\t', because the last field is always empty.

For example, I get:

The problem occurs in the _guess_delimiter() method while the
frequency tables are built: each line is stripped first (why??)
If I change:
freq = line.strip().count(char)
to:
freq = line.count(char)
it works fine.

Do you have a workaround for that?

------- Laurent.
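
To illustrate why the strip() call matters here, the sketch below counts tabs in a header line and in a record line with an empty last field. The two sample lines and the helper name tab_counts are hypothetical, not taken from the original file.

```python
# Hypothetical lines: the header has no trailing tab, the record does,
# because its last field is empty.
header = "name\tvalue\tcomment"
record = "spam\t42\t"


def tab_counts(line):
    """Return (tabs counted after strip(), tabs counted on the raw line)."""
    return line.strip().count("\t"), line.count("\t")


header_counts = tab_counts(header)
record_counts = tab_counts(record)

print(header_counts)  # (2, 2)
print(record_counts)  # (1, 2)
```

Because strip() also removes a trailing tab, the stripped record appears to hold one delimiter fewer than the header, so the per-line frequencies no longer agree.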
 
skip

Laurent> If I change:
Laurent> freq = line.strip().count(char)
Laurent> to:
Laurent> freq = line.count(char)
Laurent> it works fine.

Laurent> Do you have a workaround for that?

Nope. I just checked in precisely your fix to the Python repository.

Skip
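
On an installation that predates this fix, one possible way around the problem, assuming the delimiter is known in advance, is to skip the sniffer and name the "excel-tab" dialect explicitly. This is only a sketch for Python 3; read_tab_rows is a hypothetical helper name and the sample text is made up.

```python
import csv
import io


def read_tab_rows(text):
    """Parse text as tab-delimited data without sniffing the dialect."""
    # newline="" leaves line endings untouched so the csv module handles them.
    return list(csv.reader(io.StringIO(text, newline=""), dialect="excel-tab"))


# Hypothetical data: the record ends with a trailing tab (empty last field).
rows = read_tab_rows("name\tvalue\tcomment\nspam\t42\t\n")
print(rows)
```

The trade-off is that the reader no longer adapts to other delimiters, so this only fits files that are always written with tabs.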
 
