Problems with csv module

Florian Lindner

Hello,
I've one problem using the csv module.
The code:

self.reader = csv.reader(f, delimiter = ",")

works perfectly. But when I use a variable for delimiter:

self.reader = csv.reader(f, delimiter = Adelimiter)

I get the traceback:


  File "/home/florian/visualizer/ConfigReader.py", line 13, in __init__
    self.reader = csv.reader(f, delimiter = Adelimiter)
TypeError: bad argument type for built-in operation


The command

print "Adelimiter: ", Adelimiter, len(Adelimiter)

prints

Adelimiter: , 1

So I think Adelimiter is ok?!

What is wrong there?

It is Python 2.3.5.

Thx,

Florian
 
Richie Hindle

[Florian]
I've one problem using the csv module.
The code:

self.reader = csv.reader(f, delimiter = ",")

works perfectly. But when I use a variable for delimiter:

self.reader = csv.reader(f, delimiter = Adelimiter)

I get the traceback:


File "/home/florian/visualizer/ConfigReader.py", line 13, in __init__
self.reader = csv.reader(f, delimiter = Adelimiter)
TypeError: bad argument type for built-in operation

Is this your problem?:
Traceback (most recent call last):
<type 'unicode'>
 
Florian Lindner

Richie said:
[Florian]
I've one problem using the csv module.
The code:

self.reader = csv.reader(f, delimiter = ",")

works perfectly. But when I use a variable for delimiter:

self.reader = csv.reader(f, delimiter = Adelimiter)

I get the traceback:


File "/home/florian/visualizer/ConfigReader.py", line 13, in __init__
self.reader = csv.reader(f, delimiter = Adelimiter)
TypeError: bad argument type for built-in operation

Is this your problem?:
Traceback (most recent call last):
<type 'unicode'>

Yes, that's my problem.

You mean that csv.reader can't work with unicode as the delimiter parameter?
Sorry, I don't really understand what you're saying...

Florian
 
Richie Hindle

[Florian]
You mean that csv.reader can't work with unicode as the delimiter parameter?

Exactly. http://www.python.org/doc/2.3.5/lib/module-csv.html says:

"Note: This version of the csv module doesn't support Unicode input. Also,
there are currently some issues regarding ASCII NUL characters. Accordingly,
all input should generally be printable ASCII to be safe. These restrictions
will be removed in the future. "

That note is still there in the current development docs, so it looks like
it hasn't yet been fixed.
 
Florian Lindner

Richie said:
[Florian]
You mean that csv.reader can't work with unicode as the delimiter
parameter?

Exactly. http://www.python.org/doc/2.3.5/lib/module-csv.html says:

"Note: This version of the csv module doesn't support Unicode input. Also,
there are currently some issues regarding ASCII NUL characters.
Accordingly, all input should generally be printable ASCII to be safe.
These restrictions will be removed in the future. "

That note is still there in the current development docs, so it looks like
it hasn't yet been fixed.

Uhh.. thanks!

How can I convert Unicode to ASCII?

Thx,

Florian
 
Richie Hindle

[Florian]
How can I convert Unicode to ASCII?

You're writing code using Unicode and you don't know how to convert it to
ASCII? You need to do some reading. Here are a few links - Google can
provide many more:

http://docs.python.org/tut/node5.html#SECTION005130000000000000000
http://diveintopython.org/xml_processing/unicode.html
http://www.jorendorff.com/articles/unicode/python.html

The short answer to your question is this:
My string <type 'unicode'>
My string <type 'str'>

but you should really do some reading.
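
A minimal sketch of that conversion, assuming the delimiter contains only ASCII characters. `to_ascii` is a hypothetical helper name, not part of the csv module. On Python 2 the result is a plain `str`, which is what `csv.reader` accepts for `delimiter`; on Python 3 the same call returns `bytes`, and `csv.reader` takes ordinary (Unicode) strings there, so no conversion is needed.

```python
def to_ascii(text):
    """Encode a Unicode string as ASCII.

    Raises UnicodeEncodeError if text contains a non-ASCII character,
    which is safer than silently dropping or replacing it.
    """
    return text.encode("ascii")


Adelimiter = u","                 # the Unicode value from the question
delimiter = to_ascii(Adelimiter)  # Python 2: str, Python 3: bytes
```

On Python 2 the call from the question would then become `csv.reader(f, delimiter=to_ascii(Adelimiter))`.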
 
Skip Montanaro

Richie> Exactly....

Richie> "Note: This version of the csv module doesn't support Unicode
Richie> input....

Richie> That note is still there in the current development docs, so it
Richie> looks like it hasn't yet been fixed.

I can confirm this. While the note as written focused on csv files encoded
using non-ASCII codecs, it also holds true for the API. Manipulating
Unicode from C isn't as simple as from Python, and none of us with our
fingerprints on the csv module code had much, if any, Unicode experience.

Skip
 
Fredrik Lundh

Richie said:
[Florian]
You mean that csv.reader can't work with unicode as the delimiter parameter?

Exactly. http://www.python.org/doc/2.3.5/lib/module-csv.html says:

"Note: This version of the csv module doesn't support Unicode input. Also,
there are currently some issues regarding ASCII NUL characters. Accordingly,
all input should generally be printable ASCII to be safe. These restrictions
will be removed in the future. "

That note is still there in the current development docs, so it looks like
it hasn't yet been fixed.

does the CSV format even support Unicode-encoded data streams?

(in contrast to, say, Latin-1 or UTF-8 encoded string fields)

this is a very common XML confusion, where people think that just because
a file format can be used to store Unicode data, a parser for that
format ought to be able to parse Unicode strings...

</F>
 
Skip Montanaro

Fredrik> does the CSV format even support Unicode-encoded data streams?

Based on the requests I've seen here and on the (e-mail address removed) mailing list,
it appears people are certainly generating CSV files which contain
Unicode-encoded data.

Skip
 
Fredrik Lundh

Skip said:
Fredrik> does the CSV format even support Unicode-encoded data streams?

Based on the requests I've seen here and on the (e-mail address removed) mailing list,
it appears people are certainly generating CSV files which contain Unicode-
encoded data.

in what encodings?

is the encoding specified inside the file? if so, how?

(it should be noted that the phrase "Unicode-encoded data" that I
used doesn't make much sense, even in the original context. what
I meant to say was that CSV, as far as I know, isn't defined as a
stream of Unicode characters, but rather as a stream of bytes in an
ASCII-compatible encoding. this means that you can use e.g.
ISO-8859-1 or UTF-8 for string values, but not that you can encode
the whole thing as, say, UTF-16 or UCS-4).

</F>
 
Skip Montanaro

Fredrik> in what encodings?

I've seen hints about iso-8859-1/iso-8859-15 and mention that Excel 2000
supports utf-8. Whether Excel can dump csv files in utf-8 or not, I don't
know, though I'd suppose so.

Fredrik> is the encoding specified inside the file? if so, how?

Not that I'm aware of. AFAIK, you just have to know the file's encoding.
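
A minimal sketch of what "knowing the file's encoding" looks like in practice, assuming Python 3, where the csv module works on decoded text; the thread itself concerns Python 2.3, where this does not apply. `read_csv_bytes` and the sample data are hypothetical, chosen only for illustration.

```python
import csv
import io


def read_csv_bytes(raw_bytes, encoding, delimiter=","):
    """Decode raw CSV bytes with a caller-supplied encoding, then parse."""
    text = raw_bytes.decode(encoding)
    # newline="" leaves line endings untouched so csv can handle them itself
    return list(csv.reader(io.StringIO(text, newline=""), delimiter=delimiter))


# Sample bytes as an ISO-8859-1 writer might have produced them
raw = u"name,city\r\nZo\u00eb,Z\u00fcrich\r\n".encode("iso-8859-1")
rows = read_csv_bytes(raw, "iso-8859-1")
```

Nothing in the bytes says which encoding was used, so the caller has to pass it in; a wrong guess either fails to decode or yields garbled text.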

Skip
 
John Machin

Fredrik> in what encodings?

I've seen hints about iso-8859-1/iso-8859-15 and mention that Excel 2000
supports utf-8.

I have Excel 2002 and have done some experimentation. It "supports"
utf-8 only to the extent that most times it doesn't mangle the data
(i.e. you can save it again without loss); you just can't make any
sense out of what's on the screen. Specifically:

open a file with CSV extension: Excel assumes blindly that it's
encoded according to your locale (e.g. cp1252).

open a file with TXT extension: Excel gives you the option of
specifying which one of a large number of *legacy* encodings -- yes,
that's correct, utf-* are not on the list!

NOTE: the above applies even if you have a utf-8-encoded BOM at the
start of the file.

This behaviour appears to be Excel-specific; MS Word, Wordpad and even
the humble Notepad recognise the utf-8-encoded BOM and display
sensibly (with a Unicode font, of course).

Whether Excel can dump csv files in utf-8 or not, I don't
know, though I'd suppose so.

Unfortunately, your supposition is incorrect. There is no way of
specifying the encoding directly. The nearest available options are:

(1) csv : encoded in your locale-specific legacy encoding. "illegal"
characters are silently replaced by "?" on Windows and (I deduce)
underscore on a Macintosh.
(2) text : ditto
(3) Unicode text: utf-16 -- it *does* subsequently open these
correctly i.e. silently detects the encoding and displays properly.
 
John Machin

in what encodings?

is the encoding specified inside the file? if so, how?

(it should be noted that the phrase "Unicode-encoded data" that I
used doesn't make much sense, even in the original context. what
I meant to say was that CSV, as far as I know, isn't defined as a
stream of Unicode characters, but rather as a stream of bytes in an
ASCII-compatible encoding. this means that you can use e.g.
ISO-8859-1 or UTF-8 for string values, but not that you can encode
the whole thing as, say, UTF-16 or UCS-4).

The CSV format is not defined at all, AFAIK.

Empirically, writing CSV works more-or-less like this, for each row:
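# sketch, untested; `fields` is one row as a list of strings, and
# quote_char and delimiter are whatever the writer and reader agreed on
control_chars = '\r\n' # or maybe more or maybe just '\n'
out_list = []
for field in fields:
    if quote_char in field:
        # double every embedded quote, then wrap the whole field
        out_field = quote_char + \
            field.replace(quote_char, quote_char + quote_char) + \
            quote_char
    elif delimiter in field or [c for c in control_chars if c in field]:
        # a delimiter or control character forces quoting
        out_field = quote_char + field + quote_char
    else:
        out_field = field
    out_list.append(out_field)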

then you write delimiter.join(out_list) followed by "\r\n"

So there is no reason at all why a writer and a reader couldn't use
the above quoting mechanism to transfer columnar data containing
Unicode -- they just have to agree on the encoding, control
characters, quote_char, delimiter, and line terminator.

Excel (see my other post in this thread) provides a writing ("save as
Unicode text") and reading mechanism which uses u'\t' as the
delimiter, u'\r\n' as the line terminator, u'\"' as the quote_char,
and utf-16 as the encoding. I haven't done an exhaustive check to see
what its definition of control_chars would be.
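
A minimal sketch of reading such a file, assuming Python 3 (where the csv module handles decoded text) and the parameters described above: tab delimiter, double-quote quote_char, utf-16 encoding. `read_excel_unicode_text` is a hypothetical name, and the sample bytes are made up rather than taken from a real Excel export.

```python
import csv
import io


def read_excel_unicode_text(raw_bytes):
    """Parse tab-delimited, UTF-16 encoded bytes into a list of rows."""
    stream = io.TextIOWrapper(io.BytesIO(raw_bytes), encoding="utf-16", newline="")
    return list(csv.reader(stream, delimiter="\t", quotechar='"'))


# Made-up sample; .encode("utf-16") prepends a BOM, which the decoder consumes
sample = u'id\tnote\r\n1\t"say ""hi"""\r\n'.encode("utf-16")
rows = read_excel_unicode_text(sample)
```

The doubled quotes inside the second field are collapsed back to single quotes by the reader, which is the inverse of the quoting step sketched earlier in this post.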
 
Tim Roberts

Fredrik Lundh said:
does the CSV format even support Unicode-encoded data streams?

Since there is no RFC or ISO standard for CSV, I'd say the answer was
"yes".

I just tried it with Excel, which is probably as close as we can get to the
canonical csv application. It can read a UTF-16 csv file, but it
mishandles it: it doesn't split at the commas, and treats each line as a
single cell.
 
