John Machin
That's how Excel and the `csv` module do it.
Why don't you try it yourself? The `csv` module returns two records;
the first has six items:
1: erik
2: viking
3: ham
4: spam and eggs
5: He said "Ni!"
6: line one
'line two' is then the only item in the next record.
The rules for quoting when writing can be expressed as:
    def outrow(inrow, quotechar='"', delimiter=','):
        out = []
        for field in inrow:
            if quotechar in field:
                # Double any embedded quote characters, then wrap in quotes.
                field = quotechar + field.replace(quotechar, quotechar*2) + quotechar
            elif delimiter in field or '\n' in field:
                # See note below.
                field = quotechar + field + quotechar
            out.append(field)
        return delimiter.join(out)
Note: characters other than delimiter and \n can be included in the
"to be quoted" list.
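To see those rules in action, here is a minimal sketch (Python 3) that writes one row with outrow and reads it back with the `csv` module. The sample row is made up for illustration, and `io.StringIO` is only standing in for a real file.

```python
import csv
import io

def outrow(inrow, quotechar='"', delimiter=','):
    out = []
    for field in inrow:
        if quotechar in field:
            # Double embedded quote characters, then wrap the field in quotes.
            field = quotechar + field.replace(quotechar, quotechar * 2) + quotechar
        elif delimiter in field or '\n' in field:
            field = quotechar + field + quotechar
        out.append(field)
    return delimiter.join(out)

# Illustrative data only: one plain field, one with a delimiter,
# one with quotes, one with an embedded newline.
row = ['erik', 'spam, eggs', 'He said "Ni!"', 'line one\nline two']
line = outrow(row)
print(repr(line))

# Read it back; newline='' stops StringIO from translating line endings.
parsed = next(csv.reader(io.StringIO(line + '\r\n', newline='')))
print(parsed == row)
```

If the writer sticks to the rules, the reader should hand back exactly the fields that went in, embedded newline included.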
What readers do with data that can *not* have been produced by a
writer following the rules can get worse than BlackJack's example.
Consider this: file nihao1.csv contains the following single line:
'Is the "," a mistake in "Ni, hao!"?\r\n'
OpenOffice.org's Calc 2.1 shows the equivalent of
['Is the "', ' a mistake in Ni', ' hao!"?\n'] in a Text Import window,
but then silently produces nothing. A file with two such lines causes
5 fields to be shown in the window -- it apparently thinks the
newlines are inside quoted fields!
Gnumeric 1.7.6 silently produces the equivalent of
result = ['Is the "', ' a mistake in ', 'hao!"?']
map(len, result) -> [8, 14, 6]
What happened to Ni?
Multiple such lines produce multiple rows.
Excel 11.0 (2003) silently produces in effect
result = ['Is the "', ' a mistake in Ni', ' hao!"?']
map(len, result) -> [8, 16, 7]
Multiple such lines produce multiple rows.
The csv module does what Excel does.
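If you want to check the csv module's behaviour on that line yourself, something like the following should do (a sketch in Python 3; `io.StringIO` stands in for nihao1.csv, and the `strict=True` part is an extra that the discussion above doesn't cover).

```python
import csv
import io

# The single line from nihao1.csv; StringIO stands in for the real file.
data = 'Is the "," a mistake in "Ni, hao!"?\r\n'

result = next(csv.reader(io.StringIO(data, newline='')))
lengths = [len(field) for field in result]
print(result)    # ['Is the "', ' a mistake in Ni', ' hao!"?']
print(lengths)   # [8, 16, 7]

# With strict=True the reader raises csv.Error on this input
# instead of quietly guessing.
try:
    next(csv.reader(io.StringIO(data, newline=''), strict=True))
    strict_raised = False
except csv.Error:
    strict_raised = True
print(strict_raised)
```

The default reader is forgiving: once a quoted field has been closed, any further characters before the next delimiter are simply appended to it, which is where ' a mistake in Ni' comes from.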
Consumers of csv files are exhorted to apply whatever sanity checks
they can. Examples:
(1) If the csv file was produced as a result of a database query, the
number of columns should be known and used as a check on the length of
each row received.
(2) A field containing an odd number of " characters (or more
generally, not meeting whatever quoting convention might be expected
in the underlying data) should be treated with suspicion.
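Both checks are easy to bolt onto a reader. Here is one possible sketch (Python 3); the function name suspicious_rows and the two-line sample input are invented for illustration, and the expected column count is assumed to be known in advance.

```python
import csv
import io

def suspicious_rows(rows, expected_columns, quotechar='"'):
    """Yield (row_number, reason) for each check that a row fails."""
    for rownum, row in enumerate(rows, start=1):
        # Check (1): the number of columns is known, so compare against it.
        if len(row) != expected_columns:
            yield rownum, 'expected %d fields, got %d' % (expected_columns, len(row))
        # Check (2): an odd number of quote characters in a field is suspect.
        for field in row:
            if field.count(quotechar) % 2:
                yield rownum, 'odd number of quote characters in %r' % field

# Illustrative input: the nihao line followed by one well-formed row.
data = 'Is the "," a mistake in "Ni, hao!"?\r\nerik,viking\r\n'
reader = csv.reader(io.StringIO(data, newline=''))
problems = list(suspicious_rows(reader, expected_columns=2))
for rownum, reason in problems:
    print(rownum, reason)
```

What to do with a flagged row (reject the file, log it, ask a human) depends on the application; the point is only that the reader's silence is not evidence that the data was sane.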
Cheers,
John