split CSV fields

robert

What is the simplest expression for splitting a CSV line with "-protected fields?

s='"123","a,b,\"c\"",5.640'
 
Fredrik Lundh

robert said:
What is a most simple expression for splitting a CSV line
with "-protected fields?

s='"123","a,b,\"c\"",5.640'

import csv

the preferred way is to read the file using that module. if you insist
on processing a single line, you can do

cols = list(csv.reader([string]))

</F>
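
A minimal sketch of the single-line approach described above, assuming Python 3. The helper name split_csv_line is hypothetical, not part of the csv module. csv.reader accepts any iterable of strings, so a one-element list is enough, and next() pulls out the single parsed row:

```python
import csv

def split_csv_line(line, **fmtparams):
    """Parse one CSV line and return its fields as a list of strings.

    split_csv_line is a hypothetical helper, not part of the csv module.
    """
    # csv.reader expects an iterable of lines, so wrap the line in a list.
    return next(csv.reader([line], **fmtparams))

fields = split_csv_line('"123","a,b",5.640')
print(fields)  # ['123', 'a,b', '5.640']
```

Extra keyword arguments (delimiter, escapechar and so on) are passed straight through to csv.reader.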
 
Diez B. Roggisch

robert said:
What is a most simple expression for splitting a CSV line with "-protected
fields?

s='"123","a,b,\"c\"",5.640'

Use the csv module. It should have a dialect for this, although I'm not 100%
sure that the \" escaping is handled properly from the csv module's point of
view. It might require Excel-style quoting instead.

Diez
 
John Machin

Fredrik said:
robert said:
What is a most simple expression for splitting a CSV line
with "-protected fields?

s='"123","a,b,\"c\"",5.640'

import csv

the preferred way is to read the file using that module. if you insist
on processing a single line, you can do

cols = list(csv.reader([string]))

</F>

Python 2.5 (r25:51908, Sep 19 2006, 09:52:17) [MSC v.1310 32 bit
(Intel)] on win32
| >>> import csv
| >>> s='"123","a,b,\"c\"",5.640'
| >>> cols = list(csv.reader([s]))
| >>> cols
[['123', 'a,b,c""', '5.640']]
# maybe we need a bit more:
| >>> cols = list(csv.reader([s]))[0]
| >>> cols
['123', 'a,b,c""', '5.640']

I'd guess that the OP is expecting 'a,b,"c"' for the second field.

Twiddling with the knobs doesn't appear to help:

| >>> list(csv.reader([s], escapechar='\\'))[0]
['123', 'a,b,c""', '5.640']
| >>> list(csv.reader([s], escapechar='\\', doublequote=False))[0]
['123', 'a,b,c""', '5.640']

Looks like a bug to me; AFAICT from the docs, the last attempt should
have worked.

Cheers,
John
 
John Machin

Given Peter Otten's post, looks like
(1) there's a bug in the "fmtparam" mechanism -- it's ignoring the
escapechar in my first twiddle, which should give the same result as
Peter's.
(2)
| >>> csv.excel.doublequote
True
According to my reading of the docs:
"""
doublequote
Controls how instances of quotechar appearing inside a field should be
themselves be quoted. When True, the character is doubled. When False,
the escapechar is used as a prefix to the quotechar. It defaults to
True.
"""
Peter's example should not have worked.
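
For reference, a small sketch (assuming Python 3) of what the quoted docs passage means by doubling. The sample line is made up for illustration. With the default csv.excel dialect, a literal quote inside a quoted field is written as two quote characters, and no escapechar is configured:

```python
import csv

# Default dialect settings relevant to quoting inside fields.
print(csv.excel.doublequote)  # True
print(csv.excel.escapechar)   # None

# Excel-style input: each literal quote inside the field is doubled.
doubled = '"123","a,b,""c""",5.640'
row = next(csv.reader([doubled]))
print(row)  # ['123', 'a,b,"c"', '5.640']
```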
 
Fredrik Lundh

John said:
Given Peter Otten's post, looks like
(1) there's a bug in the "fmtparam" mechanism -- it's ignoring the
escapechar in my first twiddle, which should give the same result as
Peter's.
(2)
| >>> csv.excel.doublequote
True
According to my reading of the docs:
"""
doublequote
Controls how instances of quotechar appearing inside a field should be
themselves be quoted. When True, the character is doubled. When False,
the escapechar is used as a prefix to the quotechar. It defaults to
True.
"""
Peter's example should not have worked.

the documentation also mentions a "quoting" parameter that "controls
when quotes should be generated by the writer and recognised by the
reader.". not sure how that changes things.

anyway, it's either unclear documentation or a bug in the code. better
submit a bug report so someone can fix one of them.

</F>
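
A rough sketch of what "recognised by the reader" can mean for the quoting parameter, assuming Python 3 and a made-up sample line. It only contrasts the default with csv.QUOTE_NONE and says nothing about backslash escapes:

```python
import csv

line = '"a,b",c'

# Default quoting (QUOTE_MINIMAL): the reader honours the quotes.
quoted = next(csv.reader([line]))
# QUOTE_NONE: quote characters are treated as ordinary data.
unquoted = next(csv.reader([line], quoting=csv.QUOTE_NONE))

print(quoted)    # ['a,b', 'c']
print(unquoted)  # ['"a', 'b"', 'c']
```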
 
John Machin

Doh. The OP's string was a raw string. I need some sleep.
Scrap bug #1!

| >>> s=r'"123","a,b,\"c\"",5.640'
| >>> list(csv.reader([s]))[0]
['123', 'a,b,\\c\\""', '5.640']
# What's that???
| >>> list(csv.reader([s], escapechar='\\'))[0]
['123', 'a,b,"c"', '5.640']
| >>> list(csv.reader([s], escapechar='\\', doublequote=False))[0]
['123', 'a,b,"c"', '5.640']

And there's still the problem with doublequote ....

Goodnight ...
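
The raw-string point can be reproduced as a short script. This is a sketch assuming Python 3, with illustrative variable names. It checks that the non-raw literal loses its backslashes before csv ever sees them, and that the raw literal parses the way the OP presumably wanted once escapechar is set:

```python
import csv

plain = '"123","a,b,\"c\"",5.640'   # \" is just " in a normal literal
raw = r'"123","a,b,\"c\"",5.640'    # raw literal keeps the backslashes

print('\\' in plain)  # False
print('\\' in raw)    # True

# Without escapechar, the backslash is treated as an ordinary character.
default_row = next(csv.reader([raw]))
# With escapechar, \" inside a quoted field is read as a literal quote.
escaped_row = next(csv.reader([raw], escapechar='\\'))

print(escaped_row)  # ['123', 'a,b,"c"', '5.640']
```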
 
John Machin

Fredrik said:
the documentation also mentions a "quoting" parameter that "controls
when quotes should be generated by the writer and recognised by the
reader.". not sure how that changes things.


Hi Fredrik, I read that carefully -- "quoting" appears to have no
effect in this situation.

Fredrik said:
anyway, it's either unclear documentation or a bug in the code. better
submit a bug report so someone can fix one of them.

Tomorrow :)
Cheers,
John
 
