split CSV fields

Discussion in 'Python' started by robert, Nov 16, 2006.

  1. robert

    robert Guest

    What is a most simple expression for splitting a CSV line with "-protected fields?

    s='"123","a,b,\"c\"",5.640'
     
    robert, Nov 16, 2006
    #1

  2. robert

    Guest

    s.split(',');
    robert wrote:
    > What is a most simple expression for splitting a CSV line with "-protected fields?
    >
    > s='"123","a,b,\"c\"",5.640'
     
    , Nov 16, 2006
    #2

  3. robert wrote:

    > What is a most simple expression for splitting a CSV line
    > with "-protected fields?
    >
    > s='"123","a,b,\"c\"",5.640'


    import csv

    the preferred way is to read the file using that module. if you insist
    on processing a single line, you can do

    cols = list(csv.reader([string]))
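
A minimal sketch of the single-line case, assuming Python 3. The sample `line` is hypothetical, not the OP's string. `csv.reader` accepts any iterable of strings, so wrapping one line in a list works, but the result is a list of rows, so the first row still has to be picked out.

```python
import csv

# Hypothetical sample line: one quoted field contains a comma.
line = '"123","a,b",5.640'

rows = list(csv.reader([line]))   # a list of rows, here exactly one
cols = next(csv.reader([line]))   # the fields of that single row

print(rows)
print(cols)
```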

    </F>
     
    Fredrik Lundh, Nov 16, 2006
    #3
  4. robert wrote:

    > What is a most simple expression for splitting a CSV line with "-protected
    > fields?
    >
    > s='"123","a,b,\"c\"",5.640'


    Use the csv-module. It should have a dialect for this, albeit I'm not 100%
    sure if the escaping of the " is done properly from csv POV. Might be that
    it requires excel-standard.
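
For comparison, a small sketch of the Excel convention, assuming Python 3: the default `excel` dialect expects an embedded quote to be doubled rather than backslash-escaped. The sample `excel_line` is an assumption, a re-encoding of the OP's data in that style.

```python
import csv

# The OP's fields, re-encoded Excel-style: embedded quotes are doubled ("").
excel_line = '"123","a,b,""c""",5.640'

fields = next(csv.reader([excel_line]))  # the default dialect is csv.excel
print(fields)
```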

    Diez
     
    Diez B. Roggisch, Nov 16, 2006
    #4
  5. robert

    Peter Otten Guest

    robert wrote:

    > What is a most simple expression for splitting a CSV line with "-protected
    > fields?
    >
    > s='"123","a,b,\"c\"",5.640'


    >>> import csv
    >>> class mydialect(csv.excel):
    ...     escapechar = "\\"
    ...
    >>> csv.reader(['"123","a,b,\\"c\\"",5.640'], dialect=mydialect).next()
    ['123', 'a,b,"c"', '5.640']
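
In Python 3 the reader object has no `.next()` method. A rough equivalent, assuming Python 3, uses the built-in `next()` and passes `escapechar` as a keyword argument instead of subclassing the dialect. The raw string keeps the backslashes in the data.

```python
import csv

# Raw string, so the backslashes reach the parser unchanged.
s = r'"123","a,b,\"c\"",5.640'

# escapechar is a formatting parameter, so it can be passed directly.
fields = next(csv.reader([s], escapechar="\\"))
print(fields)
```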

    Peter
     
    Peter Otten, Nov 16, 2006
    #5
  6. robert

    John Machin Guest

    Fredrik Lundh wrote:
    > robert wrote:
    >
    > > What is a most simple expression for splitting a CSV line
    > > with "-protected fields?
    > >
    > > s='"123","a,b,\"c\"",5.640'

    >
    > import csv
    >
    > the preferred way is to read the file using that module. if you insist
    > on processing a single line, you can do
    >
    > cols = list(csv.reader([string]))
    >
    > </F>


    Python 2.5 (r25:51908, Sep 19 2006, 09:52:17) [MSC v.1310 32 bit
    (Intel)] on win32
    | >>> import csv
    | >>> s='"123","a,b,\"c\"",5.640'
    | >>> cols = list(csv.reader([s]))
    | >>> cols
    [['123', 'a,b,c""', '5.640']]
    # maybe we need a bit more:
    | >>> cols = list(csv.reader([s]))[0]
    | >>> cols
    ['123', 'a,b,c""', '5.640']

    I'd guess that the OP is expecting 'a,b,"c"' for the second field.

    Twiddling with the knobs doesn't appear to help:

    | >>> list(csv.reader([s], escapechar='\\'))[0]
    ['123', 'a,b,c""', '5.640']
    | >>> list(csv.reader([s], escapechar='\\', doublequote=False))[0]
    ['123', 'a,b,c""', '5.640']

    Looks like a bug to me; AFAICT from the docs, the last attempt should
    have worked.

    Cheers,
    John
     
    John Machin, Nov 16, 2006
    #6
  7. robert

    John Machin Guest

    Given Peter Otten's post, looks like
    (1) there's a bug in the "fmtparam" mechanism -- it's ignoring the
    escapechar in my first twiddle, which should give the same result as
    Peter's.
    (2)
    | >>> csv.excel.doublequote
    True
    According to my reading of the docs:
    """
    doublequote
    Controls how instances of quotechar appearing inside a field should
    themselves be quoted. When True, the character is doubled. When False,
    the escapechar is used as a prefix to the quotechar. It defaults to
    True.
    """
    Peter's example should not have worked.
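
One way to probe the `doublequote` question is to try both settings on the reader and on the writer. The sketch below assumes Python 3 and reflects CPython's behaviour as far as I can tell: the reader recognises `escapechar` whatever `doublequote` is set to, while the writer uses `doublequote` to choose between doubling and escaping. The sample strings and the helper `write_row` are assumptions made up for illustration.

```python
import csv
import io

escaped = r'"a,b,\"c\""'   # one field with backslash-escaped quotes
expected = ['a,b,"c"']

# Reader: the escapechar is recognised with either doublequote setting.
read_true = next(csv.reader([escaped], escapechar="\\", doublequote=True))
read_false = next(csv.reader([escaped], escapechar="\\", doublequote=False))

def write_row(row, **fmtparams):
    """Write one row to a string using the given formatting parameters."""
    buf = io.StringIO(newline="")
    csv.writer(buf, **fmtparams).writerow(row)
    return buf.getvalue()

# Writer: doublequote decides how an embedded quote is emitted.
doubled = write_row(expected)                                          # "" style
backslashed = write_row(expected, doublequote=False, escapechar="\\")  # \" style

print(read_true, read_false)
print(repr(doubled), repr(backslashed))
```

If that reading is right, Peter's example works because the setting in the quoted docs mostly matters when writing.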
     
    John Machin, Nov 16, 2006
    #7
  8. John Machin wrote:

    > Given Peter Otten's post, looks like
    > (1) there's a bug in the "fmtparam" mechanism -- it's ignoring the
    > escapechar in my first twiddle, which should give the same result as
    > Peter's.
    > (2)
    > | >>> csv.excel.doublequote
    > True
    > According to my reading of the docs:
    > """
    > doublequote
    > Controls how instances of quotechar appearing inside a field should
    > themselves be quoted. When True, the character is doubled. When False,
    > the escapechar is used as a prefix to the quotechar. It defaults to
    > True.
    > """
    > Peter's example should not have worked.


    the documentation also mentions a "quoting" parameter that "controls
    when quotes should be generated by the writer and recognised by the
    reader.". not sure how that changes things.

    anyway, it's either unclear documentation or a bug in the code. better
    submit a bug report so someone can fix one of them.

    </F>
     
    Fredrik Lundh, Nov 16, 2006
    #8
  9. robert

    John Machin Guest

    Doh. The OP's string was a raw string. I need some sleep.
    Scrap bug #1!

    | >>> s=r'"123","a,b,\"c\"",5.640'
    | >>> list(csv.reader([s]))[0]
    ['123', 'a,b,\\c\\""', '5.640']
    # What's that???
    | >>> list(csv.reader([s], escapechar='\\'))[0]
    ['123', 'a,b,"c"', '5.640']
    | >>> list(csv.reader([s], escapechar='\\', doublequote=False))[0]
    ['123', 'a,b,"c"', '5.640']

    And there's still the problem with doublequote ....

    Goodnight ...
     
    John Machin, Nov 16, 2006
    #9
  10. robert

    Peter Otten Guest

    John Machin wrote:

    > | >>> s='"123","a,b,\"c\"",5.640'


    Note how I fixed the input:

    >>> '"123","a,b,\"c\"",5.640'
    '"123","a,b,"c"",5.640'
    >>> '"123","a,b,\\"c\\"",5.640'
    '"123","a,b,\\"c\\"",5.640'
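
The same point can be checked without the csv module at all, assuming any recent Python: in a normal string literal `\"` is just `"`, so the backslashes never reach the parser unless they are doubled or the literal is raw. The variable names are only for illustration.

```python
plain = '"123","a,b,\"c\"",5.640'       # \" collapses to a bare quote
doubled = '"123","a,b,\\"c\\"",5.640'   # backslashes kept by doubling them
raw = r'"123","a,b,\"c\"",5.640'        # backslashes kept by the raw prefix

print("\\" in plain)     # does the data contain any backslash?
print(doubled == raw)    # do the two spellings give the same string?
```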

    Peter
     
    Peter Otten, Nov 16, 2006
    #10
  11. robert

    John Machin Guest

    Fredrik Lundh wrote:
    > John Machin wrote:
    >
    > > Given Peter Otten's post, looks like
    > > (1) there's a bug in the "fmtparam" mechanism -- it's ignoring the
    > > escapechar in my first twiddle, which should give the same result as
    > > Peter's.
    > > (2)
    > > | >>> csv.excel.doublequote
    > > True
    > > According to my reading of the docs:
    > > """
    > > doublequote
    > > Controls how instances of quotechar appearing inside a field should
    > > themselves be quoted. When True, the character is doubled. When False,
    > > the escapechar is used as a prefix to the quotechar. It defaults to
    > > True.
    > > """
    > > Peter's example should not have worked.

    >
    > the documentation also mentions a "quoting" parameter that "controls
    > when quotes should be generated by the writer and recognised by the
    > reader.". not sure how that changes things.


    Hi Fredrik, I read that carefully -- "quoting" appears to have no
    effect in this situation.

    >
    > anyway, it's either unclear documentation or a bug in the code. better
    > submit a bug report so someone can fix one of them.


    Tomorrow :)
    Cheers,
    John
     
    John Machin, Nov 16, 2006
    #11
