Re: split string at commas respecting quotes when string not in csv format

Discussion in 'Python' started by Terry Reedy, Mar 26, 2009.

  1. Terry Reedy

    Terry Reedy Guest

    R. David Murray wrote:
    > OK, I've got a little problem that I'd like to ask the assembled minds
    > for help with. I can write code to parse this, but I'm thinking it may
    > be possible to do it with regexes. My regex foo isn't that good, so if
    > anyone is willing to help (or offer an alternate parsing suggestion)
    > I would be grateful. (This has to be stdlib only, by the way, I
    > can't introduce any new modules into the application so pyparsing is
    > not an option.)
    >
    > The challenge is to turn a string like this:
    >
    > a=1,b="0234,)#($)@", k="7"
    >
    > into this:
    >
    > [("a", "1"), ("b", "0234,)#($)@"), ("k", "7")]


    But the starting string IS in csv format, where the values are strings
    with the format name=string.

    >>> import csv
    >>> myDialect = csv.excel
    >>> myDialect.skipinitialspace = True # needed for space before 'k'
    >>> a=list(csv.reader(['''a=1,b="0234,)#($)@", k="7"'''], myDialect))[0]
    >>> a
    ['a=1', 'b="0234', ')#($)@"', 'k="7"']
    >>> b=[tuple(s.split('=',1)) for s in a]
    >>> b
    [('a', '1'), ('b', '"0234'), (')#($)@"',), ('k', '"7"')]
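A note on the output above: the comma inside the quotes was still treated as a delimiter. A minimal sketch of why, assuming the default excel dialect: the csv reader only treats a quote character as special when it is the first character of a field, so a quote that follows `b=` is just data. The variable names here are made up for illustration.

```python
import csv

# Quote at the very start of the field: the embedded comma is protected.
quoted_at_start = next(csv.reader(['"0234,)#($)@",k']))

# Quote appears after 'b=': it is read as ordinary data, so the comma
# inside the quotes still splits the field.
quoted_mid_field = next(csv.reader(['b="0234,)#($)@",k']))

print(quoted_at_start)   # ['0234,)#($)@', 'k']
print(quoted_mid_field)  # ['b="0234', ')#($)@"', 'k']
```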

    Terry Jan Reedy
    Terry Reedy, Mar 26, 2009
    #1

  2. John Machin

    John Machin Guest

    On Mar 27, 8:43 am, Terry Reedy wrote:
    > R. David Murray wrote:
    > > OK, I've got a little problem that I'd like to ask the assembled minds
    > > for help with.  I can write code to parse this, but I'm thinking it may
    > > be possible to do it with regexes.  My regex foo isn't that good, so if
    > > anyone is willing to help (or offer an alternate parsing suggestion)
    > > I would be grateful.  (This has to be stdlib only, by the way, I
    > > can't introduce any new modules into the application so pyparsing is
    > > not an option.)
    > >
    > > The challenge is to turn a string like this:
    > >
    > >     a=1,b="0234,)#($)@", k="7"
    > >
    > > into this:
    > >
    > >     [("a", "1"), ("b", "0234,)#($)@"), ("k", "7")]
    >
    > But the starting string IS in csv format, where the values are strings
    > with the format name=string.
    >
    > >>> import csv
    > >>> myDialect = csv.excel
    > >>> myDialect.skipinitialspace = True # needed for space before 'k'
    > >>> a=list(csv.reader(['''a=1,b="0234,)#($)@", k="7"'''], myDialect))[0]
    > >>> a
    > ['a=1', 'b="0234', ')#($)@"', 'k="7"']
    > >>> b=[tuple(s.split('=',1)) for s in a]
    > >>> b
    > [('a', '1'), ('b', '"0234'), (')#($)@"',), ('k', '"7"')]


    It's in the csv format that Excel accepts on input, but this is
    irrelevant. The output does not meet the OP's requirements: it has
    taken the should-have-been-protected comma as a delimiter and
    produced FOUR elements instead of THREE. Also note that '"0234' has a
    leading " and ')#($)@"' has a trailing ".
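For comparison, here is one stdlib-only way to get the three pairs the OP asked for, using `re`. This is a sketch, not anything posted in the thread: `PAIR_RE` and `parse_pairs` are hypothetical names, and it assumes names are word characters, quoted values contain no escaped or doubled quotes, and unquoted values contain no commas.

```python
import re

# One name=value pair: the value is either a double-quoted string
# (commas allowed inside) or a bare run of characters up to the next comma.
PAIR_RE = re.compile(r'\s*(\w+)\s*=\s*(?:"([^"]*)"|([^,]*))\s*(?:,|$)')

def parse_pairs(text):
    """Return a list of (name, value) tuples parsed from text."""
    pairs = []
    pos = 0
    while pos < len(text):
        m = PAIR_RE.match(text, pos)
        if m is None:
            raise ValueError("cannot parse at offset %d: %r" % (pos, text[pos:]))
        name, quoted, bare = m.groups()
        # Prefer the quoted form; otherwise use the bare value without
        # surrounding whitespace.
        pairs.append((name, quoted if quoted is not None else bare.strip()))
        pos = m.end()
    return pairs

print(parse_pairs('a=1,b="0234,)#($)@", k="7"'))
# [('a', '1'), ('b', '0234,)#($)@'), ('k', '7')]
```

Matching pair by pair rather than splitting on commas first is what keeps the protected comma inside the value for b.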
    John Machin, Mar 26, 2009
    #2