parsing string into dict

Tim Arnold · Sep 1, 2010

Hi,
I have a set of strings that are *basically* comma separated, with
the exception that a comma inside curly braces is not a
delimiter. Here's an example:

Code:

    [code=one, caption={My Analysis for \textbf{t}, Version 1}, continued]

I'd like to parse that into a dictionary (note that 'continued' gets
the value 'true'):
{'code':'one', 'caption':'{My Analysis for \textbf{t}, Version 1}','continued':'true'}

I know and love pyparsing, but for this particular code I need to rely
only on the standard library (I'm running 2.7). Here's what I've got,
and it works. I wonder if there's a simpler way?
thanks,
--Tim Arnold

The 'line' is like my example above but it comes in without the ending
bracket, so I append one on the 6th line.

    def parse_options(line):
        options = dict()
        if not line:
            return options
        active  = ['[','=',',','{','}',']']
        line += ']'
        key     = ''
        word    = ''
        inner   = 0
        for c in line:
            if c in active:
                if c == '{': inner +=1
                elif c == '}': inner -=1
                # keep the outermost closing brace as part of the value
                if inner or c == '}':
                    word += c
                else:
                    if c == '=':
                        (key,word) = (word,'')
                        options[key.strip()] = True
                    elif c in [',', ']']:
                        if not key:
                            options[word.strip()] = True
                        else:
                            options[key.strip()] = word.strip()
                        (key,word) = (False, '')
            else:
                word += c
        return options
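
The rule described above (a comma is a delimiter only at brace depth zero) can also be split into two smaller steps: cut the line at top-level commas, then divide each piece at its first `=`. The following is a minimal sketch under that reading, not a drop-in replacement. The helper name `split_top_level` is hypothetical, and treating a bare flag as the string `'true'` follows the desired output shown in the question.

```python
def split_top_level(text, sep=","):
    """Yield the pieces of text between separators found at brace depth zero."""
    depth, start = 0, 0
    for i, ch in enumerate(text):
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        elif ch == sep and depth == 0:
            yield text[start:i]
            start = i + 1
    yield text[start:]  # last piece, after the final separator


def parse_options(line):
    """Parse '[key=value, flag, ...' into a dict; bare flags map to 'true'."""
    options = {}
    # Assumption: the surrounding brackets are optional and carry no meaning.
    body = line.strip().lstrip("[").rstrip("]")
    for item in split_top_level(body):
        key, eq, value = item.partition("=")
        key = key.strip()
        if key:  # skip empty pieces, e.g. from an empty line
            options[key] = value.strip() if eq else "true"
    return options
```

Because the brackets are stripped up front, the same call should work whether or not the line arrives with its closing bracket.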

Arnaud Delobelle · Sep 1, 2010


FWIW, here's how I would do it:

def parse_key(s, start):
    pos = start
    while s[pos] not in ",=]":
        pos += 1
    return s[start:pos].strip(), pos

def parse_value(s, start):
    pos, nesting = start, 0
    while nesting or s[pos] not in ",]":
        nesting += {"{":1, "}":-1}.get(s[pos], 0)
        pos += 1
    return s[start:pos].strip(), pos

def parse_options(s):
    options, pos = {}, 0
    while s[pos] != "]":
        key, pos = parse_key(s, pos + 1)
        if s[pos] == "=":
            value, pos = parse_value(s, pos + 1)
        else:
            value = 'true'
        options[key] = value
    return options

>>> test = "[code=one, caption={My Analysis for \textbf{t}, Version 1}, continued]"
>>> parse_options(test)
{'caption': '{My Analysis for \textbf{t}, Version 1}', 'code': 'one', 'continued': 'true'}
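
Tim mentions that his lines arrive without the closing bracket, while the functions above scan until they reach `]`. A thin wrapper could bridge the two. The sketch below repeats the three functions so that it runs on its own. The name `parse_line` is hypothetical, and returning an empty dict for an empty line is an assumption borrowed from Tim's original function.

```python
def parse_key(s, start):
    pos = start
    while s[pos] not in ",=]":
        pos += 1
    return s[start:pos].strip(), pos


def parse_value(s, start):
    pos, nesting = start, 0
    # Commas and brackets only end the value at nesting depth zero.
    while nesting or s[pos] not in ",]":
        nesting += {"{": 1, "}": -1}.get(s[pos], 0)
        pos += 1
    return s[start:pos].strip(), pos


def parse_options(s):
    options, pos = {}, 0
    while s[pos] != "]":
        key, pos = parse_key(s, pos + 1)
        if s[pos] == "=":
            value, pos = parse_value(s, pos + 1)
        else:
            value = 'true'
        options[key] = value
    return options


def parse_line(line):
    """Hypothetical wrapper: accept a line with or without the final ']'."""
    line = line.strip()
    if not line:
        return {}
    if not line.endswith("]"):
        line += "]"  # same fix-up Tim applies in his own function
    return parse_options(line)
```

Without the wrapper, a line that lacks the closing bracket would run past the end of the string and raise `IndexError`.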

Aleksey · Sep 2, 2010

You can use a regular expression (and you don't need to add the ending
bracket):

import re
patt = re.compile(ur'\[code=(?P<CODE>\w+),\scaption=(?P<CAPTION>\{.+\})(?P<CONTINUED>,\scontinued)?\]?')
def parse_options(s):
    g = patt.match(s).groupdict()
    return {'caption': g['CAPTION'], 'code': g['CODE'], 'continued': bool(g['CONTINUED'])}


Here is the test:

>>> s=u'[code=one, caption={My Analysis for \textbf{t}, Version 1}, continued]'
>>> s1=u'[code=one, caption={My Analysis for \textbf{t}, Version 1}]'
>>> s2=u'[code=one, caption={My Analysis for \textbf{t}, Version 1}, continued'
>>> s3=u'[code=one, caption={My Analysis for \textbf{t}, Version 1}'
>>> parse_options(s)
{'caption': u'{My Analysis for \textbf{t}, Version 1}', 'code': u'one', 'continued': True}
>>> parse_options(s1)
{'caption': u'{My Analysis for \textbf{t}, Version 1}', 'code': u'one', 'continued': False}
>>> parse_options(s2)
{'caption': u'{My Analysis for \textbf{t}, Version 1}', 'code': u'one', 'continued': True}
>>> parse_options(s3)
{'caption': u'{My Analysis for \textbf{t}, Version 1}', 'code': u'one', 'continued': False}
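
The pattern above names the three keys of the example explicitly. If the option names vary, one possible variation is to match one option at a time with `re.finditer`. This is only a sketch under stated assumptions: braces nest at most one level deep (the `re` module cannot match arbitrary nesting), option names consist of word characters, and a bare flag maps to the string `'true'` as in the question. `ITEM` and `parse_options_re` are hypothetical names.

```python
import re

# One option: a name, optionally "= value", then a comma, "]" or end of string.
ITEM = re.compile(r"""
    \s*(?P<key>\w+)\s*
    (?:=\s*(?P<value>
          \{(?:[^{}]|\{[^{}]*\})*\}   # braced value, at most one nested level
        | [^,\]]*                     # bare value, up to the next comma or bracket
    ))?
    \s*(?:,|\]|$)
""", re.VERBOSE)


def parse_options_re(line):
    """Collect key/value pairs; options without '=' get the value 'true'."""
    options = {}
    for m in ITEM.finditer(line.strip().lstrip("[")):
        value = m.group("value")
        options[m.group("key")] = "true" if value is None else value.strip()
    return options
```

Note that `finditer` searches rather than anchors, so text that does not fit the pattern is skipped silently instead of being reported as an error.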
