parsing string into dict

Tim Arnold

Hi,
I have a set of strings that are *basically* comma separated, with
the exception that a comma occurring inside curly braces is not a
delimiter. Here's an example:

Code:
I'd like to parse that into a dictionary (note that 'continued' gets
the value 'true'):
{'code':'one', 'caption':'{My Analysis for \textbf{t}, Version
1}','continued':'true'}

I know and love pyparsing, but for this particular code I need to rely
only on the standard library (I'm running 2.7). Here's what I've got,
and it works. I wonder if there's a simpler way?
thanks,
--Tim Arnold

The 'line' is like my example above but it comes in without the ending
bracket, so I append one on the 6th line.

    def parse_options(line):
        options = dict()
        if not line:
            return options
        active  = ['[','=',',','{','}',']']
        line += ']'
        key     = ''
        word    = ''
        inner   = 0
        for c in list(line):
            if c in active:
                if c == '{': inner +=1
                elif c == '}': inner -=1
                if inner:
                    word += c
                else:
                    if c == '=':
                        (key,word) = (word,'')
                        options[key.strip()] = True
                    elif c in [',', ']']:
                        if not key:
                            options[word.strip()] = True
                        else:
                            options[key.strip()] = word.strip()
                        (key,word) = (False, '')
            else:
                word += c
        return options
 
Arnaud Delobelle


FWIW, here's how I would do it:

def parse_key(s, start):
    pos = start
    while s[pos] not in ",=]":
        pos += 1
    return s[start:pos].strip(), pos

def parse_value(s, start):
    pos, nesting = start, 0
    while nesting or s[pos] not in ",]":
        nesting += {"{":1, "}":-1}.get(s[pos], 0)
        pos += 1
    return s[start:pos].strip(), pos

def parse_options(s):
    options, pos = {}, 0
    while s[pos] != "]":
        key, pos = parse_key(s, pos + 1)
        if s[pos] == "=":
            value, pos = parse_value(s, pos + 1)
        else:
            value = 'true'
        options[key] = value
    return options

>>> test = "[code=one, caption={My Analysis for \textbf{t}, Version 1}, continued]"
>>> parse_options(test)
{'caption': '{My Analysis for \textbf{t}, Version 1}', 'code': 'one', 'continued': 'true'}
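
For comparison, the same brace-counting idea can be written as a split on top-level commas followed by a split on the first '='. This is only a sketch, not code from any poster: the names split_top_level and parse_options_by_split are made up for illustration, a key without a value is assumed to map to the string 'true' as in the original question, and the closing bracket is treated as optional. The test string is a raw string so that \textbf keeps its backslash (in a plain literal, \t is a tab).

```python
def split_top_level(text, sep=","):
    """Yield the pieces of text separated by sep, ignoring sep inside {...}."""
    depth, start = 0, 0
    for i, ch in enumerate(text):
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        elif ch == sep and depth == 0:
            yield text[start:i]
            start = i + 1
    yield text[start:]


def parse_options_by_split(line):
    """Parse '[key=value, flag, ...' (closing bracket optional) into a dict."""
    body = line.strip()
    if body.startswith("["):
        body = body[1:]
    if body.endswith("]"):
        body = body[:-1]
    options = {}
    for item in split_top_level(body):
        # Only the first '=' separates key from value.
        key, eq, value = item.partition("=")
        key = key.strip()
        if key:
            # Assumption: a bare key is a flag and gets the string 'true'.
            options[key] = value.strip() if eq else "true"
    return options


test = r"[code=one, caption={My Analysis for \textbf{t}, Version 1}, continued]"
result = parse_options_by_split(test)
```

This runs on 2.7 as well as Python 3 and needs only builtins. One caveat of the sketch: a line that arrives without the closing bracket but whose last plain value itself ends in ']' would lose that character.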
 
Aleksey

You can use a regular expression (and you don't need to add the ending
bracket):

import re
# The pattern must be a single string literal; the ur'' prefix is Python 2 only.
patt = re.compile(ur'\[code=(?P<CODE>\w+),\scaption=(?P<CAPTION>\{.+\})(?P<CONTINUED>,\scontinued)?\]?')
def parse_options(s):
    g = patt.match(s).groupdict()
    return {'caption': g['CAPTION'], 'code': g['CODE'],
            'continued': bool(g['CONTINUED'])}


Here is a test:

>>> s=u'[code=one, caption={My Analysis for \textbf{t}, Version 1}, continued]'
>>> s1=u'[code=one, caption={My Analysis for \textbf{t}, Version 1}]'
>>> s2=u'[code=one, caption={My Analysis for \textbf{t}, Version 1}, continued'
>>> s3=u'[code=one, caption={My Analysis for \textbf{t}, Version 1}'
>>> parse_options(s)
{'caption': u'{My Analysis for \textbf{t}, Version 1}', 'code': u'one', 'continued': True}
>>> parse_options(s1)
{'caption': u'{My Analysis for \textbf{t}, Version 1}', 'code': u'one', 'continued': False}
>>> parse_options(s2)
{'caption': u'{My Analysis for \textbf{t}, Version 1}', 'code': u'one', 'continued': True}
>>> parse_options(s3)
{'caption': u'{My Analysis for \textbf{t}, Version 1}', 'code': u'one', 'continued': False}
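
The pattern above hard-codes the three keys and their order. If the keys vary, one possible variation is to match one option at a time with re.finditer. The following is a rough sketch under stated assumptions, not Aleksey's code: the names OPTION and parse_options_re are invented here, keys are assumed to be \w+ as in the pattern above, a bare key is given the string 'true' as in the original question, and braces are assumed to nest at most two levels deep, since the re module cannot match arbitrarily nested braces.

```python
import re

# One option: a key, optionally followed by '=' and a value. A value is either
# a brace group (with at most one further level of braces inside it) or a run
# of characters up to the next ',' or ']'.
OPTION = re.compile(r"""
    (?P<key>\w+)                           # option name
    (?:\s*=\s*(?P<value>
        \{(?:[^{}]|\{[^{}]*\})*\}          # {...} with at most one nested {...}
      | [^,\]]*                            # or a plain value
    ))?
""", re.VERBOSE)


def parse_options_re(s):
    """Collect every key[=value] option found in s into a dict."""
    options = {}
    for m in OPTION.finditer(s):
        value = m.group("value")
        # Assumption: a key without '=' is a flag and gets the string 'true'.
        options[m.group("key")] = "true" if value is None else value.strip()
    return options


s = r"[code=one, caption={My Analysis for \textbf{t}, Version 1}, continued"
result = parse_options_re(s)
```

Because nothing in the pattern requires the closing bracket, the string parses the same with or without it. A value with deeper brace nesting than assumed would not raise an error but would be split incorrectly, so a depth counter is the safer choice for such input.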
 