How to split a string containing nested comma-separated substrings

Robert Dodier

Hello,

I'd like to split a string by commas, but only at the "top level" so
to speak. An element can be a comma-less substring, or a
quoted string, or a substring which looks like a function call.
If some element contains commas, I don't want to split it.

Examples:

'foo, bar, baz' => 'foo' 'bar' 'baz'
'foo, "bar, baz", blurf' => 'foo' 'bar, baz' 'blurf'
'foo, bar(baz, blurf), mumble' => 'foo' 'bar(baz, blurf)' 'mumble'

Can someone suggest a suitable regular expression or other
method to split such strings?

Thank you very much for your help.

Robert
 
Matimus

You might look at the shlex module. It doesn't get you 100% of the way, but
it's close:

>>> import shlex
>>> shlex.split('foo, bar, baz')
['foo,', 'bar,', 'baz']
>>> shlex.split('foo, "bar, baz", blurf')
['foo,', 'bar, baz,', 'blurf']
>>> shlex.split('foo, bar(baz, blurf), mumble')
['foo,', 'bar(baz,', 'blurf),', 'mumble']
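
One way to get a little closer with shlex, sketched here on the assumption that only quotes (not parentheses) need protecting, is to make the comma the lexer's only whitespace character. The helper name `shlex_comma_split` is made up for this illustration; `shlex.shlex`, `whitespace` and `whitespace_split` are part of the standard library.

```python
import shlex

def shlex_comma_split(text):
    """Split on commas outside quotes; parentheses are NOT understood."""
    lexer = shlex.shlex(text, posix=True)
    lexer.whitespace = ','          # split on commas only, not on spaces
    lexer.whitespace_split = True   # everything else is part of a token
    return [token.strip() for token in lexer]

print(shlex_comma_split('foo, "bar, baz", blurf'))
# ['foo', 'bar, baz', 'blurf']
print(shlex_comma_split('foo, bar(baz, blurf), mumble'))
# ['foo', 'bar(baz', 'blurf)', 'mumble']  (still split inside the call)
```

This handles the quoted case but still cuts the function-call case at the inner comma, so it only covers two of the three examples.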

Using a regular expression will be tricky, especially if recursive nesting
is possible (which regular expressions, in the strict sense, can't handle).
For a truly general-purpose solution you will need to write a custom parser.
There are a couple of modules out there that can help you with that.

pyparsing is one: http://pyparsing.wikispaces.com/
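
To make the nesting limitation concrete, here is a rough sketch of a regular expression that copes with double-quoted strings and exactly one level of parentheses. The names `ITEM` and `regex_split` are invented for this example, and the pattern is only an illustration, not a recommended solution.

```python
import re

# One item: runs of ordinary characters, a double-quoted string,
# or ONE level of parentheses. Nested parentheses are not handled.
ITEM = re.compile(r'(?:[^,("]|"[^"]*"|\([^)]*\))+')

def regex_split(text):
    items = [m.strip() for m in ITEM.findall(text)]
    # Drop the surrounding quotes from fully quoted items.
    return [i[1:-1] if len(i) >= 2 and i[0] == i[-1] == '"' else i
            for i in items]

print(regex_split('foo, bar(baz, blurf), mumble'))
# ['foo', 'bar(baz, blurf)', 'mumble']
print(regex_split('f(g(a, b), c)'))
# ['f(g(a, b)', 'c)']  (wrong: the nested call defeats the pattern)
```

The second call shows the failure mode: as soon as a call contains another call, the pattern closes the group at the first ")" and splits in the wrong place.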

Matt
 
Cédric Lucantis

Hi,

On Wednesday 18 June 2008 19:19:57, Robert Dodier wrote:
Can someone suggest a suitable regular expression or other
method to split such strings?

I'd do something like this (note that it doesn't check for mismatched
quotes or parentheses, and it removes _all_ the quotes):

def mysplit(string):
    pardepth = 0    # current parenthesis nesting depth
    quote = False   # True while inside a quoted string
    ret = ['']

    for car in string:
        if car == '(':
            pardepth += 1
        elif car == ')':
            pardepth -= 1
        elif car in ('"', "'"):
            quote = not quote
            car = ''    # just if you don't want to keep the quotes

        # Split only on top-level separators. The "car and" guard stops the
        # emptied quote character from counting as a separator, since
        # '' in ', ' is True.
        if car and car in ', ' and not (pardepth or quote):
            if ret[-1] != '':
                ret.append('')
        else:
            ret[-1] += car

    return ret

# test
for s in ('foo, bar, baz',
          'foo, "bar, baz", blurf',
          'foo, bar(baz, blurf), mumble'):
    print("'%s' => '%s'" % (s, mysplit(s)))

# result
'foo, bar, baz' => '['foo', 'bar', 'baz']'
'foo, "bar, baz", blurf' => '['foo', 'bar, baz', 'blurf']'
'foo, bar(baz, blurf), mumble' => '['foo', 'bar(baz, blurf)', 'mumble']'
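
For cases where the two caveats above matter, a stricter variant might look like the following sketch. It splits on commas only (not spaces), remembers which quote character is open, and raises ValueError on unbalanced input. The names `split_top_level` and `unquote` are hypothetical and not part of the post above.

```python
def split_top_level(text):
    """Split text on commas that are outside quotes and parentheses."""
    items = []
    current = []
    depth = 0       # current parenthesis nesting depth
    quote = None    # the open quote character, or None
    for ch in text:
        if quote:
            # Inside a quoted string: only the matching quote ends it.
            if ch == quote:
                quote = None
            current.append(ch)
        elif ch in "\"'":
            quote = ch
            current.append(ch)
        elif ch == "(":
            depth += 1
            current.append(ch)
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced ')'")
            current.append(ch)
        elif ch == "," and depth == 0:
            items.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    if quote:
        raise ValueError("unterminated quote")
    if depth:
        raise ValueError("unbalanced '('")
    items.append("".join(current).strip())
    return items

def unquote(item):
    """Strip one pair of matching quotes from a fully quoted item."""
    if len(item) >= 2 and item[0] == item[-1] and item[0] in "\"'":
        return item[1:-1]
    return item

print([unquote(i) for i in split_top_level('foo, "bar, baz", blurf')])
# ['foo', 'bar, baz', 'blurf']
```

Keeping the quotes during the scan and removing them afterwards means quotes inside a function call survive untouched, which may or may not be what you want.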
 
Paul McGuire

tests = """\
foo, bar, baz
foo, "bar, baz", blurf
foo, bar(baz, blurf), mumble""".splitlines()


from pyparsing import Word, alphas, alphanums, Optional, \
    Group, delimitedList, quotedString

ident = Word(alphas+"_", alphanums+"_")
func_call = Group(ident + "(" + Optional(Group(delimitedList(ident)))
                  + ")")

listItem = func_call | ident | quotedString

for t in tests:
    print(delimitedList(listItem).parseString(t).asList())


Prints:

['foo', 'bar', 'baz']
['foo', '"bar, baz"', 'blurf']
['foo', ['bar', '(', ['baz', 'blurf'], ')'], 'mumble']


-- Paul
 
Matimus

Following up on my own post, here is a working example that uses the
built-in _ast module. I posted something similar the other day. It
uses Python's own internal parser to do the work for you. It works in this
case because, at least from what you have posted, your syntax doesn't
violate Python syntax.

Code:
import _ast

def eval_tuple(text):
    """ Evaluate a string representing a tuple of strings, names and
    calls; returns a tuple of strings.
    """

    ast = compile(text, "<string>", 'eval', _ast.PyCF_ONLY_AST)
    return _traverse(ast.body)

def _traverse(ast):
    """ Traverse the AST, returning string representations of tuples,
    strings, names and calls.
    """
    if isinstance(ast, _ast.Tuple):
        return tuple(_traverse(el) for el in ast.elts)
    elif isinstance(ast, _ast.Str):
        return ast.s
    elif isinstance(ast, _ast.Name):
        return ast.id
    elif isinstance(ast, _ast.Call):
        name = ast.func.id
        args = [_traverse(x) for x in ast.args]
        return "%s(%s)"%(name, ", ".join(args))
    raise SyntaxError()

examples = [
    ('foo, bar, baz', ('foo', 'bar', 'baz')),
    ('foo, "bar, baz", blurf', ('foo', 'bar, baz', 'blurf')),
    ('foo, bar(baz, blurf), mumble',
     ('foo', 'bar(baz, blurf)', 'mumble')),
    ]

def test():
    for text, expected in examples:
        print("trying %r => %r" % (text, expected))
        result = eval_tuple(text)
        if result == expected:
            print("PASS")
        else:
            # Wrap result in a 1-tuple so a tuple result formats correctly.
            print("FAIL, GOT: %r" % (result,))

if __name__ == "__main__":
    test()
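
On Python 3.8 or later, where `ast.Str` is deprecated in favor of `ast.Constant`, the same idea can be sketched with the public `ast` module. This is an adaptation under that assumption, not the code posted above. The name `split_with_ast` is made up, and `ast.get_source_segment` is used to hand back names and calls exactly as they were written.

```python
import ast

def split_with_ast(text):
    """Split a comma-separated expression using Python's own parser."""
    body = ast.parse(text, mode="eval").body
    # A single item parses to one node; several items parse to a Tuple.
    elements = body.elts if isinstance(body, ast.Tuple) else [body]
    result = []
    for node in elements:
        if isinstance(node, ast.Constant) and isinstance(node.value, str):
            result.append(node.value)          # quoted string, quotes dropped
        elif isinstance(node, (ast.Name, ast.Call)):
            # Return the original source text of the name or call.
            result.append(ast.get_source_segment(text, node))
        else:
            raise SyntaxError("unsupported element: %s" % ast.dump(node))
    return result

print(split_with_ast('foo, bar(baz, blurf), mumble'))
# ['foo', 'bar(baz, blurf)', 'mumble']
```

As with the original, this only works while the input happens to be valid Python expression syntax.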

Matt
 
mario

I have actually highlighted a small, neat recipe for doing such
unpacking, which I use for parsing arbitrary parameters in Evoque
Templating. I never needed to handle "callable" parameters though, as
you do in your 3rd string example, so the little "unpack_symbol"
recipe I published earlier does not handle it... anyhow, what I am
referring to is:

Evoque Templating: http://evoque.gizmojo.org/
Code highlight: http://gizmojo.org/code/unpack_symbol/

However, a small variation of the above recipe can do what you are
looking for, in a rather cute way. The difference is to make the
original recipe handle "callable strings", and I achieve this by
modifying the recipe like so:


class callable_str(str):
    def __call__(s, *args):
        return s+str(args)

class _UnpackGlobals(dict):
    def __getitem__(self, name):
        return callable_str(name)

def unpack_symbol(symbol, globals=_UnpackGlobals()):
    """ If compound symbol (list, tuple, nested) unpack to atomic
    symbols """
    return eval(symbol, globals, None)


Now, calling unpack_symbol() on each sample string gives the following
tuple of strings:
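
As a standalone check, the sketch below re-declares the recipe so that it runs on its own and applies it to the third sample string. The values in the comments are what this code should produce as written; they are not output copied from the original post. `plain_call_str` and `make_unpacker` are hypothetical additions that render a call without the tuple-style quoting. Bear in mind that eval() on untrusted input is unsafe.

```python
class callable_str(str):
    def __call__(s, *args):
        return s + str(args)

class plain_call_str(str):
    # Hypothetical variant: render the arguments without tuple/quote noise.
    def __call__(s, *args):
        return plain_call_str("%s(%s)" % (s, ", ".join(map(str, args))))

def make_unpacker(str_class):
    """Build an unpack function whose unknown names become str_class."""
    class _Globals(dict):
        def __getitem__(self, name):
            return str_class(name)

    def unpack(symbol):
        return eval(symbol, _Globals(), None)
    return unpack

unpack_symbol = make_unpacker(callable_str)
unpack_plain = make_unpacker(plain_call_str)

print(unpack_symbol('foo, bar(baz, blurf), mumble'))
# ('foo', "bar('baz', 'blurf')", 'mumble')
print(unpack_plain('foo, bar(baz, blurf), mumble'))
# ('foo', 'bar(baz, blurf)', 'mumble')
```

The trick relies on name lookup going through the dict subclass's `__getitem__`, so every bare name evaluates to a string that can also be "called".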


Mario Ruggier
 
