Quote-aware string splitting

J. W. McCall

Hello,

I need to split a string as per string.split(), but with a modification:
I want it to recognize quoted strings and return them as one list item,
regardless of any whitespace within the quoted string.

For example, given the string:

'spam "the life of brian" 42'

I'd want it to return:

['spam', 'the life of brian', '42']

I see no standard library function to do this, so what would be the most
simple way to achieve this? This should be simple, but I must be tired
as I'm not currently able to think of an elegant way to do this.

Any ideas?

Thanks,

J. W. McCall
 
Tim Heaney

J. W. McCall said:
I need to split a string as per string.split(), but with a
modification: I want it to recognize quoted strings and return them as
one list item, regardless of any whitespace within the quoted string.

For example, given the string:

'spam "the life of brian" 42'

I'd want it to return:

['spam', 'the life of brian', '42']

I see no standard library function to do this, so what would be the
most simple way to achieve this? This should be simple, but I must be
tired as I'm not currently able to think of an elegant way to do this.

Any ideas?

How about the csv module? It seems like it might be overkill, but it
does already handle that sort of quoting:
>>> import csv
>>> next(csv.reader(['spam "the life of brian" 42'], delimiter=' '))
['spam', 'the life of brian', '42']
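
A minimal sketch of the csv approach wrapped as a function, written for Python 3. The helper names are made up for illustration. For comparison it also calls shlex.split(), another standard-library splitter that was not suggested in this thread; whether its shell-like rules suit the original poster's data is an assumption.

```python
import csv
import shlex

def split_with_csv(text):
    """Split on single spaces, treating double-quoted runs as one field."""
    # csv.reader accepts any iterable of lines, so wrap the string in a list.
    return next(csv.reader([text], delimiter=' '))

def split_with_shlex(text):
    """Split using shell-like rules (runs of whitespace, both quote styles)."""
    return shlex.split(text)

line = 'spam "the life of brian" 42'
print(split_with_csv(line))
print(split_with_shlex(line))

# Two consecutive spaces: csv reports an empty field between them,
# while shlex treats the run of whitespace as a single separator.
print(split_with_csv('spam  42'))
print(split_with_shlex('spam  42'))
```

The difference on repeated spaces is worth checking against real input before choosing one of the two.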
 
George Sakkis

J. W. McCall said:
I need to split a string as per string.split(), but with a
modification: I want it to recognize quoted strings and return them as
one list item, regardless of any whitespace within the quoted string.

For example, given the string:

'spam "the life of brian" 42'

I'd want it to return:

['spam', 'the life of brian', '42']

I see no standard library function to do this, so what would be the
most simple way to achieve this? This should be simple, but I must be
tired as I'm not currently able to think of an elegant way to do this.

Any ideas?

Tim Heaney said:
How about the csv module? It seems like it might be overkill, but it
does already handle that sort of quoting:
>>> import csv
>>> next(csv.reader(['spam "the life of brian" 42'], delimiter=' '))
['spam', 'the life of brian', '42']


I don't know if this is as good as CSV's splitter, but it works
reasonably well for me:

import re
regex = re.compile(r'''
'.*?' | # single quoted substring
".*?" | # double quoted substring
\S+ # all the rest
''', re.VERBOSE)

print(regex.findall('''
This is 'single "quoted" string'
followed by a "double 'quoted' string"
'''))
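
The pattern above returns quoted substrings with their quotation marks still attached. A possible variation, sketched here under the assumption that the quotes should be dropped as in the original question, adds one capturing group per alternative and picks whichever group matched. The names TOKEN and split_quoted are hypothetical.

```python
import re

TOKEN = re.compile(r'''
    '(.*?)' |   # group 1: body of a single quoted substring
    "(.*?)" |   # group 2: body of a double quoted substring
    (\S+)       # group 3: all the rest
''', re.VERBOSE)

def split_quoted(text):
    """Return the tokens of text with surrounding quotes removed."""
    # m.lastindex is the number of the group that took part in the match,
    # so an empty quoted string ('' or "") is still kept as a token.
    return [m.group(m.lastindex) for m in TOKEN.finditer(text)]

print(split_quoted('spam "the life of brian" 42'))
print(split_quoted('''This is 'single "quoted" string' '''))
```

Like the original pattern, this does not handle escaped quotes inside a quoted substring.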

George
 
Jeffrey Froman

J. W. McCall said:
For example, given the string:

'spam "the life of brian" 42'

I'd want it to return:

['spam', 'the life of brian', '42']

The .split() method of strings can take a substring, such as a quotation
mark, as a delimiter. So a simple solution is:
>>> x = 'spam "the life of brian" 42'
>>> [z.strip() for z in x.split('"')]
['spam', 'the life of brian', '42']


Jeffrey
 
George Sakkis

import re
regex = re.compile(r'''
'.*?' | # single quoted substring
".*?" | # double quoted substring
\S+ # all the rest
''', re.VERBOSE)

Oh, and if your strings may span more than one line, replace re.VERBOSE
with re.VERBOSE | re.DOTALL.
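
A small sketch of what the extra flag changes, using a made-up input whose quoted part contains a newline. Without re.DOTALL the dot cannot cross the line break, so the quoted substring falls apart into separate tokens.

```python
import re

PATTERN = r'''
    '.*?' |   # single quoted substring
    ".*?" |   # double quoted substring
    \S+       # all the rest
'''

one_line = re.compile(PATTERN, re.VERBOSE)
multi_line = re.compile(PATTERN, re.VERBOSE | re.DOTALL)

text = 'say "two\nlines" end'

# Without DOTALL: the quoted part is broken at the newline.
print(one_line.findall(text))    # ['say', '"two', 'lines"', 'end']

# With DOTALL: the quoted part survives as one token.
print(multi_line.findall(text))  # ['say', '"two\nlines"', 'end']
```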

George
 
Bengt Richter

J. W. McCall said:
For example, given the string:

'spam "the life of brian" 42'

I'd want it to return:

['spam', 'the life of brian', '42']

The .split() method of strings can take a substring, such as a quotation
mark, as a delimiter. So a simple solution is:
>>> x = 'spam "the life of brian" 42'
>>> [z.strip() for z in x.split('"')]
['spam', 'the life of brian', '42']
>>> x = ' sspam " ssthe life of brianss " 42'
>>> [z.strip() for z in x.split('"')]
['sspam', 'ssthe life of brianss', '42']

Oops, note some spaces inside quotes near ss and missing double quotes in result.
Maybe (not tested beyond what you see):
>>> [r for r in [(i%2 and ['"'+z+'"'] or [z.strip()])[0] for i,z in enumerate(x.split('"'))] if r] or ['']
['sspam', '" ssthe life of brianss "', '42']
>>> x = ' "" "" '
>>> [r for r in [(i%2 and ['"'+z+'"'] or [z.strip()])[0] for i,z in enumerate(x.split('"'))] if r] or ['']
['""', '""']
>>> x='""'
>>> [r for r in [(i%2 and ['"'+z+'"'] or [z.strip()])[0] for i,z in enumerate(x.split('"'))] if r] or ['']
['""']
>>> x=''
>>> [r for r in [(i%2 and ['"'+z+'"'] or [z.strip()])[0] for i,z in enumerate(x.split('"'))] if r] or ['']
['']
>>> x = ' sspam " ssthe life of brianss " 42'
>>> [(i%2 and ['"'+z+'"'] or [z.strip()])[0] for i,z in enumerate(x.split('"'))]
['sspam', '" ssthe life of brianss "', '42']
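
The one-liner above is dense, so here is a sketch of the same logic spelled out as a function. The name split_keep_quotes is invented for this example, and like the original it assumes the double quotes in the input are balanced.

```python
def split_keep_quotes(text):
    """Split text on double quotes, keeping quoted pieces with their quotes."""
    pieces = []
    for i, z in enumerate(text.split('"')):
        if i % 2:
            # Odd-numbered pieces were between a pair of double quotes:
            # keep inner whitespace and put the quotes back.
            piece = '"' + z + '"'
        else:
            # Even-numbered pieces were outside quotes: trim them.
            piece = z.strip()
        if piece:
            pieces.append(piece)
    # Mirror the "or ['']" fallback for input that yields nothing.
    return pieces or ['']

for sample in (' sspam " ssthe life of brianss " 42', ' "" "" ', '""', ''):
    print(split_keep_quotes(sample))
```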


Regards,
Bengt Richter
 
Paul McGuire

Quoted strings are surprisingly stateful, so that using a parser isn't
totally out of line. Here is a pyparsing example with some added test
cases. Pyparsing's quotedString built-in handles single or double
quotes (if you don't want to be this permissive, there are also
sglQuotedString and dblQuotedString to choose from), plus escaped quote
characters.

The snippet below includes two samples. The first 3 lines give the
equivalent to other suggestions on this thread. It is followed by a
slightly enhanced version that strips quotation marks from any quoted
entries.

-- Paul
(get pyparsing at http://pyparsing.sourceforge.net)
==========
from pyparsing import *
test = r'''spam 'it don\'t mean a thing' "the life of brian"
42 'the meaning of "life"' grail'''
print(OneOrMore( quotedString | Word(printables) ).parseString( test ))

# strip quotes during parsing
def stripQuotes(s,l,toks):
    return toks[0][1:-1]
quotedString.setParseAction( stripQuotes )
print(OneOrMore( quotedString | Word(printables) ).parseString( test ))
==========

returns:
['spam', "'it don\\'t mean a thing'", '"the life of brian"', '42',
 '\'the meaning of "life"\'', 'grail']
['spam', "it don\\'t mean a thing", 'the life of brian', '42',
 'the meaning of "life"', 'grail']
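
To illustrate why quoted strings are stateful, here is a hand-written tokenizer sketch in plain Python that tracks whether it is currently inside a quote. It is not how pyparsing works internally; the function name tokenize and its escape rule (a backslash inside quotes keeps the next character literally) are assumptions made for this example. Unlike the pyparsing output above, it drops the backslash from the result.

```python
def tokenize(text):
    """Split text on whitespace, honouring '...' and "..." with \\ escapes."""
    tokens = []
    current = []        # characters of the token being built
    quote = None        # the open quote character, or None outside quotes
    in_token = False    # True once a token has started (even an empty "")
    chars = iter(text)
    for ch in chars:
        if quote:
            if ch == '\\':
                # Escaped character inside quotes: keep it literally.
                current.append(next(chars, ''))
            elif ch == quote:
                quote = None
            else:
                current.append(ch)
        elif ch in '\'"':
            quote = ch
            in_token = True
        elif ch.isspace():
            if in_token:
                tokens.append(''.join(current))
                current = []
                in_token = False
        else:
            current.append(ch)
            in_token = True
    if quote:
        raise ValueError('unbalanced quote')
    if in_token:
        tokens.append(''.join(current))
    return tokens

sample = r'''spam 'it don\'t mean a thing' "the life of brian"
42 'the meaning of "life"' grail'''
print(tokenize(sample))
```

The three pieces of state (current, quote, in_token) are what a single regular expression or a plain split has to approximate.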
 
Jeffrey Froman

Bengt said:
Oops, note some spaces inside quotes near ss and missing double quotes in
result.

And here I thought the main problem with my answer was that it didn't split
unquoted segments into separate words at all! Clearly I missed the
generalization being sought, and a more robust solution is in order.
Fortunately, others have been forthcoming with them.
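
For completeness, a sketch of how the split('"') idea could be extended so that unquoted segments are also broken into words. The function name is hypothetical, only double quotes are recognized, and the input is assumed to have balanced quotes.

```python
def split_quoted(text):
    """Split on whitespace, keeping double-quoted substrings intact."""
    tokens = []
    for i, segment in enumerate(text.split('"')):
        if i % 2:
            # Odd-numbered segments were inside quotes: keep them whole,
            # including any whitespace they contain.
            tokens.append(segment)
        else:
            # Even-numbered segments were outside quotes: split into words.
            tokens.extend(segment.split())
    return tokens

print(split_quoted('spam eggs "the life of brian" 42'))
```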

Thank you,
Jeffrey
 
