Splitting a quoted string.

mosscliffe · May 16, 2007

I am looking for a simple split function to create a list of entries
from a string which contains quoted elements. Like in 'google'
search.

e.g. string = 'bob john "johnny cash" 234 june'

and I want to have a list of ['bob', 'john', 'johnny cash', '234',
'june']

I wondered about using the csv routines, but I thought I would ask the
experts first.

There may be a simple function, but as yet I have not found it.

Thanks

Richard

Paul Melis · May 16, 2007

Hi,

Here is a not-so-simple function using regular expressions. It repeatedly
matches two regexps, one that matches any sequence of characters except
a space and one that matches a double-quoted string. If there are two
matches, the one occurring first in the string is taken and the matching
part of the string is cut off. This is repeated until the whole string is
matched. If there are two matches at the same point in the string, the
longer of the two is taken. (A single regexp A|B does not pick the longer
match by itself: alternatives are tried left to right, so if A matches
it is returned even if B would match a longer string.)

import re

def split_string(s):

    pat1 = re.compile('[^ ]+')
    pat2 = re.compile('"[^"]*"')

    parts = []

    m1 = pat1.search(s)
    m2 = pat2.search(s)
    while m1 or m2:

        if m1 and m2:
            # Both match, take match occurring earliest in the string
            p1 = m1.group(0)
            p2 = m2.group(0)
            if m1.start(0) < m2.start(0):
                part = p1
                s = s[m1.end(0):]
            elif m2.start(0) < m1.start(0):
                part = p2
                s = s[m2.end(0):]
            else:
                # Both match at the same string position, take longest match
                if len(p1) > len(p2):
                    part = p1
                    s = s[m1.end(0):]
                else:
                    part = p2
                    s = s[m2.end(0):]
        elif m1:
            part = m1.group(0)
            s = s[m1.end(0):]
        else:
            part = m2.group(0)
            s = s[m2.end(0):]

        parts.append(part)

        m1 = pat1.search(s)
        m2 = pat2.search(s)

    return parts

>>> s = 'bob john "johnny cash" 234 june'
>>> split_string(s)
['bob', 'john', '"johnny cash"', '234', 'june']
>>>

Paul

Paul Melis · May 16, 2007

Here is a slightly improved version, which is a bit more compact and
removes the quotes from the matched quoted string.

import re

def split_string(s):

    pat1 = re.compile('[^" ]+')     # unquoted word
    pat2 = re.compile('"([^"]*)"')  # quoted string, group 1 excludes the quotes

    parts = []

    m1 = pat1.search(s)
    m2 = pat2.search(s)
    while m1 or m2:

        if m1 and m2:
            # Both match, prefer the earliest; on a tie, the longest
            if m1.start(0) < m2.start(0):
                match = 1
            elif m2.start(0) < m1.start(0):
                match = 2
            else:
                if len(m1.group(0)) > len(m2.group(0)):
                    match = 1
                else:
                    match = 2
        elif m1:
            match = 1
        else:
            match = 2

        if match == 1:
            part = m1.group(0)
            s = s[m1.end(0):]
        else:
            part = m2.group(1)
            s = s[m2.end(0):]

        parts.append(part)

        m1 = pat1.search(s)
        m2 = pat2.search(s)

    return parts

print(split_string('bob john "johnny cash" 234 june'))
print(split_string('"abc""abc"'))
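
On the A|B remark above: since alternatives are tried left to right, listing the quoted pattern first seems to be enough for this particular input format. A minimal sketch under that assumption (the names `TOKEN` and `split_quoted` are made up for illustration, and escaped quotes inside a quoted part are not handled):

```python
import re

# The quoted alternative comes first, so it wins whenever a token starts with '"'.
TOKEN = re.compile(r'"([^"]*)"|([^" ]+)')

def split_quoted(s):
    """Split on spaces, keeping double-quoted parts together without their quotes."""
    parts = []
    for m in TOKEN.finditer(s):
        quoted, bare = m.groups()
        # Exactly one group is set; an empty quoted string "" gives ''.
        parts.append(quoted if quoted is not None else bare)
    return parts

print(split_quoted('bob john "johnny cash" 234 june'))
```

Unlike the loop above, this scans the string once and does not re-slice it after each match.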

Duncan Booth · May 16, 2007

You probably need to specify the problem more completely, e.g. can the
quoted parts of the strings contain quote marks? If so, what are the
rules for escaping them? Do two spaces between words mean an empty field,
or still a single delimiter?

Once you've worked that out you can either use re.split with a suitable
regular expression, or use the csv module specifying your desired dialect:

import csv

class mosscliffe(csv.Dialect):
    delimiter = ' '
    quotechar = '"'
    doublequote = False
    skipinitialspace = False
    lineterminator = '\r\n'
    quoting = csv.QUOTE_MINIMAL

csv.register_dialect("mosscliffe", mosscliffe)
string = 'bob john "johnny cash" 234 june'
for row in csv.reader([string], dialect="mosscliffe"):
    print(row)

['bob', 'john', 'johnny cash', '234', 'june']
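
The two-spaces question is easy to check. The sketch below is only an illustration with a made-up input line: it compares csv with a space delimiter against shlex.split (suggested later in this thread), and the two treat a doubled space differently.

```python
import csv
import shlex

line = 'bob  john "johnny cash"'  # note the two spaces after bob

# csv with a space delimiter: every space ends a field,
# so two spaces in a row produce an empty field.
csv_fields = next(csv.reader([line], delimiter=' '))

# shlex.split: a run of whitespace counts as a single separator.
shlex_fields = shlex.split(line)

print(csv_fields)    # ['bob', '', 'john', 'johnny cash']
print(shlex_fields)  # ['bob', 'john', 'johnny cash']
```

Which behaviour is right depends on whether empty fields are meaningful in the input.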

mosscliffe · May 16, 2007

Thank you very much to all for your replies.

I am now much wiser about using regex and CSV.

As I am quite a newbie, I have had my 'class' education improved as
well.

Many thanks again

Richard

Gerard Flanagan · May 16, 2007

See 'split' from the 'shlex' module:

>>> s = 'bob john "johnny cash" 234 june'
>>> import shlex
>>> shlex.split(s)
['bob', 'john', 'johnny cash', '234', 'june']
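
As for quote marks inside quoted parts, shlex.split follows shell-like rules in its default POSIX mode. A small sketch with made-up input, showing a backslash-escaped quote, a single-quoted part, and what happens when a quote is never closed:

```python
import shlex

# Inside double quotes, a backslash escapes a double quote;
# single-quoted parts are kept together as well.
text = r'bob "johnny \"cash\"" ' + "'june carter'"
tokens = shlex.split(text)
print(tokens)  # ['bob', 'johnny "cash"', 'june carter']

# An unterminated quote raises ValueError instead of being silently accepted.
try:
    shlex.split('bob "johnny cash')
    unbalanced_error = None
except ValueError as exc:
    unbalanced_error = str(exc)
print(unbalanced_error)  # No closing quotation
```

If search-box input may contain stray quotes, catching ValueError and falling back to a plain str.split() is one possible way to stay tolerant.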
