Quote aware split

Ondrej Baudys

Hi,

After trawling through the archives for a simple quote-aware split
implementation (i.e. a string.split-alike that only splits outside of
matching quotes) and coming up short, I implemented a quick and dirty
function that suits my purposes.

It's ugly and it doesn't use a stack, it only supports a single
character as the 'sep' argument, and it only supports one type of quote
(i.e. ' or " but not both), but it does the job. Since there have been a
few appeals over the years for something of this sort, I have decided
to post what I have:

--- BEGIN ---
#!/usr/bin/env python

def qsplit(chars, sep, quote="'"):
    """ Quote aware split """
    qcount = 0
    splitpoints = [-1] # ie. separator char found before first letter ;)
    for index, c in enumerate(chars):
        # compare by value; 'is' tests identity, which is unreliable for strings
        if c == quote:
            qcount += 1
        if c == sep and qcount % 2 == 0:
            splitpoints.append(index)

    # slice chars by splitpoints *omitting the separator*
    slices = [chars[splitpoints[i]+1:splitpoints[i+1]]
              for i in range(len(splitpoints)-1)]

    # last slice will be of the form chars[last:] which we couldnt do above
    slices.append(chars[splitpoints[-1]+1:])
    return slices


if __name__ == "__main__":
    test = "This is gonna be in quotes ';' and this is not; lets see how we split"

    test2 = """
A more complex example; try this on for size:

create function blah '
split me once;
split me twice; '
end;
'one more time;'
and again;
"""
    print("*--split--*".join(qsplit(test, ';')))
    print("*--split--*".join(qsplit(test2, ';')))

# vim:tabstop=4:shiftwidth=4:expandtab
--- END ---

Regards,
Ondrej Baudys
 
Diez B. Roggisch

Ondrej said:
Hi,

After trawling through the archives for a simple quote-aware split
implementation (i.e. a string.split-alike that only splits outside of
matching quotes) and coming up short, I implemented a quick and dirty
function that suits my purposes.

<snip/>

Maybe using the csv module together with cStringIO would be more
straightforward.
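
For comparison, here is a minimal sketch of the csv approach. It is written for Python 3, so io.StringIO stands in for cStringIO, and the helper name csv_split is made up for this illustration. Note that csv only treats a quote as special at the start of a field and strips the quotes from the result, so it is not a drop-in replacement for qsplit.

```python
import csv
import io


def csv_split(text, sep=";", quote="'"):
    """Split one line of text with csv.reader (hypothetical helper)."""
    reader = csv.reader(io.StringIO(text), delimiter=sep, quotechar=quote)
    return next(reader)


# A quoted field that starts right after a separator is kept whole,
# and the surrounding quotes are removed from the result.
print(csv_split("a;'b;c';d"))

# A quote in the middle of a field is an ordinary character to csv,
# so the separator inside it still splits the text.
print(csv_split("x 'y;z' w"))
```

The second call shows why the test string from the original post would not survive this approach unchanged: its quotes open in the middle of a field.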

Diez
 
John Machin

# last slice will be of the form chars[last:] which we couldnt do above

Who are "we"? Here's another version with the "couldn't do" problem
fixed and a few minor enhancements:

def qsplit2(chars, sep=",", quote="'"):
    """ Quote aware split """
    assert sep != quote
    can_split = True
    splitpoints = [-1] # ie. separator char found before first letter ;)
    for index, c in enumerate(chars):
        if c == quote:
            can_split = not can_split
        elif c == sep and can_split:
            splitpoints.append(index)
    if not can_split:
        raise ValueError("Unterminated quote")
    splitpoints.append(len(chars))
    # slice chars by splitpoints *omitting the separator*
    slices = [chars[splitpoints[i]+1:splitpoints[i+1]]
              for i in range(len(splitpoints)-1)]
    return slices
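
The "one type of quote" limitation mentioned in the original post could probably be lifted by remembering which quote character is currently open instead of keeping a single flag. This is a rough sketch under that assumption. The name qsplit_multi and its quotes parameter are hypothetical and do not come from any poster in this thread.

```python
def qsplit_multi(chars, sep=";", quotes="'\""):
    """Split chars on sep, skipping separators inside any quoted run."""
    if sep in quotes:
        raise ValueError("separator must not be a quote character")
    pieces = []
    start = 0
    open_quote = None  # the quote character currently open, if any
    for index, c in enumerate(chars):
        if open_quote is None:
            if c in quotes:
                open_quote = c
            elif c == sep:
                pieces.append(chars[start:index])
                start = index + 1
        elif c == open_quote:
            # only the same character that opened the run can close it
            open_quote = None
    if open_quote is not None:
        raise ValueError("Unterminated quote")
    pieces.append(chars[start:])
    return pieces


print(qsplit_multi("a;'b;c';\"d;e\";f"))
```

Because only the matching character closes a quoted run, an apostrophe inside double quotes is left alone. Like the versions above, this does not handle escaped quotes.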

Cheers,
John
 
Tim Arnold

Ondrej Baudys said:
Hi,

After trawling through the archives for a simple quote-aware split
implementation (i.e. a string.split-alike that only splits outside of
matching quotes) and coming up short, I implemented a quick and dirty
function that suits my purposes.

Take a look at pyparsing; I think you'll like it, especially the
examples at http://pyparsing.wikispaces.com/Examples

--Tim Arnold
 
