split a string with quoted parts into list



hi there

i'm experimenting with imaplib and came across strings like
(\HasNoChildren) "." "INBOX.Sent Items"
in which the quotes are part of the string.

now i try to convert this into a list. assume the string is in the variable
f, then i tried splitting it on whitespace,
but i end up with
['(\\HasNoChildren)', '"."', '"INBOX.Sent', 'Items"']
so due to the space in "Sent Items" it is separated into two entries, which i
don't want.

is there another way to convert a string with quoted sub entries into a list
of strings?

thanks a lot, olli

Daniel Dittmar

Try the standard module shlex
(http://www.python.org/dev/doc/devel/lib/module-shlex.html). It might be
that the quoting rules are not exactly the ones you need, though.
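
For the string from the question, a minimal sketch of what shlex does might look like this. It assumes only the standard library; the variable name f is taken from the question, and the other names are made up for illustration. In the default POSIX mode shlex strips the quotes but also treats the backslash as an escape character, so the flag loses its backslash. With posix=False the backslash survives, but the quotes stay in the tokens and have to be stripped by hand.

```python
import shlex

# Sample line from the question (raw string keeps the backslash literal).
f = r'(\HasNoChildren) "." "INBOX.Sent Items"'

# Default POSIX rules: quotes are removed, but "\H" is read as an escaped "H".
posix_tokens = shlex.split(f)

# Non-POSIX rules: the backslash is kept, and so are the surrounding quotes.
plain_tokens = shlex.split(f, posix=False)

# Strip one pair of surrounding double quotes where present.
stripped = [t[1:-1] if len(t) >= 2 and t[0] == t[-1] == '"' else t
            for t in plain_tokens]

print(posix_tokens)
print(plain_tokens)
print(stripped)
```

Which variant fits depends on whether the backslash in the flags matters to you.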


Max M

In Twisted's protocols/imap4.py module there is a function called
parseNestedParens() that can be ripped out of the module.

I have used it for another project and put it into this attachment.


hilsen/regards Max M, Denmark

IT's Mad Science

This code was stolen from Twisted's protocols/imap4.py module

import re, string, types

class IMAP4Exception(Exception):
    def __init__(self, *args):
        Exception.__init__(self, *args)

class MismatchedNesting(IMAP4Exception):
    pass

class MismatchedQuoting(IMAP4Exception):
    pass

def wildcardToRegexp(wildcard, delim=None):
    # '*' matches anything; '%' matches anything except the delimiter
    wildcard = wildcard.replace('*', '(?:.*?)')
    if delim is None:
        wildcard = wildcard.replace('%', '(?:.*?)')
    else:
        wildcard = wildcard.replace('%', '(?:(?:[^%s])*?)' % re.escape(delim))
    return re.compile(wildcard, re.I)

def splitQuoted(s):
    """Split a string into whitespace delimited tokens

    Tokens that would otherwise be separated but are surrounded by \"
    remain as a single token. Any token that is not quoted and is
    equal to \"NIL\" is tokenized as C{None}.

    @type s: C{str}
    @param s: The string to be split

    @rtype: C{list} of C{str}
    @return: A list of the resulting tokens

    @raise MismatchedQuoting: Raised if an odd number of quotes are present
    """
    s = s.strip()
    result = []
    inQuote = inWord = start = 0
    for (i, c) in zip(range(len(s)), s):
        if c == '"' and not inQuote:
            # opening quote: the token starts after it
            inQuote = 1
            start = i + 1
        elif c == '"' and inQuote:
            # closing quote: emit the quoted token without its quotes
            inQuote = 0
            result.append(s[start:i])
            start = i + 1
        elif not inWord and not inQuote and c not in ('"' + string.whitespace):
            inWord = 1
            start = i
        elif inWord and not inQuote and c in string.whitespace:
            # end of an unquoted word
            if s[start:i] == 'NIL':
                result.append(None)
            else:
                result.append(s[start:i])
            start = i
            inWord = 0
    if inQuote:
        raise MismatchedQuoting(s)
    if inWord:
        # trailing unquoted word
        if s[start:] == 'NIL':
            result.append(None)
        else:
            result.append(s[start:])
    return result

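def splitOn(sequence, predicate, transformers):
    # Group consecutive elements with the same predicate value and pass
    # each group through the transformer registered for that value.
    result = []
    mode = predicate(sequence[0])
    tmp = [sequence[0]]
    for e in sequence[1:]:
        p = predicate(e)
        if p != mode:
            result.extend(transformers[mode](tmp))
            tmp = [e]
            mode = p
        else:
            tmp.append(e)
    result.extend(transformers[mode](tmp))
    return result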

def collapseStrings(results):
    """
    Turns a list of length-one strings and lists into a list of longer
    strings and lists. For example,

    ['a', 'b', ['c', 'd']] is returned as ['ab', ['cd']]

    @type results: C{list} of C{str} and C{list}
    @param results: The list to be collapsed

    @rtype: C{list} of C{str} and C{list}
    @return: A new list which is the collapsed form of C{results}
    """
    copy = []
    begun = None
    listsList = [isinstance(s, list) for s in results]

    # tuples mark literals; everything else is a single character
    pred = lambda e: isinstance(e, tuple)
    tran = {
        0: lambda e: splitQuoted(''.join(e)),
        1: lambda e: [''.join([i[0] for i in e])]
    }
    for (i, c, isList) in zip(range(len(results)), results, listsList):
        if isList:
            if begun is not None:
                copy.extend(splitOn(results[begun:i], pred, tran))
                begun = None
            # nested lists are collapsed recursively
            copy.append(collapseStrings(c))
        elif begun is None:
            begun = i
    if begun is not None:
        copy.extend(splitOn(results[begun:], pred, tran))
    return copy

def parseNestedParens(s, handleLiteral = 1):
    """Parse an s-exp-like string into a more useful data structure.

    @type s: C{str}
    @param s: The s-exp-like string to parse

    @rtype: C{list} of C{str} and C{list}
    @return: A list containing the tokens present in the input.

    @raise MismatchedNesting: Raised if the number or placement
    of opening or closing parenthesis is invalid.
    """
    s = s.strip()
    inQuote = 0
    contentStack = [[]]
    try:
        i = 0
        L = len(s)
        while i < L:
            c = s[i]
            if inQuote:
                if c == '\\':
                    # escaped character inside quotes: keep it literally
                    contentStack[-1].append(s[i+1])
                    i += 2
                    continue
                elif c == '"':
                    inQuote = not inQuote
                contentStack[-1].append(c)
                i += 1
            else:
                if c == '"':
                    contentStack[-1].append(c)
                    inQuote = not inQuote
                    i += 1
                elif handleLiteral and c == '{':
                    # IMAP literal: {size}\r\n followed by size characters
                    end = s.find('}', i)
                    if end == -1:
                        raise ValueError("Malformed literal")
                    literalSize = int(s[i+1:end])
                    contentStack[-1].append((s[end+3:end+3+literalSize],))
                    i = end + 3 + literalSize
                elif c == '(' or c == '[':
                    contentStack.append([])
                    i += 1
                elif c == ')' or c == ']':
                    contentStack[-2].append(contentStack.pop())
                    i += 1
                else:
                    contentStack[-1].append(c)
                    i += 1
    except IndexError:
        raise MismatchedNesting(s)
    if len(contentStack) != 1:
        raise MismatchedNesting(s)
    return collapseStrings(contentStack[0])

if __name__=='__main__':

    # raw string, so the backslashes in the flags are kept literally
    r = r'(\Noinferiors \Unmarked) "/" "INBOX"(\Unmarked) "/" "test"(\Noinferiors \Unmarked) "/" "Sent Items"(\Noinferiors \Unmarked) "/" "Calendar"(\Noinferiors \Unmarked) "/" "Checklist"(\Unmarked) "/" "Cabinet"(\Noinferiors \Marked) "/" "Trash"(\Unmarked) "/" "INBOX.Sent"(\Unmarked) "/" "Sent"'

    parsedParens = parseNestedParens(r)
    print(parsedParens)
    # every folder contributes three entries: flags, separator, name
    for i in range(0, len(parsedParens), 3):
        (flags, separator, folderName) = parsedParens[i:i+3]
        print(flags)
        print(separator)
        print(folderName)

Scott David Daniels

First break into strings, then space-split the non-strings.

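def splitup(somestring):
    # Splitting on '"' makes the pieces alternate: unquoted, quoted, ...
    gen = iter(somestring.split('"'))
    for unquoted in gen:
        for part in unquoted.split():
            yield part
        # the next piece, if any, was inside quotes
        quoted = next(gen, None)
        if quoted is not None:
            # join() over the two characters of '""' puts the quotes back
            yield quoted.join('""')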
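
A rough equivalent of the same idea can be sketched with one regular expression: a token is either a complete quoted string or a run of non-whitespace characters. This is only a sketch under two assumptions: quoted parts never contain escaped quotes, and a quote always starts a new token. The names TOKEN and split_quoted are made up for this example.

```python
import re

# A token is either a complete "quoted string" or a run of non-whitespace.
# Assumption: no escaped quotes inside quoted parts.
TOKEN = re.compile(r'"[^"]*"|\S+')

def split_quoted(text):
    """Return the tokens of text, keeping quoted parts (with quotes) whole."""
    return TOKEN.findall(text)

f = r'(\HasNoChildren) "." "INBOX.Sent Items"'
tokens = split_quoted(f)
print(tokens)
```

As with splitup(), the quotes stay part of the quoted tokens and can be stripped afterwards if they are not wanted.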

--Scott David Daniels

Paul McGuire

Oliver -

Here is a simpler approach, hopefully more readable, using pyparsing
(at http://pyparsing.sourceforge.net). I also added another test word
to your sample input line, one consisting of a lone pair of double
quotes, signifying an empty string. (Be sure to remove leading '.'s
from Python text - necessary to retain program indentation which Google
Groups otherwise collapses.)

-- Paul

..data = r"""
..(\HasNoChildren) "." "INBOX.Sent Items" ""
.."""
..from pyparsing import printables,Word,dblQuotedString,OneOrMore
..nonQuoteChars = "".join( [ c for c in printables if c not in '"'] )
..word = Word(nonQuoteChars) | dblQuotedString
..words = OneOrMore(word)
..for s in words.parseString(data):
..    print(">%s<" % s)

>(\HasNoChildren)<
>"."<
>"INBOX.Sent Items"<
>""<

But really, I'm guessing that you'd rather not have the quote
characters in there either. It's simple enough to have pyparsing
remove them when a dblQuotedString is found:

..# add a parse action to remove the double quote characters
..# one of the beauties of parse actions is that there is no need to
..# verify that the first and last characters are "'s - this function
..# is never called unless the tokens in tokenslist match the
..# required expression
..def removeDblQuotes(st,loc,tokenslist):
..    return tokenslist[0][1:-1]
..dblQuotedString.setParseAction( removeDblQuotes )
..for s in words.parseString(data):
..    print(">%s<" % s)

