split a string with quoted parts into list

Discussion in 'Python' started by oliver, Mar 10, 2005.

  1. oliver

    oliver Guest

    hi there

    i'm experimenting with imaplib and came across strings like
    (\HasNoChildren) "." "INBOX.Sent Items"
    in which the quotes are part of the string.

    now i try to convert this into a list. assume the string is in the variable
    f, then i tried
    f.split()
    but i end up with
    ['(\\HasNoChildren)', '"."', '"INBOX.Sent', 'Items"']
    so due to the space in "Sent Items" it is separated into two entries, which i
    don't want.

    is there another way to convert a string with quoted sub entries into a list
    of strings?

    thanks a lot, olli
     
    oliver, Mar 10, 2005
    #1

  2. oliver wrote:
    > i'm experimenting with imaplib and came across strings like
    > (\HasNoChildren) "." "INBOX.Sent Items"
    > in which the quotes are part of the string.
    >
    > now i try to convert this into a list. assume the string is in the variable
    > f, then i tried
    > f.split()
    > but i end up with
    > ['(\\HasNoChildren)', '"."', '"INBOX.Sent', 'Items"']
    > so due to the space in "Sent Items" it is separated into two entries, which i
    > don't want.
    >
    > is there another way to convert a string with quoted sub entries into a list
    > of strings?


    Try the standard module shlex
    (http://www.python.org/dev/doc/devel/lib/module-shlex.html). It might be
    that the quoting rules are not exactly the ones you need, though.
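A minimal sketch of the shlex suggestion, assuming Python 3 and the sample line from the question. It illustrates the caveat about quoting rules: the default POSIX mode strips the quotes but also swallows the backslash, while non-POSIX mode keeps the backslash but leaves the quotes attached. The helper strip_quotes is a hypothetical name introduced here, not part of shlex.

```python
import shlex

line = r'(\HasNoChildren) "." "INBOX.Sent Items"'

# POSIX mode (the default): quotes are removed, but a backslash outside
# quotes is treated as an escape character and is dropped.
posix_tokens = shlex.split(line)

# Non-POSIX mode: the backslash survives, but the quotes stay attached.
raw_tokens = shlex.split(line, posix=False)


def strip_quotes(token):
    """Drop one pair of surrounding double quotes, if present (hypothetical helper)."""
    if len(token) >= 2 and token[0] == token[-1] == '"':
        return token[1:-1]
    return token


tokens = [strip_quotes(t) for t in raw_tokens]

print(posix_tokens)
print(raw_tokens)
print(tokens)
```

Whether the backslash in the flag name matters depends on what is done with the tokens afterwards; if it does, the non-POSIX variant is probably the safer starting point.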

    Daniel
     
    Daniel Dittmar, Mar 10, 2005
    #2

  3. > is there another way to convert a string with quoted sub entries into a
    > list of strings?


    try the csv-module.
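A minimal sketch of the csv idea, assuming Python 3 and that fields are separated by single spaces. The reader is handed a one-element list so that the line is treated as a single row; the variable names are illustrative only.

```python
import csv

line = r'(\HasNoChildren) "." "INBOX.Sent Items"'

# Treat the space as the field delimiter and the double quote as the
# quote character; csv.reader accepts any iterable of lines.
reader = csv.reader([line], delimiter=' ', quotechar='"',
                    skipinitialspace=True)
tokens = next(reader)

print(tokens)
```

The quotes are stripped and the backslash is kept, because no escape character is configured by default. Input with runs of several spaces or nested parentheses may need extra handling.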
    --
    Regards,

    Diez B. Roggisch
     
    Diez B. Roggisch, Mar 10, 2005
    #3
  4. Max M

    Max M Guest

    oliver wrote:
    > hi there
    >
    > i'm experimenting with imaplib and came across strings like
    > (\HasNoChildren) "." "INBOX.Sent Items"
    > in which the quotes are part of the string.
    >
    > now i try to convert this into a list. assume the string is in the variable
    > f, then i tried
    > f.split()
    > but i end up with
    > ['(\\HasNoChildren)', '"."', '"INBOX.Sent', 'Items"']
    > so due to the space in "Sent Items" it is separated into two entries, which i
    > don't want.
    >
    > is there another way to convert a string with quoted sub entries into a list
    > of strings?



    In Twisted's protocols/imap4.py module there is a function called
    parseNestedParens() that can be ripped out of the module.

    I have used it for another project and put it into this attachment.

    --

    hilsen/regards Max M, Denmark

    http://www.mxm.dk/
    IT's Mad Science

    """
    This code was stolen from Twisted's protocols/imap4.py module
    """

    import re
    import string

    class IMAP4Exception(Exception):
        def __init__(self, *args):
            Exception.__init__(self, *args)

    class MismatchedNesting(IMAP4Exception):
        pass

    class MismatchedQuoting(IMAP4Exception):
        pass

    def wildcardToRegexp(wildcard, delim=None):
        wildcard = wildcard.replace('*', '(?:.*?)')
        if delim is None:
            wildcard = wildcard.replace('%', '(?:.*?)')
        else:
            wildcard = wildcard.replace('%', '(?:(?:[^%s])*?)' % re.escape(delim))
        return re.compile(wildcard, re.I)

    def splitQuoted(s):
        """Split a string into whitespace delimited tokens

        Tokens that would otherwise be separated but are surrounded by \"
        remain as a single token. Any token that is not quoted and is
        equal to \"NIL\" is tokenized as C{None}.

        @type s: C{str}
        @param s: The string to be split

        @rtype: C{list} of C{str}
        @return: A list of the resulting tokens

        @raise MismatchedQuoting: Raised if an odd number of quotes are present
        """
        s = s.strip()
        result = []
        inQuote = inWord = start = 0
        for (i, c) in enumerate(s):
            if c == '"' and not inQuote:
                inQuote = 1
                start = i + 1
            elif c == '"' and inQuote:
                inQuote = 0
                result.append(s[start:i])
                start = i + 1
            elif not inWord and not inQuote and c not in ('"' + string.whitespace):
                inWord = 1
                start = i
            elif inWord and not inQuote and c in string.whitespace:
                if s[start:i] == 'NIL':
                    result.append(None)
                else:
                    result.append(s[start:i])
                start = i
                inWord = 0
        if inQuote:
            raise MismatchedQuoting(s)
        if inWord:
            if s[start:] == 'NIL':
                result.append(None)
            else:
                result.append(s[start:])
        return result


    def splitOn(sequence, predicate, transformers):
        # group consecutive elements with the same predicate value and pass
        # each group through the transformer registered for that value
        result = []
        mode = predicate(sequence[0])
        tmp = [sequence[0]]
        for e in sequence[1:]:
            p = predicate(e)
            if p != mode:
                result.extend(transformers[mode](tmp))
                tmp = [e]
                mode = p
            else:
                tmp.append(e)
        result.extend(transformers[mode](tmp))
        return result


    def collapseStrings(results):
        """
        Turns a list of length-one strings and lists into a list of longer
        strings and lists. For example,

        ['a', 'b', ['c', 'd']] is returned as ['ab', ['cd']]

        @type results: C{list} of C{str} and C{list}
        @param results: The list to be collapsed

        @rtype: C{list} of C{str} and C{list}
        @return: A new list which is the collapsed form of C{results}
        """
        copy = []
        begun = None
        listsList = [isinstance(s, list) for s in results]

        # literals are stored as one-element tuples by parseNestedParens
        pred = lambda e: isinstance(e, tuple)
        tran = {
            0: lambda e: splitQuoted(''.join(e)),
            1: lambda e: [''.join([i[0] for i in e])]
        }
        for (i, c, isList) in zip(range(len(results)), results, listsList):
            if isList:
                if begun is not None:
                    copy.extend(splitOn(results[begun:i], pred, tran))
                    begun = None
                copy.append(collapseStrings(c))
            elif begun is None:
                begun = i
        if begun is not None:
            copy.extend(splitOn(results[begun:], pred, tran))
        return copy




    def parseNestedParens(s, handleLiteral = 1):
        """Parse an s-exp-like string into a more useful data structure.

        @type s: C{str}
        @param s: The s-exp-like string to parse

        @rtype: C{list} of C{str} and C{list}
        @return: A list containing the tokens present in the input.

        @raise MismatchedNesting: Raised if the number or placement
        of opening or closing parenthesis is invalid.
        """
        s = s.strip()
        inQuote = 0
        contentStack = [[]]
        try:
            i = 0
            L = len(s)
            while i < L:
                c = s[i]
                if inQuote:
                    if c == '\\':
                        # backslash inside quotes escapes the next character
                        contentStack[-1].append(s[i+1])
                        i += 2
                        continue
                    elif c == '"':
                        inQuote = not inQuote
                    contentStack[-1].append(c)
                    i += 1
                else:
                    if c == '"':
                        contentStack[-1].append(c)
                        inQuote = not inQuote
                        i += 1
                    elif handleLiteral and c == '{':
                        end = s.find('}', i)
                        if end == -1:
                            raise ValueError("Malformed literal")
                        literalSize = int(s[i+1:end])
                        contentStack[-1].append((s[end+3:end+3+literalSize],))
                        i = end + 3 + literalSize
                    elif c == '(' or c == '[':
                        contentStack.append([])
                        i += 1
                    elif c == ')' or c == ']':
                        contentStack[-2].append(contentStack.pop())
                        i += 1
                    else:
                        contentStack[-1].append(c)
                        i += 1
        except IndexError:
            raise MismatchedNesting(s)
        if len(contentStack) != 1:
            raise MismatchedNesting(s)
        return collapseStrings(contentStack[0])


    if __name__=='__main__':

        # raw string, so that \N, \U and \M are kept as literal backslashes
        r = r'(\Noinferiors \Unmarked) "/" "INBOX"(\Unmarked) "/" "test"(\Noinferiors \Unmarked) "/" "Sent Items"(\Noinferiors \Unmarked) "/" "Calendar"(\Noinferiors \Unmarked) "/" "Checklist"(\Unmarked) "/" "Cabinet"(\Noinferiors \Marked) "/" "Trash"(\Unmarked) "/" "INBOX.Sent"(\Unmarked) "/" "Sent"'

        parsedParens = parseNestedParens(r)
        print(parsedParens)
        for i in range(0, len(parsedParens), 3):
            (flags, separator, folderName) = parsedParens[i:i+3]
            print(flags)
            print(separator)
            print(folderName)
     
    Max M, Mar 10, 2005
    #4
  5. oliver wrote:
    > i'm experimenting with imaplib and came across strings like
    > (\HasNoChildren) "." "INBOX.Sent Items"
    > in which the quotes are part of the string.
    >
    > now i try to convert this into a list. assume the string is in the variable
    > f, then i tried
    > f.split()
    > but i end up with
    > ['(\\HasNoChildren)', '"."', '"INBOX.Sent', 'Items"']
    > so due to the space in "Sent Items" it is separated into two entries, which i
    > don't want.


    > is there another way to convert a string with quoted sub entries into a list
    > of strings?


    First split on the quote characters, then whitespace-split the unquoted pieces.

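    def splitup(somestring):
        # pieces at even positions lie outside quotes, odd positions inside
        gen = iter(somestring.split('"'))
        for unquoted in gen:
            for part in unquoted.split():
                yield part
            quoted = next(gen, None)
            if quoted is not None:
                # re-wrap the quoted piece in its double quotes
                yield quoted.join('""')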
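The same idea can also be sketched with a regular expression instead of a generator. This is a different technique from the one in the post, assuming Python 3 and that quoted parts contain no escaped quotes; TOKEN and split_quoted are hypothetical names.

```python
import re

# Either a double-quoted run (possibly empty) or a run of non-whitespace.
TOKEN = re.compile(r'"[^"]*"|\S+')


def split_quoted(text):
    """Return whitespace-separated tokens, keeping quoted parts whole."""
    return TOKEN.findall(text)


print(split_quoted(r'(\HasNoChildren) "." "INBOX.Sent Items"'))
```

The quotes stay attached to the quoted tokens here. A quote that starts in the middle of a word, as in foo"bar baz", is not handled and would be split at the space.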

    --Scott David Daniels
     
    Scott David Daniels, Mar 10, 2005
    #5
  6. Paul McGuire

    Paul McGuire Guest

    Oliver -

    Here is a simpler and hopefully more readable approach using pyparsing
    (at http://pyparsing.sourceforge.net). I also added another test word
    to your sample input line, one consisting of a lone pair of double
    quotes, signifying an empty string.

    -- Paul


    data = r"""
    (\HasNoChildren) "." "INBOX.Sent Items" ""
    """

    from pyparsing import printables, Word, dblQuotedString, OneOrMore

    # a word is any run of printable characters other than the double quote
    nonQuoteChars = "".join([c for c in printables if c != '"'])
    word = Word(nonQuoteChars) | dblQuotedString

    words = OneOrMore(word)

    for s in words.parseString(data):
        print(">%s<" % s)

    Gives:

    >(\HasNoChildren)<
    >"."<
    >"INBOX.Sent Items"<
    >""<


    But really, I'm guessing that you'd rather not have the quote
    characters in there either. It's simple enough to have pyparsing
    remove them when a dblQuotedString is found:

    # add a parse action to remove the double quote characters
    # one of the beauties of parse actions is that there is no need to
    # verify that the first and last characters are "'s - this function
    # is never called unless the tokens in tokenslist match the
    # required expression
    def removeDblQuotes(st, loc, tokenslist):
        return tokenslist[0][1:-1]
    dblQuotedString.setParseAction(removeDblQuotes)

    for s in words.parseString(data):
        print(">%s<" % s)

    Gives:
    >(\HasNoChildren)<
    >.<
    >INBOX.Sent Items<
    ><
     
    Paul McGuire, Mar 11, 2005
    #6
