Discussion in 'Python' started by hubritic, May 19, 2009.

  1. hubritic

    I want to parse a log that has entries like this:

    [2009-03-17 07:28:05.545476 -0500] rprt s=d2bpr80d6 m=2 mod=mail
    cmd=msg module=access rule=x_dynamic_ip action=discard attachments=0
    size=4363 guid=291f0f108fd3a6e73a11f96f4fb9e4cd hdr_mid=
    qid=n2HCS4ks025832 subject="I want to interview you" duration=0.236

    The keywords will not always be the same, and different log levels
    produce a different mix of keywords.

    This is good enough to get the majority of cases where there is a
    keyword, a "=" and then a value with no spaces:

    Group(Word(alphas + "+_-.").setResultsName("keyword") +
          Suppress(Literal("=")) + Optional(Word(printables)))

    Sometimes there is a subject, which is a quoted string. That is easy
    enough to get with this:
    dblQuotedString

    My problem is combining them into one expression. I wind up either
    with just the subject, or with the keywords and their values, one of
    which is:

    subject, '"I'

    which is clearly not what I want.

    Do I need to scan each line twice, first looking for quotes?
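
To make the problem described above concrete, here is a minimal sketch that applies the unquoted-only expression to a trimmed version of the sample entry. The variable names `pair`, `sample` and `parsed` are illustrative, and the timestamp prefix is left out on purpose. It assumes pyparsing is installed and uses the camelCase method names from the post.

```python
from pyparsing import Group, OneOrMore, Optional, Suppress, Word, alphas, printables

# The unquoted-only expression from the post: keyword, "=", optional value.
pair = Group(
    Word(alphas + "+_-.").setResultsName("keyword")
    + Suppress("=")
    + Optional(Word(printables))
)

# Trimmed sample: only a few key=value pairs, no timestamp prefix.
sample = 'size=4363 subject="I want to interview you" duration=0.236'

# Word(printables) stops at the first space, so the subject is cut off.
parsed = OneOrMore(pair).parseString(sample).asList()
print(parsed)  # [['size', '4363'], ['subject', '"I']]
```

Because the next word, `want`, is not followed by `=`, the repetition stops there and the `duration` pair is never reached.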

    hubritic, May 19, 2009
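  2. Use MatchFirst (the | operator) and list dblQuotedString first, so
    the quoted form is tried before the unquoted one.

    I have also split the expression up to make it more readable: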

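    from pyparsing import (Group, Optional, Suppress, Word, alphas,
                           dblQuotedString, printables)

    kw = Word(alphas + "+_-.").setResultsName("keyword")
    eq = Suppress("=")
    # Try a quoted string first; otherwise accept an optional unquoted token.
    value = dblQuotedString | Optional(Word(printables))

    pattern = Group(kw + eq + value)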
    Piet van Oostrum, May 27, 2009
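
As a rough check of the suggestion above, the following sketch runs the combined pattern over a trimmed version of the sample entry. The `removeQuotes` parse action, the `quoted` helper name and the trimmed `sample` string are additions for illustration and are not part of the original answer. The timestamp prefix is again left out.

```python
from pyparsing import (Group, OneOrMore, Optional, Suppress, Word, alphas,
                       dblQuotedString, printables, removeQuotes)

kw = Word(alphas + "+_-.").setResultsName("keyword")
eq = Suppress("=")

# copy() keeps the parse action off the shared dblQuotedString object;
# removeQuotes strips the surrounding double quotes from the match.
quoted = dblQuotedString.copy().setParseAction(removeQuotes)

# MatchFirst: the quoted form is tried before the unquoted fallback.
value = quoted | Optional(Word(printables))
pattern = Group(kw + eq + value)

sample = 'size=4363 subject="I want to interview you" duration=0.236'
parsed = OneOrMore(pattern).parseString(sample).asList()
print(parsed)
# [['size', '4363'], ['subject', 'I want to interview you'], ['duration', '0.236']]
```

One caveat: pyparsing skips whitespace between tokens by default, so an empty value such as `hdr_mid=` followed by another pair would absorb the next token (`qid=...`) as its value. This sketch does not handle that case.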
