Doing both regex match and assignment within a If loop?

Discussion in 'Python' started by Victor Hooi, Mar 29, 2013.

  1. Victor Hooi

    Victor Hooi Guest

    Hi,

    I have a log line that I need to test against multiple regexes. E.g.:

    import re

    expression1 = re.compile(r'....')
    expression2 = re.compile(r'....')

    with open('log.txt') as f:
        for line in f:
            if expression1.match(line):
                # Do something - extract fields from line.
            elif expression2.match(line):
                # Do something else - extract fields from line.
            else:
                # Oh noes! Raise exception.

    However, in the "Do something" section - I need access to the match object itself, so that I can strip out certain fields from the line.

    Is it possible to somehow test for a match, as well as do assignment of the re match object to a variable?

    if expression1.match(line) = results:
        results.groupsdict()...

    Obviously the above won't work - however, is there a Pythonic way to tackle this?

    What I'm trying to avoid is this:

    if expression1.match(line):
        results = expression1.match(line)

    which I assume would call the regex match against the line twice - and when I'm dealing with a huge amount of log lines, slow things down.

    Cheers,
    Victor
    Victor Hooi, Mar 29, 2013
    #1

  2. Chris Rebert

    Chris Rebert Guest

    On Thu, Mar 28, 2013 at 9:00 PM, Victor Hooi <> wrote:
    > Hi,
    >
    > I have a log line that I need to test against multiple regexes. E.g.:
    >
    > import re
    >
    > expression1 = re.compile(r'....')
    > expression2 = re.compile(r'....')
    >
    > with open('log.txt') as f:
    >     for line in f:
    >         if expression1.match(line):
    >             # Do something - extract fields from line.
    >         elif expression2.match(line):
    >             # Do something else - extract fields from line.
    >         else:
    >             # Oh noes! Raise exception.
    >
    > However, in the "Do something" section - I need access to the match object itself, so that I can strip out certain fields from the line.
    >
    > Is it possible to somehow test for a match, as well as do assignment of the re match object to a variable?
    >
    > if expression1.match(line) = results:
    >     results.groupsdict()...


    AFAIK, not without hacks and/or being unidiomatic.
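
A note for readers on newer interpreters: since Python 3.8, assignment expressions (the `:=` operator, PEP 572) allow testing and binding in a single step, which is essentially what the question asks for. This was not available when the thread was written. A minimal sketch, assuming two made-up patterns (`LOGIN_RE`, `ERROR_RE`) and a hypothetical `parse_line` helper, none of which come from the original post:

```python
import re

# Hypothetical patterns, for illustration only.
LOGIN_RE = re.compile(r'LOGIN user=(?P<user>\w+)')
ERROR_RE = re.compile(r'ERROR code=(?P<code>\d+)')


def parse_line(line):
    """Return a (kind, fields) pair for the first pattern that matches."""
    # Requires Python 3.8+: ":=" binds the match object and tests it at once,
    # so each regex is run only one time per line.
    if (m := LOGIN_RE.match(line)):
        return ('login', m.groupdict())
    elif (m := ERROR_RE.match(line)):
        return ('error', m.groupdict())
    raise ValueError('unrecognised log line: %r' % line)
```

On older interpreters this is a syntax error, so the assign-then-test forms discussed in the rest of the thread still apply there.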

    > Obviously the above won't work - however, is there a Pythonic way to tackle this?
    >
    > What I'm trying to avoid is this:
    >
    > if expression1.match(line):
    >     results = expression1.match(line)
    >
    > which I assume would call the regex match against the line twice - and when I'm dealing with a huge amount of log lines, slow things down.


    def process(line):
        match = expr1.match(line)
        if match:
            # ...extract fields...
            return something
        match = expr2.match(line)
        if match:
            # ...extract fields...
            return something
        # etc...
        raise SomeError()  # Oh noes!

    with open('log.txt') as f:
        for line in f:
            results = process(line)


    If you choose to further move the extractor snippets into their own
    functions, then you can do:


    # these could be lambdas if they're simple enough
    def case1(match):
        pass  # ...
    def case2(match):
        pass  # ...
    # etc...

    REGEX_EXTRACTOR_PAIRS = [
        (re.compile(r'....'), case1),
        (re.compile(r'....'), case2),
        # etc...
    ]

    def process(line):
        for regex, extractor in REGEX_EXTRACTOR_PAIRS:
            match = regex.match(line)
            if match:
                return extractor(match)
        raise SomeError()

    This second option is likely somewhat less performant, but it
    definitely saves on repetition.
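
To make the table-driven version concrete, here is a runnable sketch. The two patterns, the extractor names (`extract_login`, `extract_error`) and the `UnmatchedLine` exception are invented for illustration; the thread only shows the skeleton.

```python
import re


class UnmatchedLine(ValueError):
    """Raised when no pattern in the table matches (hypothetical name)."""


def extract_login(match):
    # Pull the named field out of the match object.
    return {'kind': 'login', 'user': match.group('user')}


def extract_error(match):
    return {'kind': 'error', 'code': int(match.group('code'))}


# Order matters: the first pattern that matches wins.
REGEX_EXTRACTOR_PAIRS = [
    (re.compile(r'LOGIN user=(?P<user>\w+)'), extract_login),
    (re.compile(r'ERROR code=(?P<code>\d+)'), extract_error),
]


def process(line):
    for regex, extractor in REGEX_EXTRACTOR_PAIRS:
        match = regex.match(line)  # each regex runs once per line
        if match:
            return extractor(match)
    raise UnmatchedLine(line)
```

Adding a new log format then means adding one extractor and one table row, without touching `process`.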

    Cheers,
    Chris
    Chris Rebert, Mar 29, 2013
    #2

  3. On Thu, 28 Mar 2013 21:00:44 -0700, Victor Hooi wrote:


    > Is it possible to somehow test for a match, as well as do assignment of
    > the re match object to a variable?



    mo = expression.match(line)
    if mo:
        ...


    Many problems become trivial when we stop trying to fit everything into a
    single line :)


    > if expression1.match(line) = results:
    > results.groupsdict()...
    >
    > Obviously the above won't work - however, is there a Pythonic way to
    > tackle this?


    Yes. Stop trying to fit everything into a single line :)

    I would approach the problem like this:


    LOOKUP_TABLE = {expression1: do_something,
                    expression2: do_something_else,
                    expression3: function3,
                    expression4: function4,  # etc.
                    }

    with open('log.txt') as f:
        for line in f:
            for expr, func in LOOKUP_TABLE.items():
                mo = expr.match(line)
                if mo:
                    func(line, mo)
                    break
            else:
                # If we get here, we never reached the break.
                raise SomeException


    If you don't like having that many top level functions, you could make
    them methods of a class.
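
The `for ... else` idiom used above can be sketched in runnable form as follows. The handler names, the patterns and the `NoMatch` exception are assumptions made up for this example. The sketch also swaps the dict for a list of pairs: dictionaries only guarantee insertion order from Python 3.7 onwards, so a list is the safer choice when the order in which patterns are tried matters.

```python
import re


class NoMatch(Exception):
    """Hypothetical exception for lines that no pattern recognises."""


def handle_login(line, mo):
    return 'login:' + mo.group(1)


def handle_logout(line, mo):
    return 'logout:' + mo.group(1)


# A list of (pattern, handler) pairs keeps the trial order explicit.
LOOKUP_TABLE = [
    (re.compile(r'login (\w+)'), handle_login),
    (re.compile(r'logout (\w+)'), handle_logout),
]


def dispatch_all(lines):
    results = []
    for line in lines:
        for expr, func in LOOKUP_TABLE:
            mo = expr.match(line)
            if mo:
                results.append(func(line, mo))
                break
        else:
            # The else clause of a for loop runs only if the loop
            # finished without hitting break, i.e. nothing matched.
            raise NoMatch(line)
    return results
```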


    If you only have two or three expressions to test, and the body of each
    if clause is small, it's probably too much effort to write functions for
    each one. In that case, I'd stick to the slightly more verbose form:

    with open('log.txt') as f:
        for line in f:
            mo = expression1.match(line)
            if mo:
                do_this()
                do_that()
                continue
            mo = expression2.match(line)
            if mo:
                do_something_else()
                continue
            mo = expression3.match(line)
            if mo:
                fe()
                fi()
                fo()
                fum()
            else:
                raise SomeException





    > What I'm trying to avoid is this:
    >
    > if expression1.match(line):
    > results = expression1.match(line)
    >
    > which I assume would call the regex match against the line twice


    Correct.



    --
    Steven
    Steven D'Aprano, Mar 29, 2013
    #3
  4. Peter Otten

    Peter Otten Guest

    Victor Hooi wrote:

    > Hi,
    >
    > I have a log line that I need to test against multiple regexes. E.g.:
    >
    > import re
    >
    > expression1 = re.compile(r'....')
    > expression2 = re.compile(r'....')
    >
    > with open('log.txt') as f:
    >     for line in f:
    >         if expression1.match(line):
    >             # Do something - extract fields from line.
    >         elif expression2.match(line):
    >             # Do something else - extract fields from line.
    >         else:
    >             # Oh noes! Raise exception.
    >
    > However, in the "Do something" section - I need access to the match object
    > itself, so that I can strip out certain fields from the line.
    >
    > Is it possible to somehow test for a match, as well as do assignment of
    > the re match object to a variable?
    >
    > if expression1.match(line) = results:
    >     results.groupsdict()...
    >
    > Obviously the above won't work - however, is there a Pythonic way to
    > tackle this?
    >
    > What I'm trying to avoid is this:
    >
    > if expression1.match(line):
    >     results = expression1.match(line)
    >
    > which I assume would call the regex match against the line twice - and
    > when I'm dealing with a huge amount of log lines, slow things down.


    (1)
    for line in f:
        match = expression1.match(line)
        if match:
            # ...
            continue
        match = expression2.match(line)
        if match:
            # ...
            continue
        raise NothingMatches

    (2)
    import re

    class Matcher:
        def __call__(self, expr, line):
            result = self.match = expr.match(line)
            return result
        def __getattr__(self, name):
            return getattr(self.match, name)

    match = Matcher()

    for line in f:
        if match(expression1, line):
            print(match.groupdict())
        elif match(expression2, line):
            print(match.group(1))
        else:
            raise NothingMatches
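
A runnable variant of approach (2), offered as a sketch rather than as the definitive form. The patterns and the `describe` function are invented for the example. One small change from the class above: a class-level `match = None` default, so that accessing an attribute before the first call raises a plain `AttributeError` instead of recursing through `__getattr__`.

```python
import re


class Matcher:
    # Class-level default: without it, touching an attribute before the
    # first call would make __getattr__ look up self.match, which would
    # call __getattr__ again, and so on.
    match = None

    def __call__(self, expr, line):
        # Remember the most recent match object and return it for the test.
        self.match = expr.match(line)
        return self.match

    def __getattr__(self, name):
        # Only reached for names not found on the Matcher itself,
        # so group(), groupdict() etc. are forwarded to the match object.
        return getattr(self.match, name)


# Hypothetical patterns, for illustration only.
KEYVAL_RE = re.compile(r'(?P<key>\w+)=(?P<value>\w+)')
NUMBER_RE = re.compile(r'(\d+)')


def describe(line):
    match = Matcher()
    if match(KEYVAL_RE, line):
        return match.groupdict()
    elif match(NUMBER_RE, line):
        return match.group(1)
    raise ValueError('nothing matches: %r' % line)
```

The trade-off is hidden state: the `Matcher` instance carries the last result around, which reads nicely in an `if`/`elif` chain but is less explicit than a local variable.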
    Peter Otten, Mar 29, 2013
    #4
  5. Victor Hooi <> writes:

    > expression1 = re.compile(r'....')
    > expression2 = re.compile(r'....')

    [...]

    Just a quick remark: regular expressions are pretty powerful at
    representing alternatives. You could just stick everything inside a
    single re, as in '...|...'

    Then use the returned match to check which alternative was recognized
    (make sure you have at least one group in each alternative).

    > Is it possible to somehow test for a match, as well as do assignment
    > of the re match object to a variable?


    Yes, use '...(...)...' and MatchObject.group(). See the other messages.

    -- Alain.
    Alain Ketterlin, Mar 29, 2013
    #5
  6. On Friday, 29 March 2013, Alain Ketterlin wrote:

    > Victor Hooi <> writes:
    >
    > > expression1 = re.compile(r'....')
    > > expression2 = re.compile(r'....')

    > [...]
    >
    > Just a quick remark: regular expressions are pretty powerful at
    > representing alternatives. You could just stick everything inside a
    > single re, as in '...|...'
    >
    > Then use the returned match to check which alternative was recognized
    > (make sure you have at least one group in each alternative).

    Yes, and for extra ease/clarity you can name these alternatives (
    '(?P<name>pattern)'). Then you can do

    if m.group('case1'):
        ...
    elif m.group('case2'):
        ...
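
A minimal sketch of the single-regex approach with named alternatives. The pattern and the `classify` helper are assumptions for illustration. Note that the sketch tests `is not None` rather than truthiness: a group that matched the empty string is falsy, while a group that did not take part in the match is `None`.

```python
import re

# One compiled pattern; each top-level alternative gets its own name,
# and nested named groups capture the fields inside it.
COMBINED_RE = re.compile(
    r'(?P<login>LOGIN user=(?P<user>\w+))'
    r'|(?P<error>ERROR code=(?P<code>\d+))'
)


def classify(line):
    m = COMBINED_RE.match(line)
    if m is None:
        raise ValueError('nothing matches: %r' % line)
    # Groups belonging to the alternative that did not match are None.
    if m.group('login') is not None:
        return ('login', m.group('user'))
    return ('error', int(m.group('code')))
```

This runs the regex engine once per line, at the cost of one larger pattern that is harder to read and maintain than several small ones.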

    --
    Arnaud
    Arnaud Delobelle, Mar 29, 2013
    #6
  7. Neil Cerutti

    Neil Cerutti Guest

    On 2013-03-29, Alain Ketterlin <-strasbg.fr> wrote:
    > Victor Hooi <> writes:
    >
    >> expression1 = re.compile(r'....')
    >> expression2 = re.compile(r'....')

    > [...]
    >
    > Just a quick remark: regular expressions are pretty powerful at
    > representing alternatives. You could just stick everything
    > inside a single re, as in '...|...'
    >
    > Then use the returned match to check which alternative was
    > recognized (make sure you have at least one group in each
    > alternative).


    Yes, but in a Python program it's more straightforward to program
    in Python. ;)

    But this is from a grade A regex avoider, so take it with a small
    chunk of sodium.

    >> Is it possible to somehow test for a match, as well as do assignment
    >> of the re match object to a variable?


    One way to attack this problem that's not yet been explicitly
    mentioned is to match using a generator function:

    def match_each(s, re_seq):
        for r in re_seq:
            yield r.match(s)

    And later something like:

    for match in match_each(s, (expression1, expression2, expression3)):
        if match:
            print(match.groups())  # etc...
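
As written, the loop above visits every regex and prints every match. If only the first match is wanted, the generator can be combined with `next()`. The following is a sketch; `first_match` is a hypothetical helper name, not something from the post.

```python
import re


def match_each(s, re_seq):
    # Lazily yield the result of each regex: a match object or None.
    for r in re_seq:
        yield r.match(s)


def first_match(s, re_seq):
    # next() stops pulling from the generator at the first truthy result,
    # so the remaining regexes are never run against the string.
    return next((m for m in match_each(s, re_seq) if m), None)
```

A caller would then test the result once, for example `m = first_match(line, patterns)` followed by `if m is not None:`. What is lost compared with the lookup-table versions is knowing which pattern matched, unless the match object's `re` attribute is inspected.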

    --
    Neil Cerutti
    Neil Cerutti, Mar 29, 2013
    #7
  8. On 03/29/2013 04:27 AM, Peter Otten wrote:
    > (2)
    > import re
    >
    > class Matcher:
    >     def __call__(self, expr, line):
    >         result = self.match = expr.match(line)
    >         return result
    >     def __getattr__(self, name):
    >         return getattr(self.match, name)



    Perhaps it's a little simpler to do this?

    >         self.match = expr.match(line)
    >         return self.match



    -m


    --
    Lark's Tongue Guide to Python: http://lightbird.net/larks/

    Frisbeetarianism is the belief that when you die, your soul goes up on
    the roof and gets stuck. George Carlin
    Mitya Sirenef, Mar 29, 2013
    #8