how to handle repetitive regexp match checks

Discussion in 'Python' started by Matt Wette, Mar 18, 2005.

  1. Matt Wette

    Matt Wette Guest

    Over the last few years I have converted from Perl and Scheme to
    Python. There is one task that I do often that is really slick in Perl
    but escapes me in Python. I read in a text line from a file and check
    it against several regular expressions and do something once I find a match.
    For example, in Perl ...

    if ($line =~ /struct {/) {
        do something
    } elsif ($line =~ /typedef struct {/) {
        do something else
    } elsif ($line =~ /something else/) {
    } ...

    I am having difficulty doing this cleanly in Python. Can anyone help?

    rx1 = re.compile(r'struct {')
    rx2 = re.compile(r'typedef struct {')
    rx3 = re.compile(r'something else')

    m = rx1.match(line)
    if m:
        do something
    else:
        m = rx2.match(line)
        if m:
            do something
        else:
            m = rx3.match(line)
            if m:
                do something
            else:
                error

    (In Scheme I was able to do this cleanly with macros.)

    Matt
     
    Matt Wette, Mar 18, 2005
    #1

  2. Matt Wette writes:

    > Over the last few years I have converted from Perl and Scheme to
    > Python. There is one task that I do often that is really slick in Perl
    > but escapes me in Python. I read in a text line from a file and check
    > it against several regular expressions and do something once I find a match.
    > For example, in Perl ...
    >
    > if ($line =~ /struct {/) {
    >     do something
    > } elsif ($line =~ /typedef struct {/) {
    >     do something else
    > } elsif ($line =~ /something else/) {
    > } ...
    >
    > I am having difficulty doing this cleanly in Python. Can anyone help?
    >
    > rx1 = re.compile(r'struct {')
    > rx2 = re.compile(r'typedef struct {')
    > rx3 = re.compile(r'something else')
    >
    > m = rx1.match(line)
    > if m:
    >     do something
    > else:
    >     m = rx2.match(line)
    >     if m:
    >         do something
    >     else:
    >         m = rx3.match(line)
    >         if m:
    >             do something
    >         else:
    >             error


    I usually define a class like this:

    class Matcher:
        def __init__(self, text):
            self.m = None
            self.text = text
        def match(self, pat):
            # remember the match object so it can be used after the test
            self.m = pat.match(self.text)
            return self.m
        def __getitem__(self, name):
            return self.m.group(name)

    Then use it like this:

    for line in fo:
        m = Matcher(line)
        if m.match(rx1):
            pass  # do something
        elif m.match(rx2):
            pass  # do something
        else:
            pass  # error
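
    A minimal runnable sketch of the idea above. The two patterns and the
    classify() helper are hypothetical, invented only to show that the
    stored match object stays available (here through m['name']) inside a
    flat if/elif chain:

```python
import re

class Matcher:
    """Remember the most recent match so it can be tested, then used."""
    def __init__(self, text):
        self.m = None
        self.text = text
    def match(self, pat):
        self.m = pat.match(self.text)
        return self.m
    def __getitem__(self, name):
        return self.m.group(name)

# Hypothetical patterns with a named group, for illustration only.
rx_struct = re.compile(r'struct (?P<name>\w+) \{')
rx_typedef = re.compile(r'typedef struct (?P<name>\w+) \{')

def classify(line):
    m = Matcher(line)
    if m.match(rx_struct):
        return ('struct', m['name'])
    elif m.match(rx_typedef):
        return ('typedef', m['name'])
    else:
        return ('error', None)
```

    Because the wrapper keeps the last match, no nested else blocks are
    needed to hold on to it.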

    --
    |>|\/|<
    David M. Cooke
    cookedm(at)physics(dot)mcmaster(dot)ca
     
    David M. Cooke, Mar 18, 2005
    #2

  3. Duncan Booth

    Duncan Booth Guest

    Matt Wette wrote:

    > I am having difficulty doing this cleanly in Python. Can anyone help?
    >
    > rx1 = re.compile(r'struct {')
    > rx2 = re.compile(r'typedef struct {')
    > rx3 = re.compile(r'something else')
    >
    > m = rx1.match(line)
    > if m:
    >     do something
    > else:
    >     m = rx2.match(line)
    >     if m:
    >         do something
    >     else:
    >         m = rx3.match(line)
    >         if m:
    >             do something
    >         else:
    >             error
    >
    > (In Scheme I was able to do this cleanly with macros.)


    My preferred way to do this is something like this:

    import re

    RX = re.compile(r'''
        (?P<rx1> struct\s{ )|
        (?P<rx2> typedef\sstruct\s{ )|
        (?P<rx3> something\selse )
        ''', re.VERBOSE)

    class Matcher:
        # each method is named after the regex group it handles
        def rx1(self, m):
            print("rx1 matched", m.group(0))

        def rx2(self, m):
            print("rx2 matched", m.group(0))

        def rx3(self, m):
            print("rx3 matched", m.group(0))

        def processLine(self, line):
            m = RX.match(line)
            if m:
                # m.lastgroup is the name of the alternative that matched
                getattr(self, m.lastgroup)(m)
            else:
                print("error", repr(line), "did not match")

    matcher = Matcher()
    matcher.processLine('struct { something')
    matcher.processLine('typedef struct { something')
    matcher.processLine('something else')
    matcher.processLine('will not match')
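
    The dispatch above relies on the match object's lastgroup attribute.
    A small sketch isolating just that step; the which() helper is a
    hypothetical name, and the braces are escaped here only for clarity:

```python
import re

# One combined pattern; each alternative is wrapped in a named group.
RX = re.compile(r'''
    (?P<rx1> struct\s\{ )|
    (?P<rx2> typedef\sstruct\s\{ )|
    (?P<rx3> something\selse )
    ''', re.VERBOSE)

def which(line):
    """Return the name of the alternative that matched, or None."""
    m = RX.match(line)
    return m.lastgroup if m else None
```

    Only one of the named groups takes part in any given match, so
    lastgroup identifies which alternative fired.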
     
    Duncan Booth, Mar 18, 2005
    #3
  4. GiddyJP

    GiddyJP Guest

    Matt Wette wrote:
    >
    > Over the last few years I have converted from Perl and Scheme to
    > Python. There is one task that I do often that is really slick in Perl
    > but escapes me in Python. I read in a text line from a file and check
    > it against several regular expressions and do something once I find a
    > match.
    > For example, in Perl ...
    >
    > if ($line =~ /struct {/) {
    >     do something
    > } elsif ($line =~ /typedef struct {/) {
    >     do something else
    > } elsif ($line =~ /something else/) {
    > } ...
    >
    > I am having difficulty doing this cleanly in python. Can anyone help?


    I had a similar situation along with the requirement that the text to be
    scanned was being read in chunks. After looking at the Python re module
    and various other regex packages, I eventually wrote my own multiple
    pattern scanning matcher.

    However, since then I've discovered that the sre Python module has a
    Scanner class that does something similar.
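
    A hedged sketch of that Scanner class. In current Python it is
    reachable as re.Scanner, but it is undocumented, so treat its behaviour
    as an assumption that may change. The token labels and the scan()
    wrapper are hypothetical:

```python
import re

# Each entry pairs a pattern with an action; the action receives the
# scanner and the matched text. An action of None discards the match.
scanner = re.Scanner([
    (r'typedef\s+struct\s*\{', lambda s, text: ('typedef', text)),
    (r'struct\s*\{',           lambda s, text: ('struct', text)),
    (r'something\s+else',      lambda s, text: ('other', text)),
    (r'\s+', None),
])

def scan(text):
    """Return (tokens, unmatched_remainder) for the given text."""
    return scanner.scan(text)
```

    Scanning stops at the first position where no pattern matches, and
    the unconsumed text comes back as the second element of the result.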

    Anyway, you can see my code at:
    http://users.cs.cf.ac.uk/J.P.Giddy/python/Trespass/2.0.0/

    Using it, your code could look like:

    # do this once
    import Trespass
    pattern = Trespass.Pattern()
    pattern.addRegExp(r'struct {', 1)
    pattern.addRegExp(r'typedef struct {', 2)
    pattern.addRegExp(r'something else', 3)

    # do this for each line
    match = pattern.match(line)
    if match:
        value = match.value()
        if value == 1:
            # struct
            pass  # do something
        elif value == 2:
            # typedef
            pass  # do something
        elif value == 3:
            # something else
            pass  # do something
    else:
        pass  # error
     
    GiddyJP, Mar 18, 2005
    #4
  5. GiddyJP wrote:
    >
    > # do this once
    > import Trespass
    > pattern = Trespass.Pattern()
    > pattern.addRegExp(r'struct {', 1)
    > pattern.addRegExp(r'typedef struct {', 2)
    > pattern.addRegExp(r'something else', 3)


    Minor correction... in this module { always needs to be escaped if not
    indicating a bounded repeat:
    pattern.addRegExp(r'struct \{', 1)
    pattern.addRegExp(r'typedef struct \{', 2)
    pattern.addRegExp(r'something else', 3)
     
    Jonathan Giddy, Mar 18, 2005
    #5
  6. Paul McGuire

    Paul McGuire Guest

    Matt -

    Pyparsing may be of interest to you. One of its core features is the
    ability to associate an action method with a parsing pattern. During
    parsing, the action is called with the original source string, the
    location within the string of the match, and the matched tokens.

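    Your code would look something like:

    from pyparsing import Literal

    lbrace = Literal('{')
    typedef = Literal('typedef')
    struct = Literal('struct')
    rx1 = struct + lbrace
    rx2 = typedef + struct + lbrace
    rx3 = Literal('something') + Literal('else')

    def rx1Action(strg, loc, tokens):
        pass  # put stuff to do here

    # rx2Action and rx3Action are written the same way
    def rx2Action(strg, loc, tokens):
        pass

    def rx3Action(strg, loc, tokens):
        pass

    rx1.setParseAction( rx1Action )
    rx2.setParseAction( rx2Action )
    rx3.setParseAction( rx3Action )

    # read code into Python string variable 'code'
    patterns = (rx1 | rx2 | rx3)
    # scanString returns a generator; iterate over it so the actions run
    for tokens, start, end in patterns.scanString( code ):
        pass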

    (I've broken up some of your literals, which allows for intervening
    variable whitespace - that is, Literal('struct') + Literal('{') will
    accommodate one, two, or more blanks (even line breaks) between the
    'struct' and the '{'.)

    Get pyparsing at http://pyparsing.sourceforge.net.

    -- Paul
     
    Paul McGuire, Mar 18, 2005
    #6
  7. Jeff Shannon

    Jeff Shannon Guest

    Matt Wette wrote:

    >
    > Over the last few years I have converted from Perl and Scheme to
    > Python. There is one task that I do often that is really slick in Perl
    > but escapes me in Python. I read in a text line from a file and check
    > it against several regular expressions and do something once I find a
    > match.
    > For example, in Perl ...
    >
    > if ($line =~ /struct {/) {
    >     do something
    > } elsif ($line =~ /typedef struct {/) {
    >     do something else
    > } elsif ($line =~ /something else/) {
    > } ...
    >
    > I am having difficulty doing this cleanly in Python. Can anyone help?
    >
    > rx1 = re.compile(r'struct {')
    > rx2 = re.compile(r'typedef struct {')
    > rx3 = re.compile(r'something else')
    >
    > m = rx1.match(line)
    > if m:
    >     do something
    > else:
    >     m = rx2.match(line)
    >     if m:
    >         do something
    >     else:
    >         m = rx3.match(line)
    >         if m:
    >             do something
    >         else:
    >             error


    If you don't need the match object as part of "do something", you
    could do a fairly literal translation of the Perl:

    if rx1.match(line):
        pass  # do something
    elif rx2.match(line):
        pass  # do something else
    elif rx3.match(line):
        pass  # do other thing
    else:
        raise ValueError("...")
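
    Two hedged notes on this translation. Perl's =~ finds a pattern
    anywhere in the line, while a compiled pattern's match() only matches
    at the start, so search() is the closer equivalent. And assuming
    Python 3.8 or later (newer than this thread), an assignment expression
    keeps the match object available in a flat chain. The handle() function
    and its return values are hypothetical:

```python
import re

rx_typedef = re.compile(r'typedef struct \{')
rx_struct = re.compile(r'struct \{')
rx_other = re.compile(r'something else')

def handle(line):
    # With search(), 'struct {' also occurs inside 'typedef struct {',
    # so the more specific pattern is tested first.
    if m := rx_typedef.search(line):
        return ('typedef', m.start())
    elif m := rx_struct.search(line):
        return ('struct', m.start())
    elif m := rx_other.search(line):
        return ('other', m.start())
    else:
        raise ValueError('no pattern matched: %r' % line)
```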

    Alternatively, if each of the "do something" phrases can be easily
    reduced to a function call, then you could do something like:

    def do_something(line, match): ...
    def do_something_else(line, match): ...
    def do_other_thing(line, match): ...

    table = [ (re.compile(r'struct {'), do_something),
              (re.compile(r'typedef struct {'), do_something_else),
              (re.compile(r'something else'), do_other_thing) ]

    for pattern, func in table:
        m = pattern.match(line)
        if m:
            func(line, m)
            break
    else:
        raise ValueError("...")

    The for/else pattern may look a bit odd, but the key feature here is
    that the else clause only runs if the for loop terminates normally --
    if you break out of the loop, the else does *not* run.
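
    A self-contained sketch of the table approach, wrapped in a function
    so the for/else behaviour is easy to try. The handler names and the
    dispatch() wrapper are hypothetical, and the handlers just return a
    label:

```python
import re

def on_struct(line, match):
    return 'struct'

def on_typedef(line, match):
    return 'typedef'

# Pairs of (compiled pattern, handler), tried in order.
table = [
    (re.compile(r'struct \{'), on_struct),
    (re.compile(r'typedef struct \{'), on_typedef),
]

def dispatch(line):
    for pattern, func in table:
        m = pattern.match(line)
        if m:
            result = func(line, m)
            break
    else:
        # Reached only when the loop finished without hitting break.
        raise ValueError('no pattern matched: %r' % line)
    return result
```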

    Jeff Shannon
     
    Jeff Shannon, Mar 18, 2005
    #7