How to handle repetitive regexp match checks

Matt Wette

Over the last few years I have converted from Perl and Scheme to
Python. There is one task that I do often that is really slick in Perl
but escapes me in Python. I read a line of text from a file, check
it against several regular expressions, and do something once I find a match.
For example, in Perl:

if ($line =~ /struct {/) {
    do something
} elsif ($line =~ /typedef struct {/) {
    do something else
} elsif ($line =~ /something else/) {
} ...

I am having difficulty doing this cleanly in Python. Can anyone help?

import re

rx1 = re.compile(r'struct {')
rx2 = re.compile(r'typedef struct {')
rx3 = re.compile(r'something else')

m = rx1.match(line)
if m:
    pass  # do something
else:
    m = rx2.match(line)
    if m:
        pass  # do something
    else:
        m = rx3.match(line)
        if m:
            pass  # do something
        else:
            raise ValueError('no pattern matched: %r' % line)

(In Scheme I was able to do this cleanly with macros.)

Matt
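
One difference matters when translating the Perl directly: `$line =~ /.../` succeeds if the pattern occurs anywhere in the line, which corresponds to Python's `search` method, whereas `match` only tries at the start of the string. With search semantics the pattern `struct {` also matches `typedef struct {` lines, so the more specific test has to come first (in the Perl above, a typedef line would already be taken by the first branch). A small check using the question's patterns; the `classify` helper and the sample lines are made up for illustration:

```python
import re

rx_struct = re.compile(r'struct {')
rx_typedef = re.compile(r'typedef struct {')

line = 'typedef struct { int x; } point;'

# match() is anchored at the start of the string, so this finds nothing
print(rx_struct.match(line))
# search() looks anywhere in the line, like Perl's =~, so this succeeds
print(rx_struct.search(line) is not None)

def classify(line):
    # With search semantics, test the more specific pattern first;
    # otherwise 'struct {' would also claim every typedef line.
    if rx_typedef.search(line):
        return 'typedef'
    elif rx_struct.search(line):
        return 'struct'
    return None

print(classify(line))
print(classify('struct { int y; };'))
```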

David M. Cooke

I usually define a class like this:

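class Matcher:
    def __init__(self, text):
        self.m = None
        self.text = text

    def match(self, pat):
        # remember the match object (or None) and return it for the test
        self.m = pat.match(self.text)
        return self.m

    def __getitem__(self, name):
        # m['name'] gives the text of a named group from the last match
        return self.m.group(name)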

Then use it like this:

for line in fo:  # fo: an open file object
    m = Matcher(line)
    if m.match(rx1):
        pass  # do something
    elif m.match(rx2):
        pass  # do something
    else:
        raise ValueError('no pattern matched: %r' % line)
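
The `__getitem__` method is what makes this class convenient with named groups: after a successful `m.match(...)`, the branch can read a captured group as `m['tag']`. Below is a self-contained run of the class above; the `tag` group, the two patterns, and the `describe` helper are illustrative assumptions, not part of the original post.

```python
import re

class Matcher:
    def __init__(self, text):
        self.m = None
        self.text = text

    def match(self, pat):
        # Remember the match object (or None) and return it for the test
        self.m = pat.match(self.text)
        return self.m

    def __getitem__(self, name):
        # m['name'] returns the text of a named group from the last match
        return self.m.group(name)

# Illustrative patterns with a named group 'tag' (not from the original post)
rx1 = re.compile(r'struct (?P<tag>\w+) *\{')
rx2 = re.compile(r'typedef struct (?P<tag>\w+) *\{')

def describe(line):
    m = Matcher(line)
    if m.match(rx1):
        return 'struct ' + m['tag']
    elif m.match(rx2):
        return 'typedef of struct ' + m['tag']
    else:
        return 'no match'

for line in ['struct point {', 'typedef struct node {', 'int x;']:
    print(describe(line))
```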

Duncan Booth

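My preferred way is to combine the patterns into a single regular expression
with one named group per alternative, and then use the name of the group
that matched (m.lastgroup) to pick the method to call: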

import re

RX = re.compile(r'''
    (?P<rx1> struct\s{ )|
    (?P<rx2> typedef\sstruct\s{ )|
    (?P<rx3> something\selse )
    ''', re.VERBOSE)

class Matcher:
    def rx1(self, m):
        print("rx1 matched", m.group(0))

    def rx2(self, m):
        print("rx2 matched", m.group(0))

    def rx3(self, m):
        print("rx3 matched", m.group(0))

    def processLine(self, line):
        m = RX.match(line)
        if m:
            # lastgroup is the name of the alternative that matched,
            # which is also the name of the handler method
            getattr(self, m.lastgroup)(m)
        else:
            print("error", repr(line), "did not match")

matcher = Matcher()
matcher.processLine('struct { something')
matcher.processLine('typedef struct { something')
matcher.processLine('something else')
matcher.processLine('will not match')
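
The combined pattern can also be assembled from a table, so adding a case means adding one entry. A minimal sketch, assuming each sub-pattern is a plain regex that does not reuse these group names; the `TABLE` layout and the `which` helper are illustrative. A `processLine` that calls `getattr(self, m.lastgroup)(m)` keeps working as long as the handler methods carry the same names as the groups.

```python
import re

# (group name, pattern) pairs; names must be valid Python identifiers.
# These are the three patterns from the question.
TABLE = [
    ('rx1', r'struct\s\{'),
    ('rx2', r'typedef\sstruct\s\{'),
    ('rx3', r'something\selse'),
]

# Wrap each pattern in a named group and join the alternatives with '|'
RX = re.compile('|'.join('(?P<%s>%s)' % (name, pat) for name, pat in TABLE))

def which(line):
    m = RX.match(line)
    # lastgroup is the name of the alternative that matched
    return m.lastgroup if m else None

for line in ['struct { x', 'typedef struct { y', 'something else', 'nope']:
    print(which(line))
```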

GiddyJP

I had a similar situation, with the added requirement that the text to be
scanned was read in chunks. After looking at the Python re module
and various other regex packages, I eventually wrote my own
multiple-pattern scanning matcher.

However, since then I've discovered that the sre Python module has a
Scanner class that does something similar.
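
The Scanner class is still reachable in current Python as `re.Scanner`, but it remains undocumented, so the sketch below relies on an implementation detail rather than a stable API. It takes a list of (pattern, action) pairs; at each position the first pattern in the list that matches wins, a callable action is called with the scanner and the matched text, and an action of None discards the text. The token labels and the catch-all `\S+` rule are assumptions made for the example.

```python
import re

# re.Scanner is present in CPython but undocumented; illustration only.
def on_typedef(scanner, text):
    return ('typedef', text)

def on_struct(scanner, text):
    return ('struct', text)

def on_other(scanner, text):
    return ('other', text)

scanner = re.Scanner([
    (r'typedef struct \{', on_typedef),   # more specific pattern listed first
    (r'struct \{', on_struct),
    (r'something else', on_other),
    (r'\s+', None),                       # skip whitespace between tokens
    (r'\S+', lambda s, t: ('word', t)),   # anything else, one word at a time
])

# scan() returns the collected results and any text it could not match
tokens, remainder = scanner.scan('typedef struct { int x; }')
print(tokens)
print(repr(remainder))
```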

Anyway, you can see my code at:
http://users.cs.cf.ac.uk/J.P.Giddy/python/Trespass/2.0.0/

Using it, your code could look like:

# do this once
import Trespass
pattern = Trespass.Pattern()
pattern.addRegExp(r'struct {', 1)
pattern.addRegExp(r'typedef struct {', 2)
pattern.addRegExp(r'something else', 3)

# do this for each line
match = pattern.match(line)
if match:
    value = match.value()
    if value == 1:
        # struct
        pass  # do something
    elif value == 2:
        # typedef
        pass  # do something
    elif value == 3:
        # something else
        pass  # do something
else:
    raise ValueError('no pattern matched: %r' % line)

Jonathan Giddy

GiddyJP said:
# do this once
import Trespass
pattern = Trespass.Pattern()
pattern.addRegExp(r'struct {', 1)
pattern.addRegExp(r'typedef struct {', 2)
pattern.addRegExp(r'something else', 3)

Minor correction: in this module, { must always be escaped unless it
introduces a bounded repeat:
pattern.addRegExp(r'struct \{', 1)
pattern.addRegExp(r'typedef struct \{', 2)
pattern.addRegExp(r'something else', 3)
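
Python's own `re` module is more lenient here: a `{` that does not begin a valid repeat such as `{3}` or `{1,3}` is taken as a literal brace, so `r'struct {'` works there as written. Escaping it is harmless, though, and `re.escape` will do it for literal text. A quick check (the sample line is made up):

```python
import re

line = 'typedef struct {'

# A '{' that does not begin a valid repeat like {3} or {1,3} is taken literally
print(bool(re.search(r'struct {', line)))
# Escaping it is harmless and also suits engines that require the escape
print(bool(re.search(r'struct \{', line)))
# re.escape backslash-escapes regex metacharacters, including '{'
pattern = re.escape('struct {')
print(bool(re.search(pattern, line)))
```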

Paul McGuire

Matt -

Pyparsing may be of interest to you. One of its core features is the
ability to associate an action method with a parsing pattern. During
parsing, the action is called with the original source string, the
location within the string of the match, and the matched tokens.

Your code would look something like this:

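from pyparsing import Literal

lbrace = Literal('{')
typedef = Literal('typedef')
struct = Literal('struct')
rx1 = struct + lbrace
rx2 = typedef + struct + lbrace
rx3 = Literal('something') + Literal('else')

def rx1Action(strg, loc, tokens):
    pass  # ... put stuff to do here ...

def rx2Action(strg, loc, tokens):
    pass  # ... and here ...

def rx3Action(strg, loc, tokens):
    pass  # ... and here ...

rx1.setParseAction( rx1Action )
rx2.setParseAction( rx2Action )
rx3.setParseAction( rx3Action )

# read code into Python string variable 'code'
patterns = (rx1 | rx2 | rx3)
# scanString is a generator: the parse actions fire as it is iterated
for tokens, start, end in patterns.scanString( code ):
    pass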

(I've broken up some of your literals, which allows for variable intervening
whitespace: Literal('struct') + Literal('{') will accommodate one, two, or
more blanks (even line breaks) between the 'struct' and the '{'.)

Get pyparsing at http://pyparsing.sourceforge.net.

-- Paul

Jeff Shannon

If you don't need the match object as part of "do something", you
could do a fairly literal translation of the Perl:

if rx1.match(line):
    pass  # do something
elif rx2.match(line):
    pass  # do something else
elif rx3.match(line):
    pass  # do other thing
else:
    raise ValueError("...")
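
Python 3.8, which came well after this thread, added assignment expressions (PEP 572), and they lift the restriction above: `:=` binds the match object inside the `if`/`elif` test itself, so the Perl shape carries over even when the branch needs the match. A minimal sketch; the named group `tag`, the `handle` function, and its return values are assumptions for illustration.

```python
import re

rx1 = re.compile(r'struct (?P<tag>\w+) \{')
rx2 = re.compile(r'typedef struct (?P<tag>\w+) \{')
rx3 = re.compile(r'something else')

def handle(line):
    # Requires Python 3.8+: ':=' assigns the match and tests it in one step
    if m := rx1.match(line):
        return 'struct', m.group('tag')
    elif m := rx2.match(line):
        return 'typedef', m.group('tag')
    elif m := rx3.match(line):
        return 'other', m.group(0)
    else:
        raise ValueError('no pattern matched: %r' % line)

print(handle('struct point {'))
print(handle('typedef struct node {'))
```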

Alternatively, if each of the "do something" phrases can be easily
reduced to a function call, then you could do something like:

import re

# each handler receives the line and its match object (bodies elided)
def do_something(line, match): ...
def do_something_else(line, match): ...
def do_other_thing(line, match): ...

table = [ (re.compile(r'struct {'), do_something),
          (re.compile(r'typedef struct {'), do_something_else),
          (re.compile(r'something else'), do_other_thing) ]

for pattern, func in table:
    m = pattern.match(line)
    if m:
        func(line, m)
        break
else:
    raise ValueError("...")

The for/else pattern may look a bit odd, but the key feature here is
that the else clause runs only if the for loop finishes without a break;
if you break out of the loop, the else does *not* run.
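
Wrapped in a function, the table-driven version can be run end to end. In this sketch the handlers are stand-ins that simply return a label, and `dispatch` is a hypothetical name; the `for`/`else` raises only when no pattern matched, i.e. when the loop never reached `break`.

```python
import re

# Stand-in handlers: each just returns a label (an assumption for the example)
def do_something(line, match):
    return 'struct'

def do_something_else(line, match):
    return 'typedef'

def do_other_thing(line, match):
    return 'other'

table = [(re.compile(r'struct {'), do_something),
         (re.compile(r'typedef struct {'), do_something_else),
         (re.compile(r'something else'), do_other_thing)]

def dispatch(line):
    for pattern, func in table:
        m = pattern.match(line)
        if m:
            result = func(line, m)
            break
    else:
        # Reached only if the loop ran to completion without a break
        raise ValueError('no pattern matched: %r' % line)
    return result

print(dispatch('typedef struct { int x; } point;'))
```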

Jeff Shannon