best way to do a series of regexp checks with groups

Mark Fanty

In perl, I might do (made up example just to illustrate the point):

if (/add (\d+) (\d+)/) {
    do_add($1, $2);
} elsif (/mult (\d+) (\d+)/) {
    do_mult($1, $2);
} elsif (/help (\w+)/) {
    show_help($1);
}

or even

do_add($1,$2) if /add (\d+) (\d+)/;
do_mult($1,$2) if /mult (\d+) (\d+)/;
show_help($1) if /help (\w+)/;

How can I best do this in Python? Attempt 1:

m = re.search(r'add (\d+) (\d+)', line)
if m:
    do_add(m.group(1), m.group(2))
else:
    m = re.search(r'mult (\d+) (\d+)', line)
    if m:
        do_mult(m.group(1), m.group(2))
    else:
        m = re.search(r'help (\w+)', line)
        if m:
            show_help(m.group(1))

The increasing nesting is a problem. I could put them in a while loop just
so I can use break:

while 1:
    m = re.search(r'add (\d+) (\d+)', line)
    if m:
        do_add(m.group(1), m.group(2))
        break
    m = re.search(r'mult (\d+) (\d+)', line)
    if m:
        do_mult(m.group(1), m.group(2))
        break
    m = re.search(r'help (\w+)', line)
    if m:
        show_help(m.group(1))
    break  # always leave the "loop", even when nothing matched

No nesting, but the while is misleading since I'm not looping, and this is a
bit awkward. I don't mind a few more keystrokes, but I'd like clarity. I
wish I could do:

if m = re.search(r'add (\d+) (\d+)', line):
    do_add(m.group(1), m.group(2))
elif m = re.search(r'mult (\d+) (\d+)', line):
    do_mult(m.group(1), m.group(2))
elif m = re.search(r'help (\w+)', line):
    show_help(m.group(1))

Now that's what I'm looking for, but I can't put the assignment in an
expression. Any recommendations? Less "tricky" is better. Not having to
import some personal module with a def to help would be better (e.g. for
sharing).
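
For anyone reading this on Python 3.8 or later: assignment expressions (the := operator, PEP 572) allow almost exactly the form wished for above, with no helper class. A minimal sketch, assuming stand-in handlers (do_add, do_mult and show_help below are placeholders for illustration, not the real functions):

```python
import re

# Stand-in handlers, assumed purely for illustration.
def do_add(a, b):
    return int(a) + int(b)

def do_mult(a, b):
    return int(a) * int(b)

def show_help(topic):
    return 'help for %s' % topic

def handle(line):
    # ':=' binds m and yields the match object (or None) in one expression,
    # so each test can live directly in the if/elif chain.
    if m := re.search(r'add (\d+) (\d+)', line):
        return do_add(m.group(1), m.group(2))
    elif m := re.search(r'mult (\d+) (\d+)', line):
        return do_mult(m.group(1), m.group(2))
    elif m := re.search(r'help (\w+)', line):
        return show_help(m.group(1))
    return None  # no pattern matched
```

On older interpreters this is a syntax error, so the helper-class approaches still apply there.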

Thanks
 
Nick Craig-Wood

Mark Fanty said:
In perl, I might do (made up example just to illustrate the point):

if (/add (\d+) (\d+)/) {
    do_add($1, $2);
} elsif (/mult (\d+) (\d+)/) {
    do_mult($1, $2);
} elsif (/help (\w+)/) {
    show_help($1);
}

There was a thread about this recently under the title

"regular expression: perl ==> python"

Here is a different solution...

class Result:
    def set(self, value):
        self.value = value
        return value

m = Result()

if m.set(re.search(r'add (\d+) (\d+)', line)):
    do_add(m.value.group(1), m.value.group(2))
elif m.set(re.search(r'mult (\d+) (\d+)', line)):
    do_mult(m.value.group(1), m.value.group(2))
elif m.set(re.search(r'help (\w+)', line)):
    show_help(m.value.group(1))
 
Alex Martelli

Nick Craig-Wood said:
Here is a different solution...

class Result:
    def set(self, value):
        self.value = value
        return value

m = Result()

if m.set(re.search(r'add (\d+) (\d+)', line)):
    do_add(m.value.group(1), m.value.group(2))
elif m.set(re.search(r'mult (\d+) (\d+)', line)):
    do_mult(m.value.group(1), m.value.group(2))
elif m.set(re.search(r'help (\w+)', line)):
    show_help(m.value.group(1))

This is roughly the same as my Cookbook recipe for test-and-set, but if
all you're using it for is RE search and MO access you might be better
off giving more responsibilities to your auxiliary class, such as:

class ReWithMemory(object):
    def search(self, are, aline):
        self.mo = re.search(are, aline)
        return self.mo
    def group(self, n):
        return self.mo.group(n)

m = ReWithMemory()

if m.search(r'add (\d+) (\d+)', line):
    do_add(m.group(1), m.group(2))
elif m.search(r'mult (\d+) (\d+)', line):
    do_mult(m.group(1), m.group(2))
elif m.search(r'help (\w+)', line):
    show_help(m.group(1))

Demeter's Law suggests that the 'm.value.group' accesses in your
approach are better handled by having m delegate to its `value'; and the
repeated m.set(re.search( ... seem to be a slight code smell, violating
"once and only once", which suggests merging into a single `set' method.
Your approach is more general, of course.
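
The delegation idea can be taken one step further. The following is a sketch only, not the Cookbook recipe: it assumes a hypothetical class name, DelegatingMatcher, and forwards every attribute it does not define itself to the stored match object via __getattr__, so group(), groups(), start() and the rest all work without writing one wrapper method each.

```python
import re

class DelegatingMatcher:
    """Hypothetical helper: remembers the last match and delegates to it."""

    def __init__(self):
        # Set up front so __getattr__ never recurses looking for self.mo.
        self.mo = None

    def search(self, pattern, line):
        self.mo = re.search(pattern, line)
        return self.mo

    def __getattr__(self, name):
        # Called only when normal lookup fails, i.e. for match-object
        # attributes such as group, groups, start, end.
        return getattr(self.mo, name)

m = DelegatingMatcher()
if m.search(r'add (\d+) (\d+)', 'add 2 3'):
    operands = (m.group(1), m.group(2))
```

If no search has succeeded yet, self.mo is None and the delegated lookup raises AttributeError, which is about as informative as the error from the explicit version.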


Alex
 
Duncan Booth

Mark said:
No nesting, but the while is misleading since I'm not looping and this
is a bit awkward. I don't mind a few more key strokes, but I'd like
clarity. I wish I could do

if m = re.search(r'add (\d+) (\d+)', $line):
    do_add(m.group(1), m.group(2))
elif m = re.search(r'mult (\d+) (\d+)', $line):
    do_mult(m.group(1), m.group(2))
else m = re.search(r'help (\w+)', $line):
    show_help(m.group(1))

Now that's what I'm looking for, but I can't put the assignment in an
expression. Any recommendations? Less "tricky" is better.

Try thinking along the following lines. It is longer, but clearer and
easily extended to more commands. For more complete command processing use
the 'cmd' module.

import sys

class Command:
    def do_add(self, a, b):
        '''add <number> <number>'''
        return int(a)+int(b)

    def do_mult(self, a, b):
        '''mult <number> <number>'''
        return int(a)*int(b)

    def do_help(self, *what):
        '''help [words] - give some help'''
        if not what:
            what = sorted(s[3:] for s in dir(self) if s.startswith('do_'))
        def error(): '''Unknown command'''
        for w in what:
            cmd = getattr(self, 'do_'+w, error)
            print("Help for %r:\n%s\n" % (w, cmd.__doc__))

    def do_exit(self):
        '''exit - the program'''
        sys.exit(0)

    def __call__(self, line):
        words = line.split()
        if not words:
            return
        command = words.pop(0)
        # look up the handler by name; None means no such command
        cmdfn = getattr(self, 'do_'+command, None)
        if not cmdfn:
            print("Unknown command %r. Use 'help' for help" % command)
            return

        result = None
        try:
            result = cmdfn(*words)
        except TypeError as msg:
            # wrong number of arguments for the handler
            print(msg)
        if result is not None:
            print("result is", result)

cmd = Command()

# iterate over stdin so the loop ends at end-of-file
for line in sys.stdin:
    cmd(line)
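
For comparison, here is roughly what the same commands might look like on top of the standard 'cmd' module mentioned above. This is a sketch under assumptions: the class name Calc and the usage messages are made up for illustration. Note that cmd.Cmd passes each do_* method the rest of the line as a single string, so the splitting happens inside the handler, and 'help add' is provided by the base class from the docstrings.

```python
import cmd

class Calc(cmd.Cmd):
    """Hypothetical calculator shell built on cmd.Cmd."""
    prompt = '> '

    def do_add(self, arg):
        """add <number> <number>"""
        try:
            a, b = (int(w) for w in arg.split())
        except ValueError:
            # wrong argument count or a non-integer operand
            self.stdout.write('usage: add <number> <number>\n')
            return
        self.stdout.write('%d\n' % (a + b))

    def do_mult(self, arg):
        """mult <number> <number>"""
        try:
            a, b = (int(w) for w in arg.split())
        except ValueError:
            self.stdout.write('usage: mult <number> <number>\n')
            return
        self.stdout.write('%d\n' % (a * b))

    def do_exit(self, arg):
        """exit - leave the program"""
        return True  # a true return value stops cmdloop()
```

Calling Calc().cmdloop() starts the interactive prompt; Calc().onecmd('add 2 3') runs a single command, which is handy for testing.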
 
Steven Bethard

Alex said:
class ReWithMemory(object):
    def search(self, are, aline):
        self.mo = re.search(are, aline)
        return self.mo
    def group(self, n):
        return self.mo.group(n)

m = ReWithMemory()

if m.search(r'add (\d+) (\d+)', line):
    do_add(m.group(1), m.group(2))
elif m.search(r'mult (\d+) (\d+)', line):
    do_mult(m.group(1), m.group(2))
elif m.search(r'help (\w+)', line):
    show_help(m.group(1))

Demeter's Law suggests that the 'm.value.group' accesses in your
approach are better handled by having m delegate to its `value'; and the
repeated m.set(re.search( ... seem to be a slight code smell, violating
"once and only once", which suggests merging into a single `set' method.
Your approach is more general, of course.

I get a bit uneasy from the repeated calls to m.group... If I was going
to build a class around the re, I think I might lean towards something like:

class ReWithMemory(object):
    def search(self, are, aline):
        self.mo = re.search(are, aline)
        return self.mo
    def groups(self, *indices):
        return [self.mo.group(i) for i in indices]

m = ReWithMemory()

if m.search(r'add (\d+) (\d+)', line):
    do_add(*m.groups(1, 2))
elif m.search(r'mult (\d+) (\d+)', line):
    do_mult(*m.groups(1, 2))
elif m.search(r'help (\w+)', line):
    show_help(*m.groups(1))

Of course, this is even less general-purpose than yours...

(And if I saw myself using this much regex code, I'd probably reconsider
my strategy anyway.) ;)

Steve
 
Alex Martelli

Steven Bethard said:
I get a bit uneasy from the repeated calls to m.group... If I was going
to build a class around the re, I think I might lean towards something like:

class ReWithMemory(object):
    def search(self, are, aline):
        self.mo = re.search(are, aline)
        return self.mo
    def groups(self, *indices):
        return [self.mo.group(i) for i in indices]

m = ReWithMemory()

if m.search(r'add (\d+) (\d+)', line):
    do_add(*m.groups(1, 2))
elif m.search(r'mult (\d+) (\d+)', line):
    do_mult(*m.groups(1, 2))
elif m.search(r'help (\w+)', line):
    show_help(*m.groups(1))

Of course, this is even less general-purpose than yours...

I'm not sure what advantage it's supposed to give. Would you have any
problems writing, say, somecall(X[1], X[2]) ...? Python normally relies
on indexing one thing at a time, and I see calling m.group(1) etc as
just the same kind of approach.

(And if I saw myself using this much regex code, I'd probably reconsider
my strategy anyway.) ;)

Surely joining all the regexp's into one big one with | would be faster
and more compact, but, with variable numbers of groups per sub-regexp,
determining which regexp matched can perhaps be tricky (issues with
matching mo.lastindex to the correct sub-regexp). So, I can understand
the desire to do it sequentially, regexp by regexp.
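
One way around the lastindex bookkeeping, sketched here under assumptions, is to wrap each alternative in a named group and read mo.lastgroup, which gives the name of the last capturing group to close. Because the outer group of the matching alternative closes after its inner groups, lastgroup names the alternative. The group names and the ARGS table below are invented for this illustration.

```python
import re

# One combined pattern; each alternative is an outer named group and the
# operands are inner named groups (all names are hypothetical).
COMMAND_RE = re.compile(
    r'(?P<add>add (?P<add_a>\d+) (?P<add_b>\d+))'
    r'|(?P<mult>mult (?P<mult_a>\d+) (?P<mult_b>\d+))'
    r'|(?P<help>help (?P<help_topic>\w+))'
)

# Which inner groups carry the arguments of each command.
ARGS = {
    'add': ('add_a', 'add_b'),
    'mult': ('mult_a', 'mult_b'),
    'help': ('help_topic',),
}

def classify(line):
    """Return (command_name, args) or None if nothing matched."""
    mo = COMMAND_RE.search(line)
    if mo is None:
        return None
    name = mo.lastgroup  # outer group of the alternative that matched
    return name, tuple(mo.group(g) for g in ARGS[name])
```

The result can then index a dict of handlers, e.g. handlers[name](*args), at the cost of a pattern that is harder to read than three separate ones.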


Alex
 
Jonathan Fine

Mark said:
In perl, I might do (made up example just to illustrate the point):

if (/add (\d+) (\d+)/) {
    do_add($1, $2);
} elsif (/mult (\d+) (\d+)/) {
    do_mult($1, $2);
} elsif (/help (\w+)/) {
    show_help($1);
}

or even

do_add($1,$2) if /add (\d+) (\d+)/;
do_mult($1,$2) if /mult (\d+) (\d+)/;
show_help($1) if /help (\w+)/;


Here's some Python code (tested).

It is not as concise as the Perl code.
Which might or might not be a disadvantage.

Sometimes, regular expressions are not the right thing.
For example, a simple str.startswith() might be better.

What about "add 9999999999999999999999999 99999999999999999999999"?
Maybe we want to catch the error before we get to the do_add.
Can't easily do that with regular expressions.
And what about a variable number of arguments?

If regular expressions are no longer used, the Perl code seems
to lose some of its elegance.
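
To make the points about str.startswith(), oversized operands and variable argument counts concrete, here is a small sketch that parses an 'add' command without regular expressions. The function name parse_add and the limit MAX_OPERAND are assumptions chosen for illustration.

```python
MAX_OPERAND = 10**9  # hypothetical limit, pick whatever do_add can handle

def parse_add(line):
    """Return the operands of an 'add' command as a list of ints.

    Returns None if the line is not an 'add' command; raises ValueError
    if an operand is not an integer or is out of range.
    """
    if not line.startswith('add '):
        return None
    # int() raises ValueError on anything that is not an integer literal
    operands = [int(word) for word in line.split()[1:]]
    for n in operands:
        if abs(n) > MAX_OPERAND:
            raise ValueError('operand out of range: %d' % n)
    return operands
```

The caller can catch ValueError once, before any handler runs, and any number of operands is accepted.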


I've been arguing for writing small, simple functions that do something.
This should make testing much easier.
These functions might happen to use regular expressions.


The code below is clearly more flexible.
It's easy, for example, to add a new command.
Just add an entry to dispatch.

The thing I like best about it is the passing of a dict.

===
#!/usr/bin/python

import re

# here we know about functions and patterns
def do_add(arg1, arg2): print("+ %s %s" % (arg1, arg2))
def do_times(arg1, arg2): print("* %s %s" % (arg1, arg2))

add_re = re.compile(r'add (?P<arg1>.*) (?P<arg2>.*)')
times_re = re.compile(r'times (?P<arg1>.*) (?P<arg2>.*)')

# each finder returns a dict of named groups, or None if no match
def find_add(line):
    match = add_re.match(line)
    if match is None:
        return match
    return match.groupdict()

def find_times(line):
    match = times_re.match(line)
    if match is None:
        return match
    return match.groupdict()


# here we bind everything together
dispatch = [
    (find_add, do_add),
    (find_times, do_times),
]

def doit(line):
    # try each finder in turn; the first one that matches wins
    for (find, do) in dispatch:
        d = find(line)
        if d is not None:
            return do(**d)
    return None # or error

if __name__ == '__main__':
    doit('add this that')
    doit('times this that')

===


Jonathan
 
Mark Fanty

This is the kind of thing I meant. I think I have to get used to writing
small, light-weight classes. You inspired this variation which is a little
more verbose in the class definition, but less so in the use:

class Matcher:
    def search(self, r, s):
        self.value = re.search(r, s)
        return self.value
    def __getitem__(self, i):
        return self.value.group(i)

m = Matcher()

if m.search(r'add (\d+) (\d+)', line):
    do_add(m[1], m[2])
elif m.search(r'mult (\d+) (\d+)', line):
    do_mult(m[1], m[2])
elif m.search(r'help (\w+)', line):
    show_help(m[1])

As for using regular expressions too much... they are why I've liked perl so
much for quick file processing for years. I don't like perl objects at all,
which is why I'm trying python, but the re package has not been my favorite
so far...
 
