Multiple regex match idiom

Hrvoje Niksic

I often have the need to match multiple regexes against a single
string, typically a line of input, like this:

if (matchobj = re1.match(line)):
    ... re1 matched; do something with matchobj ...
elif (matchobj = re2.match(line)):
    ... re2 matched; do something with matchobj ...
elif (matchobj = re3.match(line)):
    .....

Of course, that doesn't work as written because Python's assignments
are statements rather than expressions. The obvious rewrite results
in deeply nested ifs:

matchobj = re1.match(line)
if matchobj:
    ... re1 matched; do something with matchobj ...
else:
    matchobj = re2.match(line)
    if matchobj:
        ... re2 matched; do something with matchobj ...
    else:
        matchobj = re3.match(line)
        if matchobj:
            ...

Normally I have nothing against nested ifs, but in this case the deep
nesting unnecessarily complicates the code without providing
additional value -- the logic is still exactly equivalent to the
if/elif/elif/... shown above.
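
For readers on Python 3.8 or later: assignment expressions (the `:=` operator) allow the flat if/elif form to be written almost exactly as in the first snippet. The following is only a minimal sketch; the three patterns and the `classify` function are hypothetical stand-ins for re1, re2, re3 and the elided "do something" bodies.

```python
import re

# Hypothetical patterns standing in for re1, re2, re3.
re1 = re.compile(r"(\d+)$")
re2 = re.compile(r"([A-Za-z]+)$")
re3 = re.compile(r"\s*$")

def classify(line):
    # Requires Python 3.8+: := binds matchobj and also yields the value
    # that the if/elif test inspects.
    if (matchobj := re1.match(line)):
        return ("number", matchobj.group(1))
    elif (matchobj := re2.match(line)):
        return ("word", matchobj.group(1))
    elif (matchobj := re3.match(line)):
        return ("blank", "")
    return ("unknown", line)
```

On older interpreters this is a syntax error, so the other approaches in this thread still apply there.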

There are ways to work around the problem, for example by writing a
utility predicate that passes the match object as a side effect, but
that feels somewhat non-standard. I'd like to know if there is a
Python idiom that I'm missing. What would be the Pythonic way to
write the above code?
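
For reference, the "utility predicate" workaround mentioned above might look roughly like this. It is a sketch under assumptions: the `Matcher` class, the `handle` function and both patterns are made up for illustration and are not from any library.

```python
import re

class Matcher:
    """Hypothetical helper that remembers the most recent match object."""

    def __init__(self, line):
        self.line = line
        self.matchobj = None

    def match(self, regex):
        # Side effect: store the result so the caller can read it
        # after using the return value as the if/elif condition.
        self.matchobj = regex.match(self.line)
        return self.matchobj is not None

# Hypothetical patterns, only for illustration.
re1 = re.compile(r"key=(\w+)")
re2 = re.compile(r"#\s*(.*)")

def handle(line):
    m = Matcher(line)
    if m.match(re1):
        return ("assignment", m.matchobj.group(1))
    elif m.match(re2):
        return ("comment", m.matchobj.group(1))
    return ("other", None)
```

The chain stays flat, at the cost of hiding the assignment inside the helper.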
 
Charles Sanders

Hrvoje said:
I often have the need to match multiple regexes against a single
string, typically a line of input, like this:

if (matchobj = re1.match(line)):
    ... re1 matched; do something with matchobj ...
elif (matchobj = re2.match(line)):
    ... re2 matched; do something with matchobj ...
elif (matchobj = re3.match(line)):
    .... [snip]

There are ways to work around the problem, for example by writing a
utility predicate that passes the match object as a side effect, but
that feels somewhat non-standard. I'd like to know if there is a
Python idiom that I'm missing. What would be the Pythonic way to
write the above code?

Only just learning Python, but to me this seems better.
Completely untested.

re_list = [ re1, re2, re3, ... ]
# loop variable is named regex so it does not shadow the re module
for regex in re_list:
    matchob = regex.match(line)
    if matchob:
        ....
        break

Of course this only works if the "do something" is the same
for all matches. If not, maybe a function for each case,
something like

re1 = re.compile(....)
def fn1( s, m ):
    ....
re2 = ....
def fn2( s, m ):
    ....

re_list = [ (re1, fn1), (re2, fn2), ... ]

for (r,f) in re_list:
    matchob = r.match(line)
    if matchob:
        # call the handler paired with the first regex that matches
        f( line, matchob )
        break

Probably better ways than this exist.
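
A runnable version of the (regex, function) table might look like the sketch below. The patterns, the handler names and the `process` wrapper are hypothetical; the else clause on the for loop covers the case where nothing matched.

```python
import re

# Hypothetical handlers; each receives the line and the match object.
def handle_number(line, m):
    return int(m.group(1)) * 2

def handle_word(line, m):
    return m.group(1).upper()

# Order matters: the first pattern that matches wins, as in if/elif.
re_list = [
    (re.compile(r"(\d+)$"), handle_number),
    (re.compile(r"([a-z]+)$"), handle_word),
]

def process(line):
    for regex, handler in re_list:
        matchob = regex.match(line)
        if matchob:
            result = handler(line, matchob)
            break
    else:
        # The else clause of a for loop runs only when no break occurred,
        # i.e. when none of the patterns matched.
        result = None
    return result
```

Adding a new case then means appending one pair to `re_list` rather than adding another nesting level.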


Charles
 
Nick Vatamaniuc

Hrvoje,

To make it more elegant I would do this:

1. Put all the ...do somethings... in functions like
   re1_do_something(), re2_do_something(),...

2. Create a list of pairs of (re,func), in other words:
   dispatch=[ (re1, re1_do_something), (re2, re2_do_something), ... ]

3. Then do:
   for regex,func in dispatch:
       matchobj = regex.match(line)
       if matchobj:
           # keep matchobj around in case the handler needs it
           func(...)
           # stop at the first match, like the if/elif chain
           break


Hope this helps,
-Nick Vatamaniuc
 
Steffen Oschatz

Instead of scanning the same input over and over again with different,
maybe complex, regexes and ugly-looking nested ifs, I would suggest
defining a grammar and parsing the input once, with registered hooks
for your matching expressions.

SimpleParse (http://simpleparse.sourceforge.net) with a
DispatchProcessor or pyparsing (http://pyparsing.wikispaces.com/) in
combination with setParseAction or something similar are your friends
for such a task.
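
To give a rough idea of "scan once, dispatch to hooks" without installing anything, here is a sketch using only the standard re module. It is not SimpleParse or pyparsing code; it swaps in a single alternation with named groups, and all pattern and function names are hypothetical.

```python
import re

# Hypothetical token patterns; the group names double as hook names.
SCANNER = re.compile(r"(?P<number>\d+)|(?P<word>[A-Za-z]+)|(?P<space>\s+)")

def on_number(text):
    return ("number", int(text))

def on_word(text):
    return ("word", text.lower())

# Hooks registered per group name; groups without a hook are skipped.
HOOKS = {"number": on_number, "word": on_word}

def scan(line):
    results = []
    # finditer walks the line once; lastgroup names the alternative
    # that produced each match.
    for m in SCANNER.finditer(line):
        hook = HOOKS.get(m.lastgroup)
        if hook is not None:
            results.append(hook(m.group()))
    return results
```

A real grammar library adds nesting, error reporting and more, so treat this only as an illustration of the dispatch idea.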

Steffen
 
