checking a string against multiple patterns

tomasz

Hi,

here is a piece of pseudo-code (taken from Ruby) that illustrates the
problem I'd like to solve in Python:

str = 'abc'
if str =~ /(b)/       # Check if str matches a pattern
  str = $` + $1       # Perform some action
elsif str =~ /(a)/    # Check another pattern
  str = $1 + $'       # Perform some other action
elsif str =~ /(c)/
  str = $1
end

The task is to check a string against a number of different patterns
(containing groupings).
For each pattern, different actions need to be taken.

In Python, a single match of this kind can be done as follows:

import re

str = 'abc'
match = re.search('(b)', str)
if match:
    # I'm not sure if this way of accessing 'pre-match' is optimal,
    # but let's ignore it now
    str = str[:match.start()] + match.group(1)

The problem is that you can't extend this example to multiple
matches with 'elif', because the match must be performed separately
from the conditional.

This obviously won't work in Python:

if match = re.search(pattern1, str):
    ...
elif match = re.search(pattern2, str):
    ...

So the only way seems to be:

match = re.search(pattern1, str)
if match:
    ...
else:
    match = re.search(pattern2, str)
    if match:
        ...
    else:
        match = re.search(pattern3, str)
        if match:
            ...

and we end up with very nasty, multiply-nested code.

Is there an alternative to it? Am I missing something? Python doesn't
have special variables $1, $2 (right?) so you must assign the result
of a match to a variable, to be able to access the groups.

I'd appreciate any hints.

Tomasz
 
kib

tomasz wrote:
Is there an alternative to it? Am I missing something? Python doesn't
have special variables $1, $2 (right?) so you must assign the result
of a match to a variable, to be able to access the groups.

Hi Tomasz,

See, for example:

http://www.regular-expressions.info/python.html [Search and Replace section]

And you'll see that Python supports numbered groups and even named
groups in regular expressions.
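
A minimal sketch of how the Ruby specials from the question map onto a Python match object, using the 'abc' string from the original post. The group name `mid` and the variable names are made up for illustration:

```python
import re

text = 'abc'
m = re.search(r'(?P<mid>b)', text)

group_by_number = m.group(1)      # roughly Ruby's $1
group_by_name = m.group('mid')    # the same group, accessed by name
pre_match = text[:m.start()]      # roughly Ruby's $`
post_match = text[m.end():]       # roughly Ruby's $'

# In a replacement string, groups are referenced as \1 or \g<name>
wrapped = re.sub(r'(?P<mid>b)', r'[\g<mid>]', text)
```

A named group still gets a number, so both access styles can be mixed in the same pattern.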

Christophe K.
 
Gabriel Genellina

tomasz said:
[...]
So the only way seems to be:
[...]
and we end up having a very nasty, multiply-nested code.

Define a small function with each test+action, and iterate over them
until a match is found:

def check1(input):
    match = re.search(pattern1, input)
    if match:
        return input[:match.end(1)]

def check2(input):
    match = re.search(pattern2, input)
    if match:
        return ...

def check3(input):
    match = ...
    if match:
        return ...

for check in check1, check2, check3:
    result = check(input)
    if result is not None:
        break
else:
    # no match found
    pass
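
A runnable sketch of this idea applied to the three patterns from the original Ruby snippet. The function names and the choice to return the input unchanged when nothing matches are assumptions made for illustration:

```python
import re

def check_b(text):
    m = re.search(r'(b)', text)
    if m:
        return text[:m.start()] + m.group(1)   # pre-match + group 1

def check_a(text):
    m = re.search(r'(a)', text)
    if m:
        return m.group(1) + text[m.end():]     # group 1 + post-match

def check_c(text):
    m = re.search(r'(c)', text)
    if m:
        return m.group(1)

def first_match(text, checks=(check_b, check_a, check_c)):
    # Try each check in order; the first non-None result wins
    for check in checks:
        result = check(text)
        if result is not None:
            return result
    return text   # assumed fallback: no pattern matched
```

Each check returns None implicitly when its pattern does not match, which is what the loop relies on.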
 
grflanagan

tomasz said:
[...]

The task is to check a string against a number of different patterns
(containing groupings).
For each pattern, different actions need to be taken.

In the `re.sub` function (and `sub` method of regex object), the
`repl` parameter can be a callback function as well as a string:

http://docs.python.org/lib/node46.html

Does that help?

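E.g.:

import logging
import re

log = logging.getLogger(__name__)

def multireplace(text, mapping):
    # One alternation that matches any of the keys literally
    rx = re.compile('|'.join(re.escape(key) for key in mapping))
    def callback(match):
        key = match.group(0)
        repl = mapping[key]
        log.info("Replacing '%s' with '%s'", key, repl)
        return repl
    # subn returns a (new_text, number_of_replacements) tuple
    return rx.subn(callback, text)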

(I'm not sure, but I think I adapted this from: http://effbot.org/zone/python-replace.htm)
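
A small usage sketch of the same callback technique, without the logging. The name `replace_all` and the sample mapping are hypothetical. One detail worth noting: regex alternation tries alternatives left to right, so when one key is a prefix of another the longer key should come first. The sketch sorts the keys by length for that reason:

```python
import re

def replace_all(text, mapping):
    # Longest keys first, so that 'ab' is tried before 'a'
    keys = sorted(mapping, key=len, reverse=True)
    rx = re.compile('|'.join(re.escape(k) for k in keys))
    # The callback receives the match object and returns the replacement
    return rx.subn(lambda m: mapping[m.group(0)], text)

new_text, count = replace_all('abc ab a', {'a': '1', 'ab': '2'})
```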

Gerard
 
Tim Chase

Gabriel Genellina said:
Define a small function with each test+action, and iterate over them
until a match is found:
[...]

Or, one could even create a mapping of regexps->functions:

def function1(match):
    return do_something_with(match)

def function2(match):
    return do_something_with(match)

def default_function(input):
    return do_something_with(input)

function_mapping = (
    (re.compile(pattern1), function1),
    (re.compile(pattern2), function2),
    (re.compile(pattern3), function1),
)

def match_and_do(input, mapping):
    for regex, func in mapping:
        m = regex.match(input)
        if m:
            return func(m)
    return default_function(input)

result = match_and_do("Hello world", function_mapping)

In addition to having a clean separation between patterns and
functions, and the mapping between them, this also allows wiring
multiple patterns to the same function (e.g. pattern3->function1)
and also allows specification of the mapping evaluation order.
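
A concrete sketch of such a table for the three patterns in the original question. The handler names and the identity default are assumptions for illustration. It uses `search` rather than `match`, because `match` only succeeds at the start of the string while the Ruby `=~` in the question matches anywhere:

```python
import re

def keep_prefix(m):
    # m.string is the text the pattern was applied to
    return m.string[:m.start()] + m.group(1)

def keep_suffix(m):
    return m.group(1) + m.string[m.end():]

def keep_group(m):
    return m.group(1)

# Order matters: rules are tried top to bottom
RULES = (
    (re.compile(r'(b)'), keep_prefix),
    (re.compile(r'(a)'), keep_suffix),
    (re.compile(r'(c)'), keep_group),
)

def match_and_do(text, rules, default=lambda text: text):
    for regex, func in rules:
        m = regex.search(text)
        if m:
            return func(m)
    return default(text)
```

Because the handlers receive only the match object, they read the original text from `m.string` instead of taking it as a second argument.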

-tkc
 
Hrvoje Niksic

tomasz said:
here is a piece of pseudo-code (taken from Ruby) that illustrates the
problem I'd like to solve in Python:
[...]

I asked the very same question in
http://groups.google.com/group/comp.lang.python/browse_frm/thread/3e8da954ff2265e/4deb5631ade8b393
It seems that people either write more elaborate constructs or learn
to tolerate the nesting.

tomasz said:
Is there an alternative to it?

A simple workaround is to write a trivial function that returns a
boolean, and also stores the match object in either a global storage
or an object. It's not really elegant, especially in smaller scripts,
but it works:

import re

def search(pattern, s, store):
    match = re.search(pattern, s)
    store.match = match
    return match is not None

class MatchStore(object):
    pass  # irrelevant, any object with a 'match' attr would do

where = MatchStore()
if search(pattern1, s, where):
    ...  # pattern1 matched, matchobj in where.match
elif search(pattern2, s, where):
    ...  # pattern2 matched, matchobj in where.match
...
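
On Python 3.8 or later (an assumption, since this thread predates that release), assignment expressions (PEP 572) bind the match inside the condition itself, so the flat if/elif chain from the Ruby example can be written directly without a helper object. A minimal sketch, with the function name `transform` chosen for illustration:

```python
import re

def transform(s):
    # Requires Python 3.8+ for the := operator
    if m := re.search(r'(b)', s):
        return s[:m.start()] + m.group(1)    # pre-match + group 1
    elif m := re.search(r'(a)', s):
        return m.group(1) + s[m.end():]      # group 1 + post-match
    elif m := re.search(r'(c)', s):
        return m.group(1)
    return s                                 # assumed fallback: unchanged
```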
 
Duncan Booth

tomasz said:
Is there an alternative to it? Am I missing something? Python doesn't
have special variables $1, $2 (right?) so you must assign the result
of a match to a variable, to be able to access the groups.

Look for repetition in your code and remove it. That will almost always
remove the nesting. Or, combine your regular expressions into one large
expression and branch on the existence of relevant groups. Using named
groups stops all your code breaking just because you need to change one
part of the regex.

E.g., this would handle your example, though it is just one way to do it:

import re
from string import Template

def sub(patterns, s):
    for pat, repl in patterns:
        m = re.match(pat, s)
        if m:
            return Template(repl).substitute(m.groupdict())
    return s

PATTERNS = [
    (r'(?P<start>.*?)(?P<b>b+)', 'start=$start, b=$b'),
    (r'(?P<a>a+)(?P<tail>.*)$', 'Got a: $a, tail=$tail'),
    # [...]
]

[...]
'Got a: a, tail= something'
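
The other suggestion above, one combined expression that branches on which named group matched, could look roughly like the sketch below. The names `COMBINED` and `transform` are made up for illustration. Note that this is not equivalent to the sequential if/elsif chain in the question: a combined pattern reports the leftmost match in the string, whichever alternative produces it, whereas the chain gives priority to the first pattern regardless of position.

```python
import re

# One alternative per case, each in its own named group
COMBINED = re.compile(r'(?P<b>b)|(?P<a>a)|(?P<c>c)')

def transform(s):
    m = COMBINED.search(s)
    if m is None:
        return s
    # Exactly one of the named groups took part in the match
    if m.group('b') is not None:
        return s[:m.start()] + m.group('b')
    if m.group('a') is not None:
        return m.group('a') + s[m.end():]
    return m.group('c')
```

For 'abc' the leftmost match is the 'a' at position 0, so this returns 'abc', while the Ruby snippet would test /(b)/ first and produce 'ab'.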
 
Jonathan Gardner

tomasz said:
Is there an alternative to it? Am I missing something? Python doesn't
have special variables $1, $2 (right?) so you must assign the result
of a match to a variable, to be able to access the groups.

I'd appreciate any hints.

Don't use regexes for something as simple as this. Try find().

Most of the time (90%+) that I use regexes in Perl, I am doing something
that can be done much better with string methods and a few simple
operations. Plus, it usually turns out to be faster than Perl.
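
For the literal one-character patterns in the question, a regex-free sketch along these lines might look as follows. The function name `transform` and the fallback of returning the input unchanged are assumptions for illustration:

```python
def transform(s):
    i = s.find('b')            # find() returns -1 when the substring is absent
    if i != -1:
        return s[:i] + 'b'     # text before the first 'b', plus 'b'
    i = s.find('a')
    if i != -1:
        return 'a' + s[i + 1:] # 'a' plus the text after it
    if 'c' in s:
        return 'c'
    return s
```

Returning early from each branch keeps the code flat even though every test needs its own assignment.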
 
