checking a string against multiple patterns

Discussion in 'Python' started by tomasz, Dec 18, 2007.

  1. tomasz

    tomasz Guest

    Hi,

    here is a piece of pseudo-code (taken from Ruby) that illustrates the
    problem I'd like to solve in Python:

    str = 'abc'
    if str =~ /(b)/       # Check if str matches a pattern
      str = $` + $1       # Perform some action
    elsif str =~ /(a)/    # Check another pattern
      str = $1 + $'       # Perform some other action
    elsif str =~ /(c)/
      str = $1
    end

    The task is to check a string against a number of different patterns
    (containing groupings).
    For each pattern, different actions need to be taken.

    In Python, a single match of this kind can be done as follows:

    import re

    s = 'abc'
    match = re.search('(b)', s)
    if match:
        # I'm not sure if this way of accessing the 'pre-match' is
        # optimal, but let's ignore it now
        s = s[:match.start()] + match.group(1)

    The problem is that you can't extend this example to multiple
    matches with 'elif',
    because the match must be performed separately from the conditional.

    This obviously won't work in Python:

    if match = re.search(pattern1, s):
        ...
    elif match = re.search(pattern2, s):
        ...
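For readers on a modern interpreter: Python 3.8 added assignment expressions (PEP 572), long after this 2007 thread, and with the `:=` operator this if/elif shape does work. A minimal sketch of the Ruby example from above; the function name `transform` is only an illustration:

```python
import re

def transform(s):  # hypothetical name, for illustration only
    # Assignment expressions (Python 3.8+) bind the match inside the condition
    if (m := re.search(r'(b)', s)):
        return s[:m.start()] + m.group(1)   # like Ruby's $` + $1
    elif (m := re.search(r'(a)', s)):
        return m.group(1) + s[m.end():]     # like Ruby's $1 + $'
    elif (m := re.search(r'(c)', s)):
        return m.group(1)                   # like Ruby's $1
    return s                                # nothing matched: unchanged
```

With this, `transform('abc')` gives `'ab'`, the same result as the first Ruby branch.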

    So the only way seems to be:

    match = re.search(pattern1, s)
    if match:
        ...
    else:
        match = re.search(pattern2, s)
        if match:
            ...
        else:
            match = re.search(pattern3, s)
            if match:
                ...

    and we end up with very nasty, deeply nested code.

    Is there an alternative to it? Am I missing something? Python doesn't
    have special variables $1, $2 (right?) so you must assign the result
    of a match to a variable, to be able to access the groups.

    I'd appreciate any hints.

    Tomasz
    tomasz, Dec 18, 2007
    #1

  2. tomasz

    kib Guest

    tomasz wrote:

    > Is there an alternative to it? Am I missing something? Python doesn't
    > have special variables $1, $2 (right?) so you must assign the result
    > of a match to a variable, to be able to access the groups.
    >

    Hi Tomasz,

    See, e.g.:

    http://www.regular-expressions.info/python.html [Search and Replace section]

    And you'll see that Python supports numbered groups and even named
    groups in regular expressions.
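A quick sketch of what that looks like with the `re` module: numbered and named groups on a match object, and back-references in a replacement string. The pattern and the group names `before` and `letter` are made up for illustration:

```python
import re

# Hypothetical pattern: lazily grab everything before the first 'b'
m = re.search(r'(?P<before>\w*?)(?P<letter>b)', 'abc')
print(m.group(1), m.group(2))                # numbered groups: a b
print(m.group('before'), m.group('letter'))  # named groups:    a b
print(m.groupdict())                         # {'before': 'a', 'letter': 'b'}

# In a replacement string, \1 or \g<name> refers back to a group
numbered = re.sub(r'(b)', r'[\1]', 'abc')
named = re.sub(r'(?P<letter>b)', r'[\g<letter>]', 'abc')
print(numbered, named)                       # a[b]c a[b]c
```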

    Christophe K.
    kib, Dec 18, 2007
    #2

  3. On 18 Dec, 09:41, tomasz <> wrote:

    > and we end up having a very nasty, multiply-nested code.


    Define a small function with each test+action, and iterate over them
    until a match is found:

    import re

    def check1(input):
        match = re.search(pattern1, input)
        if match:
            return input[:match.end(1)]

    def check2(input):
        match = re.search(pattern2, input)
        if match:
            return ...

    def check3(input):
        match = ...
        if match:
            return ...

    for check in check1, check2, check3:
        result = check(input)
        if result is not None:
            break
    else:
        result = None  # no match found
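A runnable sketch of this idea applied to the Ruby example from the first post. The `check_*` and `run_checks` names are hypothetical, and returning the input unchanged when nothing matches is an assumption. It relies on the `else` clause of a `for` loop, which runs only when the loop finishes without hitting `break`:

```python
import re

# Hypothetical check functions: each returns a result, or None if no match
def check_b(s):
    m = re.search(r'(b)', s)
    if m:
        return s[:m.start()] + m.group(1)   # like Ruby's $` + $1

def check_a(s):
    m = re.search(r'(a)', s)
    if m:
        return m.group(1) + s[m.end():]     # like Ruby's $1 + $'

def check_c(s):
    m = re.search(r'(c)', s)
    if m:
        return m.group(1)                   # like Ruby's $1

def run_checks(s, checks=(check_b, check_a, check_c)):
    for check in checks:
        result = check(s)
        if result is not None:
            break
    else:
        # No break happened, so no check matched: leave s unchanged
        result = s
    return result
```

One caveat of the "return None for no match" convention: a check must never legitimately produce `None` as a result.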

    --
    Gabriel Genellina
    Gabriel Genellina, Dec 18, 2007
    #3
  4. tomasz

    grflanagan Guest

    On Dec 18, 1:41 pm, tomasz <> wrote:
    > The task is to check a string against a number of different patterns
    > (containing groupings).
    > For each pattern, different actions need to be taken.
    >


    In the `re.sub` function (and `sub` method of regex object), the
    `repl` parameter can be a callback function as well as a string:

    http://docs.python.org/lib/node46.html

    Does that help?

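    E.g.:

    import logging
    import re

    log = logging.getLogger(__name__)

    def multireplace(text, mapping):
        # One alternation of all the (escaped) keys
        rx = re.compile('|'.join(re.escape(key) for key in mapping))
        def callback(match):
            key = match.group(0)
            repl = mapping[key]
            log.info("Replacing '%s' with '%s'", key, repl)
            return repl
        # subn() returns a (new_text, number_of_substitutions) tuple
        return rx.subn(callback, text)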

    (I'm not sure, but I think I adapted this from: http://effbot.org/zone/python-replace.htm)
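A small sketch of a callback that branches on which alternative matched, using the match object's `lastgroup` attribute (the name of the last group that matched). The pattern, the group names `num` and `word`, and the callback are hypothetical; `lastgroup` is only a reliable dispatch key here because each alternative contains exactly one group:

```python
import re

# Hypothetical pattern: one named group per alternative
rx = re.compile(r'(?P<num>\d+)|(?P<word>[A-Za-z]+)')

def double_or_upper(match):
    # re.sub() calls this once per match; the return value replaces the match
    if match.lastgroup == 'num':
        return str(int(match.group('num')) * 2)
    return match.group('word').upper()

result = rx.sub(double_or_upper, 'abc 12 de 7')
print(result)  # ABC 24 DE 14
```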

    Gerard
    grflanagan, Dec 18, 2007
    #4
  5. tomasz

    Tim Chase Guest

    > Define a small function with each test+action, and iterate over them
    > until a match is found:


    Or, one could even create a mapping of regexps->functions:

    import re

    def function1(match):
        return do_something_with(match)

    def function2(match):
        return do_something_with(match)

    def default_function(input):
        return do_something_with(input)

    function_mapping = (
        (re.compile(pattern1), function1),
        (re.compile(pattern2), function2),
        (re.compile(pattern3), function1),
    )

    def match_and_do(input, mapping):
        for regex, func in mapping:
            # match() anchors at the start of input; use search() to look anywhere
            m = regex.match(input)
            if m:
                return func(m)
        return default_function(input)

    result = match_and_do("Hello world", function_mapping)

    In addition to having a clean separation between patterns and
    functions, and the mapping between them, this also allows wiring
    multiple patterns to the same function (e.g. pattern3->function1)
    and also allows specification of the mapping evaluation order.
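A runnable sketch of the table approach using the Ruby example's actions. The handler names, `HANDLERS`, and the extra `\d+` pattern (wired to the same handler as `(c)`) are hypothetical. It uses `search()` instead of `match()`, since `match()` only matches at the start of the string while Ruby's `=~` searches anywhere:

```python
import re

def keep_group(m):
    # Shared handler: return just the first group (like Ruby's $1)
    return m.group(1)

def prematch_plus_group(m):
    # Like Ruby's $` + $1: the text before the match, then group 1
    return m.string[:m.start()] + m.group(1)

# Hypothetical table of (compiled regex, handler), checked in order
HANDLERS = (
    (re.compile(r'(b)'), prematch_plus_group),
    (re.compile(r'(c)'), keep_group),
    (re.compile(r'(\d+)'), keep_group),   # second pattern, same handler
)

def match_and_do(text, handlers, default=lambda text: text):
    for regex, func in handlers:
        m = regex.search(text)   # search(), not match(), to look anywhere
        if m:
            return func(m)
    return default(text)
```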

    -tkc
    Tim Chase, Dec 18, 2007
    #5
  6. tomasz <> writes:

    > here is a piece of pseudo-code (taken from Ruby) that illustrates the
    > problem I'd like to solve in Python:

    [...]

    I asked the very same question in
    http://groups.google.com/group/comp.lang.python/browse_frm/thread/3e8da954ff2265e/4deb5631ade8b393
    It seems that people either write more elaborate constructs or learn
    to tolerate the nesting.

    > Is there an alternative to it?


    A simple workaround is to write a trivial function that returns a
    boolean, and also stores the match object in either a global storage
    or an object. It's not really elegant, especially in smaller scripts,
    but it works:

    import re

    def search(pattern, s, store):
        match = re.search(pattern, s)
        store.match = match
        return match is not None

    class MatchStore(object):
        pass  # irrelevant, any object with a 'match' attr would do

    where = MatchStore()
    if search(pattern1, s, where):
        ...  # pattern1 matched, match object in where.match
    elif search(pattern2, s, where):
        ...  # pattern2 matched, match object in where.match
    # ... and so on for further patterns
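The same idea can be packaged as a small class whose method both performs the search and remembers the match object, so the if/elif chain reads almost like the Ruby version. This is a sketch; `Matcher` and `transform_stored` are hypothetical names, not part of the standard library:

```python
import re

class Matcher:
    """Hypothetical helper: remembers the last match object so it can be
    used inside an if/elif chain."""

    def __init__(self, s):
        self.s = s
        self.m = None

    def search(self, pattern):
        # Store the match (or None) and report whether it succeeded
        self.m = re.search(pattern, self.s)
        return self.m is not None

def transform_stored(s):
    mt = Matcher(s)
    if mt.search(r'(b)'):
        return s[:mt.m.start()] + mt.m.group(1)   # like Ruby's $` + $1
    elif mt.search(r'(a)'):
        return mt.m.group(1) + s[mt.m.end():]     # like Ruby's $1 + $'
    elif mt.search(r'(c)'):
        return mt.m.group(1)                      # like Ruby's $1
    return s
```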
    Hrvoje Niksic, Dec 18, 2007
    #6
  7. tomasz

    Duncan Booth Guest

    tomasz <> wrote:

    > Is there an alternative to it? Am I missing something? Python doesn't
    > have special variables $1, $2 (right?) so you must assign the result
    > of a match to a variable, to be able to access the groups.


    Look for repetition in your code and remove it. That will almost always
    remove the nesting. Or, combine your regular expressions into one large
    expression and branch on the existence of relevant groups. Using named
    groups keeps all your code from breaking just because you need to change
    one part of the regex.

    e.g. This would handle your example, but it is just one way to do it:

    import re
    from string import Template

    def sub(patterns, s):
        for pat, repl in patterns:
            m = re.match(pat, s)
            if m:
                return Template(repl).substitute(m.groupdict())
        return s

    PATTERNS = [
        (r'(?P<start>.*?)(?P<b>b+)', 'start=$start, b=$b'),
        (r'(?P<a>a+)(?P<tail>.*)$', 'Got a: $a, tail=$tail'),
        (r'(?P<c>c+)', 'starts with c: $c'),
    ]

    >>> sub(PATTERNS, 'abc')
    'start=a, b=b'
    >>> sub(PATTERNS, 'is a something')
    'is a something'
    >>> sub(PATTERNS, 'a something')
    'Got a: a, tail= something'
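The single-combined-expression variant mentioned at the top of this post could look roughly like this for the Ruby example. It is a sketch under assumptions: `COMBINED` and `transform_combined` are hypothetical names, and because `.` does not match a newline by default, it assumes single-line input. Called with `match()`, the alternatives are tried in order from position 0, and the lazy `.*?` lets each one find its letter anywhere on the line, so the first alternative wins just as in an if/elif chain. The branch is chosen by testing which named group is not `None`:

```python
import re

# Hypothetical combined pattern: one alternative per original regex
COMBINED = re.compile(
    r'(?P<pre>.*?)(?P<b>b)'        # like /(b)/, keeping the pre-match
    r'|.*?(?P<a>a)(?P<post>.*)'    # like /(a)/, keeping the post-match
    r'|.*?(?P<c>c)'                # like /(c)/
)

def transform_combined(s):
    m = COMBINED.match(s)
    if m is None:
        return s                                # nothing matched
    if m.group('b') is not None:
        return m.group('pre') + m.group('b')    # like $` + $1
    if m.group('a') is not None:
        return m.group('a') + m.group('post')   # like $1 + $'
    return m.group('c')                         # like $1
```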
    Duncan Booth, Dec 18, 2007
    #7
  8. On Dec 18, 4:41 am, tomasz <> wrote:
    > Is there an alternative to it? Am I missing something? Python doesn't
    > have special variables $1, $2 (right?) so you must assign the result
    > of a match to a variable, to be able to access the groups.
    >
    > I'd appreciate any hints.
    >


    Don't use regexes for something as simple as this. Try find().

    Most of the time I use regexes in Perl (90%+), I am doing something
    that can be done much better with string methods and a few simple
    operations. Plus, it usually turns out to be faster than Perl.
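For the letters-only example from the first post, a find()-based sketch might look like this; the function name `transform_find` is hypothetical. Each branch returns, so no nesting is needed:

```python
def transform_find(s):  # hypothetical name, for illustration only
    i = s.find('b')
    if i != -1:
        return s[:i] + 'b'        # text before 'b', then 'b' (like $` + $1)
    i = s.find('a')
    if i != -1:
        return 'a' + s[i + 1:]    # 'a', then the text after it (like $1 + $')
    if 'c' in s:
        return 'c'                # like $1
    return s                      # nothing found: unchanged
```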
    Jonathan Gardner, Dec 18, 2007
    #8
