best way to do a series of regexp checks with groups

Discussion in 'Python' started by Mark Fanty, Jan 23, 2005.

  1. Mark Fanty

    Mark Fanty Guest

    In perl, I might do (made up example just to illustrate the point):

    if(/add (\d+) (\d+)/) {
        do_add($1, $2);
    } elsif (/mult (\d+) (\d+)/) {
        do_mult($1,$2);
    } elsif(/help (\w+)/) {
        show_help($1);
    }

    or even

    do_add($1,$2) if /add (\d+) (\d+)/;
    do_mult($1,$2) if /mult (\d+) (\d+)/;
    show_help($1) if /help (\w+)/;

    How can I best do this in Python? Attempt 1:

    m = re.search(r'add (\d+) (\d+)', line)
    if m:
        do_add(m.group(1), m.group(2))
    else:
        m = re.search(r'mult (\d+) (\d+)', line)
        if m:
            do_mult(m.group(1), m.group(2))
        else:
            m = re.search(r'help (\w+)', line)
            if m:
                show_help(m.group(1))

    The increasing nesting is a problem. I could put them in a while loop just
    so I can use break

    while 1:
        m = re.search(r'add (\d+) (\d+)', line)
        if m:
            do_add(m.group(1), m.group(2))
            break
        m = re.search(r'mult (\d+) (\d+)', line)
        if m:
            do_mult(m.group(1), m.group(2))
            break
        m = re.search(r'help (\w+)', line)
        if m:
            show_help(m.group(1))
        break  # always leave after one pass, matched or not

    No nesting, but the while is misleading since I'm not looping, and this is a
    bit awkward. I don't mind a few more keystrokes, but I'd like clarity. I
    wish I could do

    if m = re.search(r'add (\d+) (\d+)', line):
        do_add(m.group(1), m.group(2))
    elif m = re.search(r'mult (\d+) (\d+)', line):
        do_mult(m.group(1), m.group(2))
    elif m = re.search(r'help (\w+)', line):
        show_help(m.group(1))

    Now that's what I'm looking for, but I can't put the assignment in an
    expression. Any recommendations? Less "tricky" is better. Not having to
    import some personal module with a def to help would be better (e.g. for
    sharing).
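
For readers on a newer interpreter: Python 3.8 added assignment expressions (PEP 572), which allow almost exactly the chain wished for above, with `:=` in place of `=`. A minimal sketch, assuming Python 3.8 or later; `do_add`, `do_mult`, `show_help` and `handle` are hypothetical stand-ins that return a value so the flow can be checked:

```python
import re

# Hypothetical handlers standing in for the poster's do_add/do_mult/show_help.
def do_add(a, b):
    return int(a) + int(b)

def do_mult(a, b):
    return int(a) * int(b)

def show_help(topic):
    return "help for " + topic

def handle(line):
    # ":=" binds m and yields the match object (or None) in one expression.
    if m := re.search(r'add (\d+) (\d+)', line):
        return do_add(m.group(1), m.group(2))
    elif m := re.search(r'mult (\d+) (\d+)', line):
        return do_mult(m.group(1), m.group(2))
    elif m := re.search(r'help (\w+)', line):
        return show_help(m.group(1))
    return None  # nothing matched
```

No helper class or extra module is needed, which fits the "nothing to import for sharing" constraint, at the cost of requiring a recent Python.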

    Thanks
    Mark Fanty, Jan 23, 2005
    #1

  2. Mark Fanty

    Guest

    What about something like this?
    >>> import re
    >>> m = re.match(r"""(?P<operator>add|mult) (?P<int_1>\d+) (?P<int_2>\d+)""", 'add 3 5')
    >>> from operator import add, mul
    >>> op = {'add': add, 'mult': mul}
    >>> op[m.groupdict()['operator']](int(m.groupdict()['int_1']), int(m.groupdict()['int_2']))
    8
    , Jan 23, 2005
    #2

  3. Mark Fanty wrote:
    > In perl, I might do (made up example just to illustrate the point):
    >
    > if(/add (\d+) (\d+)/) {
    >     do_add($1, $2);
    > } elsif (/mult (\d+) (\d+)/) {
    >     do_mult($1,$2);
    > } elsif(/help (\w+)/) {
    >     show_help($1);
    > }


    There was a thread about this recently under the title

    "regular expression: perl ==> python"

    Here is a different solution...

    class Result:
        def set(self, value):
            self.value = value
            return value

    m = Result()

    if m.set(re.search(r'add (\d+) (\d+)', line)):
        do_add(m.value.group(1), m.value.group(2))
    elif m.set(re.search(r'mult (\d+) (\d+)', line)):
        do_mult(m.value.group(1), m.value.group(2))
    elif m.set(re.search(r'help (\w+)', line)):
        show_help(m.value.group(1))

    --
    Nick Craig-Wood -- http://www.craig-wood.com/nick
    Nick Craig-Wood, Jan 24, 2005
    #3
  4. Nick Craig-Wood wrote:

    > Here is a different solution...
    >
    > class Result:
    >     def set(self, value):
    >         self.value = value
    >         return value
    >
    > m = Result()
    >
    > if m.set(re.search(r'add (\d+) (\d+)', line)):
    >     do_add(m.value.group(1), m.value.group(2))
    > elif m.set(re.search(r'mult (\d+) (\d+)', line)):
    >     do_mult(m.value.group(1), m.value.group(2))
    > elif m.set(re.search(r'help (\w+)', line)):
    >     show_help(m.value.group(1))


    This is roughly the same as my Cookbook recipe for test-and-set, but if
    all you're using it for is RE search and MO access you might be better
    off giving more responsibilities to your auxiliary class, such as:

    class ReWithMemory(object):
        def search(self, are, aline):
            self.mo = re.search(are, aline)
            return self.mo
        def group(self, n):
            return self.mo.group(n)

    m = ReWithMemory()

    if m.search(r'add (\d+) (\d+)', line):
        do_add(m.group(1), m.group(2))
    elif m.search(r'mult (\d+) (\d+)', line):
        do_mult(m.group(1), m.group(2))
    elif m.search(r'help (\w+)', line):
        show_help(m.group(1))

    Demeter's Law suggests that the 'm.value.group' accesses in your
    approach are better handled by having m delegate to its `value'; and the
    repeated m.set(re.search( ... seem to be a slight code smell, violating
    "once and only once", which suggests merging into a single `set' method.
    Your approach is more general, of course.


    Alex
    Alex Martelli, Jan 24, 2005
    #4
  5. Duncan Booth

    Duncan Booth Guest

    Mark Fanty wrote:

    > No nesting, but the while is misleading since I'm not looping and this
    > is a bit awkward. I don't mind a few more key strokes, but I'd like
    > clarity. I wish I could do
    >
    > if m = re.search(r'add (\d+) (\d+)', line):
    >     do_add(m.group(1), m.group(2))
    > elif m = re.search(r'mult (\d+) (\d+)', line):
    >     do_mult(m.group(1), m.group(2))
    > elif m = re.search(r'help (\w+)', line):
    >     show_help(m.group(1))
    >
    > Now that's what I'm looking for, but I can't put the assignment in an
    > expression. Any recommendations? Less "tricky" is better.


    Try thinking along the following lines. It is longer, but clearer and
    easily extended to more commands. For more complete command processing use
    the 'cmd' module.

    import sys

    class Command:
        def do_add(self, a, b):
            '''add <number> <number>'''
            return int(a)+int(b)

        def do_mult(self, a, b):
            '''mult <number> <number>'''
            return int(a)*int(b)

        def do_help(self, *what):
            '''help [words] - give some help'''
            if not what:
                what = sorted(s[3:] for s in dir(self) if s.startswith('do_'))
            def error(): '''Unknown command'''
            for w in what:
                cmd = getattr(self, 'do_'+w, error)
                print("Help for %r:\n%s\n" % (w, cmd.__doc__))

        def do_exit(self):
            '''exit - the program'''
            sys.exit(0)

        def __call__(self, line):
            words = line.split()
            if not words:
                return
            command = words.pop(0)
            cmdfn = getattr(self, 'do_'+command, None)
            if not cmdfn:
                print("Unknown command %r. Use 'help' for help" % command)
                return

            result = None
            try:
                result = cmdfn(*words)
            except TypeError as msg:
                print(msg)
            if result is not None:
                print("result is", result)

    cmd = Command()

    while 1:
        line = sys.stdin.readline()
        if not line:  # empty string means end of input
            break
        cmd(line)
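
For comparison, here is roughly how the same commands might look with the standard `cmd` module mentioned above. This is only a sketch: the `Calculator` class and its prompt are made up for illustration, and unlike the hand-rolled version each `do_*` method receives the rest of the line as one string.

```python
import cmd
import io

class Calculator(cmd.Cmd):
    """Hypothetical calculator shell; names are illustrative only."""
    prompt = "calc> "

    def do_add(self, arg):
        """add <number> <number>"""
        a, b = arg.split()
        print(int(a) + int(b), file=self.stdout)

    def do_mult(self, arg):
        """mult <number> <number>"""
        a, b = arg.split()
        print(int(a) * int(b), file=self.stdout)

    def do_exit(self, arg):
        """exit - leave the shell"""
        return True  # a true return value stops cmdloop()

# Feed commands one at a time with onecmd(); Calculator().cmdloop() would
# instead read them interactively from stdin.
out = io.StringIO()
shell = Calculator(stdout=out)
shell.onecmd("add 3 5")
shell.onecmd("mult 3 5")
```

The module supplies the `help` command, prompt handling and the read loop, so only the `do_*` methods need writing.
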
    Duncan Booth, Jan 24, 2005
    #5
  6. Alex Martelli wrote:
    > class ReWithMemory(object):
    >     def search(self, are, aline):
    >         self.mo = re.search(are, aline)
    >         return self.mo
    >     def group(self, n):
    >         return self.mo.group(n)
    >
    > m = ReWithMemory()
    >
    > if m.search(r'add (\d+) (\d+)', line):
    >     do_add(m.group(1), m.group(2))
    > elif m.search(r'mult (\d+) (\d+)', line):
    >     do_mult(m.group(1), m.group(2))
    > elif m.search(r'help (\w+)', line):
    >     show_help(m.group(1))
    >
    > Demeter's Law suggests that the 'm.value.group' accesses in your
    > approach are better handled by having m delegate to its `value'; and the
    > repeated m.set(re.search( ... seem to be a slight code smell, violating
    > "once and only once", which suggests merging into a single `set' method.
    > Your approach is more general, of course.


    I get a bit uneasy from the repeated calls to m.group... If I was going
    to build a class around the re, I think I might lean towards something like:

    class ReWithMemory(object):
        def search(self, are, aline):
            self.mo = re.search(are, aline)
            return self.mo
        def groups(self, *indices):
            return [self.mo.group(i) for i in indices]

    m = ReWithMemory()

    if m.search(r'add (\d+) (\d+)', line):
        do_add(*m.groups(1, 2))
    elif m.search(r'mult (\d+) (\d+)', line):
        do_mult(*m.groups(1, 2))
    elif m.search(r'help (\w+)', line):
        show_help(*m.groups(1))

    Of course, this is even less general-purpose than yours...
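
It may be worth noting that the standard match object already covers part of this: `group()` accepts several indices at once and returns a tuple, and `groups()` returns every group in order. A small sketch (`do_add` here is a hypothetical stand-in that returns the sum):

```python
import re

def do_add(a, b):  # hypothetical handler
    return int(a) + int(b)

m = re.search(r'add (\d+) (\d+)', 'add 3 5')

pair = m.group(1, 2)       # several indices at once give a tuple
everything = m.groups()    # all groups, in order

total = do_add(*m.groups())
```

So the wrapper class only has to remember the match; the unpacking can be left to the match object itself.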

    (And if I saw myself using this much regex code, I'd probably reconsider
    my strategy anyway.) ;)

    Steve
    Steven Bethard, Jan 24, 2005
    #6
  7. Steven Bethard wrote:

    > I get a bit uneasy from the repeated calls to m.group... If I was going
    > to build a class around the re, I think I might lean towards something like:
    >
    > class ReWithMemory(object):
    >     def search(self, are, aline):
    >         self.mo = re.search(are, aline)
    >         return self.mo
    >     def groups(self, *indices):
    >         return [self.mo.group(i) for i in indices]
    >
    > m = ReWithMemory()
    >
    > if m.search(r'add (\d+) (\d+)', line):
    >     do_add(*m.groups(1, 2))
    > elif m.search(r'mult (\d+) (\d+)', line):
    >     do_mult(*m.groups(1, 2))
    > elif m.search(r'help (\w+)', line):
    >     show_help(*m.groups(1))
    >
    > Of course, this is even less general-purpose than yours...


    I'm not sure what advantage it's supposed to give. Would you have any
    problems writing, say, somecall(X[1], X[2]) ...? Python normally relies
    on indexing one thing at a time, and I see calling m.group(1) etc as
    just the same kind of approach.


    > (And if I saw myself using this much regex code, I'd probably reconsider
    > my strategy anyway.) ;)


    Surely joining all the regexp's into one big one with | would be faster
    and more compact, but, with variable numbers of groups per sub-regexp,
    determining which regexp matched can perhaps be tricky (issues with
    matching mo.lastindex to the correct sub-regexp). So, I can understand
    the desire to do it sequentially, regexp by regexp.
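
One way around the `lastindex` bookkeeping is to give each alternative a named outer group and read `mo.lastgroup`, which holds the name of the last capturing group to close (the outer one, since it closes after its inner groups). A rough sketch; the group names, the `ARGS` table and `classify` are made up for illustration:

```python
import re

# Each alternative is wrapped in a named group; inner groups get unique names.
COMMANDS = re.compile(
    r"(?P<add>add (?P<add_a>\d+) (?P<add_b>\d+))"
    r"|(?P<mult>mult (?P<mult_a>\d+) (?P<mult_b>\d+))"
    r"|(?P<help>help (?P<help_topic>\w+))"
)

# Which inner groups carry the arguments of each command.
ARGS = {
    "add": ("add_a", "add_b"),
    "mult": ("mult_a", "mult_b"),
    "help": ("help_topic",),
}

def classify(line):
    """Return (command_name, argument_tuple), or None if nothing matched."""
    mo = COMMANDS.search(line)
    if mo is None:
        return None
    name = mo.lastgroup  # name of the alternative that matched
    return name, tuple(mo.group(g) for g in ARGS[name])
```

The price is a less readable pattern and an extra table, so the sequential version may still be the clearer choice for a handful of commands.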


    Alex
    Alex Martelli, Jan 24, 2005
    #7
  8. Mark Fanty wrote:
    > In perl, I might do (made up example just to illustrate the point):
    >
    > if(/add (\d+) (\d+)/) {
    >     do_add($1, $2);
    > } elsif (/mult (\d+) (\d+)/) {
    >     do_mult($1,$2);
    > } elsif(/help (\w+)/) {
    >     show_help($1);
    > }
    >
    > or even
    >
    > do_add($1,$2) if /add (\d+) (\d+)/;
    > do_mult($1,$2) if /mult (\d+) (\d+)/;
    > show_help($1) if /help (\w+)/;



    Here's some Python code (tested).

    It is not as concise as the Perl code.
    Which might or might not be a disadvantage.

    Sometimes, regular expressions are not the right thing.
    For example, a simple str.startswith() might be better.

    What about "add 9999999999999999999999999 99999999999999999999999"?
    Maybe we want to catch the error before we get to the do_add.
    Can't easily do that with regular expressions.
    And what about a variable number of arguments?
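
As an illustration of the points above, here is a sketch of an `add` parser that uses `str.startswith()` and `split()` instead of a regular expression, accepts any number of arguments, and rejects bad input before a handler would see it. The function name `parse_add` and the `MAX_VALUE` limit are assumptions chosen only for the example:

```python
MAX_VALUE = 10**9  # hypothetical limit, chosen only for illustration

def parse_add(line):
    """Return a tuple of ints for an 'add' command, or None for other lines."""
    if not line.startswith("add "):
        return None
    words = line.split()[1:]
    if not words:
        raise ValueError("add needs at least one number")
    numbers = [int(w) for w in words]  # int() raises ValueError on non-numbers
    for n in numbers:
        if abs(n) > MAX_VALUE:
            raise ValueError("number out of range: %d" % n)
    return tuple(numbers)
```

With this, the oversized example line raises `ValueError` during parsing, and `add 1 2 3` is accepted without changing any pattern.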

    If regular expressions are no longer used, the Perl code seems
    to lose some of its elegance.


    I've been arguing for writing small, simple functions that do something.
    This should make testing much easier.
    These functions might happen to use regular expressions.


    The code below is clearly more flexible.
    It's easy, for example, to add a new command.
    Just add an entry to dispatch.

    The thing I like best about it is the passing of a dict.

    ===
    #!/usr/bin/python

    import re

    # here we know about functions and patterns
    def do_add(arg1, arg2): print("+ %s %s" % (arg1, arg2))
    def do_times(arg1, arg2): print("* %s %s" % (arg1, arg2))

    add_re = re.compile(r'add (?P<arg1>.*) (?P<arg2>.*)')
    times_re = re.compile(r'times (?P<arg1>.*) (?P<arg2>.*)')

    def find_add(str):
        match = add_re.match(str)
        if match is None:
            return match
        return match.groupdict()

    def find_times(str):
        match = times_re.match(str)
        if match is None:
            return match
        return match.groupdict()


    # here we bind everything together
    dispatch = [
        (find_add, do_add),
        (find_times, do_times),
        ]

    def doit(str):
        for (find, do) in dispatch:
            d = find(str)
            if d is not None:
                return do(**d)
        return None # or error

    if __name__ == '__main__':

        doit('add this that')
        doit('times this that')

    ===
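
If the `find_*` functions all have the same shape, the table can hold the compiled patterns directly. This is a variation on the code above, not the version that was tested; the handlers return strings rather than printing, and `\S+` replaces `.*` as an assumption that arguments contain no spaces:

```python
import re

# Hypothetical handlers that return their result instead of printing it.
def do_add(arg1, arg2):
    return "+ %s %s" % (arg1, arg2)

def do_times(arg1, arg2):
    return "* %s %s" % (arg1, arg2)

# One row per command: adding a command means adding one row.
DISPATCH = [
    (re.compile(r'add (?P<arg1>\S+) (?P<arg2>\S+)'), do_add),
    (re.compile(r'times (?P<arg1>\S+) (?P<arg2>\S+)'), do_times),
]

def doit(line):
    for pattern, handler in DISPATCH:
        match = pattern.match(line)
        if match is not None:
            # Named groups become keyword arguments of the handler.
            return handler(**match.groupdict())
    return None  # no command recognised
```

Keeping separate `find_*` functions, as in the original, remains the better fit when some commands are not matched by a regular expression at all.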


    Jonathan
    Jonathan Fine, Jan 24, 2005
    #8
  9. Mark Fanty

    Mark Fanty Guest

    This is the kind of thing I meant. I think I have to get used to writing
    small, light-weight classes. You inspired this variation which is a little
    more verbose in the class definition, but less so in the use:

    class Matcher:
        def search(self, r,s):
            self.value = re.search(r,s)
            return self.value
        def __getitem__(self, i):
            return self.value.group(i)

    m = Matcher()

    if m.search(r'add (\d+) (\d+)', line):
        do_add(m[1], m[2])
    elif m.search(r'mult (\d+) (\d+)', line):
        do_mult(m[1], m[2])
    elif m.search(r'help (\w+)', line):
        show_help(m[1])

    As for using regular expressions too much... they are why I've liked perl so
    much for quick file processing for years. I don't like perl objects at all,
    which is why I'm trying python, but the re package has not been my favorite
    so far...

    "Nick Craig-Wood" wrote:
    >
    > There was a thread about this recently under the title
    >
    > "regular expression: perl ==> python"
    >
    > Here is a different solution...
    >
    > class Result:
    >     def set(self, value):
    >         self.value = value
    >         return value
    >
    > m = Result()
    >
    > if m.set(re.search(r'add (\d+) (\d+)', line)):
    >     do_add(m.value.group(1), m.value.group(2))
    > elif m.set(re.search(r'mult (\d+) (\d+)', line)):
    >     do_mult(m.value.group(1), m.value.group(2))
    > elif m.set(re.search(r'help (\w+)', line)):
    >     show_help(m.value.group(1))
    >
    > --
    > Nick Craig-Wood -- http://www.craig-wood.com/nick
    Mark Fanty, Jan 25, 2005
    #9