regex: multiple matching for one string

Discussion in 'Python' started by scriptlearner, Jul 23, 2009.

  1. For example, I have a string "#a=valuea;b=valueb;c=valuec;", and I
    would like to extract the values (valuea, valueb, and valuec). How do
    I do that in Python? The group method only returns the matched
    part. Thanks.

    p = re.compile('#a=*;b=*;c=*;')
    m = p.match(line)
    if m:
    scriptlearner, Jul 23, 2009

  2. rurpy Guest

    p = re.compile('#a=([^;]*);b=([^;]*);c=([^;]*);')
    m = p.match(line)
    if m:
        print(m.groups())  # the three captured values

    Note that "=*;" in your regex will match
    zero or more "=" characters followed by ";" -- probably
    not what you intended.

    "[^;]*" will match any string up to the next
    ";" character, which will be a value (assuming
    you don't have or care about embedded whitespace).

    You might also want to consider using an r'...'
    string for the regex, which will make including
    backslash characters easier if you need them
    at some future time.
    rurpy, Jul 23, 2009
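
    A minimal sketch of the point made in this reply, written in Python 3 syntax and assuming the sample string from the question. It shows the original pattern failing and the capturing-group pattern returning the three values; the variable names are illustrative only.

```python
import re

line = "#a=valuea;b=valueb;c=valuec;"

# In the original pattern, "=*" means zero or more "=" characters,
# so nothing can consume the value text and the match fails.
broken = re.compile(r'#a=*;b=*;c=*;')
print(broken.match(line))  # None

# Each "([^;]*)" captures everything up to the next ";".
pattern = re.compile(r'#a=([^;]*);b=([^;]*);c=([^;]*);')
m = pattern.match(line)
if m:
    values = m.groups()
    print(values)  # ('valuea', 'valueb', 'valuec')
```

    Individual values are also available as m.group(1), m.group(2) and m.group(3).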

  3. IMHO a regex for this is overkill; a combination of string methods such
    as split and find should suffice.

    Mark Lawrence, Jul 23, 2009
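
    One possible reading of the string-method suggestion above, sketched in Python 3 syntax. It uses split and partition rather than find, which is a substitution made here for brevity, and it assumes the input is already known to be well formed, so it does no format checking.

```python
line = "#a=valuea;b=valueb;c=valuec;"

# Drop the leading '#' and the trailing ';', then split into "name=value" items.
items = line.lstrip('#').rstrip(';').split(';')

# partition('=') splits each item at the first '=' only.
pairs = {}
for item in items:
    name, sep, value = item.partition('=')
    pairs[name] = value

print(pairs['a'], pairs['b'], pairs['c'])  # valuea valueb valuec
```
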
  4. Bill Davy Guest

    For the OP, it can be done with regex by grouping:

    p = re.compile(r'#a=(.*);b=(.*);c=(.*);')
    m = p.match(line)
    if m:
        print(m.group(1))  # has valuea in it, etc.

    This may not be the best way, but it is reasonably terse.
    Bill Davy, Jul 23, 2009
  5. tiefeng wu Guest

    maybe like this:....

    tiefeng wu
    tiefeng wu, Jul 23, 2009
  6. rurpy Guest

    You're saying that something like the following
    is better than the simple regex used by the OP?

    values = []
    parts = line.split(';')
    if len(parts) != 4: raise SomeError()  # SomeError: placeholder exception
    # parts[:-1] drops the empty string left after the trailing ';'
    for p, expected in zip (parts[:-1], ('#a','b','c')):
        name, x, value = p.partition ('=')
        if name != expected or x != '=':
            raise SomeError()
        values.append (value)
    print values[0], values[1], values[2]

    Blech, not in my book. The regex checks the
    format of the string, extracts the values, and
    does so very clearly. Further, it is easily
    adapted to other similar formats, or evolutionary
    changes in format. It is also (once one is
    familiar with regexes -- a useful skill outside
    of Python too) easier to get right (at least in
    a simple case like this.)
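
    To illustrate the adaptability claim, here is a small sketch in Python 3 syntax. build_pattern is a hypothetical helper, not something from this thread; it assumes every field has the form name=value; and that the fields appear in a fixed order after a leading '#'.

```python
import re

def build_pattern(names):
    # Hypothetical helper: one "name=([^;]*);" group per expected field, in order.
    fields = ''.join(re.escape(name) + r'=([^;]*);' for name in names)
    return re.compile('#' + fields)

line = "#a=valuea;b=valueb;c=valuec;"

three = build_pattern(['a', 'b', 'c'])
print(three.match(line).groups())

# An evolved format with a fourth field only needs a longer name list.
four = build_pattern(['a', 'b', 'c', 'd'])
print(four.match(line + "d=valued;").groups())
```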

    The only reason I can think of to prefer
    a split-based solution is if this code were
    performance-critical in that I would expect
    the split code to be faster (although I don't
    know that for sure.)

    This is a perfectly fine use of a regex.
    rurpy, Jul 24, 2009
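
    The performance guess in the post above can be checked rather than assumed. The following is a rough sketch in Python 3 syntax using the standard timeit module; the function names are illustrative, ValueError stands in for the placeholder SomeError, and the timings depend on the machine and Python version, so no result is claimed here.

```python
import re
import timeit

line = "#a=valuea;b=valueb;c=valuec;"
PATTERN = re.compile(r'#a=([^;]*);b=([^;]*);c=([^;]*);')

def with_regex(text):
    # Validate and extract in one step with the precompiled pattern.
    m = PATTERN.match(text)
    if m is None:
        raise ValueError("unexpected format")
    return list(m.groups())

def with_split(text):
    # Same checks as the split-based code in the post above.
    parts = text.split(';')
    if len(parts) != 4:
        raise ValueError("unexpected format")
    values = []
    for part, expected in zip(parts[:-1], ('#a', 'b', 'c')):
        name, sep, value = part.partition('=')
        if name != expected or sep != '=':
            raise ValueError("unexpected format")
        values.append(value)
    return values

regex_time = timeit.timeit(lambda: with_regex(line), number=100000)
split_time = timeit.timeit(lambda: with_split(line), number=100000)
print("regex: %.3fs  split: %.3fs" % (regex_time, split_time))
```
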
  7. rurpy Guest

    OK, that seems like a good solution. It certainly
    wasn't an obvious solution to me. I still have no
    problem maintaining that

    line = "#a=valuea;b=valueb;c=valuec;"
    m = re.match ('#a=(.*);b=(.*);c=(.*);', line)
    (If you want checking code, nothing else required.)

    is still simpler and clearer (with the obvious
    caveat that one is familiar with regexes.)

    Fact? Maybe you should have read the whole thread before
    spewing claims that I did not see the regex problem.
    The fact that you did not bother to weakens any claims
    you make in this thread.
    (Of course this line of argumentation is stupid anyway --
    even had I not noticed the problem, it would say nothing
    about the general case. My advice to you is not to try
    to extrapolate when the sample size is one.)
    rurpy, Jul 25, 2009
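
    The pattern in this post uses (.*) where the earlier reply used ([^;]*). On the sample string both give the same groups, but they can differ on other input. The sketch below, in Python 3 syntax, uses a hypothetical line with a repeated c field, chosen only to expose the difference.

```python
import re

# Hypothetical input with a repeated "c" field.
line = "#a=1;b=2;c=3;c=4;"

# ".*" is greedy and may run across ";" separators.
greedy = re.match(r'#a=(.*);b=(.*);c=(.*);', line)

# "[^;]*" can never cross a ";", so each group stays within one field.
strict = re.match(r'#a=([^;]*);b=([^;]*);c=([^;]*);', line)

print(greedy.groups())  # ('1', '2;c=3', '4')
print(strict.groups())  # ('1', '2', '3')
```

    Which behaviour is wanted depends on whether values may legitimately contain ";".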