regex: multiple matching for one string

Discussion in 'Python' started by scriptlearner@gmail.com, Jul 23, 2009.

  1. Guest

    For example, I have a string "#a=valuea;b=valueb;c=valuec;", and I
    would like to extract the values (valuea, valueb, and valuec). How do
    I do that in Python? The group method only returns the matched
    part. Thanks.

    p = re.compile('#a=*;b=*;c=*;')
    m = p.match(line)
    if m:
        print m.group(),
     
    , Jul 23, 2009
    #1

  2. Guest

    On Jul 22, 7:45 pm, ""
    <> wrote:
    > For example, I have a string "#a=valuea;b=valueb;c=valuec;", and I
    > will like to take out the values (valuea, valueb, and valuec). How do
    > I do that in Python? The group method will only return the matched
    > part. Thanks.
    >
    > p = re.compile('#a=*;b=*;c=*;')
    > m = p.match(line)
    > if m:
    > print m.group(),


    p = re.compile('#a=([^;]*);b=([^;]*);c=([^;]*);')
    m = p.match(line)
    if m:
        print(m.group(1), m.group(2), m.group(3))

    Note that "=*;" in your regex will match
    zero or more "=" characters -- probably not
    what you intended.

    "[^;]*" will match any string up to the next
    ";" character, which will be a value (assuming
    you don't have or care about embedded whitespace).

    You might also want to consider using an r'...'
    string for the regex, which will make including
    backslash characters easier if you need them
    at some future time.
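
    A minimal sketch of the points above, assuming Python 3 and the sample
    string from the question. The names broken and fixed are only
    illustrative.

```python
import re

line = "#a=valuea;b=valueb;c=valuec;"

# "=*" means "zero or more '=' characters", so this pattern only matches
# strings such as "#a=;b==;c;", not the sample line.
broken = re.compile(r'#a=*;b=*;c=*;')
print(broken.match(line))                  # None
print(bool(broken.match("#a=;b==;c;")))    # True

# Parentheses create capture groups; [^;]* matches everything up to the
# next ';'. The r'...' prefix keeps backslashes literal if they are added later.
fixed = re.compile(r'#a=([^;]*);b=([^;]*);c=([^;]*);')
m = fixed.match(line)
if m:
    print(m.group(1), m.group(2), m.group(3))   # valuea valueb valuec
    print(m.groups())                           # ('valuea', 'valueb', 'valuec')
```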
     
    , Jul 23, 2009
    #2

  3. wrote:
    > For example, I have a string "#a=valuea;b=valueb;c=valuec;", and I
    > will like to take out the values (valuea, valueb, and valuec). How do
    > I do that in Python? The group method will only return the matched
    > part. Thanks.
    >
    > p = re.compile('#a=*;b=*;c=*;')
    > m = p.match(line)
    > if m:
    > print m.group(),


    IMHO a regex for this is overkill; a combination of string methods such
    as split and find should suffice.
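
    One possible shape for the string-method approach, as a sketch only.
    It assumes Python 3, uses partition rather than find, and does no
    checking of the field names.

```python
line = "#a=valuea;b=valueb;c=valuec;"

values = []
# Drop the leading '#' and the trailing ';', then split into "name=value" fields.
for field in line.lstrip('#').rstrip(';').split(';'):
    # partition splits on the first '=' only, so a value may itself contain '='.
    name, sep, value = field.partition('=')
    values.append(value)

print(values)   # ['valuea', 'valueb', 'valuec']
```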

    Regards.
     
    Mark Lawrence, Jul 23, 2009
    #3
  4. Bill Davy Guest

    "Mark Lawrence" <> wrote in message
    news:...
    > wrote:
    >> For example, I have a string "#a=valuea;b=valueb;c=valuec;", and I
    >> will like to take out the values (valuea, valueb, and valuec). How do
    >> I do that in Python? The group method will only return the matched
    >> part. Thanks.
    >>
    >> p = re.compile('#a=*;b=*;c=*;')
    >> m = p.match(line)
    >> if m:
    >> print m.group(),

    >
    > IMHO a regex for this is overkill, a combination of string methods such as
    > split and find should suffice.
    >
    > Regards.
    >



    For the OP, it can be done with regex by grouping:

    p = re.compile(r'#a=([^;]*);b=([^;]*);c=([^;]*);')
    m = p.match(line)
    if m:
        print(m.group(1))

    m.group(1) has valuea in it, etc.

    This may not be the best way, but it is reasonably terse.
     
    Bill Davy, Jul 23, 2009
    #4
  5. tiefeng wu Guest

    2009/7/23 <>:
    > For example, I have a string "#a=valuea;b=valueb;c=valuec;", and I
    > will like to take out the values (valuea, valueb, and valuec).  How do
    > I do that in Python?  The group method will only return the matched
    > part.  Thanks.
    >
    > p = re.compile('#a=*;b=*;c=*;')
    > m = p.match(line)
    >        if m:
    >             print m.group(),


    maybe like this:
    >>> p = re.compile(r'#?\w+=(\w+);')
    >>> l = re.findall(p, '#a=valuea;b=valueb;c=valuec;')
    >>> for r in l: print(r)

    ...
    valuea
    valueb
    valuec
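
    A variation on the findall idea, sketched under the assumption that
    the names are wanted as well as the values (Python 3; pairs and
    settings are illustrative names). Note that findall does not check the
    overall layout of the string, and \w+ will not match values containing
    punctuation or spaces.

```python
import re

line = "#a=valuea;b=valueb;c=valuec;"

# With two capture groups, findall returns a list of (name, value) tuples.
pairs = re.findall(r'(\w+)=(\w+);', line)
print(pairs)          # [('a', 'valuea'), ('b', 'valueb'), ('c', 'valuec')]

# A list of 2-tuples converts directly to a dict keyed by name.
settings = dict(pairs)
print(settings['b'])  # valueb
```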

    tiefeng wu
    2009-07-23
     
    tiefeng wu, Jul 23, 2009
    #5
  6. Guest

    Nick Dumas wrote:
    > Agreed. Two string.split()s, first at the semi-colon and then at the
    > equal sign, will yield you your value, without having to fool around
    > with regexes.
    >
    > On 7/23/2009 9:23 AM, Mark Lawrence wrote:
    >> wrote:
    >>> For example, I have a string "#a=valuea;b=valueb;c=valuec;", and I
    >>> will like to take out the values (valuea, valueb, and valuec). How do
    >>> I do that in Python? The group method will only return the matched
    >>> part. Thanks.
    >>>
    >>> p = re.compile('#a=*;b=*;c=*;')
    >>> m = p.match(line)
    >>> if m:
    >>> print m.group(),

    >>
    >> IMHO a regex for this is overkill, a combination of string methods such
    >> as split and find should suffice.


    You're saying that something like the following
    is better than the simple regex used by the OP?

    [untested]
    values = []
    parts = line.split(';')
    if len(parts) != 4: raise SomeError()
    # parts[:-1] skips the empty string left after the trailing ';'
    for p, expected in zip(parts[:-1], ('#a', 'b', 'c')):
        name, x, value = p.partition('=')
        if name != expected or x != '=':
            raise SomeError()
        values.append(value)
    print(values[0], values[1], values[2])

    Blech, not in my book. The regex checks the
    format of the string, extracts the values, and
    does so very clearly. Further, it is easily
    adapted to other similar formats, or evolutionary
    changes in format. It is also (once one is
    familiar with regexes -- a useful skill outside
    of Python too) easier to get right (at least in
    a simple case like this.)

    The only reason I can think of to prefer
    a split-based solution is if this code were
    performance-critical in that I would expect
    the split code to be faster (although I don't
    know that for sure.)
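
    The speed question can be checked rather than guessed. The following
    is a rough sketch using timeit, assuming Python 3; with_regex and
    with_split are illustrative names, and the timings will vary by
    machine and interpreter version.

```python
import re
import timeit

line = "#a=valuea;b=valueb;c=valuec;"
pattern = re.compile(r'#a=([^;]*);b=([^;]*);c=([^;]*);')

def with_regex():
    m = pattern.match(line)
    return m.groups() if m else None

def with_split():
    # No format checking here, so the two functions are not strictly equivalent.
    return tuple(f.partition('=')[2] for f in line.split(';') if f)

# Both approaches should agree on the sample line before timing them.
assert with_regex() == with_split()

for fn in (with_regex, with_split):
    seconds = timeit.timeit(fn, number=100_000)
    print(f"{fn.__name__}: {seconds:.3f} s for 100,000 calls")
```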

    This is a perfectly fine use of a regex.
     
    , Jul 24, 2009
    #6
  7. Guest

    Scott David Daniels wrote:
    > wrote:
    >> Nick Dumas wrote:
    >>> On 7/23/2009 9:23 AM, Mark Lawrence wrote:
    >>>> wrote:
    >>>>> For example, I have a string "#a=valuea;b=valueb;c=valuec;", and I
    >>>>> will like to take out the values (valuea, valueb, and valuec). How do
    >>>>> I do that in Python? The group method will only return the matched
    >>>>> part. Thanks.
    >>>>>
    >>>>> p = re.compile('#a=*;b=*;c=*;')
    >>>>> m = p.match(line)
    >>>>> if m:
    >>>>> print m.group(),
    >>>> IMHO a regex for this is overkill, a combination of string methods such
    >>>> as split and find should suffice.

    >>
    >> You're saying that something like the following
    >> is better than the simple regex used by the OP?
    >> [untested]
    >> values = []
    >> parts = line.split(';')
    >> if len(parts) != 4: raise SomeError()
    >> for p, expected in zip (parts[-1], ('#a','b','c')):
    >> name, x, value = p.partition ('=')
    >> if name != expected or x != '=':
    >> raise SomeError()
    >> values.append (value)
    >> print values[0], values[1], values[2]

    >
    > I call straw man: [tested]
    >     line = "#a=valuea;b=valueb;c=valuec;"
    >     d = dict(single.split('=', 1)
    >              for single in line.split(';') if single)
    >     d['#a'], d['b'], d['c']
    > If you want checking code, add:
    >     if len(d) != 3:
    >         raise ValueError('Wrong number of keys: %s in %r' % (
    >             sorted(d), line))


    OK, that seems like a good solution. It certainly
    wasn't an obvious solution to me. I still have no
    problem maintaining that

    [tested]
    line = "#a=valuea;b=valueb;c=valuec;"
    m = re.match('#a=(.*);b=(.*);c=(.*);', line)
    m.groups()
    (If you want checking code, nothing else required.)

    is still simpler and clearer (with the obvious
    caveat that one is familiar with regexes.)
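
    How much checking the regex does depends on the pattern. As a sketch
    (Python 3; the malformed string odd is a made-up input, and strict and
    greedy are illustrative names), the greedy (.*) form accepts a line
    that the [^;]* form posted earlier in the thread rejects:

```python
import re

strict = re.compile(r'#a=([^;]*);b=([^;]*);c=([^;]*);')
greedy = re.compile(r'#a=(.*);b=(.*);c=(.*);')

good = "#a=valuea;b=valueb;c=valuec;"
odd = "#a=x;b=y;b=z;c=w;"   # hypothetical malformed input with a repeated "b" field

# On well-formed input both patterns extract the same values.
print(strict.match(good).groups())  # ('valuea', 'valueb', 'valuec')
print(greedy.match(good).groups())  # ('valuea', 'valueb', 'valuec')

# On the malformed input, [^;]* cannot cross a ';', so the match fails,
# while the greedy .* swallows the extra field into the first value.
print(strict.match(odd))            # None
print(greedy.match(odd).groups())   # ('x;b=y', 'z', 'w')
```

    In either form, a failed match returns None, so calling groups() on
    the result raises AttributeError unless the match is tested first.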

    >> Blech, not in my book. The regex checks the
    >> format of the string, extracts the values, and
    >> does so very clearly. Further, it is easily
    >> adapted to other similar formats, or evolutionary
    >> changes in format. It is also (once one is
    >> familiar with regexes -- a useful skill outside
    >> of Python too) easier to get right (at least in
    >> a simple case like this.)

    > The posted regex doesn't work; this might be homework, so
    > I'll not fix the two problems. The fact that you did not
    > see the failure weakens your claim of "does so very clearly."


    Fact? Maybe you should have read the whole thread before
    spewing claims that I did not see the regex problem.
    The fact that you did not bother to weakens any claims
    you make in this thread.
    (Of course this line of argumentation is stupid anyway --
    even had I not noticed the problem, it would say nothing
    about the general case. My advice to you is not to try
    to extrapolate when the sample size is one.)
     
    , Jul 25, 2009
    #7
