regex: multiple matching for one string

scriptlearner

For example, I have a string "#a=valuea;b=valueb;c=valuec;", and I
would like to take out the values (valuea, valueb, and valuec). How do
I do that in Python? The group method will only return the matched
part. Thanks.

p = re.compile('#a=*;b=*;c=*;')
m = p.match(line)
if m:
    print m.group(),
 
rurpy

p = re.compile('#a=([^;]*);b=([^;]*);c=([^;]*);')
m = p.match(line)
if m:
    print m.group(1),m.group(2),m.group(3),

Note that "=*;" in your regex will match
zero or more "=" characters followed by ";" -- probably not
what you intended.

"[^;]*" will match any string up to the next
";" character, which will be a value (assuming
you don't have or care about embedded whitespace).

You might also want to consider using a r'...'
string for the regex, which will make including
backslash characters easier if you need them
at some future time.
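
To make the points above concrete, here is a minimal sketch in Python 3 syntax (the thread itself uses Python 2 print statements). The sample string is the one from the question; the variable names pattern, values, original, original_on_line and original_on_empty are mine, not from the posts.

```python
import re

line = "#a=valuea;b=valueb;c=valuec;"

# Raw string, so any backslashes added later reach the regex engine untouched.
pattern = re.compile(r'#a=([^;]*);b=([^;]*);c=([^;]*);')

m = pattern.match(line)
values = m.groups() if m else ()  # one entry per parenthesised group

# The pattern from the question: "=*" means "zero or more '=' characters",
# so it can only match when every value is empty.
original = re.compile('#a=*;b=*;c=*;')
original_on_line = original.match(line)           # no match on the sample
original_on_empty = original.match("#a=;b=;c=;")  # matches empty values

print(values)
```

m.groups() returns all captured values at once, which is handy when there are more than a couple of groups.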
 
Mark Lawrence

IMHO a regex for this is overkill, a combination of string methods such
as split and find should suffice.

Regards.
 
Bill Davy

Mark Lawrence said:
IMHO a regex for this is overkill, a combination of string methods such as
split and find should suffice.

Regards.


For the OP, it can be done with regex by grouping:

p = re.compile(r'#a=(.*);b=(.*);c=(.*);')
m = p.match(line)
if m:
    print m.group(1),

m.group(1) has valuea in it, m.group(2) has valueb, etc.

This may not be the best way, but it is reasonably terse.
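
A small caveat on grouping, offered as a hedged sketch rather than anything posted in the thread: a greedy ".*" group and a "[^;]*" group give the same result on the sample string, but they can differ when a line carries more fields than expected. The input below is a made-up example and the variable names are hypothetical.

```python
import re

# Hypothetical input with extra fields appended after the expected three.
line = "#a=x;b=y;c=z;b=1;c=2;"

greedy = re.match(r'#a=(.*);b=(.*);c=(.*);', line)
bounded = re.match(r'#a=([^;]*);b=([^;]*);c=([^;]*);', line)

# ".*" grabs as much as it can and backtracks to the LAST ";b=" and ";c=".
greedy_values = greedy.groups()    # ('x;b=y;c=z', '1', '2')

# "[^;]*" cannot cross a ";" so each group stops at its own field.
bounded_values = bounded.groups()  # ('x', 'y', 'z')

print(greedy_values)
print(bounded_values)
```

If values may never contain ";", the bounded form is the safer choice of the two.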
 
tiefeng wu

maybe like this:....
valuea
valueb
valuec

tiefeng wu
2009-07-23
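
The code in this post did not survive, so the following is not tiefeng wu's solution. It is only one possible sketch, assuming re.findall, that prints the same three lines for the sample string (Python 3 syntax; the name values is mine).

```python
import re

line = "#a=valuea;b=valueb;c=valuec;"

# Capture whatever sits between each "=" and the following ";".
# With a single group, findall returns a list of the captured strings.
values = re.findall(r'=([^;]*);', line)

for value in values:
    print(value)
```

Unlike the anchored pattern with three groups, this does not check the field names or their order; it simply collects every value it finds.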
 
rurpy

Nick said:
Agreed. Two string.split()s, first at the semi-colon and then at the
equal sign, will yield you your value, without having to fool around
with regexes.

You're saying that something like the following
is better than the simple regex used by the OP?

[untested]
values = []
parts = line.split(';')
# SomeError stands in for whatever exception you would raise
if len(parts) != 4: raise SomeError()
# parts[:-1] skips the empty string left after the trailing ';'
for p, expected in zip (parts[:-1], ('#a','b','c')):
    name, x, value = p.partition ('=')
    if name != expected or x != '=':
        raise SomeError()
    values.append (value)
print values[0], values[1], values[2]

Blech, not in my book. The regex checks the
format of the string, extracts the values, and
does so very clearly. Further, it is easily
adapted to other similar formats, or evolutionary
changes in format. It is also (once one is
familiar with regexes -- a useful skill outside
of Python too) easier to get right (at least in
a simple case like this.)

The only reason I can think of to prefer
a split-based solution is if this code were
performance-critical in that I would expect
the split code to be faster (although I don't
know that for sure.)

This is a perfectly fine use of a regex.
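
The speed question raised above can be measured rather than guessed. A rough sketch using the standard timeit module follows, in Python 3 syntax. by_regex and by_split are hypothetical helper names, the split version skips the format checking that the regex does, and no timing result is claimed here since it will vary by machine and Python version.

```python
import re
import timeit

line = "#a=valuea;b=valueb;c=valuec;"
pattern = re.compile(r'#a=([^;]*);b=([^;]*);c=([^;]*);')

def by_regex(text):
    # Validates the format and extracts the values in one step.
    m = pattern.match(text)
    return list(m.groups()) if m else None

def by_split(text):
    # Extraction only: no check of field names or field count.
    return [part.partition('=')[2] for part in text.split(';') if part]

regex_seconds = timeit.timeit(lambda: by_regex(line), number=100000)
split_seconds = timeit.timeit(lambda: by_split(line), number=100000)
print("regex: %.3fs  split: %.3fs" % (regex_seconds, split_seconds))
```

Adding the same validation to the split version would be needed for a like-for-like comparison.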
 
rurpy

Scott said:
I call straw man:

[tested]
line = "#a=valuea;b=valueb;c=valuec;"
d = dict(single.split('=', 1)
         for single in line.split(';') if single)
d['#a'], d['b'], d['c']

If you want checking code, add:

if len(d) != 3:
    raise ValueError('Too many keys: %s in %r' % (
        sorted(d), line))

OK, that seems like a good solution. It certainly
wasn't an obvious solution to me. I still have no
problem maintaining that

[tested]
line = "#a=valuea;b=valueb;c=valuec;"
m = re.match ('#a=(.*);b=(.*);c=(.*);', line)
m.group(1, 2, 3)

(If you want checking code, nothing else required.)

is still simpler and clearer (with the obvious
caveat that one is familiar with regexes.)

Scott said:
The posted regex doesn't work; this might be homework, so
I'll not fix the two problems. The fact that you did not
see the failure weakens your claim of "does so very clearly."

Fact? Maybe you should have read the whole thread before
spewing claims that I did not see the regex problem.
The fact that you did not bother to weakens any claims
you make in this thread.
(Of course this line of argumentation is stupid anyway --
even had I not noticed the problem, it would say nothing
about the general case. My advice to you is not to try
to extrapolate when the sample size is one.)
 
