regex: multiple matching for one string

scriptlearner · Jul 23, 2009

For example, I have a string "#a=valuea;b=valueb;c=valuec;", and I
would like to extract the values (valuea, valueb, and valuec). How do
I do that in Python? The group method only returns the whole matched
part. Thanks.

import re

p = re.compile('#a=*;b=*;c=*;')
m = p.match(line)
if m:
    print(m.group())

rurpy · Jul 23, 2009

import re

p = re.compile('#a=([^;]*);b=([^;]*);c=([^;]*);')
m = p.match(line)
if m:
    print(m.group(1), m.group(2), m.group(3))

Note that "=*;" in your regex will match
zero or more "=" characters -- probably not
what you intended.
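A quick check makes the point concrete. The helper name `op_matches` is only for illustration: the OP's pattern rejects the sample string but accepts strings built from runs of `=` and `;`.

```python
import re

# The OP's pattern: in "=*;", the "*" applies to the "=" alone,
# so each "a=*;" piece means "a", zero or more "=", then ";".
op_pattern = re.compile('#a=*;b=*;c=*;')

def op_matches(s):
    """Return True if the OP's pattern matches at the start of s."""
    return op_pattern.match(s) is not None

print(op_matches("#a=valuea;b=valueb;c=valuec;"))  # False: "v" is neither "=" nor ";"
print(op_matches("#a;b=;c==;"))                     # True: zero, one and two "=" signs
```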

"[^;]*" will match any string up to the next
";" character, which will be a value (assuming
you don't have or care about embedded whitespace).
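The difference between `[^;]*` and a greedy `.*` only shows up when the input has more fields than the pattern expects. The extra `d=extra;` field below is a hypothetical input chosen to show it:

```python
import re

line = "#a=valuea;b=valueb;c=valuec;d=extra;"

# "[^;]*" stops at the first ";", so each group holds exactly one value.
narrow = re.match(r'#a=([^;]*);b=([^;]*);c=([^;]*);', line)
# ".*" is greedy: the last group runs on to the final ";" it can find.
greedy = re.match(r'#a=(.*);b=(.*);c=(.*);', line)

print(narrow.groups())  # ('valuea', 'valueb', 'valuec')
print(greedy.groups())  # ('valuea', 'valueb', 'valuec;d=extra')
```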

You might also want to consider using an r'...'
(raw) string for the regex, which will make including
backslash characters easier if you need them
at some future time.
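As a sketch, suppose (hypothetically) the values were numbers and you wanted `\d`. In an ordinary string literal the backslash has to be doubled; a raw string passes it through unchanged, so both spellings below give the regex engine the same pattern text:

```python
import re

# In a normal string literal "\\d" is needed so that the regex engine
# receives the two characters "\d"; a raw string passes them through as-is.
plain = '#a=(\\d+);'
raw = r'#a=(\d+);'

print(plain == raw)                      # True: identical pattern text
print(re.match(raw, '#a=42;').group(1))  # 42
```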

Mark Lawrence · Jul 23, 2009

IMHO a regex for this is overkill; a combination of string methods such
as split and find should suffice.

Regards.
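A minimal sketch of the split-and-find approach, assuming every field has the form name=value and ends with ";". The function name `parse_fields` is hypothetical, and it does not check the field names:

```python
def parse_fields(line):
    """Split "#a=valuea;b=valueb;c=valuec;" into a list of values.

    Assumes each field looks like name=value and fields end with ";".
    """
    values = []
    for field in line.split(';'):
        if not field:          # the trailing ";" leaves an empty last piece
            continue
        eq = field.find('=')   # index of the first "=", or -1 if absent
        if eq == -1:
            raise ValueError('no "=" in field %r' % field)
        values.append(field[eq + 1:])
    return values

print(parse_fields("#a=valuea;b=valueb;c=valuec;"))  # ['valuea', 'valueb', 'valuec']
```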

Bill Davy · Jul 23, 2009

For the OP, it can be done with regex by grouping:

import re

p = re.compile(r'#a=([^;]*);b=([^;]*);c=([^;]*);')
m = p.match(line)
if m:
    print(m.group(1))

m.group(1) has valuea in it, etc.

This may not be the best way, but it is reasonably terse.
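To pull out all three values at once, `groups()` returns every capturing group as a tuple. Named groups, `(?P<name>...)` with `groupdict()`, are a standard `re` alternative shown here as an option, not something from the post:

```python
import re

line = "#a=valuea;b=valueb;c=valuec;"
m = re.match(r'#a=([^;]*);b=([^;]*);c=([^;]*);', line)
if m:
    # group(n) returns the n-th parenthesised group; groups() returns all of them.
    a, b, c = m.groups()
    print(a, b, c)  # valuea valueb valuec

# Named groups let you fetch values by name instead of by position.
named = re.match(r'#a=(?P<a>[^;]*);b=(?P<b>[^;]*);c=(?P<c>[^;]*);', line)
print(named.groupdict())  # {'a': 'valuea', 'b': 'valueb', 'c': 'valuec'}
```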

tiefeng wu · Jul 23, 2009

maybe like this:....
valuea
valueb
valuec

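One way to get output like the above, one value per line, is `re.findall` with a single capturing group. This is a sketch and not necessarily the code tiefeng wu had in mind. Unlike the full pattern, it does not check the field names:

```python
import re

line = "#a=valuea;b=valueb;c=valuec;"
# Each match is the text between an "=" and the following ";".
values = re.findall(r'=([^;]*);', line)
for v in values:
    print(v)
```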

rurpy · Jul 24, 2009

Nick said:

Agreed. Two split() calls, first at the semicolon and then at the
equals sign, will yield your values without having to fool around
with regexes.

You're saying that something like the following
is better than the simple regex used by the OP?

[untested]
values = []
parts = line.split(';')
if len(parts) != 4: raise ValueError(line)
for p, expected in zip(parts[:-1], ('#a', 'b', 'c')):
    name, x, value = p.partition('=')
    if name != expected or x != '=':
        raise ValueError(line)
    values.append(value)
print(values[0], values[1], values[2])

Blech, not in my book. The regex checks the
format of the string, extracts the values, and
does so very clearly. Further, it is easily
adapted to other similar formats, or evolutionary
changes in format. It is also (once one is
familiar with regexes -- a useful skill outside
of Python too) easier to get right (at least in
a simple case like this.)

The only reason I can think of to prefer
a split-based solution is if this code were
performance-critical, since I would expect
the split code to be faster (although I don't
know that for sure).

This is a perfectly fine use of a regex.
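Whether split really is faster is easy to measure rather than guess. A minimal `timeit` sketch, assuming the OP's sample string. The function names are hypothetical, the split version skips the format checks, and timings will vary by machine and Python version, so no figures are given here:

```python
import re
import timeit

line = "#a=valuea;b=valueb;c=valuec;"
pattern = re.compile(r'#a=([^;]*);b=([^;]*);c=([^;]*);')

def with_regex(s):
    m = pattern.match(s)
    return m.groups() if m else None

def with_split(s):
    # Drop the empty piece after the trailing ";", then keep what follows "=".
    return tuple(field.partition('=')[2] for field in s.split(';') if field)

# Both approaches should agree before timing them.
assert with_regex(line) == with_split(line)

for func in (with_regex, with_split):
    seconds = timeit.timeit(lambda: func(line), number=100_000)
    print('%s: %.3f s' % (func.__name__, seconds))
```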

rurpy · Jul 25, 2009

Scott said:
I call straw man: [tested]
line = "#a=valuea;b=valueb;c=valuec;"
d = dict(single.split('=', 1)
         for single in line.split(';') if single)
print(d['#a'], d['b'], d['c'])
If you want checking code, add:
if len(d) != 3:
    raise ValueError('Too many keys: %s in %r' % (
        sorted(d), line))
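A length check catches a wrong count of fields but not a misspelled name. A slightly stricter variant compares the exact set of keys; the expected names are taken from the OP's example. A field without "=" already makes `dict()` itself raise ValueError, because that piece splits into only one item:

```python
line = "#a=valuea;b=valueb;c=valuec;"
d = dict(single.split('=', 1) for single in line.split(';') if single)

# Compare the exact set of names, not just how many there are, so a
# misspelled or missing key is reported too.
expected = {'#a', 'b', 'c'}
if set(d) != expected:
    raise ValueError('Unexpected keys: %s in %r' % (sorted(d), line))
print(d['#a'], d['b'], d['c'])
```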

OK, that seems like a good solution. It certainly
wasn't an obvious solution to me. I still have no
problem maintaining that

[tested]
import re
line = "#a=valuea;b=valueb;c=valuec;"
m = re.match('#a=(.*);b=(.*);c=(.*);', line)
print(m.groups())
(If you want checking code, nothing else is required.)

is still simpler and clearer (with the obvious
caveat that one is familiar with regexes.)
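One caveat on the checking: `re.match` returns None when the line doesn't fit, so `m.groups()` then fails with an AttributeError rather than a clear message. Also, `match` only anchors the start of the string. A sketch that reports bad input explicitly and rejects trailing text uses `fullmatch` (Python 3.4+). The name `extract` is hypothetical:

```python
import re

PATTERN = re.compile(r'#a=([^;]*);b=([^;]*);c=([^;]*);')

def extract(line):
    """Return (a, b, c) or raise ValueError if line is not in the expected format."""
    # fullmatch also rejects trailing text such as "d=extra;".
    m = PATTERN.fullmatch(line)
    if m is None:
        raise ValueError('bad format: %r' % line)
    return m.groups()

print(extract("#a=valuea;b=valueb;c=valuec;"))  # ('valuea', 'valueb', 'valuec')
```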

Scott said:
The posted regex doesn't work; this might be homework, so
I'll not fix the two problems. The fact that you did not
see the failure weakens your claim of "does so very clearly."

Fact? Maybe you should have read the whole thread before
spewing claims that I did not see the regex problem.
The fact that you did not bother to weakens any claims
you make in this thread.
(Of course this line of argumentation is stupid anyway --
even had I not noticed the problem, it would say nothing
about the general case. My advice to you is not to try
to extrapolate when the sample size is one.)
