Capturing repeating group matches in regular expressions

James Collier · Aug 11, 2004

Is it possible to capture the results of repeating group matches in the
python regular expression module?

To illustrate, what I want is:

>>> import re
>>> re1 = re.compile("([a-z]W)([a-z]X)+([a-z]Y)")
>>> mo1 = re1.match("aWbXcXdXeXfY")
>>> print mo1.groupsButNotAsWeKnowIt()

('aW','bX','cX','dX','eX','fY')

instead of
("aW", "eX", "fY")

.... which captures only the last match from the second group.

Of course, one option is to break out the substring containing the
repeating group and then use split() or findall() within the substring,
but, but, but ... I'd like to do it in one hit if possible.
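
For reference, the two-step workaround described above might look like the following. This is only a sketch: the outer group around the repeated part and the names `pattern`, `run`, and `repeats` are illustrative assumptions, not part of the original code.

```python
import re

# Wrap the repeated part in an outer group so the whole run is captured,
# and make the inner group non-capturing.
pattern = re.compile(r"([a-z]W)((?:[a-z]X)+)([a-z]Y)")

mo = pattern.match("aWbXcXdXeXfY")
if mo:
    head, run, tail = mo.groups()  # run is the substring 'bXcXdXeX'
    # Second pass: break the captured run into its individual repeats.
    repeats = re.findall(r"[a-z]X", run)
    result = (head,) + tuple(repeats) + (tail,)
    print(result)
```

It takes two passes rather than one, but the second pass only ever sees text that the first pattern has already accepted.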

I believe someone has raised a similar question before, but I can't find
a definitive answer. It may be so stunningly obvious that nobody ever
bothers to answer - if so, could some kind soul please humour me and at
least point me to what I'm not seeing in the Fine Manual.

Thanks,
James.

Michael Hudson · Aug 11, 2004

James Collier said:
Is it possible to capture the results of repeating group matches in
the python regular expression module?

Not easily; there's a small discussion on python-dev at the moment
about this. Erik Heneryd hacked up something that might be useful:

http://mail.python.org/pipermail/python-dev/attachments/20040810/a5e602ab/structmatch.py

And there's always the "use a real parser" option

Cheers,
mwh

Paul McGuire · Aug 11, 2004

Here is a pyparsing version. It's a bit wordy, but the ability to easily
structure and retrieve the returned tokens may sway you.

Download pyparsing at http://pyparsing.sourceforge.net

-- Paul

from pyparsing import Word,OneOrMore

# define parse grammar
lowers = "abcdefghijklmnopqrstuvwxyz"
endsWithW = Word(lowers,"W",exact=2)
endsWithX = Word(lowers,"X",exact=2)
endsWithY = Word(lowers,"Y",exact=2)

patt = endsWithW.setResultsName("W") + \
       OneOrMore( endsWithX ).setResultsName("X") + \
       endsWithY.setResultsName("Y")

# extract tokens from input string
tokens = patt.parseString("aWbXcXdXeXfY")

# tokens can be accessed as a list
print "tokens:",tokens

# tokens can be coerced to be a true list
print "tokens.asList():",tokens.asList()

# tokens can be a dictionary, if results names specified
print "tokens.keys():",tokens.keys()
print "tokens['W']:",tokens['W']
print "tokens['X']:",tokens['X']
print "tokens['Y']:",tokens['Y']

# if results names are valid attribute names, tokens can also be read as attributes
print "tokens.W:",tokens.W
print "tokens.X:",tokens.X
print "tokens.Y:",tokens.Y

Gives:

tokens: ['aW', 'bX', 'cX', 'dX', 'eX', 'fY']
tokens.asList(): ['aW', 'bX', 'cX', 'dX', 'eX', 'fY']
tokens.keys(): ['Y', 'X', 'W']
tokens['W']: aW
tokens['X']: ['bX', 'cX', 'dX', 'eX']
tokens['Y']: fY
tokens.W: aW
tokens.X: ['bX', 'cX', 'dX', 'eX']
tokens.Y: fY

James Collier · Aug 12, 2004

Michael said:
Not easily; there's a small discussion on python-dev at the moment
about this. Erik Heneryd hacked up something that might be useful:

http://mail.python.org/pipermail/python-dev/attachments/20040810/a5e602ab/structmatch.py

And there's always the "use a real parser" option

Cheers,
mwh

Many thanks for the answer, Michael - I take your point on the "real parser"
option, but I don't feel that the nut I'm cracking has that thick a shell.

It is some coincidence that this should be under current discussion on
python-dev. For what it's worth, I'd support Mike Coleman's PEP.

To give some more background, I'm tweaking someone else's code and therefore
I want to keep the change as concise as is reasonable. structmatch() is exactly
what I'm looking for - but for now I'll just split the task into two parts.
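
Another way to split the task into two parts, sketched here under a few assumptions: validate the whole string with one pattern, then tokenize it with a second. The names `WHOLE`, `TOKEN`, and `split_repeats` are hypothetical, and `fullmatch` (Python 3.4 or later) requires the entire string to match, which is stricter than the `match` call in the original example.

```python
import re

# First part: check that the whole string has the expected shape.
WHOLE = re.compile(r"[a-z]W(?:[a-z]X)+[a-z]Y")
# Second part: pull out each two-character token.
TOKEN = re.compile(r"[a-z][WXY]")

def split_repeats(text):
    """Return every token as a tuple, or None if text does not match."""
    if WHOLE.fullmatch(text) is None:
        return None
    return tuple(TOKEN.findall(text))

print(split_repeats("aWbXcXdXeXfY"))
```

This avoids slicing out a substring, at the cost of scanning the input twice.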

Thanks again -- James.

Erik Heneryd · Aug 12, 2004

James said:
Michael Hudson [ mwh at python.net ] writes:

Not easily; there's a small discussion on python-dev at the moment
about this. Erik Heneryd hacked up something that might be useful:

http://mail.python.org/pipermail/python-dev/attachments/20040810/a5e602ab/structmatch.py

It should be added that this is a hack in the true sense of the word. I
wouldn't use it for anything other than what it was written for:
experimenting.

Erik