Regular Expression Grouping

linnewbie · Aug 12, 2007

Fairly new to this regex thing, so this might be very juvenile but
important.

I cannot understand why 'c' constitutes a group here without being
surrounded by "(", ")"?

>>> import re
>>> m = re.match("([abc])+", "abc")
>>> m.groups()
('c',)

Grateful for any clarity.

Fabio Z Tessitore · Aug 12, 2007

In reply to linnewbie (Sun, 12 Aug 2007 17:21:02 +0000):

There are () outside the []. Maybe you don't know what [] means? Or do
you want to know why 'c' and not 'a' or 'b'?
Bye

Duncan Booth · Aug 12, 2007

The group matches a single letter: a, b, or c. That group must match one or
more times for the entire expression to match; in this case it matches three
times, once for the a, once for the b, and once for the c. When a group
matches more than once, only the last match is available, i.e. the 'c'. The
matches against the a and b are discarded.

It's a bit like having some code:

x = 'a'
x = 'b'
x = 'c'
print(x)

and asking why x isn't 'a' and 'b' as well as 'c'.
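One way to picture this, as a rough sketch rather than a description of how the re module is actually implemented: step through the string one group match at a time and overwrite a variable on each step. The names single, last and history are made up for this illustration; history exists only so the discarded values can be printed.

```python
import re

text = "abc"
single = re.compile(r"([abc])")  # one iteration of the repeated group

pos = 0
last = None    # plays the role of group 1: overwritten on every iteration
history = []   # illustration only; the match object keeps no such list

while True:
    m = single.match(text, pos)  # try to match exactly at position pos
    if m is None:
        break
    last = m.group(1)
    history.append(last)
    pos = m.end()

print(history)                              # ['a', 'b', 'c']
print(last)                                 # c
print(re.match(r"([abc])+", text).group(1)) # c
```

The last two lines agree: what group(1) reports for "([abc])+" is the value left over from the final iteration.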

Michael J. Fromberger · Aug 12, 2007

Hello!

I believe your confusion arises from the placement of the "+" operator
in your expression. You wrote:

'([abc])+'

This means, in plain language, "one or more groups in which each group
contains a string of one character from the set {a, b, c}."

Contrast this with what you probably intended, to wit:

'([abc]+)'

The latter means, in plain language, "a single group containing a string
of one or more characters from the set {a, b, c}."

In the former case, the greedy property of matching attempts to maximize
the number of times the quantified expression is matched -- thus, you
match the group three times, once for each character of "abc", and the
result shows you only the last occurrence of the matching.

Compare this with the following:

] import re
] m = re.match('([abc]+)', 'abc')
] m.groups()
=> ('abc',)

I suspect the latter is what you are after.

Cheers,
-M
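The two placements of "+" can be compared side by side. This is a small sketch using only the standard re module; the variable names outside, inside and nested are invented here, and the third pattern (both sets of parentheses at once) is an extra case that does not appear in the posts above.

```python
import re

text = "abc"

# Quantifier outside the group: the group is re-matched for each character.
outside = re.match(r"([abc])+", text)
# Quantifier inside the group: the group matches the whole run once.
inside = re.match(r"([abc]+)", text)
# Both at once: group 1 is the whole run, group 2 the last repetition.
nested = re.match(r"(([abc])+)", text)

print(outside.group(0), outside.groups())  # abc ('c',)
print(inside.group(0), inside.groups())    # abc ('abc',)
print(nested.groups())                     # ('abc', 'c')
```

In all three cases the overall match, group(0), is 'abc'. Only the contents of the numbered groups differ.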

linnewbie · Aug 12, 2007

I sort of understand the metacharacters "(", ")" and "[", "]". Groups
are marked by "(" and ")", no?

So I get this:
('c',)

I can see clearly here that 'c' is group(1), because of the "..(c).."
in the pattern. I cannot see how 'c' is an inner group in the
expression "([abc])+" above.
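The pattern behind the "..(c).." example did not survive in this copy of the thread, so the sketch below uses "ab(c)" purely as a hypothetical stand-in. It suggests that both patterns contain exactly one pair of parentheses, and therefore exactly one group, and that in both cases the group ends up holding the character at index 2.

```python
import re

literal = re.compile(r"ab(c)")       # hypothetical stand-in for the lost pattern
repeated = re.compile(r"([abc])+")   # pattern from the original question

# The number of groups is fixed by the parentheses in the pattern,
# not by how many times a group is repeated during matching.
print(literal.groups, repeated.groups)  # 1 1

# Where group 1 ended up in the string "abc".
print(literal.match("abc").span(1))     # (2, 3)
print(repeated.match("abc").span(1))    # (2, 3)
```

So 'c' is not an extra inner group in "([abc])+". It is the one group marked by "(" and ")", reported with the text of its last repetition.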

Steve Holden · Aug 12, 2007

What's happening there is that the same group is being used three times
to complete the match, but a group can only be represented once in the
output, so you are seeing the last substring that the group matched.
Contrast with:

>>> m = re.match("([abc]+)", 'abc')
>>> m.groups()
('abc',)

I don't *think* there's any way to introduce a variable number of groups
into your match, but I don't use re's that much so someone may be able
to help if that's what you want. Is it?

regards
Steve
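On the question of a variable number of results: with the standard re module a single group still reports only its last repetition, but a possible workaround is to match the whole run first and then split it with re.findall. This is a sketch under that assumption; the helper name repeated_captures is invented for illustration.

```python
import re

def repeated_captures(text):
    """Return every character that ([abc])+ would step over at the start of text."""
    run = re.match(r"(?:[abc])+", text)  # non-capturing: only the whole run is needed
    if run is None:
        return []
    # Split the matched run back into its individual repetitions.
    return re.findall(r"[abc]", run.group(0))

print(repeated_captures("abc"))       # ['a', 'b', 'c']
print(repeated_captures("abcxa"))     # ['a', 'b', 'c']
print(re.findall(r"[abc]", "abcxa"))  # ['a', 'b', 'c', 'a']
```

The last line shows why the first step matters: re.findall on its own scans the whole string, while re.match with "+" stops at the first character outside the set.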

Paul McGuire · Aug 12, 2007

It sounds from the other replies that this is just the way re's work -
if a group is represented multiple times in the matched text, only the
last matching text is returned for that group.

This sounds similar to a behavior in pyparsing, in using a results
name for the parsed results. Here is an annotated session using
pyparsing to extract this data. The explicit OneOrMore and Group
classes and oneOf method give you a little more control over the
collection and structure of the results.

-- Paul

Setup to use pyparsing, and define input string.

Use a simple pyparsing expression - matches and returns each separate
character. Each inner match can be returned as element [0], [1], or
[2] of the parsed results.

['a', 'b', 'c']

Add use of Group - each single-character match is wrapped in a
subgroup.

[['a'], ['b'], ['c']]

Instead of Group, set a results name on the entire pattern.

>>> pattern = OneOrMore( oneOf("a b c") ).setResultsName("char")
>>> print(pattern.parseString(data)['char'])
['a', 'b', 'c']

Set results name on the inner expression - this behavior seems most
like the regular expression behavior described in the original post.

>>> pattern = OneOrMore( oneOf("a b c").setResultsName("char") )
>>> print(pattern.parseString(data)['char'])
c

Adjust results name to retain all of the matched characters for the
given results name.

>>> pattern = OneOrMore( oneOf("a b c").setResultsName("char",listAllMatches=True) )
>>> print(pattern.parseString(data)['char'])
['a', 'b', 'c']
