Regular Expression Grouping


Fabio Z Tessitore

On Sun, 12 Aug 2007 17:21:02 +0000, linnewbie wrote:
Fairly new to this regex thing, so this might be very juvenile but
important.

I cannot understand why 'c' constitutes a group here without being
surrounded by "(" and ")"?
>>> import re
>>> m = re.match("([abc])+", "abc")
>>> m.groups()
('c',)

Grateful for any clarity.

There are () around the []. Maybe you don't know what [] means? Or do you
want to know why the result is 'c' and not 'a' or 'b'?
Bye
 

Duncan Booth

The group matches a single letter a, b, or c. That group must match one or
more times for the entire expression to match: in this case it matches 3
times once for the a, once for the b and once for the c. When a group
matches more than once, only the last match is available, i.e. the 'c'. The
matches against the a and b are discarded.

It's a bit like having some code:

x = 'a'
x = 'b'
x = 'c'
print(x)

and asking why x isn't 'a' and 'b' as well as 'c'.
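The overwriting described above can be observed directly. The following is a minimal sketch (the variable names are illustrative, not from the thread) that matches the same pattern against progressively longer inputs; each time, group 1 reports only the text of its final iteration.

```python
import re

pattern = re.compile(r"([abc])+")

# For each input, record the whole match and what group 1 ended up holding.
last_captures = []
for text in ("a", "ab", "abc"):
    m = pattern.match(text)
    last_captures.append((m.group(0), m.group(1)))

print(last_captures)
# [('a', 'a'), ('ab', 'b'), ('abc', 'c')]
```

Each extra repetition of the group replaces the previous capture, just as each assignment to x replaces the previous value.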
 

Michael J. Fromberger

Hello!

I believe your confusion arises from the placement of the "+" operator
in your expression. You wrote:

'([abc])+'

This means, in plain language, "one or more groups in which each group
contains a string of one character from the set {a, b, c}."

Contrast this with what you probably intended, to wit:

'([abc]+)'

The latter means, in plain language, "a single group containing a string
of one or more characters from the set {a, b, c}."

In the former case, the greedy property of matching attempts to maximize
the number of times the quantified expression is matched -- thus, you
match the group three times, once for each character of "abc", and the
result shows you only the last occurrence of the matching.

Compare this with the following:

>>> import re
>>> m = re.match('([abc]+)', 'abc')
>>> m.groups()
('abc',)

I suspect the latter is what you are after.
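To see the two placements of "+" side by side, here is a small sketch (the variable names are my own): both patterns consume the same overall text, and only the content of group 1 differs.

```python
import re

text = "abc"
repeated_group = re.match(r"([abc])+", text)  # quantifier outside the group
single_group = re.match(r"([abc]+)", text)    # quantifier inside the group

# The overall match (group 0) is identical for both patterns.
print(repeated_group.group(0), single_group.group(0))  # abc abc

# Group 1 holds the last single character in one case, the whole run in the other.
print(repeated_group.groups())  # ('c',)
print(single_group.groups())    # ('abc',)
```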

Cheers,
-M
 

linnewbie

I sort of get what the metacharacters "(", ")" and "[", "]" do; groups
are marked by "(" and ")", no?

So I get this:
('c',)

I can clearly see that 'c' is group(1) there, because of the "..(c).."
in the pattern. What I cannot see is how 'c' is an inner group in the
expression "([abc])+" above.
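One way to approach this question is to separate the number of groups in the pattern from what a group happens to contain after matching. The sketch below (variable names are illustrative) suggests that "([abc])+" defines exactly one group, the parenthesised "[abc]", and that after the match this group spans only the last character it was applied to.

```python
import re

pattern = re.compile(r"([abc])+")
m = pattern.match("abc")

# Groups are counted by parentheses in the pattern, not by repetitions.
print(pattern.groups)  # 1

# Group 0 is the whole match; group 1 is the last thing "([abc])" matched.
print(m.span(0))   # (0, 3)
print(m.span(1))   # (2, 3)
print(m.group(1))  # c
```

So 'c' is not a separate inner group; it is simply the text that group 1 captured on its third and final repetition.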
 

Steve Holden

What's happening there is that the same group is being used three times
to complete the match, but a group can only be represented once in the
output, so you are seeing the last substring that the group matched.
Contrast with:
>>> m = re.match("([abc]+)", 'abc')
>>> m.groups()
('abc',)

I don't *think* there's any way to introduce a variable number of groups
into your match, but I don't use re's that much so someone may be able
to help if that's what you want. Is it?
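If the goal is to get every character rather than a variable number of groups, one common workaround with the standard re module is to match the whole run first and then scan it with re.findall or re.finditer. This is only a sketch under that assumption; the input string and variable names are made up for illustration.

```python
import re

text = "abcxyz"

# Step 1: match the leading run of a/b/c characters as a whole.
prefix = re.match(r"[abc]+", text)

# Step 2: split that run into its individual single-character matches.
pieces = re.findall(r"[abc]", prefix.group(0))
print(pieces)  # ['a', 'b', 'c']

# finditer also gives the position of each piece within the run.
positions = [(m.group(0), m.start()) for m in re.finditer(r"[abc]", prefix.group(0))]
print(positions)  # [('a', 0), ('b', 1), ('c', 2)]
```

The number of results then depends on the input, which a fixed set of groups in a single pattern cannot express.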

regards
Steve
 

Paul McGuire

It sounds from the other replies that this is just the way re's work -
if a group is represented multiple times in the matched text, only the
last matching text is returned for that group.

This sounds similar to a behavior in pyparsing, in using a results
name for the parsed results. Here is an annotated session using
pyparsing to extract this data. The explicit OneOrMore and Group
classes and oneOf method give you a little more control over the
collection and structure of the results.

-- Paul

Setup to use pyparsing, and define input string.

Use a simple pyparsing expression - matches and returns each separate
character. Each inner match can be returned as element [0], [1], or
[2] of the parsed results.
['a', 'b', 'c']

Add use of Group - each single-character match is wrapped in a
subgroup.
[['a'], ['b'], ['c']]

Instead of Group, set a results name on the entire pattern.
pattern = OneOrMore( oneOf("a b c") ).setResultsName("char")
print(pattern.parseString(data)['char'])
['a', 'b', 'c']

Set results name on the inner expression - this behavior seems most
like the regular expression behavior described in the original post.
pattern = OneOrMore( oneOf("a b c").setResultsName("char") )
print(pattern.parseString(data)['char'])
c

Adjust results name to retain all of the matched characters for the
given results name.
pattern = OneOrMore( oneOf("a b c").setResultsName("char",listAllMatches=True) )
print(pattern.parseString(data)['char'])
['a', 'b', 'c']
 
