I cannot understand why 'c' constitutes a group here without being
surrounded by "(" and ")"?
import re
m = re.match("([abc])+", "abc")
m.groups()
('c',)
From the other replies, it sounds as though this is simply how regular
expressions work: if a group matches several times within the matched
text, only the last match is returned for that group.
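A minimal sketch of that behavior using only the standard re module. The variable names (last_only, whole, each) are illustrative and not from the original post; re.findall is shown as one possible way to recover every character, not the only one.

```python
import re

# The group ([abc]) is repeated by "+", so it matches three times,
# but the match object only keeps the text of its final repetition.
m = re.match(r"([abc])+", "abc")
last_only = m.groups()   # text captured by the last repetition of the group
whole = m.group(0)       # the entire matched text

# To collect every character, match the single-character pattern
# repeatedly instead of repeating a capturing group.
each = re.findall(r"[abc]", "abc")

print(last_only)   # ('c',)
print(whole)       # abc
print(each)        # ['a', 'b', 'c']
```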
This is similar to how pyparsing behaves when a results name is
attached to an expression. Here is an annotated session using
pyparsing to extract this data. The explicit OneOrMore and Group
classes and the oneOf function give you a little more control over the
collection and structure of the results.
-- Paul
Setup to use pyparsing, and define input string.
Use a simple pyparsing expression - matches and returns each separate
character. Each inner match can be returned as element [0], [1], or
[2] of the parsed results.
['a', 'b', 'c']
Add use of Group - each single-character match is wrapped in a
subgroup.
[['a'], ['b'], ['c']]
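The setup step and the first two expressions described above can be sketched roughly as follows. This is an assumption-laden reconstruction, not the session as originally typed: the import line, the value data = "abc" (taken from the original question), and the names plain and grouped are all guesses based on the patterns shown later in the session.

```python
# Assumed setup: explicit imports and the input string from the question.
from pyparsing import OneOrMore, oneOf, Group

data = "abc"

# Simple expression: one or more of the characters a, b, or c.
# Each matched character becomes one element of the results.
plain = OneOrMore(oneOf("a b c")).parseString(data)
print(plain)

# Same expression, with each single-character match wrapped in Group,
# so every character lands in its own sublist.
grouped = OneOrMore(Group(oneOf("a b c"))).parseString(data)
print(grouped)
```

The two print calls should correspond to the flat and nested lists shown above.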
Instead of Group, set a results name on the entire pattern.
pattern = OneOrMore( oneOf("a b c") ).setResultsName("char")
print(pattern.parseString(data)['char'])
['a', 'b', 'c']
Set results name on the inner expression - this behavior seems most
like the regular expression behavior described in the original post.
pattern = OneOrMore( oneOf("a b c").setResultsName("char") )
print(pattern.parseString(data)['char'])
c
Adjust results name to retain all of the matched characters for the
given results name.
pattern = OneOrMore( oneOf("a b c").setResultsName("char",listAllMatches=True) )
print(pattern.parseString(data)['char'])
['a', 'b', 'c']