iterating over sub-matches using std::tr1::regex?

DomoChan · Aug 13, 2008

Given a repeatable group expression

([abc])+

and given its input

cab

the result contains nested subgroups, which, as shown by 'rad software
regular expression tester', look like this:

Match 'cab'
- Group 1
- c at pos 0 length 1
- a at pos 1 length 1
- b at pos 2 length 1

I'd like to use the regex classes found in std::tr1 to iterate over all
the matches in Group 1.

I'm using regex_search to fill an smatch object. I need to go one step
further and iterate over the matches found in Group 1. Can anyone tell me
what I need to do to iterate over the sub-matches?

I've tried the following, but it doesn't seem to work:

// note: initialResults (an smatch) is successfully filled with a single group match
std::string input( "cab" );
regex_search( input, initialResults, regex( "([abc])+" ) );

for ( size_t ii = 1; ii < initialResults.size(); ++ii )
{
    ssub_match groupResults;
    // note: groupResults.matched is false; groupResults.first is
    // NULL, as is groupResults.second
    groupResults.compare( initialResults[ ii ] );
}
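A likely reason the loop sees nothing useful, sketched with C++11 `std::regex` (assumed here to behave like the std::tr1 classes the question uses): with the default ECMAScript grammar, a capturing group under `+` keeps only its final repetition, so the match holds just two entries, the whole match and the last group capture. Also, `ssub_match::compare` only compares two sub-matches and returns an int; getting a copy of one is done by plain assignment. The helper name `search_groups` is hypothetical.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Returns the text of every entry in the match results of "([abc])+".
// For "cab" that is {"cab", "b"}: m[0] is the whole match and m[1]
// only remembers the last repetition of the group.
std::vector<std::string> search_groups(const std::string& input)
{
    std::vector<std::string> out;
    std::smatch m;
    if (std::regex_search(input, m, std::regex("([abc])+")))
    {
        for (std::size_t i = 0; i < m.size(); ++i)
        {
            std::ssub_match group = m[i];   // copy by assignment; compare() only compares
            out.push_back(group.str());
        }
    }
    return out;
}
```

So `initialResults[1]` cannot be used to reach 'c' and 'a'; the earlier repetitions are simply not stored by the standard interface.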

Thanks for any assistance!
-Velik

DomoChan · Aug 13, 2008

Not sure exactly what you're talking about, but if I understand it correctly you
want all possible matches of a single character from a specific set of chars.

Then why not use ([abc])?

All possible matches of ([abc])+, if I read it correctly as 1 or more successive
characters drawn from the set {a, b, c}, for a string of length N consisting
of those characters only, well that's N + (N-1) + ... + 1 = (N^2 + N)/2
matches, and surely you don't want that, or do you?
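That count can be checked by brute force. A sketch, assuming a C++11 compiler (`std::regex` standardized the tr1 regex classes) and a hypothetical helper `count_all_matches`: it tests every substring on its own with `std::regex_match`, so for "cab" (N = 3) it counts 3 + 2 + 1 = 6.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Counts every substring [i, j) of text that is matched in full by
// "([abc])+". For a string of length N made only of a, b and c this is
// N + (N-1) + ... + 1 = N(N+1)/2.
std::size_t count_all_matches(const std::string& text)
{
    const std::regex e("([abc])+");
    std::size_t count = 0;
    for (std::size_t i = 0; i < text.size(); ++i)
        for (std::size_t j = i + 1; j <= text.size(); ++j)
            if (std::regex_match(text.begin() + i, text.begin() + j, e))
                ++count;
    return count;
}
```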

Cheers, & hth.,

- Alf

--
A: Because it messes up the order in which people normally read text.
Q: Why is it such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?

You seem to clearly understand the expression, but perhaps I didn't use
an accurate expression to explain my situation.

If I changed my input string to "cab bat mac", the results would then
contain:

Match 'cab'
- Group 1
- c at pos 0 length 1
- a at pos 1 length 1
- b at pos 2 length 1
Match 'ba'
- Group 1
- b at pos 4 length 1
- a at pos 5 length 1
Match 'ac'
- Group 1
- a at pos 9 length 1
- c at pos 10 length 1

So 'cab', 'ba', and 'ac' are stored in initialResults, and I can
iterate over those easily using a for loop and the 'smatch'
indexer. However, I'm interested in the individual results within each
group, so from the first match 'cab' I want to be able to iterate over
that group and read [0] = 'c', [1] = 'a', [2] = 'b'. That's what
I'm trying to use 'ssub_match' for, but I'm sure I'm not using it
correctly.
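In the standard interface (tr1 and its C++11 successor, which this sketch assumes), an `ssub_match` is essentially a pair of string iterators (`first`, `second`) plus a `matched` flag, obtained by assignment from the match results. Because every repetition in this particular pattern is exactly one character, reading 'c', 'a', 'b' out of 'cab' only needs a walk over the characters of the whole match; this shortcut does not generalize to multi-character repetitions. The helper name `chars_of_match` is hypothetical.

```cpp
#include <regex>
#include <string>
#include <vector>

// For each match of "([abc])+" in text, walks the characters of the whole
// match m[0]. This works only because each repetition of ([abc]) is a
// single character.
std::vector<std::vector<char>> chars_of_match(const std::string& text)
{
    std::vector<std::vector<char>> result;
    const std::regex e("([abc])+");
    for (std::sregex_iterator it(text.begin(), text.end(), e), end; it != end; ++it)
    {
        std::ssub_match whole = (*it)[0];   // copy via assignment
        std::vector<char> chars;
        for (std::string::const_iterator p = whole.first; p != whole.second; ++p)
            chars.push_back(*p);
        result.push_back(chars);
    }
    return result;
}
```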

Let me know if I'm still vague.

Thanks again!

DomoChan · Aug 13, 2008

Please don't quote signatures.

> You seem to clearly understand the expression, but perhaps I didn't use
> an accurate expression to explain my situation.

> If I changed my input string to "cab bat mac", the results would then
> contain:

> Match 'cab'
> - Group 1
> - c at pos 0 length 1
> - a at pos 1 length 1
> - b at pos 2 length 1
> Match 'ba'
> - Group 1
> - b at pos 4 length 1
> - a at pos 5 length 1
> Match 'ac'
> - Group 1
> - a at pos 9 length 1
> - c at pos 10 length 1

> So 'cab', 'ba', and 'ac' are stored in initialResults, and I can
> iterate over those easily using a for loop and the 'smatch'
> indexer.

Can you? I don't see how, if you're using the code shown earlier. Didn't work
for me.

> However, I'm interested in the individual results within each
> group, so from the first match 'cab' I want to be able to iterate over
> that group and read [0] = 'c', [1] = 'a', [2] = 'b'. That's what
> I'm trying to use 'ssub_match' for, but I'm sure I'm not using it
> correctly.

> Let me know if I'm still vague.

No, it seems pretty clear.

I reproduced the output shown above by using a sregex_iterator to iterate over
the matches for "([abc])+", and an inner loop with sregex_iterator to iterate
over the "([abc])" matches in each match (as suggested in my previous reply). It
seems there is also capture functionality that can do this more directly, but
requires recompilation of the regex library with certain switches, and affects
efficiency in general, i.e. not just when it's used. I didn't try that.
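The nested-iterator approach described above might look roughly like the following sketch. It is not Alf's code (which he did not post); it assumes C++11 `std::regex`, and the helper name `describe_runs` is hypothetical. The outer `sregex_iterator` finds each run matched by "([abc])+", and an inner `sregex_iterator` runs "([abc])" over just that run's `[first, second)` range; positions are computed against the whole text so the output mirrors the tester listing.

```cpp
#include <regex>
#include <sstream>
#include <string>

// Outer loop: each run matched by "([abc])+".
// Inner loop: each "([abc])" match inside that run only.
std::string describe_runs(const std::string& text)
{
    const std::regex outer("([abc])+");
    const std::regex inner("([abc])");
    std::ostringstream out;
    std::sregex_iterator end;   // default-constructed end-of-sequence iterator
    for (std::sregex_iterator it(text.begin(), text.end(), outer); it != end; ++it)
    {
        const std::ssub_match& run = (*it)[0];
        out << "Match '" << run.str() << "'\n- Group 1\n";
        for (std::sregex_iterator jt(run.first, run.second, inner); jt != end; ++jt)
        {
            const std::ssub_match& item = (*jt)[1];
            // Offset from the start of the whole text, not from the run.
            out << "- " << item.str() << " at pos " << (item.first - text.begin())
                << " length " << item.length() << "\n";
        }
    }
    return out.str();
}
```

For "cab bat mac" this produces the same lines as the tester output quoted earlier in the thread.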

Since this might be a school homework assignment, or an exercise you're doing in
order to learn from the experience of doing it, I'm not enclosing the code, but
yes, with this simple expression it's not only possible but simple, as
described, and I'm too lazy to think about whether a more complex expression
might present problems. ;-) I did use some time on it though: building the regex
library (never used) and checking the docs. But well used time, learned some!

Cheers, & hth.,

- Alf


> Can you? I don't see how, if you're using the code shown earlier. Didn't work
> for me.

Yes... you can. See "http://en.wikipedia.org/wiki/C+%2B0x#Regular_expressions"

> (as suggested in my previous reply)

Which reply was that?

> I reproduced the output ... using sregex_iterator

This is not an assignment, unless you consider it an assignment to myself, in
which case I hold no rules against cheating : ) Kidding aside, this is just
syntax, not really a logic issue, and I'm waaay past getting any personal
gratification from the experience, given the amount of hair I've lost over this
issue. At any rate, I'm writing a simple XML parser. See...

Cmn_XmlReader::Cmn_XmlReader( string xml )
{
Cmn_String::StringToList( xml, m_original, "\r\n", true );
Cmn_String::StringToList( xml, m_workingCopy, "\r\n", true );

m_desc[Header] = "Header";
m_desc[SplitTag] = "SplitTag";
m_desc[CombinedTag] = "CombinedTag";
m_desc[CloseTag] = "CloseTag";
m_desc[OpenTag] = "OpenTag";

m_regexDefs[Header] = "(<[\\?].+[\\?]>){1}";
m_regexDefs[SplitTag] = "<(\\w+)\\s*(\\w+=['\"].+?['\"]\\s*)*\

\s*>(.+?) said:
";

m_regexDefs[CloseTag] = "</(\\w+)>";
m_regexDefs[OpenTag] = "<(\\w+)\\s*(\\w+=['\"].+?['\"]\\s*)*\\s*>";

m_patternDefs[Header] = new regex( m_regexDefs[Header] );
m_patternDefs[SplitTag] = new regex( m_regexDefs[SplitTag] );
m_patternDefs[CombinedTag] = new regex( m_regexDefs[CombinedTag] );
m_patternDefs[CloseTag] = new regex( m_regexDefs[CloseTag] );
m_patternDefs[OpenTag] = new regex( m_regexDefs[OpenTag] );

ValidateHeader();
}

It seems that Boost makes it more obvious how to access its repeated
captures, via smatch.captures()[], which doesn't exist in tr1.

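// captures() and boost::match_extra exist only when BOOST_REGEX_MATCH_EXTRA
// is defined; the Boost.Regex library itself must be built with it too.
#define BOOST_REGEX_MATCH_EXTRA
#include <boost/regex.hpp>
#include <iostream>
#include <string>

void print_captures(const std::string& regx, const std::string& text)
{
    boost::regex e(regx);
    boost::smatch what;
    std::cout << "Expression: \"" << regx << "\"\n";
    std::cout << "Text: \"" << text << "\"\n";
    if(boost::regex_match(text, what, e, boost::match_extra))
    {
        unsigned i, j;
        std::cout << "** Match found **\n Sub-Expressions:\n";
        // what[i]: the (last) text matched by sub-expression i
        for(i = 0; i < what.size(); ++i)
            std::cout << " $" << i << " = \"" << what[i] << "\"\n";
        std::cout << " Captures:\n";
        // what.captures(i): every repetition captured by sub-expression i
        for(i = 0; i < what.size(); ++i)
        {
            std::cout << " $" << i << " = {";
            for(j = 0; j < what.captures(i).size(); ++j)
            {
                if(j)
                    std::cout << ", ";
                else
                    std::cout << " ";
                std::cout << "\"" << what.captures(i)[j] << "\"";
            }
            std::cout << " }\n";
        }
    }
    else
    {
        std::cout << "** No Match found **\n";
    }
}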

To make matters more difficult, IntelliSense has not worked for any
tr1 objects, so viewing methods and properties involves browsing the
lengthy and cluttered regex header file or waiting until I start debugging
to see what's what.

So, if anyone knows how to access repeated subgroups, please divulge
your knowledge and make the forums a better place ^_^
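One way to reach the individual repetitions with only the standard (tr1 / C++11) interface, offered as a sketch rather than anything proposed in the thread: after finding each outer match, re-scan it with just the repeated element, anchoring every step with `std::regex_constants::match_continuous` so each repetition must start exactly where the previous one ended. It assumes C++11 `std::regex`; `repetitions` and `repeated_groups` are hypothetical helper names, and the element pattern `[abc]` has to be copied out of the group by hand, which is the main limitation compared with Boost's `captures()`.

```cpp
#include <regex>
#include <string>
#include <vector>

// Splits [first, last) into consecutive matches of element, each anchored
// at the end of the previous one (match_continuous). Stops at the first
// position where element does not match.
std::vector<std::string> repetitions(std::string::const_iterator first,
                                     std::string::const_iterator last,
                                     const std::regex& element)
{
    std::vector<std::string> out;
    std::smatch m;
    while (first != last &&
           std::regex_search(first, last, m, element,
                             std::regex_constants::match_continuous))
    {
        if (m.length(0) == 0)
            break;                  // avoid looping forever on an empty match
        out.push_back(m.str(0));
        first = m[0].second;        // continue right after this repetition
    }
    return out;
}

// Applies repetitions() to every match of the outer pattern "([abc])+".
std::vector<std::vector<std::string>> repeated_groups(const std::string& text)
{
    const std::regex outer("([abc])+");
    const std::regex element("[abc]");
    std::vector<std::vector<std::string>> result;
    for (std::sregex_iterator it(text.begin(), text.end(), outer), end; it != end; ++it)
        result.push_back(repetitions((*it)[0].first, (*it)[0].second, element));
    return result;
}
```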

DomoChan · Aug 13, 2008

OK, this thread has been kicked dead. Please don't reply to any more
of my threads.

Thanks
