Does boost::regex_search support alternation such as (....)|(....)?

凯中

#include <iostream>
#include <string>
#include <boost/regex.hpp>
using namespace std;

int main(int argc, char* argv[])
{
    string inStr =
        "123456"
        "abcde"
        "1-2";
    std::string regstr =
        "(\\d+)|"
        "(a-z)";
    boost::regex _reg(regstr);
    boost::smatch what;
    std::string::const_iterator _start = inStr.begin();
    std::string::const_iterator _end = inStr.end();
    while (boost::regex_search(_start, _end, what, _reg))
    {
        std::string msg1(what[1].first, what[1].second);
        cout << msg1.c_str() << endl;
        _start = what[1].second;
    }
    return 0;
}

run result:
123456
1
2

Why can't it find "abcde"?
 

Jerry Coffin

[ ... ]

[ ... ]
why can't find "abcde"?

My guess is that you intended to use "[a-z]" instead...

Oops -- to match 'abcde' it'd need to be something like "[a-z]+" -- by
itself, [a-z] only matches a single lower-case letter. To match more
than one, you need to append something like '+' or '*'.
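A minimal sketch of the corrected pattern. It uses std::regex as a stand-in for boost::regex so that it builds with the standard library alone; that swap is an assumption and not what the original post used, although for these patterns the two are expected to behave alike. The helper name find_runs is hypothetical. Each search resumes from the end of the whole match, what[0], because group 1 does not participate when the letter branch is the one that matched.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects every run of digits or lower-case letters.
// std::regex stands in for boost::regex here (assumption).
std::vector<std::string> find_runs(const std::string& input)
{
    // "[a-z]+" instead of "(a-z)": a character class, repeated one or more times.
    static const std::regex re("(\\d+)|([a-z]+)");

    std::vector<std::string> out;
    std::smatch what;
    std::string::const_iterator start = input.begin();
    const std::string::const_iterator end = input.end();

    while (std::regex_search(start, end, what, re))
    {
        // what[0] is the whole match, whichever branch produced it.
        out.push_back(what[0].str());
        // Resume after the whole match, not after group 1.
        start = what[0].second;
    }
    return out;
}
```

With the input from the question, find_runs("123456abcde1-2") yields 123456, abcde, 1 and 2.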
 

Erik Wikström

#include <iostream>
#include <string>
#include <boost/regex.hpp>
using namespace std;

int main(int argc, char* argv[])
{
    string inStr =
        "123456"
        "abcde"
        "1-2";
    std::string regstr =
        "(\\d+)|"
        "(a-z)";

I think you intended to find a string consisting of either one or more
numbers or one or more lower-case letters. While I'm not familiar with
the Boost::regex dialect the above probably means one or more numbers
followed by a "|" followed by an "a", a "-", and a "z". You probably
want something like this:

((\\d+)|([a-z]+))

Notice that this introduces a new capturing group. There should be a way
to make the added pair of parentheses non-capturing, probably something
like:

(?:(\\d+)|([a-z]+))
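One way to check the effect of the (?: ... ) form is to count the capture groups each pattern defines. The sketch below does this with std::regex, used here as a stand-in for boost::regex (an assumption; Boost's default Perl-style syntax also documents (?: ... ) as non-capturing). The helper name group_count is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: number of capturing groups a pattern defines.
// std::regex stands in for boost::regex here (assumption).
unsigned group_count(const std::string& pattern)
{
    const std::regex re(pattern);
    // mark_count() reports the marked (capturing) sub-expressions only.
    return re.mark_count();
}
```

Under these assumptions, group_count("((\\d+)|([a-z]+))") returns 3, while group_count("(?:(\\d+)|([a-z]+))") returns 2, so the outer pair no longer captures.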
 

James Kanze

On 2008-05-11 04:37, 凯中 wrote:
#include <iostream>
#include <string>
#include <boost/regex.hpp>
using namespace std;
int main(int argc, char* argv[])
{
    string inStr =
        "123456"
        "abcde"
        "1-2";
    std::string regstr =
        "(\\d+)|"
        "(a-z)";

I think you intended to find a string consisting of either one
or more numbers or one or more lower-case letters. While I'm
not familiar with the Boost::regex dialect the above probably
means one or more numbers followed by a "|" followed by an
"a", a "-", and a "z".

Why wouldn't the '|' be interpreted as a meta-character here...

You probably want something like this:
((\\d+)|([a-z]+))

Since it is here?

Notice that this introduces a new capturing group, there
should be a way to make the added pair of parenthesis
non-capturing, probably something like:
(?:(\\d+)|([a-z]+))

It's not really clear what he wants to capture, and what not.
Depending on the use, you might want two separate capture groups
(in order to distinguish which side of the | was matched), or
only one. In the latter case, you'll have to put the | in a
capture group enclosing both sides of it. I tend not to
worry about extra capture groups, since it's easy enough to
ignore them, so something like:

((\\d+)|([a-z]+))

would do the trick. (Except that in this particular case, you
don't need capture groups at all, since all you're interested in
is the complete expression. And of course, since | has the
lowest precedence, just "\\d+|[a-z]+" is really all that would
be needed.)
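The two options described above can be sketched side by side. This is an illustration under stated assumptions: std::regex stands in for boost::regex, and the helper names classify_runs and plain_runs are hypothetical. The first helper keeps one capture group per branch and checks which one participated; the second drops the groups entirely and reads only the whole match.

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: (kind, text) pairs, kind being "digits" or "letters".
// std::regex stands in for boost::regex here (assumption).
std::vector<std::pair<std::string, std::string> >
classify_runs(const std::string& input)
{
    static const std::regex re("(\\d+)|([a-z]+)");
    std::vector<std::pair<std::string, std::string> > out;
    std::sregex_iterator it(input.begin(), input.end(), re), last;
    for (; it != last; ++it)
    {
        const std::smatch& m = *it;
        // Exactly one of m[1] and m[2] participates in any given match.
        out.emplace_back(m[1].matched ? "digits" : "letters", m[0].str());
    }
    return out;
}

// Hypothetical helper: same matches, no capture groups at all.
std::vector<std::string> plain_runs(const std::string& input)
{
    // '|' has the lowest precedence, so no parentheses are needed.
    static const std::regex re("\\d+|[a-z]+");
    std::vector<std::string> out;
    std::sregex_iterator it(input.begin(), input.end(), re), last;
    for (; it != last; ++it)
        out.push_back(it->str());
    return out;
}
```

For the input "123456abcde1-2", both helpers find the same four runs; classify_runs additionally labels abcde as letters and the rest as digits.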
 
