iterating over sub-matches using std::tr1::regex?

Discussion in 'C++' started by DomoChan@gmail.com, Aug 13, 2008.

  1. Guest

    Given a repeatable group expression

    ([abc])+

    and given its input

    cab

    the result contains nested subgroups, which, according to 'rad software
    regular expression tester', look like

    Match 'cab'
       - Group 1
           - c at pos 0 length 1
           - a at pos 1 length 1
           - b at pos 2 length 1

    I'd like to use the regex classes found in std::tr1 to iterate over all
    the matches in Group 1.

    I'm using regex_search to fill an smatch object. I need to go one more
    step to iterate over the matches found in Group 1. Can anyone tell me
    what I need to do to iterate over the sub-matches?

    I've tried the following, but it doesn't seem to work:

    string input( "cab" );
    regex pattern( "([abc])+" );
    smatch initialResults;

    // note: initialResults is successfully filled with a single group match
    regex_search( input, initialResults, pattern );

    for ( size_t ii = 1; ii < initialResults.size(); ++ii )
    {
        ssub_match groupResults;
        // note: groupResults.matched is false. groupResults.first is NULL,
        // as is groupResults.second
        groupResults.compare( initialResults[ ii ] );
    }
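
    A minimal sketch of what a single search exposes for a repeated group. It
    assumes the C++11 <regex> header (std::regex rather than std::tr1::regex)
    and a hypothetical helper name, first_match_and_group. With the default
    ECMAScript grammar, a capture group inside a repetition keeps only its
    last iteration, so group 1 holds 'b' and not 'c', 'a', 'b':

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <utility>

// Hypothetical helper: returns {whole match, contents of group 1} for the
// first match of "([abc])+" in `input`, or two empty strings if nothing matches.
std::pair<std::string, std::string> first_match_and_group(const std::string& input)
{
    static const std::regex pattern("([abc])+");
    std::smatch results;
    if (!std::regex_search(input, results, pattern))
        return std::make_pair(std::string(), std::string());

    // One capture group in the pattern means exactly two entries:
    // results[0] (the whole match) and results[1] (group 1).
    assert(results.size() == 2);

    // Group 1 is overwritten on every repetition of "+", so only the
    // last repetition is still stored when the search finishes.
    return std::make_pair(results[0].str(), results[1].str());
}
```

    Calling first_match_and_group("cab") gives "cab" for the whole match and
    "b" for group 1, which is why looping from index 1 to size() visits a
    single sub-match.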

    Thanks for any assistance!
    -Velik
    , Aug 13, 2008
    #1

  2. Guest

    On Aug 12, 11:13 pm, "Alf P. Steinbach" <> wrote:
    > Not sure exactly what you're talking about, but if I understand it correctly you
    > want all possible matches of a single character from a specific set of chars.
    >
    > Then why not use ([abc]).
    >
    > All possible matches of ([abc])+, if I read it correctly as 1 or more successive
    > characters drawn from the set {a, b, c}, for a string of length N consisting
    > of those characters only, well that's N + (N-1) + ... + 1 = (N^2 + N)/2
    > matches, and surely you don't want that, or do you?
    >
    > Cheers, & hth.,
    >
    > - Alf
    >
    > --
    > A: Because it messes up the order in which people normally read text.
    > Q: Why is it such a bad thing?
    > A: Top-posting.
    > Q: What is the most annoying thing on usenet and in e-mail?


    You seem to clearly understand the expression, but perhaps I didn't use
    an accurate expression to explain my situation.

    If I changed my input string to "cab bat mac", the results would then
    contain

    Match 'cab'
       - Group 1
           - c at pos 0 length 1
           - a at pos 1 length 1
           - b at pos 2 length 1
    Match 'ba'
       - Group 1
           - b at pos 4 length 1
           - a at pos 5 length 1
    Match 'ac'
       - Group 1
           - a at pos 9 length 1
           - c at pos 10 length 1

    So 'cab', 'ba', and 'ac' are stored in initialResults, and I can
    iterate over those easily using a for loop and the 'smatch'
    indexer. However, I'm interested in the individual results within each
    group, so from the first match 'cab' I want to be able to iterate over
    that group and read [0] = 'c', [1] = 'a', [2] = 'b'. That's what
    I'm trying to use 'ssub_match' for, but I'm sure I'm not using it
    correctly.

    Let me know if I'm still vague.

    Thanks again!
    , Aug 13, 2008
    #2

  3. Guest

    On Aug 13, 2:46 am, "Alf P. Steinbach" <> wrote:
    > Please don't quote signatures.
    >
    > > You seem to clearly understand the expression, but perhaps I didn't use
    > > an accurate expression to explain my situation.
    >
    > > If I changed my input string to "cab bat mac", the results would then
    > > contain
    >
    > > Match 'cab'
    > >    - Group 1
    > >        - c at pos 0 length 1
    > >        - a at pos 1 length 1
    > >        - b at pos 2 length 1
    > > Match 'ba'
    > >    - Group 1
    > >        - b at pos 4 length 1
    > >        - a at pos 5 length 1
    > > Match 'ac'
    > >    - Group 1
    > >        - a at pos 9 length 1
    > >        - c at pos 10 length 1
    >
    > > So 'cab', 'ba', and 'ac' are stored in initialResults, and I can
    > > iterate over those easily using a for loop and the 'smatch'
    > > indexer.
    >
    > Can you? I don't see how, if you're using the code shown earlier. Didn't work
    > for me.
    >
    > > However, I'm interested in the individual results within each
    > > group, so from the first match 'cab' I want to be able to iterate over
    > > that group and read [0] = 'c', [1] = 'a', [2] = 'b'. That's what
    > > I'm trying to use 'ssub_match' for, but I'm sure I'm not using it
    > > correctly.
    >
    > > Let me know if I'm still vague.
    >
    > No, it seems pretty clear.
    >
    > I reproduced the output shown above by using a sregex_iterator to iterate over
    > the matches for "([abc])+", and an inner loop with sregex_iterator to iterate
    > over the "([abc])" matches in each match (as suggested in my previous reply). It
    > seems there is also capture functionality that can do this more directly, but
    > requires recompilation of the regex library with certain switches, and affects
    > efficiency in general, i.e. not just when it's used. I didn't try that.
    >
    > Since this might be a school homework assignment, or an exercise you're doing in
    > order to learn from the experience of doing it, I'm not enclosing the code, but
    > yes, with this simple expression it's not only possible but simple, as
    > described, and I'm too lazy to think about whether a more complex expression
    > might present problems. ;-) I did use some time on it though: building the regex
    > library (never used) and checking the docs. But well used time, learned some!
    >
    > Cheers, & hth.,
    >
    > - Alf
    >
    > --
    > A: Because it messes up the order in which people normally read text.
    > Q: Why is it such a bad thing?
    > A: Top-posting.
    > Q: What is the most annoying thing on usenet and in e-mail?


    > Can you? I don't see how, if you're using the code shown earlier. Didn't work
    > for me.


    yes... you can. see
    "http://en.wikipedia.org/wiki/C%2B%2B0x#Regular_expressions"

    > (as suggested in my previous reply)


    which reply was that?

    > I reproduced the output ... using sregex_iterator


    This is not an assignment, unless you consider it an assignment to
    myself, in which case I hold no rules against cheating : ) Kidding
    aside, this is just syntax, not really a logic issue, and I'm waaay past
    getting any personal gratification from personal experience due to the
    amount of hair I've lost over this issue. At any rate, I'm writing a
    simple xml parser. See...

    Cmn_XmlReader::Cmn_XmlReader( string xml )
    {
        Cmn_String::StringToList( xml, m_original, "\r\n", true );
        Cmn_String::StringToList( xml, m_workingCopy, "\r\n", true );

        m_desc[Header] = "Header";
        m_desc[SplitTag] = "SplitTag";
        m_desc[CombinedTag] = "CombinedTag";
        m_desc[CloseTag] = "CloseTag";
        m_desc[OpenTag] = "OpenTag";

        m_regexDefs[Header] = "(<[\\?].+[\\?]>){1}";
        m_regexDefs[SplitTag] = "<(\\w+)\\s*(\\w+=['\"].+?['\"]\\s*)*\\s*>(.+?)</\\1>";
        m_regexDefs[CombinedTag] = "<(\\w+)\\s*(\\w+=['\"].+?['\"]\\s*)*\\s*/>";

        m_regexDefs[CloseTag] = "</(\\w+)>";
        m_regexDefs[OpenTag] = "<(\\w+)\\s*(\\w+=['\"].+?['\"]\\s*)*\\s*>";

        m_patternDefs[Header] = new regex( m_regexDefs[Header] );
        m_patternDefs[SplitTag] = new regex( m_regexDefs[SplitTag] );
        m_patternDefs[CombinedTag] = new regex( m_regexDefs[CombinedTag] );
        m_patternDefs[CloseTag] = new regex( m_regexDefs[CloseTag] );
        m_patternDefs[OpenTag] = new regex( m_regexDefs[OpenTag] );

        ValidateHeader();
    }
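
    The second group in the tag patterns above is itself a repeated capture
    (presumably one repetition per attribute), so a single match would expose
    only the last attribute. A rough sketch of one way around that, assuming
    C++11 std::regex and a hypothetical free function tag_attributes that is
    not part of Cmn_XmlReader: match the tag once, then scan the matched range
    with a separate one-attribute pattern.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: returns the name/value pairs of the first opening
// tag found in `line`, or an empty vector if no opening tag is found.
std::vector<std::pair<std::string, std::string>> tag_attributes(const std::string& line)
{
    // Same shape as the OpenTag pattern above; group 2 repeats per attribute.
    static const std::regex open_tag("<(\\w+)\\s*(\\w+=['\"].+?['\"]\\s*)*\\s*>");
    // A single attribute: group 1 is the name, group 2 the quoted value.
    static const std::regex attribute("(\\w+)=['\"](.+?)['\"]");

    std::vector<std::pair<std::string, std::string>> attrs;
    std::smatch tag;
    if (!std::regex_search(line, tag, open_tag))
        return attrs;

    // Whole match plus two capture groups.
    assert(tag.size() == 3);

    // Walk every attribute inside the range covered by the tag match.
    const std::sregex_iterator end;
    for (std::sregex_iterator it(tag[0].first, tag[0].second, attribute); it != end; ++it)
        attrs.push_back(std::make_pair((*it)[1].str(), (*it)[2].str()));
    return attrs;
}
```

    Like the patterns it is modelled on, this sketch does not check that the
    opening and closing quote characters of a value are the same.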

    It seems that boost makes it more obvious how to access repeated
    captures, via smatch.captures()[], which doesn't exist in tr1.

    #include <boost/regex.hpp>
    #include <iostream>
    #include <string>

    // Note: captures() and match_extra are only available when Boost.Regex
    // is built with BOOST_REGEX_MATCH_EXTRA defined.
    void print_captures(const std::string& regx, const std::string& text)
    {
        boost::regex e(regx);
        boost::smatch what;
        std::cout << "Expression: \"" << regx << "\"\n";
        std::cout << "Text: \"" << text << "\"\n";
        if(boost::regex_match(text, what, e, boost::match_extra))
        {
            unsigned i, j;
            std::cout << "** Match found **\n Sub-Expressions:\n";
            // Ordinary sub-expressions: one entry per group (last repetition only).
            for(i = 0; i < what.size(); ++i)
                std::cout << " $" << i << " = \"" << what[i] << "\"\n";
            std::cout << " Captures:\n";
            // Repeated captures: every repetition recorded for each group.
            for(i = 0; i < what.size(); ++i)
            {
                std::cout << " $" << i << " = {";
                for(j = 0; j < what.captures(i).size(); ++j)
                {
                    if(j)
                        std::cout << ", ";
                    else
                        std::cout << " ";
                    std::cout << "\"" << what.captures(i)[j] << "\"";
                }
                std::cout << " }\n";
            }
        }
        else
        {
            std::cout << "** No Match found **\n";
        }
    }

    To make matters more difficult, intellisense has not worked for any
    tr1 objects, so viewing methods and properties involves browsing the
    lengthy and cluttered regex header file or waiting until I start up the
    debugger to see what's what.

    So, if anyone knows how to access repeated subgroups, please divulge
    your knowledge and make the forums a better place ^_^
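
    The approach described in the quoted reply (an outer sregex_iterator over
    "([abc])+" and an inner one over "([abc])" within each match) can be
    sketched roughly as follows. This assumes C++11 std::regex instead of
    std::tr1, and repeated_captures is a hypothetical name chosen for
    illustration; it is not the code from the reply, which was never posted.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: for each match of "([abc])+" in `text`, collects the
// individual characters that the repeated group matched, in order.
std::vector<std::vector<std::string>> repeated_captures(const std::string& text)
{
    static const std::regex outer("([abc])+");  // a whole run, e.g. "cab"
    static const std::regex inner("([abc])");   // one repetition of the group

    std::vector<std::vector<std::string>> result;
    const std::sregex_iterator end;

    for (std::sregex_iterator run(text.begin(), text.end(), outer); run != end; ++run)
    {
        // Entry 0 of each match_results is the whole match and is always set.
        const std::ssub_match& whole = (*run)[0];
        assert(whole.matched);

        // Re-scan only the character range covered by this match.
        std::vector<std::string> pieces;
        for (std::sregex_iterator piece(whole.first, whole.second, inner); piece != end; ++piece)
            pieces.push_back((*piece)[1].str());
        result.push_back(pieces);
    }
    return result;
}
```

    For the input "cab bat mac" this yields three inner lists, {c, a, b},
    {b, a} and {a, c}, matching the tester output shown earlier. The cost is
    that every match is scanned twice, and a more complex pattern may not
    split into an outer and an inner expression this cleanly.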
    , Aug 13, 2008
    #3
  4. Guest

    OK, this thread has been kicked dead. Please don't reply to any more
    of my threads.

    Thanks
    , Aug 13, 2008
    #4