Best way to search through STL maps?

Discussion in 'C++' started by Steve, Nov 29, 2009.

  1. Steve

    Steve Guest

    I have the following STL containers:

    (using namespace std;)

    typedef vector<string> TokenVector; // e.g [the] [cat] [sat] [on]
    [the] [mat]

    typedef map<TokenVector, long > NGramMap; // a list of unique n-grams
    (as above), and a freq count.

    My goal is to search for n-grams that match a pattern with wildcards,
    for instance:

    [the] [cat] [sat] [?] [the] [mat]

    and return a list of all matches with all the possibilities for the
    wildcard [?], along with the freq count

    Assuming the map is sorted alphabetically, is there some way to search
    them - like a regular expression - that return pattern matches?

    The only way I can think of at the moment is to search for "[the]
    [cat] [sat]", then sift through those matches.
    This gets messy, though, when the pattern starts with a wildcard, or
    contains 2 or 3 wildcards.

    This seems like it might be a common task, so I thought I'd ask here
    before re-inventing the wheel.
    Does STL have ready-made solutions for this kind of search?

    Does it make the problem any easier if it's constrained to _always_
    being 6-grams, and that there are never more than 3 wildcards?

    Thanks for any clues, even if it's just what topics I need to study in
    order to crack it.

    Steve, Nov 29, 2009
    1. Advertisements

  2. Steve

    er Guest

    Does the question boil down to finding whether

    [the] [cat] [sat] [on] [the] [mat]


    [the] [cat] [sat] [?] [the] [mat]

    where each of the above n-grams are represented by a collection of
    strings and symbol ? is not a question mark but a wildcard?
    er, Nov 29, 2009
    1. Advertisements

  3. Steve

    Steve Guest

    Yes, succinctly put.
    Steve, Nov 29, 2009
  4. Steve

    er Guest

    I have a feeling (yes, nothing more reliable than that as I don't deal
    text processing) that working with strings might be easier than
    breaking them into tokens that are held by a STL container. And do so
    using C++ I would look into Boost.Regex. But probably you already
    thought of this and have a good reason to use tokens.
    er, Nov 29, 2009
  5. Steve

    Steve Guest

    Yes, but having still found no shortcuts, I'm beginning to think that
    keeping composite string copies for searching is the way to go.
    (I really should investigate Boost, but a quick search reveals people
    having nightmares building it for OSX due to the transition to 64-bit,
    so I'll have to hold off for a while.)


    Steve, Nov 30, 2009
  6. Steve

    er Guest

    (I really should investigate Boost, but a quick search  reveals people
    FYI I was able to build a couple of the boost libraries (the bulk of
    boost does not need installing) on Snow Leopard (64-bit). Some of the
    searches you found probably pertain to advanced features that you
    probably don't need.
    er, Nov 30, 2009
  7. Steve

    Jeff Flinn Guest

    Did you mean this:
    Jeff Flinn, Nov 30, 2009
  8. Steve

    Steve Guest

    Yes, not that exact message, but the gist of it. Well, it looks like
    it's worth persevering then.
    Steve, Nov 30, 2009
  9. Steve

    er Guest

    Does ppc64 concern you? Not me, so I did not need a patch.
    er, Nov 30, 2009
  10. Steve

    Steve Guest

    No it doesn't, but I thought specifying *just* that omission was
    tricky for a non-unix man like myself. I'm just being chicken ;) I'm
    sure we've all suffered the scenario when, in the middle of an idea,
    you get sidetracked installing something, and then 24 hours disappear,
    along with clumps of hair.
    I think I read somewhere that aspects of boost will be appearing C+
    +0x, so it's a no-brainer that I need to start making good use of it.I
    may be gone some time ;)

    PS Great to be talking to someone here using OSX, I almost thought it
    wasn't worth mentioning.
    Steve, Dec 1, 2009
  11. Steve

    er Guest

    You might be able to spare that by reading/posting to the
    mailing list. As far as building a library (again most don't require
    building), my own shameful tribulations are reported in posts 10/20
    and 11/2. In each case, it turns out I was overlooking simplicity.
    er, Dec 1, 2009
    1. Advertisements

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments (here). After that, you can post your question and our members will help you out.