Matching chars in a std::string

Discussion in 'C++' started by tech, Jun 23, 2008.

  1. tech

    tech Guest

    Hi, I need a function to specify a match pattern including using
    wildcard characters as below
    to find chars in a std::string.

    The match pattern can contain the wildcard characters "*" and "?",
    where "*" matches zero or more consecutive occurrences of any
    character and "?" matches a single occurrence of any character.

    Does Boost or some other library have this capability? If Boost does
    have this, do I need to include an entire Boost library or just the
    bit I want? How much extra code size would result from using a single
    utility function from the library?

    Thanks
    tech, Jun 23, 2008
    #1

  2. tech

    Mirco Wahab Guest

    tech wrote:
    > Hi, I need a function to specify a match pattern including using
    > wildcard characters as below
    > to find chars in a std::string.


    Use a Regular expression library.

    > The match pattern can contain the wildcard characters "*" and "?",
    > where "*" matches zero or more consecutive occurrences of any
    > character and "?" matches a single occurrence of any character.


    Example:
    using namespace boost;
    ...
    regex reg("^ \\s* .*? (\\d+) [^\\n\\r]* \\d? [\\n\\r]+", regex::mod_x);

    > Does boost or some other library have this capability?


    Yes, it's called boost_regex
    http://www.boost.org/doc/libs/1_35_0/libs/regex/doc/html/index.html

    > If boost does have this, do i need to include an entire
    > boost library or just the bit i want. How much extra code size would
    > result from just using a single
    > utility function from the library?


    On my Linux system, the size of the shared library

    /usr/lib/libboost_regex.so.1.34.1

    is 768320 bytes.

    Regards

    M.
    Mirco Wahab, Jun 23, 2008
    #2

  3. tech

    Mirco Wahab Guest

    Addendum (was: Matching chars in a std::string)

    Mirco Wahab wrote:
    > tech wrote:
    >> If boost does have this, do i need to include an entire
    >> boost library or just the bit i want. How much extra code size would
    >> result from just using a single
    >> utility function from the library?

    >
    > On my (Linux-)System, the size of the shared library
    >
    > /usr/lib/libboost_regex.so.1.34.1
    >
    > is 768320 bytes.


    I verified (sort of) my claim with a boost-1.34.1
    installation on a Suse Linux.

    The application needed libboost_regex.so,
    which (in 1.34.1) is 768320 bytes. However, the
    Boost library itself links against the Unicode
    libraries (libicu*), which (here) include at least:

    libicudata.so - 11363116 bytes
    libicui18n.so - 1412764 bytes
    libicuuc.so - 1215688 bytes

    So the above files had to be copied to a
    "clean" location to get a program that
    uses only boost_regex to run.

    The term "clean" means: depending on the
    actual configuration of the target machine,
    other libraries may need to be installed
    as well. Running ldd on libboost_regex.so
    shows:
    linux-gate.so.1 => (0xffffe000)
    libicui18n.so.38 => /usr/lib/libicui18n.so.38 (0xb7d12000)
    libicuuc.so.38 => /usr/lib/libicuuc.so.38 (0xb7bea000)
    libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0xb7afa000)
    libm.so.6 => /lib/libm.so.6 (0xb7ac6000)
    libgcc_s.so.1 => /lib/libgcc_s.so.1 (0xb7ab7000)
    libc.so.6 => /lib/libc.so.6 (0xb7974000)
    libicudata.so.38 => /usr/lib/libicudata.so.38 (0xb6e9d000)
    libpthread.so.0 => /lib/libpthread.so.0 (0xb6e85000)
    /lib/ld-linux.so.2 (0xb7f27000)

    So it might be better to install static boost libraries
    on the developer machine and hand out statically linked
    "big-block" executables w/all bolts contained. I can't
    test this here because there aren't static boost libraries
    on my distribution and I'm too lazy to bother with that.

    Regards

    Mirco
    Mirco Wahab, Jun 23, 2008
    #3
  4. tech

    James Kanze Guest

    On Jun 23, 12:41 pm, Mirco Wahab <-halle.de> wrote:
    > tech wrote:
    > > Hi, I need a function to specify a match pattern including
    > > using wildcard characters as below to find chars in a
    > > std::string.


    > Use a Regular expression library.


    Yes, but...

    > > The match pattern can contain the wildcard characters "*" and "?",
    > > where "*" matches zero or more consecutive occurrences of any
    > > character and "?" matches a single occurrence of any character.


    > Example:
    > using namespace boost;
    > ...
    > regex reg("^ \\s* .*? (\\d+) [^\\n\\r]* \\d? [\\n\\r]+", regex::mod_x);


    This is a joke, right? You need code to convert a match pattern
    to a regular expression; you have to convert "*" to something like
    "[^/]*", for example (under Unix---under Windows, the equivalent
    mapping would be "[^/\\]*"---and under Unix, at least, if it is
    the first thing in a filename, you also have to exclude .). And
    you have to escape the regular expression meta-characters as
    well.

    It's still easier to use a regular expression class than to do
    it all by hand, but you do need some extra code to generate
    the regular expression from the initial pattern.
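
As a rough sketch of that pre-processing step (not the code mentioned in this thread, which isn't shown): the following assumes C++11 std::regex instead of Boost.Regex and takes the OP's definition literally, so "*" and "?" match any character, with no special handling of "/" or a leading ".". The names wildcard_to_regex and wildcard_match_regex are invented for this illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: translate a "*"/"?" wildcard pattern into an
// ECMAScript regular expression string.
std::string wildcard_to_regex(const std::string& pattern) {
    // Regex meta-characters that must be escaped to be taken literally.
    static const std::string meta = "\\^$.|+()[]{}";
    std::string re;
    for (char c : pattern) {
        if (c == '*') {
            re += "[\\s\\S]*";          // zero or more of any character
        } else if (c == '?') {
            re += "[\\s\\S]";           // exactly one of any character
        } else {
            if (meta.find(c) != std::string::npos)
                re += '\\';             // escape regex meta-characters
            re += c;
        }
    }
    return re;
}

// Hypothetical helper: true if the whole of 'text' matches the wildcard pattern.
bool wildcard_match_regex(const std::string& pattern, const std::string& text) {
    return std::regex_match(text, std::regex(wildcard_to_regex(pattern)));
}
```

With this, wildcard_match_regex("*s?r*g", "a string") yields true, while a "." in the pattern matches only a literal ".". The class [\s\S] is used rather than "." because "." does not match line terminators in the ECMAScript grammar.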

    --
    James Kanze (GABI Software) email:
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
    James Kanze, Jun 23, 2008
    #4
  5. tech

    Mirco Wahab Guest

    James Kanze wrote:
    > On Jun 23, 12:41 pm, Mirco Wahab <-halle.de> wrote:
    >> Use a Regular expression library.

    >
    > Yes, but...
    >
    >> Example:
    >> using namespace boost;
    >> ...
    >> regex reg("^ \\s* .*? (\\d+) [^\\n\\r]* \\d? [\\n\\r]+", regex::mod_x);

    >
    > This is a joke, right? You need code to convert a match pattern
    > to a regular expression; you have to convert "*" to something like
    > "[^/]*", for example (under Unix---under Windows, the equivalent
    > mapping would be "[^/\\]*"---and under Unix, at least, if it is
    > the first thing in a filename, you also have to exclude .). And
    > you have to escape the regular expression meta-characters as
    > well.


    What are you talking about? There's no 'filename' mentioned
    anywhere. It's plain text processing with regular expressions
    (if I'm not completely off the road).

    > It's still easier to use a regular expression class than to do
    > it all by hand, but you do need some extra code to generate
    > the regular expression from the initial pattern.


    Not at all. The above would be (OK, I made this up, it's
    a pseudo expression) a valid regular expression. Another
    (maybe related) example. Find all links in a web page:

    int linkparser(const char* htmlname)
    {
        boost::regex reg(
            "(?isx-m: \
            < \\s* A [^>]* href \\s* = \
            [\"\\s]* \
            \\w+:// ([^\"\\s]*) \
            )"
        );

        string line;            // read lines and perform one match/search per line
        int linecount = 0;      // count lines (nice)
        ifstream fin(htmlname); // open saved .html file

        cout << "trying to find links in " << htmlname << endl;
        while( getline(fin, line) ) {
            ++linecount;
            boost::smatch match; // instantiate match variable
            if( boost::regex_search(line, match, reg) )
                cout << linecount << "\t" << match[1] << endl;
        }

        ...

    Which part of the above expression exactly do you have in mind
    when saying:

    > you do need some extra code to generate the regular expression


    Maybe we speak of different things?

    Regards

    Mirco
    Mirco Wahab, Jun 23, 2008
    #5
  6. tech

    James Kanze Guest

    On Jun 23, 7:21 pm, Mirco Wahab <-halle.de> wrote:
    > James Kanze wrote:
    > > On Jun 23, 12:41 pm, Mirco Wahab <-halle.de> wrote:
    > >> Use a Regular expression library.


    > > Yes, but...


    > >> Example:
    > >> using namespace boost;
    > >> ...
    > >> regex reg("^ \\s* .*? (\\d+) [^\\n\\r]* \\d? [\\n\\r]+", regex::mod_x);


    > > This is a joke, right? You need code to convert a match pattern
    > > to a regular expression; you have to convert "*" to something like
    > > "[^/]*", for example (under Unix---under Windows, the equivalent
    > > mapping would be "[^/\\]*"---and under Unix, at least, if it is
    > > the first thing in a filename, you also have to exclude .). And
    > > you have to escape the regular expression meta-characters as
    > > well.


    > What are you talking about? There's no 'filename' mentioned
    > anywhere. It's plain text processing with regular expressions
    > (if I'm not completely off the road).


    The pattern matching he described was wildcard matching of
    filenames, not regular expression evaluation. The conventions
    are different (but it is possible to map the wildcard matching
    to regular expressions, sort of).

    > > It's still easier to use a regular expression class than to
    > > do it all by hand, but you do need some extra code to
    > > generate the regular expression from the initial pattern.


    > Not at all. The above would be (OK I made this up, its
    > a pseudo expression) a valid regular expression.


    Yes, but it's not what he asked for. What he asked for was that
    ``"*" matches zero or more consecutive occurrences of any
    character and "?" matches a single occurrence of any
    character.'' A subset of the classical filename globbing
    patterns.

    [...]
    > What part of the above expression exactly would you consider
    > when saying:


    > > you do need some extra code to generate the regular expression


    > Maybe we speak of different things?


    I was talking about what the original poster asked for. You can
    do it with regular expressions (I have code which translates a
    Unix globbing pattern into a regular expression), but it takes
    some pre-processing.

    --
    James Kanze (GABI Software) email:
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
    James Kanze, Jun 23, 2008
    #6
  7. tech

    Mirco Wahab Guest

    James Kanze wrote:
    >> Maybe we speak of different things?

    >
    > I was talking about what the original poster asked for. You can
    > do it with regular expressions (I have code which translates a
    > Unix globbing pattern into a regular expression), but it takes
    > some pre-processing.

    [...]
    > The pattern matching he described was wildcard matching of
    > filenames, not regular expression evaluation. The conventions
    > are different (but it is possible to map the wildcard matching
    > to regular expressions, sort of).


    This is the OP's question:
    |[Subject: Matching chars in a std::string]
    | Hi, I need a function to specify a match pattern including using
    | wildcard characters as below to find chars in a std::string. The
    | match pattern can contain the wildcard characters "*" and "?",
    | where "*" matches zero or more consecutive occurrences of any
    | character and "?" matches a single occurrence of any character.

    I fail to see anything here
    you mentioned in your two
    preceding posts.

    Regards

    Mirco
    Mirco Wahab, Jun 23, 2008
    #7
  8. tech

    James Kanze Guest

    On Jun 23, 9:46 pm, Mirco Wahab <-halle.de> wrote:
    > James Kanze wrote:
    > >> Maybe we speak of different things?


    > > I was talking about what the original poster asked for. You can
    > > do it with regular expressions (I have code which translates a
    > > Unix globbing pattern into a regular expression), but it takes
    > > some pre-processing.

    > [...]
    > > The pattern matching he described was wildcard matching of
    > > filenames, not regular expression evaluation. The conventions
    > > are different (but it is possible to map the wildcard matching
    > > to regular expressions, sort of).


    > This is the OP's question:
    > |[Subject: Matching chars in a std::string]
    > | Hi, I need a function to specify a match pattern including using
    > | wildcard characters as below to find chars in a std::string. The
    > | match pattern can contain the wildcard characters "*" and "?",
    > | where "*" matches zero or more consecutive occurrences of any
    > | character and "?" matches a single occurrence of any character.


    > I fail to see anything here you mentioned in your two
    > preceding posts.


    Really? You don't see any mention of "wildcard"? You don't see
    a definition of "*" which says it matches zero or more
    consecutive occurrences of any character? You don't see a
    definition of "?" which matches a single occurrence of any
    character?

    --
    James Kanze (GABI Software) email:
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
    James Kanze, Jun 24, 2008
    #8
  9. tech

    Mirco Wahab Guest

    James Kanze wrote:
    > On Jun 23, 9:46 pm, Mirco Wahab <-halle.de> wrote:
    >> This is the OP's question:
    >> |[Subject: Matching chars in a std::string]
    >> | Hi, I need a function to specify a match pattern including using
    >> | wildcard characters as below to find chars in a std::string. The
    >> | match pattern can contain the wildcard characters "*" and "?",
    >> | where "*" matches zero or more consecutive occurrences of any
    >> | character and "?" matches a single occurrence of any character.


    [...]

    > Really? You don't see any mention of "wildcard"? You don't see
    > a definition of "*" which says it matches zero or more
    > consecutive occurrences of any character? You don't see a
    > definition of "?" which matches a single occurrence of any
    > character?


    OK, I'm sorry, my mistake. When I read your post saying:

    >>> The pattern matching he described was wildcard matching of
    >>> filenames, not regular expression evaluation. The conventions
    >>> are different (but it is possible to map the wildcard matching
    >>> to regular expressions, sort of).


    I understood it more like:

    | The pattern matching he described was wildcard matching of
    | filenames, not regular expression evaluation. The conventions
    | are different (but it is possible to map the wildcard matching
    | to regular expressions, sort of).

    So you didn't really mean:
    "/... matching of filenames, not regular expression evaluation .../"

    but rather meant exactly what the OP wanted to know. Sorry
    for not being able to deduce that from it (I'm new to c.l.c++).

    Regards & Thanks for clearing this up

    Mirco
    Mirco Wahab, Jun 24, 2008
    #9
  10. tech

    Nick Keighley Guest

    On 24 Jun, 10:12, Mirco Wahab <-halle.de> wrote:
    > James Kanze wrote:
    > > On Jun 23, 9:46 pm, Mirco Wahab <-halle.de> wrote:



    > >> This is the OP's question:
    > >> |[Subject: Matching chars in a std::string]
    > >> | Hi, I need a function to specify a match pattern including using
    > >> | wildcard characters as below to find chars in a std::string. The
    > >> | match pattern can contain the wildcard characters "*" and "?",
    > >> | where "*" matches zero or more consecutive occurrences of any
    > >> | character and "?" matches a single occurrence of any character.


    I think you're both mind-reading. You're translating what the
    user asked for into what you think he wants.

    <snip>

    > >>> The pattern matching he described was wildcard matching of
    > >>> filenames, not regular expression evaluation.


    No... I wonder if he wants pattern matching and has only seen
    file globbing. He may not *know* he wants reg-exprs. I think the
    *, ? was possibly only an example.

    > >>> The conventions
    > >>> are different (but it is possible to map the wildcard matching
    > >>> to regular expressions, sort of).

    >
    > I understood it more like:
    >
    > | The pattern matching he described was wildcard matching of
    > | filenames, not regular expression evaluation.  The conventions
    > | are different (but it is possible to map the wildcard matching
    > | to regular expressions, sort of).
    >
    > So you didn't really mean:
    > "/... matching of filenames, not regular expression evaluation .../"
    >
    > but rather meant exactly what the OP wanted to know. Sorry
    > for not being able to deduce that from it (I'm new to c.l.c++).


    Well, it confused me too. I too thought James Kanze was insisting
    that the OP was matching file names.

    Perhaps the OP could give more info?


    --
    Nick Keighley
    Nick Keighley, Jun 24, 2008
    #10
  11. tech

    tech Guest

    On Jun 24, 10:34 am, Nick Keighley <>
    wrote:
    > On 24 Jun, 10:12, Mirco Wahab <-halle.de> wrote:
    >
    > > James Kanze wrote:
    > > > On Jun 23, 9:46 pm, Mirco Wahab <-halle.de> wrote:
    > > >> This is the OP's question:
    > > >> |[Subject: Matching chars in a std::string]
    > > >> | Hi, I need a function to specify a match pattern including using
    > > >> | wildcard characters as below to find chars in a std::string. The
    > > >> | match pattern can contain the wildcard characters "*" and "?",
    > > >> | where "*" matches zero or more consecutive occurrences of any
    > > >> | character and "?" matches a single occurrence of any character.

    >
    > I think you're both mind-reading. You're translating what the
    > user asked for into what you think he wants.
    >
    > <snip>
    >
    > > >>> The pattern matching he described was wildcard matching of
    > > >>> filenames, not regular expression evaluation.

    >
    > No... I wonder if he wants pattern matching and has only seen
    > file globbing. He may not *know* he wants reg-exprs. I think the
    > *, ? was possibly only an example.
    >
    > > >>> The conventions
    > > >>> are different (but it is possible to map the wildcard matching
    > > >>> to regular expressions, sort of).

    >
    > > I understood it more like:

    >
    > > | The pattern matching he described was wildcard matching of
    > > | filenames, not regular expression evaluation.  The conventions
    > > | are different (but it is possible to map the wildcard matching
    > > | to regular expressions, sort of).

    >
    > > So you didn't really mean:
    > > "/... matching of filenames, not regular expression evaluation .../"

    >
    > > but rather meant exactly what the OP wanted to know. Sorry
    > > for not being able to deduce that from it (I'm new to c.l.c++).

    >
    > well it confused me too. I too thought James Kanze was insisting
    > that the OP was matching file names.
    >
    > Perhaps the OP could give more info?
    >
    > --
    > Nick Keighley


    Sorry for not being clear. I just wanted a simple pattern matcher that
    does not use regular expressions; I think those are too much for this.

    The match pattern can contain the wildcard characters "*" and "?",
    where "*" matches zero or more consecutive occurrences of any
    character and "?" matches a single occurrence of any character.

    std::string does not have such a match function that returns a
    bool.
    tech, Jun 24, 2008
    #11
  12. tech

    James Kanze Guest

    On Jun 24, 11:12 am, Mirco Wahab <-halle.de> wrote:
    > James Kanze wrote:
    > > On Jun 23, 9:46 pm, Mirco Wahab <-halle.de> wrote:
    > >> This is the OP's question:
    > >> |[Subject: Matching chars in a std::string]
    > >> | Hi, I need a function to specify a match pattern including using
    > >> | wildcard characters as below to find chars in a std::string. The
    > >> | match pattern can contain the wildcard characters "*" and "?",
    > >> | where "*" matches zero or more consecutive occurrences of any
    > >> | character and "?" matches a single occurrence of any character.


    > [...]


    > > Really? You don't see any mention of "wildcard"? You don't
    > > see a definition of "*" which says it matches zero or more
    > > consecutive occurrences of any character? You don't see a
    > > definition of "?" which matches a single occurrence of any
    > > character?


    > OK, I'm sorry, my mistake. When I read your post saying:


    > >>> The pattern matching he described was wildcard matching of
    > >>> filenames, not regular expression evaluation. The conventions
    > >>> are different (but it is possible to map the wildcard matching
    > >>> to regular expressions, sort of).


    Exactly. Since that's what he said.

    > I understood it more like:


    > | The pattern matching he described was wildcard matching of
    > | filenames, not regular expression evaluation. The conventions
    > | are different (but it is possible to map the wildcard matching
    > | to regular expressions, sort of).


    > So you didn't really mean:
    > "/... matching of filenames, not regular expression evaluation .../"


    What I meant was "wildcard matching of filenames", since that's
    what the poster described. Maybe he wants to use it for
    something else, but the patterns he described correspond to
    those used in filename globbing, not in regular expressions.

    Of course, maybe he doesn't really want what he asked for, but
    is looking for something else. It does happen a lot here. But
    in this particular case, I've needed both at various times in
    the past, so I more or less assume that both have some utility,
    and that if he took the time to write "wildcard matching", and
    describe the conventions, it's because he didn't want "regular
    expression matching" (which uses significantly different
    conventions).

    --
    James Kanze (GABI Software) email:
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
    James Kanze, Jun 24, 2008
    #12
  13. tech

    Jerry Coffin Guest

    In article <73b61208-0920-43d3-b9e2-a901dc7d8b55
    @m73g2000hsh.googlegroups.com>, says...

    [ ... ]

    > Sorry for not being clear, i just wanted a simple pattern matcher not
    > using regular expressions, i think this is too much
    >
    > The match pattern can contain the wildcard characters "*" and "?",
    > where "*" matches zero or more consecutive occurrences of any
    > character and "?" matches a single occurrence of any character
    >
    > The std::string does not have such a match function which returns a
    > bool


    I tend to agree -- given that the matching itself only takes up
    something like 4 lines of code, it's probably easier to do the match
    than convert to an RE, and then use an RE engine to do the job.

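    #include <string>
    #include <functional>

    // Wildcard matcher: '*' matches zero or more characters, '?' exactly one.
    // Note: std::unary_function was deprecated in C++11 and removed from the
    // standard in C++17, so this is pre-C++17 code as written.
    class patmat : public std::unary_function<char const *, bool> {
        std::string pat;

        // Recursively match the pattern against the whole of 'str'.
        bool match(char const *pat, char const *str) const {
            switch (*pat) {
            case '\0': return *str=='\0';   // pattern used up: string must be too
            case '*':                       // '*' matches nothing, or one more char
                return match(pat+1, str) || (*str && match(pat, str+1));
            case '?': return *str && match(pat+1, str+1);
            default: return *pat==*str && match(pat+1, str+1);
            }
        }
    public:
        patmat(std::string pattern) : pat(pattern) {}

        bool operator()(std::string const &str) const {
            return(match(pat.c_str(), str.c_str()));
        }
    };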

    #ifdef TEST

    #include <cstddef>
    #include <iostream>
    #include <iterator>
    #include <vector>
    #include <algorithm>

    // Print every string that matches 'pat' (copy everything except non-matches).
    void test(char const * const *strings, std::size_t num, std::string pat) {
        std::cout << "\nTesting against " << pat << "\n";
        std::remove_copy_if(strings, strings+num,
            std::ostream_iterator<std::string>(std::cout, "\n"),
            std::not1(patmat(pat)));
    }

    int main() {

        char const *test_strings[] = {
            "longstring",
            "a really, really long string, compared to the others",
            "string",
            "spring",
            "a string"
        };

        std::cout<< "Test strings:\n";
        std::copy(test_strings, test_strings+5,   // print all five test strings
            std::ostream_iterator<std::string>(std::cout, "\n"));

        test(test_strings, 5, "a*");
        test(test_strings, 5, "*g");
        test(test_strings, 5, "*s?r*g");
        test(test_strings, 5, "*st*g");
        return 0;
    }

    #endif
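
The recursive matcher above re-tries the remaining suffixes of the string for each "*", so its running time can grow quickly when a pattern has several "*" and the string is long, and its recursion depth grows with the string length. A possible iterative variant is sketched below. It is not part of the original post, the name wildcard_match is made up for illustration, and it relies on the observation that only the most recent "*" needs to be remembered for backtracking.

```cpp
#include <cstddef>
#include <string>

// Hypothetical iterative matcher: '*' matches zero or more characters,
// '?' matches exactly one. The whole string must match the pattern.
bool wildcard_match(const std::string& pat, const std::string& str) {
    const std::size_t npos = std::string::npos;
    std::size_t p = 0;       // current position in the pattern
    std::size_t s = 0;       // current position in the string
    std::size_t star = npos; // position of the most recent '*' in the pattern
    std::size_t mark = 0;    // string position where that '*' currently stops

    while (s < str.size()) {
        if (p < pat.size() && pat[p] == '*') {
            star = p++;      // remember the '*', first try matching nothing
            mark = s;
        } else if (p < pat.size() && (pat[p] == '?' || pat[p] == str[s])) {
            ++p;             // single character matches
            ++s;
        } else if (star != npos) {
            p = star + 1;    // mismatch: let the last '*' absorb one more char
            s = ++mark;
        } else {
            return false;    // mismatch and no '*' to fall back on
        }
    }
    // The string is used up; any remaining pattern must be all '*'.
    while (p < pat.size() && pat[p] == '*')
        ++p;
    return p == pat.size();
}
```

For the test strings above, wildcard_match("*st*g", "spring") is false and wildcard_match("*s?r*g", "longstring") is true. The worst case is roughly proportional to the pattern length times the string length.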

    --
    Later,
    Jerry.

    The universe is a figment of its own imagination.
    Jerry Coffin, Jun 24, 2008
    #13
