string matching

Discussion in 'C Programming' started by Martijn, Jul 27, 2003.

  1. Martijn

    Martijn Guest

    Hi,

    Which is the prevalent way of matching a filename to a mask in runtime? The
    best I can think of, is sscanf.

    Thanks for the help!

    <OT>
    It's for the Windows platform, so any functions specific to that platform
    are welcome too
    </OT>

    --
    Martijn Haak
    http://www.serenceconcepts.nl
     
    Martijn, Jul 27, 2003
    #1

  2. Martijn

    Malcolm Guest

    "Martijn" <> wrote in message
    >
    > Which is the prevalent way of matching a filename to a mask in
    > runtime? The best I can think of, is sscanf.
    >

    You can write a wildcard matcher. It should take about a day (depending of
    course on your experience level). I think several have been posted in the ng
    not too long ago.

    int matchwild(char *str, char *pattern)

    >
    > <OT>
    > It's for the Windows platform, so any functions specific to that platform
    > are welcome too

    I think the Windows FindFirstFile() family of functions incorporate a
    wildcard matching facility.
    > </OT>
     
    Malcolm, Jul 27, 2003
    #2

  3. Martijn

    Martijn Guest

    Malcolm wrote:
    > You can write a wildcard matcher. It should take about a day
    > (depending of course on your experience level). I think several have
    > been posted in the ng not too long ago.
    >
    > int matchwild(char *str, char *pattern)


    I'll check what I can find in Google Groups. Thanks for the pointer.

    >> <OT>
    >> It's for the Windows platform, so any functions specific to that
    >> platform are welcome too

    > I think the Windows FindFirstFile() family of functions incorporate a
    > wildcard matching facility.

    I already have the filename, so I really need to match it.
    >> </OT>


    Thanks for the help!

    --
    Martijn Haak
    http://www.serenceconcepts.nl
     
    Martijn, Jul 27, 2003
    #3
  4. On Sun, 27 Jul 2003 17:32:45 -0400, Malcolm wrote:

    >> Which is the prevalent way of matching a filename to a mask in runtime?
    >> The best I can think of, is sscanf.
    >>

    > You can write a wildcard matcher. It should take about a day (depending


    This will match '*' and '?' expressions. I've also seen a version
    of this floating around that's actually a little smaller but it used
    recursion so this should be a little faster.

    Mike

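    int
    matchwild(const unsigned char *name, const unsigned char *pat)
    {
        const unsigned char *spos, *wpos;

        spos = wpos = name;
        /* Match the literal prefix up to the first '*'. */
        while (*name && *pat != '*') {
            if (*pat != *name && *pat != '?') {
                return 0;
            }
            name++;
            pat++;
        }

        while (*name) {
            if (*pat == '*') {
                /* A trailing '*' matches whatever is left. */
                if (*++pat == '\0') {
                    return 1;
                }
                wpos = pat;      /* pattern position to restart from */
                spos = name + 1; /* next name position to retry at */
            } else if (*pat == *name || *pat == '?') {
                pat++;
                name++;
            } else {
                /* Mismatch: let the last '*' absorb one more character. */
                pat = wpos;
                name = spos++;
            }
        }

        /* Skip trailing '*' and, DOS-style, any '?' that follows a '?'.
         * Caution: if name is empty and pat is non-empty and does not
         * begin with '*', pat - 1 points before the start of the pattern. */
        while (*pat == '*' || (*pat && *(pat - 1) == '?')) {
            pat++;
        }
        return *pat == '\0';
    }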
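
    A few quick checks may make the behaviour of the routine above easier to see. This is only a sketch: matchwild_str and matchwild_checks are hypothetical helper names that do not appear in the thread, and the routine itself is repeated so that the example compiles on its own. The empty-name case is left out on purpose, because the final loop looks at the character before the current pattern position.

```c
#include <assert.h>

/* The routine as posted, repeated here so the example is self-contained. */
int matchwild(const unsigned char *name, const unsigned char *pat)
{
    const unsigned char *spos, *wpos;

    spos = wpos = name;
    while (*name && *pat != '*') {
        if (*pat != *name && *pat != '?') {
            return 0;
        }
        name++;
        pat++;
    }

    while (*name) {
        if (*pat == '*') {
            if (*++pat == '\0') {
                return 1;
            }
            wpos = pat;
            spos = name + 1;
        } else if (*pat == *name || *pat == '?') {
            pat++;
            name++;
        } else {
            pat = wpos;
            name = spos++;
        }
    }

    while (*pat == '*' || (*pat && *(pat - 1) == '?')) {
        pat++;
    }
    return *pat == '\0';
}

/* Hypothetical wrapper so ordinary char strings can be passed. */
int matchwild_str(const char *name, const char *pat)
{
    return matchwild((const unsigned char *)name,
                     (const unsigned char *)pat);
}

/* Hypothetical self-check; assert() aborts on an unexpected result. */
void matchwild_checks(void)
{
    assert(matchwild_str("readme.txt", "*.txt") == 1);
    assert(matchwild_str("readme.txt", "read??.txt") == 1);
    assert(matchwild_str("readme.txt", "*.c") == 0);
    assert(matchwild_str("abc", "a*c") == 1);
    assert(matchwild_str("abc", "a*d") == 0);
    /* Trailing '?' characters are allowed to match nothing. */
    assert(matchwild_str("ab", "?????") == 1);
}
```

    Calling matchwild_checks() from a test program exercises the common cases; the casts in the wrapper avoid pointer-signedness warnings when string literals are passed.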
     
    Michael B Allen, Jul 28, 2003
    #4
  5. Martijn

    Martijn Guest

    Michael B Allen wrote:
    > This will match '*' and '?' expressions. I've also seen a version
    > of this floating around that's actually a little smaller but it used
    > recursion so this should be a little faster.
    >
    > int
    > matchwild(const unsigned char *name, const unsigned char *pat)
    > {


    [snipped]


    Was this taken from this site:
    http://space.tin.it/scienza/acantato/wildmatch.html ?

    I also stumbled upon fnmatch, but that also matches collective patterns,
    those within brackets ([]).
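
    For comparison, a minimal sketch of how fnmatch might be called, assuming a POSIX system that provides <fnmatch.h> (it is a POSIX function, not part of standard C, so it may not be available with a Windows compiler). mask_matches and mask_checks are hypothetical names used only for this illustration. Note that fnmatch takes the pattern first and returns 0 on a match.

```c
#include <assert.h>
#include <fnmatch.h>

/* Hypothetical wrapper: returns 1 if name matches mask, 0 otherwise.
 * With flags == 0, '*' and '?' behave as usual and [...] is a set. */
int mask_matches(const char *name, const char *mask)
{
    return fnmatch(mask, name, 0) == 0;
}

/* Hypothetical self-check showing the bracket behaviour. */
void mask_checks(void)
{
    assert(mask_matches("readme.txt", "*.txt"));
    assert(mask_matches("readme.txt", "read??.txt"));
    assert(!mask_matches("readme.txt", "*.c"));

    /* Brackets form a character set ... */
    assert(mask_matches("a.c", "[ab].c"));
    assert(!mask_matches("c.c", "[ab].c"));

    /* ... unless escaped with a backslash, which makes them literal. */
    assert(mask_matches("[ab].c", "\\[ab\\].c"));
}
```

    If bracket sets are unwanted, escaping '[' in the mask before the call is one way around them; writing a small '*'/'?' matcher by hand is the other.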

    I'll try this one, see if it works as expected. Are there any credits due?

    Thanks,

    --
    Martijn Haak
    http://www.serenceconcepts.nl
     
    Martijn, Jul 28, 2003
    #5
  6. On Mon, 28 Jul 2003 04:14:40 -0400, Martijn wrote:

    > Michael B Allen wrote:
    >> This will match '*' and '?' expressions. I've also seen a version of
    >> this floating around that's actually a little smaller but it used
    >> recursion so this should be a little faster.
    >>
    >> int
    >> matchwild(const unsigned char *name, const unsigned char *pat) {

    >
    > [snipped]
    >
    >
    > Was this taken from this site:
    > http://space.tin.it/scienza/acantato/wildmatch.html ?


    No. I had no idea there were so many permutations of this. I found this
    one on some programming website posted to an open forum.

    Mike
     
    Michael B Allen, Jul 28, 2003
    #6
  7. Re: string matching

    On Mon, 28 Jul 2003, Michael B Allen wrote:
    >
    > On Sun, 27 Jul 2003 17:32:45 -0400, Malcolm wrote:
    > >
    > > You can write a wildcard matcher. It should take about a day
    >
    > This will match '*' and '?' expressions. I've also seen a version
    > of this floating around that's actually a little smaller but it used
    > recursion so this should be a little faster.
    
    Here's a recursive version of the regex-style pattern matcher,
    translated from some Pascal source I forget where.  It should be
    very easy to remove the regex functionality, at which point you'll
    have a regular old DOS-style '*'/'?' wildcard matcher.
    Bugfixes welcome.
    
    -Arthur
    
    
    #include <stdlib.h>
    #include <string.h>
    #include <ctype.h>
    #include <limits.h>
    #include "RegEx.h"  /* just the corresponding header file */
    
    /*
     * A relatively inefficient and simple version of regular expressions.
     * Recurses on the end of each text in order to determine whether it
     * matches the given regex.
     *
     * Currently supports:
     *
     *    .    Any character
     *  [...]  Any character in a set
     *  [^..]  Any character NOT in a set
     *    X    Plaintext character
     *    \    Backslash escape, inside and outside of sets
     *    *    Repeat zero or more times
     *    +    Repeat one or more times
     *    ?    Repeat zero or one times
     *
     * matches_regex() returns 1 on success, 0 on failure, or -1 if given
     * a malformed regular expression.
     *
     */
    
    #define RE_ANY 1
    #define RE_SET 2
    #define RE_NOTSET 3
    #define RE_ONE 4
    
    static int matches_single(int ch, int type, char matchset[UCHAR_MAX+1]);
    
    static int m_regex(const char *text, const char *regex, int match_case)
    {
        int to_match;
        char matchset[UCHAR_MAX+1] = {0};
    
        if (*regex == '\0') {
            return (*text == '\0');
        }
    
        switch (*regex)
        {
            case '.':
                to_match = RE_ANY;
                ++regex;
                break;
            case '[':
            {
                to_match = RE_SET;
                ++regex;
                if (*regex == '^') {
                    to_match = RE_NOTSET;
                    ++regex;
                }
                for (; *regex != ']'; ++regex) {
                    if (*regex == '\\') {
                        ++regex;
                    }
                    if (*regex == '\0')
                      return -1;
                    matchset[(unsigned char) *regex] = 1;
    
                    if (match_case == 0) {
                        matchset[toupper((unsigned char) *regex)] = 1;
                        matchset[tolower((unsigned char) *regex)] = 1;
                    }
                }
                ++regex;
                break;
            }
            default:
            {
                if (*regex == '\\') {
                    ++regex;
                    if (*regex == '\0') return -1;
                }
                to_match = RE_ONE;
                matchset[(unsigned char) *regex] = 1;
    
                if (match_case == 0) {
                    matchset[toupper((unsigned char) *regex)] = 1;
                    matchset[tolower((unsigned char) *regex)] = 1;
                }
    
                ++regex;
                break;
            }
        }
    
        if (*regex == '+') {
            /* Match at least one character. */
            int i;
    
            if (*text == '\0')
              return 0;
    
            for (i=0; text[i] && matches_single(text[i], to_match, matchset); ++i) {
                int tmp = m_regex(text+i+1, regex+1, match_case);
                if (tmp) return tmp;
            }
            return 0;
        }
        else if (*regex == '*') {
            /* Match any number of things. */
            int i;
            int tmp;
    
            tmp = m_regex(text, regex+1, match_case);
            if (tmp) return tmp;
    
            for (i=0; text[i] && matches_single(text[i], to_match, matchset); ++i) {
                tmp = m_regex(text+i+1, regex+1, match_case);
                if (tmp) return tmp;
            }
            return 0;
        }
        else if (*regex == '?') {
            /* Match zero or one things. */
            int tmp;
    
            tmp = m_regex(text, regex+1, match_case);
            if (tmp) return tmp;
    
            if (*text && matches_single(*text, to_match, matchset)) {
                tmp = m_regex(text+1, regex+1, match_case);
            }
            return tmp;
        }
        else {
            /* Match exactly one thing. */
            if (*text == '\0')
              return 0;
            else if (matches_single(*text, to_match, matchset)) {
                return m_regex(text+1, regex, match_case);
            }
            else return 0;
        }
    }
    
    
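    static int matches_single(int ch, int type, char matchset[UCHAR_MAX+1])
    {
        /* Callers pass plain char values, which may be negative where char
         * is signed; reduce to 0..UCHAR_MAX before indexing the table. */
        unsigned char idx = (unsigned char) ch;

        if (type == RE_ANY) {
            return 1;
        }
        else if (type == RE_SET) {
            return (matchset[idx]);
        }
        else if (type == RE_NOTSET) {
            return ! (matchset[idx]);
        }
        else if (type == RE_ONE) {
            return (matchset[idx]);
        }
        return 0;
    }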
    
    
    
    int matches_regex(const char *text, const char *regex)
    {
        return m_regex(text, regex, 1);
    }
    
    int matchesi_regex(const char *text, const char *regex)
    {
        return m_regex(text, regex, 0);
    }
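
    Stripping the regex features out, as suggested at the top of this post, might leave something like the following recursive '*'/'?' matcher. This is a sketch written for illustration rather than a reduction of the code above line by line; wild_rec and wild_rec_checks are hypothetical names. Here '?' must consume exactly one character.

```c
#include <assert.h>

/* Hypothetical recursive matcher: returns 1 if text matches pat, where
 * '*' matches any run of characters (including none) and '?' matches
 * exactly one character. */
int wild_rec(const char *text, const char *pat)
{
    if (*pat == '\0') {
        return *text == '\0';
    }

    if (*pat == '*') {
        /* Let '*' absorb 0, 1, 2, ... characters and try the rest. */
        for (;;) {
            if (wild_rec(text, pat + 1)) {
                return 1;
            }
            if (*text == '\0') {
                return 0;
            }
            text++;
        }
    }

    if (*text != '\0' && (*pat == '?' || *pat == *text)) {
        return wild_rec(text + 1, pat + 1);
    }
    return 0;
}

/* Hypothetical self-check. */
void wild_rec_checks(void)
{
    assert(wild_rec("readme.txt", "*.txt") == 1);
    assert(wild_rec("readme.txt", "r*e.t?t") == 1);
    assert(wild_rec("abc", "a*d") == 0);
    assert(wild_rec("", "*") == 1);
    assert(wild_rec("ab", "?????") == 0);
}
```

    The recursion keeps the code short, but a pattern with many '*' characters can make it retry the same suffixes repeatedly, which is the trade-off against the iterative version posted earlier in the thread.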
     
    Arthur J. O'Dwyer, Jul 28, 2003
    #7
  8. Martijn

    James Antill Guest

    On Mon, 28 Jul 2003 01:05:20 -0400, Michael B Allen wrote:

    > On Sun, 27 Jul 2003 17:32:45 -0400, Malcolm wrote:
    >
    >>> Which is the prevalent way of matching a filename to a mask in runtime?
    >>> The best I can think of, is sscanf.
    >>>

    >> You can write a wildcard matcher. It should take about a day (depending

    >
    > This will match '*' and '?' expressions. I've also seen a version
    > of this floating around that's actually a little smaller but it used
    > recursion so this should be a little faster.


    [snip ... ]

    > while (*pat == '*' || (*pat && *(pat - 1) == '?')) {
    > pat++;
    > }
    > return *pat == '\0';
    > }


    The last while loop doesn't look right as it makes...

    matchwild("ab", "?????") == 1



    --
    James Antill --
    Need an efficient and powerful string library for C?
    http://www.and.org/vstr/
     
    James Antill, Jul 28, 2003
    #8
  9. On Mon, 28 Jul 2003 12:31:29 -0400, James Antill wrote:

    >>>> Which is the prevalent way of matching a filename to a mask in
    >>>> runtime?
    >>>> The best I can think of, is sscanf.
    >>>>
    >>> You can write a wildcard matcher. It should take about a day
    >>> (depending

    >>
    >> This will match '*' and '?' expressions. I've also seen a version of
    >> this floating around that's actually a little smaller but it used
    >> recursion so this should be a little faster.

    >
    > [snip ... ]
    >
    >> while (*pat == '*' || (*pat && *(pat - 1) == '?')) {
    >> pat++;
    >> }
    >> return *pat == '\0';
    >> }

    >
    > The last while loop doesn't look right as it makes...
    >
    > matchwild("ab", "?????") == 1


    This is actually DOS behavior. Try it in a DOS window. Whether or
    not it's "correct" is left to your interpretation.

    Mike
     
    Michael B Allen, Jul 28, 2003
    #9
  10. Martijn

    Martijn Guest

    Michael B Allen wrote:
    >> [snip ... ]
    >>
    >>> while (*pat == '*' || (*pat && *(pat - 1) == '?')) {
    >>> pat++;
    >>> }
    >>> return *pat == '\0';
    >>> }

    >>
    >> The last while loop doesn't look right as it makes...
    >>
    >> matchwild("ab", "?????") == 1

    >
    > This is actually DOS behavior. Try it in a DOS window. Whether or
    > not it's "correct" is left to your interpretation.


    Actually, these should not match, but the routine is a good start.
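
    One way to get the stricter behaviour, where every '?' must consume exactly one character, is sketched below. It is an assumption about what "should not match" means here, not a patch from the thread; matchwild_strict and matchwild_strict_checks are hypothetical names. It also never looks at the character before the pattern, so an empty name is safe.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical strict variant: '*' matches any run of characters,
 * '?' matches exactly one. Returns 1 on a match, 0 otherwise. */
int matchwild_strict(const char *name, const char *pat)
{
    const char *star = NULL;  /* pattern position just after the last '*' */
    const char *retry = NULL; /* name position where that '*' began */

    while (*name) {
        if (*pat == '*') {
            star = ++pat;
            retry = name;
        } else if (*pat == '?' || *pat == *name) {
            pat++;
            name++;
        } else if (star != NULL) {
            /* Mismatch: let the last '*' absorb one more character. */
            pat = star;
            name = ++retry;
        } else {
            return 0;
        }
    }

    /* Only trailing '*' may match the empty remainder. */
    while (*pat == '*') {
        pat++;
    }
    return *pat == '\0';
}

/* Hypothetical self-check. */
void matchwild_strict_checks(void)
{
    assert(matchwild_strict("ab", "?????") == 0);
    assert(matchwild_strict("ab", "??") == 1);
    assert(matchwild_strict("readme.txt", "*.txt") == 1);
    assert(matchwild_strict("abcbc", "a*bc") == 1);
    assert(matchwild_strict("", "a") == 0);
    assert(matchwild_strict("", "*") == 1);
}
```

    The only functional difference from the posted routine is the final loop, which skips trailing '*' but not trailing '?'.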

    --
    Martijn Haak
    http://www.serenceconcepts.nl
     
    Martijn, Jul 29, 2003
    #10
