string matching

Martijn

Hi,

What is the usual way of matching a filename against a mask at runtime? The
best I can think of is sscanf.

Thanks for the help!

<OT>
It's for the Windows platform, so any functions specific to that platform
are welcome too
</OT>
 
Malcolm

Martijn said:
Which is the prevalent way of matching a filename to a mask in
runtime? The best I can think of, is sscanf.
You can write a wildcard matcher. It should take about a day (depending, of
course, on your experience level). I think several have been posted in the
newsgroup not too long ago.

int matchwild(char *str, char *pattern)
<OT>
It's for the Windows platform, so any functions specific to that platform
are welcome too
I think the Windows FindFirstFile() family of functions incorporates a
wildcard matching facility.
 
Martijn

Malcolm said:
You can write a wildcard matcher. It should take about a day
(depending of course on your experience level). I think several have
been posted in the ng not too long ago.

int matchwild(char *str, char *pattern)

I'll check what I can find in Google Groups. Thanks for the pointer.
I think the Windows FindFirstFile() family of functions incorporate a
wildcard matching facility.
I already have the filename, so I only need to match it against the mask.
Thanks for the help!
 
Michael B Allen

You can write a wildcard matcher. It should take about a day (depending

This will match '*' and '?' expressions. I've also seen a version
of this floating around that's actually a little smaller, but it used
recursion, so this one should be a little faster.

Mike

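int
matchwild(const unsigned char *name, const unsigned char *pat)
{
    const unsigned char *spos, *wpos;

    spos = wpos = name;
    /* Match the literal prefix up to the first '*'. */
    while (*name && *pat != '*') {
        if (*pat != *name && *pat != '?') {
            return 0;
        }
        name++;
        pat++;
    }

    /* Backtracking loop: wpos is the pattern position just after the
     * last '*', spos is the next place in name to retry from. */
    while (*name) {
        if (*pat == '*') {
            if (*++pat == '\0') {
                return 1;       /* a trailing '*' matches the rest */
            }
            wpos = pat;
            spos = name + 1;
        } else if (*pat == *name || *pat == '?') {
            pat++;
            name++;
        } else {
            pat = wpos;
            name = spos++;
        }
    }

    /* Skip trailing '*' and any '?' immediately preceded by another '?'.
     * Note: if name is empty and the pattern starts with an ordinary
     * character, pat - 1 points before the start of the pattern. */
    while (*pat == '*' || (*pat && *(pat - 1) == '?')) {
        pat++;
    }
    return *pat == '\0';
}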
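
For comparison, the smaller recursive form mentioned above might look roughly like the sketch below. This is not the version referred to in the post; the name matchwild_rec is made up for illustration, and it assumes plain char strings. Unlike the routine above, it treats every '?' as exactly one character.

```c
/* Hypothetical recursive wildcard matcher (illustration only).
 * '*' matches zero or more characters, '?' matches exactly one.
 * Returns 1 on match, 0 otherwise. */
int matchwild_rec(const char *name, const char *pat)
{
    if (*pat == '\0') {
        /* Pattern used up: match only if the name is used up too. */
        return *name == '\0';
    }
    if (*pat == '*') {
        /* Either '*' matches nothing, or it absorbs one more character. */
        if (matchwild_rec(name, pat + 1)) {
            return 1;
        }
        return *name != '\0' && matchwild_rec(name + 1, pat);
    }
    if (*name != '\0' && (*pat == '?' || *pat == *name)) {
        return matchwild_rec(name + 1, pat + 1);
    }
    return 0;
}
```

The recursion is easy to read, but patterns with many '*' can make it retry the same positions over and over, which is presumably why the iterative form is expected to be faster.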
 
Martijn

Michael said:
This will match '*' and '?' expressions. I've also seen a version
of this floating around that's actually a little smaller but it used
recursion so this should be a little faster.

int
matchwild(const unsigned char *name, const unsigned char *pat)
{

[snipped]


Was this taken from this site:
http://space.tin.it/scienza/acantato/wildmatch.html ?

I also stumbled upon fnmatch, but that also matches bracket expressions
(sets of characters within []).
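
For reference, a call to fnmatch could look like the sketch below. It assumes a POSIX-style <fnmatch.h> is available, which is not part of standard C and may not ship with a stock Windows toolchain. The wrapper name matches_mask is made up for illustration. As far as I know, none of the standard flags turns bracket expressions off, so a mask containing '[' is still treated as a set.

```c
#include <fnmatch.h>

/* Hypothetical wrapper around POSIX fnmatch().
 * Returns 1 if filename matches mask, 0 if it does not,
 * and -1 if fnmatch() reports an error. */
int matches_mask(const char *filename, const char *mask)
{
    int rc = fnmatch(mask, filename, 0);   /* pattern comes first */

    if (rc == 0) {
        return 1;
    }
    if (rc == FNM_NOMATCH) {
        return 0;
    }
    return -1;
}
```

With this wrapper, a mask such as a[0-9].log matches a1.log, which is exactly the bracket behavior described above.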

I'll try this one and see if it works as expected. Are there any credits due?

Thanks,
 
Michael B Allen

Michael said:
This will match '*' and '?' expressions. I've also seen a version of
this floating around that's actually a little smaller but it used
recursion so this should be a little faster.

int
matchwild(const unsigned char *name, const unsigned char *pat) {

[snipped]


Was this taken from this site:
http://space.tin.it/scienza/acantato/wildmatch.html ?

No. I had no idea there were so many permutations of this. I found this
one on some programming website posted to an open forum.

Mike
 
Arthur J. O'Dwyer

This will match '*' and '?' expressions. I've also seen a version
of this floating around that's actually a little smaller but it used
recursion so this should be a little faster.

Here's a recursive version of a regex-style pattern matcher,
translated from some Pascal source; I forget where I found it. It
should be very easy to remove the regex functionality, at which point
you'll have a regular old DOS-style '*'/'?' wildcard matcher.
Bugfixes welcome.

-Arthur


#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include <limits.h>
#include "RegEx.h" /* just the corresponding header file */

/*
 * A relatively inefficient and simple version of regular expressions.
 * Recurses on the end of each text in order to determine whether it
 * matches the given regex.
 *
 * Currently supports:
 *
 *   .      Any character
 *   [...]  Any character in a set
 *   [^..]  Any character NOT in a set
 *   X      Plaintext character
 *   \      Backslash escape, inside and outside of sets
 *   *      Repeat zero or more times
 *   +      Repeat one or more times
 *   ?      Repeat zero or one times
 *
 * matches_regex() returns 1 on success, 0 on failure, or -1 if given
 * a malformed regular expression.
 *
 */

#define RE_ANY 1
#define RE_SET 2
#define RE_NOTSET 3
#define RE_ONE 4

static int matches_single(int ch, int type, char matchset[UCHAR_MAX+1]);

static int m_regex(const char *text, const char *regex, int match_case)
{
    int to_match;
    char matchset[UCHAR_MAX+1] = {0};

    if (*regex == '\0') {
        return (*text == '\0');
    }

    /* Parse one atom of the regex into to_match/matchset. */
    switch (*regex)
    {
        case '.':
            to_match = RE_ANY;
            ++regex;
            break;
        case '[':
        {
            to_match = RE_SET;
            ++regex;
            if (*regex == '^') {
                to_match = RE_NOTSET;
                ++regex;
            }
            /* regex already points at the first member of the set */
            for (; *regex != ']'; ++regex) {
                if (*regex == '\\') {
                    ++regex;
                }
                if (*regex == '\0')
                    return -1;
                /* index as unsigned char so high characters stay in range */
                matchset[(unsigned char) *regex] = 1;

                if (match_case == 0) {
                    matchset[toupper((unsigned char) *regex)] = 1;
                    matchset[tolower((unsigned char) *regex)] = 1;
                }
            }
            ++regex;
            break;
        }
        default:
        {
            if (*regex == '\\') {
                ++regex;
                if (*regex == '\0') return -1;
            }
            to_match = RE_ONE;
            matchset[(unsigned char) *regex] = 1;

            if (match_case == 0) {
                matchset[toupper((unsigned char) *regex)] = 1;
                matchset[tolower((unsigned char) *regex)] = 1;
            }

            ++regex;
            break;
        }
    }

    if (*regex == '+') {
        /* Match at least one character. */
        int i;

        if (*text == '\0')
            return 0;

        for (i=0; text[i] != '\0' &&
                  matches_single((unsigned char) text[i], to_match, matchset); ++i) {
            int tmp = m_regex(text+i+1, regex+1, match_case);
            if (tmp) return tmp;
        }
        return 0;
    }
    else if (*regex == '*') {
        /* Match any number of things. */
        int i;
        int tmp;

        tmp = m_regex(text, regex+1, match_case);
        if (tmp) return tmp;

        for (i=0; text[i] != '\0' &&
                  matches_single((unsigned char) text[i], to_match, matchset); ++i) {
            tmp = m_regex(text+i+1, regex+1, match_case);
            if (tmp) return tmp;
        }
        return 0;
    }
    else if (*regex == '?') {
        /* Match zero or one things. */
        int tmp;

        tmp = m_regex(text, regex+1, match_case);
        if (tmp) return tmp;

        if (*text && matches_single((unsigned char) *text, to_match, matchset)) {
            tmp = m_regex(text+1, regex+1, match_case);
        }
        return tmp;
    }
    else {
        /* Match exactly one thing. */
        if (*text == '\0')
            return 0;
        else if (matches_single((unsigned char) *text, to_match, matchset)) {
            return m_regex(text+1, regex, match_case);
        }
        else return 0;
    }
}


/* ch must be in the range 0..UCHAR_MAX. */
static int matches_single(int ch, int type, char matchset[UCHAR_MAX+1])
{
    if (type == RE_ANY) {
        return 1;
    }
    else if (type == RE_SET) {
        return (matchset[ch]);
    }
    else if (type == RE_NOTSET) {
        return ! (matchset[ch]);
    }
    else if (type == RE_ONE) {
        return (matchset[ch]);
    }
    return 0;
}



int matches_regex(const char *text, const char *regex)
{
    return m_regex(text, regex, 1);
}

int matchesi_regex(const char *text, const char *regex)
{
    return m_regex(text, regex, 0);
}
 
James Antill

This will match '*' and '?' expressions. I've also seen a version
of this floating around that's actually a little smaller but it used
recursion so this should be a little faster.

[snip ... ]
while (*pat == '*' || (*pat && *(pat - 1) == '?')) {
pat++;
}
return *pat == '\0';
}

The last while loop doesn't look right as it makes...

matchwild("ab", "?????") == 1
 
Michael B Allen

This will match '*' and '?' expressions. I've also seen a version of
this floating around that's actually a little smaller but it used
recursion so this should be a little faster.

[snip ... ]
while (*pat == '*' || (*pat && *(pat - 1) == '?')) {
pat++;
}
return *pat == '\0';
}

The last while loop doesn't look right as it makes...

matchwild("ab", "?????") == 1

This is actually DOS behavior. Try it in a DOS window. Whether or not
it's "correct" is left to your interpretation.

Mike
 
Martijn

Michael said:
[snip ... ]
while (*pat == '*' || (*pat && *(pat - 1) == '?')) {
pat++;
}
return *pat == '\0';
}

The last while loop doesn't look right as it makes...

matchwild("ab", "?????") == 1

This is actually DOS behavior. Try it in a DOS window. Whether or not
it's "correct" is left to your interpretation.

Actually, these should not match, but the routine is a good start.
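
If the stricter reading is wanted, where each '?' must consume exactly one character, one possible iterative variant is sketched below. The name matchwild_strict is made up for illustration, and it takes plain char strings rather than unsigned char. The only real difference from the routine posted earlier is that the final loop skips trailing '*' only, so "?????" no longer matches "ab".

```c
#include <stddef.h>

/* Hypothetical strict variant (illustration only).
 * '*' matches zero or more characters, '?' matches exactly one.
 * Returns 1 on match, 0 otherwise. */
int matchwild_strict(const char *name, const char *pat)
{
    const char *star = NULL;    /* pattern position just after the last '*' */
    const char *resume = NULL;  /* where in name that '*' currently ends */

    while (*name) {
        if (*pat == '*') {
            star = ++pat;       /* remember the '*' and try matching nothing */
            resume = name;
        } else if (*pat == '?' || *pat == *name) {
            pat++;
            name++;
        } else if (star != NULL) {
            pat = star;         /* backtrack: let '*' absorb one more char */
            name = ++resume;
        } else {
            return 0;           /* mismatch with no '*' to fall back on */
        }
    }

    /* Only trailing '*' may remain; leftover '?' means no match. */
    while (*pat == '*') {
        pat++;
    }
    return *pat == '\0';
}
```

With this version, matchwild_strict("ab", "??") matches, while matchwild_strict("ab", "?????") does not.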
 
