Matching chars in a std::string

tech

Hi, I need a function to specify a match pattern including using
wildcard characters as below
to find chars in a std::string.

The match pattern can contain the wildcard characters "*" and "?",
where "*" matches zero or more consecutive occurrences of any
character and "?" matches a single occurrence of any character.

Does Boost or some other library have this capability? If Boost does
have this, do I need to include the entire Boost library or just the
bit I want? How much extra code size would result from using a single
utility function from the library?

Thanks
 
Mirco Wahab

tech said:
Hi, I need a function to specify a match pattern including using
wildcard characters as below
to find chars in a std::string.

Use a regular expression library.

The match pattern can contain the wildcard characters "*" and "?",
where "*" matches zero or more consecutive occurrences of any
character and "?" matches a single occurrence of any character.

Example:
using namespace boost;
...
regex reg("^ \\s* .*? (\\d+) [^\\n\\r]* \\d? [\\n\\r]+", regex::mod_x);

Does boost or some other library have this capability?

Yes, it's called boost_regex
http://www.boost.org/doc/libs/1_35_0/libs/regex/doc/html/index.html

If boost does have this, do i need to include an entire
boost library or just the bit i want. How much extra code size would
result from just using a single
utility function from the library?

On my (Linux-)System, the size of the shared library

/usr/lib/libboost_regex.so.1.34.1

is 768320 bytes.

Regards

M.
 
Mirco Wahab

Mirco said:
On my (Linux-)System, the size of the shared library

/usr/lib/libboost_regex.so.1.34.1

is 768320 bytes.

I verified (sort of) my claim with a boost-1.34.1
installation on a Suse Linux.

The application needed the libboost_regex.so,
which in (1.34.1) is 768320 bytes - but, the
boost library itself links to the unicode system
(libicu*) which (here) includes at least:

libicudata.so - 11363116 bytes
libicui18n.so - 1412764 bytes
libicuuc.so - 1215688 bytes

So the above files had to be copied
to a "clean" location in order to get a
program which uses only boost_regex to run.

Here "clean" means that, depending on the
actual configuration of the target machine,
other libraries may need to be installed
as well. Running ldd on libboost_regex.so
shows:
linux-gate.so.1 => (0xffffe000)
libicui18n.so.38 => /usr/lib/libicui18n.so.38 (0xb7d12000)
libicuuc.so.38 => /usr/lib/libicuuc.so.38 (0xb7bea000)
libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0xb7afa000)
libm.so.6 => /lib/libm.so.6 (0xb7ac6000)
libgcc_s.so.1 => /lib/libgcc_s.so.1 (0xb7ab7000)
libc.so.6 => /lib/libc.so.6 (0xb7974000)
libicudata.so.38 => /usr/lib/libicudata.so.38 (0xb6e9d000)
libpthread.so.0 => /lib/libpthread.so.0 (0xb6e85000)
/lib/ld-linux.so.2 (0xb7f27000)

So it might be better to install static boost libraries
on the developer machine and hand out statically linked
"big-block" executables w/all bolts contained. I can't
test this here because there aren't static boost libraries
on my distribution and I'm too lazy to bother with that.

Regards

Mirco
 
James Kanze

Use a Regular expression library.

Yes, but...
Example:
using namespace boost;
...
regex reg("^ \\s* .*? (\\d+) [^\\n\\r]* \\d? [\\n\\r]+", regex::mod_x);

This is a joke, right. You need code to convert a match pattern
to a regular expression; you have to convert "*" to something like
"[^/]*", for example (under Unix---under Windows, the equivalent
mapping would be "[^/\\]*"---and under Unix, at least, if it is
the first thing in a filename, you also have to exclude .). And
you have to escape the regular expression meta-characters as
well.

It's still easier to use a regular expression class than to do
it all by hand, but you do need some extra code to generate
the regular expression from the initial pattern.
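A minimal sketch of the kind of pre-processing described above, for the original poster's plain-text definition of "*" and "?" only (no filename rules such as excluding "/" or a leading "."). It assumes std::regex from C++11 rather than boost::regex, and the names glob_to_regex, glob_match and glob_self_check are hypothetical helpers, not code from this thread:

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: translate a pattern that uses only the wildcards
// '*' and '?' into an ECMAScript regular expression.
std::string glob_to_regex(const std::string& glob) {
    std::string re;
    for (char c : glob) {
        switch (c) {
        case '*': re += "[\\s\\S]*"; break;  // zero or more of any character
        case '?': re += "[\\s\\S]";  break;  // exactly one character
        // Regular expression meta-characters must be escaped so that they
        // are matched literally.
        case '.': case '^': case '$': case '+': case '(': case ')':
        case '[': case ']': case '{': case '}': case '|': case '\\':
            re += '\\';
            re += c;
            break;
        default:
            re += c;
        }
    }
    return re;
}

// Hypothetical helper: true if the whole of `text` matches `glob`.
bool glob_match(const std::string& glob, const std::string& text) {
    return std::regex_match(text, std::regex(glob_to_regex(glob)));
}

// Small self-check of the expected behaviour.
void glob_self_check() {
    assert(glob_match("*s?r*g", "spring"));
    assert(glob_match("a*", "a string"));
    assert(!glob_match("a*", "string"));
    assert(glob_match("a.c", "a.c"));
    assert(!glob_match("a.c", "abc"));  // '.' is literal in a wildcard pattern
}
```

The escaping step is what keeps a literal "." or "(" in the wildcard pattern from being interpreted by the regular expression engine.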
 
Mirco Wahab

James said:
Use a Regular expression library.

Yes, but...
Example:
using namespace boost;
...
regex reg("^ \\s* .*? (\\d+) [^\\n\\r]* \\d? [\\n\\r]+", regex::mod_x);

This is a joke, right. You need code to convert a match pattern
to a regular expression; you have to convert "*" to something like
"[^/]*", for example (under Unix---under Windows, the equivalent
mapping would be "[^/\\]*"---and under Unix, at least, if it is
the first thing in a filename, you also have to exclude .). And
you have to escape the regular expression meta-characters as
well.

What are you talking about? There's no 'filename' mentioned
anywhere. It's plain text processing with regular expressions
(if I'm not completely off the road).
It's still easier to use a regular expression class than to do
it all by hand, but you do need some extra code to generate
the regular expression from the initial pattern.

Not at all. The above would be (OK, I made this up, it's
a pseudo expression) a valid regular expression. Another
(maybe related) example: find all links in a web page:

int linkparser(const char* htmlname)
{
    boost::regex reg(
        "(?isx-m: \
        < \\s* A [^>]* href \\s* = \
        [\"\\s]* \
        \\w+:// ([^\"\\s]*) \
        )"
    );

    string line;            // read lines and perform one match/search per line
    int linecount = 0;      // count lines (nice)
    ifstream fin(htmlname); // open saved .html file

    cout << "trying to find links in " << htmlname << endl;
    while( getline(fin, line) ) {
        ++linecount;
        boost::smatch match; // instantiate match variable
        if( boost::regex_search(line, match, reg) )
            cout << linecount << "\t" << match[1] << endl;
    }

...

What part of the above expression exactly would you consider
when saying:
you do need some extra code to generate the regular expression

Maybe we speak of different things?

Regards

Mirco
 
James Kanze

James said:
Use a Regular expression library.
Yes, but...
Example:
using namespace boost;
...
regex reg("^ \\s* .*? (\\d+) [^\\n\\r]* \\d? [\\n\\r]+", regex::mod_x);
This is a joke, right. You need code to convert a match pattern
to a regular expression; you have to convert "*" to something like
"[^/]*", for example (under Unix---under Windows, the equivalent
mapping would be "[^/\\]*"---and under Unix, at least, if it is
the first thing in a filename, you also have to exclude .). And
you have to escape the regular expression meta-characters as
well.
What are you talking about? There's no 'filename' mentioned
nowhere. It's plain text processing with regular expressions
(if I'm not completely off the road).

The pattern matching he described was wildcard matching of
filenames, not regular expression evaluation. The conventions
are different (but it is possible to map the wildcard matching
to regular expressions, sort of).
Not at all. The above would be (OK I made this up, its
a pseudo expression) a valid regular expression.

Yes, but it's not what he asked for. What he asked for was that
``"*" matches zero or more consecutive occurrences of any
character and "?" matches a single occurrence of any
character.'' A subset of the classical filename globbing
patterns.

[...]
What part of the above expression exactly would you consider
when saying:
Maybe we speak of different things?

I was talking about what the original poster asked for. You can
do it with regular expressions (I have code which translates a
Unix globbing pattern into a regular expression), but it takes
some pre-processing.
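To illustrate how the conventions differ, here is a small sketch that feeds wildcard-style pattern text straight into a regular expression engine. It assumes std::regex (C++11) instead of boost::regex, and the names matches_as_regex and show_convention_difference are made up for this example:

```cpp
#include <cassert>
#include <regex>
#include <string>

// Interpret `pattern` as an ECMAScript regular expression and require it to
// match the whole of `text`.
bool matches_as_regex(const std::string& pattern, const std::string& text) {
    return std::regex_match(text, std::regex(pattern));
}

void show_convention_difference() {
    // As a regex, "a*" means "zero or more 'a'", not "'a', then anything".
    assert(matches_as_regex("a*", "aaa"));
    assert(!matches_as_regex("a*", "a string"));

    // As a regex, "s?ring" means "an optional 's', then 'ring'".
    assert(matches_as_regex("s?ring", "ring"));
    assert(!matches_as_regex("s?ring", "spring"));

    // The wildcard meanings have to be spelled differently in a regex:
    // wildcard "a*" becomes "a.*" and wildcard "s?ring" becomes "s.ring".
    assert(matches_as_regex("a.*", "a string"));
    assert(matches_as_regex("s.ring", "spring"));
}
```

In other words, the same characters are valid in both notations but mean different things, which is why a translation step is needed.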
 
Mirco Wahab

James said:
Maybe we speak of different things?

I was talking about what the original poster asked for. You can
do it with regular expressions (I have code which translates a
Unix globbing pattern into a regular expression), but it takes
some pre-processing. [...]
The pattern matching he described was wildcard matching of
filenames, not regular expression evaluation. The conventions
are different (but it is possible to map the wildcard matching
to regular expressions, sort of).

This is the OP's question:
|[Subject: Matching chars in a std::string]
| Hi, I need a function to specify a match pattern including using
| wildcard characters as below to find chars in a std::string. The
| match pattern can contain the wildcard characters "*" and "?",
| where "*" matches zero or more consecutive occurrences of any
| character and "?" matches a single occurrence of any character.

I fail to see anything here
you mentioned in your two
preceding posts.

Regards

Mirco
 
James Kanze

James said:
Maybe we speak of different things?
I was talking about what the original poster asked for. You can
do it with regular expressions (I have code which translates a
Unix globbing pattern into a regular expression), but it takes
some pre-processing. [...]
The pattern matching he described was wildcard matching of
filenames, not regular expression evaluation. The conventions
are different (but it is possible to map the wildcard matching
to regular expressions, sort of).
This is the OP's question:
|[Subject: Matching chars in a std::string]
| Hi, I need a function to specify a match pattern including using
| wildcard characters as below to find chars in a std::string. The
| match pattern can contain the wildcard characters "*" and "?",
| where "*" matches zero or more consecutive occurrences of any
| character and "?" matches a single occurrence of any character.
I fail to see anything here you mentioned in your two
preceding posts.

Really? You don't see any mention of "wildcard"? You don't see
a definition of "*" which says it matches zero or more
consecutive occurrences of any character? You don't see a
definition of "?" which matches a single occurrence of any
character?
 
Mirco Wahab

James said:
This is the OP's question:
|[Subject: Matching chars in a std::string]
| Hi, I need a function to specify a match pattern including using
| wildcard characters as below to find chars in a std::string. The
| match pattern can contain the wildcard characters "*" and "?",
| where "*" matches zero or more consecutive occurrences of any
| character and "?" matches a single occurrence of any character.
[...]

Really? You don't see any mention of "wildcard"? You don't see
a definition of "*" which says it matches zero or more
consecutive occurrences of any character? You don't see a
definition of "?" which matches a single occurrence of any
character?

OK, I'm sorry, my mistake. When I read your post saying:

I understood it more like:

| The pattern matching he described was wildcard matching of
| filenames, not regular expression evaluation. The conventions
| are different (but it is possible to map the wildcard matching
| to regular expressions, sort of).

So you didn't really mean:
"/... matching of filenames, not regular expression evaluation .../"

but rather meant exactly what the OP wanted to know. Sorry
for not being able to deduce that from it (I'm new to c.l.c++).

Regards & Thanks for clearing this up

Mirco
 
Nick Keighley

James said:
On Jun 23, 9:46 pm, Mirco Wahab wrote:
This is the OP's question:
|[Subject: Matching chars in a std::string]
| Hi, I need a function to specify a match pattern including using
| wildcard characters as below to find chars in a std::string. The
| match pattern can contain the wildcard characters "*" and "?",
| where "*" matches zero or more consecutive occurrences of any
| character and "?" matches a single occurrence of any character.

I think you're both mind-reading. You're translating what the
user asked for into what you think he wants.


no... I wonder if he wants pattern matching and has only seen
file globbing. He may not *know* he wants reg-exprs. I think the
*, ? was possibly only an example.
I understood it more like:

| The pattern matching he described was wildcard matching of
| filenames, not regular expression evaluation.  The conventions
| are different (but it is possible to map the wildcard matching
| to regular expressions, sort of).

So you didn't really mean:
"/... matching of filenames, not regular expression evaluation .../"

but rather meant exactly what the OP wanted to know. Sorry
for not being able to deduce that from it (I'm new to c.l.c++).

well it confused me too. I too thought James Kanze was insisting
that the OP was matching file names.

Perhaps the OP could give more info?
 
tech

James said:
This is the OP's question:
|[Subject: Matching chars in a std::string]
| Hi, I need a function to specify a match pattern including using
| wildcard characters as below to find chars in a std::string. The
| match pattern can contain the wildcard characters "*" and "?",
| where "*" matches zero or more consecutive occurrences of any
| character and "?" matches a single occurrence of any character.

I think you're both mind-reading. You're translating what the
user asked for into what you think he wants.


no... I wonder if he wants pattern matching and has only seen
file globbing. He may not *know* he wants reg-exprs. I think the
*, ? was possibly only an example.




I understood it more like:
| The pattern matching he described was wildcard matching of
| filenames, not regular expression evaluation.  The conventions
| are different (but it is possible to map the wildcard matching
| to regular expressions, sort of).
So you didn't really mean:
"/... matching of filenames, not regular expression evaluation .../"
but rather meant exactly what the OP wanted to know. Sorry
for not being able to deduce that from it (I'm new to c.l.c++).

well it confused me too. I too thought James Kanze was insisting
that the OP was matching file names.

Perhaps the OP could give more info?

Sorry for not being clear. I just wanted a simple pattern matcher that
does not use regular expressions; I think they are too much for this.

The match pattern can contain the wildcard characters "*" and "?",
where "*" matches zero or more consecutive occurrences of any
character and "?" matches a single occurrence of any character.

std::string does not have such a match function that returns a
bool.
 
James Kanze

James said:
This is the OP's question:
|[Subject: Matching chars in a std::string]
| Hi, I need a function to specify a match pattern including using
| wildcard characters as below to find chars in a std::string. The
| match pattern can contain the wildcard characters "*" and "?",
| where "*" matches zero or more consecutive occurrences of any
| character and "?" matches a single occurrence of any character.
Really? You don't see any mention of "wildcard"? You don't
see a definition of "*" which says it matches zero or more
consecutive occurrences of any character? You don't see a
definition of "?" which matches a single occurrence of any
character?
OK, I'm sorry, my mistake. When I read your post saying:

Exactly. Since that's what he said.
I understood it more like:
| The pattern matching he described was wildcard matching of
| filenames, not regular expression evaluation. The conventions
| are different (but it is possible to map the wildcard matching
| to regular expressions, sort of).
So you didn't really mean:
"/... matching of filenames, not regular expression evaluation .../"

What I meant was "wildcard matching of filenames", since that's
what the poster described. Maybe he wants to use it for
something else, but the patterns he described correspond to
those used in filename globbing, not in regular expressions.

Of course, maybe he doesn't really want what he asked for, but
is looking for something else. It does happen a lot here. But
in this particular case, I've needed both at various times in
the past, so I more or less assume that both have some utility,
and that if he took the time to write "wildcard matching", and
describe the conventions, it's because he didn't want "regular
expression matching" (which uses significantly different
conventions).
 
Jerry Coffin

[ ... ]
Sorry for not being clear. I just wanted a simple pattern matcher that
does not use regular expressions; I think they are too much for this.

The match pattern can contain the wildcard characters "*" and "?",
where "*" matches zero or more consecutive occurrences of any
character and "?" matches a single occurrence of any character.

std::string does not have such a match function that returns a
bool.

I tend to agree -- given that the matching itself only takes up
something like 4 lines of code, it's probably easier to do the match
than convert to an RE, and then use an RE engine to do the job.

#include <string>
#include <functional>

// Wildcard matcher: '*' matches zero or more characters, '?' exactly one.
// Note: std::unary_function was removed in C++17; build as C++14 or earlier.
class patmat : public std::unary_function<char const *, bool> {
    std::string pat;

    bool match(char const *pat, char const *str) const {
        switch (*pat) {
        case '\0': return *str=='\0';
        case '*':
            // Either '*' matches nothing, or it absorbs one character of str.
            return match(pat+1, str) || (*str && match(pat, str+1));
        case '?': return *str && match(pat+1, str+1);
        default: return *pat==*str && match(pat+1, str+1);
        }
    }
public:
    patmat(std::string pattern) : pat(pattern) {}

    bool operator()(std::string const &str) const {
        return(match(pat.c_str(), str.c_str()));
    }
};

#ifdef TEST

#include <iostream>
#include <iterator>
#include <vector>
#include <algorithm>
#include <cstddef>

// Print every string that matches pat (copy everything the pattern does
// not reject). std::not1 was removed in C++20; build as C++14 or earlier.
void test(char const * const *strings, std::size_t num, std::string pat) {
    std::cout << "\nTesting against " << pat << "\n";
    std::remove_copy_if(strings, strings+num,
        std::ostream_iterator<std::string>(std::cout, "\n"),
        std::not1(patmat(pat)));
}

int main() {

    char const *test_strings[] = {
        "longstring",
        "a really, really long string, compared to the others",
        "string",
        "spring",
        "a string"
    };

    std::cout << "Test strings:\n";
    std::copy(test_strings, test_strings+5,
        std::ostream_iterator<std::string>(std::cout, "\n"));

    test(test_strings, 5, "a*");
    test(test_strings, 5, "*g");
    test(test_strings, 5, "*s?r*g");
    test(test_strings, 5, "*st*g");
    return 0;
}

#endif
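The recursive matcher above retries both branches at every "*", so a pattern with several stars may slow down noticeably on long strings that almost match. As a hedged alternative, here is a sketch of a common iterative variant that only remembers the most recent "*" and backtracks to it. The names wildcard_match and wildcard_self_check are hypothetical and not part of the code above:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical iterative matcher: '*' matches zero or more characters,
// '?' matches exactly one. The whole of `str` must match `pat`.
bool wildcard_match(const std::string& pat, const std::string& str) {
    std::size_t p = 0, s = 0;
    std::size_t star = std::string::npos;  // index of the last '*' seen in pat
    std::size_t mark = 0;                  // index in str where that '*' stops

    while (s < str.size()) {
        if (p < pat.size() && pat[p] == '*') {
            star = p++;                    // remember the star, let it match nothing
            mark = s;
        } else if (p < pat.size() && (pat[p] == '?' || pat[p] == str[s])) {
            ++p;                           // ordinary one-character match
            ++s;
        } else if (star != std::string::npos) {
            p = star + 1;                  // mismatch: let the last '*' absorb
            s = ++mark;                    // one more character and retry
        } else {
            return false;                  // mismatch and no '*' to fall back on
        }
    }
    while (p < pat.size() && pat[p] == '*') ++p;  // trailing stars match nothing
    return p == pat.size();
}

// Small self-check using the test strings from the post above.
void wildcard_self_check() {
    assert(wildcard_match("a*", "a string"));
    assert(wildcard_match("*g", "longstring"));
    assert(wildcard_match("*s?r*g", "spring"));
    assert(!wildcard_match("*st*g", "spring"));
}
```

Unlike the c_str()-based version, this sketch works on the std::string contents directly, so an embedded '\0' is treated as an ordinary character.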
 
