Searching of byte array with wildcards?

R

Richard Berg

Hello,

I need to search a byte[] array for a sequence of bytes. The sequence
may include wildcards. For example if the array contains 0xAA, 0xBB,
0xAA, OxDD then I want to be able to search for 0xAA, 0x?? and get two
matches... I've been all around Google but still can't find any
suggestions on how anything like this can be implemented..... Can
someone please give me a clue? Some well-known algorithm maybe?

Thanks!
 
J

Jeff Schwab

Richard said:
I need to search a byte[] array for a sequence of bytes. The sequence
may include wildcards. For example if the array contains 0xAA, 0xBB,
0xAA, OxDD then I want to be able to search for 0xAA, 0x?? and get two
matches... I've been all around Google but still can't find any
suggestions on how anything like this can be implemented..... Can
someone please give me a clue? Some well-known algorithm maybe?

Try boost::regex.
 
S

Siemel Naran

Richard Berg said:
I need to search a byte[] array for a sequence of bytes. The sequence
may include wildcards. For example if the array contains 0xAA, 0xBB,
0xAA, OxDD then I want to be able to search for 0xAA, 0x?? and get two
matches... I've been all around Google but still can't find any
suggestions on how anything like this can be implemented..... Can
someone please give me a clue? Some well-known algorithm maybe?

Your post example suggests you're looking for ? or . matches, that is match
any one character. In this case, you can use the strstr or
std::search algorithm with an appropriate comparison function.

struct eqany {
bool operator()(char c1, char c2) const {
if (c2=='?') return true;
return c1 == c2;
}
};

using std::string;
const string source("ababcdabcxd");
const string expected("cx");
string::const_iterator find =
std::search(source.begin(), source.end(),
expected.begin(), expected.end(),
eqany());


The worst case running time of the above algorithm is O(n*m) where
n = length(source) and m = length(expected).

If all the chars in the expected string are unique we can get O(n) running
time. Maintain a pointer to the expected char in the expected string,
initially pointing at 'c'. Now loop over the chars in the source string.
When you find a char that matches increment
the expected pointer. You'll loop through "ababc" and the last 'c' matches,
so you increment the expected
pointer to expect 'x'. But the next char is 'd' so reset the expected
pointer back to 'c'. So eventually you'll get to "ababcdabc" and the
'c' matches, so increment the expected pointer to 'x'. The next
character in the sequence is 'x' which matches the expected character,
so increment the expected pointer. At this point there are no more
expected chars so it means you've found match.

To match any number of characters, we could call std::search
repeatedly. Suppose the expected string is "ba*cx". Split this
into "ba" and "cx". Search the source string for "ba". If not
found then "ba*cx" couldn't possibly be found either. But if
"ba" found then find "cx".

using std::string;
typedef string::const_iterator Iter;
const string source("ababcdabcxd");
const string expected("ba*cx");
Iter expect = expected.begin();
while (expect!=expected.end())
{
using std::find;
using std::search;
Iter expectend = find(expect, expect.end(), '*');
Iter s = search(source.begin(), source.end(), expect, expectend);
if (s == source.end()) return false; // not found
if (expectend == expected.end()) return true;
expect = expectend;
++expect;
++s;
}


The efficient algorithm is to loop over the chars in the search string
's' or "ababcdabcxd". It is sufficient to loop over the first
length(s)-length(s2)+1 chars.

Note the std::search algorithm does not assume null terminated arrays
and works with any container, such as std::vector<char>,
std::vector<int>, etc.
 
M

Matt

Richard said:
Hello,

I need to search a byte[] array for a sequence of bytes. The sequence
may include wildcards. For example if the array contains 0xAA, 0xBB,
0xAA, OxDD then I want to be able to search for 0xAA, 0x?? and get two
matches... I've been all around Google but still can't find any
suggestions on how anything like this can be implemented..... Can
someone please give me a clue? Some well-known algorithm maybe?

Thanks!

See language theory books such as _Introduction to Automata Theory,
Languages, and Computation_, by Hopcroft and Ullman, 1979, or similar.

See compiler textbooks such as _Compilers: Principles, Techniques, and
Tools_, by Aho, Sethi, and Ullman, 1987, Chapter 3: Lexical Analysis, or
similar.

Topics: regular expressions, regular languages, deterministic finite
automata, lexical analysis.

You would build a little dynamic lexical-analyzer generator. The
generated lexical analyzer would be a little deterministic finite
automaton (DFA). Start off reading about DFAs and regular expressions
and you will rather soon see how to do it.
 
S

Siemel Naran

Matt said:
See language theory books such as _Introduction to Automata Theory,
Languages, and Computation_, by Hopcroft and Ullman, 1979, or similar.

See compiler textbooks such as _Compilers: Principles, Techniques, and
Tools_, by Aho, Sethi, and Ullman, 1987, Chapter 3: Lexical Analysis, or
similar.

Topics: regular expressions, regular languages, deterministic finite
automata, lexical analysis.

You would build a little dynamic lexical-analyzer generator. The
generated lexical analyzer would be a little deterministic finite
automaton (DFA). Start off reading about DFAs and regular expressions
and you will rather soon see how to do it.

Fine, but the real thig. But what if std::search will suffice?
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,744
Messages
2,569,483
Members
44,901
Latest member
Noble71S45

Latest Threads

Top