Finding a substring in a binary string

T

Tarun

Hi All,

I need to find a particular substring in a binary string(some text
appended & prepended to binary data).
I cant use strstr since it terminates on receiving '\0'which can be
there in binary data also I cant use memmem. Is there any other
available function to do this.

Regards
Tarun
 
D

Daniel Cer

I need to find a particular substring in a binary string(some text
appended & prepended to binary data).
I cant use strstr since it terminates on receiving '\0'which can be
there in binary data also I cant use memmem. Is there any other
available function to do this.

I assume you don't want to use memmem since it's a gnu extension and thus
would make the resulting code less portable.

If this is case, and if nothing from the standard library fits the bill,
why not just write your own function to do the comparison?

e.g. something like:

/* binstr - searches a block of memory for a given character string
*
* params:
* bin - pointer to region to be searched
* bin_sz - size of region in bytes
* str - null terminated character string to search for
*
* returns: 1 if the string is found, 0 otherwise
*/

int binstr(const void *bin, int bin_sz, const char *str) {
const char *c_bin; int bin_i, str_i; c_bin = bin;
for (bin_i = 0; bin_i < bin_sz; bin_i++) {
for (str_i = 0; c_bin[bin_i+str_i] == str[str_i] && str[str_i]; str_i++);
if (!str[str_i]) return 1;
}
return 0;
}

-Dan
 
P

Pierre Habouzit

e.g. something like:
[...]
int binstr(const void *bin, int bin_sz, const char *str) {
const char *c_bin; int bin_i, str_i; c_bin = bin;
for (bin_i = 0; bin_i < bin_sz; bin_i++) {
for (str_i = 0; c_bin[bin_i+str_i] == str[str_i] && str[str_i];
str_i++); if (!str[str_i]) return 1;
}
return 0;
}

that is higly uneffective if the searched pattern is long : complexity of
such an algorithm is O(length(bin) * length(str)) where the KMP algorithm
is O(length(bin)+length(str)).

Here is a link I found using google :
http://www-igm.univ-mlv.fr/~lecroq/string/node8.html

The C code seems reasonnable even if the naming of the variables is stupid.
 
M

Mabden

Daniel Cer said:
I need to find a particular substring in a binary string(some text
appended & prepended to binary data).
I cant use strstr since it terminates on receiving '\0'which can be
there in binary data also I cant use memmem. Is there any other
available function to do this.

I assume you don't want to use memmem since it's a gnu extension and thus
would make the resulting code less portable.

If this is case, and if nothing from the standard library fits the bill,
why not just write your own function to do the comparison?

e.g. something like:

/* binstr - searches a block of memory for a given character string
*
* params:
* bin - pointer to region to be searched
* bin_sz - size of region in bytes
* str - null terminated character string to search for
*
* returns: 1 if the string is found, 0 otherwise
*/

int binstr(const void *bin, int bin_sz, const char *str) {
const char *c_bin; int bin_i, str_i; c_bin = bin;
for (bin_i = 0; bin_i < bin_sz; bin_i++) {
for (str_i = 0; c_bin[bin_i+str_i] == str[str_i] && str[str_i]; str_i++);
if (!str[str_i]) return 1;
}
return 0;
}

You should just go to the local high school and set up a booth with a
sign "I'll do your homework for free!"
Then you'll be really kewl.
 
C

CBFalconer

Tarun said:
I need to find a particular substring in a binary string(some text
appended & prepended to binary data).
I cant use strstr since it terminates on receiving '\0'which can be
there in binary data also I cant use memmem. Is there any other
available function to do this.

Here is some code from the archives that I wrote last year.

--------------- cut binfsrch.c ----------------
/*
Leor said:
I think so. Here's a version I just threw together:
*/

/* And heres another throw -- binfsrch.c by CBF */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include <assert.h>

/* The difference between a binary and a text file, on read,
is the conversion of end-of-line delimiters. What those
delimiters are does not affect the action. In some cases
the presence of 0x1a EOF markers (MsDos) does.

This is a version of Knuth-Morris-Pratt algorithm. The
point of using this is to avoid any backtracking in file
reading, and thus avoiding any use of buffer arrays.
*/

size_t chrcount; /* debuggery, count of input chars, zeroed */

/* --------------------- */

/* Almost straight out of Sedgewick */
/* The next array indicates what index in id should next be
compared to the current char. Once the (lgh - 1)th char
has been successfully compared, the id has been found.
The array is formed by comparing id to itself. */
void initnext(int *next, const char *id, int lgh)
{
int i, j;

assert(lgh > 0);
next[0] = -1; i = 0; j = -1;
while (i < lgh) {
while ((j >= 0) && (id != id[j])) j = next[j];
i++; j++;
next = j;
}
#if (0)
for (i = 0; i < lgh; i++)
printf("id[%d] = '%c' next[%d] = %d\n",
i, id, i, next);
#endif
} /* initnext */

/* --------------------- */

/* reads f without rewinding until either EOF or *marker
has been found. Returns EOF if not found. At exit the
last matching char has been read, and no further. */
int kmpffind(const char *marker, int lgh, int *next, FILE *f)
{
int j; /* char position in marker to check */
int ch; /* current char */

assert(lgh > 0);
j = 0;
while ((j < lgh) && (EOF != (ch = getc(f)))) {
chrcount++;
while ((j >= 0) && (ch != marker[j])) j = next[j];
j++;
}
return ch;
} /* kmpffind */

/* --------------------- */

/* Find marker in f, display following printing chars
up to some non printing character or EOF */
int binfsrch(const char *marker, FILE *f)
{
int *next;
int lgh;
int ch;
int items; /* count of markers found */

lgh = strlen(marker);
if (!(next = malloc(lgh * sizeof *next))) {
puts("No memory");
exit(EXIT_FAILURE);
}
else {
initnext(next, marker, lgh);
items = 0;
while (EOF != kmpffind(marker, lgh, next, f)) {
/* found, take appropriate action */
items++;
printf("%d %s : \"", items, marker);
while (isprint(ch = getc(f))) {
chrcount++;
putchar(ch);
}
puts("\"");
if (EOF == ch) break;
else chrcount++;
}
free(next);
return items;
}
} /* binfsrch */

/* --------------------- */

int main(int argc, char **argv)
{
FILE *f;

f = stdin;
if (3 == argc) {
if (!(f = fopen(argv[2], "rb"))) {
printf("Can't open %s\n", argv[2]);
exit(EXIT_FAILURE);
}
argc--;
}
if (2 != argc) {
puts("Usage: binfsrch name [binaryfile]");
puts(" (file defaults to stdin text mode)");
}
else if (binfsrch(argv[1], f)) {
printf("\"%s\" : found\n", argv[1]);
}
else printf("\"%s\" : not found\n", argv[1]);
printf("%lu chars\n", (unsigned long)chrcount);
return 0;
} /* main binfsrch */
 
D

Daniel Cer

that is higly uneffective if the searched pattern is long : complexity of
such an algorithm is O(length(bin) * length(str)) where the KMP algorithm
is O(length(bin)+length(str)).

Here is a link I found using google :
http://www-igm.univ-mlv.fr/~lecroq/string/node8.html

For many practical string matching problems, doesn't KMP not perform all
that much better than the (previously given) brute force approach?

For something that's typically much better than both of these,
you might want to check out the Boyer-Moore algorithm. links:

http://www.cs.utexas.edu/users/moore/best-ideas/string-searching/index.html

http://en.wikipedia.org/wiki/Boyer-Moore_string_searching_algorithm

-Dan
 
C

CBFalconer

Daniel said:
.... snip ...

For many practical string matching problems, doesn't KMP not
perform all that much better than the (previously given) brute
force approach?

True. However the brute force method requires backtracking in the
searched stream. KMP totally avoids that, so that you can find a
match immediately without using any buffers.

Earlier today I published, in this thread, a demonstration
program. It can, for example, search input from an unbuffered
serial input. Modifications of the technique can find the longest
match, which can be useful in compression software.

The Boyer-Moore technique requires the entire searched area to be
buffered. When that is feasible it can offer serious run-time
improvement.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,744
Messages
2,569,484
Members
44,903
Latest member
orderPeak8CBDGummies

Latest Threads

Top