Finding a substring in a binary string

Discussion in 'C Programming' started by Tarun, Aug 23, 2005.

  1. Tarun

    Tarun Guest

    Hi All,

    I need to find a particular substring in a binary string(some text
    appended & prepended to binary data).
    I cant use strstr since it terminates on receiving '\0'which can be
    there in binary data also I cant use memmem. Is there any other
    available function to do this.

    Regards
    Tarun
    Tarun, Aug 23, 2005
    #1
    1. Advertising

  2. Tarun

    Daniel Cer Guest

    > I need to find a particular substring in a binary string(some text
    > appended & prepended to binary data).
    > I cant use strstr since it terminates on receiving '\0'which can be
    > there in binary data also I cant use memmem. Is there any other
    > available function to do this.
    >


    I assume you don't want to use memmem since it's a gnu extension and thus
    would make the resulting code less portable.

    If this is case, and if nothing from the standard library fits the bill,
    why not just write your own function to do the comparison?

    e.g. something like:

    /* binstr - searches a block of memory for a given character string
    *
    * params:
    * bin - pointer to region to be searched
    * bin_sz - size of region in bytes
    * str - null terminated character string to search for
    *
    * returns: 1 if the string is found, 0 otherwise
    */

    int binstr(const void *bin, int bin_sz, const char *str) {
    const char *c_bin; int bin_i, str_i; c_bin = bin;
    for (bin_i = 0; bin_i < bin_sz; bin_i++) {
    for (str_i = 0; c_bin[bin_i+str_i] == str[str_i] && str[str_i]; str_i++);
    if (!str[str_i]) return 1;
    }
    return 0;
    }

    -Dan
    Daniel Cer, Aug 23, 2005
    #2
    1. Advertising

  3. > e.g. something like:
    [...]
    > int binstr(const void *bin, int bin_sz, const char *str) {
    > const char *c_bin; int bin_i, str_i; c_bin = bin;
    > for (bin_i = 0; bin_i < bin_sz; bin_i++) {
    > for (str_i = 0; c_bin[bin_i+str_i] == str[str_i] && str[str_i];
    > str_i++); if (!str[str_i]) return 1;
    > }
    > return 0;
    > }


    that is higly uneffective if the searched pattern is long : complexity of
    such an algorithm is O(length(bin) * length(str)) where the KMP algorithm
    is O(length(bin)+length(str)).

    Here is a link I found using google :
    http://www-igm.univ-mlv.fr/~lecroq/string/node8.html

    The C code seems reasonnable even if the naming of the variables is stupid.
    --
    ·O· Pierre Habouzit
    ··O
    OOO http://www.madism.org
    Pierre Habouzit, Aug 23, 2005
    #3
  4. Tarun

    Mabden Guest

    "Daniel Cer" <> wrote in message
    news:p...
    > > I need to find a particular substring in a binary string(some text
    > > appended & prepended to binary data).
    > > I cant use strstr since it terminates on receiving '\0'which can be
    > > there in binary data also I cant use memmem. Is there any other
    > > available function to do this.
    > >

    >
    > I assume you don't want to use memmem since it's a gnu extension and

    thus
    > would make the resulting code less portable.
    >
    > If this is case, and if nothing from the standard library fits the

    bill,
    > why not just write your own function to do the comparison?
    >
    > e.g. something like:
    >
    > /* binstr - searches a block of memory for a given character string
    > *
    > * params:
    > * bin - pointer to region to be searched
    > * bin_sz - size of region in bytes
    > * str - null terminated character string to search for
    > *
    > * returns: 1 if the string is found, 0 otherwise
    > */
    >
    > int binstr(const void *bin, int bin_sz, const char *str) {
    > const char *c_bin; int bin_i, str_i; c_bin = bin;
    > for (bin_i = 0; bin_i < bin_sz; bin_i++) {
    > for (str_i = 0; c_bin[bin_i+str_i] == str[str_i] && str[str_i];

    str_i++);
    > if (!str[str_i]) return 1;
    > }
    > return 0;
    > }
    >


    You should just go to the local high school and set up a booth with a
    sign "I'll do your homework for free!"
    Then you'll be really kewl.

    --
    Mabden
    Mabden, Aug 23, 2005
    #4
  5. Tarun

    CBFalconer Guest

    Tarun wrote:
    >
    > I need to find a particular substring in a binary string(some text
    > appended & prepended to binary data).
    > I cant use strstr since it terminates on receiving '\0'which can be
    > there in binary data also I cant use memmem. Is there any other
    > available function to do this.


    Here is some code from the archives that I wrote last year.

    --------------- cut binfsrch.c ----------------
    /*
    Leor Zolman wrote:
    > On 25 Feb 2004 07:34:40 -0800, (spike) wrote:
    >
    >> Im trying to write a program that should read through a binary
    >> file searching for the character sequence "\name\"
    >>
    >> Then it should read the characters following the "\name\"
    >> sequence until a NULL character is encountered.
    >>
    >> But when my program runs it gets a SIGSEGV (Segmentation
    >> vioalation) signal.
    >>
    >> Whats wrong? And is there a better way than mine to solve
    >> this task (most likely)

    >
    > I think so. Here's a version I just threw together:

    */

    /* And heres another throw -- binfsrch.c by CBF */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <ctype.h>
    #include <assert.h>

    /* The difference between a binary and a text file, on read,
    is the conversion of end-of-line delimiters. What those
    delimiters are does not affect the action. In some cases
    the presence of 0x1a EOF markers (MsDos) does.

    This is a version of Knuth-Morris-Pratt algorithm. The
    point of using this is to avoid any backtracking in file
    reading, and thus avoiding any use of buffer arrays.
    */

    size_t chrcount; /* debuggery, count of input chars, zeroed */

    /* --------------------- */

    /* Almost straight out of Sedgewick */
    /* The next array indicates what index in id should next be
    compared to the current char. Once the (lgh - 1)th char
    has been successfully compared, the id has been found.
    The array is formed by comparing id to itself. */
    void initnext(int *next, const char *id, int lgh)
    {
    int i, j;

    assert(lgh > 0);
    next[0] = -1; i = 0; j = -1;
    while (i < lgh) {
    while ((j >= 0) && (id != id[j])) j = next[j];
    i++; j++;
    next = j;
    }
    #if (0)
    for (i = 0; i < lgh; i++)
    printf("id[%d] = '%c' next[%d] = %d\n",
    i, id, i, next);
    #endif
    } /* initnext */

    /* --------------------- */

    /* reads f without rewinding until either EOF or *marker
    has been found. Returns EOF if not found. At exit the
    last matching char has been read, and no further. */
    int kmpffind(const char *marker, int lgh, int *next, FILE *f)
    {
    int j; /* char position in marker to check */
    int ch; /* current char */

    assert(lgh > 0);
    j = 0;
    while ((j < lgh) && (EOF != (ch = getc(f)))) {
    chrcount++;
    while ((j >= 0) && (ch != marker[j])) j = next[j];
    j++;
    }
    return ch;
    } /* kmpffind */

    /* --------------------- */

    /* Find marker in f, display following printing chars
    up to some non printing character or EOF */
    int binfsrch(const char *marker, FILE *f)
    {
    int *next;
    int lgh;
    int ch;
    int items; /* count of markers found */

    lgh = strlen(marker);
    if (!(next = malloc(lgh * sizeof *next))) {
    puts("No memory");
    exit(EXIT_FAILURE);
    }
    else {
    initnext(next, marker, lgh);
    items = 0;
    while (EOF != kmpffind(marker, lgh, next, f)) {
    /* found, take appropriate action */
    items++;
    printf("%d %s : \"", items, marker);
    while (isprint(ch = getc(f))) {
    chrcount++;
    putchar(ch);
    }
    puts("\"");
    if (EOF == ch) break;
    else chrcount++;
    }
    free(next);
    return items;
    }
    } /* binfsrch */

    /* --------------------- */

    int main(int argc, char **argv)
    {
    FILE *f;

    f = stdin;
    if (3 == argc) {
    if (!(f = fopen(argv[2], "rb"))) {
    printf("Can't open %s\n", argv[2]);
    exit(EXIT_FAILURE);
    }
    argc--;
    }
    if (2 != argc) {
    puts("Usage: binfsrch name [binaryfile]");
    puts(" (file defaults to stdin text mode)");
    }
    else if (binfsrch(argv[1], f)) {
    printf("\"%s\" : found\n", argv[1]);
    }
    else printf("\"%s\" : not found\n", argv[1]);
    printf("%lu chars\n", (unsigned long)chrcount);
    return 0;
    } /* main binfsrch */

    --
    "If you want to post a followup via groups.google.com, don't use
    the broken "Reply" link at the bottom of the article. Click on
    "show options" at the top of the article, then click on the
    "Reply" at the bottom of the article headers." - Keith Thompson
    CBFalconer, Aug 23, 2005
    #5
  6. Tarun

    Daniel Cer Guest

    > that is higly uneffective if the searched pattern is long : complexity of
    > such an algorithm is O(length(bin) * length(str)) where the KMP algorithm
    > is O(length(bin)+length(str)).
    >
    > Here is a link I found using google :
    > http://www-igm.univ-mlv.fr/~lecroq/string/node8.html


    For many practical string matching problems, doesn't KMP not perform all
    that much better than the (previously given) brute force approach?

    For something that's typically much better than both of these,
    you might want to check out the Boyer-Moore algorithm. links:

    http://www.cs.utexas.edu/users/moore/best-ideas/string-searching/index.html

    http://en.wikipedia.org/wiki/Boyer-Moore_string_searching_algorithm

    -Dan
    Daniel Cer, Aug 23, 2005
    #6
  7. Tarun

    CBFalconer Guest

    Daniel Cer wrote:
    >

    .... snip ...
    >
    > For many practical string matching problems, doesn't KMP not
    > perform all that much better than the (previously given) brute
    > force approach?


    True. However the brute force method requires backtracking in the
    searched stream. KMP totally avoids that, so that you can find a
    match immediately without using any buffers.

    Earlier today I published, in this thread, a demonstration
    program. It can, for example, search input from an unbuffered
    serial input. Modifications of the technique can find the longest
    match, which can be useful in compression software.

    The Boyer-Moore technique requires the entire searched area to be
    buffered. When that is feasible it can offer serious run-time
    improvement.

    --
    "If you want to post a followup via groups.google.com, don't use
    the broken "Reply" link at the bottom of the article. Click on
    "show options" at the top of the article, then click on the
    "Reply" at the bottom of the article headers." - Keith Thompson
    CBFalconer, Aug 23, 2005
    #7
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Badass Scotsman

    Finding a SubString within a String

    Badass Scotsman, Mar 31, 2006, in forum: ASP .Net
    Replies:
    2
    Views:
    6,113
    S. Justin Gengo
    Mar 31, 2006
  2. johndesp
    Replies:
    13
    Views:
    1,585
    Paul Lutus
    Aug 20, 2004
  3. Stefan Ram
    Replies:
    0
    Views:
    354
    Stefan Ram
    Sep 19, 2008
  4. fedora

    substring finding problem!

    fedora, Feb 14, 2010, in forum: C Programming
    Replies:
    229
    Views:
    3,458
  5. Replies:
    3
    Views:
    191
    Sherm Pendley
    Aug 3, 2005
Loading...

Share This Page