Re: memcmp versus strstr; reaction to chr(0)

Discussion in 'C Programming' started by Burne C, Jul 24, 2003.

  1. Burne C

    Burne C Guest

    "Walter Dnes" <> wrote in message
    news:bfocbv$gepau$-berlin.de...
    > When I asked in another thread about string comparisons, I forgot
    > about the chr(0) booby-trap in C strings. Since I want to compare
    > random binary data, this is important to me. Someone correct me if I'm
    > wrong; strstr stops at chr(0). memcmp doesn't treat chr(0) as a
    > delimiter, and can compare ranges (the word "strings" is incorrect here)
    > that included embedded chr(0). I realize that memcmp won't
    > automatically scan a larger string, but I can put it in a loop to sweep
    > through a larger string. Too bad that memmem is not standard.
    >
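For anyone following along, here is a minimal sketch of the difference Walter describes. The buffer contents and the two helper names are made up for illustration; only strstr and memcmp come from <string.h>.

```c
#include <string.h>

/* Six bytes with an embedded zero: 'a' 'b' '\0' 'c' 'd' 'e'.
   Illustrative data only, not from the original post. */
static const char data[6] = { 'a', 'b', '\0', 'c', 'd', 'e' };

/* Hypothetical helper: nonzero if strstr can see "cd" in data.
   It cannot, because strstr treats data as the string "ab". */
int strstr_finds_cd(void)
{
    return strstr(data, "cd") != NULL;
}

/* Hypothetical helper: nonzero if the two bytes at data + 3 equal "cd".
   memcmp is given an explicit length, so the zero byte is irrelevant. */
int memcmp_matches_cd(void)
{
    return memcmp(data + 3, "cd", 2) == 0;
}
```

The first helper reports no match and the second reports a match, which is why a length-based search is needed for binary data.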


    You could use memchr together with memcmp to speed up the search slightly:

    ----------------------
    void *memsearch(void *source, void *target, long soc_length, long tar_length)
    {
        void *addr,*start_address;

        search_address = source;

        while(addr = memchr(search_address, *(char*)target, soc_length))
        {
            if(memcmp(addr, target, tar_length)==0)
                return addr;

            ++(char*)search_address;
        }

        return NULL;
    }


    --
    BC
    Burne C, Jul 24, 2003
    #1

  2. Burne C

    Burne C Guest

    "Burne C" <> wrote in message news:bfp14b$...
    >
    > "Walter Dnes" <> wrote in message
    > news:bfocbv$gepau$-berlin.de...
    > > When I asked in another thread about string comparisons, I forgot
    > > about the chr(0) booby-trap in C strings. Since I want to compare
    > > random binary data, this is important to me. Someone correct me if I'm
    > > wrong; strstr stops at chr(0). memcmp doesn't treat chr(0) as a
    > > delimiter, and can compare ranges (the word "strings" is incorrect here)
    > > that included embedded chr(0). I realize that memcmp won't
    > > automatically scan a larger string, but I can put it in a loop to sweep
    > > through a larger string. Too bad that memmem is not standard.
    > >

    >
    > You would use memchr together to speed up the memcmp search sightly:
    >
    > ----------------------
    > void *memsearch(void *source, void *target, long soc_length, long tar_length)
    > {
    > void *addr,*start_address;
    >


    Sorry, I made a mistake in the variable name.

    The declaration should be "void *addr, *search_address"

    --
    BC
    Burne C, Jul 24, 2003
    #2

  3. Burne C

    Burne C Guest

    "Peter Ammon" <> wrote in message news:bfq8uv$3cv$...
    > Burne C wrote:
    >
    > > "Walter Dnes" <> wrote in message
    > > news:bfocbv$gepau$-berlin.de...
    > >
    > >> When I asked in another thread about string comparisons, I forgot
    > >>about the chr(0) booby-trap in C strings. Since I want to compare
    > >>random binary data, this is important to me. Someone correct me if I'm
    > >>wrong; strstr stops at chr(0). memcmp doesn't treat chr(0) as a
    > >>delimiter, and can compare ranges (the word "strings" is incorrect here)
    > >>that included embedded chr(0). I realize that memcmp won't
    > >>automatically scan a larger string, but I can put it in a loop to sweep
    > >>through a larger string. Too bad that memmem is not standard.
    > >>

    > >
    > >
    > > You would use memchr together to speed up the memcmp search sightly:
    > >
    > > ----------------------
    > > void *memsearch(void *source, void *target, long soc_length, long tar_length)
    > > {
    > > void *addr,*start_address;
    > >
    > > search_address = source;
    > >
    > > while(addr = memchr(search_address, *(char*)target, soc_length))
    > > {
    > > if(memcmp(addr, target, tar_length)==0)
    > > return addr;
    > >
    > > ++(char*)search_address;
    > > }
    > >
    > > return NULL;
    > > }
    > >

    >
    > This function has a few unfortunate properties.
    >
    > 1) It won't compile, since (char*)search_address is not an lvalue, and
    > so ++(char*)search_address is illegal.


    search_address is an lvalue; it is a local variable. I made a mistake in the variable name
    "start_address" and corrected it in my last post.

    >
    > 2) If it were to compile, it would invoke undefined behavior in many
    > cases, because you do not update the soc_length value to reflect the
    > incremented pointer and so search beyond the ends of the array.
    >


    Right.


    > 3) If the behavior were defined, it would be inefficient. For example,
    > memsearch("aaaaaaaaaaaaaaaaaaax", "x1", 20, 2) would start at the first
    > 'a', walk the entire string looking for an 'x', find the x, determine
    > that the string "x1" was not present, then go to the second 'a' and
    > repeat the process. This has O(n^2) complexity.


    Yes, the "++(char*)search_address" line should be

    search_address = (char*)addr+1;

    It searches for the first "x" using memchr and checks the substring using memcmp. If it does
    not match, the memchr search continues _after_ the returned addr.
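Putting the corrections from this exchange together, a minimal sketch might look like the following. The name search_mem, the size_t parameters, and the empty-target behaviour are assumptions for illustration, not something anyone posted. It resumes after the last candidate, shrinks the remaining length as the pointer advances, and avoids the reserved "mem" plus lowercase prefix.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical name and signature; a sketch, not the posted function. */
void *search_mem(const void *source, size_t src_len,
                 const void *target, size_t tar_len)
{
    const unsigned char *pos = source;    /* current search position */
    const unsigned char *first = target;  /* used to read the first target byte */
    size_t remaining = src_len;           /* bytes left starting at pos */

    if (tar_len == 0)
        return (void *)source;  /* assumption: empty target matches at the start */

    while (remaining >= tar_len) {
        /* Only scan positions where a full match could still fit. */
        const unsigned char *hit = memchr(pos, first[0], remaining - tar_len + 1);
        if (hit == NULL)
            return NULL;
        if (memcmp(hit, target, tar_len) == 0)
            return (void *)hit;
        /* Resume just after the failed candidate and shrink the length. */
        remaining -= (size_t)(hit - pos) + 1;
        pos = hit + 1;
    }
    return NULL;
}
```

Because each memchr call starts where the previous one stopped, the haystack is walked once by memchr overall; the memcmp calls at each candidate can still add up when the first byte of the target is very common.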

    >
    > 4) Functions whose names begin with "mem" followed by a lowercase letter
    > are reserved. You should call your function something like searchmem or
    > mem_search.
    >
    > That said, here's my recently written real life function to do the same
    > thing. Now I'm the target.
    >
    > const char* mymemstr(const char* hay, const char* needle,
    >                      size_t hayLength, size_t needleLength) {
    >     size_t hayOuter;
    >     size_t needleIndex=0;
    >     size_t memory=0;
    >     for (hayOuter=0; hayOuter < hayLength; hayOuter++) {
    >         if (needleIndex >= needleLength)
    >             return hay + hayOuter - needleLength;
    >         if (needle[needleIndex]==hay[hayOuter]) {
    >             if (needleIndex++==0) memory=hayOuter;
    >         }
    >         else { /* needle[needleIndex]!=hay[hayOuter] */
    >             if (needleIndex > 0) {
    >                 needleIndex=0;
    >                 hayOuter=memory+1;
    >             }
    >         }
    >     }
    >     return NULL;
    > }
    >
    > -Peter
    >
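Reading the quoted function as posted, two cases look like they slip through. A match that ends on the last byte of hay is never reported, because the completion check only runs at the top of the next iteration. And after a partial match, resetting to memory+1 followed by the loop's hayOuter++ resumes at memory+2, so searching for "ab" in "aab" returns NULL. The following is a plain nested-loop sketch that avoids both; it is not Peter's code, and the name find_bytes and the empty-needle behaviour are assumptions.

```c
#include <stddef.h>

/* Hypothetical name; a reworked variant, not the function as posted. */
const char *find_bytes(const char *hay, const char *needle,
                       size_t hayLength, size_t needleLength)
{
    size_t start, i;

    if (needleLength == 0)
        return hay;             /* assumption: empty needle matches at the start */
    if (needleLength > hayLength)
        return NULL;            /* needle cannot fit anywhere */

    /* Try every start position at which the whole needle still fits. */
    for (start = 0; start <= hayLength - needleLength; start++) {
        for (i = 0; i < needleLength; i++)
            if (hay[start + i] != needle[i])
                break;
        if (i == needleLength)  /* inner loop ran to completion: full match */
            return hay + start;
    }
    return NULL;
}
```

Both lengths are explicit, so embedded zero bytes in either buffer are compared like any other byte.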
    Burne C, Jul 25, 2003
    #3
  4. Burne C

    Peter Ammon Guest

    Burne C wrote:

    > "Peter Ammon" <> wrote in message news:bfq8uv$3cv$...
    >
    >>Burne C wrote:
    >>
    >>
    >>>"Walter Dnes" <> wrote in message
    >>>news:bfocbv$gepau$-berlin.de...
    >>>
    >>>
    >>>> When I asked in another thread about string comparisons, I forgot
    >>>>about the chr(0) booby-trap in C strings. Since I want to compare
    >>>>random binary data, this is important to me. Someone correct me if I'm
    >>>>wrong; strstr stops at chr(0). memcmp doesn't treat chr(0) as a
    >>>>delimiter, and can compare ranges (the word "strings" is incorrect here)
    >>>>that included embedded chr(0). I realize that memcmp won't
    >>>>automatically scan a larger string, but I can put it in a loop to sweep
    >>>>through a larger string. Too bad that memmem is not standard.
    >>>>
    >>>
    >>>
    >>>You would use memchr together to speed up the memcmp search sightly:
    >>>
    >>>----------------------
    >>>void *memsearch(void *source, void *target, long soc_length, long tar_length)
    >>>{
    >>> void *addr,*start_address;
    >>>
    >>> search_address = source;
    >>>
    >>> while(addr = memchr(search_address, *(char*)target, soc_length))
    >>> {
    >>> if(memcmp(addr, target, tar_length)==0)
    >>> return addr;
    >>>
    >>> ++(char*)search_address;
    >>> }
    >>>
    >>> return NULL;
    >>>}
    >>>

    >>
    >>This function has a few unfortunate properties.
    >>
    >>1) It won't compile, since (char*)search_address is not an lvalue, and
    >>so ++(char*)search_address is illegal.

    >
    >
    > search_address is lvalue, it is a local variable.


    Yes, but (char*)search_address is not an lvalue. Casting an lvalue
    doesn't give you an lvalue.
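A small sketch of the usual way around this: convert the void * to a char * once, then do the arithmetic on that pointer. The helper name advance_bytes is made up for illustration.

```c
#include <stddef.h>

/* Hypothetical helper: returns the address n bytes past p. */
void *advance_bytes(void *p, size_t n)
{
    /* ++(char *)p would not compile in standard C:
       the result of a cast is a value, not an lvalue. */
    char *bytes = p;   /* convert once to a byte pointer */
    bytes += n;        /* pointer arithmetic on char * moves in bytes */
    return bytes;
}
```

In the search loop this corresponds to an assignment such as search_address = (char *)addr + 1, where the cast appears only on the right-hand side.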

    > I have made a mistake to the variable name
    > "start_address" and I have corrected it in the last post.


    I see that, but it's still not an lvalue.

    [...]
    >>3) If the behavior were defined, it would be inefficient. For example,
    >>memsearch("aaaaaaaaaaaaaaaaaaax", "x1", 20, 2) would start at the first
    >>'a', walk the entire string looking for an 'x', find the x, determine
    >>that the string "x1" was not present, then go to the second 'a' and
    >>repeat the process. This has O(n^2) complexity.

    >
    >
    > Yes, the "++(char*)search_address" line should be
    >
    > search_address = (char*)addr+1;
    >
    > It search for the first "x" using memchr, and check the substring using "memcmp", if it is not
    > match, the memchr search continue _after_ the returned addr.


    Much better.
    [...]
    -Peter
    Peter Ammon, Jul 25, 2003
    #4
