memmem()

barcaroller

Does anyone know of a POSIX-equivalent for GNU's memmem(), which
searches for a sequence of bytes within a larger block of memory?
 
Bill Cunningham

barcaroller said:
Does anyone know of a POSIX-equivalent for GNU's memmem(), which
searches for a sequence of bytes within a larger block of memory?

I think you might want comp.unix.programmer

Bill
 
puppi

Does anyone know of a POSIX-equivalent for GNU's memmem(), which
searches for a sequence of bytes within a larger block of memory?

There is strstr(). It's defined by C89 and C99. The essential
difference between strstr() and memmem() is that strstr() assumes that
the "haystack" and the "needle" are null-byte ended, while in memmem()
the lengths are specified. Hence it's not quite an equivalence (but
unless you are reading raw binary data, it's close to that).
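
To illustrate the difference described above, here is a minimal sketch. The helper name `found_by_strstr` and the sample buffer `binary_buf` are made up for this example; the only library call is the standard strstr().

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if strstr() finds `needle` in `haystack`,
   0 otherwise. Both arguments must be null-terminated strings. */
int found_by_strstr(const char *haystack, const char *needle)
{
    return strstr(haystack, needle) != NULL;
}

/* Seven bytes with a zero byte in the middle:
   'a', 'b', 0, 'c', 'd', 'e', 0 (the last one is the usual terminator). */
const char binary_buf[] = "ab\0cde";
```

With this buffer, `found_by_strstr(binary_buf, "ab")` succeeds, but `found_by_strstr(binary_buf, "cd")` fails: strstr() treats the first zero byte as the end of the haystack and never sees the bytes after it. That is exactly the case where a length-based search such as memmem() is needed.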
 
Bill Cunningham

puppi said:
On May 7, 11:53 pm, barcaroller wrote:
There is strstr(). It's defined by C89 and C99. The essential
difference between strstr() and memmem() is that strstr() assumes that
the "haystack" and the "needle" are null-byte ended, while in memmem()
the lengths are specified. Hence it's not quite an equivalence (but
unless you are reading raw binary data, it's close to that).

strstr() is an interesting choice; I'll look at that. But he's asking about
POSIX, and I wouldn't know how to answer that myself.

Bill
 
James Kuyper

There is strstr(). It's defined by C89 and C99. The essential
difference between strstr() and memmem() is that strstr() assumes that
the "haystack" and the "needle" are null-byte ended, while in memmem()
the lengths are specified. Hence it's not quite an equivalence (but
unless you are reading raw binary data, it's close to that).

I strongly suspect that the OP is not working with null-terminated data.
 
barcaroller

There is strstr(). It's defined by C89 and C99. The essential
difference between strstr() and memmem() is that strstr() assumes that
the "haystack" and the "needle" are null-byte ended, while in memmem()
the lengths are specified. Hence it's not quite an equivalence (but
unless you are reading raw binary data, it's close to that).

Thank you for your response. I am in fact dealing with raw binary
data. I am a bit surprised that strstr() is in C99, but no similar
function for binary data. I could roll my own, but I like GNU's
optimizations in memmem (making use of word boundaries etc).
 
Angel

Thank you for your response. I am in fact dealing with raw binary
data. I am a bit surprised that strstr() is in C99, but no similar
function for binary data. I could roll my own, but I like GNU's
optimizations in memmem (making use of word boundaries etc).

You might want to be careful with that, memmem() is broken in several
versions of libc.
 
barcaroller

You might want to be careful with that, memmem() is broken in several
versions of libc.

I know that it was broken at some point in the past (in fact, it says
so in the manpage) but my understanding is that it has now been fixed.
If you know otherwise, please let me know (or point me to a source).
 
puppi

Thank you for your response. I am in fact dealing with raw binary
data. I am a bit surprised that strstr() is in C99, but no similar
function for binary data.  I could roll my own, but I like GNU's
optimizations in memmem (making use of word boundaries etc).

If your project is open-source, you can always simply use glibc's
implementation in a "copy-paste" fashion. Just be sure to account for
the legal technicalities in doing that.
 
puppi

I know that it was broken at some point in the past (in fact, it says
so in the manpage) but my understanding is that it has now been fixed.  
If you know otherwise, please let me know (or point me to a source).

You could also bitwise AND with 0x01 each byte of the sequences to be
compared, filtering out the low bits (and preferably packing them 8 by
8 in a single byte), then bitwise OR the original sequences with 0x01,
put a sentinel 0x00 at the end and use 2 strstr()s (or 1 strstr() and
1 plain equality test for each possible match). Of course it wouldn't
be as efficient, but at least it's an alternative.
 
Keith Thompson

puppi said:
You could also bitwise AND with 0x01 each byte of the sequences to be
compared, filtering out the low bits (and preferably packing them 8 by
8 in a single byte), then bitwise OR the original sequences with 0x01,
put a sentinel 0x00 at the end and use 2 strstr()s (or 1 strstr() and
1 plain equality test for each possible match). Of course it wouldn't
be as efficient, but at least it's an alternative.

Or you could use memcmp() in a loop.
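
A minimal sketch of that memcmp() loop, using only standard C. The name `memmem_naive` is made up for this example, and treating an empty needle as a match at the start of the haystack is a choice made here, not something the thread specifies.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical portable stand-in for memmem(): returns a pointer to the
   first occurrence of `needle` in `haystack`, or NULL if there is none. */
void *memmem_naive(const void *haystack, size_t haystack_len,
                   const void *needle, size_t needle_len)
{
    const unsigned char *h = haystack;
    size_t i;

    if (needle_len == 0)
        return (void *)haystack;      /* assumption: empty needle matches at start */
    if (needle_len > haystack_len)
        return NULL;                  /* needle cannot fit */

    /* Try every start offset at which the needle still fits. */
    for (i = 0; i <= haystack_len - needle_len; i++) {
        if (memcmp(h + i, needle, needle_len) == 0)
            return (void *)(h + i);
    }
    return NULL;
}
```

This is O(haystack_len * needle_len) in the worst case, but it handles embedded zero bytes correctly and needs nothing beyond <string.h>.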
 
Ben Pfaff

barcaroller said:
Does anyone know of a POSIX-equivalent for GNU's memmem(), which
searches for a sequence of bytes within a larger block of memory?

It's not hard to write such a function. Have you considered
doing that?
 
Nobody

It's not hard to write such a function. Have you considered
doing that?

It's not hard to write something which works, but it's easy to end up with
something which is far less efficient than it needs to be.

Hint to OP: look up the Boyer–Moore algorithm.
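
As a rough starting point, here is a sketch of the Boyer-Moore-Horspool simplification, which keeps only the bad-character shift table. It is not full Boyer-Moore and not what glibc does; the name `memmem_horspool` is made up for this example.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical Horspool-style search over raw bytes. Returns a pointer to
   the first occurrence of `needle` in `haystack`, or NULL if none. */
void *memmem_horspool(const void *haystack, size_t haystack_len,
                      const void *needle, size_t needle_len)
{
    const unsigned char *h = haystack;
    const unsigned char *n = needle;
    size_t shift[UCHAR_MAX + 1];
    size_t i, pos;

    if (needle_len == 0)
        return (void *)haystack;      /* assumption: empty needle matches at start */
    if (needle_len > haystack_len)
        return NULL;

    /* A byte that does not occur in the needle allows a full-length skip. */
    for (i = 0; i <= UCHAR_MAX; i++)
        shift[i] = needle_len;
    /* For every needle byte except the last, record the distance from its
       rightmost occurrence to the end of the needle. */
    for (i = 0; i + 1 < needle_len; i++)
        shift[n[i]] = needle_len - 1 - i;

    pos = 0;
    while (pos <= haystack_len - needle_len) {
        if (memcmp(h + pos, n, needle_len) == 0)
            return (void *)(h + pos);
        /* Skip according to the haystack byte aligned with the needle's end. */
        pos += shift[h[pos + needle_len - 1]];
    }
    return NULL;
}
```

The table costs 256 entries to set up per call, so this only pays off for longer needles and haystacks; for short searches a plain loop is likely to be just as good.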
 
Dr Nick

Nobody said:
It's not hard to write something which works, but it's easy to end up with
something which is far less efficient than it needs to be.

Do you need it now, or do you want to make sure your code is portable?
If you want it now then I think you do either have to write your own or
use GNU's if the license is compatible.

If you're "just" worried about the future, what I do is (putting the
relevant code for all such functions together in one header/source file
pair) do something like this:

use something external to C to check if the function is available
(autoconf in my case).
if it is, I conditionally #define another name, comfortably in user
namespace, for it
#define search_memory memmem
if not, #error

then a future porter "just" needs to put the prototype in the header
file where the #error is, and write the code in the C file and there he
is.

I've got "caseless_strcmp" in my code, built on strcasecmp (and, on
porting to Windows, stricmp)

Thusly:

#ifdef HAVE_STRCASECMP
#define caseless_strcmp(x,y) strcasecmp(x,y)
#else
#ifdef HAVE_STRICMP
#define caseless_strcmp(x,y) stricmp(x,y)
#else
#error Need to implement caseless string comparison here
#endif
#endif

So I push the problem off for the time being, avoid writing code I don't
need to, but don't leave anyone porting it in the future to a non-GNU
system floundering with no idea of what has gone wrong.
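
Applied to memmem(), the same pattern might look like the sketch below. HAVE_MEMMEM is a hypothetical macro that a configure step (autoconf or similar) would define; `search_memory` is the user-namespace name from the description above. Instead of stopping with #error, this variant drops in a simple portable fallback, which is a different choice from the one described.

```c
/* HAVE_MEMMEM is hypothetical: a configure step would define it when the
   C library provides memmem(). On glibc, memmem() is declared only when
   _GNU_SOURCE is defined before any header is included. */
#ifdef HAVE_MEMMEM
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#endif

#include <stddef.h>
#include <string.h>

#ifdef HAVE_MEMMEM
#define search_memory(h, hl, n, nl) memmem((h), (hl), (n), (nl))
#else
/* Portable fallback: try each offset and compare with memcmp(). */
static void *search_memory(const void *haystack, size_t haystack_len,
                           const void *needle, size_t needle_len)
{
    const unsigned char *h = haystack;
    size_t i;

    if (needle_len == 0)
        return (void *)haystack;      /* assumption: empty needle matches at start */
    if (needle_len > haystack_len)
        return NULL;
    for (i = 0; i <= haystack_len - needle_len; i++) {
        if (memcmp(h + i, needle, needle_len) == 0)
            return (void *)(h + i);
    }
    return NULL;
}
#endif
```

Callers use `search_memory()` everywhere, so switching between the library function and the fallback is a one-line change in the build configuration.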
 
Ben Pfaff

Nobody said:
It's not hard to write something which works, but it's easy to end up with
something which is far less efficient than it needs to be.

It depends on what you are searching for. If it's fairly short,
then simple and obvious algorithms are just fine.
 
Nobody

It depends on what you are searching for. If it's fairly short,
then simple and obvious algorithms are just fine.

True enough. In fact, that's probably what memmem() uses.
 
