Any search pattern method recommed for mmap memory

O

Owen Zhang

I have a file loaded into virtual memory space by mmap. I need to
search some key word inside the memory opened by mmap. What is the
best and efficient way to do?
 
C

CBFalconer

Owen said:
I have a file loaded into virtual memory space by mmap. I need to
search some key word inside the memory opened by mmap. What is the
best and efficient way to do?

You don't need the 'virtual memory'. Look the following over.

/*
Leor said:
I think so. Here's a version I just threw together:
*/

/* And heres another throw -- binfsrch.c by CBF */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include <assert.h>

/* The difference between a binary and a text file, on read,
is the conversion of end-of-line delimiters. What those
delimiters are does not affect the action. In some cases
the presence of 0x1a EOF markers (MsDos) does.

This is a version of Knuth-Morris-Pratt algorithm. The
point of using this is to avoid any backtracking in file
reading, and thus avoiding any use of buffer arrays.
*/

size_t chrcount; /* debuggery, count of input chars, zeroed */

/* --------------------- */

/* Almost straight out of Sedgewick */
/* The next array indicates what index in id should next be
compared to the current char. Once the (lgh - 1)th char
has been successfully compared, the id has been found.
The array is formed by comparing id to itself. */
void initnext(int *next, const char *id, int lgh)
{
int i, j;

assert(lgh > 0);
next[0] = -1; i = 0; j = -1;
while (i < lgh) {
while ((j >= 0) && (id != id[j])) j = next[j];
i++; j++;
next = j;
}
#ifdef DEBUG
for (i = 0; i <= lgh; i++)
printf("id[%d] = '%c' next[%d] = %d\n",
i, id, i, next);
#endif
} /* initnext */

/* --------------------- */

/* reads f without rewinding until either EOF or *marker
has been found. Returns EOF if not found. At exit the
last matching char has been read, and no further. */
int kmpffind(const char *marker, int lgh, int *next, FILE *f)
{
int j; /* char position in marker to check */
int ch; /* current char */

assert(lgh > 0);
j = 0;
while ((j < lgh) && (EOF != (ch = getc(f)))) {
chrcount++;
while ((j >= 0) && (ch != marker[j])) j = next[j];
j++;
}
return ch;
} /* kmpffind */

/* --------------------- */

/* Find marker in f, display following printing chars
up to some non printing character or EOF */
int binfsrch(const char *marker, FILE *f)
{
int *next;
int lgh;
int ch;
int items; /* count of markers found */

lgh = strlen(marker);
if (!(next = malloc(1 + lgh * sizeof *next))) {
puts("No memory");
exit(EXIT_FAILURE);
}
else {
initnext(next, marker, lgh);
items = 0;
while (EOF != kmpffind(marker, lgh, next, f)) {
/* found, take appropriate action */
items++;
printf("%d %s : \"", items, marker);
while (isprint(ch = getc(f))) {
chrcount++;
putchar(ch);
}
puts("\"");
if (EOF == ch) break;
else chrcount++;
}
free(next);
return items;
}
} /* binfsrch */

/* --------------------- */

int main(int argc, char **argv)
{
FILE *f;

f = stdin;
if (3 == argc) {
if (!(f = fopen(argv[2], "rb"))) {
printf("Can't open %s\n", argv[2]);
exit(EXIT_FAILURE);
}
argc--;
}
if (2 != argc) {
puts("Usage: binfsrch name [binaryfile]");
puts(" (file defaults to stdin text mode)");
}
else if (binfsrch(argv[1], f)) {
printf("\"%s\" : found\n", argv[1]);
}
else printf("\"%s\" : not found\n", argv[1]);
printf("%lu chars\n", (unsigned long)chrcount);
return 0;
} /* main binfsrch */
 
M

Mark Bluemel

Owen said:
I have a file loaded into virtual memory space by mmap. I need to
search some key word inside the memory opened by mmap. What is the
best and efficient way to do?
Google for Boyer-Moore, I suspect...
 
K

Keith Thompson

Owen Zhang said:
I have a file loaded into virtual memory space by mmap. I need to
search some key word inside the memory opened by mmap. What is the
best and efficient way to do?

The mmap() function is not part of standard C. If it's relevant to
your question, you should ask in a system-specific newsgroup, most
likely comp.unix.programmer.

But I don't see how it's relevant. Is there some reason you think
searching a chunk of memory allocated by mmap is different from
searching any other chunk of memory?

Standard C provides some simple searching functions such as strstr().
If that doesn't suit your needs, then you probably have an algorithm
question; comp.programming is likely to be the best place to ask.
 
T

Tor Rustad

Owen said:
I have a file loaded into virtual memory space by mmap. I need to
search some key word inside the memory opened by mmap. What is the
best and efficient way to do?


The on-topic answer is: strstr().

mmap() specific considerations, should rather be posted to "c.u.programmer".
 
R

Richard Tobin

Owen Zhang said:
I have a file loaded into virtual memory space by mmap. I need to
search some key word inside the memory opened by mmap. What is the
best and efficient way to do?

I think about all you can say is that a method that access data
sequentially rather than randomly is likely to work better, because it
matches disk access better. That's assuming you don't have any kind
of indexing of course.

-- Richard
 
K

Keith Thompson

Tor Rustad said:
The on-topic answer is: strstr().
[...]

Sure, but strstr() simply searches for a specified substring, not
necessarily for a "keyword" (which may imply it's delimited somehow).
Without more information, we can't be sure whether strstr will do the
job or not.
 
T

Tor Rustad

Keith said:
Tor Rustad said:
The on-topic answer is: strstr().
[...]

Sure, but strstr() simply searches for a specified substring, not
necessarily for a "keyword" (which may imply it's delimited somehow).
Without more information, we can't be sure whether strstr will do the
job or not.

I don't follow.. why can't OP check for extra requirements after each
match by strstr()?


OTOH, files are typically not null terminated, but I didn't bother to
check if OP needed to address this issue when using mmap().
 
K

Keith Thompson

Tor Rustad said:
Keith said:
Tor Rustad said:
Owen Zhang wrote:
I have a file loaded into virtual memory space by mmap. I need to
search some key word inside the memory opened by mmap. What is the
best and efficient way to do?
The on-topic answer is: strstr().
[...]
Sure, but strstr() simply searches for a specified substring, not
necessarily for a "keyword" (which may imply it's delimited somehow).
Without more information, we can't be sure whether strstr will do the
job or not.

I don't follow.. why can't OP check for extra requirements after each
match by strstr()?

Yes, he could do that, but it might not be as efficient as a more
specialized search. If the keyword is sufficiently short, for
example, there might be a lot of false positives. But again, we don't
know much about the OP's requirements.
OTOH, files are typically not null terminated, but I didn't bother to
check if OP needed to address this issue when using mmap().

I hadn't thought of that, though it shouldn't be to hard to address
it.
 
T

Tor Rustad

Keith said:
Tor Rustad said:
Keith said:
Owen Zhang wrote:
I have a file loaded into virtual memory space by mmap. I need to
search some key word inside the memory opened by mmap. What is the
best and efficient way to do?
The on-topic answer is: strstr().
[...]
Sure, but strstr() simply searches for a specified substring, not
necessarily for a "keyword" (which may imply it's delimited somehow).
Without more information, we can't be sure whether strstr will do the
job or not.
I don't follow.. why can't OP check for extra requirements after each
match by strstr()?

Yes, he could do that, but it might not be as efficient as a more
specialized search. If the keyword is sufficiently short, for
example, there might be a lot of false positives. But again, we don't
know much about the OP's requirements.

There "might" be a lot of false positives, particularly if Keith is
allowed to construct that input file! :)

OTOH, let say OP want to scan C source files for keywords, will there
normally be more matches for "int" than [ \t]?

If complex matching is required, OP should rather look into using a
regular expression library, or a lex tool. No reason to reinvent the
wheel for this.

I hadn't thought of that, though it shouldn't be to hard to address
it.

I had the case in mind, where other programs access the file simultaneously.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,756
Messages
2,569,535
Members
45,008
Latest member
obedient dusk

Latest Threads

Top