Any search pattern method recommed for mmap memory

Discussion in 'C Programming' started by Owen Zhang, Sep 21, 2007.

  1. Owen Zhang

    Owen Zhang Guest

    I have a file loaded into virtual memory space by mmap. I need to
    search some key word inside the memory opened by mmap. What is the
    best and efficient way to do?
    Owen Zhang, Sep 21, 2007
    1. Advertisements

  2. Owen Zhang

    CBFalconer Guest

    You don't need the 'virtual memory'. Look the following over.


    /* And heres another throw -- binfsrch.c by CBF */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <ctype.h>
    #include <assert.h>

    /* The difference between a binary and a text file, on read,
    is the conversion of end-of-line delimiters. What those
    delimiters are does not affect the action. In some cases
    the presence of 0x1a EOF markers (MsDos) does.

    This is a version of Knuth-Morris-Pratt algorithm. The
    point of using this is to avoid any backtracking in file
    reading, and thus avoiding any use of buffer arrays.

    size_t chrcount; /* debuggery, count of input chars, zeroed */

    /* --------------------- */

    /* Almost straight out of Sedgewick */
    /* The next array indicates what index in id should next be
    compared to the current char. Once the (lgh - 1)th char
    has been successfully compared, the id has been found.
    The array is formed by comparing id to itself. */
    void initnext(int *next, const char *id, int lgh)
    int i, j;

    assert(lgh > 0);
    next[0] = -1; i = 0; j = -1;
    while (i < lgh) {
    while ((j >= 0) && (id != id[j])) j = next[j];
    i++; j++;
    next = j;
    #ifdef DEBUG
    for (i = 0; i <= lgh; i++)
    printf("id[%d] = '%c' next[%d] = %d\n",
    i, id, i, next);
    } /* initnext */

    /* --------------------- */

    /* reads f without rewinding until either EOF or *marker
    has been found. Returns EOF if not found. At exit the
    last matching char has been read, and no further. */
    int kmpffind(const char *marker, int lgh, int *next, FILE *f)
    int j; /* char position in marker to check */
    int ch; /* current char */

    assert(lgh > 0);
    j = 0;
    while ((j < lgh) && (EOF != (ch = getc(f)))) {
    while ((j >= 0) && (ch != marker[j])) j = next[j];
    return ch;
    } /* kmpffind */

    /* --------------------- */

    /* Find marker in f, display following printing chars
    up to some non printing character or EOF */
    int binfsrch(const char *marker, FILE *f)
    int *next;
    int lgh;
    int ch;
    int items; /* count of markers found */

    lgh = strlen(marker);
    if (!(next = malloc(1 + lgh * sizeof *next))) {
    puts("No memory");
    else {
    initnext(next, marker, lgh);
    items = 0;
    while (EOF != kmpffind(marker, lgh, next, f)) {
    /* found, take appropriate action */
    printf("%d %s : \"", items, marker);
    while (isprint(ch = getc(f))) {
    if (EOF == ch) break;
    else chrcount++;
    return items;
    } /* binfsrch */

    /* --------------------- */

    int main(int argc, char **argv)
    FILE *f;

    f = stdin;
    if (3 == argc) {
    if (!(f = fopen(argv[2], "rb"))) {
    printf("Can't open %s\n", argv[2]);
    if (2 != argc) {
    puts("Usage: binfsrch name [binaryfile]");
    puts(" (file defaults to stdin text mode)");
    else if (binfsrch(argv[1], f)) {
    printf("\"%s\" : found\n", argv[1]);
    else printf("\"%s\" : not found\n", argv[1]);
    printf("%lu chars\n", (unsigned long)chrcount);
    return 0;
    } /* main binfsrch */
    CBFalconer, Sep 21, 2007
    1. Advertisements

  3. Owen Zhang

    Mark Bluemel Guest

    Google for Boyer-Moore, I suspect...
    Mark Bluemel, Sep 21, 2007
  4. The mmap() function is not part of standard C. If it's relevant to
    your question, you should ask in a system-specific newsgroup, most
    likely comp.unix.programmer.

    But I don't see how it's relevant. Is there some reason you think
    searching a chunk of memory allocated by mmap is different from
    searching any other chunk of memory?

    Standard C provides some simple searching functions such as strstr().
    If that doesn't suit your needs, then you probably have an algorithm
    question; comp.programming is likely to be the best place to ask.
    Keith Thompson, Sep 21, 2007
  5. Owen Zhang

    Tor Rustad Guest

    The on-topic answer is: strstr().

    mmap() specific considerations, should rather be posted to "c.u.programmer".
    Tor Rustad, Sep 21, 2007
  6. I think about all you can say is that a method that access data
    sequentially rather than randomly is likely to work better, because it
    matches disk access better. That's assuming you don't have any kind
    of indexing of course.

    -- Richard
    Richard Tobin, Sep 21, 2007
  7. [...]

    Sure, but strstr() simply searches for a specified substring, not
    necessarily for a "keyword" (which may imply it's delimited somehow).
    Without more information, we can't be sure whether strstr will do the
    job or not.
    Keith Thompson, Sep 21, 2007
  8. Owen Zhang

    Tor Rustad Guest

    I don't follow.. why can't OP check for extra requirements after each
    match by strstr()?

    OTOH, files are typically not null terminated, but I didn't bother to
    check if OP needed to address this issue when using mmap().
    Tor Rustad, Sep 21, 2007
  9. Yes, he could do that, but it might not be as efficient as a more
    specialized search. If the keyword is sufficiently short, for
    example, there might be a lot of false positives. But again, we don't
    know much about the OP's requirements.
    I hadn't thought of that, though it shouldn't be to hard to address
    Keith Thompson, Sep 21, 2007
  10. Owen Zhang

    Tor Rustad Guest

    There "might" be a lot of false positives, particularly if Keith is
    allowed to construct that input file! :)

    OTOH, let say OP want to scan C source files for keywords, will there
    normally be more matches for "int" than [ \t]?

    If complex matching is required, OP should rather look into using a
    regular expression library, or a lex tool. No reason to reinvent the
    wheel for this.

    I had the case in mind, where other programs access the file simultaneously.
    Tor Rustad, Sep 22, 2007
    1. Advertisements

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments (here). After that, you can post your question and our members will help you out.