Any search pattern method recommed for mmap memory

Discussion in 'C Programming' started by Owen Zhang, Sep 21, 2007.

  1. Owen Zhang

    Owen Zhang Guest

    I have a file loaded into virtual memory space by mmap. I need to
    search some key word inside the memory opened by mmap. What is the
    best and efficient way to do?
     
    Owen Zhang, Sep 21, 2007
    #1
    1. Advertisements

  2. Owen Zhang

    CBFalconer Guest

    You don't need the 'virtual memory'. Look the following over.

    /*
    */

    /* And heres another throw -- binfsrch.c by CBF */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <ctype.h>
    #include <assert.h>

    /* The difference between a binary and a text file, on read,
    is the conversion of end-of-line delimiters. What those
    delimiters are does not affect the action. In some cases
    the presence of 0x1a EOF markers (MsDos) does.

    This is a version of Knuth-Morris-Pratt algorithm. The
    point of using this is to avoid any backtracking in file
    reading, and thus avoiding any use of buffer arrays.
    */

    size_t chrcount; /* debuggery, count of input chars, zeroed */

    /* --------------------- */

    /* Almost straight out of Sedgewick */
    /* The next array indicates what index in id should next be
    compared to the current char. Once the (lgh - 1)th char
    has been successfully compared, the id has been found.
    The array is formed by comparing id to itself. */
    void initnext(int *next, const char *id, int lgh)
    {
    int i, j;

    assert(lgh > 0);
    next[0] = -1; i = 0; j = -1;
    while (i < lgh) {
    while ((j >= 0) && (id != id[j])) j = next[j];
    i++; j++;
    next = j;
    }
    #ifdef DEBUG
    for (i = 0; i <= lgh; i++)
    printf("id[%d] = '%c' next[%d] = %d\n",
    i, id, i, next);
    #endif
    } /* initnext */

    /* --------------------- */

    /* reads f without rewinding until either EOF or *marker
    has been found. Returns EOF if not found. At exit the
    last matching char has been read, and no further. */
    int kmpffind(const char *marker, int lgh, int *next, FILE *f)
    {
    int j; /* char position in marker to check */
    int ch; /* current char */

    assert(lgh > 0);
    j = 0;
    while ((j < lgh) && (EOF != (ch = getc(f)))) {
    chrcount++;
    while ((j >= 0) && (ch != marker[j])) j = next[j];
    j++;
    }
    return ch;
    } /* kmpffind */

    /* --------------------- */

    /* Find marker in f, display following printing chars
    up to some non printing character or EOF */
    int binfsrch(const char *marker, FILE *f)
    {
    int *next;
    int lgh;
    int ch;
    int items; /* count of markers found */

    lgh = strlen(marker);
    if (!(next = malloc(1 + lgh * sizeof *next))) {
    puts("No memory");
    exit(EXIT_FAILURE);
    }
    else {
    initnext(next, marker, lgh);
    items = 0;
    while (EOF != kmpffind(marker, lgh, next, f)) {
    /* found, take appropriate action */
    items++;
    printf("%d %s : \"", items, marker);
    while (isprint(ch = getc(f))) {
    chrcount++;
    putchar(ch);
    }
    puts("\"");
    if (EOF == ch) break;
    else chrcount++;
    }
    free(next);
    return items;
    }
    } /* binfsrch */

    /* --------------------- */

    int main(int argc, char **argv)
    {
    FILE *f;

    f = stdin;
    if (3 == argc) {
    if (!(f = fopen(argv[2], "rb"))) {
    printf("Can't open %s\n", argv[2]);
    exit(EXIT_FAILURE);
    }
    argc--;
    }
    if (2 != argc) {
    puts("Usage: binfsrch name [binaryfile]");
    puts(" (file defaults to stdin text mode)");
    }
    else if (binfsrch(argv[1], f)) {
    printf("\"%s\" : found\n", argv[1]);
    }
    else printf("\"%s\" : not found\n", argv[1]);
    printf("%lu chars\n", (unsigned long)chrcount);
    return 0;
    } /* main binfsrch */
     
    CBFalconer, Sep 21, 2007
    #2
    1. Advertisements

  3. Owen Zhang

    Mark Bluemel Guest

    Google for Boyer-Moore, I suspect...
     
    Mark Bluemel, Sep 21, 2007
    #3
  4. The mmap() function is not part of standard C. If it's relevant to
    your question, you should ask in a system-specific newsgroup, most
    likely comp.unix.programmer.

    But I don't see how it's relevant. Is there some reason you think
    searching a chunk of memory allocated by mmap is different from
    searching any other chunk of memory?

    Standard C provides some simple searching functions such as strstr().
    If that doesn't suit your needs, then you probably have an algorithm
    question; comp.programming is likely to be the best place to ask.
     
    Keith Thompson, Sep 21, 2007
    #4
  5. Owen Zhang

    Tor Rustad Guest


    The on-topic answer is: strstr().

    mmap() specific considerations, should rather be posted to "c.u.programmer".
     
    Tor Rustad, Sep 21, 2007
    #5
  6. I think about all you can say is that a method that access data
    sequentially rather than randomly is likely to work better, because it
    matches disk access better. That's assuming you don't have any kind
    of indexing of course.

    -- Richard
     
    Richard Tobin, Sep 21, 2007
    #6
  7. [...]

    Sure, but strstr() simply searches for a specified substring, not
    necessarily for a "keyword" (which may imply it's delimited somehow).
    Without more information, we can't be sure whether strstr will do the
    job or not.
     
    Keith Thompson, Sep 21, 2007
    #7
  8. Owen Zhang

    Tor Rustad Guest

    I don't follow.. why can't OP check for extra requirements after each
    match by strstr()?


    OTOH, files are typically not null terminated, but I didn't bother to
    check if OP needed to address this issue when using mmap().
     
    Tor Rustad, Sep 21, 2007
    #8
  9. Yes, he could do that, but it might not be as efficient as a more
    specialized search. If the keyword is sufficiently short, for
    example, there might be a lot of false positives. But again, we don't
    know much about the OP's requirements.
    I hadn't thought of that, though it shouldn't be to hard to address
    it.
     
    Keith Thompson, Sep 21, 2007
    #9
  10. Owen Zhang

    Tor Rustad Guest

    There "might" be a lot of false positives, particularly if Keith is
    allowed to construct that input file! :)

    OTOH, let say OP want to scan C source files for keywords, will there
    normally be more matches for "int" than [ \t]?

    If complex matching is required, OP should rather look into using a
    regular expression library, or a lex tool. No reason to reinvent the
    wheel for this.

    I had the case in mind, where other programs access the file simultaneously.
     
    Tor Rustad, Sep 22, 2007
    #10
    1. Advertisements

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments (here). After that, you can post your question and our members will help you out.