Verify that line exists in a file

Discussion in 'C Programming' started by Jonny, Feb 19, 2005.

  1. Jonny

    Jonny Guest

    Hi,

    Please could you tell me what the quickest way would be to verify that a
    line of text exists in a file.

    I would like to be able to do something like:

    grep "^line of text$" filename.txt

    and then check the exit code, or:

    grep -c "^line of text$" filename.txt

    and then check the count.

    But I need to do it in a C program.

    Many Thanks,
    Jonny
     
    Jonny, Feb 19, 2005
    #1
    1. Advertising

  2. In article <T2ORd.1037$>,
    Jonny <> wrote:
    :please could you tell me what the quickest way would be to verify that a
    :line of text exists in a file.

    open the file, loop reading lines, for each line strcmp against the
    target; if you get a match then set an appropriate status and break
    early; close the file.

    Watch our for the factor of whether the line read includes or excludes
    the record seperator; note that the answer might not be the same for the
    last line in the file.


    I should be slightly more honest: the above is not necessarily
    the -quickest- way, if by 'quick' you mean that execution speed is
    crucial. If execution speed is crucial, there are faster algorithms.

    For example, you could read a bufferful of the file at a time into
    memory and proceed through it. Instead of starting from the start
    of the buffer and comparing forward, you could instead go immediately
    a number of characters further on, where the offset is the size of
    the string you are trying to match against. If the character there does
    not match the -last- character in the string, then you know already
    that that particular section you are looking at is not a copy of the
    line; if it does match, then you proceed backwards comparing a character
    at at time until you reach the beginning of the string or find a mismatch.

    If that first test did not match, you could scan forward until you
    found the end-of-line and then start the backwards comparison from there.
    There are optimizations that can be made to even that process, though:
    the character that was in fact at the place you looked can give you
    hints about how much further on to look. For example if the character
    you found was an 'x', and you knew that 'x' was the 7th last character in
    the string, then you could immediately advance by 7 characters and look
    for the newline there, instead of testing each of the characters
    inbetween for newline.
    --
    "There are three kinds of lies: lies, damn lies, and statistics."
    -- not Twain, perhaps Disraeli, first quoted by Leonard Courtney
     
    Walter Roberson, Feb 19, 2005
    #2
    1. Advertising

  3. Jonny

    Michael Mair Guest

    Jonny wrote:
    > Hi,
    >
    > Please could you tell me what the quickest way would be to verify that a
    > line of text exists in a file.
    >
    > I would like to be able to do something like:
    >
    > grep "^line of text$" filename.txt
    >
    > and then check the exit code, or:
    >
    > grep -c "^line of text$" filename.txt
    >
    > and then check the count.
    >
    > But I need to do it in a C program.


    You want to have a function

    int OccurrenceOfLine(FILE *file, const char *line)

    which reads lines with fgets() into a buffer of size
    >=strlen(line)+1, strips the '\n' from the end of the line

    (if there is none, the line was too long, i.e. it does not
    match and you can read characters until you have reached the
    '\n' of this line) and uses strcmp() on the buffer and
    line.
    If you have a match, you can return 1. If you encounter an
    error, you can return -1. Otherwise, you return 0.
    If you want to _count_ the occurences, then I suggest

    int OccurrencesOfLine(FILE *file, const char *line, size_t *count)

    where you return 0 on success and an error code !=0 on failure.
    The number of matching lines is stored in *count.

    Note: You need of course <stdio.h> and <string.h> for both
    ways I have suggested.


    Cheers
    Michael
    --
    E-Mail: Mine is an /at/ gmx /dot/ de address.
     
    Michael Mair, Feb 19, 2005
    #3
  4. Jonny wrote:

    > Hi,
    >
    > Please could you tell me what the quickest way would be to verify that a
    > line of text exists in a file.
    >
    > I would like to be able to do something like:
    >
    > grep "^line of text$" filename.txt
    >
    > and then check the exit code, or:
    >
    > grep -c "^line of text$" filename.txt
    >
    > and then check the count.
    >
    > But I need to do it in a C program.
    >
    > Many Thanks,
    > Jonny


    Knuth-pratt-morris or boyer-moore
    http://www.cs.utexas.edu/users/moore/best-ideas/string-searching/

    gtoomey
     
    Gregory Toomey, Feb 19, 2005
    #4
  5. Jonny

    Guest

    Gregory Toomey wrote:

    [ ... ]

    > Knuth-pratt-morris or boyer-moore
    > http://www.cs.utexas.edu/users/moore/best-ideas/string-searching/


    With the proviso that you _rarely_ want to use Boyer-Moore as
    originally defined -- nearly all practical use is of simplified
    variants such as Boyer-Moore-Horspool (or Sunday's variant thereof). A
    full-blown Boyer-Moore search is theoretically better in some sense,
    but the savings rarely justify the initialization effort.

    The development effort is considerable as well -- to the point that
    incorrect implementations abound, even those published by the most
    highly respected computer scientists (e.g. Knuth). In fact, for the
    first 15 years after the algorithm was described, EVERY published
    implementation seems to have contained at least one defect.

    By contrasty, Sunday's variant of B-M-H is easy to set up, much easier
    implement correctly (I'm among the few to have really screwed it up),
    and will usually run faster to boot!

    --
    Later,
    Jerry.

    The universe is a figment of its own imagination.
     
    , Feb 20, 2005
    #5
  6. Jonny

    Jonny Guest

    Thanks to everyone for replying.

    I do need a function which is efficient, so I'll probably need to try an
    implementation of one of the more advanced algorithms suggested.

    I found the section of code below, which claims to be an implementation
    of Boyer-Moore-Horspool.

    I would be grateful if someone could briefly tell me if and how I could
    use it.


    char *search( pat, text, n )
    char *pat, *text;
    int n;

    { int i, j, k, m, skip[MAXCHAR];
    m = strlen(pat);
    if( m==0 ) return( text );
    for( k=0; k<MAXCHAR; k++ ) skip[k] = m;
    for( k=0; k<m-1; k++ ) skip[pat[k]] = m-k-1;

    for( k=m-1; k < n; k += skip[text[k] & (MAXCHAR-1)] ) {
    for( j=m-1, i=k; j>=0 && text == pat[j]; j-- ) i--;
    if( j == (-1) ) return( text+i+1 );
    }
    return( NULL );
    }


    Regards,
    Jonny
     
    Jonny, Feb 20, 2005
    #6
  7. Jonny

    CBFalconer Guest

    Gregory Toomey wrote:
    > Jonny wrote:
    >
    >> Please could you tell me what the quickest way would be to verify
    >> that a line of text exists in a file.
    >>
    >> I would like to be able to do something like:
    >>
    >> grep "^line of text$" filename.txt
    >>
    >> and then check the exit code, or:
    >>
    >> grep -c "^line of text$" filename.txt
    >>
    >> and then check the count. But I need to do it in a C program.

    >
    > Knuth-pratt-morris or boyer-moore
    > http://www.cs.utexas.edu/users/moore/best-ideas/string-searching/


    For complete lines combining fgets and strcmp is hard to beat.
    Normally strcmp will fail on the first char. or two, and fgets need
    not read into a buffer significantly larger than the comparee. The
    only complication will be stripping the '\n' off the end of file
    lines, if present, and flushing the input line thru '\n' if not
    present.

    There is no need for more complex searches such as kmp or bm.

    --
    "If you want to post a followup via groups.google.com, don't use
    the broken "Reply" link at the bottom of the article. Click on
    "show options" at the top of the article, then click on the
    "Reply" at the bottom of the article headers." - Keith Thompson
     
    CBFalconer, Feb 20, 2005
    #7
  8. Jonny

    Michael Mair Guest

    Jonny wrote:
    > Thanks to everyone for replying.
    >
    > I do need a function which is efficient, so I'll probably need to try an
    > implementation of one of the more advanced algorithms suggested.


    Which is _complete_ overkill.
    Both search for a comparatively small pattern in a comparatively
    large _unstructured_ text.
    You, OTOH, have a pattern which is exactly as large as the text
    without the '\n'. That means that setting up the skip vector
    about triples the cost.

    The only disadvantage of fgets() is that it does not return or
    make available the number of read characters so that you may
    consider using getc() and comparing yourself.
    Implement the "read line -- compare it" variant and one advanced
    algorithm and just use a profiler or clock() or whatever to
    find out which is better.

    A "better" algorithm is not necessarily the best algorithm for
    a given task -- whenever you have additional information about
    the problem at hand, using a method tailored to your degree of
    information is best.


    > I found the section of code below, which claims to be an implementation
    > of Boyer-Moore-Horspool.
    >
    > I would be grateful if someone could briefly tell me if and how I could
    > use it.


    You could, with some restrictions.
    Why did you not just paste it into your code _and_ try it?
    We are not here to counsel you about algorithms or code off some
    dubious site in the web but about _your_ C code -- which you still
    fail to show us.


    > char *search( pat, text, n )
    > char *pat, *text;
    > int n;


    K&R style -- this is outdated by a good fifteen years.

    char *search (const char *pat, const char *text, int n)

    where you probably would change int n into size_t n and
    would change the order of text and pat to fit the order
    in standard library functions such as strstr().

    >
    > { int i, j, k, m, skip[MAXCHAR];
    > m = strlen(pat);
    > if( m==0 ) return( text );
    > for( k=0; k<MAXCHAR; k++ ) skip[k] = m;
    > for( k=0; k<m-1; k++ ) skip[pat[k]] = m-k-1;
    >
    > for( k=m-1; k < n; k += skip[text[k] & (MAXCHAR-1)] ) {


    Here is a catch: The author assumes MAXCHAR to be 1<<s.
    To repair this, make it skip[text[k] % MAXCHAR]
    and leave the micro-optimisations to the compiler.

    > for( j=m-1, i=k; j>=0 && text == pat[j]; j-- ) i--;
    > if( j == (-1) ) return( text+i+1 );
    > }
    > return( NULL );
    > }


    I have not closely looked at or tested the rest of the
    code -- the latter is your job.
    However, I think that you are trying to be lazy on the
    thinking and programming part. This will not lead to the
    most efficient solution.


    Cheers
    Michael
    --
    E-Mail: Mine is an /at/ gmx /dot/ de address.
     
    Michael Mair, Feb 20, 2005
    #8
  9. Jonny

    SM Ryan Guest

    Jonny <> wrote:
    # Hi,
    #
    # Please could you tell me what the quickest way would be to verify that a
    # line of text exists in a file.

    Set up an FSM and scan the file character by character. Then you don't have to worry
    about line bufferring.

    If the line you're interested in has N characters, the state machine would have
    N+4 states:
    s=0 initial state
    0<=s<N s characters of a line have been read and match the given string.
    if the next input character is the sth string character
    then move to state s+1
    else move to state N+2
    s=N all N characters match
    if the next input character is a new line or EOF
    then move to state N+2
    else move to state N+1
    s=N+1 this line does not match the string
    if the next input character is a new line
    then move to state 0
    else if the next input is an EOF
    then move to state N+3
    else
    stay in state N+1
    s=N+2 accepting final state: the line is in the file
    s=N+3 rejecting final state: the line is not in the file

    If you get the line out of argv, the only variables you need are the next input,
    N, s, and argc and argv.

    If the file length is M and the sought line is randomly distributed in the file,
    the estimated running time is a little over O(M/2), and it is independent of N.
    The storage cost is O(4+N).

    --
    SM Ryan http://www.rawbw.com/~wyrmwif/
    The whole world's against us.
     
    SM Ryan, Feb 20, 2005
    #9
  10. On Sat, 19 Feb 2005 21:18:43 GMT, in comp.lang.c , Jonny
    <> wrote:

    >Hi,
    >
    >Please could you tell me what the quickest way would be to verify that a
    >line of text exists in a file.
    >
    >I would like to be able to do something like:
    >
    >grep "^line of text$" filename.txt


    1) d/l the source code for grep. Its in the gnu txtutils pack I think.

    2) read the file line by line till you find a line that matches your
    pattern.

    --
    Mark McIntyre
    CLC FAQ <http://www.eskimo.com/~scs/C-faq/top.html>
    CLC readme: <http://www.ungerhu.com/jxh/clc.welcome.txt>
     
    Mark McIntyre, Feb 20, 2005
    #10
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. =?Utf-8?B?TXJNaWtl?=

    Verify File Exists

    =?Utf-8?B?TXJNaWtl?=, Jan 5, 2005, in forum: ASP .Net
    Replies:
    2
    Views:
    6,160
    =?Utf-8?B?TXJNaWtl?=
    Jan 5, 2005
  2. Carlos Albert

    Verify if Session(hashtable) exists

    Carlos Albert, Dec 9, 2005, in forum: ASP .Net
    Replies:
    8
    Views:
    6,419
    addup
    Dec 14, 2005
  3. Replies:
    4
    Views:
    1,041
  4. Henry Miranda

    Verify if file exists on network shared HELP

    Henry Miranda, Dec 17, 2003, in forum: ASP .Net Security
    Replies:
    0
    Views:
    126
    Henry Miranda
    Dec 17, 2003
  5. akadelski

    Verify URL/file exists

    akadelski, May 18, 2006, in forum: Javascript
    Replies:
    3
    Views:
    232
    Randy Webb
    May 19, 2006
Loading...

Share This Page