Verify that line exists in a file

Jonny · Feb 19, 2005

Hi,

Please could you tell me what the quickest way would be to verify that a
line of text exists in a file.

I would like to be able to do something like:

grep "^line of text$" filename.txt

and then check the exit code, or:

grep -c "^line of text$" filename.txt

and then check the count.

But I need to do it in a C program.

Many Thanks,
Jonny

Walter Roberson · Feb 19, 2005

lease could you tell me what the quickest way would be to verify that a
:line of text exists in a file.

open the file, loop reading lines, for each line strcmp against the
target; if you get a match then set an appropriate status and break
early; close the file.

Watch our for the factor of whether the line read includes or excludes
the record seperator; note that the answer might not be the same for the
last line in the file.

I should be slightly more honest: the above is not necessarily
the -quickest- way, if by 'quick' you mean that execution speed is
crucial. If execution speed is crucial, there are faster algorithms.

For example, you could read a bufferful of the file at a time into
memory and proceed through it. Instead of starting from the start
of the buffer and comparing forward, you could instead go immediately
a number of characters further on, where the offset is the size of
the string you are trying to match against. If the character there does
not match the -last- character in the string, then you know already
that that particular section you are looking at is not a copy of the
line; if it does match, then you proceed backwards comparing a character
at at time until you reach the beginning of the string or find a mismatch.

If that first test did not match, you could scan forward until you
found the end-of-line and then start the backwards comparison from there.
There are optimizations that can be made to even that process, though:
the character that was in fact at the place you looked can give you
hints about how much further on to look. For example if the character
you found was an 'x', and you knew that 'x' was the 7th last character in
the string, then you could immediately advance by 7 characters and look
for the newline there, instead of testing each of the characters
inbetween for newline.

Michael Mair · Feb 19, 2005

Jonny said:
Hi,

Please could you tell me what the quickest way would be to verify that a
line of text exists in a file.

I would like to be able to do something like:

grep "^line of text$" filename.txt

and then check the exit code, or:

grep -c "^line of text$" filename.txt

and then check the count.

But I need to do it in a C program.

You want to have a function

int OccurrenceOfLine(FILE *file, const char *line)

which reads lines with fgets() into a buffer of size

>=strlen(line)+1, strips the '\n' from the end of the line

(if there is none, the line was too long, i.e. it does not
match and you can read characters until you have reached the
'\n' of this line) and uses strcmp() on the buffer and
line.
If you have a match, you can return 1. If you encounter an
error, you can return -1. Otherwise, you return 0.
If you want to _count_ the occurences, then I suggest

int OccurrencesOfLine(FILE *file, const char *line, size_t *count)

where you return 0 on success and an error code !=0 on failure.
The number of matching lines is stored in *count.

Note: You need of course <stdio.h> and <string.h> for both
ways I have suggested.

Cheers
Michael

Gregory Toomey · Feb 19, 2005

Jonny said:
Hi,

Please could you tell me what the quickest way would be to verify that a
line of text exists in a file.

I would like to be able to do something like:

grep "^line of text$" filename.txt

and then check the exit code, or:

grep -c "^line of text$" filename.txt

and then check the count.

But I need to do it in a C program.

Many Thanks,
Jonny

Knuth-pratt-morris or boyer-moore
http://www.cs.utexas.edu/users/moore/best-ideas/string-searching/

gtoomey

jcoffin · Feb 19, 2005

Gregory Toomey wrote:

[ ... ]

Knuth-pratt-morris or boyer-moore
http://www.cs.utexas.edu/users/moore/best-ideas/string-searching/

With the proviso that you _rarely_ want to use Boyer-Moore as
originally defined -- nearly all practical use is of simplified
variants such as Boyer-Moore-Horspool (or Sunday's variant thereof). A
full-blown Boyer-Moore search is theoretically better in some sense,
but the savings rarely justify the initialization effort.

The development effort is considerable as well -- to the point that
incorrect implementations abound, even those published by the most
highly respected computer scientists (e.g. Knuth). In fact, for the
first 15 years after the algorithm was described, EVERY published
implementation seems to have contained at least one defect.

By contrasty, Sunday's variant of B-M-H is easy to set up, much easier
implement correctly (I'm among the few to have really screwed it up),
and will usually run faster to boot!

Jonny · Feb 19, 2005

Thanks to everyone for replying.

I do need a function which is efficient, so I'll probably need to try an
implementation of one of the more advanced algorithms suggested.

I found the section of code below, which claims to be an implementation
of Boyer-Moore-Horspool.

I would be grateful if someone could briefly tell me if and how I could
use it.

char *search( pat, text, n )
char *pat, *text;
int n;

{ int i, j, k, m, skip[MAXCHAR];
m = strlen(pat);
if( m==0 ) return( text );
for( k=0; k<MAXCHAR; k++ ) skip[k] = m;
for( k=0; k<m-1; k++ ) skip[pat[k]] = m-k-1;

for( k=m-1; k < n; k += skip[text[k] & (MAXCHAR-1)] ) {
for( j=m-1, i=k; j>=0 && text == pat[j]; j-- ) i--;
if( j == (-1) ) return( text+i+1 );
}
return( NULL );
}

Regards,
Jonny

CBFalconer · Feb 19, 2005

Gregory said:
Knuth-pratt-morris or boyer-moore
http://www.cs.utexas.edu/users/moore/best-ideas/string-searching/

For complete lines combining fgets and strcmp is hard to beat.
Normally strcmp will fail on the first char. or two, and fgets need
not read into a buffer significantly larger than the comparee. The
only complication will be stripping the '\n' off the end of file
lines, if present, and flushing the input line thru '\n' if not
present.

There is no need for more complex searches such as kmp or bm.

Michael Mair · Feb 20, 2005

Jonny said:
Thanks to everyone for replying.

I do need a function which is efficient, so I'll probably need to try an
implementation of one of the more advanced algorithms suggested.

Which is _complete_ overkill.
Both search for a comparatively small pattern in a comparatively
large _unstructured_ text.
You, OTOH, have a pattern which is exactly as large as the text
without the '\n'. That means that setting up the skip vector
about triples the cost.

The only disadvantage of fgets() is that it does not return or
make available the number of read characters so that you may
consider using getc() and comparing yourself.
Implement the "read line -- compare it" variant and one advanced
algorithm and just use a profiler or clock() or whatever to
find out which is better.

A "better" algorithm is not necessarily the best algorithm for
a given task -- whenever you have additional information about
the problem at hand, using a method tailored to your degree of
information is best.

I found the section of code below, which claims to be an implementation
of Boyer-Moore-Horspool.

I would be grateful if someone could briefly tell me if and how I could
use it.

You could, with some restrictions.
Why did you not just paste it into your code _and_ try it?
We are not here to counsel you about algorithms or code off some
dubious site in the web but about _your_ C code -- which you still
fail to show us.

char *search( pat, text, n )
char *pat, *text;
int n;

K&R style -- this is outdated by a good fifteen years.

char *search (const char *pat, const char *text, int n)

where you probably would change int n into size_t n and
would change the order of text and pat to fit the order
in standard library functions such as strstr().

{ int i, j, k, m, skip[MAXCHAR];
m = strlen(pat);
if( m==0 ) return( text );
for( k=0; k<MAXCHAR; k++ ) skip[k] = m;
for( k=0; k<m-1; k++ ) skip[pat[k]] = m-k-1;

for( k=m-1; k < n; k += skip[text[k] & (MAXCHAR-1)] ) {

Here is a catch: The author assumes MAXCHAR to be 1<<s.
To repair this, make it skip[text[k] % MAXCHAR]
and leave the micro-optimisations to the compiler.

for( j=m-1, i=k; j>=0 && text == pat[j]; j-- ) i--;
if( j == (-1) ) return( text+i+1 );
}
return( NULL );
}

I have not closely looked at or tested the rest of the
code -- the latter is your job.
However, I think that you are trying to be lazy on the
thinking and programming part. This will not lead to the
most efficient solution.

Cheers
Michael

SM Ryan · Feb 20, 2005

# Hi,
#
# Please could you tell me what the quickest way would be to verify that a
# line of text exists in a file.

Set up an FSM and scan the file character by character. Then you don't have to worry
about line bufferring.

If the line you're interested in has N characters, the state machine would have
N+4 states:
s=0 initial state
0<=s<N s characters of a line have been read and match the given string.
if the next input character is the sth string character
then move to state s+1
else move to state N+2
s=N all N characters match
if the next input character is a new line or EOF
then move to state N+2
else move to state N+1
s=N+1 this line does not match the string
if the next input character is a new line
then move to state 0
else if the next input is an EOF
then move to state N+3
else
stay in state N+1
s=N+2 accepting final state: the line is in the file
s=N+3 rejecting final state: the line is not in the file

If you get the line out of argv, the only variables you need are the next input,
N, s, and argc and argv.

If the file length is M and the sought line is randomly distributed in the file,
the estimated running time is a little over O(M/2), and it is independent of N.
The storage cost is O(4+N).

Mark McIntyre · Feb 20, 2005

Hi,

Please could you tell me what the quickest way would be to verify that a
line of text exists in a file.

I would like to be able to do something like:

grep "^line of text$" filename.txt

1) d/l the source code for grep. Its in the gnu txtutils pack I think.

2) read the file line by line till you find a line that matches your
pattern.

Command Line Arguments	0	Mar 7, 2023
Using JS to verify registration info?	1	Mar 19, 2020
CodeHS #3.3.4 Horizontal line	6	Feb 12, 2025
How to read a file as binary or hex "string" so that I can do regex search?	3	Dec 18, 2024
I made a blockchain and want to make a cryptocurrency, but my code doesn't verify hash of each block	2	Jun 2, 2024
Unable to read input from keyboard, in below C code, for a BST.	0	Jul 20, 2025
Equation of a straight line	1	May 26, 2022
How can I view / open / render / display a pdf file with c code?	0	Sep 23, 2023

Verify that line exists in a file

Jonny

Walter Roberson

Michael Mair

Gregory Toomey

jcoffin

Jonny

CBFalconer

Michael Mair

SM Ryan

Mark McIntyre

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads