reading file backwards and parsing

Matt DeFoor

I have some log files that I'm working with that look like this:

1000000000 3456 1234
1000000001 3456 1235
1000020002 3456 1223
1000203044 3456 986
etc.

I'm trying to read the file backwards and just look at the first
column. Here's what I've got so far:

in=fopen(fpath,"rb");
if (in!=NULL) {
    fseek(in,0,SEEK_END);
    back1line(in); /* function that goes back 1 line */

    while (1) {
        pos=ftell(in);
        fgets(buffer,1024,in);
        buffer[strlen(buffer)-1]=0;

        printf("line=%s\n", buffer);

        len=strlen(buffer);
        if ((buffer[len-1] == '\n')||(buffer[len-1] == '\r'))
            buffer[len-1] = '\0';
        memset(cptrs,0,sizeof(cptrs));
        i=0;
        cptrs[i]=buffer;
        while (cptrs[i] && i<3) {
            ++i;
            cptrs[i]=strchr(cptrs[i-1],' ');
            if (cptrs[i]==NULL) {
                printf("we break in here\n");
                break;
            }
            *cptrs[i]=0;
            ++cptrs[i];
        }
        lprintf(0,"got 0 = <%d>\n",atoi(cptrs[0]));
        lprintf(0,"got 1 = <%d>\n",atoi(cptrs[1]));
        lprintf(0,"got 2 = <%d>\n",atoi(cptrs[2]));
        fseek(in,pos,0);
        if (back1line(in)!=0)
            return -1;
    }
}
else
    rc=1;
fclose(in);

This just prints out the elements of the last line of the file and I'm
not sure why. If I replace the while loop that splits on ' ' with a
while loop that uses strtok to tokenize the string, it works great.
But, it seems to me that the above should work. Then again, what do I
know?

TIA!

-matt
 
Eric Sosman

Matt said:
I have some log files that I'm working with that look like this:

1000000000 3456 1234
1000000001 3456 1235
1000020002 3456 1223
1000203044 3456 986
etc.

I'm trying to read the file backwards and just look at the first
column. Here's what I've got so far:
[snipped]

One obstacle to diagnosing your difficulty is that we're
forced to make guesses about the portions of your code that
you didn't provide. You omitted all the declarations (the
one I'm most interested in is `cptrs'), and you omitted the
definitions of the back1line() and lprintf() functions.

Albert Einstein is supposed to have said "Things should
be made as simple as possible, but no simpler." In your zeal
for brevity I fear you've violated the second part of his
advice ...

Please post the shortest *complete* program that demonstrates
your problem. Otherwise, we're all sitting here debugging our
own suppositions about what you've omitted.
 
Matt DeFoor

Eric Sosman said:
One obstacle to diagnosing your difficulty is that we're
forced to make guesses about the portions of your code that
you didn't provide. You omitted all the declarations (the
one I'm most interested in is `cptrs'), and you omitted the
definitions of the back1line() and lprintf() functions.

Please post the shortest *complete* program that demonstrates
your problem. Otherwise, we're all sitting here debugging our
own suppositions about what you've omitted.

Sorry about that. As soon as I posted I realized that I had left my custom
printf function in as well as omitting a few others. Apologies. I have
another apology to make as well. This program is meant to be compiled
with Metrowerks CodeWarrior where it doesn't work properly (yet it
compiles). However, I've since tested and compiled on my rusty, yet
trusty, Linux box and it works there with a small modification by
typecasting line 62 as an int.

With that said, if anyone knows of a better/more efficient way to
accomplish what I'm trying to do, that'd be great. Essentially, I'm
trying to read the log file backwards and compare the first column
which is a timestamp.

Anyway, here it is. In all its horrid glory:

#include <stdlib.h>
#include <stdio.h>

int backtonewline (FILE *fp) {
    char ch;
    long pos;
    int rc;

    rc=fseek(fp, -1L, 1);
    if (rc==-1)
        return -1;

    while (1) {
        ch=fgetc(fp);
        if (ch=='\n')
            break;
        rc=fseek(fp,-2L,1);
        if (rc==-1)
            return -1;
    }
    fseek(fp,-1L,1);
    return 0;
}

int back1line (FILE *fp) {
    if (backtonewline (fp) != 0) return -1;
    if (backtonewline (fp) != 0) return -1;
    fseek(fp,1L,1);
    return 0;
}


int main () {
    FILE *in;
    int rc,len,i;
    char buffer[100];
    char *cptrs[3];
    long pos;

    len=strlen(buffer);
    if ((buffer[len-1] == '\n')||(buffer[len-1] == '\r'))
        buffer[len-1] = '\0';
    in=fopen("log","rb");
    if (in!=NULL) {
        fseek(in,0,SEEK_END);
        back1line(in);

        while (1) {
            pos=ftell(in);
            fgets(buffer,100,in);
            buffer[strlen(buffer)-1]=0;

            len=strlen(buffer);
            if ((buffer[len-1] == '\n')||(buffer[len-1] == '\r'))
                buffer[len-1] = '\0';
            printf("line=%s\n", buffer);
            memset(cptrs,0,sizeof(cptrs));
            i=0;
            cptrs[i]=buffer;
            while (cptrs[i] && i<3) {
                ++i;
                (int *)cptrs[i]=strchr(cptrs[i-1],' '); /* typecast change to
                work on Linux */
                if (cptrs[i]==NULL) {
                    printf("we break in here\n");
                    break;
                }
                *cptrs[i]=0;
                ++cptrs[i];
            }
            printf("got 0 = <%d>\n",atoi(cptrs[0]));
            printf("got 1 = <%d>\n",atoi(cptrs[1]));
            printf("got 2 = <%d>\n",atoi(cptrs[2]));
            fseek(in,pos,0);
            if (back1line(in)!=0)
                rc=1;
        }
    }
    else
        rc=1;
    fclose(in);

}

Cheers,
Matt
 
Barry Schwarz

Sorry about that. As soon as I posted I realized that I had left my custom
printf function in as well as omitting a few others. Apologies. I have
another apology to make as well. This program is meant to be compiled
with Metrowerks CodeWarrior where it doesn't work properly (yet it
compiles). However, I've since tested and compiled on my rusty, yet
trusty, Linux box and it works there with a small modification by
typecasting line 62 as an int.

While I believe you tested something, it was not this code. It
doesn't compile clean. Did you cut and paste or retype the code?
With that said, if anyone knows of a better/more efficient way to
accomplish what I'm trying to do, that'd be great. Essentially, I'm
trying to read the log file backwards and compare the first column
which is a timestamp.

Anyway, here it is. In all its horrid glory:

#include <stdlib.h>
#include <stdio.h>

int backtonewline (FILE *fp) {
char ch;
long pos;
int rc;

rc=fseek(fp, -1L, 1);

-1 would work just as well as -1L. But how do you know that SEEK_CUR
is 1 on all the systems you compile it on?
if (rc==-1)

fseek can return any non-zero value on error. Are you sure each of
your systems returns -1?
return -1;

while (1) {
ch=fgetc(fp);

fgetc returns an int, not a char. You need that to check for errors.
if (ch=='\n')
break;
rc=fseek(fp,-2L,1);
if (rc==-1)
return -1;
}
fseek(fp,-1L,1);

This positions you 1 character before the '\n'.
return 0;
}

int back1line (FILE *fp) {
if (backtonewline (fp) != 0) return -1;

If successful, this positions you one character before the '\n' before
the current line,
if (backtonewline (fp) != 0) return -1;

If successful, this positions you one character before the '\n' before
the previous line.

After processing line 2, this will fail because there is no '\n'
before line 1 and obviously no character before that.
fseek(fp,1L,1);

This positions you at the '\n' before the previous line.
return 0;
}


int main () {
FILE *in;
int rc,len,i;
char buffer[100];
char *cptrs[3];
long pos;

len=strlen(buffer);

You forgot to include string.h for strlen, memset, strchr, etc.

buffer is uninitialized. This invokes undefined behavior. I assume
this is out of sequence?
if ((buffer[len-1] == '\n')||(buffer[len-1] == '\r'))
buffer[len-1] = '\0';
in=fopen("log","rb");
if (in!=NULL) {
fseek(in,0,SEEK_END);
back1line(in);

If your file ends with a '\n' this will work. I believe that is
implementation defined. If there is no '\n', then you will skip the
last line and start with the one before it.
while (1) {
pos=ftell(in);
fgets(buffer,100,in);

Since back1line left you pointed at the '\n', you will only read in
that one character.
buffer[strlen(buffer)-1]=0;

This assumes the line was less than 99 characters. Are you sure?
len=strlen(buffer);

Did you really want to call strlen twice?
if ((buffer[len-1] == '\n')||(buffer[len-1] == '\r'))

The only possible '\n' was at the end of the string, just before the
'\0', and you removed it two statements earlier.
buffer[len-1] = '\0';
printf("line=%s\n", buffer);
memset(cptrs,0,sizeof(cptrs));

All bits 0 is not necessarily a valid value for a pointer. Why do you
bother since you initialize each cptrs as needed?

Undefined behavior in C89 because memset is assumed to return an int
which is not true.
i=0;
cptrs[i]=buffer;
while (cptrs[i] && i<3) {
++i;
(int *)cptrs[i]=strchr(cptrs[i-1],' '); /* typecast change to
work on Linux */


More undefined behavior or it would be if it wasn't for the syntax
error.

Tell us this was meant as a joke. The result of a cast is not a
modifiable l-value and therefore may not appear as the destination of
an assignment operator. Why did your note in the beginning say cast
to int when here you cast to int*?
if (cptrs[i]==NULL) {
printf("we break in here\n");
break;
}
*cptrs[i]=0;
++cptrs[i];
}
printf("got 0 = <%d>\n",atoi(cptrs[0]));
printf("got 1 = <%d>\n",atoi(cptrs[1]));
printf("got 2 = <%d>\n",atoi(cptrs[2]));


If you break out of the previous while loop because cptrs is NULL,
then at least one of these calls to printf invokes undefined behavior.
fseek(in,pos,0);

Is 0 guaranteed to be SEEK_SET?
if (back1line(in)!=0)
rc=1;
}
}
else
rc=1;
fclose(in);

}

Please provide the real code.
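
For what it's worth, here is a minimal sketch of one way to act on the seeking comments above: SEEK_SET by name instead of a bare 0, a non-zero test on fseek, an int to hold the result of fgetc, and an explicit stop at offset 0 because nothing precedes line 1. The name prev_line_start is hypothetical and does not come from the posted code. The sketch assumes a binary-mode stream whose lines end in '\n', so that offsets can be treated as byte counts.

```c
#include <stdio.h>

/* Hypothetical helper: given the offset of the start of a line, position
 * the stream at the start of the previous line and return that offset.
 * Returns -1L if there is no previous line or on I/O error. */
long prev_line_start(FILE *fp, long pos)
{
    int ch;                   /* int, not char, so EOF stays distinguishable */

    if (pos <= 0)
        return -1L;           /* already at the first line */

    pos--;                    /* offset of the byte that ends the previous line */
    while (pos > 0) {
        if (fseek(fp, pos - 1, SEEK_SET) != 0)   /* any non-zero is failure */
            return -1L;
        ch = fgetc(fp);
        if (ch == EOF)
            return -1L;
        if (ch == '\n')
            break;            /* pos is the first byte after that newline */
        pos--;
    }
    if (fseek(fp, pos, SEEK_SET) != 0)
        return -1L;
    return pos;               /* reaching 0 means the first line of the file */
}
```

Starting from the file size (fseek to SEEK_END, then ftell) and feeding each result back in visits the lines last to first; an fgets after each call reads the line at that offset.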


<<Remove the del for email>>
 
Matt DeFoor

Matt DeFoor wrote:
Sorry about that. As soon as I posted I realized that I had left my custom
printf function in as well as omitting a few others. Apologies. I have
another apology to make as well. This program is meant to be compiled
with Metrowerks CodeWarrior where it doesn't work properly (yet it
compiles). However, I've since tested and compiled on my rusty, yet
trusty, Linux box and it works there with a small modification by
typecasting line 62 as an int.

Anyway, here it is. In all its horrid glory:

#include <stdlib.h>
#include <stdio.h>

Forgot to include string.h.

-matt
 
Eric Sosman

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

My crystal ball is obviously in good working order. Send
me a GIF of your palm and I'll read your future -- for a small
fee, calculated in `double' on the DeathStation 9000 ...
Please post the shortest *complete* program [...]

Anyway, here it is. In all its horrid glory:
[...]

int main () {
FILE *in;
int rc,len,i;
char buffer[100];
char *cptrs[3];
long pos;

So `cptrs' is an array of three pointers, called cptrs[0]
through cptrs[2]. Further along:
i=0;
cptrs[i]=buffer;
while (cptrs[i] && i<3) {


This test can succeed if `i' is equal to 2 ...

... in which case the next line changes `i' to 3 ...
(int *)cptrs[i]=strchr(cptrs[i-1],' '); /* typecast change to
work on Linux */

... and you're now trying to store something into cptrs[3],
which doesn't exist. (The cast is bogus; rather than quieting
a warning, it should have prevented the program from compiling
at all. I suspect you're operating the compiler in a non-
conforming mode; if it's gcc, try using the "-ansi -pedantic"
flags, along with "-W -Wall". The proper way to quiet the
warning would have been to #include <string.h>; without that
inclusion your use of strchr(), strlen(), and memset() is not
only suspect, but flat-out incorrect.)

Once you try to store something in an array element that
doesn't exist, all bets are off. One quite likely (but not
guaranteed) outcome is that some other variable that just
happens to reside next to cptrs[2] will get clobbered. It's
likely (although again, not guaranteed) that the next-door
neighbor will be one of `pos' or `buffer' -- and if you've
managed to dump garbage in either of those, there's a good
chance your program will misbehave.
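
A sketch of the same split with the index checked before every store, so that nothing past the last array element is ever written. split_fields is a hypothetical name, and the sketch assumes fields are separated by single spaces, as in the sample log lines.

```c
#include <string.h>

/* Hypothetical helper: split `line` in place on single spaces, storing at
 * most `max` field pointers in `fields`. Returns the number stored. */
int split_fields(char *line, char *fields[], int max)
{
    int n = 0;
    char *p = line;

    while (p != NULL && n < max) {
        fields[n++] = p;      /* n < max was tested before this store */
        p = strchr(p, ' ');
        if (p != NULL)
            *p++ = '\0';      /* terminate this field, step past the space */
    }
    return n;
}
```

With `char *cptrs[3]`, a call such as `split_fields(buffer, cptrs, 3)` touches only cptrs[0] through cptrs[2], and the return value says how many of them are safe to pass to atoi().
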
With that said, if anyone knows of a better/more efficient way to
accomplish what I'm trying to do, that'd be great. Essentially, I'm
trying to read the log file backwards and compare the first column
which is a timestamp.

This really isn't enough of an explanation to support an
informed recommendation. If you're trying to process the entire
log backwards, you'll be much better off reading it in the
forward direction and rearranging the processing. If you're
only interested in the last N lines (for smallish N), you may
do well to make a guess about how long those lines are, fseek()
to a position shortly before the estimated start of the group,
read and store the entire tail end of the file, and then figure
out which lines are which.
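
The tail-reading idea might look something like the following. Both function names are hypothetical. The stream is assumed to be opened in binary mode on a system where seeking to SEEK_END works, and the buffer size stands in for the guess about how long the last N lines are. If the buffer starts in the middle of a line, the last string handed back is only a fragment and should be discarded.

```c
#include <stdio.h>

/* Hypothetical helper: read up to cap - 1 bytes from the end of the
 * stream into buf, NUL-terminate, and return the count, or -1L on error. */
long read_tail(FILE *fp, char *buf, long cap)
{
    long size, start;
    size_t got;

    if (cap < 1 || fseek(fp, 0L, SEEK_END) != 0)
        return -1L;
    size = ftell(fp);
    if (size < 0)
        return -1L;
    start = (size > cap - 1) ? size - (cap - 1) : 0L;
    if (fseek(fp, start, SEEK_SET) != 0)
        return -1L;
    got = fread(buf, 1, (size_t)(size - start), fp);
    buf[got] = '\0';
    return (long)got;
}

/* Hypothetical helper: return the last line held in buf[0..*len), cut it
 * off the end of the buffer, and shrink *len. NULL when nothing is left. */
char *pop_last_line(char *buf, long *len)
{
    long i = *len;

    if (i <= 0)
        return NULL;
    if (buf[i - 1] == '\n')
        buf[--i] = '\0';      /* strip the line's own terminator */
    while (i > 0 && buf[i - 1] != '\n')
        i--;                  /* back up to just after the previous '\n' */
    *len = i;
    return buf + i;
}
```

Calling pop_last_line() repeatedly yields the stored lines from last to first without any further seeking.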
 
Matt DeFoor

Eric Sosman said:
This really isn't enough of an explanation to support an
informed recommendation. If you're trying to process the entire
log backwards, you'll be much better off reading it in the
forward direction and rearranging the processing. If you're
only interested in the last N lines (for smallish N), you may
do well to make a guess about how long those lines are, fseek()
to a position shortly before the estimated start of the group,
read and store the entire tail end of the file, and then figure
out which lines are which.

I need to search through the whole file, line by line, to find the
most recent entry that matches a certain criterion (e.g. 3 months ago
from today). So, I thought that reading the file backwards would be
the fastest and best approach.

-matt
 
Old Wolf

Barry Schwarz said:
Matt DeFoor wrote: [without including string.h]
memset(cptrs,0,sizeof(cptrs));

Undefined behavior in C89 because memset is assumed to return an int
which is not true.

Isn't this OK as long as the return value is not used?
 
Rob Thorpe

I need to search through the whole file, line by line, to find the
most recent entry that matches a certain criterion (e.g. 3 months ago
from today). So, I thought that reading the file backwards would be
the fastest and best approach.

-matt

If you have control over the program creating the log file the best
solution is probably:

1) Modify that application so it creates a separate log file for every
day/week/month (pick most appropriate) named by date.
2) Have your program search those log files in reverse date order, but
*forwards*
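
As a rough illustration of step 1, a per-day file name can be built with strftime. The helper name and the "log-" prefix and ".txt" suffix are assumptions made up for this sketch; the point is only that a year-month-day name sorts in date order.

```c
#include <time.h>

/* Hypothetical helper: build a per-day log file name such as
 * "log-2004-06-07.txt". The prefix and suffix are illustrative only.
 * Returns 0 on success, -1 if the name does not fit in `out`. */
int daily_log_name(const struct tm *day, char *out, size_t cap)
{
    /* Year, then month, then day: names sort in date order. */
    return strftime(out, cap, "log-%Y-%m-%d.txt", day) > 0 ? 0 : -1;
}
```

The searching program can then walk the names from newest to oldest and read each file forwards.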
 
Joe Wright

Matt said:
I need to search through the whole file, line by line, to find the
most recent entry that matches a certain criterion (e.g. 3 months ago
from today). So, I thought that reading the file backwards would be
the fastest and best approach.

-matt
I haven't gone back through this thread so I don't know what advice
you already have. I'd do it something like this..

Let's say the file is a series of key=value pairs written
sequentially over time. You want to know the last value associated
with key. For sake of argument, we are looking for a key named PASS
which will occur several times in the file. We are interested to
know where the 'last' occurrence is.

Read the file line by line with fgets(), remembering (saving) the
address of the beginning of the line as returned by ftell().

long pass = 0;
long tell = 0;
char line[ENOUGH];
char *cp;

/* Open your file in text mode and prepare to read it to the end. */

while (fgets(line, sizeof line, fp) != NULL) {
    if ((cp = strstr(line, "PASS=")) != NULL)
        pass = tell; /* the beginning of this line */
    tell = ftell(fp); /* the beginning of the next line */
}
fseek(fp, pass, SEEK_SET); /* last line with "PASS=" */
fgets(line, sizeof line, fp); /* read the line */
cp = strchr(line, '=') + 1; /* point to the value */

This is not a program. It's a hint.
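
For anyone who wants the hint in compilable form, one possible completion follows. The name last_value and the ENOUGH value of 256 are assumptions. It differs from the hint in two ways: the saved position starts at -1 so that "no match" can be told apart from a match on the first line, and the value is located with strstr and strlen instead of strchr.

```c
#include <stdio.h>
#include <string.h>

#define ENOUGH 256  /* assumed upper bound on line length */

/* Hypothetical helper: copy the value from the last line containing `key`
 * (for example "PASS=") into out. Returns 0 on success, -1 if no line
 * matched or on I/O error. */
int last_value(FILE *fp, const char *key, char *out, size_t cap)
{
    long found = -1L;              /* start of the last matching line */
    long tell;
    char line[ENOUGH];
    char *cp;

    if (cap == 0)
        return -1;
    rewind(fp);
    tell = ftell(fp);
    while (fgets(line, sizeof line, fp) != NULL) {
        if (strstr(line, key) != NULL)
            found = tell;          /* the beginning of this line */
        tell = ftell(fp);          /* the beginning of the next line */
    }
    if (found < 0 || fseek(fp, found, SEEK_SET) != 0)
        return -1;
    if (fgets(line, sizeof line, fp) == NULL)
        return -1;
    cp = strstr(line, key);
    if (cp == NULL)
        return -1;
    cp += strlen(key);             /* point to the value */
    cp[strcspn(cp, "\r\n")] = '\0';    /* drop the line terminator */
    strncpy(out, cp, cap - 1);
    out[cap - 1] = '\0';
    return 0;
}
```

The offsets passed to fseek are exactly those returned by ftell on the same stream, which is the only use of fseek with SEEK_SET that is portable for text streams.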
 
CBFalconer

Matt said:
.... snip ...

I need to search through the whole file, line by line, to find
the most recent entry that matches a certain criterion (e.g. 3
months ago from today). So, I thought that reading the file
backwards would be the fastest and best approach.

Why didn't you say so in the first place! Just read it in the
normal forward direction, and whenever the criterion is satisfied
note where you are, overwriting the previous note. When you hit
EOF the note will specify the last position. Pseudo code:

locn = NOWHERE;
where = STARTOFFILE;
while (0 == fggets(&ln, f)) { /* fggets returns 0 on success */
    if (findin(ln, criterion)) locn = where;
    where = currentposition(f);
    free(ln);
}

assuming the use of ggets.zip available on my site.
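
The pseudo code can also be sketched in plain standard C, here with fgets and a fixed buffer standing in for fggets and a timestamp cutoff in the first column as the criterion. The name last_at_or_before and the 256-byte line limit are assumptions, not part of ggets.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: return the offset of the last line whose first
 * column, read as a decimal timestamp, is <= cutoff; -1L if none. */
long last_at_or_before(FILE *f, unsigned long cutoff)
{
    char ln[256];          /* assumed to be longer than any log line */
    char *end;
    unsigned long stamp;
    long locn = -1L;       /* NOWHERE */
    long where;

    rewind(f);
    where = ftell(f);      /* STARTOFFILE */
    while (fgets(ln, sizeof ln, f) != NULL) {
        stamp = strtoul(ln, &end, 10);
        if (end != ln && stamp <= cutoff)
            locn = where;  /* overwrite the previous note */
        where = ftell(f);  /* start of the next line */
    }
    return locn;
}
```

A later fseek(f, locn, SEEK_SET) followed by fgets re-reads the matching line.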
 
Barry Schwarz

Barry Schwarz said:
Matt DeFoor wrote: [without including string.h]
memset(cptrs,0,sizeof(cptrs));

Undefined behavior in C89 because memset is assumed to return an int
which is not true.

Isn't this OK as long as the return value is not used?

Consider the situation where returned pointers are stored in one set
of registers and returned integers are stored in another set of
registers. Since memset will return a pointer it will update one of
the registers in the first set. The compiler, thinking that memset
returns an int, is allowed to assume that all the values it has
previously loaded in that set are still intact. Any subsequent code
that uses the changed register has got a problem because the value the
compiler thinks is there is not.


<<Remove the del for email>>
 
