C
Cliff Martin
Hi,
I am reading a fairly large file a line at a time, doing some
processing, and filtering out bits of the line. I am storing the
interesting information in a struct and then printing it out. This
works without any problems. I would now like to filter "duplicate"
records. They aren't really duplicates, so I can't just qsort as is and
eliminate the matching rows. A number of fields will be the
same, but some of the fields will differ and the timestamp may be off
by a second or two. I want to eliminate records whose fields of interest
match and whose timestamps differ by less than n
seconds. Again, this is not a problem: I can get seconds since the epoch
and compare, or just use difftime. My problem is this: I want to build
an array of structs that will hold the lines that match, sort them on the
date member, and then eliminate based on matching timestamps.
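A minimal sketch of the sort-then-eliminate step, assuming the generic record layout used later in this post. The names cmp_by_date and dedup_group are hypothetical, and comparing each record against the last one kept (rather than against its immediate predecessor) is one possible reading of "less than n seconds":

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Assumed layout, mirroring the generic struct in the post. */
typedef struct {
    char field1[11];
    char field2[11];
    time_t date_secs;
} record;

/* qsort comparator (hypothetical name): ascending by timestamp. */
static int cmp_by_date(const void *a, const void *b)
{
    const record *ra = a;
    const record *rb = b;
    double d = difftime(ra->date_secs, rb->date_secs);
    return (d > 0) - (d < 0);
}

/*
 * Sort one group of matching records by date, then compact it in place.
 * A record is dropped when its timestamp is less than n seconds after
 * the last record that was kept. Returns the number of records kept.
 */
size_t dedup_group(record *recs, size_t count, double n)
{
    size_t kept = 0;
    size_t i;

    assert(recs != NULL || count == 0);
    if (count == 0)
        return 0;

    qsort(recs, count, sizeof recs[0], cmp_by_date);

    for (i = 0; i < count; i++) {
        if (kept == 0 ||
            difftime(recs[i].date_secs, recs[kept - 1].date_secs) >= n)
            recs[kept++] = recs[i];
    }
    return kept;
}
```

After the call, the first `kept` elements of the array are the survivors in date order, so they can be printed with a plain loop.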
For example, in the set:
foo,bar ...,a, Thu Jan 25 01:40:11 EST 2007
foo,bar ...,a, Thu Jan 25 01:45:35 EST 2007
foo,bar ...,a, Thu Jan 25 01:48:09 EST 2007
foo,bar ...,b, Thu Jan 25 01:40:12 EST 2007
foo,baz ..., Thu Jan 25 01:40:11 EST 2007
I would like to read the first 4 lines into structs, store them in
an array, sort them, and then print them out while eliminating the
"duplicate" line 4. I would not want to process the 5th line yet, because
it is dissimilar to the first 4: foo,baz instead of foo,bar. I am
ignoring the field that has a or b in it (one of the reasons a simple
sort will not work).
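The grouping test described above could be sketched as follows. This assumes the fields of interest are stored as NUL-terminated strings; the flag member standing in for the ignored a/b field and the name same_group are hypothetical:

```c
#include <assert.h>
#include <string.h>
#include <time.h>

/* Assumed layout; flag is a hypothetical stand-in for the a/b field. */
typedef struct {
    char field1[11];
    char field2[11];
    char flag[2];
    time_t date_secs;
} record;

/*
 * Return nonzero when two records belong to the same group.
 * Only the fields of interest are compared; flag and the
 * timestamp are deliberately left out.
 */
int same_group(const record *a, const record *b)
{
    assert(a != NULL && b != NULL);
    return strcmp(a->field1, b->field1) == 0 &&
           strcmp(a->field2, b->field2) == 0;
}
```

Comparing each new record against the previous one with a function like this is what tells the read loop that a group has ended and the buffered records can be sorted and printed.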
My problem comes from not knowing where to create the array, what size
to allocate, and how to re-initialize it when I move to the next set to
sort.
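One common way to handle an array whose final size is unknown is a heap buffer that grows with realloc and is "reset" simply by setting its element count back to zero. A minimal sketch under that assumption follows; record_vec, vec_push, vec_reset and vec_free are hypothetical names, and the starting capacity of 16 is arbitrary:

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Assumed layout, mirroring the generic struct in the post. */
typedef struct {
    char field1[11];
    char field2[11];
    time_t date_secs;
} record;

/* Hypothetical growable buffer holding the current group. */
typedef struct {
    record *items;
    size_t count;     /* slots in use */
    size_t capacity;  /* slots allocated */
} record_vec;

/* Append a copy of *r, doubling the capacity when full.
 * Returns 0 on success, -1 if the allocation fails. */
int vec_push(record_vec *v, const record *r)
{
    assert(v != NULL && r != NULL);
    if (v->count == v->capacity) {
        size_t new_cap = v->capacity ? v->capacity * 2 : 16;
        record *p = realloc(v->items, new_cap * sizeof *p);
        if (p == NULL)
            return -1;            /* v->items is still valid */
        v->items = p;
        v->capacity = new_cap;
    }
    v->items[v->count++] = *r;
    return 0;
}

/* Start the next group: keep the memory, forget the contents. */
void vec_reset(record_vec *v)
{
    assert(v != NULL);
    v->count = 0;
}

/* Release the buffer once the whole file has been processed. */
void vec_free(record_vec *v)
{
    assert(v != NULL);
    free(v->items);
    v->items = NULL;
    v->count = v->capacity = 0;
}
```

The buffer would be declared once before the read loop as `record_vec v = {NULL, 0, 0};`, reset after each group is printed, and freed after the loop. Old elements do not need to be cleared, because only the first `count` slots are ever read.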
I have included a mix of pseudocode and real code. I have also made
everything generic. My ultimate questions are:
* Is it a good idea to declare the struct instance outside the while
loop and then reinitialize it every time through the loop, or would it
be better to make it local? The real struct is 132 bytes.
* Is this the best way to (re)initialize a struct (init_cr)?
* How do I (re)initialize an array of structs?
* Any comments on how I plan to tackle this?
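For comparison with init_cr, one alternative way to reset a struct is to assign it from a zero-initialized constant, which clears every member at once and does not need updating when members are added. This is only a sketch, not necessarily the best option; reset_record is a hypothetical name and the layout is the generic one from this post:

```c
#include <assert.h>
#include <time.h>

/* Assumed layout, mirroring the generic struct in the post. */
typedef struct {
    char field1[11];
    char field2[11];
    time_t date_secs;
} record;

/* Reset every member by copying from an all-zero record. */
void reset_record(record *r)
{
    static const record empty = {0};  /* remaining members are zeroed too */
    assert(r != NULL);
    *r = empty;
}
```

Declaring `record cr = {0};` inside the loop body has the same effect on every iteration, since an automatic variable with an initializer is initialized each time its declaration is reached.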
Code:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define LEN 512

typedef struct {
    char field1[11];
    char field2[11];
    time_t date_secs;
} record;

void init_cr(record *cr);

int main(int argc, char *argv[]) {
    FILE *fp = stdin;      /* default to stdin when no file is named */
    record cr, lr;         /* current record and last record */
    char buf[LEN+1];
    record records[11];

    if(argc > 1) {
        fp = fopen(argv[1], "r");
        if(fp == NULL) {
            fputs("Could not open file for reading\n", stderr);
            exit(EXIT_FAILURE);
        }
    }
    while(fgets(buf, sizeof buf, fp)) {
        init_cr(&cr);
        /* split line into fields */
        /* process fields and save in struct */
        /* if the fields of interest match last fields */
        /*     save struct in next position in array */
        /*     increment counter for how many structs are saved */
        /*     resize the array to accommodate more structs if needed */
        /* else */
        /*     print array of structs */
        /*     reset array ??? */
    }
    if(fp != stdin)
        fclose(fp);
    return 0;
}

/* Reset a record to empty strings and a zero timestamp. */
void init_cr(record *cr) {
    cr->field1[0] = '\0';
    cr->field2[0] = '\0';
    cr->date_secs = 0;
}
Cliff