Looking for advice on how to deal with array of structs

C

Cliff Martin

Hi,

I am reading a fairly large file a line at a time, doing some
processing, and filtering out bits of the line. I am storing the
interesting information in a struct and then printing it out. This
works without any problems. I now would like to filter "duplicate"
records. They aren't really duplicate, I can't just qsort as is and
eliminate the matching rows. I have number of fields that will be the
same, but some of the fields will differ and the timestamp may be off
by a second or two. I want to eliminate records that have the fields
that match where the difference between timestamps is less than n
seconds. Again, this is not a problem, I can get seconds since epoch
and compare, or just use difftime. My problem is this, I want to build
an array of structs, that will hold lines that match, sort them on the
date member and then eliminate based on matching timestamps.

For example in the set:
foo,bar ...,a, Thu Jan 25 01:40:11 EST 2007
foo,bar ...,a, Thu Jan 25 01:45:35 EST 2007
foo,bar ...,a, Thu Jan 25 01:48:09 EST 2007
foo,bar ...,b, Thu Jan 25 01:40:12 EST 2007
foo,baz ..., Thu Jan 25 01:40:11 EST 2007

I would like to read the first 4 lines into structs, store them in
array, sort them and then print them out, while eliminating the
"duplicate" line 4. I would not want to read the 5th line, yet, because
it is dissimilar to the first 4 - foo, baz instead of foo,bar. I am
ignoring the field that has a or b in it (one of the reasons a simple
sort will not work)

My problem comes from not knowing where to create the array, what size
to allocate, and how to re-initialize it when I move to the next set to
sort.

I have included a mix of psuedo code and real code. I have also made
everything generic. My ultimate questions are:

* Is it a good idea to declare the struct instance outside the while
loop and the reinitialize it every time through the loop, or would it
be better to make it local? The real struct is 132 bytes.

* Is this the best way to (re)initialize a struct (init_cr)?

* How do I (re)initialize an array of struct?

* any comments on how I plan to tackle this

Code:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

typedef struct {
char field1[11];
char field1[11];
time_t date_secs;
} record;

int main(int argc, char *argv[]) {
FILE *fp;
fp = stdin;
record cr, lr;
int len = 512;
char buf[len+1];

record records[11];

fp = fopen(argv[1], "r");
if(fp == NULL) {
fputs("Could not open file for reading", stderr);
exit(1);
}

while(fgets(buf, len, fp)) {
init_cr(&cr);

/* split line into fields */
/* process fields and save in struct */

/* if the fields of interest match last fields */
/* save struct in next position in array */
/* increment counter for how many structs are saved */
/* resize the array to accommodate more structs if needed
*/
/* else */
/* print array of structs */
/* reset array ??? */
}
return 0;
}

void init_cr(callrecord *cr) {
cr->field1[0] = '\0';
cr->field2[0] = '\0';
cr->date_secs = 0;
}


Cliff
 
S

santosh

Cliff said:
Hi,

I am reading a fairly large file a line at a time, doing some
processing, and filtering out bits of the line. I am storing the
interesting information in a struct and then printing it out. This
works without any problems. I now would like to filter "duplicate"
records. They aren't really duplicate, I can't just qsort as is and
eliminate the matching rows. I have number of fields that will be the
same, but some of the fields will differ and the timestamp may be off
by a second or two. I want to eliminate records that have the fields
that match where the difference between timestamps is less than n
seconds. Again, this is not a problem, I can get seconds since epoch
and compare, or just use difftime. My problem is this, I want to build
an array of structs, that will hold lines that match, sort them on the
date member and then eliminate based on matching timestamps.

For example in the set:
foo,bar ...,a, Thu Jan 25 01:40:11 EST 2007
foo,bar ...,a, Thu Jan 25 01:45:35 EST 2007
foo,bar ...,a, Thu Jan 25 01:48:09 EST 2007
foo,bar ...,b, Thu Jan 25 01:40:12 EST 2007
foo,baz ..., Thu Jan 25 01:40:11 EST 2007

I would like to read the first 4 lines into structs, store them in
array, sort them and then print them out, while eliminating the
"duplicate" line 4. I would not want to read the 5th line, yet, because
it is dissimilar to the first 4 - foo, baz instead of foo,bar. I am
ignoring the field that has a or b in it (one of the reasons a simple
sort will not work)

My problem comes from not knowing where to create the array, what size
to allocate, and how to re-initialize it when I move to the next set to
sort.
From your requirements it appears that a linked list will be a better
option than an array of structs. It's easy to keep it sorted as you add
elements to the list or remove them.
I have included a mix of psuedo code and real code. I have also made
everything generic. My ultimate questions are:

* Is it a good idea to declare the struct instance outside the while
loop and the reinitialize it every time through the loop, or would it
be better to make it local? The real struct is 132 bytes.

Depends on the desired lifetime for the structure instance.
* Is this the best way to (re)initialize a struct (init_cr)?

* How do I (re)initialize an array of struct?

Use a FOR loop.
* any comments on how I plan to tackle this

Code:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

typedef struct {
char field1[11];
char field1[11];

Two objects in scope with the same identifier.
time_t date_secs;
} record;

int main(int argc, char *argv[]) {
FILE *fp;
fp = stdin;

This is not guaranteed to work on all C implementations.
record cr, lr;
int len = 512;
char buf[len+1];

Nor this.
record records[11];

fp = fopen(argv[1], "r");
if(fp == NULL) {
fputs("Could not open file for reading", stderr);
exit(1);

Use EXIT_FAILURE instead of 1, unless you have a good reason for using
the latter.
}

while(fgets(buf, len, fp)) {

fgets() will store only len-1 characters into the buffer so sizeof(buf)
would do.
init_cr(&cr);

/* split line into fields */
/* process fields and save in struct */

/* if the fields of interest match last fields */
/* save struct in next position in array */
/* increment counter for how many structs are saved */
/* resize the array to accommodate more structs if needed
*/

Note that you cannot resize a statically allocated array, unless it's a
VLA. Allocate using malloc() and resize with realloc().
/* else */
/* print array of structs */
/* reset array ??? */

If the next iteration will rewrite to the array, then probably this is
not needed.
}
return 0;
}

void init_cr(callrecord *cr) {
cr->field1[0] = '\0';
cr->field2[0] = '\0';
cr->date_secs = 0;
}

I still feel a linked list of structures may serve your purpose better.
 
C

Cliff Martin

typedef struct {
char field1[11];
char field1[11];Two objects in scope with the same identifier.

should be field2, but just for the example. real code has about 12
different identifiers, all of which are unique.
int main(int argc, char *argv[]) {
FILE *fp;
fp = stdin;
This is not guaranteed to work on all C implementations.

assigning to a file pointer? So I should just open stdin like a regular
file if the fp is NULL? Is the a compiler option to force this usage?
I'm using gcc.
record cr, lr;
int len = 512;
char buf[len+1];
Nor this.

what's wrong with this?
Use EXIT_FAILURE instead of 1, unless you have a good reason for using
the latter.
ok.

fgets() will store only len-1 characters into the buffer so sizeof(buf)
would do.

did not know this. Use this style everywhere. I will correct it.
*/Note that you cannot resize a statically allocated array, unless it's a
VLA. Allocate using malloc() and resize with realloc().

What is a VLA - Very Large Array?

Thanks,

Cliff
 
C

Cliff Martin

OK, I am using the command line options -ansi and -pedantic. They tell
me some things I'm doing wrong.

I don't understand why this is wrong:

record cr, lr;

The compiler complains I'm mixing declarations and code.

The compiler does not catch the assigning stdin to fp, how can I
enforce that?

Cliff
 
C

Cliff Martin

OK, I am using the command line options -ansi and -pedantic. They tell
me some things I'm doing wrong.

I don't understand why this is wrong:

record cr, lr;

The compiler complains I'm mixing declarations and code.

Never mind about the mixing declarations and code, I just had some
lines out of order.

Cliff
 
C

Chris Dollin

Cliff Martin wrote:

(You need to quote more context. Remember not all of us are reading
this on Google Groups.)
OK, I am using the command line options -ansi and -pedantic. They tell
me some things I'm doing wrong.

I don't understand why this is wrong:

record cr, lr;

The compiler complains I'm mixing declarations and code.

The preceeding lines are:
int main(int argc, char *argv[]) {
FILE *fp;
fp = stdin;

So you have a declaration (`record ...`) following a statement
(`fp = stdin;`). You can't do this in C90 (which is what the
-ansi -pedantic options get you, I believe) [you /can/ in C99,
and it's a widely available extension].

Easy fix:

FILE *fp = stdin;

Now it's a declaration.
The compiler does not catch the assigning stdin to fp, how can I
enforce that?

You don't need to. (I think santosh may have meant that you were
mixing declarations and statements: assigning `stdin` to a FILE*
variable seems OK to me.)
 
D

David T. Ashley

santosh said:
option than an array of structs. It's easy to keep it sorted as you add
elements to the list or remove them.

A somewhat simpler data structure is an array pointers, each of which points
to a structure.

In this case, insert operations boil down to allocating memory for the new
structure, perhaps realloc()'ing the array of pointers (I'd arrange not to
do this every time), then moving a bunch of pointers down by one element and
putting the new pointer in.

It is likely that memmove()'ing a bunch of pointers is not more expensive
than traversing a linked list to find the right insert point.

However, on a modern machine and with less than 1E6 records, I doubt anyone
would notice a difference.
 
C

CBFalconer

Cliff said:
OK, I am using the command line options -ansi and -pedantic. They
tell me some things I'm doing wrong.

I don't understand why this is wrong:

record cr, lr;

The compiler complains I'm mixing declarations and code.

Possibly because you are mixing declarations and code.
The compiler does not catch the assigning stdin to fp, how can I
enforce that?

The crystal ball is murky, but appears to be saying 'line 142'.

--
<http://www.cs.auckland.ac.nz/~pgut001/pubs/vista_cost.txt>

"A man who is right every time is not likely to do very much."
-- Francis Crick, co-discover of DNA
"There is nothing more amazing than stupidity in action."
-- Thomas Matthews
 
D

Dave Thompson

Cliff Martin wrote:

This is not guaranteed to work on all C implementations.
Yes it is. Well, all conforming ones except freestanding which don't
have stdio at all, but I don't think that's what you meant.

Copying an _actual_ FILE
FILE fobject = * stdin;
and using it is not guaranteed to work, although usually it does.

Assigning _to_ stdin,out,err is not guaranteed to work, and often
doesn't.

Copying any FILE* including stdin,out,err to another pointer and using
it (consistent with the open and any other uses) is legal, and
sometimes useful, although in this example (snipped) it was not useful
because it was subsequently unconditionally overwritten.
char buf[len+1];

Nor this.
Correct there.
fgets() will store only len-1 characters into the buffer so sizeof(buf)
would do.

Up to len-1 characters from the input including the \n if reached,
PLUS the terminating null byte, for a total of at most the size given,
so yes sizeof(buf) would do.

Nit: and the parentheses aren't needed for sizeof primaryexpr, nor
sizeof *ptr or sizeof ptr->field or sizeof obj.field, although they
are for some more complicated expressions and typenames.

- David.Thompson1 at worldnet.att.net
 
C

CBFalconer

Dave said:
.... snip ...

Copying any FILE* including stdin,out,err to another pointer and
using it (consistent with the open and any other uses) is legal,
and sometimes useful, although in this example (snipped) it was
not useful because it was subsequently unconditionally overwritten.

No it wasn't (unconditionally overwritten). Look again. That was
the whole point of the extract, since stdin is a text file, and
when overwritten it would be a binary file.

--
<http://www.cs.auckland.ac.nz/~pgut001/pubs/vista_cost.txt>
<http://www.securityfocus.com/columnists/423>

"A man who is right every time is not likely to do very much."
-- Francis Crick, co-discover of DNA
"There is nothing more amazing than stupidity in action."
-- Thomas Matthews
 
D

Dave Thompson

No it wasn't (unconditionally overwritten). Look again. That was
the whole point of the extract, since stdin is a text file, and
when overwritten it would be a binary file.

From original:
int main(int argc, char *argv[]) {
FILE *fp;
fp = stdin;
record cr, lr;
int len = 512;
char buf[len+1];

record records[11];

fp = fopen(argv[1], "r");
if(fp == NULL) {
fputs("Could not open file for reading", stderr);
exit(1);
}

while(fgets(buf, len, fp)) {
<snip rest>

Looks unconditional to me.

Setting a file pointer to either stdin or fopen (argv, m) and then
using it is indeed often useful, but not in the code posted.

- David.Thompson1 at worldnet.att.net
 
C

CBFalconer

Dave said:
.... snip ...

From original:
int main(int argc, char *argv[]) {
.... snip ...
fp = fopen(argv[1], "r");
<snip rest>

Looks unconditional to me.

Setting a file pointer to either stdin or fopen (argv, m) and then
using it is indeed often useful, but not in the code posted.


We are talking about different code.

--
<http://www.cs.auckland.ac.nz/~pgut001/pubs/vista_cost.txt>
<http://www.securityfocus.com/columnists/423>

"A man who is right every time is not likely to do very much."
-- Francis Crick, co-discover of DNA
"There is nothing more amazing than stupidity in action."
-- Thomas Matthews
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,744
Messages
2,569,484
Members
44,903
Latest member
orderPeak8CBDGummies

Latest Threads

Top