Looking for advice on how to deal with array of structs

Discussion in 'C Programming' started by Cliff Martin, Jan 25, 2007.

  1. Cliff  Martin

    Cliff Martin Guest

    Hi,

    I am reading a fairly large file a line at a time, doing some
    processing, and filtering out bits of the line. I am storing the
    interesting information in a struct and then printing it out. This
    works without any problems. I now would like to filter "duplicate"
    records. They aren't really duplicate, I can't just qsort as is and
    eliminate the matching rows. I have number of fields that will be the
    same, but some of the fields will differ and the timestamp may be off
    by a second or two. I want to eliminate records that have the fields
    that match where the difference between timestamps is less than n
    seconds. Again, this is not a problem, I can get seconds since epoch
    and compare, or just use difftime. My problem is this, I want to build
    an array of structs, that will hold lines that match, sort them on the
    date member and then eliminate based on matching timestamps.

    For example in the set:
    foo,bar ...,a, Thu Jan 25 01:40:11 EST 2007
    foo,bar ...,a, Thu Jan 25 01:45:35 EST 2007
    foo,bar ...,a, Thu Jan 25 01:48:09 EST 2007
    foo,bar ...,b, Thu Jan 25 01:40:12 EST 2007
    foo,baz ..., Thu Jan 25 01:40:11 EST 2007

    I would like to read the first 4 lines into structs, store them in
    array, sort them and then print them out, while eliminating the
    "duplicate" line 4. I would not want to read the 5th line, yet, because
    it is dissimilar to the first 4 - foo, baz instead of foo,bar. I am
    ignoring the field that has a or b in it (one of the reasons a simple
    sort will not work)

    My problem comes from not knowing where to create the array, what size
    to allocate, and how to re-initialize it when I move to the next set to
    sort.

    I have included a mix of psuedo code and real code. I have also made
    everything generic. My ultimate questions are:

    * Is it a good idea to declare the struct instance outside the while
    loop and the reinitialize it every time through the loop, or would it
    be better to make it local? The real struct is 132 bytes.

    * Is this the best way to (re)initialize a struct (init_cr)?

    * How do I (re)initialize an array of struct?

    * any comments on how I plan to tackle this

    Code:
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    typedef struct {
    char field1[11];
    char field1[11];
    time_t date_secs;
    } record;

    int main(int argc, char *argv[]) {
    FILE *fp;
    fp = stdin;
    record cr, lr;
    int len = 512;
    char buf[len+1];

    record records[11];

    fp = fopen(argv[1], "r");
    if(fp == NULL) {
    fputs("Could not open file for reading", stderr);
    exit(1);
    }

    while(fgets(buf, len, fp)) {
    init_cr(&cr);

    /* split line into fields */
    /* process fields and save in struct */

    /* if the fields of interest match last fields */
    /* save struct in next position in array */
    /* increment counter for how many structs are saved */
    /* resize the array to accommodate more structs if needed
    */
    /* else */
    /* print array of structs */
    /* reset array ??? */
    }
    return 0;
    }

    void init_cr(callrecord *cr) {
    cr->field1[0] = '\0';
    cr->field2[0] = '\0';
    cr->date_secs = 0;
    }


    Cliff
     
    Cliff Martin, Jan 25, 2007
    #1
    1. Advertising

  2. Cliff  Martin

    santosh Guest

    Cliff Martin wrote:
    > Hi,
    >
    > I am reading a fairly large file a line at a time, doing some
    > processing, and filtering out bits of the line. I am storing the
    > interesting information in a struct and then printing it out. This
    > works without any problems. I now would like to filter "duplicate"
    > records. They aren't really duplicate, I can't just qsort as is and
    > eliminate the matching rows. I have number of fields that will be the
    > same, but some of the fields will differ and the timestamp may be off
    > by a second or two. I want to eliminate records that have the fields
    > that match where the difference between timestamps is less than n
    > seconds. Again, this is not a problem, I can get seconds since epoch
    > and compare, or just use difftime. My problem is this, I want to build
    > an array of structs, that will hold lines that match, sort them on the
    > date member and then eliminate based on matching timestamps.
    >
    > For example in the set:
    > foo,bar ...,a, Thu Jan 25 01:40:11 EST 2007
    > foo,bar ...,a, Thu Jan 25 01:45:35 EST 2007
    > foo,bar ...,a, Thu Jan 25 01:48:09 EST 2007
    > foo,bar ...,b, Thu Jan 25 01:40:12 EST 2007
    > foo,baz ..., Thu Jan 25 01:40:11 EST 2007
    >
    > I would like to read the first 4 lines into structs, store them in
    > array, sort them and then print them out, while eliminating the
    > "duplicate" line 4. I would not want to read the 5th line, yet, because
    > it is dissimilar to the first 4 - foo, baz instead of foo,bar. I am
    > ignoring the field that has a or b in it (one of the reasons a simple
    > sort will not work)
    >
    > My problem comes from not knowing where to create the array, what size
    > to allocate, and how to re-initialize it when I move to the next set to
    > sort.


    >From your requirements it appears that a linked list will be a better

    option than an array of structs. It's easy to keep it sorted as you add
    elements to the list or remove them.

    > I have included a mix of psuedo code and real code. I have also made
    > everything generic. My ultimate questions are:
    >
    > * Is it a good idea to declare the struct instance outside the while
    > loop and the reinitialize it every time through the loop, or would it
    > be better to make it local? The real struct is 132 bytes.


    Depends on the desired lifetime for the structure instance.

    > * Is this the best way to (re)initialize a struct (init_cr)?
    >
    > * How do I (re)initialize an array of struct?


    Use a FOR loop.

    > * any comments on how I plan to tackle this
    >
    > Code:
    > #include <stdio.h>
    > #include <stdlib.h>
    > #include <time.h>
    >
    > typedef struct {
    > char field1[11];
    > char field1[11];


    Two objects in scope with the same identifier.

    > time_t date_secs;
    > } record;
    >
    > int main(int argc, char *argv[]) {
    > FILE *fp;
    > fp = stdin;


    This is not guaranteed to work on all C implementations.

    > record cr, lr;
    > int len = 512;
    > char buf[len+1];


    Nor this.

    > record records[11];
    >
    > fp = fopen(argv[1], "r");
    > if(fp == NULL) {
    > fputs("Could not open file for reading", stderr);
    > exit(1);


    Use EXIT_FAILURE instead of 1, unless you have a good reason for using
    the latter.

    > }
    >
    > while(fgets(buf, len, fp)) {


    fgets() will store only len-1 characters into the buffer so sizeof(buf)
    would do.

    > init_cr(&cr);
    >
    > /* split line into fields */
    > /* process fields and save in struct */
    >
    > /* if the fields of interest match last fields */
    > /* save struct in next position in array */
    > /* increment counter for how many structs are saved */
    > /* resize the array to accommodate more structs if needed
    > */


    Note that you cannot resize a statically allocated array, unless it's a
    VLA. Allocate using malloc() and resize with realloc().

    > /* else */
    > /* print array of structs */
    > /* reset array ??? */


    If the next iteration will rewrite to the array, then probably this is
    not needed.

    > }
    > return 0;
    > }
    >
    > void init_cr(callrecord *cr) {
    > cr->field1[0] = '\0';
    > cr->field2[0] = '\0';
    > cr->date_secs = 0;
    > }


    I still feel a linked list of structures may serve your purpose better.
     
    santosh, Jan 25, 2007
    #2
    1. Advertising

  3. Cliff  Martin

    Cliff Martin Guest

    > > typedef struct {
    > > char field1[11];
    > > char field1[11];Two objects in scope with the same identifier.


    should be field2, but just for the example. real code has about 12
    different identifiers, all of which are unique.

    > > int main(int argc, char *argv[]) {
    > > FILE *fp;
    > > fp = stdin;

    > This is not guaranteed to work on all C implementations.


    assigning to a file pointer? So I should just open stdin like a regular
    file if the fp is NULL? Is the a compiler option to force this usage?
    I'm using gcc.

    > > record cr, lr;
    > > int len = 512;
    > > char buf[len+1];

    > Nor this.


    what's wrong with this?

    > > exit(1);

    >Use EXIT_FAILURE instead of 1, unless you have a good reason for using
    > the latter.


    ok.

    > > while(fgets(buf, len, fp)) {

    > fgets() will store only len-1 characters into the buffer so sizeof(buf)
    > would do.


    did not know this. Use this style everywhere. I will correct it.

    > */Note that you cannot resize a statically allocated array, unless it's a
    > VLA. Allocate using malloc() and resize with realloc().


    What is a VLA - Very Large Array?

    Thanks,

    Cliff
     
    Cliff Martin, Jan 25, 2007
    #3
  4. Cliff  Martin

    Cliff Martin Guest

    OK, I am using the command line options -ansi and -pedantic. They tell
    me some things I'm doing wrong.

    I don't understand why this is wrong:

    record cr, lr;

    The compiler complains I'm mixing declarations and code.

    The compiler does not catch the assigning stdin to fp, how can I
    enforce that?

    Cliff
     
    Cliff Martin, Jan 25, 2007
    #4
  5. Cliff  Martin

    Cliff Martin Guest

    On Jan 25, 9:16 am, "Cliff Martin" <> wrote:
    > OK, I am using the command line options -ansi and -pedantic. They tell
    > me some things I'm doing wrong.
    >
    > I don't understand why this is wrong:
    >
    > record cr, lr;
    >
    > The compiler complains I'm mixing declarations and code.
    >


    Never mind about the mixing declarations and code, I just had some
    lines out of order.

    Cliff
     
    Cliff Martin, Jan 25, 2007
    #5
  6. Cliff  Martin

    Chris Dollin Guest

    Cliff Martin wrote:

    (You need to quote more context. Remember not all of us are reading
    this on Google Groups.)

    > OK, I am using the command line options -ansi and -pedantic. They tell
    > me some things I'm doing wrong.
    >
    > I don't understand why this is wrong:
    >
    > record cr, lr;
    >
    > The compiler complains I'm mixing declarations and code.


    The preceeding lines are:

    > > int main(int argc, char *argv[]) {
    > > FILE *fp;
    > > fp = stdin;


    So you have a declaration (`record ...`) following a statement
    (`fp = stdin;`). You can't do this in C90 (which is what the
    -ansi -pedantic options get you, I believe) [you /can/ in C99,
    and it's a widely available extension].

    Easy fix:

    FILE *fp = stdin;

    Now it's a declaration.

    > The compiler does not catch the assigning stdin to fp, how can I
    > enforce that?


    You don't need to. (I think santosh may have meant that you were
    mixing declarations and statements: assigning `stdin` to a FILE*
    variable seems OK to me.)

    --
    Chris "electric hedgehog" Dollin
    "A facility for quotation covers the absence of original thought." /Gaudy Night/
     
    Chris Dollin, Jan 25, 2007
    #6
  7. "santosh" <> wrote in message
    news:...
    > Cliff Martin wrote:
    >> Hi,
    >>
    >> I am reading a fairly large file a line at a time, doing some
    >> processing, and filtering out bits of the line. I am storing the
    >> interesting information in a struct and then printing it out. This
    >> works without any problems. I now would like to filter "duplicate"
    >> records. They aren't really duplicate, I can't just qsort as is and
    >> eliminate the matching rows. I have number of fields that will be the
    >> same, but some of the fields will differ and the timestamp may be off
    >> by a second or two. I want to eliminate records that have the fields
    >> that match where the difference between timestamps is less than n
    >> seconds. Again, this is not a problem, I can get seconds since epoch
    >> and compare, or just use difftime. My problem is this, I want to build
    >> an array of structs, that will hold lines that match, sort them on the
    >> date member and then eliminate based on matching timestamps.
    >>
    >> For example in the set:
    >> foo,bar ...,a, Thu Jan 25 01:40:11 EST 2007
    >> foo,bar ...,a, Thu Jan 25 01:45:35 EST 2007
    >> foo,bar ...,a, Thu Jan 25 01:48:09 EST 2007
    >> foo,bar ...,b, Thu Jan 25 01:40:12 EST 2007
    >> foo,baz ..., Thu Jan 25 01:40:11 EST 2007
    >>
    >> I would like to read the first 4 lines into structs, store them in
    >> array, sort them and then print them out, while eliminating the
    >> "duplicate" line 4. I would not want to read the 5th line, yet, because
    >> it is dissimilar to the first 4 - foo, baz instead of foo,bar. I am
    >> ignoring the field that has a or b in it (one of the reasons a simple
    >> sort will not work)
    >>
    >> My problem comes from not knowing where to create the array, what size
    >> to allocate, and how to re-initialize it when I move to the next set to
    >> sort.

    >
    >>From your requirements it appears that a linked list will be a better

    > option than an array of structs. It's easy to keep it sorted as you add
    > elements to the list or remove them.


    A somewhat simpler data structure is an array pointers, each of which points
    to a structure.

    In this case, insert operations boil down to allocating memory for the new
    structure, perhaps realloc()'ing the array of pointers (I'd arrange not to
    do this every time), then moving a bunch of pointers down by one element and
    putting the new pointer in.

    It is likely that memmove()'ing a bunch of pointers is not more expensive
    than traversing a linked list to find the right insert point.

    However, on a modern machine and with less than 1E6 records, I doubt anyone
    would notice a difference.

    --
    David T. Ashley ()
    http://www.e3ft.com (Consulting Home Page)
    http://www.dtashley.com (Personal Home Page)
    http://gpl.e3ft.com (GPL Publications and Projects)
     
    David T. Ashley, Jan 25, 2007
    #7
  8. Cliff  Martin

    CBFalconer Guest

    Cliff Martin wrote:
    >
    > OK, I am using the command line options -ansi and -pedantic. They
    > tell me some things I'm doing wrong.
    >
    > I don't understand why this is wrong:
    >
    > record cr, lr;
    >
    > The compiler complains I'm mixing declarations and code.


    Possibly because you are mixing declarations and code.

    >
    > The compiler does not catch the assigning stdin to fp, how can I
    > enforce that?


    The crystal ball is murky, but appears to be saying 'line 142'.

    --
    <http://www.cs.auckland.ac.nz/~pgut001/pubs/vista_cost.txt>

    "A man who is right every time is not likely to do very much."
    -- Francis Crick, co-discover of DNA
    "There is nothing more amazing than stupidity in action."
    -- Thomas Matthews
     
    CBFalconer, Jan 25, 2007
    #8
  9. On 24 Jan 2007 23:52:01 -0800, "santosh" <>
    wrote:

    > Cliff Martin wrote:


    > > #include <stdio.h>


    > > FILE *fp;
    > > fp = stdin;

    >
    > This is not guaranteed to work on all C implementations.
    >

    Yes it is. Well, all conforming ones except freestanding which don't
    have stdio at all, but I don't think that's what you meant.

    Copying an _actual_ FILE
    FILE fobject = * stdin;
    and using it is not guaranteed to work, although usually it does.

    Assigning _to_ stdin,out,err is not guaranteed to work, and often
    doesn't.

    Copying any FILE* including stdin,out,err to another pointer and using
    it (consistent with the open and any other uses) is legal, and
    sometimes useful, although in this example (snipped) it was not useful
    because it was subsequently unconditionally overwritten.

    > > char buf[len+1];

    >
    > Nor this.
    >

    Correct there.

    > > while(fgets(buf, len, fp)) {

    >
    > fgets() will store only len-1 characters into the buffer so sizeof(buf)
    > would do.


    Up to len-1 characters from the input including the \n if reached,
    PLUS the terminating null byte, for a total of at most the size given,
    so yes sizeof(buf) would do.

    Nit: and the parentheses aren't needed for sizeof primaryexpr, nor
    sizeof *ptr or sizeof ptr->field or sizeof obj.field, although they
    are for some more complicated expressions and typenames.

    - David.Thompson1 at worldnet.att.net
     
    Dave Thompson, Feb 6, 2007
    #9
  10. Cliff  Martin

    CBFalconer Guest

    Dave Thompson wrote:
    >

    .... snip ...
    >
    > Copying any FILE* including stdin,out,err to another pointer and
    > using it (consistent with the open and any other uses) is legal,
    > and sometimes useful, although in this example (snipped) it was
    > not useful because it was subsequently unconditionally overwritten.


    No it wasn't (unconditionally overwritten). Look again. That was
    the whole point of the extract, since stdin is a text file, and
    when overwritten it would be a binary file.

    --
    <http://www.cs.auckland.ac.nz/~pgut001/pubs/vista_cost.txt>
    <http://www.securityfocus.com/columnists/423>

    "A man who is right every time is not likely to do very much."
    -- Francis Crick, co-discover of DNA
    "There is nothing more amazing than stupidity in action."
    -- Thomas Matthews
     
    CBFalconer, Feb 6, 2007
    #10
  11. On Tue, 06 Feb 2007 08:12:30 -0500, CBFalconer <>
    wrote:

    > Dave Thompson wrote:
    > >

    > ... snip ...
    > >
    > > Copying any FILE* including stdin,out,err to another pointer and
    > > using it (consistent with the open and any other uses) is legal,
    > > and sometimes useful, although in this example (snipped) it was
    > > not useful because it was subsequently unconditionally overwritten.

    >
    > No it wasn't (unconditionally overwritten). Look again. That was
    > the whole point of the extract, since stdin is a text file, and
    > when overwritten it would be a binary file.


    From original:
    > int main(int argc, char *argv[]) {
    > FILE *fp;
    > fp = stdin;
    > record cr, lr;
    > int len = 512;
    > char buf[len+1];
    >
    > record records[11];
    >
    > fp = fopen(argv[1], "r");
    > if(fp == NULL) {
    > fputs("Could not open file for reading", stderr);
    > exit(1);
    > }
    >
    > while(fgets(buf, len, fp)) {

    <snip rest>

    Looks unconditional to me.

    Setting a file pointer to either stdin or fopen (argv, m) and then
    using it is indeed often useful, but not in the code posted.

    - David.Thompson1 at worldnet.att.net
     
    Dave Thompson, Feb 15, 2007
    #11
  12. Cliff  Martin

    CBFalconer Guest

    Dave Thompson wrote:
    >

    .... snip ...
    >
    > From original:
    > > int main(int argc, char *argv[]) {

    >

    .... snip ...
    > fp = fopen(argv[1], "r");


    > <snip rest>
    >
    > Looks unconditional to me.
    >
    > Setting a file pointer to either stdin or fopen (argv, m) and then
    > using it is indeed often useful, but not in the code posted.


    We are talking about different code.

    --
    <http://www.cs.auckland.ac.nz/~pgut001/pubs/vista_cost.txt>
    <http://www.securityfocus.com/columnists/423>

    "A man who is right every time is not likely to do very much."
    -- Francis Crick, co-discover of DNA
    "There is nothing more amazing than stupidity in action."
    -- Thomas Matthews
     
    CBFalconer, Feb 15, 2007
    #12
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Patricia  Van Hise

    structs with fields that are structs

    Patricia Van Hise, Apr 5, 2004, in forum: C Programming
    Replies:
    5
    Views:
    662
    Al Bowers
    Apr 5, 2004
  2. Chris Hauxwell

    const structs in other structs

    Chris Hauxwell, Apr 23, 2004, in forum: C Programming
    Replies:
    6
    Views:
    579
    Chris Hauxwell
    Apr 27, 2004
  3. rbt

    deal or no deal

    rbt, Dec 22, 2005, in forum: Python
    Replies:
    7
    Views:
    587
    Duncan Smith
    Dec 28, 2005
  4. Paminu
    Replies:
    5
    Views:
    655
    Eric Sosman
    Oct 11, 2005
  5. Tuan  Bui
    Replies:
    14
    Views:
    507
    it_says_BALLS_on_your forehead
    Jul 29, 2005
Loading...

Share This Page