Textual data files in C

Franz Hose

Hi all!

I'm trying to read textual data files in C, but I'm still not sure what
the best design might be. All data files have this format:

----sample data---
SOME_ID_IDENTIFYING_FILE_TYPE
data_record[0]
data_record[1]
....
data_record[n-1]
----end sample data---


Somebody suggested the following strategy:

#include <stdio.h>

typedef struct
{
    /* status data, cur line number, etc. info */
    FILE *f;
} FOO_FILE;

FOO_FILE *foo_open(char *filename)
{
    /*
       - alloc space for FOO_FILE
       - open file (if that fails, return NULL, or return non-NULL and
         record the reason for failure?)
    */
}

int foo_close(FOO_FILE *foo)
{
    /*
       - close file if open
       - free space for FOO_FILE
    */
}

int foo_readline(char *line, int len, FOO_FILE *foo)
{
    /*
       - fgets line (record possible errors inside *foo)
       - if last char is not \n, error
       - update line number
       - if line is empty, repeat from start (???)
       - return number of chars in line
    */
}

unsigned int foo_read_long(char *line, int len, FOO_FILE *foo)
{
    /*
       - foo_readline(...);
       - if success, parse line as an integer
       - if not an integer, store error condition in FOO_FILE struct (???)
       - return this number
    */
}

plus similar functions for reading lines with differently typed
data.
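
For the typed readers, here is a minimal sketch of the parsing step, assuming the line has already been fetched by something like foo_readline. The helper name parse_long_line and its return convention are hypothetical and not part of the design above. It uses strtol so that "not a number" and "out of range" can be told apart from a valid value:

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parse one text line as a long.
   Returns 0 on success and stores the value in *out;
   returns -1 if the line is not a single integer or is out of range. */
int parse_long_line(const char *line, long *out)
{
    char *end;
    long value;

    assert(line != NULL && out != NULL);

    errno = 0;
    value = strtol(line, &end, 10);
    if (end == line)                      /* no digits were consumed */
        return -1;
    if (errno == ERANGE)                  /* value does not fit in a long */
        return -1;
    while (isspace((unsigned char)*end))  /* allow trailing newline or blanks */
        end++;
    if (*end != '\0')                     /* trailing garbage such as "12abc" */
        return -1;

    *out = value;
    return 0;
}
```

Returning a status and passing the value back through a pointer avoids reserving one numeric value as an "error" marker, which matters when every long is a legal data value.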

Finally, in the higher level code, all that needs to be done is

char line[LEN];
foo_open(/*...*/); /* check success */
foo_read_BAR1(/*...*/); /* check success */
foo_read_BAR2(/*...*/); /* check success */
while (foo_read_BAR3(/*...*/))
{
}
foo_close(/*...*/);


Does this sound sensible?

In particular, we would like all kinds of errors (file existence,
I/O errors, data format violations, etc.) to be recorded in a single
location. That is, if error info is available in errno, pass it on to
the higher level functions; otherwise use some generic error code.
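
One possible way to keep all error information in a single place is to store a status code and a copy of errno in the FOO_FILE handle itself. The sketch below is only an illustration under that assumption: the enum names, the extra struct fields, and the foo_fail helper are all hypothetical, and blank-line skipping is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical error categories; the names are illustrative only. */
enum foo_status { FOO_OK = 0, FOO_EOF, FOO_ERR_IO, FOO_ERR_FORMAT };

typedef struct {
    FILE *f;
    unsigned long lineno;      /* number of lines read so far */
    enum foo_status status;    /* first recorded condition */
    int saved_errno;           /* errno at the time of failure, or 0 */
} FOO_FILE;

/* Record a failure in the handle; only the first failure is kept. */
static void foo_fail(FOO_FILE *foo, enum foo_status status, int err)
{
    if (foo->status == FOO_OK) {
        foo->status = status;
        foo->saved_errno = err;
    }
}

/* Read one line and strip its newline. Returns the line length,
   or -1 on end of file or error (details are left in *foo). */
int foo_readline(char *line, int len, FOO_FILE *foo)
{
    size_t n;

    assert(line != NULL && len > 1 && foo != NULL && foo->f != NULL);

    if (foo->status != FOO_OK)
        return -1;
    errno = 0;
    if (fgets(line, len, foo->f) == NULL) {
        if (ferror(foo->f))
            foo_fail(foo, FOO_ERR_IO, errno);
        else
            foo_fail(foo, FOO_EOF, 0);
        return -1;
    }
    n = strlen(line);
    if (n == 0 || line[n - 1] != '\n') {  /* line too long, or no final newline */
        foo_fail(foo, FOO_ERR_FORMAT, 0);
        return -1;
    }
    line[--n] = '\0';
    foo->lineno++;
    return (int)n;
}
```

Higher-level code can then inspect foo->status, foo->saved_errno, and foo->lineno after any failed call, and pass saved_errno to strerror() when it is nonzero.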

What would be the best place to pass error information around?

Are there better designs?

Thanks.

 
user923005

Franz Hose said:
Does this sound sensible?

Sounds OK.

Franz Hose said:
In particular, we would like all kinds of errors (file existence,
I/O errors, data format violations, etc.) to be recorded in a single
location, that is if error info is available in errno, pass it on to
the higher level functions, otherwise use some generic error code.

What would be the best place to pass error information around?

Why not a log file?

Franz Hose said:
Are there better designs?

There are always better designs.

I guess that a real database will be about 1000x better than your
solution.
 
Malcolm McLean

Franz Hose said:
Hi all!

I'm trying to read textual data files in C, but I'm still not sure what
the best design might be. All data files have this format

----sample data---
SOME_ID_IDENTIFYING_FILE_TYPE
data_record[0]
data_record[1]
...
data_record[n-1]
----end sample data---

#include <stdarg.h>
#include <stdio.h>

typedef struct
{
    char filename[MAX_PATH];  /* MAX_PATH is platform-specific */
    char error[1024];         /* empty string means no error so far */
    int Nrecords;
    RECORD *record;           /* you'll have to work out the structure
                                 for a record yourself */
} FOOFILE;

/* returns nonzero once an error has been recorded */
int ff_haserror(FOOFILE *ff)
{
    return ff->error[0] != 0;
}

void ff_reporterror(FOOFILE *ff, FILE *fperr)
{
    /* write ff->error to fperr (left as a stub) */
}

/*
  use a sticky error to prevent a cascade of errors from faulty files
*/
static void seterror(FOOFILE *ff, const char *fmt, ...)
{
    va_list args;

    va_start(args, fmt);
    if(ff->error[0] == 0)
        vsnprintf(ff->error, sizeof ff->error, fmt, args);
    va_end(args);
}

Call ff_haserror() every so often while reading, and abort as soon as it
reports an error. That keeps a corrupt file from triggering absurd memory
demands and the like.
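
To illustrate how the sticky error behaves, here is a small self-contained sketch. It uses a cut-down FOOFILE holding only the error buffer, and check_count is a hypothetical validation step invented for the example. The point is only that the first message survives later failures.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Reduced stand-in for the FOOFILE struct above: only the error buffer. */
typedef struct {
    char error[1024];
} FOOFILE;

static int ff_haserror(const FOOFILE *ff)
{
    return ff->error[0] != '\0';
}

/* Sticky: only the first message is stored, later ones are ignored. */
static void seterror(FOOFILE *ff, const char *fmt, ...)
{
    va_list args;

    assert(ff != NULL && fmt != NULL);

    va_start(args, fmt);
    if (ff->error[0] == '\0')
        vsnprintf(ff->error, sizeof ff->error, fmt, args);
    va_end(args);
}

/* Hypothetical loader step: validates a record count read from the file. */
static void check_count(FOOFILE *ff, long count, unsigned long lineno)
{
    if (count < 0)
        seterror(ff, "line %lu: negative record count %ld", lineno, count);
}
```

Because later calls to seterror are ignored, the message that reaches the user describes the root cause rather than whatever failed last as a consequence of it.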
 
pete

Franz said:
Are there better designs?

Thanks.

I like to read all the lines of a text file
and store corresponding strings using linked lists.
Then I work on the data in the lists.

Here are three examples of programs that do that:
http://www.mindspring.com/~pfilandr/C/lists_and_files/file_sort.c
http://www.mindspring.com/~pfilandr/C/lists_and_files/file_parse.c
http://www.mindspring.com/~pfilandr/C/lists_and_files/file_collate.c

Here are the common files to all three of those:
http://www.mindspring.com/~pfilandr/C/lists_and_files/file_lib.h
http://www.mindspring.com/~pfilandr/C/lists_and_files/file_lib.c
http://www.mindspring.com/~pfilandr/C/lists_and_files/list_lib.h
http://www.mindspring.com/~pfilandr/C/lists_and_files/list_lib.c
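
A rough sketch of the read-everything-into-a-list approach, assuming a singly linked list and a fixed-size line buffer. This is not the code from the files linked above; the names line_node, read_lines, and free_lines are made up for the illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node type; not taken from the list_lib files. */
struct line_node {
    struct line_node *next;
    char *text;               /* one line, without its newline */
};

void free_lines(struct line_node *head)
{
    while (head != NULL) {
        struct line_node *next = head->next;
        free(head->text);
        free(head);
        head = next;
    }
}

/* Read every line of fp into a list, in file order.
   Returns 0 on success, -1 on I/O or allocation failure (the list is freed).
   Assumes no line is longer than the buffer; longer lines would be split. */
int read_lines(FILE *fp, struct line_node **out)
{
    char buf[1024];
    struct line_node *head = NULL, **tail = &head;

    assert(fp != NULL && out != NULL);

    while (fgets(buf, (int)sizeof buf, fp) != NULL) {
        struct line_node *node = malloc(sizeof *node);
        size_t n = strcspn(buf, "\n");   /* length up to the newline */

        if (node == NULL || (node->text = malloc(n + 1)) == NULL) {
            free(node);
            free_lines(head);
            return -1;
        }
        memcpy(node->text, buf, n);
        node->text[n] = '\0';
        node->next = NULL;
        *tail = node;                    /* append at the end */
        tail = &node->next;
    }
    if (ferror(fp)) {
        free_lines(head);
        return -1;
    }
    *out = head;
    return 0;
}
```

With the whole file in memory, the first node can be checked against the expected file-type ID and the remaining nodes parsed as records, at the cost of holding every line at once.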
 
user923005

pete said:
I like to read all the lines of a text file
and store corresponding strings using linked lists.
Then I work on the data in the lists.

Here are three examples of programs that do that:
http://www.mindspring.com/~pfilandr...om/~pfilandr/C/lists_and_files/file_collate.c

Here are the common files to all three of those:
http://www.mindspring.com/~pfilandr...g.com/~pfilandr/C/lists_and_files/list_lib..c

list_string.c seems a little flaky:

sh-2.04$ gcc -W -Wall -ansi -pedantic -I. list_string.c
In file included from list_string.c:7:
list_lib.h:11: error: redefinition of 'struct list_node'
list_lib.h:16: error: redefinition of typedef 'list_type'
list_type.h:11: error: previous declaration of 'list_type' was here
list_string.c: In function 'insert_string':
list_string.c:45: warning: implicit declaration of function 'merge_lists'
list_string.c:45: warning: assignment makes pointer from integer without a cast
sh-2.04$

C:\pete>splint list_string.c
Splint 3.1.1 --- 12 Mar 2007

list_lib.h(14,2): Struct tag struct list_node defined more than once
A function or variable is redefined. One of the declarations should use
extern. (Use -redef to inhibit warning)
list_type.h(9,2): Previous definition of struct list_node
list_lib.h(16,26): Datatype list_type defined more than once
list_type.h(11,26): Previous definition of list_type
list_string.c: (in function append_string)
list_string.c(22,17): Implicitly only storage tail->next (type struct list_node
*) not released before assignment: tail->next = node
A memory leak has been detected. Only-qualified storage is not released
before the last reference to it is lost. (Use -mustfreeonly to inhibit
warning)
list_string.c(31,12): Possibly null storage node returned as non-null: node
Function returns a possibly null pointer, but is not declared using
/*@null@*/ annotation of result. If function may return NULL, add /*@null@*/
annotation to the return value declaration. (Use -nullret to inhibit warning)
list_string.c(15,12): Storage node may become null
list_string.c(31,12): Returned storage *node contains 2 undefined fields:
next, data
Storage derivable from a parameter, return value or global is not defined.
Use /*@out@*/ to denote passed or returned storage which need not be defined.
(Use -compdef to inhibit warning)
list_string.c(31,12): Kept storage node returned as implicitly only: node
storage is transferred to a non-temporary reference after being passed as
keep parameter. The storage may be released or new aliases created. (Use
-kepttrans to inhibit warning)
list_string.c(30,5): Storage node becomes kept
list_string.c: (in function insert_string)
list_string.c(45,21): Unrecognized identifier: merge_lists
Identifier used in code has not been declared. (Use -unrecog to inhibit
warning)
list_string.c(51,12): Return value type boolean does not match declared type
int: node != NULL
To make bool and int types equivalent, use +boolint.
list_string.c(51,25): Fresh storage node not released before return
A memory leak has been detected. Storage allocated locally is not released
before the last reference to it is lost. (Use -mustfreefresh to inhibit
warning)
list_string.c(39,5): Fresh storage node created

Finished checking --- 9 code warnings

C:\pete>cl /W4 /Ox list_string.c
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
Copyright (C) Microsoft Corporation. All rights reserved.

list_string.c
c:\pete\list_lib.h(11) : error C2011: 'list_node' : 'struct' type redefinition
c:\pete\list_type.h(6) : see declaration of 'list_node'
list_string.c(45) : warning C4013: 'merge_lists' undefined; assuming extern returning int
list_string.c(45) : warning C4047: '=' : 'list_type *' differs in levels of indirection from 'int'
 
pete

user923005 said:

list_string.c is one of *your* files, not one of mine.

Which is not to say that I never wrote a file called list_string.c,
but if I did, it's not on my website
and I didn't refer to it in this thread.
 
Barry Schwarz

Franz Hose said:
FOO_FILE *foo_open(char *filename)
{
/*
- alloc space for FOO_FILE

Check that the allocation succeeded.

Franz Hose said:
- open file (if that fails, return NULL or non-NULL and record
reason for failure?)
*/
}

If the open fails, free the allocated space as well.
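
Putting those two remarks together, foo_open might look something like the sketch below. The struct layout and the decision to return NULL for every failure are assumptions made for the example. Note also that fopen and malloc setting errno is guaranteed by POSIX rather than by ISO C.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    FILE *f;
    unsigned long lineno;       /* illustrative status field */
} FOO_FILE;

/* Returns a handle, or NULL on failure. On failure errno is left as the
   library set it, so the caller can report the cause where one is available. */
FOO_FILE *foo_open(const char *filename)
{
    FOO_FILE *foo;

    assert(filename != NULL);

    foo = malloc(sizeof *foo);
    if (foo == NULL)            /* check that the allocation succeeded */
        return NULL;
    foo->lineno = 0;
    foo->f = fopen(filename, "r");
    if (foo->f == NULL) {
        int saved = errno;      /* free() is allowed to change errno */
        free(foo);              /* do not leak the handle on failure */
        errno = saved;
        return NULL;
    }
    return foo;
}

/* Returns 0 on success, EOF if closing the stream failed. Accepts NULL. */
int foo_close(FOO_FILE *foo)
{
    int rc = 0;

    if (foo != NULL) {
        if (foo->f != NULL)
            rc = fclose(foo->f);
        free(foo);
    }
    return rc;
}
```

Returning NULL keeps the caller's check simple, but it means the reason for failure has to travel through errno. Returning a non-NULL handle with a recorded error is the alternative raised in the original question.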


Remove del for email
 
