parsing config file

Mantorok Redgormor

If I am parsing a config file that uses '#' for comments and the
config file itself is 1640 bytes, and the format is VARIABLE=VALUE, is
it recommended to use

a) fgetc (parse a character at a time)
b) fgets (read in blocks of whatever size)
c) fread (get the size of the file and fread the entire thing into
memory)

and when would it be appropriate to use either a, b, or c?


nethlek
 
Joona I Palaste

Mantorok Redgormor said:
If I am parsing a config file that uses '#' for comments and the
config file itself is 1640 bytes, and the format is VARIABLE=VALUE, is
it recommended to use
a) fgetc (parse a character at a time)
b) fgets (read in blocks of whatever size)
c) fread (get the size of the file and fread the entire thing into
memory)
and when would it be appropriate to use either a, b, or c?

If the config file's format is such that each VARIABLE=VALUE is on a
separate line, I definitely recommend b) fgets. Otherwise you're best off
with c) fread, but the problem is that you'll have to parse the delimiters
out yourself.
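
To illustrate option c), here is a minimal sketch that reads the whole file with fread and then picks out the delimiters by hand. The names slurp and next_pair are hypothetical helpers, not from any library, and the sketch assumes the file fits in a caller-supplied buffer (easy for a 1640-byte file) and that '\n' ends each setting.

```c
#include <stdio.h>
#include <string.h>

/* Read the stream into buf (at most size - 1 bytes) and terminate it.
   Returns the number of bytes read. Assumes size >= 1. */
size_t slurp(FILE *fp, char *buf, size_t size)
{
    size_t n = fread(buf, 1, size - 1, fp);

    buf[n] = '\0';
    return n;
}

/* Find the next KEY=VALUE line at or after *cursor, skipping '#' comment
   lines and lines without a usable '='. The buffer is modified in place.
   Returns 1 and sets *key and *value, or 0 when the buffer is exhausted. */
int next_pair(char **cursor, char **key, char **value)
{
    char *line = *cursor;

    while (*line != '\0') {
        char *next = line + strcspn(line, "\n");
        char *eq;

        if (*next == '\n')
            *next++ = '\0';            /* cut the line off, step past '\n' */
        eq = (*line == '#') ? NULL : strchr(line, '=');
        if (eq != NULL && eq != line) {
            *eq = '\0';                /* split "KEY=VALUE" at the '=' */
            *key = line;
            *value = eq + 1;
            *cursor = next;
            return 1;
        }
        line = next;
    }
    *cursor = line;
    return 0;
}
```

A caller would fill a buffer with slurp, set a cursor to the start of the buffer, and call next_pair in a loop until it returns 0. The sketch does not trim whitespace around keys or values.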

--
/-- Joona Palaste ([email protected]) ---------------------------\
| Kingpriest of "The Flying Lemon Tree" G++ FR FW+ M- #108 D+ ADA N+++|
| http://www.helsinki.fi/~palaste W++ B OP+ |
\----------------------------------------- Finland rules! ------------/
"A bicycle cannot stand up by itself because it's two-tyred."
- Sky Text
 
Christopher Benson-Manica

Joona I Palaste said:
If the config file's format is such that each VARIABLE=VALUE is on a
separate line, I definitely recommend b) fgets. Otherwise you're best off
with c) fread, but the problem is that you'll have to parse the delimiters
out yourself.

Why not

fscanf( "%[^#=]=%s", &variable, &value );

?
 
The Real OS/2 Guy

If I am parsing a config file that uses '#' for comments and the
config file itself is 1640 bytes, and the format is VARIABLE=VALUE, is
it recommended to use

a) fgetc (parse a character at a time)

Possibly a good choice, because you handle each character directly as
it comes in. No need to mess with buffer sizes for a whole line.

b) fgets (read in blocks of whatever size)

Possibly a good choice, because you can then handle the line however
you like.
Possibly a bad choice, because the buffer you give fgets may turn out
to be too small.

c) fread (get the size of the file and fread the entire thing into
memory)

Possibly a good choice when you know the whole size of the file. However,
it costs more memory than is strictly required.

and when would it be appropriate to use either a, b, or c?

Do you like handling undersized input buffers? Then use b).
Do you have quick access to the size of the file? Then use c).
Do you dislike handling dynamic input buffers just to get a complete
line because it is longer than you thought it would be? And is
your memory limited in size (whereas your program may not be the only
one running on the machine)?
Or is it not so easy to determine the size of the file in a
way that lets you allocate a buffer big enough to read it in at
once?
If your answer to one of these questions is yes, then a)
is your choice.
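
For comparison, option a) might look like the following minimal sketch, which reads one character at a time with fgetc and never needs a line buffer. The function name next_setting is hypothetical. The sketch assumes that comments start with '#' in the first column, that overlong keys and values may simply be truncated, and that ksize is at least 2 and vsize at least 1.

```c
#include <stdio.h>

/* Read the next KEY=VALUE pair one character at a time.
   Comment lines, blank lines and lines without '=' are skipped.
   Keys and values that do not fit are truncated.
   Returns 1 when a pair was stored, 0 at end of input. */
int next_setting(FILE *fp, char *key, size_t ksize, char *val, size_t vsize)
{
    int c;

    for (;;) {
        size_t k = 0, v = 0;

        c = fgetc(fp);
        if (c == EOF)
            return 0;
        if (c == '#') {                      /* comment: discard the line */
            while (c != '\n' && c != EOF)
                c = fgetc(fp);
            continue;
        }
        while (c != '=' && c != '\n' && c != EOF) {   /* collect the key */
            if (k + 1 < ksize)
                key[k++] = (char)c;
            c = fgetc(fp);
        }
        if (c != '=' || k == 0) {            /* blank or malformed line */
            while (c != '\n' && c != EOF)
                c = fgetc(fp);
            continue;
        }
        key[k] = '\0';
        while ((c = fgetc(fp)) != '\n' && c != EOF) { /* collect the value */
            if (v + 1 < vsize)
                val[v++] = (char)c;
        }
        val[v] = '\0';
        return 1;
    }
}
```

The caller loops on next_setting until it returns 0. Memory use is bounded by the two output buffers regardless of how long the lines in the file are.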
 
Joe Wright

The Real OS/2 Guy said:
Possibly a good choice, because you handle each character directly as
it comes in. No need to mess with buffer sizes for a whole line.

Possibly a good choice, because you can then handle the line however
you like.
Possibly a bad choice, because the buffer you give fgets may turn out
to be too small.

Possibly a good choice when you know the whole size of the file. However,
it costs more memory than is strictly required.

Do you like handling undersized input buffers? Then use b).
Do you have quick access to the size of the file? Then use c).
Do you dislike handling dynamic input buffers just to get a complete
line because it is longer than you thought it would be? And is
your memory limited in size (whereas your program may not be the only
one running on the machine)?
Or is it not so easy to determine the size of the file in a
way that lets you allocate a buffer big enough to read it in at
once?
If your answer to one of these questions is yes, then a)
is your choice.

Herbert, I disagree. Choice b) is the only choice. Choice a) is too ugly
for a mother to love. Choice c), fread() a text file and then parse it,
uses lots of memory and complicates things more than necessary.

The configuration file as described, defines variables in 'key=value'
format, line at a time. It is fgets() that reads a file 'line at a
time'. It is trivial to detect comment lines beginning with '#' or ';'
or whatever and skip them.
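
The line-at-a-time approach described above can be sketched roughly as follows. The helper names trim, parse_line and read_config are hypothetical. The sketch assumes that no line exceeds 255 characters (a longer line would be seen by fgets as two lines), that '#' starts a comment only at the beginning of a line, and that whitespace around the key and value should be ignored.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Trim leading and trailing whitespace in place; returns the new start. */
char *trim(char *s)
{
    char *end;

    while (isspace((unsigned char)*s))
        s++;
    end = s + strlen(s);
    while (end > s && isspace((unsigned char)end[-1]))
        end--;
    *end = '\0';
    return s;
}

/* Parse one line obtained from fgets().
   Returns 1 and sets *key and *value (pointers into line) for KEY=VALUE,
   0 for a blank or comment line, and -1 for a malformed line. */
int parse_line(char *line, char **key, char **value)
{
    char *eq;
    char *p = trim(line);

    if (*p == '\0' || *p == '#')
        return 0;
    eq = strchr(p, '=');
    if (eq == NULL || eq == p)
        return -1;                 /* no '=' at all, or an empty key */
    *eq = '\0';
    *key = trim(p);
    *value = trim(eq + 1);
    return 1;
}

/* Read the stream line by line and print each setting.
   Returns the number of settings seen, or -1 on the first malformed line. */
int read_config(FILE *fp)
{
    char line[256];                /* assumed upper bound on line length */
    char *key, *value;
    int count = 0;

    while (fgets(line, sizeof line, fp) != NULL) {
        int r = parse_line(line, &key, &value);

        if (r < 0)
            return -1;
        if (r > 0) {
            printf("%s -> %s\n", key, value);
            count++;
        }
    }
    return count;
}
```

Keeping parse_line separate from the fgets loop means the parsing rules can be exercised on plain strings without opening a file.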

Everyone please note that in order to read any file correctly, you must
know how it was written, i.e. its format. There are 'rules' to writing
.cfg or .ini (or other) files which you must know exactly before you can
read them successfully.
 
Jack Klein

Joona I Palaste said:
If the config file's format is such that each VARIABLE=VALUE is on a
separate line, I definitely recommend b) fgets. Otherwise you're best off
with c) fread, but the problem is that you'll have to parse the delimiters
out yourself.

Why not

fscanf( "%[^#=]=%s", &variable, &value );

?

Because any *scanf with "%s" lacking a size specifier is just another
name for gets(), a nasty buffer overrun just waiting to happen.

Thus are worms born...

--
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://www.eskimo.com/~scs/C-faq/top.html
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++ ftp://snurse-l.org/pub/acllc-c++/faq
 
Mike Wahler

Joe Wright said:
Herbert, I disagree. Choice b) is the only choice. Choice a) is too ugly
for a mother to love. Choice c), fread() a text file and then parse it,
uses lots of memory and complicates things more than necessary.

The configuration file as described, defines variables in 'key=value'
format, line at a time. It is fgets() that reads a file 'line at a
time'. It is trivial to detect comment lines beginning with '#' or ';'
or whatever and skip them.

Everyone please note that in order to read any file correctly, you must
know how it was written, i.e. its format. There are 'rules' to writing
.cfg or .ini (or other) files which you must know exactly before you can
read them successfully.

And robust code that reads them should be able to handle
corrupt or incorrectly formatted data (e.g. by assuming
'defaults', or giving an error message, terminating, etc.)
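
One way to fall back on a default, as suggested above, is to validate each value as it is converted. The function int_setting below is a hypothetical helper, sketched on the assumption that the setting is a decimal integer with an allowed range known to the caller.

```c
#include <errno.h>
#include <stdlib.h>

/* Convert a config value to an int in [lo, hi]. Returns fallback if the
   text is missing, is not a number, has trailing junk, overflows a long,
   or is outside the range. */
int int_setting(const char *text, int lo, int hi, int fallback)
{
    char *end;
    long n;

    if (text == NULL || *text == '\0')
        return fallback;
    errno = 0;
    n = strtol(text, &end, 10);
    if (errno != 0 || end == text || *end != '\0' || n < lo || n > hi)
        return fallback;
    return (int)n;
}
```

For example, int_setting(value, 1, 65535, 80) yields 80 for a value such as "80x" or "70000". Whether to substitute a default silently, print a warning, or terminate is a policy decision that this sketch leaves to the caller.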

-Mike
 
Christopher Benson-Manica

Jack Klein said:
Because any *scanf with "%s" lacking a size specifier is just another
name for gets(), a nasty buffer overrun just waiting to happen.

Well, considering the OP was just parsing a config file, the chances for an
exploit shouldn't be too high, eh? Thanks, though, for I had forgotten about
that... Although something like

fscanf( "%20[^=]s=%20s", &s1, &s2 );

would fail if it weren't given exactly 20 characters for the first string,
right...?
 
Christopher Benson-Manica

Jack Klein said:
Because any *scanf with "%s" lacking a size specifier is just another
name for gets(), a nasty buffer overrun just waiting to happen.

Well, considering the OP was just parsing a config file, the chances for an
exploit shouldn't be too high, eh? Thanks, though, for I had forgotten about
that... Although something like

fscanf( "%20[^#=]s=%20s", &s1, &s2 );

would fail if it weren't given exactly 20 characters for the first string,
right...? Maybe I should just be quiet now...
 
Dave Thompson

Jack Klein said:
Because any *scanf with "%s" lacking a size specifier is just another
name for gets(), a nasty buffer overrun just waiting to happen.

Well, considering the OP was just parsing a config file, the chances for an
exploit shouldn't be too high, eh? Thanks, though, for I had forgotten about
that... Although something like

fscanf( "%20[^#=]s=%20s", &s1, &s2 );

would fail if it weren't given exactly 20 characters for the first string,
right...? Maybe I should just be quiet now...

Not right; no comment on whether you should. A width specifier on any
*scanf conversion is an upper limit, although %Nc will always read to
the upper limit or end-of-input/error.

Also the %20s on the right side won't allow whitespace in the value,
which I would want to allow; %20[^\n] will. And either of those will
normally leave the newline in the input stream, which is probably OK
if you want to handle # lines with a getc or similar rather than
another (prior?) fscanf; either will also leave any text exceeding the
limit, and %20s any text following a whitespace; adding %*[^\n] would
reduce the number of different cases you have to handle.
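
The points above about widths and scansets can be tried out with a minimal sketch like the one below. It uses sscanf on a line already held in memory rather than fscanf on the stream, which is a substitution made here to keep the example self-contained. The function name split_setting is hypothetical, and the 20-character limits are simply carried over from the format strings quoted above.

```c
#include <stdio.h>

/* Split "KEY=VALUE" using bounded scanf conversions.
   key and value must each have room for 21 chars (20 plus the '\0').
   Returns 1 if both parts were converted, 0 otherwise. */
int split_setting(const char *line, char key[21], char value[21])
{
    /* %20[^#=] : up to 20 chars that are neither '#' nor '='
       =        : a literal '='
       %20[^\n] : up to 20 chars up to the newline, spaces allowed */
    return sscanf(line, "%20[^#=]=%20[^\n]", key, value) == 2;
}
```

A key shorter than 20 characters is accepted, since the width is only an upper limit. A key longer than 20 characters makes the literal '=' fail to match, so the function returns 0. A line starting with '#' and a line with an empty value are also reported as non-matching, because a scanset conversion must match at least one character.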

- David.Thompson1 at worldnet.att.net
 
