parsing config file

Discussion in 'C Programming' started by Mantorok Redgormor, Sep 16, 2003.

  1. If I am parsing a config file that uses '#' for comments and the
    config file itself is 1640 bytes, and the format is VARIABLE=VALUE, is
    it recommended to use

    a) fgetc (parse a character at a time)
    b) fgets (read in blocks of whatever size)
    c) fread (get the size of the file and fread the entire thing into
    memory)

    and when would it be appropriate to use either a, b, or c?


    nethlek
     
    Mantorok Redgormor, Sep 16, 2003
    #1

  2. Mantorok Redgormor <> scribbled the following:
    > If I am parsing a config file that uses '#' for comments and the
    > config file itself is 1640 bytes, and the format is VARIABLE=VALUE, is
    > it recommended to use


    > a) fgetc (parse a character at a time)
    > b) fgets (read in blocks of whatever size)
    > c) fread (get the size of the file and fread the entire thing into
    > memory)


    > and when would it be appropriate to use either a, b, or c?


    If the config file's format is such that each VARIABLE=VALUE is on a
    separate line, I definitely recommend b) fgets. Otherwise you're best off
    with c) fread, but the problem is, you'll have to parse the delimiters
    out yourself.
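
A rough sketch of option c), under the assumption that the file is small enough to hold in memory. It reads the stream with fread into a growing buffer (so the file size need not be known in advance) and then cuts the newline delimiters out by hand. The names `slurp` and `count_settings` are made up for this illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read an entire stream into a malloc'd, NUL-terminated buffer.
   Returns NULL on allocation or read failure; the caller frees the result. */
char *slurp(FILE *fp, size_t *len_out)
{
    size_t cap = 1024, len = 0, n;
    char *buf = malloc(cap);

    if (buf == NULL)
        return NULL;
    /* Always leave one byte free for the terminating NUL. */
    while ((n = fread(buf + len, 1, cap - len - 1, fp)) > 0) {
        len += n;
        if (len == cap - 1) {               /* buffer full: double it */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
    }
    if (ferror(fp)) {
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    if (len_out != NULL)
        *len_out = len;
    return buf;
}

/* Split the buffer into lines in place and count the lines that look
   like VARIABLE=VALUE (not a '#' comment, and containing '='). */
size_t count_settings(char *buf)
{
    size_t count = 0;
    char *line = buf;

    while (*line != '\0') {
        char *end = strchr(line, '\n');
        if (end != NULL)
            *end = '\0';                    /* cut the delimiter out ourselves */
        if (line[0] != '#' && strchr(line, '=') != NULL)
            count++;
        if (end == NULL)
            break;
        line = end + 1;
    }
    return count;
}
```

The extra bookkeeping compared with a line-oriented read is the price of option c); a file containing embedded NUL bytes would also confuse the string functions used here.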

    --
    /-- Joona Palaste () ---------------------------\
    | Kingpriest of "The Flying Lemon Tree" G++ FR FW+ M- #108 D+ ADA N+++|
    | http://www.helsinki.fi/~palaste W++ B OP+ |
    \----------------------------------------- Finland rules! ------------/
    "A bicycle cannot stand up by itself because it's two-tyred."
    - Sky Text
     
    Joona I Palaste, Sep 16, 2003
    #2

  3. Joona I Palaste <> spoke thus:

    > If the config file's format is such that each VARIABLE=VALUE is on a
    > separate line, I definitely recommend b) fgets. Otherwise you're best off
    > with c) fread, but the problem is, you'll have to parse the delimiters
    > out yourself.


    Why not

    fscanf( fp, "%[^#=]=%s", variable, value );

    ?

    --
    Christopher Benson-Manica | Jumonji giri, for honour.
    ataru(at)cyberspace.org |
     
    Christopher Benson-Manica, Sep 16, 2003
    #3
  4. On Tue, 16 Sep 2003 16:50:56 UTC, (Mantorok
    Redgormor) wrote:

    > If I am parsing a config file that uses '#' for comments and the
    > config file itself is 1640 bytes, and the format is VARIABLE=VALUE, is
    > it recommended to use
    >
    > a) fgetc (parse a character at a time)


    Maybe a good choice, because you handle each char directly as it comes
    in. No need to mess with buffer sizes for a whole line.

    > b) fgets (read in blocks of whatever size)


    Maybe a good choice, because you can then handle the line however you
    like.
    Maybe a bad choice, because the buffer you give fgets may turn out to
    be too small.
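
One way to notice a buffer that is too small, as a minimal sketch: after fgets, look for the newline. If it is missing and the stream is not at its end, the line did not fit. The function name `read_line` and its return codes are assumptions for this example, and discarding the tail of an over-long line is only one possible policy.

```c
#include <stdio.h>
#include <string.h>

/* Read one line into buf (size bytes). Returns:
     1  a complete line was read (trailing newline removed),
     0  the line was longer than the buffer; the rest was discarded,
    -1  end of file or read error with nothing read. */
int read_line(FILE *fp, char *buf, int size)
{
    char *nl;
    int c;

    if (fgets(buf, size, fp) == NULL)
        return -1;
    nl = strchr(buf, '\n');
    if (nl != NULL) {
        *nl = '\0';
        return 1;
    }
    /* No newline: either the last line lacks one, the line filled the
       buffer exactly, or the buffer was too small. Peek to find out. */
    c = getc(fp);
    if (c == EOF || c == '\n')
        return 1;
    /* Skip the remainder of the over-long line. */
    while (c != '\n' && c != EOF)
        c = getc(fp);
    return 0;
}
```

A caller could treat a return of 0 as a malformed line, or grow the buffer and retry instead of discarding.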

    > c) fread (get the size of the file and fread the entire thing into
    > memory)


    Maybe a good choice when you know the whole size of the file. Even so,
    it costs more memory than is strictly required.

    >
    > and when would it be appropriate to use either a, b, or c?


    Do you like handling undersized input buffers? Then use b).
    Do you have quick access to the size of the file? Then use c).
    Would you rather not manage dynamic input buffers just to read a line
    completely because it is longer than you thought it would be? And is
    your memory limited (your program may not be the only one running on
    the machine)?
    Or is it not so easy to determine the size of the file, so that you
    cannot simply allocate a buffer big enough to read it in at once?
    If your answer to any of the questions above is yes, then a) is your
    choice.
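
For comparison, option a) might look roughly like the sketch below: a small character-at-a-time reader that needs no line buffer at all. The limits `CFG_NAME_LEN` and `CFG_VALUE_LEN` and the function `next_setting` are hypothetical, and only '#' in the first column is treated as a comment.

```c
#include <stdio.h>

enum { CFG_NAME_LEN = 32, CFG_VALUE_LEN = 128 };   /* assumed limits */

/* Read the next VARIABLE=VALUE pair one character at a time.
   Comment lines, blank lines, and lines without '=' are skipped.
   Over-long names or values are truncated, never overrun.
   Returns 1 when a pair was stored, 0 at end of input. */
int next_setting(FILE *fp, char name[CFG_NAME_LEN], char value[CFG_VALUE_LEN])
{
    int c;

    for (;;) {
        size_t n = 0, v = 0;
        int seen_eq = 0;

        c = fgetc(fp);
        if (c == EOF)
            return 0;
        if (c == '#') {                     /* comment: drop the line */
            while (c != '\n' && c != EOF)
                c = fgetc(fp);
            continue;
        }
        while (c != '\n' && c != EOF) {
            if (!seen_eq && c == '=') {
                seen_eq = 1;                /* first '=' ends the name */
            } else if (!seen_eq) {
                if (n < CFG_NAME_LEN - 1)
                    name[n++] = (char)c;
            } else if (v < CFG_VALUE_LEN - 1) {
                value[v++] = (char)c;
            }
            c = fgetc(fp);
        }
        name[n] = '\0';
        value[v] = '\0';
        if (seen_eq && n > 0)
            return 1;
        /* blank or malformed line: try the next one */
    }
}
```

Called in a loop until it returns 0, this visits every setting without ever needing to know how long a line is.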

    --
    Tschau/Bye
    Herbert

    eComStation 1.1 German beta is available
     
    The Real OS/2 Guy, Sep 16, 2003
    #4
  5. Joe Wright

    The Real OS/2 Guy wrote:
    >
    > On Tue, 16 Sep 2003 16:50:56 UTC, (Mantorok
    > Redgormor) wrote:
    >
    > > If I am parsing a config file that uses '#' for comments and the
    > > config file itself is 1640 bytes, and the format is VARIABLE=VALUE, is
    > > it recommended to use
    > >
    > > a) fgetc (parse a character at a time)

    >
    > Maybe a good choice, because you handle each char directly as it comes
    > in. No need to mess with buffer sizes for a whole line.
    >
    > > b) fgets (read in blocks of whatever size)

    >
    > Maybe a good choice, because you can then handle the line however you
    > like.
    > Maybe a bad choice, because the buffer you give fgets may turn out to
    > be too small.
    >
    > > c) fread (get the size of the file and fread the entire thing into
    > > memory)

    >
    > Maybe a good choice when you know the whole size of the file. Even so,
    > it costs more memory than is strictly required.
    >
    > >
    > > and when would it be appropriate to use either a, b, or c?

    >
    > Do you like handling undersized input buffers? Then use b).
    > Do you have quick access to the size of the file? Then use c).
    > Would you rather not manage dynamic input buffers just to read a line
    > completely because it is longer than you thought it would be? And is
    > your memory limited (your program may not be the only one running on
    > the machine)?
    > Or is it not so easy to determine the size of the file, so that you
    > cannot simply allocate a buffer big enough to read it in at once?
    > If your answer to any of the questions above is yes, then a) is your
    > choice.
    >

    Herbert, I disagree. Choice b) is the only choice. Choice a) is too ugly
    for a mother to love. Choice c), fread() a text file and then parse it,
    uses lots of memory and complicates things more than necessary.

    The configuration file as described, defines variables in 'key=value'
    format, line at a time. It is fgets() that reads a file 'line at a
    time'. It is trivial to detect comment lines beginning with '#' or ';'
    or whatever and skip them.
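
The line-at-a-time approach can be sketched as follows, assuming one VARIABLE=VALUE per line and a fixed maximum line length of 255 characters (an assumption of this example, not something the thread specifies). The helper names `split_setting` and `dump_settings` are invented for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Split one line of the form VARIABLE=VALUE in place.
   Returns 1 and sets *name and *value on success, 0 for comment lines,
   blank lines, or lines without a name before '='. */
int split_setting(char *line, char **name, char **value)
{
    char *eq;

    line[strcspn(line, "\r\n")] = '\0';     /* drop the line ending */
    if (line[0] == '#' || line[0] == ';' || line[0] == '\0')
        return 0;
    eq = strchr(line, '=');
    if (eq == NULL || eq == line)
        return 0;
    *eq = '\0';                             /* terminate the name */
    *name = line;
    *value = eq + 1;
    return 1;
}

/* Read a config stream line by line and print each setting to out.
   Returns the number of settings found. */
int dump_settings(FILE *in, FILE *out)
{
    char line[256];                         /* assumed maximum line length */
    char *name, *value;
    int count = 0;

    while (fgets(line, sizeof line, in) != NULL) {
        if (split_setting(line, &name, &value)) {
            fprintf(out, "%s -> %s\n", name, value);
            count++;
        }
    }
    return count;
}
```

Lines longer than the buffer would arrive in pieces here; a real reader would have to detect and handle that case.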

    Everyone please note that in order to read any file correctly, you must
    know how it was written, i.e. its format. There are 'rules' to writing
    .cfg or .ini (or other) files which you must know exactly before you can
    read them successfully.
    --
    Joe Wright
    "Everything should be made as simple as possible, but not simpler."
    --- Albert Einstein ---
     
    Joe Wright, Sep 16, 2003
    #5
  6. Jack Klein

    On Tue, 16 Sep 2003 17:26:12 +0000 (UTC), Christopher Benson-Manica
    <> wrote in comp.lang.c:

    > Joona I Palaste <> spoke thus:
    >
    > > If the config file's format is such that each VARIABLE=VALUE is on a
    > > separate line, I definitely recommend b) fgets. Otherwise you're best off
    > > with c) fread, but the problem is, you'll have to parse the delimiters
    > > out yourself.

    >
    > Why not
    >
    > fscanf( fp, "%[^#=]=%s", variable, value );
    >
    > ?


    Because any *scanf with "%s" lacking a size specifier is just another
    name for gets(), a nasty buffer overrun just waiting to happen.

    Thus are worms born...
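
To illustrate the point about size specifiers, here is a minimal sketch that gives every string conversion an explicit width. The array sizes (32 and 64) and the function name `parse_bounded` are assumptions for this example; the widths are one less than the array sizes because scanf appends a terminating NUL.

```c
#include <stdio.h>

/* Parse "NAME=VALUE" from a string using bounded conversions only.
   Returns 1 if both fields were converted, 0 otherwise. */
int parse_bounded(const char *line, char name[32], char value[64])
{
    /* %31[^#=\n] stores at most 31 characters plus the NUL;
       %63s stores at most 63 characters plus the NUL. */
    return sscanf(line, "%31[^#=\n]=%63s", name, value) == 2;
}
```

With an over-long name, the scan stops after 31 characters, the literal '=' then fails to match, and the function reports failure rather than writing past the end of the array.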

    --
    Jack Klein
    Home: http://JK-Technology.Com
    FAQs for
    comp.lang.c http://www.eskimo.com/~scs/C-faq/top.html
    comp.lang.c++ http://www.parashift.com/c++-faq-lite/
    alt.comp.lang.learn.c-c++ ftp://snurse-l.org/pub/acllc-c++/faq
     
    Jack Klein, Sep 17, 2003
    #6
  7. Mike Wahler

    Joe Wright <> wrote in message
    news:...
    > The Real OS/2 Guy wrote:
    > >
    > > On Tue, 16 Sep 2003 16:50:56 UTC, (Mantorok
    > > Redgormor) wrote:
    > >
    > > > If I am parsing a config file that uses '#' for comments and the
    > > > config file itself is 1640 bytes, and the format is VARIABLE=VALUE, is
    > > > it recommended to use
    > > >
    > > > a) fgetc (parse a character at a time)

    > >
    > > Maybe a good choice, because you handle each char directly as it comes
    > > in. No need to mess with buffer sizes for a whole line.
    > >
    > > > b) fgets (read in blocks of whatever size)

    > >
    > > Maybe a good choice, because you can then handle the line however you
    > > like.
    > > Maybe a bad choice, because the buffer you give fgets may turn out to
    > > be too small.
    > >
    > > > c) fread (get the size of the file and fread the entire thing into
    > > > memory)

    > >
    > > Maybe a good choice when you know the whole size of the file. Even so,
    > > it costs more memory than is strictly required.
    > >
    > > >
    > > > and when would it be appropriate to use either a, b, or c?

    > >
    > > Do you like handling undersized input buffers? Then use b).
    > > Do you have quick access to the size of the file? Then use c).
    > > Would you rather not manage dynamic input buffers just to read a line
    > > completely because it is longer than you thought it would be? And is
    > > your memory limited (your program may not be the only one running on
    > > the machine)?
    > > Or is it not so easy to determine the size of the file, so that you
    > > cannot simply allocate a buffer big enough to read it in at once?
    > > If your answer to any of the questions above is yes, then a) is your
    > > choice.
    > >

    > Herbert, I disagree. Choice b) is the only choice. Choice a) is too ugly
    > for a mother to love. Choice c), fread() a text file and then parse it,
    > uses lots of memory and complicates things more than necessary.
    >
    > The configuration file as described, defines variables in 'key=value'
    > format, line at a time. It is fgets() that reads a file 'line at a
    > time'. It is trivial to detect comment lines beginning with '#' or ';'
    > or whatever and skip them.
    >
    > Everyone please note that in order to read any file correctly, you must
    > know how it was written, i.e. its format. There are 'rules' to writing
    > .cfg or .ini (or other) files which you must know exactly before you can
    > read them successfully.


    And robust code that reads them should be able to handle
    corrupt or incorrectly formatted data (e.g. by assuming
    'defaults', or giving an error message, terminating, etc.)
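
A small sketch of that idea: start from defaults, validate each value before accepting it, and report failure so the caller can print an error or stop. The `struct config` fields, the setting names PORT and HOST, and the default values are all hypothetical and chosen only for illustration.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

struct config {                 /* hypothetical settings */
    long port;
    char host[64];
};

/* Start from known-good defaults so a missing or bad line is harmless. */
void config_defaults(struct config *cfg)
{
    cfg->port = 8080;
    strcpy(cfg->host, "localhost");
}

/* Apply one NAME/VALUE pair. Returns 0 on success, -1 if the name is
   unknown or the value is invalid; cfg is left unchanged on failure. */
int config_apply(struct config *cfg, const char *name, const char *value)
{
    if (strcmp(name, "PORT") == 0) {
        char *end;
        long n;

        errno = 0;
        n = strtol(value, &end, 10);
        /* Reject overflow, empty input, trailing junk, and bad ranges. */
        if (errno != 0 || end == value || *end != '\0' || n < 1 || n > 65535)
            return -1;
        cfg->port = n;
        return 0;
    }
    if (strcmp(name, "HOST") == 0) {
        if (value[0] == '\0' || strlen(value) >= sizeof cfg->host)
            return -1;
        strcpy(cfg->host, value);
        return 0;
    }
    return -1;                  /* unknown setting */
}
```

Whether a rejected line should produce a warning, a fallback to the default, or termination is a policy decision left to the caller.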

    -Mike
     
    Mike Wahler, Sep 17, 2003
    #7
  8. Jack Klein <> spoke thus:

    > Because any *scanf with "%s" lacking a size specifier is just another
    > name for gets(), a nasty buffer overrun just waiting to happen.


    Well, considering the OP was just parsing a config file, the chances for an
    exploit shouldn't be too high, eh? Thanks, though, for I had forgotten about
    that... Although something like

    fscanf( fp, "%20[^=]=%20s", s1, s2 );

    would fail if it weren't given exactly 20 characters for the first string,
    right...?

    --
    Christopher Benson-Manica | Jumonji giri, for honour.
    ataru(at)cyberspace.org |
     
    Christopher Benson-Manica, Sep 17, 2003
    #8
  9. Jack Klein <> spoke thus:

    > Because any *scanf with "%s" lacking a size specifier is just another
    > name for gets(), a nasty buffer overrun just waiting to happen.


    Well, considering the OP was just parsing a config file, the chances for an
    exploit shouldn't be too high, eh? Thanks, though, for I had forgotten about
    that... Although something like

    fscanf( fp, "%20[^#=]=%20s", s1, s2 );

    would fail if it weren't given exactly 20 characters for the first string,
    right...? Maybe I should just be quiet now...

    --
    Christopher Benson-Manica | Jumonji giri, for honour.
    ataru(at)cyberspace.org |
     
    Christopher Benson-Manica, Sep 17, 2003
    #9
  10. On Wed, 17 Sep 2003 16:30:22 +0000 (UTC), Christopher Benson-Manica
    <> wrote:

    > Jack Klein <> spoke thus:
    >
    > > Because any *scanf with "%s" lacking a size specifier is just another
    > > name for gets(), a nasty buffer overrun just waiting to happen.

    >
    > Well, considering the OP was just parsing a config file, the chances for an
    > exploit shouldn't be too high, eh? Thanks, though, for I had forgotten about
    > that... Although something like
    >
    > fscanf( fp, "%20[^#=]=%20s", s1, s2 );
    >
    > would fail if it weren't given exactly 20 characters for the first string,
    > right...? Maybe I should just be quiet now...


    Not right; no comment on whether you should. A width specifier on any
    *scanf conversion is an upper limit, although %Nc will always read up
    to that limit unless it hits end-of-input or an error first.

    Also, the %20s on the right side won't allow whitespace in the value,
    which I would want to allow; %20[^\n] will. Either of those will
    normally leave the newline in the input stream, which is probably OK
    if you want to handle # lines with a getc or similar rather than
    another (prior?) fscanf. Either will also leave any text exceeding the
    limit unread, and %20s will leave any text following whitespace;
    adding %*[^\n] would reduce the number of different cases you have to
    handle.
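
Put together, those suggestions might look like the sketch below. The function name `scan_setting` and its return convention are assumptions; the format uses %20[^\n] for the value, %*[^\n] to discard anything past the limit, and a leading space to skip the newline left over from the previous line.

```c
#include <stdio.h>

/* Read the next setting with fscanf; the widths act as upper limits.
   Returns 1 if a pair was read, 0 if the line was skipped (comment or
   malformed), and EOF at end of input. */
int scan_setting(FILE *fp, char s1[21], char s2[21])
{
    int c;
    int n = fscanf(fp, " %20[^#=\n]=%20[^\n]%*[^\n]", s1, s2);

    if (n == EOF)
        return EOF;
    if (n == 2)
        return 1;
    /* Comment or malformed line: discard the rest of it with getc. */
    while ((c = getc(fp)) != '\n' && c != EOF)
        ;
    return 0;
}
```

A value longer than 20 characters is silently truncated in this sketch; detecting that case would need an extra check, for example with %n.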

    - David.Thompson1 at worldnet.att.net
     
    Dave Thompson, Sep 22, 2003
    #10
