Question regarding fgets and new lines

Discussion in 'C Programming' started by mellyshum123@yahoo.ca, Nov 24, 2006.

  1. Guest

    I need to read in a comma separated file, and for this I was going to
    use fgets. I was reading about it at http://www.cplusplus.com/ref/ and
    I noticed that the document said:

    "Reads characters from stream and stores them in string until (num -1)
    characters have been read or a newline or EOF character is reached,
    whichever comes first."

    My question is that if it stops at a new line character (LF?) then how
    does one read a file with multiple new line characters?

    Another question. The syntax is:

    char * fgets (char * string , int num , FILE * stream);

    but you have to allot a size for the string before this. Would you just
    use the same num as used in the fgets? So char stringexample[num] ?
     
    , Nov 24, 2006
    #1

  2. wrote:
    > I need to read in a comma separated file, and for this I was going to
    > use fgets. I was reading about it at http://www.cplusplus.com/ref/ and
    > I noticed that the document said:
    >
    > "Reads characters from stream and stores them in string until (num -1)
    > characters have been read or a newline or EOF character is reached,
    > whichever comes first."
    >
    > My question is that if it stops at a new line character (LF?) then how
    > does one read a file with multiple new line characters?


    You call it multiple times, until you've read the entire file.

    > Another question. The syntax is:
    >
    > char * fgets (char * string , int num , FILE * stream);
    >
    > but you have to allot a size for the string before this. Would you just
    > use the same num as used in the fgets? So char stringexample[num] ?
    >


    Yes. You're essentially telling fgets: "OK, I've set this much space
    aside for you to read into, give me that many characters (minus 1 for
    the NUL terminator) or the first line, whichever comes first."
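
To make the "minus 1" concrete, here is a minimal sketch. The helper name `read_chunk` is made up for this illustration and is not from the thread; it simply wraps one fgets() call and reports how many characters ended up in the buffer.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one chunk with fgets() into buf (capacity cap)
   and returns the number of characters stored, or -1 at end of input or on
   a read error. */
int read_chunk(FILE *fp, char *buf, int cap)
{
    if (fgets(buf, cap, fp) == NULL)
        return -1;
    return (int)strlen(buf);
}
```

With an 8-byte buffer and the input line "abcdefghij", the first call stores the 7 characters "abcdefg" plus the terminating '\0', and the second call picks up the remaining "hij" and the newline.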


    --
    Clark S. Cox III
     
    Clark S. Cox III, Nov 24, 2006
    #2

  3. wrote:
    > I need to read in a comma separated file, and for this I was going to
    > use fgets.


    You may be better off parsing such files one character at a time.

    > I was reading about it at http://www.cplusplus.com/ref/ and
    > I noticed that the document said:
    >
    > "Reads characters from stream and stores them in string until (num -1)
    > characters have been read or a newline or EOF character is reached,
    > whichever comes first."
    >
    > My question is that if it stops at a new line character (LF?) then how
    > does one read a file with multiple new line characters?


    By making multiple calls to fgets().

    The problem though is cases like Excel, which allows newlines in
    individual field records. Such fields are 'quoted' with a leading
    double quote ("), and an embedded double quote is escaped as two
    double quotes. Hence my comment that you may be better off with a
    simple state machine parsing one character at a time.
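
A state machine of that kind might look roughly like the sketch below. The function name `csv_next_field`, the three return codes, and the choice to silently drop characters that do not fit in the output buffer are all assumptions made for this illustration; a real parser would also have to decide how to treat stray quotes and "\r\n" line endings.

```c
#include <stdio.h>

enum { CSV_FIELD, CSV_END_OF_RECORD, CSV_END_OF_INPUT };

/* Reads one field from fp into out (capacity cap, NUL-terminated when
   cap > 0; characters that do not fit are dropped).  Returns CSV_FIELD if
   a comma ended the field, CSV_END_OF_RECORD for a newline, and
   CSV_END_OF_INPUT when the input ran out. */
int csv_next_field(FILE *fp, char *out, size_t cap)
{
    enum { UNQUOTED, QUOTED, QUOTE_IN_QUOTED } state = UNQUOTED;
    size_t n = 0;
    int c, result = CSV_END_OF_INPUT;

    while ((c = getc(fp)) != EOF) {
        if (state == UNQUOTED) {
            if (c == '"') { state = QUOTED; continue; }
            if (c == ',') { result = CSV_FIELD; break; }
            if (c == '\n') { result = CSV_END_OF_RECORD; break; }
        } else if (state == QUOTED) {
            /* commas and newlines are ordinary data inside quotes */
            if (c == '"') { state = QUOTE_IN_QUOTED; continue; }
        } else {
            /* the previous character was a quote inside a quoted field */
            if (c == ',') { result = CSV_FIELD; break; }
            if (c == '\n') { result = CSV_END_OF_RECORD; break; }
            /* "" is an escaped quote and is stored below; any other
               character ends the quoted part and is stored as-is */
            state = (c == '"') ? QUOTED : UNQUOTED;
        }
        if (n + 1 < cap)
            out[n++] = (char)c;
    }
    if (cap > 0)
        out[n] = '\0';
    return result;
}
```

A caller would loop on this function, starting a new record each time it returns CSV_END_OF_RECORD and stopping at CSV_END_OF_INPUT. Note that a file ending in a newline yields one final empty field with CSV_END_OF_INPUT, which the caller has to ignore.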

    > Another question. The syntax is:
    >
    > char * fgets (char * string , int num , FILE * stream);
    >
    > but you have to allot a size for the string before this. Would you just
    > use the same num as used in the fgets? So char stringexample[num] ?


    Yes. Sample use is...

    char line[256];
    while (fgets(line, sizeof line, stdin))
    {
        /* ... */
    }

    Though more serious programs will roll their own fgets() that
    dynamically allocates storage for a line, rather than fixing the
    size of the buffer. [Such programs still need to be mindful of the
    idiots that will pump a large \n-free binary file through stdin.]

    --
    Peter
     
    Peter Nilsson, Nov 24, 2006
    #3
  4. Guest

    Peter Nilsson wrote:
    > You may be better off parsing such files one character at a time.



    I guess maybe using fgetc?
     
    , Nov 24, 2006
    #4
  5. Eric Sosman Guest

    wrote:
    > I need to read in a comma separated file, and for this I was going to
    > use fgets. I was reading about it at http://www.cplusplus.com/ref/ and
    > I noticed that the document said:
    >
    > "Reads characters from stream and stores them in string until (num -1)
    > characters have been read or a newline or EOF character is reached,
    > whichever comes first."
    >
    > My question is that if it stops at a new line character (LF?) then how
    > does one read a file with multiple new line characters?


    One line at a time. Read a line, process it as you see fit,
    and then proceed to the next line. Lather, rinse, repeat.

    > Another question. The syntax is:
    >
    > char * fgets (char * string , int num , FILE * stream);
    >
    > but you have to allot a size for the string before this. Would you just
    > use the same num as used in the fgets? So char stringexample[num] ?


    Yes. The problem of how big to make `num' can be a
    vexing one: If you make it 80 you can handle lines of up
    to 78 "payload" characters plus a newline and a '\0', but
    if the input stream supplies a longer line you've got a
    bit of a problem. You could make `num' 1000000, but do you
    really want to spend a megabyte as insurance against long
    lines? (And there's still the nagging possibility that the
    input might hold a 1000001-character line ...)

    One plausible way to proceed is to make `num' moderately
    larger than the longest line you expect to encounter, call
    fgets(), and then check whether the buffer contains a '\n'.
    If it does not (and if neither end-of-input nor an I/O error
    occurred, which you can test with feof() and ferror()), then
    the file contains a longer-than-anticipated line. The first
    part of that line has been stored in the buffer, and the tail
    end is still "pending," available to be read.
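
The check described above could be sketched as follows. The helper `read_line_checked` and its status names are invented for this example; it wraps a single fgets() call and classifies the outcome.

```c
#include <stdio.h>
#include <string.h>

enum line_status {
    LINE_COMPLETE,          /* buffer holds a whole line ending in '\n' */
    LINE_LAST_NO_NEWLINE,   /* final line of the input, no '\n' */
    LINE_TRUNCATED,         /* buffer filled up, tail still unread */
    LINE_NONE               /* nothing read: end of input or I/O error */
};

/* Hypothetical helper: one fgets() call plus the newline check. */
enum line_status read_line_checked(FILE *fp, char *buf, int cap)
{
    if (fgets(buf, cap, fp) == NULL)
        return LINE_NONE;
    if (strchr(buf, '\n') != NULL)
        return LINE_COMPLETE;
    if (feof(fp))
        return LINE_LAST_NO_NEWLINE;
    return LINE_TRUNCATED;
}
```

One edge case to be aware of: a final line that lacks a '\n' and exactly fills the buffer is typically reported as LINE_TRUNCATED, because fgets() stops before it notices the end of input; the following call then returns LINE_NONE.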

    What to do next? If you were expecting lines of up to
    around 100 characters and you used a 1000-character buffer
    just to be on the safe side and you ran into a line longer
    than 1000 characters -- more than ten times what you thought
    the maximum length would be -- you might well conclude that
    there's something wrong with the input: Maybe the file you've
    been handed really isn't a CSV file at all. It would be
    perfectly plausible to blurt out an error message and stop
    processing, or to blurt an error and throw the offending line
    away (remember to "drain" the unread tail by reading until
    you get '\n' or EOF).
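
Draining the unread tail takes only a few lines. The sketch below assumes a hypothetical helper named `discard_rest_of_line`; the returned count is an extra added for this example, handy for an error message.

```c
#include <stdio.h>

/* Hypothetical helper: discards characters up to and including the next
   '\n', or up to end of input.  Returns how many characters were dropped. */
long discard_rest_of_line(FILE *fp)
{
    long dropped = 0;
    int c;

    while ((c = getc(fp)) != EOF) {
        dropped++;
        if (c == '\n')
            break;
    }
    return dropped;
}
```

After this call the stream is positioned at the start of the next line, so the following fgets() starts cleanly.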

    If you've used malloc() to obtain memory for the buffer,
    another possibility is to use realloc() to make the buffer
    larger (preserving the already-read portion) and call fgets()
    again to read the tail of the line into the tail of the expanded
    buffer. If necessary, you can expand again and again until you
    finally get a big enough buffer (or run out of memory). In my
    opinion it's a little easier to implement this scheme by using
    getc() to read a character at a time instead of using fgets()
    to read a batch of characters, but either way it's fairly
    straightforward.
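
For illustration, the getc() variant of that scheme might look like the sketch below. The name `read_line_alloc`, the starting capacity of 64, and the doubling policy are choices made for this example, not something prescribed in the thread.

```c
#include <stdio.h>
#include <stdlib.h>

/* Reads one line of any length from fp.  Returns a malloc'd string without
   the trailing '\n' (the caller must free() it), or NULL at end of input
   or if memory runs out. */
char *read_line_alloc(FILE *fp)
{
    size_t cap = 64, len = 0;
    char *buf = malloc(cap);
    int c;

    if (buf == NULL)
        return NULL;
    while ((c = getc(fp)) != EOF && c != '\n') {
        if (len + 1 == cap) {             /* keep one slot for the '\0' */
            char *bigger = realloc(buf, cap * 2);
            if (bigger == NULL) {
                free(buf);
                return NULL;
            }
            buf = bigger;
            cap *= 2;
        }
        buf[len++] = (char)c;
    }
    if (c == EOF && len == 0) {           /* nothing left to read */
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    return buf;
}
```

A production version would probably want an upper bound on the line length, so that a huge newline-free input cannot exhaust memory.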

    --
    Eric Sosman
     
    Eric Sosman, Nov 24, 2006
    #5
  6. Guest

    wrote:
    > I need to read in a comma separated file, and for this I was going to
    > use fgets. I was reading about it at http://www.cplusplus.com/ref/ and
    > I noticed that the document said:
    >
    > "Reads characters from stream and stores them in string until (num -1)
    > characters have been read or a newline or EOF character is reached,
    > whichever comes first."
    >
    > My question is that if it stops at a new line character (LF?) then how
    > does one read a file with multiple new line characters?


    Well presumably you would just read line after line. (fgets() can be
    called iteratively.)

    > Another question. The syntax is:
    >
    > char * fgets (char * string , int num , FILE * stream);
    >
    > but you have to allot a size for the string before this. Would you just
    > use the same num as used in the fgets? So char stringexample[num] ?


    Somehow you are just supposed to know the length. You have to guess --
    usually you just overestimate or something like that. If it's too small
    then you get truncated results. Yeah, it doesn't make much more sense
    to me either. This is just a design stupidity of the C language.

    You can save yourself a lot of grief and just download The Better
    String Library and its examples. It's open source and includes an Excel
    compatible CSV reader. You can get it from here:

    http://bstring.sf.net/

    It also includes more logical line reading functions like bgets which
    you use via something like:

    bstring b = bgets ((bNgetc) fgetc, stdin, '\n');

    Which will read a line of text from the standard input into the bstring
    b which will be sized as required. Or if you just want to deal with
    the whole thing at once:

    struct bstrList * sl=bsplit(b=bread ((bNread)fread,stdin),'\n');

    Which will read the whole file into the bstring b, and split it into
    individual sub-strings separated by '\n's stored in sl.

    Of course, as I said, neither of these is quite correct for
    parsing CSV that can include quotation; however, the examples give a
    mechanism for this:

    struct bStream * s = bsopen ((bNread) fread, stdin);
    struct CSVStream * csv = parseCSVOpen (s);
    struct CSVEntry entry; /* contents, mode */
    /*...*/
    parseCSVNextEntry (&entry, csv); /* Grab an entry */
    /*...*/
    parseCSVClose (csv);

    It's fast and correct.

    --
    Paul Hsieh
    http://www.pobox.com/~qed/
    http://bstring.sf.net/
     
    , Nov 24, 2006
    #6
  7. CBFalconer Guest

    Eric Sosman wrote:
    > wrote:
    >
    >> I need to read in a comma separated file, and for this I was going
    >> to use fgets. I was reading about it at http://www.cplusplus.com/ref/
    >> and I noticed that the document said:
    >>
    >> "Reads characters from stream and stores them in string until
    >> (num -1) characters have been read or a newline or EOF character
    >> is reached, whichever comes first."
    >>

    .... snip ...
    >
    > If you've used malloc() to obtain memory for the buffer,
    > another possibility is to use realloc() to make the buffer
    > larger (preserving the already-read portion) and call fgets()
    > again to read the tail of the line into the tail of the expanded
    > buffer. If necessary, you can expand again and again until you
    > finally get a big enough buffer (or run out of memory). In my
    > opinion it's a little easier to implement this scheme by using
    > getc() to read a character at a time instead of using fgets()
    > to read a batch of characters, but either way it's fairly
    > straightforward.


    Or simply download and use the public domain ggets, at:

    <http://cbfalconer.home.att.net/download/>

    --
    Chuck F (cbfalconer at maineline dot net)
    Available for consulting/temporary embedded and systems.
    <http://cbfalconer.home.att.net>
     
    CBFalconer, Nov 24, 2006
    #7
  8. On Fri, 24 Nov 2006 02:22:25 -0500, CBFalconer wrote:
    >Or simply download and use the public domain ggets, at:
    >
    > <http://cbfalconer.home.att.net/download/>


    "The storage has been allocated within fggets ... Freeing of assigned
    storage is the callers responsibility".

    This programming style is not used by the Standard C library (and
    other well-known libraries). I'd be reluctant to use it in my
    programs.

    Best regards,
    Roland Pibinger
     
    Roland Pibinger, Nov 24, 2006
    #8
  9. Roland Pibinger wrote:
    > On Fri, 24 Nov 2006 02:22:25 -0500, CBFalconer wrote:
    > >Or simply download and use the public domain ggets, at:
    > >
    > > <http://cbfalconer.home.att.net/download/>

    >
    > "The storage has been allocated within fggets ... Freeing of assigned
    > storage is the callers responsibility".
    >
    > This programming style is not used by the Standard C library (and
    > other well-known libraries).


    For two simple examples, the style's used by POSIX's strdup, and GNU's
    asprintf. I'd say both are rather well-known.

    > I'd be reluctant to use it in my
    > programs.


    That, of course, is your right.
     
    Harald van Dĳk, Nov 24, 2006
    #9
  10. On Fri, 24 Nov 2006 08:50:13 GMT, (Roland Pibinger)
    wrote:

    >On Fri, 24 Nov 2006 02:22:25 -0500, CBFalconer wrote:
    >>Or simply download and use the public domain ggets, at:
    >>
    >> <http://cbfalconer.home.att.net/download/>

    >
    >"The storage has been allocated within fggets ... Freeing of assigned
    >storage is the callers responsibility".
    >
    >This programming style is not used by the Standard C library (and
    >other well-known libraries). I'd be reluctant to use it in my
    >programs.


    Isn't strdup posix and isn't that well known?


    Remove del for email
     
    Barry Schwarz, Nov 24, 2006
    #10
  11. On 24 Nov 2006 05:00:37 -0800, <truedfx@...com> wrote:
    >Roland Pibinger wrote:
    >> This programming style is not used by the Standard C library (and
    >> other well-known libraries).

    >
    >For two simple examples, the style's used by POSIX's strdup, and GNU's
    >asprintf. I'd say both are rather well-known.


    Guess why there is no strdup (and no asprintf) in the ISO C Standard?

    Best regards,
    Roland Pibinger
     
    Roland Pibinger, Nov 24, 2006
    #11
  12. Richard Bos Guest

    (Roland Pibinger) wrote:

    > On Fri, 24 Nov 2006 02:22:25 -0500, CBFalconer wrote:
    > >Or simply download and use the public domain ggets, at:
    > >
    > > <http://cbfalconer.home.att.net/download/>

    >
    > "The storage has been allocated within fggets ... Freeing of assigned
    > storage is the callers responsibility".
    >
    > This programming style is not used by the Standard C library (and
    > other well-known libraries).


    Isn't it? I can't say that I'm unfamiliar with it.

    > I'd be reluctant to use it in my programs.


    Then you're going to have a right hassle implementing con- and
    destructors for, e.g., linked lists.

    Richard
     
    Richard Bos, Nov 24, 2006
    #12
  13. CBFalconer Guest

    Roland Pibinger wrote:
    > On Fri, 24 Nov 2006 02:22:25 -0500, CBFalconer wrote:
    >
    >> Or simply download and use the public domain ggets, at:
    >>
    >> <http://cbfalconer.home.att.net/download/>

    >
    > "The storage has been allocated within fggets ... Freeing of
    > assigned storage is the callers responsibility".
    >
    > This programming style is not used by the Standard C library (and
    > other well-known libraries). I'd be reluctant to use it in my
    > programs.


    Why not? If you malloc something, you know you need to free it
    when no longer needed. If you use ggets, you know you need to free
    the line when no longer needed. This is not a massive memory
    leap. Meanwhile you don't have to worry about buffer sizes, etc.

    --
    Chuck F (cbfalconer at maineline dot net)
    Available for consulting/temporary embedded and systems.
    <http://cbfalconer.home.att.net>
     
    CBFalconer, Nov 24, 2006
    #13
  14. On 23 Nov 2006 23:17:06 -0800, websnarf@...com wrote:
    >mellyshum123@...ca wrote:
    >> but you have to allot a size for the string before this. Would you just
    >> use the same num as used in the fgets? So char stringexample[num] ?

    >
    >Somehow you are just supposed to know the length. You have to guess --
    >usually you just overestimate or something like that. If its too small
    >then you get truncated results.


    Not necessarily. You only need to know if you are done (if the line is
    entirely read) or not. If not, read again until the rest of the line
    is read. Your code basically becomes a loop. Just assume that the
    buffer is always too small to read the line in one pass.
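
As a sketch of that loop: the function below reads through a deliberately small buffer and still handles lines of any length, here by measuring the longest one. The name `longest_line` and the 16-byte chunk size are arbitrary choices for this example.

```c
#include <stdio.h>
#include <string.h>

/* Returns the length of the longest line in fp (not counting the '\n'),
   treating every fgets() result as a possibly partial line. */
size_t longest_line(FILE *fp)
{
    char chunk[16];
    size_t current = 0, longest = 0;

    while (fgets(chunk, sizeof chunk, fp) != NULL) {
        size_t n = strlen(chunk);
        if (n > 0 && chunk[n - 1] == '\n') {
            current += n - 1;             /* the line ends in this chunk */
            if (current > longest)
                longest = current;
            current = 0;
        } else {
            current += n;                 /* more of this line follows */
        }
    }
    if (current > longest)                /* last line had no '\n' */
        longest = current;
    return longest;
}
```

The same shape works for real processing: accumulate or handle each chunk, and treat the chunk that ends in '\n' as the end of the line.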

    >Yeah, it doesn't make much more sense
    >to me either. This is just a design stupidity of the C language.


    Live with, not against, your limits.

    >You can save yourself a lot of grief and just download The Better
    >String Library and its examples.


    Best regards,
    Roland Pibinger
     
    Roland Pibinger, Nov 24, 2006
    #14
  15. Roland Pibinger wrote:
    > On 24 Nov 2006 05:00:37 -0800, <truedfx@...com> wrote:
    > >Roland Pibinger wrote:
    > >> This programming style is not used by the Standard C library (and
    > >> other well-known libraries).

    > >
    > >For two simple examples, the style's used by POSIX's strdup, and GNU's
    > >asprintf. I'd say both are rather well-known.

    >
    > Guess why there is no strdup (and no asprintf) in the ISO C Standard?


    See the C99 rationale, section 0.
     
    Harald van Dĳk, Nov 24, 2006
    #15
  16. Bill Reid Guest

    <> wrote in message
    news:...
    > I need to read in a comma separated file, and for this I was going to
    > use fgets. I was reading about it at http://www.cplusplus.com/ref/ and
    > I noticed that the document said:
    >
    > "Reads characters from stream and stores them in string until (num -1)
    > characters have been read or a newline or EOF character is reached,
    > whichever comes first."
    >
    > My question is that if it stops at a new line character (LF?) then how
    > does one read a file with multiple new line characters?
    >
    > Another question. The syntax is:
    >
    > char * fgets (char * string , int num , FILE * stream);
    >
    > but you have to allot a size for the string before this. Would you just
    > use the same num as used in the fgets? So char stringexample[num] ?
    >

    OK, I've read the other responses to this and they were...shall we
    say, regrettable? Except for "pathological" cases, here's all you
    need to do:

    #include <stdio.h>

    #define LINEMAX 512

    char csv_line[LINEMAX];
    FILE *csv_fptr;

    /* get or create a string here that is the path to the CSV file */

    if ((csv_fptr = fopen(csv_filepath, "r")) != NULL) {
        while (fgets(csv_line, LINEMAX, csv_fptr) != NULL) {
            /* you can parse out the data from each csv_line right here */
        }
        fclose(csv_fptr);
    }
    else printf("\nCouldn't open %s", csv_filepath);

    And you're done! Something basically exactly like this is done
    like a trillion times a day without incident or regret...

    Yes, you do have to declare a character array that is bigger than the
    longest line you expect to encounter (I generally use "512" as my "magic
    number" for that), and fgets() is one of those file-reading functions that
    keeps track of a "pointer" to a position in the file, so every time you use
    it,
    it starts reading at the position where it left off the last time it was
    called...this is why it is easy to use it in a loop like above. (If needed,
    you also can use fseek(), rewind(), and ftell() to move the "pointer"
    around the file to positions you want to read.)
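
A small sketch of moving that position around: the hypothetical helper `peek_line` (a name invented here) remembers the current position with ftell(), reads a line, and then seeks back so the next read returns the same line again. It assumes a seekable stream such as a regular file; it will not work on a pipe or a terminal.

```c
#include <stdio.h>

/* Hypothetical helper: reads the next line into buf, then moves the file
   position back so that the same line is returned by the next read.
   Returns 1 on success, 0 at end of input or on error. */
int peek_line(FILE *fp, char *buf, int cap)
{
    long start = ftell(fp);               /* where this line begins */

    if (start == -1L || fgets(buf, cap, fp) == NULL)
        return 0;
    return fseek(fp, start, SEEK_SET) == 0;
}
```

Calling rewind(fp) at any point returns the position to the start of the file, so a later fgets() reads the first line again.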

    ---
    William Ernest Reid
     
    Bill Reid, Nov 24, 2006
    #16
  17. CBFalconer <> writes:
    > Roland Pibinger wrote:
    >> On Fri, 24 Nov 2006 02:22:25 -0500, CBFalconer wrote:
    >>
    >>> Or simply download and use the public domain ggets, at:
    >>>
    >>> <http://cbfalconer.home.att.net/download/>

    >>
    >> "The storage has been allocated within fggets ... Freeing of
    >> assigned storage is the callers responsibility".
    >>
    >> This programming style is not used by the Standard C library (and
    >> other well-known libraries). I'd be reluctant to use it in my
    >> programs.

    >
    > Why not? If you malloc something, you know you need to free it
    > when no longer needed. If you use ggets, you know you need to free
    > the line when no longer needed. This is not a massive memory
    > leap. Meanwhile you don't have to worry about buffer sizes, etc.


    Exactly. For any resource, there needs to be a way to allocate it and
    a way to release it. For raw chunks of memory, the allocation and
    deallocation routines are "malloc" and "free". For stdio streams,
    they're called "fopen" and "fclose". For the ggets interface (if I
    understand it correctly), they're called "ggets" and "free".

    It might not have been a bad idea to have a special purpose
    deallocation, say "ggets_release"; it would be a simple wrapper around
    "free", but it would leave room for more complex actions in a future
    version. But I don't think it's really necessary.
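
To show the shape of such a pairing, here is a sketch with made-up names. `line_acquire` is only a stand-in for a ggets-style allocator (it copies a string rather than reading a stream), and `line_release` plays the role of the suggested wrapper; neither is part of the actual ggets package.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical allocator standing in for a ggets-style reader: returns a
   malloc'd copy of text, or NULL if memory runs out. */
char *line_acquire(const char *text)
{
    size_t n = strlen(text) + 1;
    char *copy = malloc(n);

    if (copy != NULL)
        memcpy(copy, text, n);
    return copy;
}

/* Matching release routine.  Today it only calls free(), but callers that
   use it would not need to change if the allocator later did more. */
void line_release(char *line)
{
    free(line);
}
```

Callers then pair every line_acquire() with a line_release(), just as they pair fopen() with fclose().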

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
    We must do something. This is something. Therefore, we must do this.
     
    Keith Thompson, Nov 24, 2006
    #17
  18. On Fri, 24 Nov 2006 08:42:32 -0500, CBFalconer wrote:
    >Roland Pibinger wrote:
    >> This programming style is not used by the Standard C library (and
    >> other well-known libraries). I'd be reluctant to use it in my
    >> programs.

    >
    >Why not?


    Because responsibilities become unclear. Simple rules like 'whoever
    allocates something must deallocate it' don't work any more.

    >If you malloc something, you know you need to free it
    >when no longer needed.


    Ok, that's symmetric.

    >If you use ggets, you know you need to free
    >the line when no longer needed.


    That's unsymmetric. The user can easily forget the 'free'.
    It's all about style. Maybe someone can tell the story why strdup was
    excluded from the C Standard (I'm not a C historian and don't want to
    become one).

    Best regards,
    Roland Pibinger
     
    Roland Pibinger, Nov 24, 2006
    #18
  19. (Roland Pibinger) writes:
    > On Fri, 24 Nov 2006 08:42:32 -0500, CBFalconer wrote:
    >>Roland Pibinger wrote:
    >>> This programming style is not used by the Standard C library (and
    >>> other well-known libraries). I'd be reluctant to use it in my
    >>> programs.

    >>
    >>Why not?

    >
    > Because responsibilities become unclear. Simple rules like 'whoever
    > allocates something must deallocate it' don't work any more.
    >
    >>If you malloc something, you know you need to free it
    >>when no longer needed.

    >
    > Ok, that's symmetric.
    >
    >>If you use ggets, you know you need to free
    >>the line when no longer needed.

    >
    > That's unsymmetric. The user can easily forget the 'free'.


    malloc() allocates; free() frees.

    ggets() allocates; free() frees.

    It all seems sufficiently symmetric to me. The user has to remember
    the free() in either case.

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
    We must do something. This is something. Therefore, we must do this.
     
    Keith Thompson, Nov 24, 2006
    #19
  20. CBFalconer Guest

    Roland Pibinger wrote:
    > On Fri, 24 Nov 2006 08:42:32 -0500, CBFalconer wrote:
    >> Roland Pibinger wrote:
    >>
    >>> This programming style is not used by the Standard C library (and
    >>> other well-known libraries). I'd be reluctant to use it in my
    >>> programs.

    >>
    >> Why not?

    >
    > Because responsibilities become unclear. Simple rules like 'whoever
    > allocates something must deallocate it' don't work any more.


    They don't work anyhow for anything other than the simplest code.

    >
    >> If you malloc something, you know you need to free it
    >> when no longer needed.

    >
    > Ok, that's symmetric.


    Oh? I would think you would be #defining unmalloc free. How are
    you handling freeing after realloc, or calloc?

    >
    >> If you use ggets, you know you need to free
    >> the line when no longer needed.

    >
    > That's unsymmetric. The user can easily forget the 'free'. It's
    > all about style. Maybe someone can tell the story why strdup was
    > excluded from the C Standard (I'm not a C historian and don't
    > want to become one).


    Well, implementing strdup is much simpler than implementing ggets.
    There also isn't a dangerous version (e.g. gets) to be replaced.

    --
    Chuck F (cbfalconer at maineline dot net)
    Available for consulting/temporary embedded and systems.
    <http://cbfalconer.home.att.net>
     
    CBFalconer, Nov 24, 2006
    #20