Question regarding fgets and new lines

mellyshum123 · Nov 24, 2006

I need to read in a comma separated file, and for this I was going to
use fgets. I was reading about it at http://www.cplusplus.com/ref/ and
I noticed that the document said:

"Reads characters from stream and stores them in string until (num -1)
characters have been read or a newline or EOF character is reached,
whichever comes first."

My question is that if it stops at a new line character (LF?) then how
does one read a file with multiple new line characters?

Another question. The syntax is:

char * fgets (char * string , int num , FILE * stream);

but you have to allot a size for the string before this. Would you just
use the same num as used in the fgets? So char stringexample[num] ?

Clark S. Cox III · Nov 24, 2006

I need to read in a comma separated file, and for this I was going to
use fgets. I was reading about it at http://www.cplusplus.com/ref/ and
I noticed that the document said:

"Reads characters from stream and stores them in string until (num -1)
characters have been read or a newline or EOF character is reached,
whichever comes first."

My question is that if it stops at a new line character (LF?) then how
does one read a file with multiple new line characters?

You call it multiple times, until you've read the entire file.
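To make "call it multiple times" concrete, here is a small sketch that counts the lines in a stream by calling fgets() in a loop. The function name count_lines and the 16-byte buffer are illustrative assumptions. The buffer is deliberately tiny to show that a line longer than the buffer arrives in several pieces, and only the piece ending in '\n' finishes a line.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: count the lines in fp by calling fgets() repeatedly.
   A line longer than the buffer arrives in several pieces; only the piece
   that ends in '\n' completes it. A final line without '\n' still counts. */
long count_lines(FILE *fp)
{
    char buf[16];            /* deliberately small */
    long lines = 0;
    int in_line = 0;         /* nonzero while a line is only partly read */

    while (fgets(buf, sizeof buf, fp) != NULL) {
        size_t len = strlen(buf);
        if (len > 0 && buf[len - 1] == '\n') {
            lines++;
            in_line = 0;
        } else {
            in_line = 1;
        }
    }
    if (in_line)             /* last line had no terminating newline */
        lines++;
    return lines;
}
```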

Another question. The syntax is:

char * fgets (char * string , int num , FILE * stream);

but you have to allot a size for the string before this. Would you just
use the same num as used in the fgets? So char stringexample[num] ?

Yes. You're essentially telling fgets: "OK, I've set this much space
aside for you to read into, give me that many characters (minus 1 for
the NUL terminator) or the first line, whichever comes first."
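A sketch of that sizing rule: if the array has N bytes, pass N (most simply via sizeof), and fgets() stores at most N-1 characters followed by a '\0'. The helper name first_piece and the 8-byte buffer are illustrative assumptions.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one piece of input into an 8-byte buffer and
   report how many characters fgets() stored (never more than 7) and
   whether the piece ended with '\n', i.e. whether the whole line fit. */
size_t first_piece(FILE *fp, int *got_newline)
{
    char buf[8];                        /* room for 7 chars + '\0' */

    *got_newline = 0;
    if (fgets(buf, sizeof buf, fp) == NULL)
        return 0;                       /* end of input or read error */
    size_t len = strlen(buf);
    *got_newline = (len > 0 && buf[len - 1] == '\n');
    return len;
}
```

For an input line "abcdefghij", the first call stores only "abcdefg"; the rest stays in the stream for the next call.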

Peter Nilsson · Nov 24, 2006

I need to read in a comma separated file, and for this I was going to
use fgets.

You may be better off parsing such files one character at a time.

I was reading about it at http://www.cplusplus.com/ref/ and
I noticed that the document said:

"Reads characters from stream and stores them in string until (num -1)
characters have been read or a newline or EOF character is reached,
whichever comes first."

My question is that if it stops at a new line character (LF?) then how
does one read a file with multiple new line characters?

By making multiple calls to fgets().

The problem, though, is cases like Excel, which allows newlines in
individual field records. Such fields are 'quoted' with a leading double
quote ("), and an embedded double quote is escaped as two double quotes.
Hence my comment that you may be better off with a simple state machine
parsing one character at a time.
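The state-machine idea can be sketched as follows, with getc() feeding one character at a time. This is a minimal illustration and not a complete CSV implementation. The function name csv_read_record, the fixed limits MAX_FIELDS and FIELD_LEN, the silent truncation of over-long fields, and the assumption of '\n' line endings are all choices made for the sketch. It does handle quoted fields, doubled quotes inside them, and newlines embedded in quoted fields.

```c
#include <stdio.h>
#include <string.h>

#define MAX_FIELDS 16
#define FIELD_LEN  64

/* States of the (hypothetical) CSV reader */
enum csv_state { FIELD_START, UNQUOTED, QUOTED, QUOTE_IN_QUOTED };

/* Append c to the current field, silently truncating over-long fields. */
static void put_char(char *cur, size_t *len, int c)
{
    if (*len < FIELD_LEN - 1)
        cur[(*len)++] = (char)c;
}

/* Terminate the current field and store it if there is room. */
static void end_field(char fields[][FIELD_LEN], int *nfields,
                      char *cur, size_t *len)
{
    cur[*len] = '\0';
    if (*nfields < MAX_FIELDS) {
        memcpy(fields[*nfields], cur, *len + 1);
        (*nfields)++;
    }
    *len = 0;
}

/* Read one record from fp, one character at a time. Returns the number
   of fields stored, or -1 if fp was already at end of input. */
int csv_read_record(FILE *fp, char fields[][FIELD_LEN])
{
    enum csv_state state = FIELD_START;
    char cur[FIELD_LEN];
    size_t len = 0;
    int nfields = 0;
    int c = getc(fp);

    if (c == EOF)
        return -1;
    ungetc(c, fp);                      /* put it back for the loop */

    for (;;) {
        c = getc(fp);
        switch (state) {
        case FIELD_START:
            if (c == '"') {
                state = QUOTED;
                break;
            }
            state = UNQUOTED;
            /* fall through: c starts an unquoted field */
        case UNQUOTED:
            if (c == ',') {
                end_field(fields, &nfields, cur, &len);
                state = FIELD_START;
            } else if (c == '\n' || c == EOF) {
                end_field(fields, &nfields, cur, &len);
                return nfields;
            } else {
                put_char(cur, &len, c);
            }
            break;
        case QUOTED:
            if (c == '"') {
                state = QUOTE_IN_QUOTED;  /* closing quote, or first of "" */
            } else if (c == EOF) {
                end_field(fields, &nfields, cur, &len);  /* unterminated */
                return nfields;
            } else {
                put_char(cur, &len, c);   /* commas and newlines are data */
            }
            break;
        case QUOTE_IN_QUOTED:
            if (c == '"') {               /* "" stands for one " */
                put_char(cur, &len, '"');
                state = QUOTED;
            } else if (c == ',') {
                end_field(fields, &nfields, cur, &len);
                state = FIELD_START;
            } else if (c == '\n' || c == EOF) {
                end_field(fields, &nfields, cur, &len);
                return nfields;
            } else {                      /* stray text after closing quote */
                put_char(cur, &len, c);
                state = UNQUOTED;
            }
            break;
        }
    }
}
```

Because a quoted newline is just another data character in the QUOTED state, a record can span several physical lines. This is exactly the case a line-at-a-time fgets() loop gets wrong.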

Another question. The syntax is:

char * fgets (char * string , int num , FILE * stream);

but you have to allot a size for the string before this. Would you just
use the same num as used in the fgets? So char stringexample[num] ?

Yes. Sample use is...

char line[256];
while (fgets(line, sizeof line, stdin))
{
    /* ... */
}

Though more serious programs will roll their own fgets() that dynamically
allocates storage for a line, rather than fixing the size of the buffer.
[Such programs still need to be mindful of the idiots that will pump a
large '\n'-free binary file through stdin.]
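One way to "roll your own" along those lines is sketched below. The name read_line_alloc, its return convention, and the max_len cap (a guard against newline-free binary input) are assumptions made for illustration. They are not any particular library's interface. The caller owns the returned string and must free() it.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read one line of any length (up to max_len chars)
   into storage obtained from malloc(). The '\n' is not stored.
   Returns NULL at end of input, on allocation failure, or if the line
   exceeds max_len; the caller must free() a non-NULL result. */
char *read_line_alloc(FILE *fp, size_t max_len)
{
    size_t cap = 64, len = 0;
    char *buf = malloc(cap);
    int c;

    if (buf == NULL)
        return NULL;
    while ((c = getc(fp)) != EOF && c != '\n') {
        if (len == max_len) {          /* refuse absurdly long "lines" */
            free(buf);
            return NULL;
        }
        if (len + 1 == cap) {          /* keep room for the '\0' */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
        buf[len++] = (char)c;
    }
    if (c == EOF && len == 0) {        /* nothing left to read */
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    return buf;
}
```

A NULL return is ambiguous in this sketch. A caller that needs to tell end of input from an over-long line can check feof() afterwards.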

mellyshum123 · Nov 24, 2006

Peter said:
You may be better off parsing such files one character at a time.

I guess maybe using fgetc?
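fgetc() (or its usual macro form getc()) is the natural tool for character-at-a-time reading. The one trap is that the result must be stored in an int, not a char, so that EOF can be distinguished from every valid character. The counting function below is only an illustration; its name and purpose are assumptions.

```c
#include <stdio.h>

/* Hypothetical helper: count commas and newlines, one character at a
   time. Note that c is an int: EOF is not a valid char value. */
void count_separators(FILE *fp, long *commas, long *newlines)
{
    int c;

    *commas = *newlines = 0;
    while ((c = fgetc(fp)) != EOF) {
        if (c == ',')
            (*commas)++;
        else if (c == '\n')
            (*newlines)++;
    }
}
```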

Eric Sosman · Nov 24, 2006

I need to read in a comma separated file, and for this I was going to
use fgets. I was reading about it at http://www.cplusplus.com/ref/ and
I noticed that the document said:

"Reads characters from stream and stores them in string until (num -1)
characters have been read or a newline or EOF character is reached,
whichever comes first."

My question is that if it stops at a new line character (LF?) then how
does one read a file with multiple new line characters?

One line at a time. Read a line, process it as you see fit,
and then proceed to the next line. Lather, rinse, repeat.

Another question. The syntax is:

char * fgets (char * string , int num , FILE * stream);

but you have to allot a size for the string before this. Would you just
use the same num as used in the fgets? So char stringexample[num] ?

Yes. The problem of how big to make `num' can be a
vexing one: If you make it 80 you can handle lines of up
to 78 "payload" characters plus a newline and a '\0', but
if the input stream supplies a longer line you've got a
bit of a problem. You could make `num' 1000000, but do you
really want to spend a megabyte as insurance against long
lines? (And there's still the nagging possibility that the
input might hold a 1000001-character line ...)

One plausible way to proceed is to make `num' moderately
larger than the longest line you expect to encounter, call
fgets(), and then check whether the buffer contains a '\n'.
If it does not (and if neither end-of-input nor an I/O error
occurred, which you can test with feof() and ferror()), then
the file contains a longer-than-anticipated line. The first
part of that line has been stored in the buffer, and the tail
end is still "pending," available to be read.
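Eric's check can be written as a small classification function. The enum names and the function name fgets_status are hypothetical. The logic is what he describes: look for '\n' in the buffer, and if it is absent, use feof() and ferror() to tell a long line from the end of the input.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical result codes for one fgets() call */
enum line_status {
    LINE_COMPLETE,   /* buffer holds a whole line, '\n' included */
    LINE_PARTIAL,    /* line was longer than the buffer; tail still pending */
    LINE_LAST,       /* no '\n' because input ended (or a read error hit) */
    LINE_NONE        /* nothing read: end of input or I/O error */
};

enum line_status fgets_status(char *buf, int size, FILE *fp)
{
    if (fgets(buf, size, fp) == NULL)
        return LINE_NONE;
    if (strchr(buf, '\n') != NULL)
        return LINE_COMPLETE;
    if (feof(fp) || ferror(fp))
        return LINE_LAST;
    return LINE_PARTIAL;
}
```

There is one corner case. A final line that has no '\n' and exactly fills the buffer is reported as LINE_PARTIAL, because fgets() stopped before it could see end-of-file. The next call then returns LINE_NONE.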

What to do next? If you were expecting lines of up to
around 100 characters and you used a 1000-character buffer
just to be on the safe side and you ran into a line longer
than 1000 characters -- more than ten times what you thought
the maximum length would be -- you might well conclude that
there's something wrong with the input: Maybe the file you've
been handed really isn't a CSV file at all. It would be
perfectly plausible to blurt out an error message and stop
processing, or to blurt an error and throw the offending line
away (remember to "drain" the unread tail by reading until
you get '\n' or EOF).
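The "draining" step might look like the following sketch. The function name discard_rest_of_line and its return convention are assumptions.

```c
#include <stdio.h>

/* Hypothetical helper: throw away the unread tail of the current line.
   Returns '\n' if the line ended normally, or EOF if input ran out. */
int discard_rest_of_line(FILE *fp)
{
    int c;

    while ((c = getc(fp)) != EOF && c != '\n')
        ;                       /* keep reading until end of line */
    return c;
}
```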

If you've used malloc() to obtain memory for the buffer,
another possibility is to use realloc() to make the buffer
larger (preserving the already-read portion) and call fgets()
again to read the tail of the line into the tail of the expanded
buffer. If necessary, you can expand again and again until you
finally get a big enough buffer (or run out of memory). In my
opinion it's a little easier to implement this scheme by using
getc() to read a character at a time instead of using fgets()
to read a batch of characters, but either way it's fairly
straightforward.
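The realloc() scheme, using fgets() for each batch, might be sketched like this. The name fgets_alloc, the 32-byte starting size, and the doubling growth policy are assumptions. As with plain fgets(), the '\n' is kept in the result.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: read one whole line, '\n' included, into storage
   from malloc() that grows with realloc() until the line fits.
   Returns NULL at end of input, on a read error, or if memory runs out;
   the caller must free() a non-NULL result. */
char *fgets_alloc(FILE *fp)
{
    size_t size = 32, len = 0;
    char *buf = malloc(size);

    if (buf == NULL)
        return NULL;
    while (fgets(buf + len, (int)(size - len), fp) != NULL) {
        len += strlen(buf + len);
        if (len > 0 && buf[len - 1] == '\n')
            return buf;                   /* got the complete line */
        if (len + 1 < size)
            return buf;                   /* stopped short: end of input */
        /* buffer is full: grow it, keeping what was already read */
        char *bigger = realloc(buf, size * 2);
        if (bigger == NULL) {
            free(buf);
            return NULL;
        }
        buf = bigger;
        size *= 2;
    }
    if (len > 0 && !ferror(fp))
        return buf;                       /* input ended right after a full buffer */
    free(buf);
    return NULL;
}
```

Each call reads the tail of a long line straight into the tail of the enlarged buffer. Already-read characters are never copied by hand, because realloc() preserves them.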

websnarf · Nov 24, 2006

I need to read in a comma separated file, and for this I was going to
use fgets. I was reading about it at http://www.cplusplus.com/ref/ and
I noticed that the document said:

"Reads characters from stream and stores them in string until (num -1)
characters have been read or a newline or EOF character is reached,
whichever comes first."

My question is that if it stops at a new line character (LF?) then how
does one read a file with multiple new line characters?

Well presumably you would just read line after line. (fgets() can be
called iteratively.)

Another question. The syntax is:

char * fgets (char * string , int num , FILE * stream);

but you have to allot a size for the string before this. Would you just
use the same num as used in the fgets? So char stringexample[num] ?

Somehow you are just supposed to know the length. You have to guess --
usually you just overestimate or something like that. If it's too small,
you get truncated results. Yeah, it doesn't make much more sense
to me either. This is just a design stupidity of the C language.

You can save yourself a lot of grief and just download The Better
String Library and its examples. It's open source and includes an
Excel-compatible CSV reader. You can get it from here:

http://bstring.sf.net/

It also includes more logical line reading functions like bgets which
you use via something like:

bstring b = bgets ((bNgetc) fgetc, stdin, '\n');

Which will read a line of text from the standard input into the bstring
b which will be sized as required. Or if you just want to deal with
the whole thing at once:

struct bstrList * sl = bsplit (b = bread ((bNread) fread, stdin), '\n');

Which will read the whole file into the bstring b, and split it into
individual sub-strings separated by '\n's stored in sl.

Of course, as I said, neither of these is quite correct for parsing
CSV that can include quoted fields; however, the examples provide a
mechanism for this:

struct bStream * s = bsopen ((bNread) fread, stdin);
struct CSVStream * csv = parseCSVOpen (s);
struct CSVEntry entry; /* contents, mode */
/*...*/
parseCSVNextEntry (&entry, csv); /* Grab an entry */
/*...*/
parseCSVClose (csv);

It's fast and correct.

CBFalconer · Nov 24, 2006

Eric said:
.... snip ...

If you've used malloc() to obtain memory for the buffer,
another possibility is to use realloc() to make the buffer
larger (preserving the already-read portion) and call fgets()
again to read the tail of the line into the tail of the expanded
buffer. If necessary, you can expand again and again until you
finally get a big enough buffer (or run out of memory). In my
opinion it's a little easier to implement this scheme by using
getc() to read a character at a time instead of using fgets()
to read a batch of characters, but either way it's fairly
straightforward.

Or simply download and use the public domain ggets, at:

<http://cbfalconer.home.att.net/download/>

Roland Pibinger · Nov 24, 2006

Or simply download and use the public domain ggets, at:

<http://cbfalconer.home.att.net/download/>

"The storage has been allocated within fggets ... Freeing of assigned
storage is the callers responsibility".

This programming style is not used by the Standard C library (and
other well-known libraries). I'd be reluctant to use it in my
programs.

Best regards,
Roland Pibinger

Guest · Nov 24, 2006

Roland said:
"The storage has been allocated within fggets ... Freeing of assigned
storage is the callers responsibility".

This programming style is not used by the Standard C library (and
other well-known libraries).

For two simple examples, the style's used by POSIX's strdup, and GNU's
asprintf. I'd say both are rather well-known.

I'd be reluctant to use it in my
programs.

That, of course, is your right.

Barry Schwarz · Nov 24, 2006

"The storage has been allocated within fggets ... Freeing of assigned
storage is the callers responsibility".

This programming style is not used by the Standard C library (and
other well-known libraries). I'd be reluctant to use it in my
programs.

Isn't strdup posix and isn't that well known?


Roland Pibinger · Nov 24, 2006

For two simple examples, the style's used by POSIX's strdup, and GNU's
asprintf. I'd say both are rather well-known.

Guess why there is no strdup (and no asprintf) in the ISO C Standard?

Best regards,
Roland Pibinger

Richard Bos · Nov 24, 2006

"The storage has been allocated within fggets ... Freeing of assigned
storage is the callers responsibility".

This programming style is not used by the Standard C library (and
other well-known libraries).

Isn't it? I can't say that I'm unfamiliar with it.

I'd be reluctant to use it in my programs.

Then you're going to have a right hassle implementing con- and
destructors for, e.g., linked lists.

Richard

CBFalconer · Nov 24, 2006

Roland said:
"The storage has been allocated within fggets ... Freeing of
assigned storage is the callers responsibility".

This programming style is not used by the Standard C library (and
other well-known libraries). I'd be reluctant to use it in my
programs.

Why not? If you malloc something, you know you need to free it
when no longer needed. If you use ggets, you know you need to free
the line when no longer needed. This is not a massive memory
leap. Meanwhile you don't have to worry about buffer sizes, etc.

Roland Pibinger · Nov 24, 2006

websnarf said:
mellyshum123 said:

but you have to allot a size for the string before this. Would you just
use the same num as used in the fgets? So char stringexample[num] ?


Somehow you are just supposed to know the length. You have to guess --
usually you just overestimate or something like that. If it's too small,
you get truncated results.

Not necessarily. You only need to know if you are done (if the line is
entirely read) or not. If not, read again until the rest of the line
is read. Your code basically becomes a loop. Just assume that the
buffer is always too small to read the line in one pass.
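The "assume the buffer is always too small" approach can be sketched as follows: a fixed buffer, and a loop that keeps calling fgets() until the piece just read ends with '\n'. Here each line's total length is accumulated across pieces. The function name next_line_length and the 8-byte buffer are illustrative assumptions.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return the length of the next line (without its
   '\n'), however long it is, using only a small fixed buffer.
   Returns -1 at end of input. */
long next_line_length(FILE *fp)
{
    char buf[8];
    long total = 0;
    int got_any = 0;

    while (fgets(buf, sizeof buf, fp) != NULL) {
        size_t len = strlen(buf);
        got_any = 1;
        if (len > 0 && buf[len - 1] == '\n')
            return total + (long)len - 1;   /* this piece finished the line */
        total += (long)len;                 /* partial line: keep reading */
    }
    return got_any ? total : -1;            /* last line lacked a '\n' */
}
```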

Yeah, it doesn't make much more sense
to me either. This is just a design stupidity of the C language.

Live with, not against, your limits.

You can save yourself a lot of grief and just download The Better
String Library and its examples.

Best regards,
Roland Pibinger

Guest · Nov 24, 2006

Roland said:
Guess why there is no strdup (and no asprintf) in the ISO C Standard?

See the C99 rationale, section 0.

Bill Reid · Nov 24, 2006

I need to read in a comma separated file, and for this I was going to
use fgets. I was reading about it at http://www.cplusplus.com/ref/ and
I noticed that the document said:

"Reads characters from stream and stores them in string until (num -1)
characters have been read or a newline or EOF character is reached,
whichever comes first."

My question is that if it stops at a new line character (LF?) then how
does one read a file with multiple new line characters?

Another question. The syntax is:

char * fgets (char * string , int num , FILE * stream);

but you have to allot a size for the string before this. Would you just
use the same num as used in the fgets? So char stringexample[num] ?

OK, I've read the other responses to this and they were...shall we
say, regrettable? Except for "pathological" cases, here's all you
need to do:

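#include <stdio.h>

#define LINEMAX 512

int main(int argc, char *argv[])
{
    char csv_line[LINEMAX];
    FILE *csv_fptr;
    const char *csv_filepath;

    /* get or create a string here that is the path to the CSV file;
       this version simply takes it from the command line */
    if (argc < 2) {
        fputs("usage: give the path to a CSV file\n", stderr);
        return 1;
    }
    csv_filepath = argv[1];

    if ((csv_fptr = fopen(csv_filepath, "r")) != NULL) {
        while (fgets(csv_line, LINEMAX, csv_fptr) != NULL) {
            /* you can parse out the data from each csv_line right here */
        }
        fclose(csv_fptr);
    } else {
        fprintf(stderr, "Couldn't open %s\n", csv_filepath);
        return 1;
    }
    return 0;
}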

And you're done! Something basically exactly like this is done
like a trillion times a day without incident or regret...
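For simple files without quoted fields, one way to "parse out the data from each csv_line" is sketched below. It strips the line ending with strcspn() and then splits at each comma with strchr(). The function name split_fields is hypothetical. strtok() is avoided here because it treats consecutive commas as one separator and so silently drops empty fields.

```c
#include <string.h>

/* Hypothetical helper: split line in place at commas, storing up to
   maxfields pointers in fields[]. Removes the line ending ('\n' or
   "\r\n") first. Quoted fields are NOT handled.
   Returns the number of fields stored. */
int split_fields(char *line, char *fields[], int maxfields)
{
    int n = 0;
    char *p = line;

    line[strcspn(line, "\r\n")] = '\0';  /* drop the line ending, if any */
    while (n < maxfields) {
        char *comma = strchr(p, ',');
        fields[n++] = p;
        if (comma == NULL)
            break;
        *comma = '\0';                   /* terminate this field */
        p = comma + 1;
    }
    return n;
}
```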

Yes, you do have to declare a character array that is bigger than the
longest line you expect to encounter (by at least two bytes, to leave room
for the '\n' and the terminating '\0'; I generally use "512" as my "magic
number" for that). The stream keeps track of a "pointer" to the current
position in the file (its file position indicator), and fgets() reads from
and advances that position, so every time you call it, it starts reading
where it left off the last time...this is why it is easy to use it in a
loop like above. (If needed, you can also use fseek() and rewind() to move
the "pointer" to positions you want to read, and ftell() to find out where
it currently is.)
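Here is a small sketch of moving the file position back. It remembers where a line starts with ftell(), reads the line, then uses fseek() to return and read it again. The function name read_line_twice is hypothetical. For text streams, the C standard only guarantees fseek() with SEEK_SET to zero or to a value previously returned by ftell(), which is the pattern used here.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read the next line into a, then seek back and read
   the same line again into b. Returns 0 on success, -1 on failure. */
int read_line_twice(FILE *fp, char *a, char *b, int size)
{
    long pos = ftell(fp);                /* remember where the line starts */

    if (pos == -1L)
        return -1;
    if (fgets(a, size, fp) == NULL)
        return -1;
    if (fseek(fp, pos, SEEK_SET) != 0)   /* go back to the saved position */
        return -1;
    if (fgets(b, size, fp) == NULL)
        return -1;
    return strcmp(a, b) == 0 ? 0 : -1;
}
```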

Keith Thompson · Nov 24, 2006

CBFalconer said:
Why not? If you malloc something, you know you need to free it
when no longer needed. If you use ggets, you know you need to free
the line when no longer needed. This is not a massive memory
leap. Meanwhile you don't have to worry about buffer sizes, etc.

Exactly. For any resource, there needs to be a way to allocate it and
a way to release it. For raw chunks of memory, the allocation and
deallocation routines are "malloc" and "free". For stdio streams,
they're called "fopen" and "fclose". For the ggets interface (if I
understand it correctly), they're called "ggets" and "free".

It might not have been a bad idea to have a special-purpose
deallocation function, say "ggets_release"; it would be a simple wrapper
around "free", but it would leave room for more complex actions in a
future version. But I don't think it's really necessary.

Roland Pibinger · Nov 24, 2006

Why not?

Because responsibilities become unclear. Simple rules like 'whoever
allocates something must deallocate it' don't work any more.

If you malloc something, you know you need to free it
when no longer needed.

Ok, that's symmetric.

If you use ggets, you know you need to free
the line when no longer needed.

That's unsymmetric. The user can easily forget the 'free'.
It's all about style. Maybe someone can tell the story why strdup was
excluded from the C Standard (I'm not a C historian and don't want to
become one).

Best regards,
Roland Pibinger

Keith Thompson · Nov 24, 2006

Because responsibilities become unclear. Simple rules like 'whoever
allocates something must deallocate it' don't work any more.

Ok, that's symmetric.

That's unsymmetric. The user can easily forget the 'free'.

malloc() allocates; free() frees.

ggets() allocates; free() frees.

It all seems sufficiently symmetric to me. The user has to remember
the free() in either case.

CBFalconer · Nov 24, 2006

Roland said:
Because responsibilities become unclear. Simple rules like 'whoever
allocates something must deallocate it' don't work any more.

They don't work anyhow for anything other than the simplest code.

Ok, that's symmetric.

Oh? I would think you would be #defining unmalloc free. How are
you handling freeing after realloc, or calloc?

That's unsymmetric. The user can easily forget the 'free'. It's
all about style. Maybe someone can tell the story why strdup was
excluded from the C Standard (I'm not a C historian and don't
want to become one).

Well, implementing strdup is much simpler than implementing ggets.
There also isn't a dangerous version (e.g. gets) to be replaced.
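To illustrate how small strdup() is, here is a sketch of a portable version. It is named dup_string here because external identifiers beginning with "str" followed by a lowercase letter are reserved for future use by the standard library. It follows the same convention argued over above: the function allocates, and the caller calls free().

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical strdup() work-alike: returns a malloc()ed copy of s,
   or NULL if memory is exhausted. The caller must free() the result. */
char *dup_string(const char *s)
{
    size_t size = strlen(s) + 1;         /* include the '\0' */
    char *copy = malloc(size);

    if (copy != NULL)
        memcpy(copy, s, size);
    return copy;
}
```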
