fgets - design deficiency: no efficient way of finding last character read

Keith Thompson · Nov 14, 2013

James Kuyper said:
For many functions with return values, the set of values that can be
represented in the return type can be partitioned into three subsets:
A) Values indicating successful operation
B) Values indicating some kind of a problem
C) Values that should not be returned.

If you assume that nothing can go wrong, checking whether the return
value is in set A is the same as checking whether the return value is
not in set B. People routinely write code based upon that assumption,
testing whichever condition is easier to express or evaluate.
I don't like making such assumptions. Among other possibilities:

1. I typed the wrong function name.
2. I remembered incorrectly which return values are in each of the three
categories.
3. The code got linked to a different library than it should have been,
containing a different function with the same name.
4. Some other part of the code contains a defect rendering the behavior
of the code undefined, and the first testable symptom of that fact is
that this particular function returns a value it's not supposed to be
able to return.
5. Other.

Therefore, I prefer to write my code so that values in set C get treated
the same way as values in set B. The problem I'm trying to deal with is
very rare, so I don't bother doing this if doing so would make the code
significantly more complicated. However, adding "== buf" falls below my
threshold for "significantly more complicated".

[...]

Interesting approach. It seems (to me) slightly obscure because I tend
not to think about fgets() returning its first argument, since it's not
a particularly useful value to return.

A more paranoid approach would be:

    if ((result = fgets(buf, sizeof buf, fp)) == buf) {
        /* ok */
    }
    else if (result == NULL) {
        /* end-of-file or error */
    }
    else {
        /* THIS SHOULD NEVER HAPPEN, print a stern warning and abort */
    }

But unless you're writing a test suite for the standard library,
checking for illegal results from standard library functions probably
isn't worth the extra effort.

James Kuyper · Nov 14, 2013

For many functions with return values, the set of values that can be
represented in the return type can be partitioned into three subsets:
A) Values indicating successful operation
B) Values indicating some kind of a problem
C) Values that should not be returned.

If you assume that nothing can go wrong, checking whether the return
value is in set A is the same as checking whether the return value is
not in set B. People routinely write code based upon that assumption,
testing whichever condition is easier to express or evaluate.
I don't like making such assumptions. Among other possibilities:

1. I typed the wrong function name.
2. I remembered incorrectly which return values are in each of the three
categories.
3. The code got linked to a different library than it should have been,
containing a different function with the same name.
4. Some other part of the code contains a defect rendering the behavior
of the code undefined, and the first testable symptom of that fact is
that this particular function returns a value it's not supposed to be
able to return.
5. Other.

Therefore, I prefer to write my code so that values in set C get treated
the same way as values in set B. The problem I'm trying to deal with is
very rare, so I don't bother doing this if doing so would make the code
significantly more complicated. However, adding "== buf" falls below my
threshold for "significantly more complicated".

[...]

Interesting approach. It seems (to me) slightly obscure because I tend
not to think about fgets() returning its first argument, since it's not
a particularly useful value to return.

Yes, a pointer to (or perhaps, just after?) the last character written
to the buffer might be more useful, in some circumstances. The same
issue of relatively useless return values is ubiquitous in the
string-handling functions.
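
To make the cost concrete, here is a minimal sketch of what a caller has to do today to locate the end of the data that fgets() just stored. The name read_line_end is hypothetical, not part of any library. Note that strlen() also gives the wrong answer if the input itself contains a '\0' character.

```c
#include <stdio.h>
#include <string.h>

/* Read one line with fgets() and return a pointer to the terminating '\0',
 * or NULL on end-of-file or error. The strlen() call is the extra pass
 * under discussion: fgets() already knew where it stopped writing. */
char *read_line_end(char *buf, int size, FILE *fp)
{
    if (fgets(buf, size, fp) != buf)
        return NULL;
    return buf + strlen(buf);   /* second scan over the data just stored */
}
```

If the returned pointer is end, then end[-1] == '\n' tells the caller that a complete line was read rather than a truncated one.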

A more paranoid approach would be:

    if ((result = fgets(buf, sizeof buf, fp)) == buf) {
        /* ok */
    }
    else if (result == NULL) {
        /* end-of-file or error */
    }
    else {
        /* THIS SHOULD NEVER HAPPEN, print a stern warning and abort */
    }

But unless you're writing a test suite for the standard library,
checking for illegal results from standard library functions probably
isn't worth the extra effort.

Yes, going that far would exceed my threshold for "significantly more
complicated". For a third party library that was notorious for being
poorly implemented, such an approach might be more reasonable (assuming
that you had to use it, despite that notoriety).
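
For comparison, the cheaper "== buf" form discussed above might look like the sketch below. The function name count_reads and the buffer size are made up for illustration. The point is only that a NULL return and any return value that "should not happen" both end the loop.

```c
#include <stdio.h>

/* Count the fgets() calls that succeed on fp. Any return value other than
 * buf itself (NULL, or a value fgets() should never return) ends the loop. */
long count_reads(FILE *fp)
{
    char buf[256];
    long n = 0;

    while (fgets(buf, sizeof buf, fp) == buf)
        n++;
    return n;
}
```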

jononanon · Nov 14, 2013

[...]
Interesting approach. It seems (to me) slightly obscure because I tend
not to think about fgets() returning its first argument, since it's not
a particularly useful value to return.

Yes, a pointer to (or perhaps, just after?) the last character written
to the buffer might be more useful, in some circumstances. The same
issue of relatively useless return values is ubiquitous in the
string-handling functions.

How about this:
fgets2 should return a pointer to the final '\0' if it was written, or else return NULL if feof() or ferror() is set.

(Oh, and fgets2(buf, n, fp) should definitely not write a '\0' if n == 1, but instead write nothing and then always return NULL.)
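
A rough sketch of such an fgets2, written on top of getc() rather than a stream's internal buffer, could look like this. It is one possible reading of the description above, not a definitive implementation. In particular, it assumes that a final line without a trailing '\n' still counts as success (as with fgets()), and that NULL is returned only when nothing could be read or a read error occurred.

```c
#include <stdio.h>

/* Hypothetical fgets2(): like fgets(), but on success returns a pointer to
 * the terminating '\0' instead of buf. Returns NULL, writing nothing, when
 * n <= 1, and NULL when no characters could be read. */
char *fgets2(char *buf, int n, FILE *fp)
{
    char *p = buf;
    int c = EOF;

    if (n <= 1)
        return NULL;                 /* no room for data: write nothing */

    while (--n > 0 && (c = getc(fp)) != EOF) {
        *p++ = (char)c;
        if (c == '\n')
            break;                   /* the newline is kept, as in fgets() */
    }
    if (p == buf || (c == EOF && ferror(fp)))
        return NULL;                 /* nothing read, or a read error */
    *p = '\0';
    return p;                        /* points at the final '\0' */
}
```

With end = fgets2(buf, sizeof buf, fp), the line length is end - buf and the last character read is end[-1], with no extra strlen() pass.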

But in any case, it's too late to change the standardized fgets().

GNU recommends using getline instead.
http://www.gnu.org/software/libc/manual/html_node/Line-Input.html
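
For what it's worth, getline() does solve the "where did the line end" problem, because it returns the number of characters read. A minimal sketch, assuming a POSIX.1-2008 system (getline() is not ISO C); the helper name last_char_of_next_line is hypothetical:

```c
#define _POSIX_C_SOURCE 200809L   /* getline() is POSIX, not ISO C */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Return the last character of the next line of fp (usually '\n'),
 * or EOF if no line could be read. getline() reports the length,
 * so no strlen() pass is needed. */
int last_char_of_next_line(FILE *fp)
{
    char *line = NULL;     /* getline() allocates and grows this buffer */
    size_t cap = 0;
    ssize_t len = getline(&line, &cap, fp);
    int last = (len > 0) ? (unsigned char)line[len - 1] : EOF;

    free(line);            /* must be freed even when getline() fails */
    return last;
}
```

Real code would keep line and cap alive across calls instead of freeing the buffer every time; this version only shows where the length comes from.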

Comparing GNU's implementation of fgets to Dinkumware's is interesting! Dinkumware's fgets actually uses memchr(pt, '\n', len) to locate a '\n' and follows this by memcpy(s, pt, m), meaning that it iterates over the same buffer two times: first searching, then copying. Slow.
But it is very nicely readable I must say!

GNU introduced low-level functions.
For fgets()
https://sourceware.org/git/?p=glibc.git;a=blob;f=libio/iofgets.c;hb=HEAD
they use _IO_getline() which looks ummm... nice (but is less readable I think)
https://sourceware.org/git/?p=glibc.git;a=blob;f=libio/iogetline.c;hb=HEAD
but to my shock also has memchr() followed by memcpy(). Slow.

At least GNU's memcpy copies wordwise on longword boundaries.
https://sourceware.org/git/?p=glibc.git;a=blob;f=string/memchr.c
I'm not sure if Dinkumware does this.

If I were to roll my own library implementation, I'd do one that searches the stream's internal buffer (reading longwords and accessing the bytes), and immediately copies the longword if neither EOF nor found '\n'... etc.
i.e. iterating over stuff only once, and doing nice alignment etc.
Something like that.
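
To show the difference I mean, here is a simplified sketch of the two strategies on a plain memory buffer. Neither function is taken from glibc or Dinkumware, and the names are made up. The one-pass version works bytewise for clarity and does none of the longword or alignment tricks. Whether it actually beats an optimized memchr() plus memcpy() would have to be measured.

```c
#include <stddef.h>
#include <string.h>

/* Two passes: memchr() finds the newline, memcpy() copies up to and
 * including it. Returns the number of bytes copied into dst. */
size_t copy_line_two_pass(char *dst, const char *src, size_t len)
{
    const char *nl = memchr(src, '\n', len);
    size_t m = nl ? (size_t)(nl - src) + 1 : len;

    memcpy(dst, src, m);
    return m;
}

/* One pass: each byte is examined and copied in the same loop.
 * Returns the number of bytes copied into dst. */
size_t copy_line_one_pass(char *dst, const char *src, size_t len)
{
    size_t i = 0;

    while (i < len) {
        char c = src[i];
        dst[i++] = c;
        if (c == '\n')
            break;
    }
    return i;
}
```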

Malcolm McLean · Nov 14, 2013

If I were to roll my own library implementation, I'd do one that searches
the stream's internal buffer (reading longwords and accessing the bytes),
and immediately copies the longword if neither EOF nor found '\n'... etc.

i.e. iterating over stuff only once, and doing nice alignment etc.
Something like that.

You can certainly return a quad from the buffer, then test for '\n' using
four masks and comparators. But EOF is harder to code.
Of course it depends on whether you expect to be in an environment which makes
much use of physical input streams or not. If you expect most input to be
via Unix-like pipes and so on, it makes sense to optimise fgets(). But not if
you're reading from a keyboard.
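
As a sketch of the quad test, assuming 32-bit quads and a character set where '\n' is 0x0A (true for ASCII): the first function uses four masks and comparisons, the second uses the well-known "has a zero byte" bit trick after an XOR. The function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Nonzero if any of the four bytes of w equals 0x0A ('\n' in ASCII),
 * using one mask and one comparison per byte. */
int quad_has_newline_masks(uint32_t w)
{
    return (w & 0x000000FFu) == 0x0000000Au
        || (w & 0x0000FF00u) == 0x00000A00u
        || (w & 0x00FF0000u) == 0x000A0000u
        || (w & 0xFF000000u) == 0x0A000000u;
}

/* Same test without per-byte comparisons: the XOR turns every 0x0A byte
 * into zero, then the "has a zero byte" trick detects any zero byte. */
int quad_has_newline_bits(uint32_t w)
{
    uint32_t x = w ^ 0x0A0A0A0Au;

    return ((x - 0x01010101u) & ~x & 0x80808080u) != 0;
}

/* Load four bytes without alignment or aliasing problems. */
uint32_t load_quad(const char *p)
{
    uint32_t w;

    memcpy(&w, p, sizeof w);
    return w;
}
```

Both tests only say that a newline is somewhere in the quad; finding which byte it is, and handling a buffer that ends mid-quad, is the part that takes more code.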
