Newbie-question: scanf alternatives?

CBFalconer · Sep 21, 2004

Dan said:
.... snip ...

fgets() would be a nice function if it did that. Unfortunately, the
non-consumed input from the same line is left in the stream, to be read
as a brand new line, by the next fgets() or whatever call.

It would be a terrible function if it did that. Incoming
information would be discarded with no warning, leaving no good
fundamental string inputting routine. As it is, a truncating
version is easily built from it, which also strips the '\n's :

char *fgetstrunc(char *buf, size_t sz, FILE *f)
{
    char *rv, *eolp;
    int ch;

    rv = fgets(buf, sz, f);
    if (rv) {
        if (eolp = strchr(buf, '\n')) *eolp = '\0';
        else {
            while (EOF != (ch = getc(f)) && ('\n' != ch)) continue;
            /* Handle unterminated final lines */
            if (EOF == ch) ungetc(ch, f); /* questionable coding */
        }
    }
    return rv;
} /* untested */

Dan Pop · Sep 21, 2004

CBFalconer said:
It would be a terrible function if it did that. Incoming
information would be discarded with no warning,

If this is not acceptable, you don't use it. It's as simple as that.

For most applications, a line of input exceeding a certain (application
specific) size is garbage. That's why a truncating fgets() would be a
much better choice than the current one.

leaving no good fundamental string inputting routine.

When did they drop fscanf() from the standard?

As it is, a truncating
version is easily built from it, which also strips the '\n's :

char *fgetstrunc(char *buf, size_t sz, FILE *f)
{
    char *rv, *eolp;
    int ch;

    rv = fgets(buf, sz, f);
    if (rv) {
        if (eolp = strchr(buf, '\n')) *eolp = '\0';
        else {
            while (EOF != (ch = getc(f)) && ('\n' != ch)) continue;

Because \n is more likely to be encountered first, it makes more sense
to test for it first. The code would also better reflect your actual
intentions (you're *really* looking for a newline character and an EOF
value is the exception that must be dealt with).

            /* Handle unterminated final lines */
            if (EOF == ch) ungetc(ch, f); /* questionable coding */

It's actually brain dead coding:

4 If the value of c equals that of the macro EOF, the operation
fails and the input stream is unchanged.

        }
    }
    return rv;
} /* untested */

Try to implement it without using fgets() at all and compare the two
versions. Did the usage of fgets() buy you anything at all?

After that, compare to an fscanf-based solution, that doesn't need any
loops and/or nested ifs.

Dan
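
A minimal sketch of the kind of fscanf-based truncating reader Dan alludes to, not his actual code. The name read_line_trunc and the 31-character limit are assumptions made for illustration. The first directive stores at most 31 characters of the line, the second discards whatever is left, and a single getc() eats the newline.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper, not from the thread: reads one line into buf,
 * keeps at most 31 characters and silently discards the rest of the line.
 * buf must have room for at least 32 bytes.
 * Returns 0 on success, EOF when no input was left. */
int read_line_trunc(char *buf, FILE *f)
{
    int rc;

    assert(buf != NULL && f != NULL);

    /* %31[^\n] stores up to 31 non-newline characters;
     * %*[^\n] then skips whatever remains of the line. */
    rc = fscanf(f, "%31[^\n]%*[^\n]", buf);
    if (rc == EOF)
        return EOF;
    if (rc == 0)
        buf[0] = '\0';      /* empty line: %[ matched nothing */

    (void)getc(f);          /* consume the newline itself, if present */
    return 0;
}
```

Note the one wrinkle: an empty line makes the first directive fail, so the caller-visible buffer has to be cleared by hand. The truncation is silent, which is exactly the property the two sides of this thread disagree about.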

Malcolm · Sep 21, 2004

Edmund Bacon said:
Perhaps you just need to know how to use fgets() appropriately:

You weren't around here about four years ago. You'd be amazed how many
experienced programmers didn't understand that the trailing newline must be
checked, and not discarded, despite having it pointed out time after time
again.

The attitude was, it doesn't show UB (unlike gets()), therefore it must be
OK.

while( buff[strlen(buff) -1 ] != '\n')
{
    fgets(buff, sizeof buff, stdin);
    len += strlen(buff);
    ptr = realloc(ptr, len+1); /* error checking omitted */
    strcat(ptr, buff);
}

if(ptr)
    ptr[len-1] = '\0'; /* strip trailing newline */

return ptr;

That's the sort of thing you've got to do, but you've written ten lines of
code to do it, and it still isn't complete (realloc() not checked for out of
memory). It is also a rather inefficient algorithm.
It's actually probably easier to build a line reader on top of fgetc().
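
A rough sketch of what a line reader built directly on getc() might look like. The name read_line_alloc and the starting capacity of 64 are assumptions for illustration, not anything posted in the thread. The buffer doubles when it fills, and realloc() goes through a temporary pointer so the old block is not lost on failure.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: returns a malloc'd line without its '\n',
 * or NULL at end of input, on read error, or if memory runs out.
 * The caller frees the result. */
char *read_line_alloc(FILE *f)
{
    size_t cap = 64, len = 0;
    char *line, *tmp;
    int ch;

    assert(f != NULL);

    line = malloc(cap);
    if (line == NULL)
        return NULL;

    while ((ch = getc(f)) != '\n' && ch != EOF) {
        if (len + 1 == cap) {           /* keep one byte for '\0' */
            tmp = realloc(line, cap * 2);
            if (tmp == NULL) {
                free(line);
                return NULL;
            }
            line = tmp;
            cap *= 2;
        }
        line[len++] = (char)ch;
    }

    /* EOF with nothing read, or a read error: no line to return */
    if (ch == EOF && (len == 0 || ferror(f))) {
        free(line);
        return NULL;
    }
    line[len] = '\0';
    return line;
}
```

An unterminated final line is returned like any other line; the next call then returns NULL.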

Malcolm · Sep 21, 2004

Richard Bos said:
Don't bother. Use fgets() instead. Dan Pop is just about the only poster
in this group who thinks fgets() is broken; most of the rest of us think
scanf() is unnecessarily complicated unless your requirements involve
single line lengths only, and fgets() does exactly what it should do.

I wouldn't go so far as to say fgets() is "broken", but it presents
temptations to bad coding which many programmers fall into, even regs.

CBFalconer · Sep 21, 2004

Dan said:
.... snip ...

When did they drop fscanf() from the standard?

Notice the word 'fundamental'. That was intended to prevent silly
comments.

Because \n is more likely to be encountered first, it makes more sense
to test for it first. The code would also better reflect your actual
intentions (you're *really* looking for a newline character and an EOF
value is the exception that must be dealt with).

If you look closely above you will see \n is tested first. The
loop which is flushing overlong lines is not as likely to occur,
and will almost certainly be dominated by the getc call. Your
comment, while accurate, is a chimera.

It's actually brain dead coding:

4 If the value of c equals that of the macro EOF, the operation
fails and the input stream is unchanged.

You failed to show the full fgets return specification, below:

Returns

[#3] The fgets function returns s if successful. If end-of-
file is encountered and no characters have been read into
the array, the contents of the array remain unchanged and a
null pointer is returned. If a read error occurs during the
operation, the array contents are indeterminate and a null
pointer is returned.

The 'questionable coding' is only because the action of ungetting
EOF is not guaranteed. The whole system action with a final
unterminated line is not well defined. Systems exist, in fact
they are common, where an EOF is automatically cancelled after
emission. The code attempts to simulate that missing final \n and
preserve a future EOF for other coding, including a further call
to this routine.

Try to implement it without using fgets() at all and compare the
two versions. Did the usage of fgets() buy you anything at all?

Possibly, it depends on the unspecified relative performance of
getc and fgets. However the point of the article was to
demonstrate that a truncating fgets can be built from the existing
routine, while the reverse is not possible. This would not be
overly evident if the routine did not use fgets at all.

c453___ · Sep 21, 2004

NEVER, NEVER, NEVER use gets. Look up fgets instead.
M$ lsass.exe & netrap.dll (wXP,2k,2k3) must have been written with fgets()

we have blaster & sasser thanks to that

Allin Cottrell · Sep 21, 2004

Dan said:
fgets() would be a nice function if it did that. Unfortunately, the
non-consumed input from the same line is left in the stream, to be read
as a brand new line, by the next fgets() or whatever call.

fgets() is a great function for applications that only need to read one
line of input from each stream and don't care if the line was too long
to fit in the buffer.

Point taken, but I think overstated. In a text-processing context, it's
often reasonable to assume that a "line" should not be more than, say,
1024 bytes in length, and 1k is by no means an excessive size for
an automatic buffer to read into using fgets.

Provided the code has reasonable error checking -- not hard to arrange
if in context one has fairly strong expectations of what a "line" might
contain -- it should be possible to reject input that contains
multi-kilobyte "lines" as malformed.

Allin Cottrell
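
One way the "reject over-long lines" policy could be sketched with fgets(). The names read_checked_line and enum line_status are hypothetical, chosen only for this example. The helper strips the newline, and when none was stored it peeks at the next character to tell a line that exactly filled the buffer from one that really overflowed it.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum line_status { LINE_OK, LINE_TOO_LONG, LINE_EOF };

/* Hypothetical helper: reads one line with fgets() and strips the '\n'.
 * If the line does not fit in buf, the remainder is drained from the
 * stream and LINE_TOO_LONG is returned so the caller can reject it. */
enum line_status read_checked_line(char *buf, size_t size, FILE *f)
{
    char *nl;
    int ch;

    assert(buf != NULL && size > 1 && f != NULL);

    if (fgets(buf, (int)size, f) == NULL)
        return LINE_EOF;        /* end of input or read error */

    nl = strchr(buf, '\n');
    if (nl != NULL) {
        *nl = '\0';
        return LINE_OK;
    }

    /* No newline stored: look at what comes next in the stream. */
    ch = getc(f);
    if (ch == '\n' || ch == EOF)
        return LINE_OK;         /* exact fit, or final line lacks '\n' */

    while ((ch = getc(f)) != '\n' && ch != EOF)
        continue;               /* drain the rest of the over-long line */
    return LINE_TOO_LONG;
}
```

With a 1024-byte automatic buffer, as suggested above, a caller could treat LINE_TOO_LONG as grounds for rejecting the whole file, while the stream stays positioned at the start of the next line either way.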

Malcolm · Sep 22, 2004

Allin Cottrell said:
Point taken, but I think overstated. In a text-processing context, it's
often reasonable to assume that a "line" should not be more than, say,
1024 bytes in length, and 1k is by no means an excessive size for
an automatic buffer to read into using fgets.

Exactly. It may be necessary to restrict names of people to 31 characters.
However if the line containing the name is over 31 characters, then the
input could still be legitimate - it may be someone with a very long name
who doesn't know that he has to shorten it. We don't necessarily want to
reject the whole input just because one person has a name too long to fit.
On the other hand, if the name is over 1024 characters, it is impossible
that this input is legitimate. Either someone has fed a corrupted file to
the program, or there is a malicious exploit attempt going on. Either way,
it is probably OK to reject the whole file.

Giorgos Keramidas · Sep 22, 2004

CBFalconer said:
It would be a terrible function if it did that. Incoming
information would be discarded with no warning, leaving no good
fundamental string inputting routine. As it is, a truncating
version is easily built from it, which also strips the '\n's :

char *fgetstrunc(char *buf, size_t sz, FILE *f)
{
    char *rv, *eolp;
    int ch;

    rv = fgets(buf, sz, f);
    if (rv) {
        if (eolp = strchr(buf, '\n')) *eolp = '\0';
        else {
            while (EOF != (ch = getc(f)) && ('\n' != ch)) continue;
            /* Handle unterminated final lines */
            if (EOF == ch) ungetc(ch, f); /* questionable coding */

Very questionable indeed. I'm not sure what happens when you ungetc(EOF).
Mostly because this is part of the ungetc() manpage here:

If a character is successfully pushed-back, the end-of-file indicator
for the stream is cleared.

        }
    }
    return rv;
} /* untested */

Getting a full line, even if fgets() for some reason doesn't get one in a
single call by calling fgets() repeatedly with an advancing pointer inside
buf[], until either '\n' or EOF is met, is not so hard. I commonly found
myself writing something similar to this:

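: char *
: getline(char *buf, size_t len, FILE *fp)
: {
:         char *p;
:         size_t nbytes;
:
:         if (fp == NULL || buf == NULL || len == 0)
:                 return NULL;
:
:         p = buf;
:         while (len > 1 && !ferror(fp) && !feof(fp) &&
:             fgets(p, len, fp) != NULL) {
:                 nbytes = strlen(p);
:                 if (nbytes > 0 && p[nbytes - 1] == '\n') {
:                         p[nbytes - 1] = '\0';
:                         return buf;     /* got a complete line */
:                 }
:                 p += nbytes;    /* advance so the next fgets() appends */
:                 len -= nbytes;
:         }
:         if (ferror(fp) || p == buf) {
:                 buf[0] = '\0';  /* read error, or nothing was read */
:                 return NULL;
:         }
:         return buf;             /* buffer full or unterminated last line */
: }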

This, of course, doesn't solve the problem of having a too small buf[]
in the first place, so I later developed a version that allocates the
returned string buffer.

Ben Pfaff · Sep 22, 2004

CBFalconer said:
if (EOF == ch) ungetc(ch, f); /* questionable coding */

Questionable indeed:

7.19.7.11 The ungetc function
Synopsis
1 #include <stdio.h>
int ungetc(int c, FILE *stream);
Description
[...]
4 If the value of c equals that of the macro EOF, the operation
fails and the input stream is unchanged.

Richard Bos · Sep 22, 2004

c453___ said:
M$ lsass.exe & netrap.dll (wXP,2k,2k3) must have been written with fgets()

we have blaster & sasser thanks to that

I think you mean they must have been written with gets(). fgets()
wouldn't have caused any buffer overflows; gets() does.
But I don't think they were; M$ probably used their own, unportable,
Windows-specific file reading functions. That those functions take a
byte count, just like fgets(), makes it all the more inexcusable that
they still get this wrong after so many years; and that they get it
wrong again, and again, and again.

Richard

Joe Wright · Sep 22, 2004

Ben said:
if (EOF == ch) ungetc(ch, f); /* questionable coding */

Questionable indeed:

7.19.7.11 The ungetc function
Synopsis
1 #include <stdio.h>
int ungetc(int c, FILE *stream);
Description
[...]
4 If the value of c equals that of the macro EOF, the operation
fails and the input stream is unchanged.

The input stream is unchanged because, at EOF, there is no input to
change. I suppose another getc(stream) would return EOF again. The
casual observer wouldn't know it didn't work.
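
A small probe, offered as a sketch, that makes the quoted paragraph observable. The function name ungetc_eof_is_noop is made up for this example. It drains a stream, attempts ungetc(EOF, f), and checks both the return value and the next read.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical probe: drains f to end-of-file, tries to push EOF back,
 * and reports what happened. Returns 1 if ungetc(EOF, f) failed and the
 * next getc() still reports EOF, 0 otherwise. */
int ungetc_eof_is_noop(FILE *f)
{
    int ch, pushed, next;

    assert(f != NULL);

    while ((ch = getc(f)) != EOF)
        continue;                   /* reach end-of-file */

    pushed = ungetc(EOF, f);        /* the standard says this fails */
    next = getc(f);                 /* nothing was pushed back to read */

    return pushed == EOF && next == EOF;
}
```

On an ordinary file the probe should return 1 and leave the end-of-file indicator set, since a failed ungetc() does not touch the stream. What an interactive terminal delivers after an EOF is a separate question that this sketch does not settle.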

Dan Pop · Sep 22, 2004

CBFalconer said:
Notice the word 'fundamental'. That was intended to prevent silly
comments.

Please engage your brain and explain what makes fscanf() any less
fundamental than any other standard C library function. Chapter and
verse welcome.

If you look closely above you will see \n is tested first.

Not in the code I was commenting about.

The
loop which is flushing overlong lines is not as likely to occur,
and will almost certainly be dominated by the getc call. Your
comment, while accurate, is a chimera.

Had you engaged your brain, you'd have noticed that I've never invoked
code performance as an argument in my comment. If writing code that
best reflects your intentions is a chimera for you, maybe you should
stop posting code here...

It's actually brain dead coding:

4 If the value of c equals that of the macro EOF, the operation
                                                  ^^^^^^^^^^^^^
fails and the input stream is unchanged.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
You failed to show the full fgets return specification, below:

Returns

[#3] The fgets function returns s if successful. If end-of-
file is encountered and no characters have been read into
the array, the contents of the array remain unchanged and a
null pointer is returned. If a read error occurs during the
operation, the array contents are indeterminate and a null
pointer is returned.

None of which is relevant in context.

The 'questionable coding' is only because the action of ungetting
EOF is not guaranteed.

Are you reading impaired?!? It *is* guaranteed to fail.

The whole system action with a final
unterminated line is not well defined. Systems exist, in fact
they are common, where an EOF is automatically cancelled after
emission. The code attempts to simulate that missing final \n and
preserve a future EOF for other coding, including a further call
to this routine.

This simulation is guaranteed to fail, according to the standard.
What part of "the operation fails and the input stream is unchanged" was
too difficult for you to understand?

Possibly, it depends on the unspecified relative performance of
getc and fgets.

Where did I mention the word or concept of "performance" in my post?
Since this is I/O bound code, it should have been obvious that the
overriding concern was code complexity and not code performance.

However the point of the article was to
demonstrate that a truncating fgets can be built from the existing
routine, while the reverse is not possible. This would not be
overly evident if the routine did not use fgets at all.

Your article failed to show the necessity of fgets, as currently defined
by the C standard, in the first place. It successfully showed the
relatively large overheads involved by the safe usage of this function,
due to its poor design. Which was actually my point: using fscanf instead
results in significantly simpler code.

Dan

Dan Pop · Sep 22, 2004

Allin Cottrell said:
Point taken, but I think overstated. In a text-processing context, it's
often reasonable to assume that a "line" should not be more than, say,
1024 bytes in length, and 1k is by no means an excessive size for
an automatic buffer to read into using fgets.

Provided the code has reasonable error checking -- not hard to arrange
if in context one has fairly strong expectations of what a "line" might
contain -- it should be possible to reject input that contains
multi-kilobyte "lines" as malformed.

But even then, the code is simpler if it doesn't use fgets() in the
first place. Try it and see for yourself. Which prompts the question:
what is fgets() good for? The *only* answer I could find can be still
seen in the included text, above.

Dan

CBFalconer · Sep 22, 2004

Giorgos said:
.... snip ...

Getting a full line, even if fgets() for some reason doesn't get
one in a single call by calling fgets() repeatedly with an
advancing pointer inside buf[], until either '\n' or EOF is met,
is not so hard. I commonly found myself writing something
similar to this:

.... snip ...

I suggest you try ggets, available at:

<http://cbfalconer.home.att.net/download/>

CBFalconer · Sep 22, 2004

Ben said:
CBFalconer said:

if (EOF == ch) ungetc(ch, f); /* questionable coding */

Questionable indeed:

7.19.7.11 The ungetc function
Synopsis
1 #include <stdio.h>
int ungetc(int c, FILE *stream);
Description
[...]
4 If the value of c equals that of the macro EOF, the
operation fails and the input stream is unchanged.

See, you agree with me. Questionable in that it is very likely
not to have the desired effect, but it will not crash the
program. The desired effect is to preserve the EOF condition for
any other i/o call.

Felipe Magno de Almeida · Sep 22, 2004

Malcolm said:
I wouldn't go so far as to say fgets() is "broken", but it presents
temptations to bad coding which many programmers fall into, even regs.

which temptations to bad coding?

--
Felipe Magno de Almeida
UIN: 2113442
email: felipe.almeida@ic unicamp br, felipe.m.almeida@gmail com
I am a C, modern C++, MFC, ODBC, Windows Services, MAPI developer
from synergy, and Computer Science student from State
University of Campinas(UNICAMP).
To know more about:
Unicamp: http://www.ic.unicamp.br
Synergy: http://www.synergy.com.br
current work: http://www.mintercept.com

Felipe Magno de Almeida · Sep 22, 2004

Chris said:
fgets() will fix this problem, but adds a new one. What if over 1023
characters are entered, and the partly-read input is processed as whole? The
results are quite likely to be much worse than the undefined behaviour that
results from using gets() ...

s/worse/better/

Seriously, which is more annoying: that your program produces bad
output based on bad input, or that the latest malware takes over
your machine, pops up 10,000,000 porn windows, etc? I would much
rather have garbage output from garbage input, than the latest
security breach.

Fortunately, fgets() leaves a trailing newline in the buffer, to indicate
that it has read the line correctly.

So what we need to do is

if(!strrchr(name, '\n'))
{
    fprintf(stderr, "Input too long\n");
    exit(EXIT_FAILURE);
}

Or, instead of exiting, just consume up to and including the next
newline:

void discard_a_line(FILE *fp) {
    int c;

    while ((c = getc(fp)) != '\n' && c != EOF)
        continue;
}

and do something reasonable -- whatever that may be -- with the
partial input line in "name".

Well, just discard the line that is longer than the buffer and warn
the user; that is not hard to do at all...


Felipe Magno de Almeida · Sep 22, 2004

Edmund said:
The danger with fgets() is that it simply truncates over-long input. For
some applications, this can result in reasonable-seeming but wrong values,
which is the worst possible case (if you send a gas bill for six billion
dollars to an old granny then it is merely embarrassing, if you send a bill
for two hundred and thirty dollars when the real amount is one hundred and
ninety, you could easily end up in court facing awkward questions).

Perhaps you just need to know how to use fgets() appropriately:

consider:

$ cat sample.c

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char *get_input(void)
{
    char *ptr = 0;
    char buff[10] = {0};
    size_t len = 0;

    fputs("--> ", stdout);
    fflush(stdout);

    while( buff[strlen(buff) -1 ] != '\n')
    {
        fgets(buff, sizeof buff, stdin);
        len += strlen(buff);
        ptr = realloc(ptr, len+1); /* error checking omitted */
        strcat(ptr, buff);
    }

    if(ptr)
        ptr[len-1] = '\0'; /* strip trailing newline */

    return ptr;
}

int main()
{

    while( !feof(stdin) )
    {
        char *ptr = get_input();

        if(ptr)
        {
            printf( "input was: \"%s\" : %d characters\n",
                    ptr, strlen(ptr));
            free(ptr);
        }
    }

    return 0;
}

$ gcc sample.c -o sample -Wall -W -pedantic -ansi

$ sample
--> a
input was: "a" : 1 characters
--> abcdefgji
input was: "abcdefgji" : 9 characters
--> abcdefghijklmnopqrstuvwxyz
input was: "abcdefghijklmnopqrstuvwxyz" : 26 characters
-->

It doesn't appear that "fgets() is truncating over long input."
At least not on my system.

If this were going into production, I'd want to handle feof() more
gracefully, I'd definitely want to do something if realloc() failed,
and I would set my input buffer to a reasonably large number (perhaps
1024), so that I'm only calling realloc once for most input. But I
don't have to worry about buffer over-runs, or (within the limits of
alloc()) truncating user input.

I liked this idea


Randy Howard · Sep 22, 2004

while( buff[strlen(buff) -1 ] != '\n')
{
    fgets(buff, sizeof buff, stdin);
    len += strlen(buff);
    ptr = realloc(ptr, len+1); /* error checking omitted

as you say, failure would need to be dealt with.

You had better use a tmp pointer for this, or you're doomed
if it fails.
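
A possible rework of the quoted loop along these lines, as a sketch only. The name get_line, the FILE * parameter and the 128-byte chunk size are assumptions for illustration. Besides routing realloc() through a temporary pointer, it checks the return value of fgets(), and it avoids two other hazards in the quoted code: reading buff[strlen(buff) - 1] while buff is still empty, and calling strcat() on freshly allocated, uninitialized memory.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical rework of get_input(): reads one line of any length from f.
 * Returns a malloc'd string without the '\n', or NULL at end of input,
 * on read error, or if realloc() fails. The caller frees the result. */
char *get_line(FILE *f)
{
    char chunk[128];
    char *line = NULL, *tmp;
    size_t len = 0, n;

    assert(f != NULL);

    while (fgets(chunk, sizeof chunk, f) != NULL) {
        n = strlen(chunk);

        tmp = realloc(line, len + n + 1);   /* line stays valid on failure */
        if (tmp == NULL) {
            free(line);
            return NULL;
        }
        line = tmp;
        memcpy(line + len, chunk, n + 1);   /* copies the '\0' too */
        len += n;

        if (len > 0 && line[len - 1] == '\n') {
            line[len - 1] = '\0';           /* strip trailing newline */
            return line;
        }
    }

    if (ferror(f)) {
        free(line);
        return NULL;
    }
    return line;    /* NULL at EOF, or an unterminated final line */
}
```

Appending with memcpy() at a known offset also sidesteps the repeated rescanning that strcat() does on every chunk, which is part of what makes the original loop inefficient.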
