How to check for errors when inputting a number

B

Ben Bacarisse

Albert said:
Here's a really easy problem:
The input will consist of a single line containing a single integer.
If I may copy Ben's supplied number-input function
int read_num(FILE *fp)
{
static unsigned long line = 0;
int n;
line += 1;
char first[2];
if (fscanf(fp, "%1[0-9+-]", first) == 1 &&
ungetc(first[0], fp) != EOF &&
fscanf(fp, "%d\n", &n) == 1)
return n;

fprintf(stderr, "Input error on line %lu.\n", line);
exit(EXIT_FAILURE);
}

This is not a brilliant function for the input in question. If the
spec. is to be read as there being not extraneous characters on the
line then it might be better to write:

if (fscanf(fp, "%1[0-9+-]", first) == 1 &&
ungetc(first[0], fp) != EOF &&
fscanf(fp, "%1[\n]", &n, first) == 1)
return n;

("dummy" might be a better name than "first"). This checks that there
is one and only one newline after the input number.

If there is not need to check the validity of the input (i.e. if there
are not invalid inputs in the test cases) then a simple scanf("%d",
&n) is all you need.
 
A

Albert

If there is not need to check the validity of the input (i.e. if there
are not invalid inputs in the test cases) then a simple scanf("%d",
&n) is all you need.

Yes, but Richard Heathfield wants me (I presume) to program like a
real-world programmer.
fscanf(fp, "%1[\n]", &n, first) == 1)

Why are you allowed to pass two addresses when you only have one %?

TIA
Albert
 
B

Ben Bacarisse

Albert said:
Yes, but Richard Heathfield wants me (I presume) to program like a
real-world programmer.

If I recall the issue it was about testing for errors -- in particular
input errors. You *must* test for input:

if (scanf("%d", &n) == 1) {
/* Go ahead an use it */
}
else {
/* maybe report an error. If the spec does not
say what should happen when there is no input,
you can perfectly well do nothing.
*/
}

My comment was that the extra work needed to check that the input is
in exactly the format specified may well be over-kill. Real-world
programmers *always* check for input, but they will often accept data
in a more relaxed way that specified -- provided they can be sure that
such data is usable.
fscanf(fp, "%1[\n]", &n, first) == 1)

Why are you allowed to pass two addresses when you only have one %?

I am not. It is typo. I intended to write:

fscanf(fp, "%d%1[\n]", &n, first) == 2)

Note the 2 as well. This checks that the number is read in and that
it is followed by exactly one newline. (Note to others not following
the snipping -- that line does not stand on its own!)
 
A

Albert

Both spellings are accepted in different parts of the English speaking
world;

Yes, but I like to point out that I like initialised with a 's'
instead of a 'z' when others use a 'z'.
 
N

Nate Eldredge

Albert said:
Yes, but I like to point out that I like initialised with a 's'
instead of a 'z' when others use a 'z'.

Do you really pronounce it with a hard "s", that is, an unvoiced
consonant, like "initial iced"? To me (native speaker of American
English) that is hard to say and sounds unnatural, and I can't recall
hearing a native speaker of any other dialect pronounce it that way.
 
A

Albert

Do you really pronounce it with a hard "s", that is, an unvoiced
consonant, like "initial iced"?  To me (native speaker of American
English) that is hard to say and sounds unnatural, and I can't recall
hearing a native speaker of any other dialect pronounce it that way.

No I don't 'REALLY' pronounce it with a hard "s" but I don't nearly
emphasise (with an S :D) the iZed (in this case necessary) part of
initialised (with an S) as compared to most Americans' pronunciation.
 
D

David Thompson

On Sat, 10 Jan 2009 22:58:16 -0800 (PST), Albert
int read_num(FILE *fp)
{
static unsigned long line = 0;
int n;
line += 1;
char first[2];
if (fscanf(fp, "%1[0-9+-]", first) != 1 ||
ungetc(first[0], fp) == EOF ||
fscanf(fp, "%d\n", &n) != 1) {
fprintf(stderr, "Input error on line %lu.\n", line);

Note that the \n in *scanf does not look for specifically a newline,
but for any whitespace, which could be multiple newlines or none.
Only if the input is always one integer per line does this correctly
count lines -- and if you get the error, that's a pretty strong clue
that the input format wasn't right. At a minimum I would make the
message "error on >value< #" and leave it for the person to understand
that value N might be on line N or line M !!= N. ("very unequal said:
exit(EXIT_FAILURE);
}
return n;

}
<snip rest>
 
B

Ben Bacarisse

David Thompson said:
On Sat, 10 Jan 2009 22:58:16 -0800 (PST), Albert
int read_num(FILE *fp)
{
static unsigned long line = 0;
int n;
line += 1;
char first[2];
if (fscanf(fp, "%1[0-9+-]", first) != 1 ||
ungetc(first[0], fp) == EOF ||
fscanf(fp, "%d\n", &n) != 1) {
fprintf(stderr, "Input error on line %lu.\n", line);

Note that the \n in *scanf does not look for specifically a newline,
but for any whitespace, which could be multiple newlines or none.
Only if the input is always one integer per line does this correctly
count lines -- and if you get the error, that's a pretty strong clue
that the input format wasn't right. At a minimum I would make the
message "error on >value< #" and leave it for the person to understand
that value N might be on line N or line M !!= N. ("very unequal"
<G>)

Agreed. I think the problem did have one number per line which I had
chosen to relax with consequent inaccuracy in the error message. I
subsequently suggested replacing the last fscanf with

fscanf("%d%1[\n]", &n, first) != 2

to be more strict, but the idea of reporting the error by numbering
the inputs rather than the lines is a neat one!
 
R

Richard Bos

Albert said:
Yes, but Richard Heathfield wants me (I presume) to program like a
real-world programmer.

Then I suggest that, in this simple case, you use fgets() and strtol()
instead. That combination is more easily bullet-proofed than any
*scanf() call, and it more than suffices in this case.

Richard
 
C

CBFalconer

Richard said:
Then I suggest that, in this simple case, you use fgets() and
strtol() instead. That combination is more easily bullet-proofed
than any *scanf() call, and it more than suffices in this case.

I disagree. getc() is an even better interface, and allows very
detailed error checking. For example, the following code can do
the input, will signal all overflows, and can be checked for
whatever termination is desired after return [by ch = getc(...);
unget(ch, ...); if (XX != ch) badinput();] where the unget is only
needed if you still want that termination char.

This also has the significant (to me) advantage of using the
streamed input, and not requiring any long input string buffers.

It is fairly easily revised to use longs, or even long longs. I am
unhappy with the method of propagating signs in rdxint, since I
want it to ignore the underlying forms (1's, 2's, sign/mag).

/* ------------------------------------------------- *
* File txtinput.c *
* ------------------------------------------------- */

#include <limits.h> /* xxxx_MAX, xxxx_MIN */
#include <ctype.h> /* isdigit, isblank, isspace */
#include <stdio.h> /* FILE, getc, ungetc */
#include "txtinput.h"

/* For licensing restrictions (GPL) see readme.txt in:
* <http://cbfalconer.home.att.net/download/txtio.zip>
*
* These stream input routines are written so that simple
* conditionals can be used:
*
* if (readxint(&myint, stdin)) {
* do_error_recovery; normally_abort_to_somewhere;
* }
* else {
* do_normal_things; usually_much_longer_than_bad_case;
* }
*
* They allow overflow detection, and permit other routines to
* detect the character that terminated a numerical field. No
* string storage is required, thus there is no limitation on
* the length of input fields. For example, a number entered
* with a string of 1000 leading zeroes will not annoy these.
*
* The numerical input routines *NEVER* absorb a terminating
* char (including '\n'). Thus a sequence such as:
*
* err = readxint(&myint, stdin);
* flushln(stdin);
*
* will always consume complete lines, and after execution of
* readxint a further getc (or fgetc) will return the character
* that terminated the numeric field.
*
* They are also re-entrant, subject to the limitations of file
* systems. e.g interrupting readxint(v, stdin) operation with
* a call to readxwd(wd, stdin) would not be well defined, if
* the same stdin is being used for both calls. If ungetc is
* interruptible the run-time system is broken.
*
* Originally issued 2002-10-07
*
* Revised 2006-01-15 so that unsigned entry overflow (readxwd)
uses the normal C modulo (UINT_MAX + 1) operation. readxwd
still rejects an initial sign as an error.
*
* Modified to allow free use - C.B. Falconer 2008-01-16
*/

/* -------------------------------------------------------------
* Skip to non-blank on f, and return that char. or EOF The next
* char that getc(f) will return is unknown. Local use only.
*/
static int ignoreblks(FILE *f)
{
int ch;

do {
ch = getc(f);
} while ((' ' == ch) || ('\t' == ch));
/* while (isblank(ch)); */ /* for C99 */
return ch;
} /* ignoreblks */

/*--------------------------------------------------------------
* Skip all blanks on f. At completion getc(f) will return
* a non-blank character, which may be \n or EOF
*
* Skipblks returns the char that getc will next return, or EOF.
*/
int skipblks(FILE *f)
{
return ungetc(ignoreblks(f), f);
} /* skipblks */

/*--------------------------------------------------------------
* Skip all whitespace on f, including \n, \f, \v, \r. At
* completion getc(f) will return a non-blank character, which
* may be EOF
*
* Skipwhite returns the char that getc will next return, or EOF.
*/
int skipwhite(FILE *f)
{
int ch;

do {
ch = getc(f);
} while (isspace(ch));
return ungetc(ch, f);
} /* skipwhite */

/*--------------------------------------------------------------
* Read an unsigned value. Signal error for overflow or no
* valid number found. Returns 1 for error, 0 for noerror, EOF
* for EOF encountered before parsing a value.
*
* Skip all leading blanks on f. At completion getc(f) will
* return the character terminating the number, which may be \n
* or EOF among others. Barring EOF it will NOT be a digit. The
* combination of error, 0 result, and the next getc returning
* \n indicates that no numerical value was found on the line.
*
* If the user wants to skip all leading white space including
* \n, \f, \v, \r, he should first call "skipwhite(f);"
*
* Peculiarity: This specifically forbids a leading '+' or '-'.
*/
int readxwd(unsigned int *wd, FILE *f)
{
unsigned int value, digit;
int status;
int ch;

#define UWARNLVL (UINT_MAX / 10U)
#define UWARNDIG (UINT_MAX - UWARNLVL * 10U)

value = 0; /* default */
status = 1; /* default error */

ch = ignoreblks(f);

if (EOF == ch) status = EOF;
else if (isdigit(ch)) status = 0; /* digit, no error */

while (isdigit(ch)) {
digit = ch - '0';
if ((value > UWARNLVL) ||
((UWARNLVL == value) && (digit > UWARNDIG))) {
status = 1; /* overflow */
value -= UWARNLVL;
}
value = 10 * value + digit;
ch = getc(f);
} /* while (ch is a digit) */

*wd = value;
ungetc(ch, f);
return status;
} /* readxwd */

/*--------------------------------------------------------------
* Read a signed value. Signal error for overflow or no valid
* number found. Returns true for error, false for noerror. On
* overflow either INT_MAX or INT_MIN is returned in *val.
*
* Skip all leading blanks on f. At completion getc(f) will
* return the character terminating the number, which may be \n
* or EOF among others. Barring EOF it will NOT be a digit. The
* combination of error, 0 result, and the next getc returning
* \n indicates that no numerical value was found on the line.
*
* If the user wants to skip all leading white space including
* \n, \f, \v, \r, he should first call "skipwhite(f);"
*
* Peculiarity: an isolated leading '+' or '-' NOT immediately
* followed by a digit will return error and a value of 0, when
* the next getc will return that following non-digit. This is
* caused by the single level ungetc available.
*
* This module needs further work. CBF 2008-01-16
*/
int readxint(int *val, FILE *f)
{
unsigned int value;
int status, negative;
int ch;

*val = value = 0; /* default */
status = 1; /* default error */
negative = 0;

ch = ignoreblks(f);

if (EOF != ch) {
if (('+' == ch) || ('-' == ch)) {
negative = ('-' == ch);
ch = ignoreblks(f); /* absorb any sign */
}

if (isdigit(ch)) { /* digit, no error */
ungetc(ch, f);
status = readxwd(&value, f);
ch = getc(f); /* This terminated readxwd */
}

if (0 == status) {
/* got initial digit and no readxwd overflow */
if (!negative && (value <= INT_MAX))
*val = value;
else if (negative && (value < UINT_MAX) &&
((value - 1) <= -(1 + INT_MIN)))
*val = -value;
else { /* overflow */
status = 1; /* do whatever the native system does */
if (negative) *val = -value;
else *val = value;
}
}
else if (negative) *val = -value;
else *val = value;
}
ungetc(ch, f);
return status;
} /* readxint */

/*-----------------------------------------------------
* Flush input through an end-of-line marker inclusive.
*/
void flushln(FILE *f)
{
int ch;

do {
ch = getc(f);
} while (('\n' != ch) && (EOF != ch));
} /* flushln */

/* End of txtinput.c */
 
C

CBFalconer

Nate said:
Do you really pronounce it with a hard "s", that is, an unvoiced
consonant, like "initial iced"? To me (native speaker of American
English) that is hard to say and sounds unnatural, and I can't
recall hearing a native speaker of any other dialect pronounce it
that way.

Being of Canadian origin, I am always confused, and such things get
spelled in random ways at random times. :) I have fixed the date
confusion by insisting on ISO standard. So have the Japanese.
 
R

Richard Bos

CBFalconer said:
I disagree. getc() is an even better interface, and allows very
detailed error checking. For example, the following code can do
the input, will signal all overflows,

You're insane. That following code was _two hundred_ fecking lines! And
all that to read one single line containing one single integer which
must be between 1 and 10000? When any line differing from that can be
flagged as an error, without needing further processing? Using even two
dozen lines of code for that job is reckless overkill; using two hundred
is Microsoft-level bloatware.

Richard
 
C

CBFalconer

Richard said:
.... snip ...


You're insane. That following code was _two hundred_ fecking
lines! And all that to read one single line containing one single
integer which must be between 1 and 10000? When any line differing
from that can be flagged as an error, without needing further
processing? Using even two dozen lines of code for that job is
reckless overkill; using two hundred is Microsoft-level bloatware.

No, it was source code with an adequate set of comments, meant to
provide a reliable method of extracting numbers from input
streams. If you count lines in the fundamental input routine, I
think you will find 24. And note that the interface uses getc,
ungetc, and ctype. It detects and signals overflow, thus making it
useful in other areas, such as readxint.

I think you will find it much shorter than the mess loaded from a
library to implement scanf.
 
R

Richard

CBFalconer said:
No, it was source code with an adequate set of comments, meant to
provide a reliable method of extracting numbers from input
streams. If you count lines in the fundamental input routine, I
think you will find 24. And note that the interface uses getc,
ungetc, and ctype. It detects and signals overflow, thus making it
useful in other areas, such as readxint.

I think you will find it much shorter than the mess loaded from a
library to implement scanf.

If that was only 24 code lines within 200 lines total then you need to
re-evaluate your coding style and practise. I think Richard is
correct. You really should stop putting yourself up as such an easy
target.
 
A

Albert

Then I suggest that, in this simple case, you use fgets() and strtol()
instead. That combination is more easily bullet-proofed than any
*scanf() call, and it more than suffices in this case.

Then is the following better than the function after it? A reminder:
knowing roughly how far the program gets to before an error occurs is
usually enough to debug clever algorithms for the purpose of gaining
marks in a programming contest.
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#define MAXLINELEN 10
#define BASE 10

int read_num(FILE* in)
{
long num;
static unsigned long ncalls = 0;
ncalls += 1;
char str[MAXLINELEN];
if (fgets(str, MAXLINELEN, in) == NULL) {
fprintf(stderr, "Reading input error on call no. %d to
read_num.\n", ncalls);
exit(EXIT_FAILURE);
}

num = strtol(str, NULL, BASE);
if (num == 0 && str[0] != 0) { // If input was zero, then strtol
returning zero is OK
fprintf(stderr, "No conversion performed on call no. %d to
read_num.\n", ncalls);
exit(EXIT_FAILURE);
} else if ((num == LONG_MAX || num == LONG_MIN) && errno ==
ERANGE) {
fprintf(stderr, "Input would overflow a long on call no. %d to
read_num.\n", ncalls);
exit(EXIT_FAILURE);
}
return (int) x;
}

#include <stdio.h>
#include <stdlib.h>

int read_num(FILE *fp)
{
static unsigned long ncalls = 0;
int n;
ncalls += 1;
char first[2];
if (fscanf(fp, "%1[0-9+-]", first) == 1 &&
ungetc(first[0], fp) != EOF &&
fscanf(fp, "%d\n", &n) == 1) {
return n;
}

fprintf(stderr, "Input error on call no. to read_num\n", ncalls);
exit(EXIT_FAILURE);
}

And BTW, has Richard Heathfield been posting regularly in comp.lang.c?
 
B

Ben Bacarisse

Albert said:
On Jan 21, 7:27 am, (e-mail address removed) (Richard Bos) wrote:

Then is the following better than the function after it?

There is only an answer to such questions if enough can be said about
the criteria for evaluating "better". I will offer a few comments
anyway. Only you can decide which one is better.
A reminder:
knowing roughly how far the program gets to before an error occurs is
usually enough to debug clever algorithms for the purpose of gaining
marks in a programming contest.

Both function have a fatal flaw for general, production use: they
don't signal errors but instead just stop the program. They may work
for a competition, but in other situations you usually need to know if
there was a number, and if not, why not. A utility function that
calls exit it much less useful.
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#define MAXLINELEN 10

This is too short for many common integer sizes. I'd want to be able
to read at least -2147483648\n\0 into a line buffer, but I'd go even
bigger myself.
#define BASE 10

int read_num(FILE* in)
{
long num;
static unsigned long ncalls = 0;
ncalls += 1;
char str[MAXLINELEN];
if (fgets(str, MAXLINELEN, in) == NULL) {
fprintf(stderr, "Reading input error on call no. %d to
read_num.\n", ncalls);

You need %lu when printing unsigned long here (and in two places
below).
exit(EXIT_FAILURE);
}

num = strtol(str, NULL, BASE);
if (num == 0 && str[0] != 0) { // If input was zero, then strtol
returning zero is OK

I don't see how it can be an error when num == 0 and str[0] != 0. Did
you mean != '0'? even so, what about "+0" and "-0"? strtol will also
accept " 0" and return zero. If you want to stop this, your need some
other sort of test.
fprintf(stderr, "No conversion performed on call no. %d to
read_num.\n", ncalls);
exit(EXIT_FAILURE);
} else if ((num == LONG_MAX || num == LONG_MIN) && errno ==
ERANGE) {
fprintf(stderr, "Input would overflow a long on call no. %d to
read_num.\n", ncalls);
exit(EXIT_FAILURE);
}
return (int) x;

You risk all sorts of trouble if long can hold larger values than
int. It is probably better to test against INT_MAX and INT_MIN so you
can report a result that overflows rather than just returning it.
}

#include <stdio.h>
#include <stdlib.h>

int read_num(FILE *fp)
{
static unsigned long ncalls = 0;
int n;
ncalls += 1;
char first[2];
if (fscanf(fp, "%1[0-9+-]", first) == 1 &&
ungetc(first[0], fp) != EOF &&
fscanf(fp, "%d\n", &n) == 1) {

To compare, you should have quoted my second version that has:

fscanf(fp, "%d%1[\n]", &n, first) == 2) {

here. Its behaviour is much closer to your function above.
return n;
}

fprintf(stderr, "Input error on call no. to read_num\n", ncalls);

There should be a %lu in there somewhere.
exit(EXIT_FAILURE);
}

And BTW, has Richard Heathfield been posting regularly in
comp.lang.c?

Only you can say if his posting pattern counts as regular. Do you not
see his posts?
 
A

Albert

There is only an answer to such questions if enough can be said about
the criteria for evaluating "better".  I will offer a few comments
anyway.  Only you can decide which one is better.

I'll modify the question to provide a slightly more detailed question:
If you HAD to choose between the two functions above to utilise in a
programming contest, which one would you choose and why?
        exit(EXIT_FAILURE);
    }
    num = strtol(str, NULL, BASE);
    if (num == 0 && str[0] != 0) { // If input was zero, then strtol
returning zero is OK

I don't see how it can be an error when num == 0 and str[0] != 0.  Did
you mean != '0'?  even so, what about "+0" and "-0"?  strtol will also
accept " 0" and return zero.  If you want to stop this, your need some
other sort of test.

Would the pseudocode be then
IF NUM = 0 AND STR[0] != '0' AND STR[0] != '+0' AND STR[0] != '-0'
....HANDLE ERRROR

to stop an error message appearing because 0, +0 or -0 was inputted?
 
B

Ben Bacarisse

Albert said:
I'll modify the question to provide a slightly more detailed question:
If you HAD to choose between the two functions above to utilise in a
programming contest, which one would you choose and why?

Mine because I write it, I understand it, and it does not have any
serious errors in it. I am sure other people might prefer yours
(errors corrected) but I suspect they have long since abandoned this
sub thread.
        exit(EXIT_FAILURE);
    }
    num = strtol(str, NULL, BASE);
    if (num == 0 && str[0] != 0) { // If input was zero, then strtol
returning zero is OK

I don't see how it can be an error when num == 0 and str[0] != 0.  Did
you mean != '0'?  even so, what about "+0" and "-0"?  strtol will also
accept " 0" and return zero.  If you want to stop this, your need some
other sort of test.

Would the pseudocode be then
IF NUM = 0 AND STR[0] != '0' AND STR[0] != '+0' AND STR[0] != '-0'
...HANDLE ERRROR

to stop an error message appearing because 0, +0 or -0 was inputted?

Only if you want to work really hard since there are other cases you'd
have to cover as well. strtol tells you what you need to know. It is
fiddly to use it correctly but the basics are:

(a) Set errno = 0 before the call.
(b) Pass a non-zero endptr because you need that tell what a zero
return means:

errno = 0;
n = strtol(string, &end, base);

Now you can test for whatever you like:

if (end == string)
/* no conversion */

if (errno == ERANGE)
/* result out of range */

if (!errno && !*end && end != string)
/* all of string was used to convert a number that was in range */

and so on.
 
M

Mark Wooding

Ben Bacarisse said:
Albert said:
int read_num(FILE* in)
{
long num;
static unsigned long ncalls = 0;
ncalls += 1;
char str[MAXLINELEN];
if (fgets(str, MAXLINELEN, in) == NULL) {
fprintf(stderr, "Reading input error on call no. %d to
read_num.\n", ncalls);

You need %lu when printing unsigned long here (and in two places
below).
exit(EXIT_FAILURE);
}

errno is not explicitly set at this point.
num = strtol(str, NULL, BASE);
if (num == 0 && str[0] != 0) { // If input was zero, then strtol
returning zero is OK

strtol is happy to skip leading whitespace. This will therefore accept
such bogosity as " ". You also won't detect rubbish like "1234foo" --
for that, you need to retrieve the end-pointer.

I usually call

l = strtol(p, &q, 0);

(allowing input in usual bases), and then check whether *q is zero on
exit. This will detect trailing junk, or stuff like " - "; but it fails
to notice empty strings and treats them as equivalent to zero. I'm not
sure I care enough to do something about this.
I don't see how it can be an error when num == 0 and str[0] != 0.

Suppose the input is the string "bletch".
Did you mean != '0'? even so, what about "+0" and "-0"? strtol will
also accept " 0" and return zero. If you want to stop this, your need
some other sort of test.

Indeed. It's actually surprisingly hard to get a reliable idea of
whether strto... did anything useful. It tells you

And here you might report a false error if some previous operation
failed with ERANGE, and the input was a representation of LONG_MIN or
LONG_MAX. Set errno = 0 first.
You risk all sorts of trouble if long can hold larger values than
int. It is probably better to test against INT_MAX and INT_MIN so you
can report a result that overflows rather than just returning it.

Good advice.

-- [mdw]
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,780
Messages
2,569,607
Members
45,240
Latest member
pashute

Latest Threads

Top