Fibonacci number

CBFalconer

Why? Consider the following:

/* assuming ULONG_MAX == 4294967295 */
unsigned long ul1 = -2UL; // perfectly good, ul1 = 4294967294
unsigned long ul2 = 4294967297UL; // error, overflow

The first is actually -(2UL), which has been specifically defined
to be interpreted with modulo arithmetic. However the string "-2"
represents a value outside the range of an unsigned long. The
purpose of error detection in strtoul() is to detect the supply of
an invalid value by the user.

I would suggest that the proper interpretation of that string is
to set endptr to point after the 2 (which may be to a '\0'),
return ULONG_MAX, and set errno to ERANGE.

To achieve the modulo interpretation, the programmer should have
to write:

unsigned long u;

u = -2;
or
u = strtol("-2", NULL, 10);

which last will have exactly the same values to use in the actual
assignment.
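For anyone who wants to see the behaviour under discussion, here is a minimal sketch. The helper name minus_two_wraps_silently is made up for this illustration; only strtoul, errno and ULONG_MAX come from the standard library.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: converts "-2" with strtoul and checks what the
 * library did.  Returns 1 if the wrapped (modulo) value came back with
 * no error flagged, 0 otherwise. */
int minus_two_wraps_silently(void)
{
    const char *text = "-2";
    char *end;
    unsigned long value;

    errno = 0;
    value = strtoul(text, &end, 10);

    /* The digits "2" fit in an unsigned long, so no ERANGE is raised;
     * the minus sign then negates the result in the unsigned type. */
    return errno == 0                   /* no range error reported   */
        && *end == '\0'                 /* whole string was consumed */
        && value == ULONG_MAX - 1UL;    /* same value as -(2UL)      */
}
```

On a conforming implementation the function should return 1, which is exactly the silent acceptance being objected to here.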
 
Thad Smith

CBFalconer said:
... the string "-2"
represents a value outside the range of an unsigned long. The
purpose of error detection in strtoul() is to detect the supply of
an invalid value by the user.

I would suggest that the proper interpretation of that string is
to set endptr to point after the 2 (which may be to a '\0'),
return ULONG_MAX, and set errno to ERANGE.

Basically, the argument is for a change of the standard to disallow the
optional - sign. I agree, and Doug implied earlier that such an
interpretation would make sense if we start defining the strtoul
function from scratch. Since that would be a quiet change for working
programs that don't check errno, and it is easy to avoid in the program
by testing, I don't expect it to fly.

Another example: application programs which accept 012 as a user input
and convert to 10, not the intended 12.
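A small sketch of how this surprise arises, assuming the program passes base 0 to strtoul so that the prefix selects the radix. The wrapper name parse_with_base is hypothetical.

```c
#include <stdlib.h>

/* Hypothetical wrapper: parse user text with the base chosen by the
 * caller.  Base 0 lets strtoul infer the radix from the prefix
 * ("0" means octal, "0x" means hex); base 10 always reads decimal. */
unsigned long parse_with_base(const char *text, int base)
{
    return strtoul(text, NULL, base);
}
```

With this wrapper, parse_with_base("012", 0) gives 10 while parse_with_base("012", 10) gives 12, so a program that wants decimal user input can avoid the problem by passing 10 explicitly.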

I chalk both problems up to baggage that C carries around because it
has a legacy. The practical alternatives are to add your own checks or
to use non-Standard-C robust input routines.

Thad
 
CBFalconer

Thad said:
Basically, the argument is for a change of the standard to disallow the
optional - sign. I agree, and Doug implied earlier that such an
interpretation would make sense if we start defining the strtoul
function from scratch. Since that would be a quiet change for working
programs that don't check errno, and it is easy to avoid in the program
by testing, I don't expect it to fly.

Well, not quite. I am willing to face the idea that - signs are
accepted on input, and that the string is then parsed up to the
first non-digit (per base). I could even be persuaded to accept
the modulo arithmetic. But errno should be set to ERANGE for all
of them. Then it is possible to detect user errors. The routine
can be used as the basis for strtol in addition.

As it is I cannot use strtoul to input from somewhere, and either
write out the same value or announce some form of input error.
 
nrk

CBFalconer said:
Well, not quite. I am willing to face the idea that - signs are
accepted on input, and that the string is then parsed up to the
first non-digit (per base). I could even be persuaded to accept
the modulo arithmetic. But errno should be set to ERANGE for all
of them. Then it is possible to detect user errors. The routine
can be used as the basis for strtol in addition.

As it is I cannot use strtoul to input from somewhere, and either
write out the same value or announce some form of input error.

You have the input in the first place. All you'd have to do is check the
first character for a '-' (for the specific case you mention), once you get
a return and check endptr and errno for other possible errors in the input.
This doesn't seem too unreasonable to me, once you take the step of
accepting modulo arithmetic from strtoul.

-nrk.
 
CBFalconer

nrk said:
You have the input in the first place. All you'd have to do is check
the first character for a '-' (for the specific case you mention),
once you get a return and check endptr and errno for other possible
errors in the input. This doesn't seem too unreasonable to me, once
you take the step of accepting modulo arithmetic from strtoul.

It is not quite that simple. You also have to scan off any
leading blanks, and possibly all leading white space. And if "-2"
is a legal input field for strtoul, what about "--2"?
 
Chris Torek

It is not quite that simple. You also have to scan off any
leading blanks, and possibly all leading white space.

True (provided you do want to allow leading whitespace;
otherwise a simple isdigit() test will suffice for bases
up to 10). If you are in the C locale and have a pointer
to the start of the string, you can skip whitespace with:

s += strspn(s, " \t\n\r\f\v");

And if "-2" is a legal input field for strtoul, what about "--2"?

Read the strtoul() specification (or below).

This text is from the BSD/OS strtol() manual page:

The string may begin with an arbitrary amount of white space (as
determined by isspace(3)) followed by a single optional `+' or `-' sign.
If base is zero or 16, the string may then include a `0x' prefix, and the
number will be read in base 16; otherwise, a zero base is taken as 10
(decimal) unless the next character is `0', in which case it is taken as
8 (octal).
...
Upon success the strtoul() and strtoull() functions return either the
result of the conversion or, if there was a leading minus sign, the
negation of the result of the conversion, unless the original
(non-negated) value would overflow. In the case of an overflow the
functions return ULONG_MAX and UQUAD_MAX respectively and the global
variable errno is set to ERANGE.

(Note that C99 uses ULLONG_MAX, not UQUAD_MAX. Our strtoull()
predates C99 and was originally spelled strtoq(). I originally
wrote this manual page for 4.3BSD, where we used "quad_t" as the
name for 64-bit integers starting in the early 1990s.)
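The three outcomes described in the manual page text (no conversion, overflow, ordinary success) can be told apart with endptr and errno. The following is only a sketch under that reading; the enum and the function name classify_strtoul are invented for the example.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical status codes for this illustration. */
enum conv_status { CONV_OK, CONV_NONE, CONV_OVERFLOW };

/* Hypothetical helper: runs strtoul in base 10 and classifies the
 * outcome.  The caller's errno is preserved. */
enum conv_status classify_strtoul(const char *text, unsigned long *out)
{
    char *end;
    int saved_errno = errno;
    enum conv_status status;

    errno = 0;
    *out = strtoul(text, &end, 10);

    if (end == text)
        status = CONV_NONE;      /* e.g. "--2": only one sign is allowed */
    else if (errno == ERANGE)
        status = CONV_OVERFLOW;  /* *out is ULONG_MAX in this case       */
    else
        status = CONV_OK;        /* includes "-2", which is negated      */

    errno = saved_errno;
    return status;
}
```

Note that "  -2" is classified as CONV_OK with the value ULONG_MAX - 1, which is the point of contention in this thread.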
 
nrk

CBFalconer said:
It is not quite that simple. You also have to scan off any
leading blanks, and possibly all leading white space. And if "-2"
is a legal input field for strtoul, what about "--2"?

Right. Let's take this one thing at a time. "--2", we know, according to
specification, doesn't qualify as an integer for either strtol or strtoul
since both allow exactly *one* optional sign preceding the number (see my
recent thread on a broken implementation of strtol that doesn't grok this
correctly). So, if we do use endptr correctly to track the call to
strtoul, this case will be handled.

Let's take the case of leading spaces. I had forgotten all about them in my
initial remedy to check for a '-' sign. Chris has pointed out how you can
skip leading white space if any. However, turns out that things are much
simpler than that if you merely wish to detect the presence of the '-'.
Use strchr and check the return against NULL and endptr! Thusly, we can
come up with something along the lines of (it is even simpler if the entire
string is supposed to be converted):

#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

unsigned long safe_strtoul(const char *src, char **endptr, int base) {
    unsigned long ret;
    int old_errno = errno;
    char *end;              /* local copy, so a NULL endptr is safe */
    const char *cptr;

    errno = 0;
    ret = strtoul(src, &end, base);
    if ( endptr )
        *endptr = end;

    if ( errno )
        return ret; /* strtoul already reported an error */

    if ( src == end ) {
        /* no conversion, throw back at user */
        errno = old_errno;
        return ret;
    }

    cptr = strchr(src, '-');
    if ( cptr && cptr < end ) {
        /* conversion, but there was a '-' at start */
        errno = ERANGE;
        return ULONG_MAX;
        /* I think at this point we've achieved the CBF strtoul */
    }

    errno = old_errno;
    return ret;
}

-nrk.
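For the simpler case mentioned above, where the entire string is supposed to be converted, a stricter variant might look like the sketch below. The name parse_ulong_strict and its return convention are assumptions for this example, and rejecting a leading '+' as well as '-' is a design choice made here, not something strtoul requires.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical strict parser: returns 1 and stores the value in *out
 * on success, 0 on any rejected input.  Leading white space is
 * allowed; signs, trailing junk and out-of-range values are not. */
int parse_ulong_strict(const char *src, unsigned long *out)
{
    char *end;
    unsigned long value;
    int saved_errno = errno;
    int failed;

    while (isspace((unsigned char)*src))
        src++;
    if (!isdigit((unsigned char)*src))
        return 0;       /* rejects "", "-2", "+2", "--2", "abc" */

    errno = 0;
    value = strtoul(src, &end, 10);
    failed = (errno == ERANGE) || (*end != '\0');
    errno = saved_errno;    /* leave the caller's errno untouched */
    if (failed)
        return 0;

    *out = value;
    return 1;
}
```

Because the first significant character must be a digit, the sign question never reaches strtoul at all.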
 
Douglas A. Gwyn

Thad said:
Another example: application programs which accept 012 as a user input
and convert to 10, not the intended 12.
I chalk both problems up to baggage that C carries around because it
has a legacy.

And some of that legacy has origins that predate Unix and C.
One saw 012 = 10. and ASCIZ (null-terminated) strings in DEC
software well before C appeared.
 
Douglas A. Gwyn

CBFalconer said:
As it is I cannot use strtoul to input from somewhere, and either
write out the same value or announce some form of input error.

Sure you can, you just have to do part of the checking
that you seem to want for your application yourself.

Really good input checking and validation requires much
more than any Standard C library function provides.
 
Dan Pop

Douglas A. Gwyn said:
Really good input checking and validation requires much
more than any Standard C library function provides.

And this is a deficiency of the C standard that the committee doesn't
bother to fix. Why should each robust C application have to invent its
own wheel?

Dan
 
P.J. Plauger

Dan Pop said:

And this is a deficiency of the C standard that the committee doesn't
bother to fix. Why should each robust C application have to invent its
own wheel?

Reading and parsing input is effectively recognizing a mini language.
It's pretty hard to generalize the appropriate error recovery for
malformed input, and even harder to make an all purpose parser to
do the heavy lifting. scanf is one attempt at generalizing what
we learned from decades of reading formatted input. The standardized
form is pretty comprehensive and consistent, but nobody in X3J11
working on the first C Standard was a fan of it, or felt it was
adequate for serious programming. Equally, none of us knew how to
make a significantly more robust input parser.

So I guess each robust C application will have to continue to invent
its own wheel until someone puts forth a wheel worth standardizing.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com
 
Roc

Douglas A. Gwyn said:
And some of that legacy has origins that predate Unix and C.
One saw 012 = 10. and ASCIZ (null-terminated) strings in DEC
software well before C appeared.

Someone provide the history on how 012 was interpreted as 10, please?
 
James Kuyper

Roc said:
Someone provide the history on how 012 was interpreted as 10, please?

For pretty much as long as C has been in existence, integer constants
that start with 0, when the second character is neither 'x' nor 'X', are
interpreted as being in octal. Octal 12 = 8*1+2 = 10 decimal. See
section 6.4.4.1.
 
Dave Hansen

For pretty much as long as C has been in existence, integer constants
that start with 0, when the second character is neither 'x' nor 'X', are
interpreted as being in octal. Octal 12 = 8*1+2 = 10 decimal. See
section 6.4.4.1.

Doug's claim is that 012 = 10 was around "well before C appeared."
I'm not saying he's wrong (in fact, I think he's right), but I, too,
would be curious where "leading 0 means octal" came from, if not from
C.

Regards,

-=Dave
 
Dan Pop

CBFalconer said:
It is not quite that simple. You also have to scan off any
leading blanks, and possibly all leading white space.

char first;
sscanf(input, " %c", &first);

Will do the job for you. Rocket science, indeed.

And if "-2"
is a legal input field for strtoul, what about "--2"?

Are you reading impaired or what?

3 If the value of base is zero, the expected form of the subject
sequence is that of an integer constant as described in 6.4.4.1,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
optionally preceded by a plus or minus sign, but not including
^^^^^^^^^^^^^^^^^^^^^^^=^^^^^^^^^^^^^^^^^^^
an integer suffix. If the value of base is between 2 and 36
(inclusive), the expected form of the subject sequence is a
sequence of letters and digits representing an integer with the
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
radix specified by base, optionally preceded by a plus or minus
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^=^^^^^^^^^^^^^^
sign, but not including an integer suffix.
^^^^
Dan
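Spelled out as a complete function, the sscanf idiom from the top of this post could be used roughly as follows. This is a sketch only; the name starts_with_minus is hypothetical, and checking the sscanf return value is an addition so that empty or all-blank input is handled.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 if the first non-whitespace character
 * of input is '-', 0 otherwise (including empty or all-blank input). */
int starts_with_minus(const char *input)
{
    char first;

    /* The leading space in the format skips any amount of white space;
     * %c then stores the next character.  sscanf returns 1 only if a
     * character was actually stored. */
    if (sscanf(input, " %c", &first) != 1)
        return 0;
    return first == '-';
}
```

A caller can run this test before handing the same string to strtoul and report an error when it returns 1.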
 
CBFalconer

P.J. Plauger said:
Reading and parsing input is effectively recognizing a mini
language. It's pretty hard to generalize the appropriate error
recovery for malformed input, and even harder to make an all
purpose parser to do the heavy lifting. scanf is one attempt at
generalizing what we learned from decades of reading formatted
input. The standardized form is pretty comprehensive and
consistent, but nobody in X3J11 working on the first C Standard
was a fan of it, or felt it was adequate for serious
programming. Equally, none of us knew how to make a
significantly more robust input parser.

So I guess each robust C application will have to continue to
invent its own wheel until someone puts forth a wheel worth
standardizing.

However I had always thought, until recently, that strtoul was a
fundamental spoke for such a wheel. It seems that it isn't. I
have no problems with needing to design such routines, but I do
have problems with suddenly finding out that it is necessary.
Especially when the only change really needed is to ban the
leading sign.

Are there really programs out there that take advantage of this
provision? This would be something the standards body might
investigate, with a view to making a change.

Apart from exponential notation for reals, I see no problem
building iron-clad numeric input parsers with one-char look
ahead. For reals, the only means I see is to define a faulty
exponent entry as being considered zero (apart from multi-char
push back, which is a major change).
 
P.J. Plauger

However I had always thought, until recently, that strtoul was a
fundamental spoke for such a wheel.

It is.

It seems that it isn't.

It seems *to you* that it isn't. I kinda like it.

I have no problems with needing to design such routines, but I do
have problems with suddenly finding out that it is necessary.
Especially when the only change really needed is to ban the
leading sign.

Or skip it yourself. Why is that so hard?

Look, even good old scanf does a prescan to build a candidate
field before calling the relevant strto* function. You too can
do so. strtoul is arguably more useful if it includes the ability
to process a minus sign, which is easily skipped, than if it lacked
this ability, which you would then have to supply. But in any case
it does what it does, and what it does has (IMO) a reasonable
rationale.

Are there really programs out there that take advantage of this
provision?

Yes.

This would be something the standards body might
investigate, with a view to making a change.

Not a prayer, after all these years.

Apart from exponential notation for reals, I see no problem
building iron-clad numeric input parsers with one-char look
ahead. For reals, the only means I see is to define a faulty
exponent entry as being considered zero (apart from multi-char
push back, which is a major change).

Deciding where to end the parse and what to report back are
only part of a complete breakfast. Recovering from faulty
input is almost always situation specific, and a nontrivial
part of the program structure/strategy.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com
 
Dan Pop

P.J. Plauger said:
Reading and parsing input is effectively recognizing a mini language.
It's pretty hard to generalize the appropriate error recovery for
malformed input, and even harder to make an all purpose parser to
do the heavy lifting. scanf is one attempt at generalizing what
we learned from decades of reading formatted input. The standardized
form is pretty comprehensive and consistent, but nobody in X3J11
working on the first C Standard was a fan of it, or felt it was
adequate for serious programming. Equally, none of us knew how to
make a significantly more robust input parser.

Actually, scanf is quite good and well designed. I have only two
complaints about it:

1. Undefined behaviour on numeric input overflow. Making the conversion
fail would have been a lot more useful.

2. No escape sequence for specifying a null character in a scanset (\0
would terminate the format string).
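One common way around complaint 1 is to let sscanf isolate the field as text and leave the numeric conversion to strtoul, which does report overflow. The sketch below assumes unsigned decimal input; the name scan_ulong_checked and the 63-digit field limit are choices made for this example, and a field with more than 63 digits (for instance a very long run of leading zeros) would be truncated rather than handled properly.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: extracts one unsigned decimal field from text.
 * Returns 1 on success, 0 if no digits were found or the value does
 * not fit in an unsigned long.  The caller's errno is preserved. */
int scan_ulong_checked(const char *text, unsigned long *out)
{
    char field[64];
    int saved_errno = errno;
    int overflow;

    /* Skip white space, then collect up to 63 decimal digits as text.
     * No numeric conversion happens inside sscanf, so the undefined
     * behaviour on overflow cannot be triggered here. */
    if (sscanf(text, " %63[0-9]", field) != 1)
        return 0;

    errno = 0;
    *out = strtoul(field, NULL, 10);    /* ULONG_MAX on overflow */
    overflow = (errno == ERANGE);
    errno = saved_errno;
    return !overflow;
}
```

Since the scanset admits only digits, a leading minus sign makes the match fail, so "-5" is rejected instead of being wrapped.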

Dan
 
