Converting strings to int

  • Thread starter allthecoolkidshaveone

allthecoolkidshaveone

I want to convert a string representation of a number ("1234") to an
int, with overflow and underflow checking. Essentially, I'm looking
for a strtol() that converts int instead of long. The problem with
strtol() is that a number that fits into a long might be too big for
an int. sscanf() doesn't seem to do the over/underflow checking.
atoi(), of course, doesn't do any checking. I've long thought it odd
that there aren't strtoi() and friends for int and short types in the
standard.

Any suggestions?
 
Ian Collins

Use strtol() and check the result to see if it fits in an int.
 
allthecoolkidshaveone

I want to convert a string representation of a number ("1234") to an
int, with overflow and underflow checking.

Of course, 30 seconds later, I think to myself "Why not convert to a
long and see if it's between INT_MIN and INT_MAX and if so return
that value cast to an int?"
 
Martin Ambuhl

Check the long value against INT_MAX and INT_MIN.
Then you have the value (if the conversion to long worked), even if out
of range for an int, and the error checking you want.
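A minimal sketch of that suggestion, assuming base 10 and a hypothetical helper name (parse_long_check_int is not from the thread). It keeps the long value available to the caller and separately reports whether it fits in an int. Trailing characters after the number are deliberately not examined here.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: parse s as a base-10 long into *value and report
   whether that value also fits in an int.
   Returns  1 if *value fits in an int,
            0 if it is a valid long but outside INT_MIN..INT_MAX,
           -1 if strtol() found no digits or the text overflowed long. */
int parse_long_check_int(const char *s, long *value)
{
    char *end;

    errno = 0;                      /* strtol() reports overflow via errno */
    *value = strtol(s, &end, 10);

    if (end == s || errno == ERANGE)
        return -1;

    return (*value >= INT_MIN && *value <= INT_MAX) ? 1 : 0;
}
```

On an implementation where long and int have the same range, the 0 result can never occur, because anything too big for an int already overflows the long.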
 
Richard Heathfield

(e-mail address removed) said:
Of course, 30 seconds later, I think to myself "Why not convert to a
long and see if it's between INT_MIN and INT_MAX and if so return
that value cast to an int?"

If it is between those values, you don't need a cast. And if it isn't, a
cast won't do any good anyway.
 
Keith Thompson

Richard Heathfield said:
(e-mail address removed) said:

If it is between those values, you don't need a cast. And if it isn't, a
cast won't do any good anyway.

But if you want to store the result in an int, you *will* need a
conversion. This conversion will be done implicitly when you assign
the value.

A lot of people aren't aware that the term "cast" refers *only* to the
explicit cast operator, using a type name in parentheses.
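To illustrate the distinction, here is a small sketch with two hypothetical functions (the names are not from the thread). Both perform the same long-to-int conversion; only the second one contains a cast.

```c
/* Hypothetical example: both functions convert a long to an int. */

int by_implicit_conversion(long x)
{
    int n = x;          /* a conversion happens here, but there is no cast */
    return n;
}

int by_cast(long x)
{
    return (int) x;     /* a cast: the same conversion, written explicitly */
}
```

For any value already known to lie between INT_MIN and INT_MAX, the two functions return the same result.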
 
Richard Heathfield

Keith Thompson said:
Richard Heathfield said:
(e-mail address removed) said:
[...] "Why not convert to a
long and see if it's between INT_MIN and INT_MAX and if so return
that value cast to an int?"

If it is between those values, you don't need a cast. And if it
isn't, a cast won't do any good anyway.

But if you want to store the result in an int, you *will* need a
conversion. This conversion will be done implicitly when you assign
the value.

Or you can simply return it:

int foo(const char *s)
{
    long int x = whatever(s);
    validate_or_die(x);
    return x;
}

A lot of people aren't aware that the term "cast" refers *only* to the
explicit cast operator, using a type name in parentheses.

Yes, sure, but do we really need to include a full chapter of
explanation in every single reply we post?
 
Keith Thompson

Richard Heathfield said:
Keith Thompson said:
Richard Heathfield said:
(e-mail address removed) said:
[...] "Why not convert to a
long and see if it's between INT_MIN and INT_MAX and if so return
that value cast to an int?"

If it is between those values, you don't need a cast. And if it
isn't, a cast won't do any good anyway.

But if you want to store the result in an int, you *will* need a
conversion. This conversion will be done implicitly when you assign
the value. [...]
A lot of people aren't aware that the term "cast" refers *only* to the
explicit cast operator, using a type name in parentheses.

Yes, sure, but do we really need to include a full chapter of
explanation in every single reply we post?

No, but it seemed reasonable in this case. The OP incorrectly thought
he needed a cast; the common confusion between "cast" and "conversion"
is a likely explanation of his confusion.
 
David Tiktin

It's actually harder than it looks to use strtol() properly. Here's
the guts of a wrapper function I wrote for ints. The wrapper returns 1
if the conversion was OK, 0 otherwise, and outputs the value through a
parameter:

Code:
char *  end = NULL;
long    value;

errno = 0;
value = strtol(str, &end, base);

/*
     end == NULL if the base is invalid.
     end == str  if no conversion was done.
    *end == '\0' or *end is whitespace if the number was
            whitespace delimited (a reasonable assumption).
    errno is 0 if no overflow or underflow occurred.
*/
if (end != NULL && end != str && errno == 0 &&
     (*end == '\0' || isspace(*end)))
{
    if (INT_MIN <= value && value <= INT_MAX)
    {
        *integer = (int) value;

        return 1;
    }
}

return 0;

I wonder if anyone would care to comment on whether this method is
adequate.

Dave
 
Richard Heathfield

David Tiktin said:

if (end != NULL && end != str && errno == 0 &&
(*end == '\0' || isspace(*end)))

I wonder if anyone would care to comment on whether this method is
adequate.

A cursory glance reveals to me only that you are perhaps a little
optimistic in passing *end to isspace(), which requires that its
parameter be representable as an unsigned char. If, for example, *end
were -1, this would not qualify, and the behaviour would be undefined.

This is one of those very rare and bizarre cases where it is actually a
*good* idea to use a cast - isspace((unsigned char)*end) - and the
normal promotion rules will of course take care of the conversion to
int for you.
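Putting the pieces together, the wrapper with the isspace() fix applied might look like the sketch below. The thread shows only the body, so the function name str_to_int and its parameter list are assumptions made for illustration.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical wrapper name and signature; only the body appears in the
   thread. Returns 1 and stores the result in *integer on success,
   0 otherwise. */
int str_to_int(const char *str, int base, int *integer)
{
    char *end = NULL;
    long value;

    errno = 0;
    value = strtol(str, &end, base);

    /* Reject: nothing converted, or strtol() reported an error. */
    if (end == NULL || end == str || errno != 0)
        return 0;

    /* Reject trailing junk. The cast keeps the isspace() argument
       representable as an unsigned char, even where char is signed. */
    if (*end != '\0' && !isspace((unsigned char) *end))
        return 0;

    /* Reject values that fit in a long but not in an int. */
    if (value < INT_MIN || value > INT_MAX)
        return 0;

    *integer = (int) value;
    return 1;
}
```

With this sketch, str_to_int("12 34", 10, &n) succeeds and stores 12, because the number is treated as whitespace delimited, while str_to_int("12x", 10, &n) fails.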
 
David Tiktin

Richard Heathfield said:

A cursory glance reveals to me only that you are perhaps a little
optimistic in passing *end to isspace(), which requires that its
parameter be representable as an unsigned char. If, for example,
*end were -1, this would not qualify, and the behaviour would be
undefined.

This is one of those very rare and bizarre cases where it is
actually a *good* idea to use a cast - isspace((unsigned
char)*end) - and the normal promotion rules will of course take
care of the conversion to int for you.

Good catch! I actually "knew" that ;-) I have a bunch of macros
like:

#define TO_LOWER(c) ((char) tolower((unsigned char)(c)))

But not for isspace(). I can't figure out why. Fixed now, though.

Thanks!

Dave
 
Peter Nilsson

David Tiktin said:
Good catch! I actually "knew" that ;-) I have a bunch
of macros like:

#define TO_LOWER(c) ((char) tolower((unsigned char)(c)))

How is the (char) cast useful?

P.S. I find the (unsigned char) application above
contentious in that it assumes that ones' complement (1c)
and sign-magnitude (sm) implementations will make plain
char unsigned.
 
David Tiktin

How is the (char) cast useful?

In its typical use:

char * ptr = str;

while (*ptr)
{
    *ptr = TO_LOWER(*ptr);
    ptr++;
}

some compilers I've used over the years complain about the assignment
of an int to a char due to loss of precision. I generally run with
the highest warning levels I can get, so the cast silences a warning
I've investigated and found not to be a problem in this situation.
P.S. I find the (unsigned char) application above
contentious in that it assumes that ones' complement (1c)
and sign-magnitude (sm) implementations will make plain
char unsigned.

Sorry, I don't understand your point here or where that assumption is
made.

Is there a problem that the code should be:

*ptr = tolower((int)(*ptr) & 0xFF);

to ensure the passed value and result are in the range 0-255 even if
CHAR_BIT is greater than 8?

Dave
 
Peter Nilsson

David Tiktin said:
In its typical use:

char * ptr = str;

while (*ptr)
{
    *ptr = TO_LOWER(*ptr);
    ptr++;
}

There is no semantic difference.
some compilers I've used over the years complain about the
assignment of an int to a char due to loss of precision.

Assignment of int values to a char is probably the most
fundamental of useful constructs that C has. Putting a
warning on that is to me like putting a warning on every
#include asking if that's the file you actually meant to
include.
I generally run with the highest warning levels I can get,

A good move, but you shouldn't change code to silence one
compiler's warnings unnecessarily. Different compilers will
issue warnings for different reasons and two different
compilers can even issue warnings for opposing reasons.
so the cast silences a warning I've investigated and found
not to be a problem in this situation.

The simpler option is to acknowledge that no action is
required as a consequence of the warning.

It's easy to fall into the belief that the absence of
warnings is a strong measure of correctness. 'Clean'
compiles give a sense of confidence. But it's a small
step away from introducing bugs, just to silence a
compiler.
Sorry, I don't understand your point here or where that
assumption is made.

Depending on how you use them, input routines often read
and store bytes, not (plain) chars. On an sm machine
interpreting an input byte as a char representation and
converting it to an unsigned char can potentially yield
a different character code to the original for some
characters outside the basic character set.

It's a highly unlikely scenario, and it's dismissed with
a little handwaving about QoI guaranteeing that 1c and sm
machines will always make plain char unsigned.
Is there a problem that the code should be:

*ptr = tolower((int)(*ptr) & 0xFF);

to ensure the passed value and result are in the range
0-255 even if CHAR_BIT is greater than 8?

No. I'm suggesting, in some cases, it should be...

*ptr = tolower(* (unsigned char *) ptr);

Obviously that's not as aesthetic as the direct conversion
(unsigned char) *ptr, but it does have the advantage of
working on the hypothetical machines (contrived if you
like) as well as the vanilla ones.
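A sketch of that alternative applied to a whole string, assuming a hypothetical helper named lower_in_place (not from the thread). Each byte is read and written through an unsigned char pointer, so no char value is ever converted to unsigned char.

```c
#include <ctype.h>

/* Hypothetical helper: lowercase a string in place, accessing each byte
   through an unsigned char pointer instead of converting a char value. */
void lower_in_place(char *str)
{
    unsigned char *p = (unsigned char *) str;

    while (*p) {
        /* *p is already an unsigned char, so it is a valid argument for
           tolower(); the result is stored back as the same kind of byte. */
        *p = (unsigned char) tolower(*p);
        p++;
    }
}
```

On two's complement implementations this behaves the same as the (unsigned char) *ptr form; the difference only matters on the hypothetical machines described above.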
 
David Tiktin

There is no semantic difference.

Semantic difference between what:

*ptr = (char) tolower(c);

and

*ptr = tolower(c);

?
Assignment of int values to a char is probably the most
fundamental of useful constructs that C has. Putting a
warning on that is to me like putting a warning on every
#include asking if that's the file you actually meant to
include.

Sorry, I just don't agree. How many times have we seen code in this
group that goes:

[bad code]

char c;

while ((c = getchar()) != EOF)
{
    /* infinite loop where plain char is unsigned */
}

[/bad code]

I suspect that the int -> char warnings are there to prevent things
like this.
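For contrast, a sketch of the usual correct form, using a hypothetical helper named count_bytes (not from the thread). The result of getc() is kept in an int, so EOF stays distinguishable from every valid byte value.

```c
#include <stdio.h>

/* Hypothetical helper: count the bytes remaining on a stream. */
long count_bytes(FILE *fp)
{
    int c;          /* int, not char: getc() returns an int */
    long n = 0;

    while ((c = getc(fp)) != EOF)
        n++;

    return n;
}
```

With a char variable the same loop either never ends (plain char unsigned) or can stop early on a byte such as 0xFF (plain char signed).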
A good move, but you shouldn't change code to silence one
compiler's warnings unnecessarily. Different compilers will
issue warnings for different reasons and two different
compilers can even issue warnings for opposing reasons.


The simpler option is to acknowledge that no action is
required as a consequence of the warning.

It's easy to fall into the belief that the absence of
warnings is a strong measure of correctness. 'Clean'
compiles give a sense of confidence. But it's a small
step away from introducing bugs, just to silence a
compiler.

I *never* fall into that belief ;-) What I do assume is that code
that compiles *with* warnings is likely *not* correct. I routinely
compile with at least 4 different compilers on 4 different platforms
(2 of them big-endian). I expect my code to compile without warnings
on all of them (and to be correct on all of them ;-) Yes, that
sometimes means adding a cast for a "picky" compiler. It also
sometimes means changing the code to something simpler, clearer and
better. But if I don't fix the code to silence the warnings, even if
they don't signal a real problem, I'll continue to get the warnings
and waste time looking at things I've already thought about, tested
and fixed. I don't do this in a cavalier manner, but when I rebuild
a 50 file project, I need to be able to *see* it builds warning free.
Depending on how you use them, input routines often read
and store bytes, not (plain) chars. On an sm machine
interpreting an input byte as a char representation and
converting it to an unsigned char can potentially yield
a different character code to the original for some
characters outside the basic character set.

It's a highly unlikely scenario, and it's dismissed with
a little handwaving about QoI guaranteeing that 1c and sm
machines will always make plain char unsigned.

OK, thanks for the warning ;-) I've never had to code for a platform
that's 1s complement or sign-magnitude, but if I did, I imagine I'd
have more to worry about than int -> char casts. I know of at least
one piece of code I have that explicitly assumes 2s complement, and
I'm sure *none* of my networking code would work!

Dave
 
