integer overflow in atoi

B

bartc

Ben Pfaff said:
happy said:
int atoi(char str[])
{
int i, num;
for( num = 0, i = 0 ; isdigit(str) ; i++)
num = num * 10 + str - '0' ;
return num;
}

Here I wnat to know is there any way to tell user if num goes out of
range of int ?


if (num > INT_MAX / 10) {
...handle overflow...


Let's say INT_MAX is 2147483647 for example. So you're doing:

if (num>214748364) ... overflow.

However this might not detect overflows for inputs of "2147483648" and
"2147483649" (assuming you're doing the test with at least one more digit in
hand).
 
B

Ben Pfaff

bartc said:
Ben Pfaff said:
happy said:
int atoi(char str[])
{
int i, num;
for( num = 0, i = 0 ; isdigit(str) ; i++)
num = num * 10 + str - '0' ;
return num;
}

Here I wnat to know is there any way to tell user if num goes out of
range of int ?


if (num > INT_MAX / 10) {
...handle overflow...


Let's say INT_MAX is 2147483647 for example. So you're doing:

if (num>214748364) ... overflow.

However this might not detect overflows for inputs of "2147483648" and
"2147483649" (assuming you're doing the test with at least one more digit in
hand).


Telling whether you can add a digit between 0 and 9 is the easy
and obvious part :)
 
E

Eric Sosman

Jens Thoms Toerring said:
[...]
char im[ ( CHAR_BIT * sizeof( int ) ) / 3 + 1 ];
sprintf( im, "%d\n", INT_MAX );
if ( strlen( str ) > strlen( im ) || strcmp( str, im ) > 0 )
puts( "overflow" );

I use a similar technique in a real application. But I only do the
strcmp() test when both strings are the same length, otherwise it goes
wrong I think.

So if im[] holds "2147483647" and str is "11111111111111111"
you'll conclude that all's well?
 
B

bartc

Eric Sosman said:
Jens Thoms Toerring said:
[...]
char im[ ( CHAR_BIT * sizeof( int ) ) / 3 + 1 ];
sprintf( im, "%d\n", INT_MAX );
if ( strlen( str ) > strlen( im ) || strcmp( str, im ) > 0 )
puts( "overflow" );

I use a similar technique in a real application. But I only do the
strcmp() test when both strings are the same length, otherwise it goes
wrong I think.

So if im[] holds "2147483647" and str is "11111111111111111"
you'll conclude that all's well?

I didn't say I'd leave out the strlen(str)>strlen(im) part. Only that I
would then test for strlen(str)==strlen(im) before calling strcmp().
 
J

Jens Thoms Toerring

Eric Sosman said:
Jens Thoms Toerring said:
[...]
char im[ ( CHAR_BIT * sizeof( int ) ) / 3 + 1 ];
sprintf( im, "%d\n", INT_MAX );
if ( strlen( str ) > strlen( im ) || strcmp( str, im ) > 0 )
puts( "overflow" );

I use a similar technique in a real application. But I only do the
strcmp() test when both strings are the same length, otherwise it goes
wrong I think.
So if im[] holds "2147483647" and str is "11111111111111111"
you'll conclude that all's well?

I guess that bartc meant that when the length isn't equal you
actually don't have to do anymore checks since then the question
is already answered (but then, as I just realized, there is ano-
ther point that needs to be taken care about: what about strings
that start with one or more '0'?). Just the concern about the string
being shorter than a "stringified" version of INT_MAX is something I
don't get since strcmp() will return -1 if 'str' is shorter than 'im'.

Regards, Jens
 
B

bartc

Jens Thoms Toerring said:
Eric Sosman said:
[...]
char im[ ( CHAR_BIT * sizeof( int ) ) / 3 + 1 ];
sprintf( im, "%d\n", INT_MAX );
if ( strlen( str ) > strlen( im ) || strcmp( str, im ) > 0 )
puts( "overflow" );

I use a similar technique in a real application. But I only do the
strcmp() test when both strings are the same length, otherwise it goes
wrong I think.
So if im[] holds "2147483647" and str is "11111111111111111"
you'll conclude that all's well?

I guess that bartc meant that when the length isn't equal you
actually don't have to do anymore checks since then the question
is already answered (but then, as I just realized, there is ano-
ther point that needs to be taken care about: what about strings
that start with one or more '0'?). Just the concern about the string
being shorter than a "stringified" version of INT_MAX is something I
don't get since strcmp() will return -1 if 'str' is shorter than 'im'.

My strcmp() behaves differently from yours then:

printf("%d\n", strcmp("9","214748647"));

outputs 1 on my machine.
 
P

Phil Carmody

pete said:
Phil said:
pete said:
christian.bau wrote:
The result according to the C Standard:

1. Ignore leading white space, convert the optional sign ('+' or '-')
plus all digits to a number.
2. If the result does not fit into the range from LONG_MIN to LONG_MAX
then replace it with LONG_MIN or LONG_MAX.
You just made that up.

No he didn't. He plucked it mostly from the standard. OK, from the
bit of the standard that describes the behaviour of strtol rather
than ato[il].

Yes, the bit of the standard that describes
the behavior of strtol which is different from atoi.

Erm, I think that's what I said. Is you a bit fick?
 
G

gil_johnson

[...]
Here I wnat to know is there any way to tell user if num goes out of
range of int ?
[...]

Richard Heathfield wrote:
The maximum value str can have (within the loop) is '9', so
str - '0' is at most 9. Therefore, the result of num * 10 can't
exceed INT_MAX - 9, so num mustn't exceed (INT_MAX - 9) / 10.

Willem wrote:
And what if a user enters a number that's (almost) exactly INT_MAX?
You're trading simplicity for a slightly shorter range.

Jens wrote:
You could check if num is larger than INT_MAX/10 before you multiply
it by 10. If that isn't the case and num has been multiplied by 10
you then have to check if INT_MAX minus the new digit is not smaller
than 'num'.
----------------------------------------------------------------------

Putting the ideas above together, and noting ((INT_MAX/10)-1) gives
the same result as (INT_MAX-9)/10, you can get the complete range
with:

If num <= ((INT_MAX/10)-1) there can't be an overflow. For 32 bit
signed int this is 214,748,363 and the maximum result is 2,147,483,639

Else if num = (INT_MAX/10) = 214,748,364 and (str-'0') < 8 you are
OK - this gives numbers from 2,147,483,640 to 2,147,483,647

Else overflow is certain if you continue.
----------------------------------------------------------------------

Note:
As long as the bitlength of the integer is a power of 2, the magic
number 8 in the 'Else if' line works for signed integers:
2^7 = 128 (bitlength = 8, INT_MAX = 127)
2^15 = 32768 (bitlength = 16, etc.)
2^31 = 2147483648
2^63 = 9223372036854775808
All numbers = 2^((2^n)-1) where n > 1 end in '8' (See Note)

If you go to an unsigned int, all numbers = 2^(2^n) where n > 1
end in '6', and you need to change INT_MAX to UNSIGNED_INT_MAX (I'm
guessing about that, it might be U_INT_MAX or whatever) and change
'8' to '6' in the 'Else If' line.
2^8 = 256 (UNSIGNED_INT_MAX = 255, bitlength = 8)
2^16 = 65536 (etc.)
2^32 = 4294967296
2^64 = 18446744073709551616

If you aren't trusting enough to just believe my statements about
2^(2^n) and 2^((2^n)-1), consider that the numbers 2^(2^n) are
generated by squaring the previous number in the series. The
square of any number in '6' also ends in '6':

x = 10*y + 6 (x, y integers)
x^2 = 100y^2 + 6*2*10y + 36
Since (100y^2 + 6*2*10y) is a multiple of 10, ending in '0', the sum
with 36 ends in '6'.

Dividing the numbers 2^(2^n) by 2 gives the second series 2^((2^n)-1),
and the result of dividing a number ending in '6' can only end in '3'
or '8'. Since 3 is odd and the powers of 2 are even (VERY even) the
result must end in '8'.

Finally, if you subtract 9 from a number ending in '0' through '8'
you need to borrow from the second integer from the right. Integer
division by 10 then gives you the same result as (number/10)-1,
which does the 'borrow' after the division.

I hope this is clear and correct, I've been looking at it too long.

Gil
 
F

Flash Gordon

Keith said:
jacob navia said:
happy a écrit :
I want to implement atoi function which converts string to an integer.
So I did this :

int atoi(char str[])
{
int i, num;
for( num = 0, i = 0 ; isdigit(str) ; i++)
num = num * 10 + str - '0' ;
return num;
}

Here I wnat to know is there any way to tell user if num goes out of
range of int ?
I mean we can't check num after overflow as it will be UB so is there
a way to check if str[] contains large integer ?

Use a double for comparing num*10 against INT_MAX:

double d = num;
if (d*10 > INT_MAX)
etc


That can fail if double has fewer value bits than int does (e.g., if
int and double are both 64 bits).

Aside from that, I'd write the above as:

if (num * 10.0 > INT_MAX) /* ... */


For using double I would do the same thing. However, if writing the
function for real I would simple pre-calculate INT_MAX/10 and INT_MAX%10
so I can do a couple of quick integer comparisons...
if ((num > 0 &&
num < IMAX_DIV_10 ||
num == IMAX_DIV_10 && dig <= IMAX_MOD_10) ||
(num < 0 &&
num > IMIN_DIV_10 ||
num == IMIN_DIV_10 && dig <= IMIN_MOD_10))
/* all OK */

Obviously on pre-calculating you need to ensure you use round towards 0,
where I believe C90 allows round towards -inf, so you need a little
care. It should also allow the common case to fall out fast (if you need
the efficiency), and depending on how you handle negative numbers in
your atoi implementation you could move the sign check else where.
 
J

Jens Thoms Toerring

bartc said:
Jens Thoms Toerring said:
Eric Sosman said:
On 1/21/2010 6:05 PM, bartc wrote:
[...]
char im[ ( CHAR_BIT * sizeof( int ) ) / 3 + 1 ];
sprintf( im, "%d\n", INT_MAX );
if ( strlen( str ) > strlen( im ) || strcmp( str, im ) > 0 )
puts( "overflow" );

I use a similar technique in a real application. But I only do the
strcmp() test when both strings are the same length, otherwise it goes
wrong I think.
So if im[] holds "2147483647" and str is "11111111111111111"
you'll conclude that all's well?

I guess that bartc meant that when the length isn't equal you
actually don't have to do anymore checks since then the question
is already answered (but then, as I just realized, there is ano-
ther point that needs to be taken care about: what about strings
that start with one or more '0'?). Just the concern about the string
being shorter than a "stringified" version of INT_MAX is something I
don't get since strcmp() will return -1 if 'str' is shorter than 'im'.
My strcmp() behaves differently from yours then:
printf("%d\n", strcmp("9","214748647"));
outputs 1 on my machine.

Ooops, no, your strcmp() doesn't work differently from mine;-)
- I just considered cases like strcmp("1", "123") only, stupid
me.... So, yes, a correct version will need to compare strings
only of they have exactly the same lengths (and avoiding an un-
necessary strcmp() is better anyway when the result can already
be derived from the lengths).
Regards, Jens
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,769
Messages
2,569,579
Members
45,053
Latest member
BrodieSola

Latest Threads

Top