Newbie - itoa implementation redux

G

gareth.p.davies

My latest version below.

#include <stdio.h>
#include <limits.h>

char * itoa(int a)
{
static char str[(size_t)((sizeof(int) * CHAR_BIT - 1) / 3.3) + 3];

int i = sizeof(str) - 1;

unsigned int abs_a = a;

int neg = 0;

if(a < 0)
{
neg = 1;

abs_a = -a ; //(1U - (1U + a));
}

do
{
str[--i] = '0' + abs_a % 10;
}
while ((abs_a = abs_a / 10) && i >= 0);

if(neg)
{
str[--i] = '-';
}

return &str;
}



int main(void)
{
char buffer[50];

sprintf(buffer, "%d", INT_MIN);
printf("sprintf 'says' %s\n", buffer);

sprintf(buffer, "%d", INT_MAX);
printf("sprintf 'says' %s\n", buffer);

puts(itoa(INT_MIN));
puts(itoa(INT_MAX));
puts(itoa(12));
puts(itoa(1));
puts(itoa(123));

return 0;
}


As I'm now using Michael's abs_a idea, I believe that I no longer need:

abs_a = (1U - (1U + a));

and that

abs_a = -a ;

is ok?

How does my function look now please? Any improvment ideas, or ideas
on how to reduce its size?

Thanks to all those that have been so willing to provide help.

gareth
 
A

Andrew Poelstra

My latest version below.

#include <stdio.h>
#include <limits.h>

char * itoa(int a)
{
static char str[(size_t)((sizeof(int) * CHAR_BIT - 1) / 3.3) + 3];

You probably want to put a comment there explaining this line; it's hardly
obvious to someone who hasn't been following this thread. Also, I'm not
sure if the static qualifier here is absolutely necessary. (If you're
returning a pointer to this data, you'd probably be better off using
malloc() and returning the value of that instead.)
int i = sizeof(str) - 1;

The parentheses aren't necessary here, nor is the -1. i should be a size_t
instead of an int.
unsigned int abs_a = a;

This line doesn't do what you think it does. Try:
unsigned abs_a = (a < 0) ? -a : a;
int neg = 0;

if(a < 0)
{
neg = 1;

abs_a = -a ; //(1U - (1U + a));
}

This whole section could be replaced with:
int neg = (a < 0);
and placed above the previous line, which would become:
unsigned abs_a = neg ? -a : a;

This is much more compact, and IMHO, clearer.
do
{
str[--i] = '0' + abs_a % 10;
}
while ((abs_a = abs_a / 10) && i >= 0);

Thanks to the way str[] is defined, you don't need to check that i is still
above 0:

str[--i] = 0;
while (abs_a)
{
str[--i] = '0' + abs_a % 10;
abs_a /= 10;
}
if(neg)
{
str[--i] = '-';
}

return &str;
}


That's all good, although once again I caution against using a static buffer
in this function. The way you use it here, though, it is actually a good
design decision.
int main(void)
{
char buffer[50];

sprintf(buffer, "%d", INT_MIN);
printf("sprintf 'says' %s\n", buffer);

sprintf(buffer, "%d", INT_MAX);
printf("sprintf 'says' %s\n", buffer);

puts(itoa(INT_MIN));
puts(itoa(INT_MAX));
puts(itoa(12));
puts(itoa(1));
puts(itoa(123));

return 0;
}
 
M

Michael Mair

My latest version below.

#include <stdio.h>
#include <limits.h>

char * itoa(int a)
{
if(a < 0)
{
neg = 1;

abs_a = -a ; //(1U - (1U + a));
}
As I'm now using Michael's abs_a idea, I believe that I no longer need:

abs_a = (1U - (1U + a));

and that

abs_a = -a ;

is ok?

No. Now your function works on your current system for your current
compiler version.
Conversions in C are hard to understand in the beginning.

Reread the answers in the original thread after this explanation:
- signed overflow leads to undefined behaviour
- unary minus does not change the type of the operand
- this makes the above equivalent to
int temp = -a;
unsigned int abs_a = temp;
- i.e., you already _had_ your signed overflow before the assignment
happened. On your platform, UINT_MAX + (overflow version of -INT_MIN)
happens to give you the actual absolute value of INT_MIN. Even though
this is no coincidence, you cannot rely on that if you want to be on
the safe side.
Imagine saturating semantics:
INT_MIN is in your case -INT_MAX - 1; as -INT_MIN cannot be
represented as int, your implementation takes the largest
possible int value, INT_MAX, instead. Then you convert this to
unsigned int -- the value does not change, thus you get itoa to
produce a string with a representation of -INT_MAX instead of
-INT_MAX - 1...
- 1U-(1U+a) calculates 1U + a already in unsigned int (the a is
converted), i.e. no overflow happens. All intermediate results
are of type unsigned int.
- instead of '1U-(1U+a)' you could also use '1U-(1+a)' -- why?
another alternative is '-(unsigned int)a' -- why?

How does my function look now please?

There is one thing you changed from my suggestion -- and we both
were wrong... ;-)
do
{
str[--i] = '0' + abs_a % 10;
}
while ((abs_a = abs_a / 10) && i >= 0);

I had an
.... && i >= neg);
if(neg)
{
str[--i] = '-';
}

So, what's the deal? If the loop terminates because of the
second condition, then i == -1. This means that during the
last iteration, you accessed str[-1] -- this is a
Very Bad Thing.
Now, the correct thing seems to be to write 'i > 0' as second
condition. Okay, so we terminate with i == 0. However, if
neg == 1 (neg can only take the values 1 and 0), then you once
again access str[-1].
If you write 'i > neg' instead, then you make sure that
you only can access str[neg] in the loop and return &str[0]
if the loop terminated due to the second condition.

Even though you chose str to be large enough to hold -INT_MIN,
it is a Good Idea to write the code in such a way that even
if you made a mistake in your choice of sizeof str you cannot
access the memory before (or after) str -- for example, someone
might copy your code and adapt it for 'long a' but forget to
correct the size of the buffer.
Any improvment ideas, or ideas on how to reduce its size?

*g* No. I have an idea how you can increase its size:
Extend your code in a way that you can represent "a" in
an arbitrary base representation between 2 and 36; use 'a'
through 'z' to stand in for digits representing 10 through
35...
Hint:
const char * const digits = "0123456789abcdefghijklmnopqrstuvwxyz";


Cheers
Michael
 
E

Eric Sosman

Andrew Poelstra wrote On 08/15/06 17:00,:
The parentheses aren't necessary here, nor is the -1. i should be a size_t
instead of an int.

The -1 *is* necessary. Read his code more carefully.
This line doesn't do what you think it does. Try:
unsigned abs_a = (a < 0) ? -a : a;

I think you don't know what he's thinking. Again, read
his code.
This whole section could be replaced with:
int neg = (a < 0);
and placed above the previous line, which would become:
unsigned abs_a = neg ? -a : a;

This is much more compact, and IMHO, clearer.

... and perpetuates the same bug. As in the original,
evaluating `-a' when `a < -INT_MAX' can have undesirable
consequences.
do
{
str[--i] = '0' + abs_a % 10;
}
while ((abs_a = abs_a / 10) && i >= 0);


Thanks to the way str[] is defined, you don't need to check that i is still
above 0:

str[--i] = 0;

This is only necessary in *your* version of the code;
it was not needed in the original.
while (abs_a)
{
str[--i] = '0' + abs_a % 10;
abs_a /= 10;
}

Um, er, what string should be returned if `a' is zero?
 
A

Andrew Poelstra

Andrew Poelstra wrote On 08/15/06 17:00,:

The -1 *is* necessary. Read his code more carefully.

-1 and --i? I would think that it'd be -1 /or/ --i, the alternative being
i++. My bad.
I think you don't know what he's thinking. Again, read
his code.

This is a useless assignment if a < 0; having read the code top-to-bottom
as posting as I went, it seemed like a bad idea. My bad again.
... and perpetuates the same bug. As in the original,
evaluating `-a' when `a < -INT_MAX' can have undesirable
consequences.

It will result in a being 0, correct? Is there any case where that would
not be true?

And more importantly, how would one fix this bug? Since there's no type
guaranteed to be wider than an unsigned int, you'd have to handle negative
numbers as they are... which makes me wonder why it is necessary to take
the absolute value at all!
do
{
str[--i] = '0' + abs_a % 10;
}
while ((abs_a = abs_a / 10) && i >= 0);


Thanks to the way str[] is defined, you don't need to check that i is still
above 0:

str[--i] = 0;

This is only necessary in *your* version of the code;
it was not needed in the original.

Really? Where in the original is str[] null-terminated?
while (abs_a)
{
str[--i] = '0' + abs_a % 10;
abs_a /= 10;
}

Um, er, what string should be returned if `a' is zero?

Now there's a bug I should have seen. Now I know why a do..while was
used instead of a while.
 
A

Andrew Poelstra

-1 and --i? I would think that it'd be -1 /or/ --i, the alternative being
i++. My bad.

I meant i--. The important aspect being that it's a postfix operator.
 
E

Eric Sosman

Andrew said:
Andrew Poelstra wrote On 08/15/06 17:00,:
[...]
This whole section could be replaced with:
int neg = (a < 0);
and placed above the previous line, which would become:
unsigned abs_a = neg ? -a : a;

This is much more compact, and IMHO, clearer.

... and perpetuates the same bug. As in the original,
evaluating `-a' when `a < -INT_MAX' can have undesirable
consequences.
It will result in a being 0, correct? Is there any case where that would
not be true?

It is possible that the result might be zero -- or forty-two,
or eleventy-one, or massive memory meltdown. Section 6.5, para 5:

"If an /exceptional condition/ occurs during the
evaluation of an expression (that is, if the result
is not mathematically defined or not in the range of
representable values for its type), the behavior is
undefined."

So: If `a' is less than `-INT_MAX', then `-a' is greater than
`INT_MAX', which means it's outside the range of representable
values of `int'. The fact that the result (whatever it might
be) is later converted to `unsigned int' is of no consequence;
the damage has already occurred.

This can't happen on signed-magnitude or ones'-complement
systems, but it does happen on two's-complement machines where
`INT_MIN' equals `-INT_MAX-1'. Most implementations simply
ignore the overflow and produce the low-order bits of the too-
wide result, after which conversion to `unsigned int' will give
the "right" answer (which is certainly not zero). This outcome
is not guaranteed, though, and I have used machines that trapped
on integer overflow (granted, I wasn't using C).
And more importantly, how would one fix this bug? Since there's no type
guaranteed to be wider than an unsigned int, you'd have to handle negative
numbers as they are... which makes me wonder why it is necessary to take
the absolute value at all!

It's not necessary. In "The Standard C Library" P.J. Plauger
shows how to do the conversion by using non-positive numbers. His
code is a bit cumbersome because of C89's wishy-washiness about
how negative values work with the division and modulus operators;
C99 has tightened this up and his code could now be simplified.

The O.P.'s code had a comment showing a method that will work
> abs_a = -a ; //(1U - (1U + a));

.... and I think that changing it to `1U - (1 + a)' would make it
completely bullet-proof. (The issue: UINT_MAX is required to be
at least as large as INT_MAX, but I don't think it's required to
be larger -- although it has been on every machine I've used.)
str[--i] = 0;

This is only necessary in *your* version of the code;
it was not needed in the original.

Really? Where in the original is str[] null-terminated?

The original str[] has static duration and no explicit
initializer, hence it's initialized to all-elements-zero
before the program starts operating.

My personal preference would be to do it your way, just
in case a caller overwrites the '\0' at the end of the returned
string, leaving a time bomb for subsequent callers. Better to
refresh the '\0' each time than to hope it never gets clobbered --
still, as the code stood the '\0' was already present.
 
G

gareth.p.davies

Michael said:
No. Now your function works on your current system for your current
compiler version.
Conversions in C are hard to understand in the beginning.

Reread the answers in the original thread after this explanation:
- signed overflow leads to undefined behaviour
- unary minus does not change the type of the operand
- this makes the above equivalent to
int temp = -a;
unsigned int abs_a = temp;
- i.e., you already _had_ your signed overflow before the assignment
happened. On your platform, UINT_MAX + (overflow version of -INT_MIN)
happens to give you the actual absolute value of INT_MIN. Even though
this is no coincidence, you cannot rely on that if you want to be on
the safe side.
Imagine saturating semantics:
INT_MIN is in your case -INT_MAX - 1; as -INT_MIN cannot be
represented as int, your implementation takes the largest
possible int value, INT_MAX, instead. Then you convert this to
unsigned int -- the value does not change, thus you get itoa to
produce a string with a representation of -INT_MAX instead of
-INT_MAX - 1...
- 1U-(1U+a) calculates 1U + a already in unsigned int (the a is
converted), i.e. no overflow happens. All intermediate results
are of type unsigned int.
- instead of '1U-(1U+a)' you could also use '1U-(1+a)' -- why?
another alternative is '-(unsigned int)a' -- why?

'1U-(1+a)'

Because on the left of the '-' we have an unsigned type, and so the rhs
expression/sub-expressions will be promoted before they're evaluated?

'-(unsigned int)a'

Because we're explicitly promoting a's type before it is evaluated for
the assignment?
How does my function look now please?

There is one thing you changed from my suggestion -- and we both
were wrong... ;-)
do
{
str[--i] = '0' + abs_a % 10;
}
while ((abs_a = abs_a / 10) && i >= 0);

I had an
.... && i >= neg);
if(neg)
{
str[--i] = '-';
}

So, what's the deal? If the loop terminates because of the
second condition, then i == -1. This means that during the
last iteration, you accessed str[-1] -- this is a
Very Bad Thing.
Now, the correct thing seems to be to write 'i > 0' as second
condition. Okay, so we terminate with i == 0. However, if
neg == 1 (neg can only take the values 1 and 0), then you once
again access str[-1].
If you write 'i > neg' instead, then you make sure that
you only can access str[neg] in the loop and return &str[0]
if the loop terminated due to the second condition.

Even though you chose str to be large enough to hold -INT_MIN,
it is a Good Idea to write the code in such a way that even
if you made a mistake in your choice of sizeof str you cannot
access the memory before (or after) str -- for example, someone
might copy your code and adapt it for 'long a' but forget to
correct the size of the buffer.
Any improvment ideas, or ideas on how to reduce its size?

*g* No. I have an idea how you can increase its size:
Extend your code in a way that you can represent "a" in
an arbitrary base representation between 2 and 36; use 'a'
through 'z' to stand in for digits representing 10 through
35...
Hint:
const char * const digits = "0123456789abcdefghijklmnopqrstuvwxyz";

Thanks for your continued input and explanations.

Here's the latest:

char * itoa(int a)
{
static char str[(size_t)((sizeof(int) * CHAR_BIT - 1) / 3.3) + 3];

int i;

unsigned int abs_a = a;

str[i = sizeof str - 1] = '\0';

int neg = 0;

if(a < 0)
{
neg = 1;

abs_a = (1U - (1U + a));
}

do
{
str[--i] = '0' + abs_a % 10;
}
while ((abs_a = abs_a / 10) && i > 0);

if(neg)
{
if(i)
{
str[--i] = '-';
}

else

{
fputs("Error in itoa - negative i", stderr);
}
}

return &str;
}

I took note of the comment made elsewhere about initialising the array
so that it's NULL terminated:

I first did this simply using -

int i = sizeof str - 1;

str = '\0';

But then combined the two things using -

str[i = sizeof str - 1] = '\0';

Is this just too ugly/unreadable - should I just use the simpler method
and go for the extra line?
From you comments, I think I now have this right:

while ((abs_a = abs_a / 10) && i > 0);

However, following on from what you said, in the following 'if(neg)', I
thought I should check that --i isn't going to hit -1 in the case where
the previous loop terminated via the second condition, i.e. that i was
0. If that were the case, then

if(neg)
{
str[--i] = '-';

Would be 'bad' (as you say).

However, I'm beginning to wonder whether I'm getting paranoid here - ok
the tests and logic seem fine, but I wonder in what circumstances the
value of i could actually hit 0, and then -1, i.e., what could be
passed into itoa, or what could go wrong internally that i could cause
an out-of-bounds? Leaving the 'i concern' out, this would simplify the
code to:

do
{
str[--i] = '0' + abs_a % 10;
}
while ((abs_a = abs_a / 10)); // gcc like the extra () here.

if(neg)
{
str[--i] = '-';
}

But would it be as robust?

Continued thanks.

gareth
 
M

Michael Mair

<snip: itoa() implementation and the benefits of assigning
the int argument to an unsigned auxiliary variable via>
>
> '1U-(1+a)'
>
> Because on the left of the '-' we have an unsigned type, and so the
> rhs expression/sub-expressions will be promoted before they're
> evaluated?

Yes. The right hand side, 1+a, is "safe" from signed integer overflow
as "a" is negative and increased by an amoung that cannot make it
greater than INT_MAX.
> '-(unsigned int)a'
>
> Because we're explicitly promoting a's type before it is evaluated for
> the assignment?

Exactly.

A little bit about nomenclature (not in Standardese, so you may
see responses that tell you about the fine points I missed to tell
you about):
- The "integer promotions" are the mechanism of automatic conversion
from a value of a type with lower "conversion rank" than signed or
unsigned int to signed int (if signed int can represent the value)
or unsigned int (otherwise). This means that every calculation
takes place at least in signed or unsigned int "width".
- The above cast to unsigned int is an explicit conversion of a's
value to type unsigned int and not a promotion.
- The conversion of (1+a) to unsigned int is an implicit conversion
and one of the so-called "usual arithmetic conversions"; these
essentially tell you that whenever you have an operation expecting
numbers as operands in which way the effective types of the operands
are determined (i.e. which conversions will happen prior to
calculation):
+ At least one long double operand: All others will be converted
to long double; otherwise:
+ At least one double operand: All others will be converted
to the type in which double calculations take place; otherwise:
+ At least one float operand: All others will be converted
to the type in which float calculations take place; otherwise:
+ Apply the integer promotions; if
* both types are equal, use the promoted type
* both types are different unsigned integer types: Convert to
the one with higher conversion rank
* one is an unsigned integer type and has conversion rank >= the
other, then convert the other to this type
* ....
The last rule is the one that applies for us.


> Thanks for your continued input and explanations.
>
> Here's the latest:
>
> char * itoa(int a)
> {
> static char str[(size_t)((sizeof(int) * CHAR_BIT - 1) / 3.3) + 3];
>
> int i;
>
> unsigned int abs_a = a;
>
> str[i = sizeof str - 1] = '\0';
>
> int neg = 0;
>
> if(a < 0)
> {
> neg = 1;
>
> abs_a = (1U - (1U + a));
> }
>
> do
> {
> str[--i] = '0' + abs_a % 10;
> }
> while ((abs_a = abs_a / 10) && i > 0);
>
> if(neg)
> {
> if(i)
> {
> str[--i] = '-';
> }
>
> else
>
> {
> fputs("Error in itoa - negative i", stderr);
> }
> }
>
> return &str;
> }
>
> I took note of the comment made elsewhere about initialising the array
> so that it's NULL terminated:


Another "wrong name":
'\0' can be called "null character" or string terminator
but should not be called NULL.
NULL is a macro defined in several standard C headers which
yields a so-called null pointer constant; it could be defined
as 0 or (void *)0 -- or even '\0' as '\0' is a character constant
with value 0.

> I first did this simply using -
>
> int i = sizeof str - 1;
>
> str = '\0';
>
> But then combined the two things using -
>
> str[i = sizeof str - 1] = '\0';
>
> Is this just too ugly/unreadable - should I just use the simpler
> method and go for the extra line?


If _you_ can read it without thinking twice about it, then it
is okay. However, if you want others to read your code, then I'd
suggest that you take the two line version.
C is a great language for doing ten different things in one line.
That does not mean that it is a good idea to go and do that...
It is easy enough to make mistakes in the harmless looking lines.
>>From you comments, I think I now have this right:
>
> while ((abs_a = abs_a / 10) && i > 0);
>
> However, following on from what you said, in the following 'if(neg)',
> I thought I should check that --i isn't going to hit -1 in the case
> where the previous loop terminated via the second condition, i.e.
> that i was 0. If that were the case, then
>
> if(neg)
> {
> str[--i] = '-';
>
> Would be 'bad' (as you say).
>
> However, I'm beginning to wonder whether I'm getting paranoid here -
> ok the tests and logic seem fine, but I wonder in what circumstances
> the value of i could actually hit 0, and then -1, i.e., what could be
> passed into itoa, or what could go wrong internally that i could cause
> an out-of-bounds? Leaving the 'i concern' out, this would simplify
> the code to:
>
> do
> {
> str[--i] = '0' + abs_a % 10;
> }
> while ((abs_a = abs_a / 10)); // gcc like the extra () here.
>
> if(neg)
> {
> str[--i] = '-';
> }
>
> But would it be as robust?

Essentially: Yes.
You explicitly made sure that sizeof str is large enough for
whatever value "a" can have.
There is one arch-nemesis of programming that can bite you in
this respect: Copy and paste.
If the size of str is not adapted, then you can be unlucky
in that the function still seems to work most of the time
as the out-of-array access does not "overwrite" anything
sufficiently critical -- but if it one day does, it is at
the worst possible moment.
If your itoa() is not critical for the execution time of a
programme, I'd stay with the "paranoia checks".
The more important part is being aware that it could go
wrong -- especially if the control flow between function
entry and exit becomes more complicated. I often start with
writing down paranoia checks and replacing them with comments
explaining why they are not necessary; then someone (maybe even
me after three years) trying to figure out why I "forgot" the
checks knows why I thought them unnecessary.

In your case:
do
{
str[--i] = '0' + abs_a % 10;
}
while ((abs_a = abs_a / 10));
/* Here: i >= 1 due to the size of str[] */

Some notes about your comment
// gcc like the extra () here.
- C99 style line comments are a common compiler extension for
C90 compilers but are not part of C90; this means that their
use can make it a little bit harder to port a programme to
a C90 compiler.
- Line comments are bad for posts on usenet as you may run
into unfortunate line breaks changing the compilability or
even the meaning of your posted code.
- If you use gcc and you are willing to learn standard C, then
invoke it (at least) with the
-std=c89 -pedantic -Wall -O
options for C90 or
-std=c99 -pedantic -Wall -O
for C99; gcc does not fully support the latter.
The warnings and errors you get from this sharpen your awareness
for when you are relying on C and when you are relying on
compiler extensions. Knowing the difference can be important
later on when you have to make do with standard C only.
- doing something to "shut up" the compiler is not necessarily a
good idea; in your case, the longer
do
{
str[--i] = '0' + abs_a % 10;
abs_a = abs_a / 10;
}
while (0 != abs_a);
can be optimized just as well by the compiler and may be easier
on the eyes of some readers. Another alternative is
do
{
str[--i] = '0' + abs_a % 10;
}
while (0 != (abs_a /= 10));
Both say explicitly what you want as well without catering to
a special compiler.


Cheers
Michael
 
E

ena8t8si

Here's the latest:

...

Please study the following:

const char * itoa( int a ) {
static char buffer[CHAR_BIT * sizeof a * 16ul / 53 + 3 ];
const int negative = a < 0;
char *p;
int r;

p = buffer + sizeof buffer, *--p = 0;

if (a==0) *--p = '0';

if (negative) {
r = a%-10, a /= -10;
if (r > 0) r -= 10, a -= 1;
*--p = '0' - r;
}

while (a) {
r = a%10, a /= 10;
*--p = '0' + r;
}

if (negative) *--p = '-';

return p;
}

Just a few remarks:

Defined and correct behavior for all inputs.

Using 'sizeof a' guarantees size of buffer
matches type of parameter.

The 'if (r > 0) ...' accounts for C90 division
possibly rounding the wrong way.

Extra blank lines don't always help readability;
put in blank lines where they help, leave them
out where they don't.

And a note of encouragement - keep trying, you're
making good progress.

(Thanks to Michael Mair for his suggestion to
divide a negative number by -10.)
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Similar Threads


Members online

No members online now.

Forum statistics

Threads
473,769
Messages
2,569,582
Members
45,065
Latest member
OrderGreenAcreCBD

Latest Threads

Top