Newbie - itoa implementation

K

Keith Thompson

Frederick Gotham said:
Andrew Poelstra posted:

This does *not* give you the amount of value representation bits (exclusive
of the sign bit). See my explanation elsethread.

It does if there are no padding bits. It doesn't if there are padding
bits. If you had written "This does not *necessarily* give you ...",
you would have been correct; in fact, it *does* give you the number of
value representation bits on many systems.

See also the explanation elsethread that, though your point is
correct, it's not entirely relevant. If you allocate a string big
enough to hold the representation of any possible int value *assuming
it has no padding bits*, it will also be big enough to hold any
possible int value even if it does have padding bits. It may not be
the *minimal* length, but that's not a real problem unless you're
*very* short on space.
 
K

Keith Thompson

Snis Pilbor said:
Boy am I glad I don't work under you, since you seem to think you're
the sole arbiter of good design.

Yes, MQ is the sole arbiter of good design. So am I. So are you.
:cool:}

MQ makes a valid point; there are definite drawbacks to that
particular design (returning a pointer to a static buffer). Note that
there are functions in the C standard library that do exactly this,
such asctime() and ctime() -- which means you have to be very careful
in how you use them.

There's a big difference between expression an opinion about something
and claiming that your own opinion is the ultimate truth. MQ merely
did the former; you're reacting as if he'd done the latter.

[...]
If I wanted a language which was always like "Hey! That's, um...
questionable, because you might not know what you're doing... so um,
it's forbidden!" then I would switch to Java.

Nobody suggested forbidding anything.
Also the fact you suggested returning a pointer to static memory might
be undefined behavior, is extremely misleading and probably adds new
lots of new confusion to anyone who hasn't been doing C long enough to
immediately see it as the nonsense it is.

It was a mistake. Mistakes can easily be corrected by posting the
correct information (in this case, returning a pointer to static data
doesn't invoke undefined behavior, though it can lead to it, or to
other bad outcomes, if you're not careful).

Another approach I've used in the past is to declare a static array of
strings within a function (actually a 2-dimensional array of char),
and a static integer index that specifies which one to use next. The
function, which returns a char*, initializes the string indicated by
the current index and returns a pointer to it while incrementing the
index for next time, resetting it to 0 when it goes past the end of
the array. With, say, half a dozen or so such rotating buffers, you
can do things like:

printf("%s %s %s %s\n", func(1), func(2), func(3), func(4));

without requiring the caller to worry about where to store the result.
It's not a perfectly clean solution, but it can be useful sometimes
(and, of course, you *still* have to be careful not to retain more
than N results).
 
F

Frederick Gotham

Keith Thompson posted:
It does if there are no padding bits. It doesn't if there are padding
bits. If you had written "This does not *necessarily* give you ...",
you would have been correct; in fact, it *does* give you the number of
value representation bits on many systems.


In the context of comp.lang.c, the "necessarily" is redundant. A commonly
heard phrase here is:

"The code invokes undefined behaviour."

Note how we don't say:

"The code might invoke undefined behaviour on some systems."

In the context of comp.lang.c, we deal with code which is not ill-formed,
which doesn't invoke undefined behaviour, and which behaves as intended on
every implementation. If we start saying "necessarily" and "might invoke
undefined behaviour", then we may aswell start recommending things that
work on most systems:

int SomeFunc(void); /* Defined elsewhere */

int main(void)
{
int const i = SomeFunc();

int *p = (int*)&i;

*p = 5;
}

By this rationale, I re-iterate:

"This does *not* give you the amount of value representation bits."

See also the explanation elsethread that, though your point is
correct, it's not entirely relevant. If you allocate a string big
enough to hold the representation of any possible int value *assuming
it has no padding bits*, it will also be big enough to hold any
possible int value even if it does have padding bits. It may not be
the *minimal* length, but that's not a real problem unless you're
*very* short on space.


Yes I acknowledge this. Notwithstanding the snippet which was under
discussion, I corrected the misinformation pertaining to the calculation of
the quantity of value representation bits.
 
M

Michael Mair

Barry said:
Just as a learning exercise, I've come up with an implementation of
itoa() without using something like sprintf. However, I can't figure
out how to extend it to deal with negative numbers - any help
appreciated!

char * itoa(int a)
{
#define BUFFSIZE 50

static char str[BUFFSIZE];

int i = sizeof(str) - 1;

int neg = 0;
if (a < 0) {neg = 1; a = -a;}

This may result in a signed overflow for INT_MIN
on 2's complement machines which in turn yields wrong values.
Consider instead
unsigned int absA;
if (a>=0)
absA = a;
else
absA = 1U - (1U + a);
and going on with absA.

@OP: Barry probably knows this but you may wonder what is so
different about that...
1U + a
always yields a result between 0 and 2^n-1, where n is the
number of value bits for unsigned int, i.e. for a < -1, we have
1U + a == UINT_MAX + (1U + a) + 1U
and, consequently,
1U - (1U + a) == UINT_MAX + (1U - (UINT_MAX + (1U + a) + 1U)) + 1U
which indeed yields the value of -a but is computed with
well-defined overflow semantics.
If you have INT_MIN = -INT_MAX - 1 which is quite likely for a
modern PC, then -INT_MIN cannot be represented as int, i.e. cannot
be stored in "a". Depending on the kind of signed integer
overflow semantics of your implementation (there need not be any
as this is undefined behaviour in C's book), a's value may be
- unchanged (pretend unsigned arithmetics and use the representation
as signed int)
- INT_MAX (saturation)
- whatever your implementation thinks is a good idea for the value

or an exception may be raised or sudden death of your computer
may ensue or ...
do
{
str[--i] = '0' + a % 10;
}
while ((a = a / 10) && i >= 0);

if (neg) str[--i] = '-';
return &str;
}


<snip>

Cheers
Michael
 
M

Michael Mair

A standard way of dealing with integer conversion
so negative numbers work is like so:

negative = 0;
if(a<0) negative = 1;
else a = -a;

... now print digits for a, a always <= 0 ...

if(negative) add the leading '-'

The reason for always using negatives is to avoid
problems dealing with a most negative number, which
often can't be made positive.

For printing the digits of a negative number, the
technique of dividing the number by 10 and using
the remainder to calculate the next digit still
works, except it has to be adapted because the
remainders can be negative. Probably you can
figure out how to do that if you play around with
it.

Note: C90 in principle allows "round down"-behaviour
for integer division yielding negative results
(i.e. division by a power of two always behaves like
an arithmetic right shift and -1/anythingPositive == -1).
An initial division of the negative number "a" by -10
and going on in the (now safe) positive range is another
feasible way.


Cheers
Michael
 
K

Keith Thompson

Frederick Gotham said:
Keith Thompson posted:

In the context of comp.lang.c, the "necessarily" is redundant.

No, it really isn't.
A commonly heard phrase here is:

"The code invokes undefined behaviour."

Note how we don't say:

"The code might invoke undefined behaviour on some systems."

Right, because "undefined behavior" already covers all possibilities.

The given expression gives you the number of value representation bits
on some systems. On others, it doesn't.
 
C

Clark S. Cox III

Keith Thompson posted:



In the context of comp.lang.c, the "necessarily" is redundant.
A commonly heard phrase here is:

"The code invokes undefined behaviour."

Note how we don't say:

"The code might invoke undefined behaviour on some systems."

Sometimes, we do; there are situations that are undefined on some
systems and not on others. However, it is usually cut and
In the context of comp.lang.c, we deal with code which is not
ill-formed, which doesn't invoke undefined behaviour, and which behaves
as intended on every implementation. If we start saying "necessarily"
and "might invoke undefined behaviour", then we may aswell start
recommending things that work on most systems.

int SomeFunc(void); /* Defined elsewhere */

int main(void)
{
int const i = SomeFunc();

int *p = (int*)&i;

*p = 5;
}

This always invokes undefined behavior; there is no grey area.
By this rationale, I re-iterate:

"This does *not* give you the amount of value representation bits."

It does in some circumstances, it doesn't in others. With that
unqualified "not", your statement is false; plain and simple.
 
E

ena8t8si

Michael said:
Note: C90 in principle allows "round down"-behaviour
for integer division yielding negative results
(i.e. division by a power of two always behaves like
an arithmetic right shift and -1/anythingPositive == -1).

Right! Part of the fun is writing code that
works no matter which way the rounding happens.
An initial division of the negative number "a" by -10
and going on in the (now safe) positive range is another
feasible way.

That's a nice idea. Still has to deal with two
possible rounding directions though for that
first division.
 
P

pete

Frederick Gotham wrote:
I was merely correcting a false assumption.
The following does not yield the
amount of value representation bits in an integer type:

CHAR_BIT * sizeof(IntType)

But, the false assumption was yours.

CHAR_BIT * sizeof(IntType)
is the maximum amount of representation bits possible,
and that's what it's supposed to be.
 
F

Frederick Gotham

pete posted:
But, the false assumption was yours.


I assumed that an accurate figure was sought, and that memory was not to be
wasted for no good reason.

CHAR_BIT * sizeof(IntType)
is the maximum amount of representation bits possible,
and that's what it's supposed to be.


IMAX_BITS gives an exact figure.
 
P

pete

Frederick said:
pete posted:


I assumed that an accurate figure was sought,
and that memory was not to be
wasted for no good reason.
IMAX_BITS gives an exact figure.

A wise man once said,
"As long as that just wastes a bit of memory when it's wrong,
that's my preference too.
IMAX_BITS is a fun little hack,
but not exactly readable."
 
M

MQ

Snis said:
Boy am I glad I don't work under you, since you seem to think you're
the sole arbiter of good design.

No, try rereading my post. I expressed an opinion. I did not issue an
edict.
Also the fact you suggested returning a pointer to static memory might
be undefined behavior, is extremely misleading and probably adds new
lots of new confusion to anyone who hasn't been doing C long enough to
immediately see it as the nonsense it is.

Again, you need to take care in reading posts. I did not declare it as
undefined behaviour, I *quiet clearly* stated that I had not
investigated it further and that it *might* be undefined behaviour.

MQ
 
F

Frederick Gotham

pete posted:
IMAX_BITS is a fun little hack, but not exactly readable."

Agreed, it's not very readable. However, we don't necessarily have to read it
-- there's nothing wrong with hiding it behind a curtain:

#include <imax_bits.h>
 
F

Frederick Gotham

pete posted:
A wise man once said,
"As long as that just wastes a bit of memory when it's wrong,
that's my preference too.


I myself always prefer execution speed over memory consumption (to a point),
but in this case, we don't suffer any reduction in execution speed, and so
there's no reason not to minimise memory usage.

If anything, it's fun to make an algorithm as efficient as possible.
 
M

Michael Mair

Right! Part of the fun is writing code that
works no matter which way the rounding happens.
*g*


That's a nice idea. Still has to deal with two
possible rounding directions though for that
first division.

This is true -- one can of course get the rounding
direction from the corresponding % operation.


Cheers
Michael
 
G

gareth.p.davies

Hi.

Just as a learning exercise, I've come up with an implementation of
itoa() without using something like sprintf. However, I can't figure
out how to extend it to deal with negative numbers - any help
appreciated!

<snip>

Hi - thanks for all the comments and explanations!

I now have this as my itoa function ...

char * itoa(int a)
{
static char str[(size_t)((sizeof(int) * CHAR_BIT - 1) / 3.3) + 3];

int i = sizeof(str) - 1;

if(a < 0)
{
printf("-");

a = -a;
}

do
{
str[--i] = '0' + a % 10;
}
while ((a = a / 10) && i >= 0);

return &str;
}

I also have a version with ...

if(a < 0)
{
printf("-");

a = (1U - (1U + a));
}

Following Michael Mair's suggestion for a modification.
This may result in a signed overflow for INT_MIN on 2's complement machines which in
turn yields wrong values.

However - I don't understand the comment - can anyone explain this
please?

MQ wrote
You can't pass a pointer to a static char array. It will not work how
you expect it to work. If you make three calls to the function

Eg.
char * a = itoa(6);
char * b = itoa(7);
char * c = itoa(8);

<snip>

Yes, I agree that this is not so 'normal', but I decided to go this way
having pretty much come up with the same arguments/comments as Snis
Pilbor did.

Interestingly (I think), I had a look on Krugle (www.krugle.com) for
other itoa implementations, and there seem to be a lot that rely on the
caller freeing a buffer that's been malloc'ed. A further refined
search (itoa.c malloc) shows this is pretty popular and I was wondering
whether this approach (using malloc) was thought of as 'more risky'
than using a static internal buffer?

Also, I haven't yet found a Krugle example that uses Michael Mairs'
suggestion, i.e., most use the if(a < 0) a = -a; manipulation - so, I
was wondering about their robustness (the source found via Krugle)?

Cheers,

gareth
 
M

Michael Mair

>>Just as a learning exercise, I've come up with an implementation of
>>itoa() without using something like sprintf. However, I can't figure
>>out how to extend it to deal with negative numbers - any help
>>appreciated!
>
> <snip>
>
> Hi - thanks for all the comments and explanations!
>
> I now have this as my itoa function ...
>
> char * itoa(int a)
> {
> static char str[(size_t)((sizeof(int) * CHAR_BIT - 1) / 3.3) + 3];
>
> int i = sizeof(str) - 1;

int neg = 0;
>
> if(a < 0)
> {
> printf("-");

Instead:
neg = 1;
>
> a = -a;
> }
>
> do
> {
> str[--i] = '0' + a % 10;
> }
> while ((a = a / 10) && i >= 0);

...... i >= neg);

if (neg) {
str[--i] = '-';
}
>
> return &str;
> }
>
> I also have a version with ...
>
> if(a < 0)
> {
> printf("-");
>
> a = (1U - (1U + a));
> }
>
> Following Michael Mair's suggestion for a modification.


Note that I suggested
int neg = (a < 0);
unsigned int abs_a;
if(a < 0)
abs_a = (1U - (1U + a));
else
abs_a = a;
and going with abs_a instead of a.

Test the difference with
puts(itoa(INT_MIN));

(Note: This invokes undefined behaviour for "your" version. For me,
the output is
-./,),(-*,(
and
-2147483648
Guess which is which ;-))
>
> However - I don't understand the comment - can anyone explain this
> please?

Please make clear whom you are quoting -- you switched from quoting
yourself to quoting me.

There are essentially three systems of representing signed integers
in C99 (C90 intended the same but got it wrong due to the wording):
- sign+magnitude
- one's complement
- two's complement

The former two have symmetric ranges (<type>_MIN = -<type>_MAX)
whereas two's complement has asymmetric ranges (<type>_MIN =
-<type>_MAX - 1).
This means that you cannot represent -<type>_MIN in type <type>.
What happens is undefined by the C standard. I enumerated some
possible outcomes (a = -a yielding the same value as a = a,
a becoming INT_MAX, some sort of overflow exception, ...).

A recent thread about the relative merits of the systems starts at
<[email protected]>
I suggest reading <[email protected]> and
<[email protected]>.
There are, of course, sources outside comp.lang.c which may explain
that better :)

>
> Interestingly (I think), I had a look on Krugle (www.krugle.com) for
> other itoa implementations, and there seem to be a lot that rely on
> the caller freeing a buffer that's been malloc'ed. A further refined
> search (itoa.c malloc) shows this is pretty popular and I was
> wondering whether this approach (using malloc) was thought of as
> 'more risky' than using a static internal buffer?
>
> Also, I haven't yet found a Krugle example that uses Michael Mairs'
> suggestion, i.e., most use the if(a < 0) a = -a; manipulation - so, I
> was wondering about their robustness (the source found via Krugle)?

Note that all this ado is about one case out of INT_MAX-INT_MIN+1.
Many people plainly just do not think about or bother with corner
cases. Writing itoa() is a classic beginners' exercise, so the code
you can find often is on exactly that level.


Cheers
Michael
 
E

ena8t8si

Hi.

Just as a learning exercise, I've come up with an implementation of
itoa() without using something like sprintf. However, I can't figure
out how to extend it to deal with negative numbers - any help
appreciated!

<snip>

Hi - thanks for all the comments and explanations!

I now have this as my itoa function ...

char * itoa(int a)
{
static char str[(size_t)((sizeof(int) * CHAR_BIT - 1) / 3.3) + 3];

int i = sizeof(str) - 1;

if(a < 0)
{
printf("-");

a = -a;
}

do
{
str[--i] = '0' + a % 10;
}
while ((a = a / 10) && i >= 0);

return &str;
}

I also have a version with ...

if(a < 0)
{
printf("-");

a = (1U - (1U + a));
}

Following Michael Mair's suggestion for a modification.


Both this technique and the simpler one above
can fail for INT_MIN on a twos complement representation.
However - I don't understand the comment - can anyone explain this
please?

Most 2's complement implementations, which is to say most
machines, will turn the negative of INT_MIN (the most negative
value of int) back into itself. So a won't be >= 0 for one
input value of a (on those implementations).
...
Also, I haven't yet found a Krugle example that uses Michael Mairs'
suggestion, i.e., most use the if(a < 0) a = -a; manipulation - so, I
was wondering about their robustness (the source found via Krugle)?

Fails for INT_MIN on most machines.
 
E

ena8t8si

Michael said:
Note that I suggested
int neg = (a < 0);
unsigned int abs_a;
if(a < 0)
abs_a = (1U - (1U + a));
else
abs_a = a;
and going with abs_a instead of a.

This approach still fails on implementations where
unsigned int has only as many value bits as int
(that is, doesn't have a value bit for the sign bit
of int).
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,777
Messages
2,569,604
Members
45,217
Latest member
IRMNikole

Latest Threads

Top