itoa in pure C

Discussion in 'C Programming' started by pete, Jan 25, 2004.

  1. pete

    pete Guest

    I wrote a version of itoa yesterday.

    Features:
    1 No implementation defined arithmetic. All of the division
    and modulus division is done on positive values only.
    2 No object is assumed capable of representing the
    magnitude of INT_MIN.
    3 The string is constructed in place without reversal.
    4 No features from standard library are used.

    sput_i writes the decimal representation of a nonnegative int value
    argument to an array of char.
    sput_ip1 writes the representation of one greater than the
    argument value, for any nonnegative int value argument, to an
    array of char.

    /* BEGIN itoa.c */

    void itoa(int, char *);
    char *sput_i(int, char *);
    char *sput_ip1(int, char *);

    void itoa(int integer, char *string)
    {
    if (0 > integer) {
    ++integer;
    *string++ = '-';
    *sput_ip1(-integer, string) = '\0';
    } else {
    *sput_i(integer, string) = '\0';
    }
    }

    char *sput_i(int integer, char *string)
    {
    if (integer / 10 != 0) {
    string = sput_i(integer / 10, string);
    }
    *string++ = (char)('0' + integer % 10);
    return string;
    }

    char *sput_ip1(int integer, char *string)
    {
    int digit;

    digit = (integer % 10 + 1) % 10;
    if (integer / 10 != 0) {
    string = (digit == 0 ? sput_ip1 : sput_i)(integer / 10, string);
    *string++ = (char)('0' + digit);
    } else {
    if (digit == 0) {
    *string++ = '1';
    }
    *string++ = (char)('0' + digit);
    }
    return string;
    }

    /* END itoa.c */
     
    pete, Jan 25, 2004
    #1
    1. Advertisements

  2. Why is that important? If you want _maximum_ compiler optimisation
    potential, then you should make sput_i take an unsigned argument as
    not all compilers will recognise that the supplied integer will
    always be non-negative.
    But you do assume that the magnitude of INT_MIN+1 (and others) _are_
    representable within an int. James Kuyper has argued in csc that this
    need not be true in C99.

    So, [-1000000..33000] would be a valid range for int!

    I would like the Committee to formally disallow this as it makes
    statements like i |= 1 potential UB even if i is positive. Also, it
    makes division of two arbitrarily signed integers highly problematic.
    Why is this important? [Have you profiled the cost of recursion
    against an iterative reversing of the string?]
    <snip>

    What are your thoughts on...?

    char *itoa(int i, char *s)
    {
    char *p = s;
    char *q = s;

    if (i >= 0)
    {
    do
    {
    *q++ = '0' + (i % 10);
    }
    while (i /= 10);
    }
    else if (-1 % 2 < 0)
    {
    *q++ = '-';
    p++;

    do
    {
    *q++ = '0' - i % 10;
    }
    while (i /= 10);
    }
    else
    {
    *q++ = '-';
    p++;

    do
    {
    int d = i % 10;
    i = i / 10;
    if (d) { i++; d = 10 - d; }
    *q++ = '0' + d;
    }
    while (i);
    }

    for (*q = 0; p < --q; p++)
    {
    char c = *p;
    *p = *q;
    *q = c;
    }

    return s;
    }
     
    Peter Nilsson, Jan 26, 2004
    #2
    1. Advertisements

  3. pete

    pete Guest

    I've seen posted versions with implementation defined behavior.
    It should be unsigned. I used int as a "nothing up my sleeves"
    gesture, to emphasize that the magnitude of INT_MIN
    wasn't be represented by an integer type.
    Well, ... I'm guessing he was arguing with other smart people too.
    It's a novelty feature.
    I don't think anybody was waiting for a new itoa.
    At first glance, it seems to be efficient and thorough.
    I'll take a closer look at it.
     
    pete, Jan 26, 2004
    #3
  4. pete

    Jack Klein Guest

    INT_MIN + 1 is certainly representable in a signed int. So is
    -(INT_MIN + 1). Either you have misunderstood James' arguments, or he
    is just plain wrong.
    No, it is not. INT_MAX must be some power of 2 minus 1. INT_MIN must
    either be -INT_MAX, or -INT_MAX - 1. There is no combination of value
    bits that is invalid in an unsigned integer type or in a positive
    signed integer type.

    There are only two possible combinations of sign and value bits that
    might be trap representations for signed integer types, and only one
    for each of the allowed types of representation. For signed magnitude
    and 2's complement, that is the bit combination of the sign bits 0 and
    all value bits 1. For 1's complement, it is the sign bit 1 and all
    value bits 1.

    Assuming that the value and sign bits do not have the one possible
    invalid value for the representation, the only way to have a trap
    value in a signed integer type is if there is some invalid combination
    of padding bits.

    6.6.2 makes this all quite clear.
    This does make it possible for i |= 1 to produce undefined behavior on
    1's complement implementations, specifically if the prior value of i
    had the sign bit and all but the least significant value bit set to 1
    prior to the operation, it could produce a trap representation.
     
    Jack Klein, Jan 26, 2004
    #4
  5. Well I guess you choose the latter. Here's one thread in which he discusses
    this...

    http://groups.google.com/groups?threadm=

    ITYM 6.2.6, but see the link above.
     
    Peter Nilsson, Jan 26, 2004
    #5
  6. pete

    pete Guest

    The matter of how many value bits are in the unsigned type,
    is irrelevant.
    The signed type has the same number of value bits as itself,
    regardless of whether it's representing a positive or negative value.
     
    pete, Jan 26, 2004
    #6
  7. pete

    Jack Klein Guest

    Yes, I've seen the thread, and it is hair-splitting nonsense.

    Paragraph 5 of 6.2.6.1:

    <quote>
    Certain object representations need not represent a value of the
    object type. If the stored value of an object has such a
    representation and is read by an lvalue expression that does
    not have character type, the behavior is undefined. If such a
    representation is produced by a side effect that modifies all or any
    part of the object by an lvalue expression that does not have
    character type, the behavior is undefined. Such a representation is
    called a trap representation.
    <unquote>

    The contents of an object referenced by an lvalue of a particular type
    (other than unsigned char) can be one of two things: a value or a
    trap representation. It can't be both, it can't be neither.

    Then there are paragraphs 1 & 2 of 6.2.6.2:

    <quote>
    1 For unsigned integer types other than unsigned char, the bits of the
    object representation shall be divided into two groups: value bits and
    padding bits (there need not be any of the latter). If there are N
    value bits, each bit shall represent a different power of 2 between 1
    and 2N-1, so that objects of that type shall be capable of
    representing values from 0 to 2N - 1 using a pure binary
    representation; this shall be known as the value representation. The
    values of any padding bits are unspecified.

    2 For signed integer types, the bits of the object representation
    shall be divided into three groups: value bits, padding bits, and the
    sign bit. There need not be any padding bits; there shall be exactly
    one sign bit. Each bit that is a value bit shall have the same value
    as the same bit in the object representation of the corresponding
    unsigned type (if there are M value bits in the signed type and N in
    the unsigned type, then M <= N). If the sign bit
    is zero, it shall not affect the resulting value. If the sign bit is
    one, the value shall be modified in one of the following ways:

    — the corresponding value with sign bit 0 is negated (sign and
    magnitude);

    — the sign bit has the value -(2N) (two’s complement);

    — the sign bit has the value -(2N - 1) (one’s complement).

    Which of these applies is implementation-defined, as is whether the
    value with sign bit 1 and all value bits zero (for the first two), or
    with sign bit and all value bits 1 (for one’s complement), is a trap
    representation or a normal value. In the case of sign and magnitude
    and one’s complement, if this representation is a normal value it is
    called a negative zero.
    <unquote>

    Each value bit contributes to a value, and the sign bit, when set
    specifically has a value that combines with the value of the value
    bits.

    When the standard provides a list of anything, in this case possible
    combinations of sign and value bits that may be trap representations,
    without a qualifier like "includes", that list is exclusive. Nowhere
    in the standard does it say that any other combinations of sign and
    value bits may be trap representations and not values.
     
    Jack Klein, Jan 27, 2004
    #7
  8. This has bearing on _my_ long-term troubling question about writing
    an honestly portable itoa.

    If I write (in the OTBS):

    char *local_itoa(int i)
    {
    unsigned int x = (i>=0) ? (unsigned) i : -(unsigned) i;
    ....
    am I guaranteed that the absolute value of i can be represented
    in x? A literal reading of this standard would suggest not, if
    M (the _data_bit_ width of signed int) is equal to N (the width
    of unsigned int), and the input value is INT_MIN in twos-complement,
    i.e., -(2^N). In that case the absolute value (-(unsigned)i) should
    be 2^N, but the max value representable is 2^M-1. This possibility
    is unique to architectures with twos-complement signed representations.

    If the standard does not guarantee this, should it? Are there
    any known twos-complement architectures where N==M? Are there
    _any_ architectures where N==M, or did the standard writers glitch
    and mean to write M < N?

    Enquiring minds want to know. ;-)

    - Larry
     
    Larry Doolittle, Jan 27, 2004
    #8
  9. pete

    CBFalconer Guest

    What is the problem? Write a routine for an unsigned integer.
    For integers, resolve the sign, convert to unsigned, and apply
    that routine.

    When you have languages that do not provide unsigned versions
    things are slighly harder, because you have to allow for the fact
    that negative values may have a larger range. In that case write
    a routine to convert negative integers, ignoring the sign, and
    then call that from a routine that handles the sign. This does
    not apply to C.
     
    CBFalconer, Jan 28, 2004
    #9
  10. pete

    pete Guest

    Can't do it if UINT_MAX == 65535, and INT_MIN == -65536.

    CHAR_BIT == 16
    INT_MAX == 65535
    sizeof(unsigned) == 2
    sizeof(int) == 2

    unsigned isn't guaranteed to be able to represent
    the magnitude of INT_MIN.
     
    pete, Jan 28, 2004
    #10
  11. Pete replied (using an forged e-mail address, and in a posting
    that never showed up on my news server):
    Try as I might, I can't think of a machine that
    could or would set things up like this. Is there
    one, or is the standard unnecessarily permissive?

    I guess my code should really look like:

    #if -(INT_MIN)>UINT_MAX
    #error this computer may be conforming, but it is too strange for this program
    #endif
    char *local_itoa(int i)
    {
    unsigned int x = (i>=0) ? (unsigned) i : -(unsigned) i;

    - Larry
     
    Larry Doolittle, Jan 29, 2004
    #11
  12. [I haven't checked Pete's references, but I'll take it as given
    for the moment that he's right.]
    Sounds unnecessarily permissive to me. But if the Standard
    allows it, then you can bet the DS9000 implements it. ;-)
    Huh? On a "normal" two's-complement machine, this would produce
    undefined behavior when you tried to negate INT_MIN. "Fixing" the
    code to read

    #if (0u - INT_MIN) > UINT_MAX

    just makes it sillier: the left-hand side evaluates to some unsigned
    int, which by definition cannot be greater than UINT_MAX. So I
    have no idea how you thought this code could work -- and even less
    idea of how to make it compute what you apparently think it's computing.

    -Arthur
     
    Arthur J. O'Dwyer, Jan 30, 2004
    #12
  13. Looking at this closer, this sounds like 16 pad bits on unsigned,
    and 15 pad bits plus 1 sign bit on signed. Implausible, but
    maybe not impossible.
    I wasn't aware that the preprocessor suffered the same potential
    warts as the execution model. I'll have to go read the standard
    more carefully.
    While I would prefer the standard guarantee that -INT_MIN could be
    represented as an unsigned int, I should at least be able to detect
    a failure of this assumption at compile time. How about

    #if -(INT_MIN+1) > (UINT_MAX-1)

    ?

    - Larry
     
    Larry Doolittle, Jan 30, 2004
    #13
  14. pete

    pete Guest

    bottom line:
    The standard doesn't guarantee that there is any integer type
    capable of representing the magnitude of INT_MIN.
    That's the relevant rule which makes implementing itoa, amusing.
     
    pete, Jan 30, 2004
    #14
  15. pete

    Dan Pop Guest

    It's not that amusing: just treat INT_MIN as a special case: set a flag
    and handle it as INT_MIN + 1. If the flag is set, increment the least
    significant digit of the result. No need to worry about carry propagation
    because no power of two has 0 as its least significant digit.

    Dan
     
    Dan Pop, Jan 30, 2004
    #15
  16. pete

    pete Guest

    That might be why I did it without any standard library headers.
    It was adapted from put_d(),
    which is a functional component of a minprintf() (K&R2, 7.3)
    that I'm writing from "first principles", for no important reason.
    It uses stdarg.h of course, but only four features from stdio.h:
    EOF, FILE, stdout, and putc(c, stream)
    void ito_a(int i, char *s)
    {
    char c, *p;
    int int_min = 0;

    p = s;
    if (0 > i) {
    *p++ = '-';
    ++s;
    if (i == INT_MIN) {
    int_min = 1;
    i = INT_MAX;
    } else {
    i = -i;
    }
    }
    do {
    *p++ = (char)('0' + i % 10);
    i /= 10;
    } while (i);
    if (int_min) {
    ++*s;
    }
    for (*p = '\0'; --p > s; ++s) {
    c = *s;
    *s = *p;
    *p = c;
    }
    }
     
    pete, Jan 31, 2004
    #16
  17. pete

    pete Guest

    I think you mean, handle (-1 - INT_MAX) as a special case
    void ito_a(int i, char *s)
    {
    char c, *p;
    int flag = 0;

    p = s;
    if (0 > i) {
    *p++ = '-';
    ++s;
    if (i == -1 - INT_MAX) {
    flag = 1;
    i = INT_MAX;
    } else {
    i = -i;
    }
    }
    do {
    *p++ = (char)('0' + i % 10);
    i /= 10;
    } while (i);
    if (flag) {
    ++*s;
    }
    for (*p = '\0'; --p > s; ++s) {
    c = *s;
    *s = *p;
    *p = c;
    }
    }
     
    pete, Jan 31, 2004
    #17
  18. You're assuming INT_MIN + 1 == INT_MAX.

    i = -(INT_MIN + 1);
     
    Peter Nilsson, Jan 31, 2004
    #18
  19. No. On implementations where INT_MIN == -INT_MAX, -1-INT_MAX can overflow.
     
    Peter Nilsson, Jan 31, 2004
    #19
  20. pete

    pete Guest

    Thank you.

    void ito_a(int i, char *s)
    {
    char c, *p;
    int flag = 0;

    p = s;
    if (0 > i) {
    *p++ = '-';
    ++s;
    if (-INT_MAX > i) {
    flag = 1;
    i = INT_MAX;
    } else {
    i = -i;
    }
    }
    do {
    *p++ = (char)('0' + i % 10);
    i /= 10;
    } while (i);
    if (flag) {
    ++*s;
    }
    for (*p = '\0'; --p > s; ++s) {
    c = *s;
    *s = *p;
    *p = c;
    }
    }
     
    pete, Jan 31, 2004
    #20
    1. Advertisements

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments (here). After that, you can post your question and our members will help you out.