-2147483648 and gcc optimisation, all sorts of different results

Discussion in 'C Programming' started by tom_usenet@optusnet.com.au, Mar 11, 2010.

  1. Guest

    I'm surprised at the different results I can get from code with and
    without optimisation where overflow is involved.

    I suspect this was "done to death" about 20 years ago, but
    I can't find anything in comp.lang.c matching this. Is this
    the "if you overflow the compiler can do whatever it likes"
    clause?

    The original problem I was trying to solve is why a simple
    embedded "printf("%ld")" was printing random garbage when
    handed -2147483648.

    Here are the results for the following simple program with different
    optimisation levels, in all cases "-O1" gives sane results and "-O2"
    gives remarkably creative results.

    The problems are:

    1 - With "-O2" the second test in the following is optimised out:
    as it "can't be true". Unless (num == -2147483648) in which
    case it IS true, and is what I'm trying to correct for in
    the function I was trying to fix.

    if (num < 0L) num = -num;
    if (num < 0L) stillNeg = 1;

    2 - "-2147483647 + 1 == -2147483647" ???

    3 - "-2147483647 == --2147483647" ???

    This is the "sane" one without optimisation.

    $ gcc --version
    gcc (GCC) 4.3.4 20090804 (release) 1

    $ gcc -Wall -O1 -o ox2 ox2.c
    $ ./ox2
    Function calls
    2147483646
    2147483647
    -2147483648 Still negative
    2147483647
    2147483646
    For loop
    num = 2147483646 vnum = 2147483646
    num = 2147483647 vnum = 2147483647
    num = -2147483648 neg vnum = -2147483648 neg
    num = 2147483647 vnum = 2147483647
    num = 2147483646 vnum = 2147483646

    This is the insane one with optimisation.

    $ gcc -Wall -O2 -o ox2 ox2.c
    $ ./ox2
    Function calls
    2147483646
    2147483647
    -2147483648 *** Missed the second test ***
    2147483647
    2147483646
    For loop
    num = 2147483646 vnum = 2147483646
    num = 2147483647 vnum = 2147483647
    num = 2147483647 vnum = -2147483648
    num = 2147483647 vnum = -2147483647
    num = 2147483647 vnum = -2147483646

^^ That got stuck ^^ ^^ That is negating when it shouldn't ^^

    Here's the test code. Apologies for the "crammed style", I don't
    usually write code that looks this bad:

    #include <stdio.h>

    void tneg(long num);
    void tneg(long num)
    {
        int stillNeg = 0;
        if (num < 0L) num = -num;
        if (num < 0L) stillNeg = 1;
        printf("%ld %s\n", num, (stillNeg) ? " Still negative" : "");
    }

    int main(int argc, char **argv)
    {
        long count;
        long test, vtest, num, vnum;

        printf("Function calls\n");
        tneg((long)0x7ffffffe);
        tneg((long)0x7fffffff);
        tneg((long)0x80000000);
        tneg((long)0x80000001);
        tneg((long)0x80000002);

        printf("For loop\n");
        vnum = (argc == 5) ? 5 : 0x7ffffffe;
        num = 0x7ffffffe;
        for (count = 0; count < 5; count++)
        {
            int stillNeg = 0, vstillNeg = 0;

            test = num; vtest = vnum;
            if (test < 0) test = -test;
            if (test < 0) stillNeg = 1;
            if (vtest < 0) vtest = -vtest;
            if (vtest < 0) vstillNeg = 1;
            printf("num = %ld %s ", test, (stillNeg) ? " neg" : " ");
            printf("vnum = %ld %s\n", vtest, (vstillNeg) ? " neg" : "");
            num += 1; vnum += 1;
        }
        return 0;
    }
     
    , Mar 11, 2010
    #1

  2. "" <> writes:

    > I'm surprised at the different results I can get from code with and
    > without optimisation where overflow is involved.
    >
    > I suspect this was "done to death" about 20 years ago, but
    > I can't find anything in comp.lang.c matching this. Is this
    > the "if you overflow the compiler can do whatever it likes"
    > clause?


    Looks like it, yes.

    > The original problem I was trying to solve is why a simple
    > embedded "printf("%ld")" was printing random garbage when
    > handed -2147483648.


    That sounds quite different. For the "long int" that you seem to be
    using it would be a library bug for printf("%ld", x) to print anything
    but -2147483648. Did you mean that printf is being handed something
    apparently random when you expected it to be handed -2147483648?

    > Here are the results for the following simple program with different
    > optimisation levels, in all cases "-O1" gives sane results and "-O2"
    > gives remarkably creative results.
    >
    > The problems are:
    >
    > 1 - With "-O2" the second test in the following is optimised out:
    > as it "can't be true". Unless (num == -2147483648) in which
    > case it IS true, and is what I'm trying to correct for in
    > the function I was trying to fix.
    >
    > if (num < 0L) num = -num;
    > if (num < 0L) stillNeg = 1;


    From the point of view of the C language it's not quite that "it can't
    be true" -- it's more a case of "either it's true or undefined
    behaviour has occurred". The net effect is the same, in that the
    compiler is using this undefined behaviour as permission to conclude
    that the second test is redundant.

    I hope that does not sound like too much splitting of hairs. It's
    useful to distinguish between what the C standard says and what a
    compiler decides to do as a result.
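    To make the point concrete, here is a minimal sketch (my own
    illustration, not the OP's code) of a way to avoid the trap
    altogether: do the negation in unsigned arithmetic, where
    wraparound is well defined, so even LONG_MIN has a representable
    absolute value:

    ```c
    #include <limits.h>
    #include <stdio.h>

    /* Compute |num| as an unsigned long. Unsigned negation is defined
       (modular), so this works even for num == LONG_MIN, where the
       signed "num = -num" would be undefined behaviour. */
    static unsigned long safe_abs(long num)
    {
        if (num < 0)
            return 0UL - (unsigned long)num;  /* defined wraparound */
        return (unsigned long)num;
    }

    int main(void)
    {
        printf("|%ld| = %lu\n", -5L, safe_abs(-5L));
        printf("|LONG_MIN| = %lu\n", safe_abs(LONG_MIN));
        return 0;
    }
    ```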

    > 2 - "-2147483647 + 1 == -2147483647" ???


    I don't see where this happens in your code. If -2147483647 is
    representable in your long int type (and it is, from your example below)
    then the above would be a bug. -2147483647 + 1 must be -2147483646.

    [Guessing here: did you mean "2147483647 + 1 == 2147483647"? If so,
    the compiler can do pretty much what it likes since 2147483647 + 1 is
    undefined with the types you are using.]

    > 3 - "-2147483647 == --2147483647" ???


    Due to C's parsing rules, -- is not the same as - - but I know what you
    are saying here.

    With normal 32-bit integers, if you see that -(-2147483647) !=
    2147483647 then you would have a bug but, again, I don't see that in
    your code.

    > This is the "sane" one without optimisation.
    >
    > $ gcc --version
    > gcc (GCC) 4.3.4 20090804 (release) 1
    >
    > $ gcc -Wall -O1 -o ox2 ox2.c
    > $ ./ox2
    > Function calls
    > 2147483646
    > 2147483647
    > -2147483648 Still negative
    > 2147483647
    > 2147483646
    > For loop
    > num = 2147483646 vnum = 2147483646
    > num = 2147483647 vnum = 2147483647
    > num = -2147483648 neg vnum = -2147483648 neg
    > num = 2147483647 vnum = 2147483647
    > num = 2147483646 vnum = 2147483646
    >
    > This is the insane one with optimisation.
    >
    > $ gcc -Wall -O2 -o ox2 ox2.c
    > $ ./ox2
    > Function calls
    > 2147483646
    > 2147483647
    > -2147483648 *** Missed the second test ***
    > 2147483647
    > 2147483646
    > For loop
    > num = 2147483646 vnum = 2147483646
    > num = 2147483647 vnum = 2147483647
    > num = 2147483647 vnum = -2147483648
    > num = 2147483647 vnum = -2147483647
    > num = 2147483647 vnum = -2147483646
    >
    > ^^ That got stuck ^^ ^^ That is negating when it shouldn't **


    The compiler is probably unrolling the loop[1] and can thus tell that num
    overflows. It is permitted to make num += 1 whatever it likes. It
    can't tell that vnum overflows because you (deliberately, I am sure)
    made it depend on argc but it can (and, I think, does) assume that
    there is no point in testing for vtest < 0 (vtest being a copy of
    vnum) since it starts positive and is only incremented.

    [1] It only needs to unroll one loop body to see that num hits its
    maximum value and the optimiser will always try to unroll one loop to
    put the test at the bottom. Change the initial value so that it is
    one less and you will see that num and vnum now mirror each other.

    > Here's the test code. Apologies for the "crammed style", I don't
    > usually write code that looks this bad:
    >
    > #include <stdio.h>
    >
    > void tneg(long num);
    > void tneg(long num)
    > {
    > int stillNeg = 0;
    > if (num < 0L) num = -num;
    > if (num < 0L) stillNeg = 1;
    > printf("%ld %s\n", num, (stillNeg)? " Still negative" : "");
    > }
    >
    > int main(int argc, char ** argv)
    > {
    > long count;
    > long test, vtest, num, vnum;
    >
    > printf("Function calls\n");
    > tneg((long)0x7ffffffe);
    > tneg((long)0x7fffffff);
    > tneg((long)0x80000000);
    > tneg((long)0x80000001);
    > tneg((long)0x80000002);


    FYI: these last three are implementation defined conversions (i.e. the
    C language does not say exactly what happens). 0x80000000 is a
    positive number that can't be represented in your long type.
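    As an aside, this is what that conversion looks like in practice
    (a sketch; the wrap-to-LONG_MIN result is typical of 32-bit
    two's-complement targets but is not guaranteed by the language):

    ```c
    #include <stdio.h>

    int main(void)
    {
        /* 0x80000000 doesn't fit in a 32-bit long, so the constant has
           an unsigned type there; converting it to long is
           implementation-defined. Typical two's-complement targets
           wrap it to LONG_MIN; a 64-bit long holds it as +2147483648. */
        long n = (long)0x80000000;
        printf("%ld\n", n);
        return 0;
    }
    ```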

    > printf("For loop\n");
    > vnum = (argc == 5) ? 5 : 0x7ffffffe;
    > num = 0x7ffffffe;
    > for (count = 0; count < 5; count++)
    > {
    > int stillNeg = 0, vstillNeg = 0;
    >
    > test = num; vtest = vnum;
    > if (test < 0) test = - test;
    > if (test < 0) stillNeg = 1;
    > if (vtest < 0) vtest = - vtest;
    > if (vtest < 0) vstillNeg = 1;
    > printf("num = %ld %s ", test, (stillNeg)? " neg" : " ");
    > printf("vnum = %ld %s\n", vtest, (vstillNeg)? " neg" : "");
    > num += 1; vnum += 1;
    > }
    > return 0;
    > }


    --
    Ben.
     
    Ben Bacarisse, Mar 11, 2010
    #2

  3. Eric Sosman Guest

    On 3/11/2010 9:59 AM, Ben Bacarisse wrote:
    > ""<> writes:
    >
    >> I'm surprised at the different results I can get from code with and
    >> without optimisation where overflow is involved.
    >>
    >> I suspect this was "done to death" about 20 years ago, but
    >> I can't find anything in comp.lang.c matching this. Is this
    >> the "if you overflow the compiler can do whatever it likes"
    >> clause?

    >
    > Looks like it, yes.
    >
    >> The original problem I was trying to solve is why a simple
    >> embedded "printf("%ld")" was printing random garbage when
    >> handed -2147483648.

    >
    > That sounds quite different. For the "long int" that you seem to be
    > using it would be a library bug for printf("%ld", x) to print anything
    > but -2147483648. Did you mean that printf is being handed something
    > apparently random when you expected it to be handed -2147483648?


    The way the value is "handed" to printf() may make a
    difference, and so may the applicable version of the Standard.
    Note that the value 2147483648 is too large for a 32-bit long,
    so the operand of the `-' operator will be of a different type.
    The chosen type depends on the Standard version: Under C90 rules
    you'll get an unsigned long, C99 gives (signed) long long. The
    unary `-' operator is then applied; under C90 you wind up with
    the unsigned long 2147483648, C99 gives the negative -2147483648
    as a long long.

    Passing either of these to "%ld" is undefined behavior, because
    "%ld" wants a (signed) long, period. Under C90 you're very likely
    to get away with it and see the negative output you were expecting
    all along, but under C99 you'll be passing a (probably) 64-bit
    value where a 32-bit value was expected. This could easily throw
    things off and generate the garbage the O.P. encountered.

    In short, under C99

    printf ("%ld\n", -2147483648); // passes LL

    may plausibly generate different output than

    long num = -2147483648; // LL converts to L
    printf ("%ld\n", num);

    ... because of the type mismatch. (There's also the potential for
    conversion issues in the second fragment, but that's unlikely to
    be the source of the trouble.)

    --
    Eric Sosman
    lid
     
    Eric Sosman, Mar 11, 2010
    #3
  4. Eric Sosman <> writes:

    > On 3/11/2010 9:59 AM, Ben Bacarisse wrote:
    >> ""<> writes:
    >>
    >>> I'm surprised at the different results I can get from code with and
    >>> without optimisation where overflow is involved.
    >>>
    >>> I suspect this was "done to death" about 20 years ago, but
    >>> I can't find anything in comp.lang.c matching this. Is this
    >>> the "if you overflow the compiler can do whatever it likes"
    >>> clause?

    >>
    >> Looks like it, yes.
    >>
    >>> The original problem I was trying to solve is why a simple
    >>> embedded "printf("%ld")" was printing random garbage when
    >>> handed -2147483648.

    >>
    >> That sounds quite different. For the "long int" that you seem to be
    >> using it would be a library bug for printf("%ld", x) to print anything
    >> but -2147483648. Did you mean that printf is being handed something
    >> apparently random when you expected it to be handed -2147483648?

    >
    > The way the value is "handed" to printf() may make a
    > difference, and so may the applicable version of the Standard.
    > Note that the value 2147483648 is too large for a 32-bit long,
    > so the operand of the `-' operator will be of a different type.
    > The chosen type depends on the Standard version: Under C90 rules
    > you'll get an unsigned long, C99 gives (signed) long long. The
    > unary `-' operator is then applied; under C90 you wind up with
    > the unsigned long 2147483648, C99 gives the negative -2147483648
    > as a long long.


    I was unclear in a way that is depressingly common (not just for me
    but I do it quite often): I meant the mathematical value -2147483648
    not the C expression. Given that, I think I am right that printf must
    print "-2147483648" with the 32 bit longs being used by the OP.

    > Passing either of these to "%ld" is undefined behavior, because
    > "%ld" wants a (signed) long, period. Under C90 you're very likely
    > to get away with it and see the negative output you were expecting
    > all along, but under C99 you'll be passing a (probably) 64-bit
    > value where a 32-bit value was expected. This could easily throw
    > things off and generate the garbage the O.P. encountered.


    That's a good point, but it seems unlikely that the original case the
    OP is describing is one where the C constant expression -2147483648 is
    the actual argument of printf. Why would anyone write that?

    <snip>
    --
    Ben.
     
    Ben Bacarisse, Mar 11, 2010
    #4
  5. Guest

    On Mar 12, 1:59 am, Ben Bacarisse <> wrote:
    > "" <> writes:

    ....
    > > The original problem I was trying to solve is why a simple
    > > embedded "printf("%ld")" was printing random garbage when
    > > handed  -2147483648.

    >
    > That sounds quite different.  For the "long int" that you seem to be
    > using it would be a library bug for printf("%ld", x) to print anything
    > but -2147483648.  Did you mean that printf is being handed something
    > apparently random when you expected it to be handed -2147483648?


    We're not using gcc's libc, at least not the stdio one. We have our
    own printf() code.

    The function that prints "a number in any base" is being handed
    "32 bits with the top bit set", best represented as "0x80000000", and
    starts:

    static void outnum(long num, const long base, struct PRINTF_CTX *ctx)
    {
        charptr cp;
        int negative;
        char outbuf[32];
        const char digits[] = "0123456789ABCDEF";

        /* Check if number is negative */
        if (num < 0L) {
            negative = 1;
            num = -num;
        }
        else
            negative = 0;

        /* Build number (backwards) in outbuf */
        cp = outbuf;
        do {
            *cp++ = digits[(int)(num % base)];
        } while ((num /= base) > 0);
        if (negative)
            *cp++ = '-';
        *cp-- = 0;

    And "*cp++ = digits[(int)(num % base)];" indexes backwards when
    "num" is negative.

    I added another "if (num < 0L)" after the first one to handle it, and
    the compiler removed it. That was the start of this.

    > > 2 - "-2147483647 + 1 == -2147483647" ???

    >
    > I don't see where this happens in your code.
    >
    > [Guessing here: did you mean "2147483647 + 1 == 2147483647"?


    That's the one.

    > > 3 - "-2147483647  == --2147483647" ???

    >
    > Due to C's parsing rules, -- is not the same as - - but
    > I know what you are saying here.


    That was a typo. I meant to say "2147483647 == -2147483647".

    The unoptimised case prints:

    vnum = 2147483646, 2147483647, -2147483648, 2147483647, 2147483646.

    The optimised case prints:

    vnum = 2147483646, 2147483647, -2147483648, -2147483647, -2147483646.

    The code is "if the number is negative, make it positive and print it",
    but for the last two numbers the "meant to be positive" numbers aren't.

    > > ^^ That got stuck ^^   ^^ That is negating when it shouldn't **

    >
    > The compiler is probably unrolling the loop[1] and can thus tell that num
    > overflows.  It is permitted to make num += 1 whatever it likes.


    So it is doing "INT_MAX + 1 = INT_MAX" when it knows about the
    overflow, and "INT_MAX + 1 is OK if we assume it is now an unsigned
    int" when it doesn't.

    > That's a good point, but it seems unlikely that the original
    > case the OP is describing is one where the C constant
    > expression -2147483648 is the actual argument of printf.
    > Why would anyone write that?


    I didn't. I was doing "long num = f" where "f" is a float that had
    gone to infinity due to a divide-by-zero. "num = f" results in
    "INT_MAX" when "f" is too big to represent, but strangely "INT_MIN"
    when infinity. Strange conversion, probably legal. I was trying to
    print "num" to see what was going on and then hit the bug
    in our print code.

    > Why would anyone write that?


    When writing test cases to find out what the compiler is doing.

    tom_usenet
     
    , Mar 11, 2010
    #5
  6. "" <> writes:

    > On Mar 12, 1:59 am, Ben Bacarisse <> wrote:
    >> "" <> writes:

    > ...
    >> > The original problem I was trying to solve is why a simple
    >> > embedded "printf("%ld")" was printing random garbage when
    >> > handed  -2147483648.

    >>
    >> That sounds quite different.  For the "long int" that you seem to be
    >> using it would be a library bug for printf("%ld", x) to print anything
    >> but -2147483648.  Did you mean that printf is being handed something
    >> apparently random when you expected it to be handed -2147483648?

    >
    > We're not using gcc's libc, at least not the stdio one. We have our
    > own printf() code.
    >
    > The function that prints "a number in any base" is being handed "32
    > bits with the top bit set" best represented as "0x80000000",


    In this case I think it is simpler to say LONG_MIN. Everything else
    is up for misinterpretation.

    In fact, I understood you fine the first time. You pass the
    mathematical value -2147483648 and get an odd result. (You could also
    say that you pass LONG_MIN.) The confusion comes when someone
    interprets -2147483648 as a C expression. If you use C99 it is
    possible that this is an integer constant expression of type "long
    long int".

    > and starts:
    >
    > static void outnum( long num, const long base, struct PRINTF_CTX
    > *ctx )
    > {
    > charptr cp;
    > int negative;
    > char outbuf[32];
    > const char digits[] = "0123456789ABCDEF";
    >
    > /* Check if number is negative */
    > if (num < 0L) {
    > negative = 1;
    > num = -num;
    > }
    > else
    > negative = 0;
    >
    > /* Build number (backwards) in outbuf */
    > cp = outbuf;
    > do {
    > *cp++ = digits[(int)(num % base)];
    > } while ((num /= base) > 0);
    > if (negative)
    > *cp++ = '-';
    > *cp-- = 0;
    >
    > And "*cp++ = digits[(int)(num % base)];" indexes backwards when
    > "num" is negative.
    >
    > I added another "if (num < 0L)" after the first one to handle and
    > the compiler removed it. That was the start of this.


    Your best bet is probably to handle num == LONG_MIN as a special case.
    There are various other tricks that you can do, like handling the
    first digit before negating the number (so the most negative number
    you try to make positive is num/base) but I don't think any are
    significantly better than a simple special case.
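    For what it's worth, a sketch of the no-special-case variant
    (my own illustration, simplified: no PRINTF_CTX, the result goes
    into a caller-supplied buffer): doing the digit loop in unsigned
    arithmetic means LONG_MIN needs no extra handling at all.

    ```c
    #include <stdio.h>

    /* Build the decimal/hex/etc. representation of num in out.
       Negation and the digit loop use unsigned long, where wraparound
       is defined, so num == LONG_MIN works without a special case. */
    static void outnum(long num, unsigned long base, char *out)
    {
        static const char digits[] = "0123456789ABCDEF";
        char buf[32];
        char *cp = buf;
        int negative = (num < 0);
        unsigned long u = negative ? 0UL - (unsigned long)num
                                   : (unsigned long)num;

        do {
            *cp++ = digits[u % base];   /* build backwards */
        } while ((u /= base) > 0);
        if (negative)
            *cp++ = '-';
        while (cp > buf)                /* reverse into out */
            *out++ = *--cp;
        *out = '\0';
    }

    int main(void)
    {
        char s[34];
        outnum(-2147483647L - 1, 10, s);
        printf("%s\n", s);
        return 0;
    }
    ```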

    <snip>
    >> The compiler is probably unrolling the loop[1] and can thus tell that num
    >> overflows.  It is permitted to make num += 1 whatever it likes.

    >
    > So it is doing "INT_MAX + 1 = INT_MAX" when it knows about the
    > overflow


    (s/INT/LONG/ because you are using long)

    Yes. Though that is an arbitrary choice it made. The loop unrolling
    spotted that the next value is LONG_MAX and that it did not need to
    increment any further because after an overflow, anything will do.
    Thus it replaced the increment with an assignment of LONG_MAX. As I
    suggested, if you start with num one smaller, the compiler does not
    spot the overflow (because it only unrolls the loop to put the test at
    the bottom) and you get an implementation defined increment. I point
    this out only because I had fun investigating. I don't think the
    details matter.

    > and "INT_MAX + 1 is OK if we assume it is now an unsigned int" when it
    > doesn't.


    I don't follow this bit but I am not sure there is any point in trying
    really hard to understand what the compiler did once you entered the
    realms of undefined behaviour.

    <snip>
    --
    Ben.
     
    Ben Bacarisse, Mar 11, 2010
    #6
  7. Tim Rentsch Guest

    Ben Bacarisse <> writes:

    > "" <> writes:

    [snip]
    >>
    >> static void outnum( long num, const long base, struct PRINTF_CTX
    >> *ctx )
    >> {
    >> charptr cp;
    >> int negative;
    >> char outbuf[32];
    >> const char digits[] = "0123456789ABCDEF";
    >>
    >> /* Check if number is negative */
    >> if (num < 0L) {
    >> negative = 1;
    >> num = -num;
    >> }
    >> else
    >> negative = 0;
    >>
    >> /* Build number (backwards) in outbuf */
    >> cp = outbuf;
    >> do {
    >> *cp++ = digits[(int)(num % base)];
    >> } while ((num /= base) > 0);
    >> if (negative)
    >> *cp++ = '-';
    >> *cp-- = 0;
    >>
    >> And "*cp++ = digits[(int)(num % base)];" indexes backwards when
    >> "num" is negative.
    >>
    >> I added another "if (num < 0L)" after the first one to handle and
    >> the compiler removed it. That was the start of this.

    >
    > Your best bet is probably to handle num == LONG_MIN as a special case.
    > [snip elaboration]


    I second this recommendation, except a better test is 'num < -LONG_MAX';
    writing the test this way more directly expresses the essential
    characteristic of the condition that needs to be tested. (Other
    obvious comments and advice left as an exercise.)
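    A sketch of that test in context (my own illustration, assuming
    only what <limits.h> provides):

    ```c
    #include <limits.h>
    #include <stdio.h>

    int main(void)
    {
        long num = LONG_MIN;

        /* True exactly when -num would overflow. On two's-complement
           systems LONG_MIN < -LONG_MAX; on sign-magnitude or ones'-
           complement systems no long value satisfies the test. */
        if (num < -LONG_MAX)
            printf("cannot negate %ld safely\n", num);
        else
            printf("%ld\n", -num);
        return 0;
    }
    ```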
     
    Tim Rentsch, Mar 23, 2010
    #7
