Performance of signed vs unsigned types

Discussion in 'C Programming' started by Thomas Scrace, Apr 20, 2011.

  1. Hi all,

    In K&R there is an exercise to compute the range of possible values for each C
    type. In doing this I have written two functions that are identical, except for
    the fact that one uses signed long ints, and the other uses unsigned long ints.
    I have found that the function that uses the unsigned ints completes in a few seconds,
    while the signed version takes so long that I have never had the patience to
    wait for it to end. The only operators I am using on the variables are increment
    and decrement.

    Is there a difference in performance between these types? Is this restricted to
    long ints, or is there a general difference between all signed/non-signed C
    types? Could this be something compiler or processor specific? I am using GCC on
    the x86 platform (intel core i5).

    Thanks,

    Tom
     
    Thomas Scrace, Apr 20, 2011
    #1
    1. Advertisements

  2. Thomas Scrace

    James Kuyper Guest

    The most important differences between unsigned and signed types are not
    in their performance, but their behavior.
    I'm not familiar with the exercise you're referring to, I only have a
    copy of the first edition of K&R, which might not even have that
    exercise (I didn't find it in a quick glance through the book).

    The easiest way to determine the limits of the C types is to use the
    macros from <limits.h>, but I presume that would defeat the point of the
    exercise. There as safe and quite trivial methods of computing the range
    of an unsigned type, but most of the ways I can think of for computing
    the range of a signed type are subject to serious portability issues.
    What do you expect the following function to do?

    #include <limits.h>

    long next_after_last(void)
    {
    return LONG_MAX + 1L;
    }

    Your answer to that question is likely to be closely related to the
    reason your program is stuck.
     
    James Kuyper, Apr 20, 2011
    #2
    1. Advertisements

  3. Oh!

    I get it now. It is cycling around. It didn't happen with the others because
    there was either a larger type to handle the overflow, or it was unsigned, and
    therefore had nothing to cycle around to.

    Thank you.
     
    Thomas Scrace, Apr 20, 2011
    #3
  4. Yes, sorry. I do use >/< operators.

    The code is as follows:

    signed long slmax()
    {
    signed long s;
    signed long temp;
    for (s = 0, temp = -1; s > temp; ++s, ++temp) {
    ;
    }
    return temp;
    }

    It's the same for all the other types, the only difference is the types.

    As you can see (and as I now see) in the case of an signed long it will cycle
    around and around infinitely. For the unsigned long this was of course not a
    problem. For signed shorts and signed ints it was not a problem because I could
    use a larger type (long) for the temp variable.

    I did not realise that signed types behaved this way.

    Thomas Scrace <>
     
    Thomas Scrace, Apr 20, 2011
    #4
  5. It's not obvious to me. I'd expect this loop to terminate on most
    current C implementations.

    Integer overflow is undefined behaviour so, in theory, anything can
    happen when s reaches LONG_MAX. In fact, the compiler is permitted to
    work out that this is inevitable and may therefore replace your code
    with anything that it pleases. Of course they won't. What will happen
    in a very large number of cases is that s will wrap from LONG_MAX to
    LONG_MIN and the loop will end.

    You may be seeing what I am seeing. On my machine long is 64 bits and I
    calculate that, at the speed my machine can run this loop, it won't end
    for about 340 years.
    I am not 100% sure that what you've deduced about your code is true. I
    am pretty sure that you've missed that fact that this code is undefined
    by the rules of the language.
     
    Ben Bacarisse, Apr 20, 2011
    #5
  6. Signed types have undefined behaviour on overflow. Normally they just
    wrap like unsigned types from maximium to minimum. But other
    behaviours are possible.
     
    Malcolm McLean, Apr 20, 2011
    #6
  7. Well, it seems as if you and Ben might be right, because I just amended the code
    to take advance of the wrapping behaviour, and the loop still is not
    terminating.

    It is rather strange, because when I do:

    LONG_MAX + 1L

    I get LONG_MIN. But a loop that increments a signed long
    from 0 until it becomes negative does not appear to end. I can't really understand this, because I can
    get to ULONG_MAX in a matter of a few seconds in the same fashion.
     
    Thomas Scrace, Apr 20, 2011
    #7
  8. Well, I just added a print statement to the loop so I could see what was
    happening. It is definitely not looping around, it is just going really slowly.
    I did a very rough timing, and it took about a minute to count to 4 million. By
    comparison, with an unsigned long it took just a few seconds to max out at
    9223372036854775807. So, the wrapping does not seem to be the issue. I am at a
    loss as to how to explain why the speed is so dramatically different.
     
    Thomas Scrace, Apr 20, 2011
    #8
  9. Sorry, I meant 18446744073709551615.
     
    Thomas Scrace, Apr 20, 2011
    #9
  10. Right. This makes sense. Yes, we are talking 64 bits.

    I was not recognising the problem with the unsigned variables, because they do
    in fact produce the desired result, just not in the way I intended!

    Thanks a lot for your help.
     
    Thomas Scrace, Apr 20, 2011
    #10
  11. (That's 2**64-1.)

    That's implausible; there must be something else going on.

    If your computer can increment a 64-bit long in 1 nanosecond
    (1 GHz), it would take it over 500 years to iterate from 0
    to 18446744073709551615. You can tweak the numbers slightly
    (maybe your computer is faster than that, maybe it can do multiple
    increments per clock cycle, etc.), but not enough to reduce the
    time to a few seconds.

    You said you added a print statement to the loop, and it took about
    a minute to count to 4 million. (BTW, this would be a lot easier if
    you had kept that context in your followup.) The print statement is
    going to take orders of magnitude longer than a simple increment;
    a minute to generate 4 million lines of output is plausible.
    The overhead of formatting the output string is going to be
    substantial, and even that is likely to be swamped by the time it
    takes to physically generate the output.

    My best guess is that you're actually iterating over a 32-bit range.
    That takes about 6 seconds on my system (with no ouput in the loop).
    (A sufficiently clever compiler might also optimize away the loop,
    but then it would take no perceptible time, not "just a few seconds".)

    Post the code (with no print stastements in the loop) that
    "took just a few seconds". Please post the complete compilable
    and runnable program. Before the loop starts, have it print the
    relevent values from your <limits.h>, e.g.:

    #include <limits.h>
    ....
    printf("INT_MAX = %d\n", INT_MAX);
    printf("UINT_MAX = %u\n", UINT_MAX);
    printf("LONG_MAX = %ld\n", LONG_MAX);
    printf("ULONG_MAX = %lu\n", ULONG_MAX);

    If this doesn't solve the mystery, we might consider looking at the
    generated assembly code, but we're not there yet.
     
    Keith Thompson, Apr 20, 2011
    #11
  12. Hi Keith.

    I posted the relevant function in a previous post, but I will reproduce it here:
    The function is the same for the unsigned longs, except with the types of the
    two counter variables changed to unsigned. Scott Fluhrer pointed out that the
    explanation for the supernaturally fast enumeration of the 64bit longs was in
    the

    temp = -1

    assignment. Because temp is an unsigned variable in the unsigned functions this
    actually ended up taking the value of ULONG_MAX, and the loop immediately
    terminated. The reason I did not realise the error was that I was looking for it
    to return ULONG_MAX; I just thought it was doing it in a different fashion.

    So, mystery solved! It's amazing how these apparently simple K&R exercises can
    lead to learning a lot more about the language than you initially expect.

    Thanks,
    Tom
     
    Thomas Scrace, Apr 20, 2011
    #12
  13. I'm glad you solved the mystery, but for future reference:

    Yes, you posted the function, but you didn't post a complete program,
    nor did you post the actual function that exhibited the problem. A
    description of your actual code, even if it's accurate, isn't nearly as
    useful as an exact copy (as in copy-and-paste) of your code. We can't
    guess what details you've left out, either accidentally or because you
    assumed they were unimportant.

    Here's the kind of thing I was asking for:

    #include <stdio.h>
    #include <limits.h>

    unsigned long slmax(void)
    {
    unsigned long s;
    unsigned long temp;
    for (s = 0, temp = -1; s > temp; ++s, ++temp) {
    ;
    }
    return temp;
    }


    int main(void)
    {
    unsigned long result;

    printf("ULONG_MAX = %lu\n", ULONG_MAX);
    printf("Calling slmax()\n");
    result = slmax();
    printf("After slmax()\n");
    return 0;
    }

    Output:
    ULONG_MAX = 4294967295
    Calling slmax()
    After slmax()

    And you told us this took "a few seconds" to run. On my system, the
    time it takes to run is literally too short to measure (well, to measure
    conveniently). That's a *huge* difference, and it led me to a
    completely icorrect conclusion.

    Recommended reading:
    <http://www.catb.org/~esr/faqs/smart-questions.html>.

    Again, I'm glad you solved the problem, but I hope this advice will be
    useful next time you want to ask a question here. And please don't
    think I'm picking on you (I'm not) or beating a dead horse (well, I
    probably am).
     
    Keith Thompson, Apr 20, 2011
    #13
  14. What are the functions?


    Not on modern cpus.

    <snip>

    Indeed, and under C90, not all integer types have known limit
    macros, e.g. ptrdiff_t.
    It's impossible in C99, but the maximum is certainly achievable
    in C90.

    ptrdiff_t max, new;
    for (max = 32767; (new = 2ul * max + 1) > max; )
    max = new;
    printf("%ld\n", (long) max);

    Whether the minimum is possible depends on whether you believe
    XXX_MIN is limited to -XXX_MAX or -XXX_MAX-1.
     
    Peter Nilsson, Apr 20, 2011
    #14
  15. Baby-step Giant-step method.

    $ cat try.c
    #include <stdio.h>

    int
    main(void)
    {
    int x;
    int y;
    int step;

    x = step = 1;
    for (; step != 0;)
    {
    if ((y = x + step) > x)
    {
    x = y;
    if ((y = 2 * step) > step)
    step = y;
    }
    else
    {
    step /= 2;
    }
    }
    printf("Maximum int = %d = %#x.\n", x, x);
    return 0;
    }
    $ cc try.c
    $ ./a.out
    Maximum int = 2147483647 = 0x7fffffff.
     
    Michael Press, Apr 21, 2011
    #15

  16. Thanks Keith, I'll bear that in mind.
     
    Thomas Scrace, Apr 21, 2011
    #16
  17. Thomas Scrace

    James Kuyper Guest

    That's one example of precisely what I was talking about. As soon as x >
    INT_MAX - step, the expression x+step has undefined behavior; as soon as
    step > INT_MAX/2, the expression 2*step has undefined behavior. The
    above code seems to rely upon the unportable assumption that the actual
    behavior in each of those cases will be to produce a result that is
    negative. That's very common behavior, but it's not guaranteed by the
    standard. That's what I meant by "serious portability issues".
     
    James Kuyper, Apr 21, 2011
    #17
  18. I thought you all might like to see what I ended up with for this. I have to
    admit that this code does take advantage of formally undefined behaviour;
    notably that signed types will loop around to their maximum when the minimum is
    exceeded, and vice versa. However, this does seem to be quite common behaviour,
    and - more importantly - I cannot find another way to do it.

    I also have not included functions to compute the range of floating point types
    (although I do print these values from the <floats.>h macros) since I have not
    yet found a satisfactory method for this.

    Also, I realise that I could probably make this a bit tidier and less
    function-happy, but I wanted to restrict myself to language features that had
    been covered by K&R up to the point where this exercise appears.

    So, here it is (warning! large code dump ahead):

    <CODE>

    /*
    * K&R Exercise 2-1
    *
    * Determine the ranges of char, short, int,
    * and long variables, both signed and
    * unsigned, by printing appropriate values
    * from standard headers and by direct
    * computation. Harder if you compute them:
    * determine the ranges of the various
    * floating point types.
    *
    * Author: Thomas Scrace <>
    */

    #include <stdio.h>
    #include <limits.h>
    #include <float.h>

    signed char scmin();
    signed short ssmin();
    signed int simin();
    signed long slmin();

    main()
    {
    unsigned char cmn, cmx;
    signed char scmn, scmx;
    unsigned short smx;
    short ssmn;
    unsigned int imx;
    int simn;
    unsigned long lmx;
    long slmn;

    cmx = -1;
    cmn = cmx + 1;
    scmn = scmin();
    scmx = scmn - 1;
    smx = -1;
    ssmn = ssmin();
    imx = -1;
    simn = simin();
    lmx = -1;
    slmn = slmin();

    printf("\n*****************INTEGRAL TYPES********************\n\n");

    printf("-----------values from standard headers-----------\n\n");

    printf("%21d : char : %d\n", 0, UCHAR_MAX);
    printf("%21d : signed char: %d\n", CHAR_MIN, CHAR_MAX);
    printf("%21d : short : %d\n", 0, USHRT_MAX);
    printf("%21d :signed short: %d\n", SHRT_MIN, SHRT_MAX);
    printf("%21d : int : %u\n", 0, UINT_MAX);
    printf("%21d : signed int : %d\n", INT_MIN, INT_MAX);
    printf("%21d : long : %lu\n", 0, ULONG_MAX);
    printf("%21ld :signed long : %ld\n", LONG_MIN, LONG_MAX);

    printf("\n-----------values by direct computation-----------\n\n");

    printf("%21d : char : %d\n", cmn, cmx);
    printf("%21d : signed char: %d\n", scmn, scmx);
    printf("%21hd : short : %d\n", smx + 1, smx);
    printf("%21d :signed short: %hd\n", ssmn, ssmn - 1);
    printf("%21d : int : %u\n", imx + 1, imx);
    printf("%21d : signed int : %d\n", simn, simn - 1);
    printf("%21d : long : %lu\n", lmx + 1, lmx);
    printf("%21ld :signed long : %ld\n", slmn, slmn - 1);

    printf("\n**************FLOATING-POINT TYPES*****************\n\n");

    printf("-----------values from standard headers-------------\n\n");

    printf("%21.6e : float : %0.6e\n", FLT_MIN, FLT_MAX);
    printf("%21.10e : double : %0.10e\n", DBL_MIN, DBL_MAX);

    return 0;
    }

    /* scmin: return the minimum value of a signed char */
    signed char scmin()
    {
    signed char c, tmp;

    for (c = tmp = -2; c < 0; c = c * 2)
    tmp = c;
    return tmp;
    }

    /* ssmin: return the minimum value of a signed short */
    signed short ssmin()
    {
    signed short s, tmp;

    for (s = tmp = -2; s < 0; s = s * 2)
    tmp = s;
    return tmp;
    }

    /* simin: return the minimum value of a signed int */
    signed int simin()
    {
    signed int i, tmp;

    for (i = tmp = -2;i < 0; i = i * 2)
    tmp = i;
    return tmp;
    }

    /* slmin: return the minimum value of a signed long */
    signed long slmin()
    {
    signed long l, tmp;

    for (l = tmp = -2; l < 0; l = l * 2)
    tmp = l;
    return tmp;
    }

    </CODE>

    The output, on my system, is as follows:

    <OUTPUT>

    *****************INTEGRAL TYPES********************

    -----------values from standard headers-----------

    0 : char : 255
    -128 : signed char: 127
    0 : short : 65535
    -32768 :signed short: 32767
    0 : int : 4294967295
    -2147483648 : signed int : 2147483647
    0 : long : 18446744073709551615
    -9223372036854775808 :signed long : 9223372036854775807

    -----------values by direct computation-----------

    0 : char : 255
    -128 : signed char: 127
    0 : short : 65535
    -32768 :signed short: 32767
    0 : int : 4294967295
    -2147483648 : signed int : 2147483647
    0 : long : 18446744073709551615
    -9223372036854775808 :signed long : 9223372036854775807

    **************FLOATING-POINT TYPES*****************

    -----------values from standard headers-------------

    1.175494e-38 : float : 3.402823e+38
    2.2250738585e-308 : double : 1.7976931349e+308

    </OUTPUT>
     
    Thomas Scrace, Apr 21, 2011
    #18
  19. [On K&R Ex 2.1: findinf the range of basic types]
    You can get round the undefined nature integer arithmetic (at least I
    think you can) but it's hard (by which I mean I can't see how) to get
    round the problem of trap representations.

    If a signed type has no trap representation, you can use

    union {
    int i;
    unsigned char b[sizeof(int)];
    } u;

    to set all the bits of u.i to 1 by setting u.b[k] = -1 for all suitable
    k. (You can't just set u.i = -1 because the implementation may not be
    using 2s complement).

    You then unset the sign bit. To find it, flip each bit of each u.b[k]
    looking for one that makes u.i > 0. Similar techniques can be used for
    the minimum.

    <snip>
     
    Ben Bacarisse, Apr 21, 2011
    #19
  20. Well it turned out that you are correct. I have discussed this in previous
    posts, but, briefly, I had a bug that caused the function to give me the
    result I was looking for, but using a method I was not expecting.

    The reason was that I was initiating a loop by assinging -1 to an unsigned type.
    This meant that the variable immediately had the max value, and the loop exited
    without looping through all the possible values of the unsigned type (which is
    what I thought it was doing).

    From my perspective this meant that the program was looping through a large
    number of values extremely quickly in comparison with the signed version of the
    function, which was extremely slow (it actually *was* looping through all the
    values).

    I have posted the full version of the program I ended up with in a previous
    message if you are interested.

    Thanks for the information - it was very interesting.
     
    Thomas Scrace, Apr 21, 2011
    #20
    1. Advertisements

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments (here). After that, you can post your question and our members will help you out.