Representation of integers

Discussion in 'C Programming' started by Mantorok Redgormor, Sep 13, 2003.

  1. I always see posts that involve the representation of integers, where
    some poster claims that the unerlyding representation of an integer
    doesn't have to reflect on the actual integer, for example:

    int foo = 0;

    0 can be all zeros 0x00000000 or 00000000 00000000 00000000 00000000

    Then someone chimes in and says 0 doesn't have to contain all zeros..

    So what is that about?

    Also I have heard the same for (void *)0 or NULL, repreatedly..

    How can an integer have a different underlying representation but
    refer to that integer that it does not reflect?

    That is, is it possible to have something like:

    int foo = 0;

    And the underlying representation of that zero is actually:

    0x10000010 ? or some crazy nonsense like that.
    Mantorok Redgormor, Sep 13, 2003
    #1
    1. Advertising

  2. On Sat, 12 Sep 2003, Mantorok Redgormor wrote:
    >
    > I always see posts that involve the representation of integers, where
    > some poster claims that the unerlyding representation of an integer
    > doesn't have to reflect on the actual integer, for example:
    >
    > int foo = 0;
    >
    > 0 can be all zeros 0x00000000 or 00000000 00000000 00000000 00000000


    Sort-of-correct. In the C programming language,

    0 == 000 /* octal */
    0 == 0x000000 /* hex */
    0 == '\0' /* character constant */


    > Then someone chimes in and says 0 doesn't have to contain all zeros..


    Sort-of-correct. At the machine level, the mathematical concept of
    "zero" can be represented by any number of bit patterns. Sure, it's
    common (these days) to represent "zero" as 32 contiguous bits aligned
    in a particular way and all set to the low state; but "zero" on some
    machines might be 24 bits with the high-order 6 bits all set to 1, for
    parity checking; or something even more unconventional.

    The C language really doesn't care too much about the machine level.

    But the C standard library includes a few functions that *do* deal
    directly with the machine representation of concepts like "zero".
    For example, 'calloc'.


    > So what is that about?
    > Also I have heard the same for (void *)0 or NULL, repreatedly..


    Same idea. Some machines treat a pointer value with all-machine-level-
    -bits-zero as a "null pointer" that can't be dereferenced without
    a segfault or something similar. Other machines treat address "zero"
    just the same as any other address, and use a different "null pointer"
    representation at the machine level.

    Again, C really doesn't care. In a C program, if you write 0 or
    (void*)0 or NULL in a pointer context, you *will* get whatever "null
    pointer representation" is appropriate for your particular machine.
    You don't have to worry about this.

    > How can an integer have a different underlying representation but
    > refer to that integer that it does not reflect?


    "Does not reflect"? How does 00001100 "reflect" the concept of "twelve"
    more or less than 01110110? Two's complement is convenient for
    arithmetic, sure; but that doesn't mean it's convenient for *everything*.

    > That is, is it possible to have something like:
    >
    > int foo = 0;
    >
    > And the underlying representation of that zero is actually:
    >
    > 0x10000010 ? or some crazy nonsense like that.


    Yes. Except that of course you didn't mean "0x10000010"; the
    prefix "0x" conventionally indicates hexadecimal notation. What
    you probably meant was, "the bits 10000010." (Not that you
    *couldn't* have 64-bit integers with weird representations...)
    Anyway, AFAIK having an integer constant that big may invoke
    undefined behavior ('int' is only guaranteed to have 16 bits
    or more). So, in the spirit of pedantry, I must point out
    that

    if (0 == 0x10000010)
    {
    printf("Undefined behavior can do weird things to you.\n");
    }

    Bottom line: All-bits-zero does *not* mean value-zero. One
    concept is much lower-level than the other. Avoid 'memset'
    and 'calloc' for now, and I think you'll be okay.

    HTH,
    -Arthur
    Arthur J. O'Dwyer, Sep 13, 2003
    #2
    1. Advertising

  3. Mantorok Redgormor

    Tom Zych Guest

    "Arthur J. O'Dwyer" wrote:

    > Sort-of-correct. At the machine level, the mathematical concept of
    > "zero" can be represented by any number of bit patterns. Sure, it's
    > common (these days) to represent "zero" as 32 contiguous bits aligned
    > in a particular way and all set to the low state; but "zero" on some
    > machines might be 24 bits with the high-order 6 bits all set to 1, for
    > parity checking; or something even more unconventional.


    > Bottom line: All-bits-zero does *not* mean value-zero. One
    > concept is much lower-level than the other. Avoid 'memset'
    > and 'calloc' for now, and I think you'll be okay.


    This came up about a week ago, in the context of using memset() to
    zero an array, and Jack Klein posted this:

    [snip]
    > From 7.18.1.1 Exact-width integer types:


    > <quote>
    > The typedef name intN_t designates a signed integer type with width N,
    > no padding bits, and a two's complement representation. Thus, int8_t
    > denotes a signed integer type with a width of exactly 8 bits.


    > The typedef name uintN_t designates an unsigned integer type with
    > width N. Thus, uint24_t denotes an unsigned integer type with a width
    > of exactly 24 bits.
    > <end quote>


    > Since there are no sign bits an any unsigned int type, all that there
    > can be are value and padding bits. Since the exact width types
    > specifically have no padding, every bit in the unsigned form is a
    > value bit and all bits 0 in any exact width unsigned type must be the
    > one and only representation of the value 0 for that type.

    [snip]

    I'm not sure about his conclusion, though. I don't see anything in
    the quote that says zero has to be represented by all zero bits. It
    has to be a one-to-one mapping, but theoretically all zero bits
    could represent something else and zero could be represented by
    something else.

    --
    Tom Zych
    This email address will expire at some point to thwart spammers.
    Permanent address: echo '' | rot13
    Tom Zych, Sep 13, 2003
    #3
  4. On Sat, 13 Sep 2003, Tom Zych wrote:
    >
    > "Arthur J. O'Dwyer" wrote:
    >
    > > Sort-of-correct. At the machine level, the mathematical concept of
    > > "zero" can be represented by any number of bit patterns. Sure, it's
    > > common (these days) to represent "zero" as 32 contiguous bits aligned
    > > in a particular way and all set to the low state; but "zero" on some
    > > machines might be 24 bits with the high-order 6 bits all set to 1, for
    > > parity checking; or something even more unconventional.

    >
    > > Bottom line: All-bits-zero does *not* mean value-zero. One
    > > concept is much lower-level than the other. Avoid 'memset'
    > > and 'calloc' for now, and I think you'll be okay.

    >
    > This came up about a week ago, in the context of using memset() to
    > zero an array, and Jack Klein posted this:
    >

    [snip]
    >
    > I'm not sure about his conclusion, though. I don't see anything in
    > the quote that says zero has to be represented by all zero bits. It
    > has to be a one-to-one mapping, but theoretically all zero bits
    > could represent something else and zero could be represented by
    > something else.


    For the uint_XXt types defined in C99, yes, all-zero-bits means
    "zero". [That's because unsigned types are guaranteed to be
    represented in a binary notation; see N869, sec. 6.2.6.2#1.]
    This is *not* true for signed types (consider ones'-complement),
    the normal unsigned types (consider padding bits), pointers (well
    duh), floating-point types (well duh), or whatever I've missed
    in that list.

    All-zero-bits does mean "zero" for 'unsigned char', and for those
    special C99 types. But not in general.

    HTCTU,
    -Arthur
    Arthur J. O'Dwyer, Sep 13, 2003
    #4
  5. "Arthur J. O'Dwyer" <> wrote in message
    news:p...
    >
    > On Sat, 13 Sep 2003, Tom Zych wrote:
    > >
    > > "Arthur J. O'Dwyer" wrote:
    > >
    > > > Sort-of-correct. At the machine level, the mathematical concept of
    > > > "zero" can be represented by any number of bit patterns. Sure, it's
    > > > common (these days) to represent "zero" as 32 contiguous bits aligned
    > > > in a particular way and all set to the low state; but "zero" on some
    > > > machines might be 24 bits with the high-order 6 bits all set to 1, for
    > > > parity checking; or something even more unconventional.

    > >
    > > > Bottom line: All-bits-zero does *not* mean value-zero. One
    > > > concept is much lower-level than the other. Avoid 'memset'
    > > > and 'calloc' for now, and I think you'll be okay.

    > >
    > > This came up about a week ago, in the context of using memset() to
    > > zero an array, and Jack Klein posted this:
    > >

    > [snip]
    > >
    > > I'm not sure about his conclusion, though. I don't see anything in
    > > the quote that says zero has to be represented by all zero bits. It
    > > has to be a one-to-one mapping, but theoretically all zero bits
    > > could represent something else and zero could be represented by
    > > something else.


    There does seem to be a strong tendency in hardware design for the all zero
    bits to represent numerical zero. Consider that the biased representation
    commonly used for floating point exponents is not used for integer
    representation (on any machine that I know of).

    It is also common for the all zero bits floating point representation to be
    numerical zero. In IEEE floating point, which uses a hidden one, zero must
    have a special representation, I believe with all bits zero.

    In the traditional IBM S/360 floating point, without a hidden one, there are
    a variety of represenations with the numerical value of zero, but only one
    has the correct arithmetic properties. The Fortran AINT function which
    truncates the fractional part of a floating point number, but keeps it in
    floating point form, on S/360 consists of adding the constant 0*16**7, that
    is mantissa 0 and hexadecimal exponent of 7. Prenormalization will shift
    out any fraction bits, nothing will be added, and post normalization will be
    performed.

    For pointers, C doesn't require the NULL pointer representation to have all
    zero bits, but does make it more convenient. In protected mode x86 with
    large model, there must be some value for the segment selector. The
    hardware actually treats zero special in this case as the null selector. I
    would expect any processor built with the expectation of running C code to
    follow this convention, so that all zero bits would be the NULL pointer.

    On any processor that didn't happen to have all bits zero as its NULL
    representation, my choice would be to XOR the value with the desired
    representation on any conversions between (int) and pointer types. That
    would satisfy the NULL constant, and NULL pointer logical tests. It would
    fail for the use of memset(), but the standard doesn't guarantee that,
    anyway.

    -- glen
    Glen Herrmannsfeldt, Sep 13, 2003
    #5
  6. Mantorok Redgormor

    Jack Klein Guest

    On Sat, 13 Sep 2003 13:32:53 -0400 (EDT), "Arthur J. O'Dwyer"
    <> wrote in comp.lang.c:

    >
    > On Sat, 13 Sep 2003, Tom Zych wrote:
    > >
    > > "Arthur J. O'Dwyer" wrote:
    > >
    > > > Sort-of-correct. At the machine level, the mathematical concept of
    > > > "zero" can be represented by any number of bit patterns. Sure, it's
    > > > common (these days) to represent "zero" as 32 contiguous bits aligned
    > > > in a particular way and all set to the low state; but "zero" on some
    > > > machines might be 24 bits with the high-order 6 bits all set to 1, for
    > > > parity checking; or something even more unconventional.

    > >
    > > > Bottom line: All-bits-zero does *not* mean value-zero. One
    > > > concept is much lower-level than the other. Avoid 'memset'
    > > > and 'calloc' for now, and I think you'll be okay.

    > >
    > > This came up about a week ago, in the context of using memset() to
    > > zero an array, and Jack Klein posted this:
    > >

    > [snip]
    > >
    > > I'm not sure about his conclusion, though. I don't see anything in
    > > the quote that says zero has to be represented by all zero bits. It
    > > has to be a one-to-one mapping, but theoretically all zero bits
    > > could represent something else and zero could be represented by
    > > something else.

    >
    > For the uint_XXt types defined in C99, yes, all-zero-bits means
    > "zero". [That's because unsigned types are guaranteed to be
    > represented in a binary notation; see N869, sec. 6.2.6.2#1.]
    > This is *not* true for signed types (consider ones'-complement),
    > the normal unsigned types (consider padding bits), pointers (well
    > duh), floating-point types (well duh), or whatever I've missed
    > in that list.
    >
    > All-zero-bits does mean "zero" for 'unsigned char', and for those
    > special C99 types. But not in general.
    >
    > HTCTU,
    > -Arthur


    The are exactly two and only two situations in C where any integer
    type with a value of 0 can contain non-zero bits.

    1. The representation has padding bits, in which case one or more of
    the padding bits might be non-zero.

    2. The implementation uses 1's complement or signed magnitude
    representation for a signed integer type and that signed integer type
    contains the value of -0. Note that -0 may be a trap representation
    on such implementations rather than a valid value.

    The meaning from the standard is quite unmistakable:

    ========
    6.2.6.2 Integer types

    1 For unsigned integer types other than unsigned char, the bits of the
    object representation shall be divided into two groups: value bits and
    padding bits (there need not be any of the latter). If there are N
    value bits, each bit shall represent a different power of 2 between 1
    and 2N-1, so that objects of that type shall be capable of
    representing values from 0 to 2N - 1 using a pure binary
    representation; this shall be known as the value representation. The
    values of any padding bits are unspecified.
    ========

    The quotation above clearly states that the one and only possible
    representation for 0 in any unsigned integer type must be all value
    bits 0.

    ========
    2 For signed integer types, the bits of the object representation
    shall be divided into three groups: value bits, padding bits, and the
    sign bit. There need not be any padding bits; there shall be exactly
    one sign bit. Each bit that is a value bit shall have the same value
    as the same bit in the object representation of the corresponding
    unsigned type (if there are M value bits in the signed type and N in
    the unsigned type, then M <= N). If the sign bit is zero, it shall not
    affect the resulting value. If the sign bit is one, the value shall be
    modified in one of the following ways:

    — the corresponding value with sign bit 0 is negated (sign and
    magnitude);

    — the sign bit has the value -(2N) (two’s complement);

    — the sign bit has the value -(2N - 1) (one’s complement).

    Which of these applies is implementation-defined, as is whether the
    value with sign bit 1 and all value bits zero (for the first two), or
    with sign bit and all value bits 1 (for one’s complement), is a trap
    representation or a normal value. In the case of sign and
    magnitude and one’s complement, if this representation is a normal
    value it is called a negative zero.
    ========

    ....and the second quotation clearly states that any signed integer
    type with a positive value must have the same bit representation as
    the corresponding unsigned type in its value bits.

    So any signed or unsigned integer type with all its value bits 0 has
    the value of 0. A signed integer type with a value of 0 (as opposed
    to -0, on non 2's complement systems if they support -0) must also
    have its sign bit 0.

    So only padding bits, if any, may be non-zero in the representation of
    any C integer type with a value of 0.

    --
    Jack Klein
    Home: http://JK-Technology.Com
    FAQs for
    comp.lang.c http://www.eskimo.com/~scs/C-faq/top.html
    comp.lang.c++ http://www.parashift.com/c -faq-lite/
    alt.comp.lang.learn.c-c++ ftp://snurse-l.org/pub/acllc-c /faq
    Jack Klein, Sep 14, 2003
    #6
  7. On Sun, 14 Sep 2003 03:43:08 GMT, Jack Klein <>
    wrote:

    >1 For unsigned integer types other than unsigned char, the bits of the
    >object representation shall be divided into two groups: value bits and
    >padding bits (there need not be any of the latter). If there are N
    >value bits, each bit shall represent a different power of 2 between 1
    >and 2N-1, so that objects of that type shall be capable of
    >representing values from 0 to 2N - 1 using a pure binary
    >representation; this shall be known as the value representation. The
    >values of any padding bits are unspecified.



    Damn - there goes my hope to port C99 to my ternary computer.


    --
    #include <standard.disclaimer>
    _
    Kevin D Quitt USA 91387-4454 96.37% of all statistics are made up
    Per the FCA, this address may not be added to any commercial mail list
    Kevin D. Quitt, Sep 15, 2003
    #7
  8. Mantorok Redgormor

    LibraryUser Guest

    "Kevin D. Quitt" wrote:
    > Jack Klein <> wrote:
    >
    > > 1 For unsigned integer types other than unsigned char, the bits
    > > of the object representation shall be divided into two groups:
    > > value bits and padding bits (there need not be any of the
    > > latter). If there are N value bits, each bit shall represent a
    > > different power of 2 between 1 and 2N-1, so that objects of
    > > that type shall be capable of representing values from 0 to 2N
    > > - 1 using a pure binary representation; this shall be known as
    > > the value representation. The values of any padding bits are
    > > unspecified.

    >
    > Damn - there goes my hope to port C99 to my ternary computer.


    No, you can do it. You may have to restrict the actual trits you
    use. The problem is similar to implementing BCD arithmetic on
    hexadecimal units. :) For example excess 3 encoding will use
    the hex values 3 through 0x0c only, and others will be trap
    values.

    You may lose some efficiency though. :-[

    --
    Replies should be to the newsgroup
    Chuck Falconer, on vacation.
    LibraryUser, Sep 16, 2003
    #8
  9. On Sat, 13 Sep 2003, Arthur J. O'Dwyer wrote:

    >
    > On Sat, 12 Sep 2003, Mantorok Redgormor wrote:
    >
    > > That is, is it possible to have something like:
    > >
    > > int foo = 0;
    > >
    > > And the underlying representation of that zero is actually:
    > >
    > > 0x10000010 ? or some crazy nonsense like that.

    >
    > Yes. Except that of course you didn't mean "0x10000010"; the
    > prefix "0x" conventionally indicates hexadecimal notation. What
    > you probably meant was, "the bits 10000010." (Not that you
    > *couldn't* have 64-bit integers with weird representations...)
    > Anyway, AFAIK having an integer constant that big may invoke
    > undefined behavior ('int' is only guaranteed to have 16 bits
    > or more). So, in the spirit of pedantry, I must point out
    > that
    >
    > if (0 == 0x10000010)
    > {
    > printf("Undefined behavior can do weird things to you.\n");
    > }


    What? Hexadecimal 10000010 is invariably less than ULONG_MAX, so there's
    no problem (even with 16-bit ints).

    --
    =?ISO-8859-1?Q?Johan_Aur=E9r?=, Sep 16, 2003
    #9
  10. On Tue, 16 Sep 2003, [ISO-8859-1] Johan Aurér wrote:
    >
    > On Sat, 13 Sep 2003, Arthur J. O'Dwyer wrote:
    > > On Sat, 12 Sep 2003, Mantorok Redgormor wrote:
    > >
    > > > That is, is it possible to have something like:
    > > > int foo = 0;
    > > > And the underlying representation of that zero is actually:
    > > > 0x10000010 ? or some crazy nonsense like that.

    > >
    > > Yes. Except that of course you didn't mean "0x10000010"; the
    > > prefix "0x" conventionally indicates hexadecimal notation. What
    > > you probably meant was, "the bits 10000010." (Not that you
    > > *couldn't* have 64-bit integers with weird representations...)
    > > Anyway, AFAIK having an integer constant that big may invoke
    > > undefined behavior ('int' is only guaranteed to have 16 bits
    > > or more). So, in the spirit of pedantry, I must point out
    > > that
    > >
    > > if (0 == 0x10000010)
    > > {
    > > printf("Undefined behavior can do weird things to you.\n");
    > > }

    >
    > What? Hexadecimal 10000010 is invariably less than ULONG_MAX, so there's
    > no problem (even with 16-bit ints).


    Whoops. I forgot about the promotion to 'long' if it won't fit in
    an 'int'. So just tack a few more zeroes on the end of that number.

    -Arthur
    Arthur J. O'Dwyer, Sep 16, 2003
    #10
  11. On Tue, 16 Sep 2003 09:20:33 -0400, LibraryUser
    <> wrote:

    >No, you can do it. You may have to restrict the actual trits you
    >use. The problem is similar to implementing BCD arithmetic on
    >hexadecimal units. :) For example excess 3 encoding will use
    >the hex values 3 through 0x0c only, and others will be trap
    >values.
    >
    >You may lose some efficiency though. :-[


    That *does* explain why my Pentium is so slow and requires so much memory.
    Thanks!


    --
    #include <standard.disclaimer>
    _
    Kevin D Quitt USA 91387-4454 96.37% of all statistics are made up
    Per the FCA, this address may not be added to any commercial mail list
    Kevin D. Quitt, Sep 17, 2003
    #11
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Mantorok Redgormor

    displaying underlying representation of integers

    Mantorok Redgormor, Sep 26, 2003, in forum: C Programming
    Replies:
    5
    Views:
    345
  2. Mantorok Redgormor

    representation of integers(again) very annoying..

    Mantorok Redgormor, Oct 13, 2003, in forum: C Programming
    Replies:
    4
    Views:
    447
    Barry Schwarz
    Oct 14, 2003
  3. =?iso-8859-9?Q?Tongu=E7?= Yumruk

    Bitmask representation of integers

    =?iso-8859-9?Q?Tongu=E7?= Yumruk, Oct 8, 2003, in forum: Python
    Replies:
    3
    Views:
    853
    Scott David Daniels
    Oct 8, 2003
  4. Guest
    Replies:
    3
    Views:
    362
    Cameron Laird
    Sep 9, 2004
  5. Shao Miller

    Re: Representation of integers

    Shao Miller, Jan 17, 2013, in forum: C Programming
    Replies:
    9
    Views:
    165
    Ben Bacarisse
    Jan 18, 2013
Loading...

Share This Page