Bit shifts and endianness

Discussion in 'C Programming' started by gamehack, Jan 5, 2006.

  1. gamehack

    gamehack Guest

    Hi all,

    I was thinking today, suppose we have the number
    n = 0xAB 0xFF
    which is equivalent to 44031 in decimal. In big endian it will be
    stored as
    10101011 11111111
    but in little endian it will be
    11111111 10101011
    If we then apply a bit shift n << 2; that would give us completely
    different numbers. So is actually a left bit shift done to the left on
    a big-endian and left bit shift on little endian is actually a right
    shift in the memory?

    Thanks a lot

    PS. What other problems arise from endianness? Are there any good
    resources on that?
    gamehack, Jan 5, 2006
    1. Advertisements

  2. That's two numbers. Do you mean 0xABFF?
    Apparently so.
    No. Shift operators are defined on the *values* of their operands,
    not on their representations.
    Keith Thompson, Jan 5, 2006
    1. Advertisements

  3. gamehack

    gamehack Guest

    Sorry, I meant 0xABFF and just separated them as distinct bytes.
    gamehack, Jan 5, 2006
  4. Clarification: I assume you mean that shift operators operate on a
    binary value, irrespective of how that is stored on disk or in memory.
    I'm saying this because few humans would think of the value of 0xabff
    as the value in binary. :)

    (and please, don't start another one of those tedious "well patently
    what I said is 'right', so you're a fool" discussions again, I really
    don't want to have to express my opinion of people who can't accept
    that there may be more than one way to skin a cat).
    Mark McIntyre
    Mark McIntyre, Jan 5, 2006
  5. Mark McIntyre said:
    No, they operate on a value. Values don't have number bases. That's merely a
    representation issue.
    Richard Heathfield, Jan 5, 2006
  6. gamehack

    slebetman Guest

    Yeah, I had this misconception before. 0xabff >> 8 always gives me
    0x00ab regardless of weather the machine is big endian or little
    endian. Endianness just defines weather 'ab' comes before or after 'ff'
    in memory. It doesn't affect binary operations. Endianness is only an
    issue when you are transferring data between machines via a file or a
    network, not when you are doing the computing.
    slebetman, Jan 5, 2006
  7. said:
    Even so, I don't recommend that you try this in the rain.
    Richard Heathfield, Jan 5, 2006
  8. That's one way of looking at it.

    The standard describes the result of a shift operator both as "E1
    (left|right)-shifted E2 bit positions" and in terms of multiplication
    or division by powers of 2. I think the latter is intended to define
    what the standard means by the terms "(left|right)-shifted". The
    standard could as easily have defined the operators purely in terms of
    multiplication or division with no reference to shifting, with the
    resulting wording describing exactly the same semantics.

    One advantage of describing it this way is that it avoids the OP's
    question; "<<" and ">>" don't depend on endianness any more than "+"
    and "-" do, and there's no potential ambiguity about the meanings of
    "left" and "right".
    Keith Thompson, Jan 5, 2006
  9. gamehack

    Jordan Abel Guest

    The standard mentions "bits" in too many places for me to believe this.
    Jordan Abel, Jan 6, 2006
  10. gamehack

    Chris Torek Guest

    More precisely (and more correctly :) ), "endianness" becomes
    an issue whenever someone or something takes a value apart,
    so that there is a "before" and an "after", or a "left" and a
    "right", or some other way of sequencing parts of the value.

    For instance, imagine you are moving from one place of living to
    another (e.g., moving apartments). Your car has room for 1/3 of
    your bed, but not the whole thing. You take a saw to the bed and
    cut it into thirds. You then transport each third to your new
    location and reassemble it.

    You need to reassemble it in the same order you took it apart, or
    it will likely be quite uncomfortable to sleep in. :)

    Endianness in computers arises when you allow the machine to take
    a value apart and ship it somewhere, and then allow some other
    machine to put it together again. If the two machines use the same
    order, all is well, but if they use different orders, your bed is
    no longer very useful.

    There are a number of ways to handle this, but the simplest is:
    do the disassembly and reassembly yourself. Instead of letting
    the machine do it in "machine order", do it yourself and control
    the order.
    Chris Torek, Jan 6, 2006
  11. gamehack

    Eric Sosman Guest

    .... lest the negative sign give you a positive charge ...
    Eric Sosman, Jan 6, 2006
  12. gamehack

    Eric Sosman Guest

    Yeah. And there's also the requirement for a "pure binary"
    representation. Nonetheless, I stick with the Heathfield Maxim:
    operators operate on values, not on representations. When the
    Standard describes ^ or >> in terms of bits, it does so merely
    as a convenience, as an appeal to the readers' understanding of
    binary arithmetic. Nowhere (that I can see) is there a requirement
    that the arithmetic actually be carried out bit-by-bit or even
    bits-in-parallel; it's just easier for the Standard to describe
    `x & 17' by referring to binary digits than by avoiding such a
    reference. (Quick: Describe `x & 17' without reference to bits.
    It can surely be done, but I expect the description would be
    verbose and opaque even by the standards of Standardese.)

    Let's put it this way: Imagine a machine with two "banks"
    of memory, whose stoned-out-of-their-gourds designers have
    decided should have different endiannesses. If `x' is in the
    BigEndian bank and `y' in the LittleEndian, what is the effect
    of `x = 1; y = x << 1;'? I claim that the effect must be the
    same as `x = 1; y = 2;', or else the implementation is broken.
    The fact that the representations differ just doesn't matter.
    Eric Sosman, Jan 6, 2006
  13. The standard imposes some fairly stringent requirements on how
    integer, both signed and unsigned, are represented. These
    requirements are tight enough to guarantee (I think) that the
    "multiple or divide by powers of 2" and "shift by N bits" models of
    the "<<" and ">>" operators turn out to be equivalent (with the
    proviso that any padding bits don't participate).

    Once could imagine a hypothetical C standard that doesn't place any
    requirements on the representations of integers, but that defines "<<"
    and ">>" in much the same way as the actual standard does. An
    implementation conforming to this hypothetical standard could use,
    say, a trinary hardware representation for integers, but would still
    have to implement "<<" and ">>" so they yield the same results as on a
    binary machine. (In fact, such an implementation would probably be
    legal under the actual C standard as long as it does a sufficiently
    good job of hiding the trinary hardware behind a binary-looking
    Keith Thompson, Jan 6, 2006
  14. Since not everyone has a copy of the standard, here's what C99 says in
    its description of the bitwise shift operators. (I've replaced the
    multiplication symbol 'x' with "*", and the superscript exponent with
    a "**" operator.)

    The integer promotions are performed on each of the operands. The
    type of the result is that of the promoted left operand. If the
    value of the right operand is negative or is greater than or equal
    to the width of the promoted left operand, the behavior is

    The result of E1 << E2 is E1 left-shifted E2 bit positions;
    vacated bits are filled with zeros. If E1 has an unsigned type,
    the value of the result is E1 * 2**E2, reduced modulo one more
    than the maximum value representable in the result type. If E1 has
    a signed type and nonnegative value, and E1 * 2**E2 is
    representable in the result type, then that is the resulting
    value; otherwise, the behavior is undefined.

    The result of E1 >> E2 is E1 right-shifted E2 bit positions. If E1
    has an unsigned type or if E1 has a signed type and a nonnegative
    value, the value of the result is the integral part of the
    quotient of E1 / 2**E2. If E1 has a signed type and a negative
    value, the resulting value is implementation-defined.
    Keith Thompson, Jan 6, 2006
  15. I wouldn't try to describe x & 17 without reference to bits. Rather,
    I'd define "bits" in terms of the arithmetic properties of the number,
    and *then* define x & 17 in terms of the "bits" I've defined.
    Something along the lines of:

    An integer value in the range 0..2**(n-1) can be uniquely defined
    as a sum of powers of two: ... + b3 * 2**3 + b2 * 2**2 + b1 * 2**1
    + b0 * 2**0. Each bn value is referred to as the nth _bit_ of the
    integer value.

    With this definition, it becomes irrelevant whether these "bits"
    correspond to some on-off state in hardware somewhere; the description
    applies equally well to numbers represented in decimal or trinary.
    Keith Thompson, Jan 6, 2006
  16. gamehack

    Eric Sosman Guest

    Right, and that's the point: C's operators[*] are defined
    in terms of the values of the operands and the value of the
    result. They are not defined in terms of the representations
    of those values, but in terms of the values themselves, however

    [*] "Ordinary" operators, anyhow. Special operators like
    `.' don't fit well with this way of understanding -- but not
    even these oddballs work with the representations of their

    As a concrete example of a situation where values are
    represented without any individual "bit-artifacts" being
    present, consider the data stream between two modems. If I
    understand correctly (I'm not a modem maven), each change in
    the signal encodes an entire group of bits: one phase shift
    represents 000, another phase shift represents 010, and so
    on. "Value bits" are transmitted, but no isolated "signal
    bits" are discernible.
    Eric Sosman, Jan 6, 2006
  17. gamehack

    Jordan Abel Guest

    That doesn't mean it's not a "binary value" - which, if you were
    paying attention, is the claim you are attempting to refute.
    Jordan Abel, Jan 6, 2006
  18. gamehack

    Eric Sosman Guest

    Thank you, Jordan, for drawing my attention to my
    inattention. Tracing back the thread, though, I find
    this remark from Mark McIntyre:
    .... to which you responded:
    McIntyre says the shift operators work on values, not
    representations, and you say you don't believe him. That's
    what I'm trying to refute: I add my voice to the McIntyre
    (and Heathfield) chorus to say that you are mistaken.

    But perhaps the "this" in your response referred to
    something else altogether?
    Eric Sosman, Jan 6, 2006
  19. There are a number of different encoding mechanisms used for
    modems designed for use with telephone lines (the term 'modem'
    applies to a number of other situations as well.) The first standards,
    for 110 and 300 baud, were straight single-frequency carriers with
    no phase encoding. 1200 baud was, if I recall correctly, dual frequency
    but not phase encoded. All of the CCITT standard encodings from 2400 baud
    upwards rely upon phase encoding. The proprietary Telebit protocols
    worked rather differently (128 or 256 frequency channels but
    individual signals were held for long periods.)

    It is not exactly accurate to say that -each- change in the signal
    encodes a group of bits; rather, the signal is sampled at several
    intervals along the cycle, and the details of the "phase breaking"
    that is imposed on the base sine wave over that interval encodes
    a cluster of bits in a completely non-linear non-binary manner
    [i.e., there is no single aspect of the phase changes that encodes
    any one particular bit.]

    Especially for the higher speeds, extra "logical" bits are encoded into
    the signal, and the decoded group of bits is run through a
    "constellation" corrector to find the most probable original bit group.
    In theory the constellation corrector could map to a bit grouping that
    had no obvious resemblance to the bits read off of the signal -- with
    the non-linear encoding of bits and taking into account the need for
    some slosh while the signal slews, the constellation correction could
    decide that one should have chosen a different non-linear decoding.

    Another example of non-binary encoding is gigabit ethernet, which
    starts with the signaling mechanisms used for 100 megabit ethernet,
    raised the carrier frequency from 100 megahertz to 125 megahertz,
    but then does phase encoding and constellation correction that extracts
    10 databits from each 16 signal values decoded out of the wave.
    Walter Roberson, Jan 6, 2006
  20. Eric Sosman said:
    No, those are my words, not Mark's.
    Richard Heathfield, Jan 6, 2006
    1. Advertisements

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments (here). After that, you can post your question and our members will help you out.