Bit shifts and endianness

G

gamehack

Hi all,

I was thinking today, suppose we have the number
n = 0xAB 0xFF
which is equivalent to 44031 in decimal. In big endian it will be
stored as
10101011 11111111
but in little endian it will be
11111111 10101011
If we then apply a bit shift n << 2; that would give us completely
different numbers. So is actually a left bit shift done to the left on
a big-endian and left bit shift on little endian is actually a right
shift in the memory?

Thanks a lot

PS. What other problems arise from endianness? Are there any good
resources on that?
 
K

Keith Thompson

gamehack said:
I was thinking today, suppose we have the number
n = 0xAB 0xFF

That's two numbers. Do you mean 0xABFF?
which is equivalent to 44031 in decimal.

Apparently so.
In big endian it will be
stored as
10101011 11111111
but in little endian it will be
11111111 10101011
If we then apply a bit shift n << 2; that would give us completely
different numbers.

No. Shift operators are defined on the *values* of their operands,
not on their representations.
 
M

Mark McIntyre

No. Shift operators are defined on the *values* of their operands,
not on their representations.

Clarification: I assume you mean that shift operators operate on a
binary value, irrespective of how that is stored on disk or in memory.
I'm saying this because few humans would think of the value of 0xabff
as the value in binary. :)

(and please, don't start another one of those tedious "well patently
what I said is 'right', so you're a fool" discussions again, I really
don't want to have to express my opinion of people who can't accept
that there may be more than one way to skin a cat).
Mark McIntyre
 
R

Richard Heathfield

Mark McIntyre said:
Clarification: I assume you mean that shift operators operate on a
binary value,

No, they operate on a value. Values don't have number bases. That's merely a
representation issue.
 
S

slebetman

Mark said:
Clarification: I assume you mean that shift operators operate on a
binary value, irrespective of how that is stored on disk or in memory.
I'm saying this because few humans would think of the value of 0xabff
as the value in binary. :)

Yeah, I had this misconception before. 0xabff >> 8 always gives me
0x00ab regardless of weather the machine is big endian or little
endian. Endianness just defines weather 'ab' comes before or after 'ff'
in memory. It doesn't affect binary operations. Endianness is only an
issue when you are transferring data between machines via a file or a
network, not when you are doing the computing.
 
R

Richard Heathfield

(e-mail address removed) said:
0xabff >> 8 always gives me 0x00ab regardless of weather

Even so, I don't recommend that you try this in the rain.
 
K

Keith Thompson

Mark McIntyre said:
Clarification: I assume you mean that shift operators operate on a
binary value, irrespective of how that is stored on disk or in memory.
I'm saying this because few humans would think of the value of 0xabff
as the value in binary. :)

That's one way of looking at it.

The standard describes the result of a shift operator both as "E1
(left|right)-shifted E2 bit positions" and in terms of multiplication
or division by powers of 2. I think the latter is intended to define
what the standard means by the terms "(left|right)-shifted". The
standard could as easily have defined the operators purely in terms of
multiplication or division with no reference to shifting, with the
resulting wording describing exactly the same semantics.

One advantage of describing it this way is that it avoids the OP's
question; "<<" and ">>" don't depend on endianness any more than "+"
and "-" do, and there's no potential ambiguity about the meanings of
"left" and "right".
 
J

Jordan Abel

Mark McIntyre said:


No, they operate on a value. Values don't have number bases. That's merely a
representation issue.

The standard mentions "bits" in too many places for me to believe this.
 
C

Chris Torek

... Endianness is only an issue when you are transferring
data between machines via a file or a network ...

More precisely (and more correctly :) ), "endianness" becomes
an issue whenever someone or something takes a value apart,
so that there is a "before" and an "after", or a "left" and a
"right", or some other way of sequencing parts of the value.

For instance, imagine you are moving from one place of living to
another (e.g., moving apartments). Your car has room for 1/3 of
your bed, but not the whole thing. You take a saw to the bed and
cut it into thirds. You then transport each third to your new
location and reassemble it.

You need to reassemble it in the same order you took it apart, or
it will likely be quite uncomfortable to sleep in. :)

Endianness in computers arises when you allow the machine to take
a value apart and ship it somewhere, and then allow some other
machine to put it together again. If the two machines use the same
order, all is well, but if they use different orders, your bed is
no longer very useful.

There are a number of ways to handle this, but the simplest is:
do the disassembly and reassembly yourself. Instead of letting
the machine do it in "machine order", do it yourself and control
the order.
 
E

Eric Sosman

Richard said:
(e-mail address removed) said:




Even so, I don't recommend that you try this in the rain.

.... lest the negative sign give you a positive charge ...
 
E

Eric Sosman

Jordan said:
The standard mentions "bits" in too many places for me to believe this.

Yeah. And there's also the requirement for a "pure binary"
representation. Nonetheless, I stick with the Heathfield Maxim:
operators operate on values, not on representations. When the
Standard describes ^ or >> in terms of bits, it does so merely
as a convenience, as an appeal to the readers' understanding of
binary arithmetic. Nowhere (that I can see) is there a requirement
that the arithmetic actually be carried out bit-by-bit or even
bits-in-parallel; it's just easier for the Standard to describe
`x & 17' by referring to binary digits than by avoiding such a
reference. (Quick: Describe `x & 17' without reference to bits.
It can surely be done, but I expect the description would be
verbose and opaque even by the standards of Standardese.)

Let's put it this way: Imagine a machine with two "banks"
of memory, whose stoned-out-of-their-gourds designers have
decided should have different endiannesses. If `x' is in the
BigEndian bank and `y' in the LittleEndian, what is the effect
of `x = 1; y = x << 1;'? I claim that the effect must be the
same as `x = 1; y = 2;', or else the implementation is broken.
The fact that the representations differ just doesn't matter.
 
K

Keith Thompson

Jordan Abel said:
The standard mentions "bits" in too many places for me to believe this.

The standard imposes some fairly stringent requirements on how
integer, both signed and unsigned, are represented. These
requirements are tight enough to guarantee (I think) that the
"multiple or divide by powers of 2" and "shift by N bits" models of
the "<<" and ">>" operators turn out to be equivalent (with the
proviso that any padding bits don't participate).

Once could imagine a hypothetical C standard that doesn't place any
requirements on the representations of integers, but that defines "<<"
and ">>" in much the same way as the actual standard does. An
implementation conforming to this hypothetical standard could use,
say, a trinary hardware representation for integers, but would still
have to implement "<<" and ">>" so they yield the same results as on a
binary machine. (In fact, such an implementation would probably be
legal under the actual C standard as long as it does a sufficiently
good job of hiding the trinary hardware behind a binary-looking
interface.)
 
K

Keith Thompson

Keith Thompson said:
The standard imposes some fairly stringent requirements on how
integer, both signed and unsigned, are represented. These
requirements are tight enough to guarantee (I think) that the
"multiple or divide by powers of 2" and "shift by N bits" models of
the "<<" and ">>" operators turn out to be equivalent (with the
proviso that any padding bits don't participate).

Since not everyone has a copy of the standard, here's what C99 says in
its description of the bitwise shift operators. (I've replaced the
multiplication symbol 'x' with "*", and the superscript exponent with
a "**" operator.)

The integer promotions are performed on each of the operands. The
type of the result is that of the promoted left operand. If the
value of the right operand is negative or is greater than or equal
to the width of the promoted left operand, the behavior is
undefined.

The result of E1 << E2 is E1 left-shifted E2 bit positions;
vacated bits are filled with zeros. If E1 has an unsigned type,
the value of the result is E1 * 2**E2, reduced modulo one more
than the maximum value representable in the result type. If E1 has
a signed type and nonnegative value, and E1 * 2**E2 is
representable in the result type, then that is the resulting
value; otherwise, the behavior is undefined.

The result of E1 >> E2 is E1 right-shifted E2 bit positions. If E1
has an unsigned type or if E1 has a signed type and a nonnegative
value, the value of the result is the integral part of the
quotient of E1 / 2**E2. If E1 has a signed type and a negative
value, the resulting value is implementation-defined.
 
K

Keith Thompson

Eric Sosman said:
Yeah. And there's also the requirement for a "pure binary"
representation. Nonetheless, I stick with the Heathfield Maxim:
operators operate on values, not on representations. When the
Standard describes ^ or >> in terms of bits, it does so merely
as a convenience, as an appeal to the readers' understanding of
binary arithmetic. Nowhere (that I can see) is there a requirement
that the arithmetic actually be carried out bit-by-bit or even
bits-in-parallel; it's just easier for the Standard to describe
`x & 17' by referring to binary digits than by avoiding such a
reference. (Quick: Describe `x & 17' without reference to bits.
It can surely be done, but I expect the description would be
verbose and opaque even by the standards of Standardese.)

I wouldn't try to describe x & 17 without reference to bits. Rather,
I'd define "bits" in terms of the arithmetic properties of the number,
and *then* define x & 17 in terms of the "bits" I've defined.
Something along the lines of:

An integer value in the range 0..2**(n-1) can be uniquely defined
as a sum of powers of two: ... + b3 * 2**3 + b2 * 2**2 + b1 * 2**1
+ b0 * 2**0. Each bn value is referred to as the nth _bit_ of the
integer value.

With this definition, it becomes irrelevant whether these "bits"
correspond to some on-off state in hardware somewhere; the description
applies equally well to numbers represented in decimal or trinary.
 
E

Eric Sosman

Keith said:
Eric Sosman said:
[...] (Quick: Describe `x & 17' without reference to bits.
It can surely be done, but I expect the description would be
verbose and opaque even by the standards of Standardese.)


I wouldn't try to describe x & 17 without reference to bits. Rather,
I'd define "bits" in terms of the arithmetic properties of the number,
and *then* define x & 17 in terms of the "bits" I've defined.
Something along the lines of:

An integer value in the range 0..2**(n-1) can be uniquely defined
as a sum of powers of two: ... + b3 * 2**3 + b2 * 2**2 + b1 * 2**1
+ b0 * 2**0. Each bn value is referred to as the nth _bit_ of the
integer value.

With this definition, it becomes irrelevant whether these "bits"
correspond to some on-off state in hardware somewhere; the description
applies equally well to numbers represented in decimal or trinary.

Right, and that's the point: C's operators[*] are defined
in terms of the values of the operands and the value of the
result. They are not defined in terms of the representations
of those values, but in terms of the values themselves, however
represented.

[*] "Ordinary" operators, anyhow. Special operators like
`.' don't fit well with this way of understanding -- but not
even these oddballs work with the representations of their
operands.

As a concrete example of a situation where values are
represented without any individual "bit-artifacts" being
present, consider the data stream between two modems. If I
understand correctly (I'm not a modem maven), each change in
the signal encodes an entire group of bits: one phase shift
represents 000, another phase shift represents 010, and so
on. "Value bits" are transmitted, but no isolated "signal
bits" are discernible.
 
J

Jordan Abel

Keith said:
Eric Sosman said:
[...] (Quick: Describe `x & 17' without reference to bits.
It can surely be done, but I expect the description would be
verbose and opaque even by the standards of Standardese.)


I wouldn't try to describe x & 17 without reference to bits. Rather,
I'd define "bits" in terms of the arithmetic properties of the number,
and *then* define x & 17 in terms of the "bits" I've defined.
Something along the lines of:

An integer value in the range 0..2**(n-1) can be uniquely defined
as a sum of powers of two: ... + b3 * 2**3 + b2 * 2**2 + b1 * 2**1
+ b0 * 2**0. Each bn value is referred to as the nth _bit_ of the
integer value.

With this definition, it becomes irrelevant whether these "bits"
correspond to some on-off state in hardware somewhere; the description
applies equally well to numbers represented in decimal or trinary.

Right, and that's the point: C's operators[*] are defined
in terms of the values of the operands and the value of the
result. They are not defined in terms of the representations of
those values, but in terms of the values themselves, however
represented.

[*] "Ordinary" operators, anyhow. Special operators like
`.' don't fit well with this way of understanding -- but not even
these oddballs work with the representations of their operands.

As a concrete example of a situation where values are
represented without any individual "bit-artifacts" being present,
consider the data stream between two modems. If I understand
correctly (I'm not a modem maven), each change in the signal
encodes an entire group of bits: one phase shift represents 000,
another phase shift represents 010, and so on. "Value bits" are
transmitted, but no isolated "signal bits" are discernible.

That doesn't mean it's not a "binary value" - which, if you were
paying attention, is the claim you are attempting to refute.
 
E

Eric Sosman

Jordan said:
That doesn't mean it's not a "binary value" - which, if you were
paying attention, is the claim you are attempting to refute.

Thank you, Jordan, for drawing my attention to my
inattention. Tracing back the thread, though, I find
this remark from Mark McIntyre:
> No, they operate on a value. Values don't have number bases.
> That's merely a representation issue.

.... to which you responded:
> The standard mentions "bits" in too many places for me to
> believe this.

McIntyre says the shift operators work on values, not
representations, and you say you don't believe him. That's
what I'm trying to refute: I add my voice to the McIntyre
(and Heathfield) chorus to say that you are mistaken.

But perhaps the "this" in your response referred to
something else altogether?
 
W

Walter Roberson

As a concrete example of a situation where values are
represented without any individual "bit-artifacts" being
present, consider the data stream between two modems. If I
understand correctly (I'm not a modem maven), each change in
the signal encodes an entire group of bits: one phase shift
represents 000, another phase shift represents 010, and so
on. "Value bits" are transmitted, but no isolated "signal
bits" are discernible.

There are a number of different encoding mechanisms used for
modems designed for use with telephone lines (the term 'modem'
applies to a number of other situations as well.) The first standards,
for 110 and 300 baud, were straight single-frequency carriers with
no phase encoding. 1200 baud was, if I recall correctly, dual frequency
but not phase encoded. All of the CCITT standard encodings from 2400 baud
upwards rely upon phase encoding. The proprietary Telebit protocols
worked rather differently (128 or 256 frequency channels but
individual signals were held for long periods.)

It is not exactly accurate to say that -each- change in the signal
encodes a group of bits; rather, the signal is sampled at several
intervals along the cycle, and the details of the "phase breaking"
that is imposed on the base sine wave over that interval encodes
a cluster of bits in a completely non-linear non-binary manner
[i.e., there is no single aspect of the phase changes that encodes
any one particular bit.]

Especially for the higher speeds, extra "logical" bits are encoded into
the signal, and the decoded group of bits is run through a
"constellation" corrector to find the most probable original bit group.
In theory the constellation corrector could map to a bit grouping that
had no obvious resemblance to the bits read off of the signal -- with
the non-linear encoding of bits and taking into account the need for
some slosh while the signal slews, the constellation correction could
decide that one should have chosen a different non-linear decoding.


Another example of non-binary encoding is gigabit ethernet, which
starts with the signaling mechanisms used for 100 megabit ethernet,
raised the carrier frequency from 100 megahertz to 125 megahertz,
but then does phase encoding and constellation correction that extracts
10 databits from each 16 signal values decoded out of the wave.
 
R

Richard Heathfield

Eric Sosman said:
Thank you, Jordan, for drawing my attention to my
inattention. Tracing back the thread, though, I find
this remark from Mark McIntyre:

No, those are my words, not Mark's.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,755
Messages
2,569,537
Members
45,021
Latest member
AkilahJaim

Latest Threads

Top