Bitwise operators.

D

dspfun

C doesn't specify the bit-layout in memory of the different types, so
how are the operations ( & | ~ ^ << >>) defined to modify the objects
(which does not have its bit-layout defined) ?

6.2.6.1p2:
Except for bit-fields, objects are composed of contiguous sequences of
one or more bytes,
the number, order, and encoding of which are either explicitly
specified or
implementation-defined.

Lets say for example:

unsigned int a = 6;
a = a & 3;

Do we know anything about the bit-layout of a?

Or do we simply know that the value of a is 2 after the last
statement, and we know nothing about the bit-layout? But why the name
bitwise operators if we don't know which bit the operator and operand
modifies?

What is the benefit of using the bit-operators if we do not know the
bit-layout of objects?
 
B

Ben Pfaff

dspfun said:
C doesn't specify the bit-layout in memory of the different types, so
how are the operations ( & | ~ ^ << >>) defined to modify the objects
(which does not have its bit-layout defined) ?

The bitwise operators are defined in terms of values.
Lets say for example:

unsigned int a = 6;
a = a & 3;

Do we know anything about the bit-layout of a?
No.

Or do we simply know that the value of a is 2 after the last
statement, and we know nothing about the bit-layout?
Yes.

But why the name bitwise operators if we don't know which bit
the operator and operand modifies?

We know that the statement modified the bit with value 4.
 
D

dspfun

The bitwise operators are defined in terms of values.

Ok, thanks!

I believe it is the following in the standard the specifies this? :

6.5p4
Some operators (the unary operator ~, and the binary operators <<, >>,
&, ^, and |,
collectively described as bitwise operators) are required to have
operands that have
integer type. These operators yield values that depend on the internal
representations of
integers, and have implementation-defined and undefined aspects for
signed types.
 
T

Tomás Ó hÉilidhe

dspfun:
C doesn't specify the bit-layout in memory of the different types, so
how are the operations ( & | ~ ^ << >>) defined to modify the objects
(which does not have its bit-layout defined) ?


C *does* in fact specify the bit-layout of integer types. It doesn't do
so in plain English, but it is inferred from other statements (such as
the fact that shifting to the left is the same as multiplying by powers
of two). It uses simple canonical binary:

0 = 0
1 = 1
2 = 10
3 = 11
4 = 100
5 = 101
6 = 110
7 = 111
8 = 1000

6.2.6.1p2:
Except for bit-fields, objects are composed of contiguous sequences of
one or more bytes,
the number, order, and encoding of which are either explicitly
specified or
implementation-defined.

Lets say for example:

unsigned int a = 6;
a = a & 3;

Do we know anything about the bit-layout of a?


At initialisation, a is 110. After the assignment, it's 10.

Or do we simply know that the value of a is 2 after the last
statement, and we know nothing about the bit-layout?


We are certain that the value is two and that the bit layout is 10.

But why the name
bitwise operators if we don't know which bit the operator and operand
modifies?

What is the benefit of using the bit-operators if we do not know the
bit-layout of objects?


Fortunately for unsigned integer types, we know the bit patterns
exactly. For signed integer types, we must take the number system into
account, but this can easily be determined.
 
W

Walter Roberson

C *does* in fact specify the bit-layout of integer types. It doesn't do
so in plain English, but it is inferred from other statements (such as
the fact that shifting to the left is the same as multiplying by powers
of two).

Those operations are -defined- by their result on *values*, not by
bit representation. If I recall correctly, ~ is the only operation
defined in terms of bits rather than in terms of values.

Because the operations are defined in terms of values, it doesn't
matter to C what the bit ordering is in memory. For example, it
does not say whether in a two-octet integeral type whether the
internal order is AAAAAAAABBBBBBBB or BBBBBBBBAAAAAAAA or
perhaps something else completely.

If C did in fact specify the bit-layout of integer types, then
there could not be both "big-endian" and "little-endian"
conformant C implementations: if the bit-layout was specified by C,
only one layout would be valid.
 
E

Eric Sosman

Tomás Ó hÉilidhe said:
dspfun:
C doesn't specify the bit-layout in memory of the different types, so
how are the operations ( & | ~ ^ << >>) defined to modify the objects
(which does not have its bit-layout defined) ?


C *does* in fact specify the bit-layout of integer types. It doesn't do
so in plain English, but it is inferred from other statements (such as
the fact that shifting to the left is the same as multiplying by powers
of two). [...]

No. I've used this "thought experiment" before, but maybe
it's time to trot it out again: Tomás, I have built a computer
that uses four-state components instead of the more traditional
two-state parts. Internally, therefore, numbers in my computer
are represented in base 4. Questions:

1) Since the "4's bit" and the "8's bit" both inhabit the
"4's quit," which is to the left of the other?

2) Propose a C program to demonstrate that I am lying
about using four-state parts, and am in fact using plain old
two-state components like everybody else.
 
R

Robert Latest

dspfun said:
These operators yield values that depend on the internal
representations of [unsigned] integers

Question is: Is --for instance-- (x << 2) == (x * 2) guaranteed? I thought
so.

robert
 
P

Philip Potter

Robert said:
dspfun said:
These operators yield values that depend on the internal
representations of [unsigned] integers

Question is: Is --for instance-- (x << 2) == (x * 2) guaranteed? I thought
so.

Assuming you meant (x<<1) == (x*2),

Consider the case x of type signed char, sign-magnitude representation
and x==-1, and 8-bit chars.

sign magnitude
-1 : 1 0000001 == -1
-1*2 : 1 0000010 == -2
-1<<1: 0 0000010 == 2

Of course, n1256 6.5.7p4 states that negative numbers shifted left cause
undefined behaviour, so -1 << 1 could yield any value, or even crash
your computer.

I believe that if you add the following conditions:
* x is nonnegative
* x is of integer type
* x*2 falls within the defined range of x's type

then (x<<1) == (x*2) is guaranteed to be true.

But the compiler knows this anyway, so you don't gain anything by using
x<<1 when you should have used x*2 for clarity.
 
P

Philip Potter

Walter said:
Those operations are -defined- by their result on *values*, not by
bit representation. If I recall correctly, ~ is the only operation
defined in terms of bits rather than in terms of values.

Because the operations are defined in terms of values, it doesn't
matter to C what the bit ordering is in memory. For example, it
does not say whether in a two-octet integeral type whether the
internal order is AAAAAAAABBBBBBBB or BBBBBBBBAAAAAAAA or
perhaps something else completely.

If C did in fact specify the bit-layout of integer types, then
there could not be both "big-endian" and "little-endian"
conformant C implementations: if the bit-layout was specified by C,
only one layout would be valid.

I disagree; C defines that certain bits have certain values, and this is
what we mean by "bit layout". We write them down in a canonical way:
sign bit, then MSB to LSB, in order from left to right, but there is no
guarantee they are stored in the machine this way. This doesn't matter.

In a little-endian system, a 16-bit short represented in a hex-editor as
ff 00 has the binary value 0000000011111111. The fact that its bytes are
conventionally written down in a different order to the bits don't faze
me; the bytes are written in order of address, while bits are written in
order of significance. Each convention is, in some sense, correct.

In fact, for the bitwise operators ~ & | ^ the order is irrelevant since
each bit's resultant value is dependent only on the corresponding bit in
the operand(s) and not on any other bit. For E1 << E2 and E1 >> E2, when
their behaviour is defined, it is exactly equivalent to shifting the E1
left or right by E2 bits, provided that E1 is written using the above
canonical bit representation.

IOW, I believe the definition of >> in terms of arithmetic defines the
bit-layout, and I have no problem with this disagreeing with the byte
layout. After all, byte layout is only of concern to the memory, while
bit layout is only of concern to the processor.

Phil
 
P

Philip Potter

Eric said:
Tomás Ó hÉilidhe said:
dspfun:
C doesn't specify the bit-layout in memory of the different types, so
how are the operations ( & | ~ ^ << >>) defined to modify the objects
(which does not have its bit-layout defined) ?

C *does* in fact specify the bit-layout of integer types. It doesn't do
so in plain English, but it is inferred from other statements (such as
the fact that shifting to the left is the same as multiplying by powers
of two). [...]

No. I've used this "thought experiment" before, but maybe
it's time to trot it out again: Tomás, I have built a computer
that uses four-state components instead of the more traditional
two-state parts. Internally, therefore, numbers in my computer
are represented in base 4. Questions:

1) Since the "4's bit" and the "8's bit" both inhabit the
"4's quit," which is to the left of the other?

2) Propose a C program to demonstrate that I am lying
about using four-state parts, and am in fact using plain old
two-state components like everybody else.

You are quite correct, though to be a valid C implementation your
quit-based machine would have to emulate the bitwise operators in the
same way as if they operated on a bit-based machine.

My view (already expressed elsethread in response to Walter) is that you
can think of C as specifying a bit-layout in its abstract machine, and
although implmementations are free to use any techniques that produce
the same result, your mental model will still be appropriate and produce
correct reasoning.

Perhaps I am too strong to state that C does define a bit-layout;
rather, that it allows one to assume that integers are in a particular
bit-layout in order to reason about programs.

(The fact that << and >> do not always have defined behaviour vexes my
argument somewhat, though. When they are defined, it's perfectly fine to
reason about them in terms of bit-shifting within a standard bit-layout
rather than in terms of arithmetic; but you still need to remember when
their behaviour is defined, which is defined in terms of arithmetic
properties. I'm going to have to go away and think about this one.)

Phil
 
T

Tomás Ó hÉilidhe

Walter Roberson:
Those operations are -defined- by their result on *values*, not by
bit representation. If I recall correctly, ~ is the only operation
defined in terms of bits rather than in terms of values.

Because the operations are defined in terms of values, it doesn't
matter to C what the bit ordering is in memory.



Indeed, you could store 111000111000 as follows in memory: 000111000111,
or you could store it as 101010101010, or maybe 010101010101. It doesn't
matter what order the actual physical capacitors are in in RAM, just so
long as the bitwise operators behave as though it were 111000111000.


And as for the ~ operator. Well it's behaviour on unsigned int is:

~val == UINT_MAX ^ val

For example, it
does not say whether in a two-octet integeral type whether the
internal order is AAAAAAAABBBBBBBB or BBBBBBBBAAAAAAAA or
perhaps something else completely.


You're right it doesn't. But that doesn't matter because we know how all
the arithmetic operators will behave. (That's of course assuming we're
not using union tricks or anything of the like to play around with bit
orders).

If C did in fact specify the bit-layout of integer types, then
there could not be both "big-endian" and "little-endian"
conformant C implementations: if the bit-layout was specified by C,
only one layout would be valid.


The actual physical bit layout is irrelevant. You could have 3
capacitors on one side of the room in the shape of a triangle, and 5 on
the other side in the shape of a pentagon. Doesn't matter what order the
bits are in, so long as the implementation behaves as though they were
lain out in canonical binary.
 
T

Tomás Ó hÉilidhe

Eric Sosman:

No. I've used this "thought experiment" before, but maybe
it's time to trot it out again: Tomás, I have built a computer
that uses four-state components instead of the more traditional
two-state parts.


This is quite simply implemented. In memory, instead of having one
capacitor to indicate one bit's value, you could have two's capacitors
to indicate four bits' value.

Internally, therefore, numbers in my computer
are represented in base 4. Questions:

1) Since the "4's bit" and the "8's bit" both inhabit the
"4's quit," which is to the left of the other?


Not to sound caustic, but that's irrelevant. The capacitors could be
lightyears apart, some of them could make out the shape of a poodle
while others could make out the shape of a salmon.

2) Propose a C program to demonstrate that I am lying
about using four-state parts, and am in fact using plain old
two-state components like everybody else.



In C we have bitwise operators. We don't have bi-bitwise operators.

In order for your machine to be able to do bitwise stuff, it must have
some sort of emulation.
 
C

CBFalconer

Robert said:
dspfun said:
These operators yield values that depend on the internal
representations of [unsigned] integers

Question is: Is --for instance-- (x << 2) == (x * 2) guaranteed?
I thought so.

It is guaranteed to be false. However, barring arithmetic
overflow, ((x << 2) == (x * 4)) is guaranteed for unsigned x.
 
E

Eric Sosman

Tomás Ó hÉilidhe said:
Eric Sosman:

No. I've used this "thought experiment" before, but maybe
it's time to trot it out again: Tomás, I have built a computer
that uses four-state components instead of the more traditional
two-state parts.


This is quite simply implemented. In memory, instead of having one
capacitor to indicate one bit's value, you could have two's capacitors
to indicate four bits' value.

Internally, therefore, numbers in my computer
are represented in base 4. Questions:

1) Since the "4's bit" and the "8's bit" both inhabit the
"4's quit," which is to the left of the other?

Not to sound caustic, but that's irrelevant. [...]

Yes -- but aren't you the person who claimed that

? It's hard to see how "C specifies layout" and "that's
irrelevant" can be reconciled. (I'm in the "that's irrelevant"
camp, but I read your claim as meaning that you are not.)
In C we have bitwise operators. We don't have bi-bitwise operators.

In order for your machine to be able to do bitwise stuff, it must have
some sort of emulation.

No more "emulation" than an ordinary machine uses when it
operates on values. The circuitry of my quaternary computer
calculates `8 ^ 31' and gets the result 23, just like your
binary computer does. Mine gets 128 from `1 << 7', just like
yours. And so on. For all operators on all (valid) operands,
my quaternary machine delivers the same result as your binary
machine. So, what C program can demonstrate a difference?
And if (as I maintain) there is no such program, it follows
that no program can tell whether the 4's bit is to the left
or to the right of the 8's bit -- because if it could, you
could use it to show that my machine is not quaternary.

(Side-note: If the notion of encoding several bits at a
time in a multi-state device sounds fanciful, take a look at
the on-the-wire signal from a 56kb/s modem. Or look at the
output from a compression program that sucks in a megabit and
spits out a kilobyte: How many input bits are encoded in the
final output bit?)
 
T

Tomás Ó hÉilidhe

Eric Sosman:
No more "emulation" than an ordinary machine uses when it
operates on values. The circuitry of my quaternary computer
calculates `8 ^ 31' and gets the result 23, just like your
binary computer does.

Mine gets 128 from `1 << 7', just like
yours. And so on. For all operators on all (valid) operands,
my quaternary machine delivers the same result as your binary
machine. So, what C program can demonstrate a difference?
And if (as I maintain) there is no such program, it follows
that no program can tell whether the 4's bit is to the left
or to the right of the 8's bit -- because if it could, you
could use it to show that my machine is not quaternary.


Bitwise operations will be indentical on every system, and therefore we
can think of it as being canonical binary. Thus, bitwise operation is
reliable and predictable on every compliant implementation of the
Standard.

I think the OP's question was why do we bother using bitwise operation
if we can't rely on its results. Wel we *can* rely on the results,
because we can always just say that it uses canonical binary (even if
it's doing something funky under the hood).

(Side-note: If the notion of encoding several bits at a
time in a multi-state device sounds fanciful, take a look at
the on-the-wire signal from a 56kb/s modem.


I had a Telecoms exam just there on Thursday :-D The maximal amount
of different voltage levels you can use on a line is given by:

( 1 + (signal to noise ratio) ) ^ .5

Nowadays tho things are fancier than that, we use stuff like 16QAM
whereby each transmitted symbol represents 4 bits (16 different values
because of 4 voltage levels and 4 different phases).

While it certainly makes sense to transmit data via symbols which
represent more than one bit, I can't think of any merit in actually
having a processor that plays around with multi-bit symbols. (In fact I
can't think of a way of achieving such a processor, because capacitors
have always and will always be either charged or discharged, 1 or 0).
 
W

Willem

Tomás wrote:
) While it certainly makes sense to transmit data via symbols which
) represent more than one bit, I can't think of any merit in actually
) having a processor that plays around with multi-bit symbols. (In fact I
) can't think of a way of achieving such a processor, because capacitors
) have always and will always be either charged or discharged, 1 or 0).

It can be charged in both directions, so that's 1, 0 or -1.

I think the most compelling argument for using 2-state logic is
that using more states greatly complicates things.


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
 
C

christian.bau

Unsigned integers use a binary representation with n value bits, that
is there is one bit with a value of 1, one bit with a value of 2, then
4, 8, 16 and so on. It is not specified how these bits are stored in
memory. The bit operators & | ^ ~ are performed on these bits. For any
two C implementations where for example unsigned int uses the same
number of value bits, the results of the bitwise operators will be
identical.

Signed integers use a binary representation with n value bits, plus
one sign bit. It is implementation defined how the sign bit modifies
the value. There are three methods allowed by the C Standard. If the
value bits have a value of X, and the sign bit is zero, then the value
of the signed integer is X. If the sign bit is 1, then the value is
one of X - (2*INT_MAX + 2) (two's complement), X - (2 * INT_MAX + 1)
(one's complement) or -X (signed magnitude), the most common method
today is 2's complement. The bitwise operation is performed on all the
value bits and on the sign bit, and the result is then again
interpreted as a signed integer.

For signed integers, there will be different results depending on
whether the machine uses 2's complement, 1's complement or signed
magnitude representation. On 1's complement and signed magnitude
implementations, a simple bitwise and may even produce undefined
behavior. On 2's complement machines, this cannot happen.

So the bitwise operators all take the values, split them either into
value bits or into value bits and sign bit, perform the bit operation,
and combine the value bits and sign bits back into a value. This
operation is independent on the memory layout, it just depends on the
value, and also on the method that the implementation uses for
negative signed integers.
 
C

CBFalconer

Tomás Ó hÉilidhe said:
.... snip ...

While it certainly makes sense to transmit data via symbols which
represent more than one bit, I can't think of any merit in
actually having a processor that plays around with multi-bit
symbols. (In fact I can't think of a way of achieving such a
processor, because capacitors have always and will always be
either charged or discharged, 1 or 0).

Just as a simple example, consider capacitors that are charged to
0, 1, 2, or 3 volts. Then all you have to do is assign the value
of a bit pair to each such voltage.

The point of using binary hardware is that the circuitry becomes
simple and easily maintainable.
 
P

Philip Potter

CBFalconer said:
Robert said:
dspfun said:
These operators yield values that depend on the internal
representations of [unsigned] integers
Question is: Is --for instance-- (x << 2) == (x * 2) guaranteed?
I thought so.

It is guaranteed to be false.

Not so. x == -1 produces UB in the LH operand, so it could be true.
Slightly more plausibly, if x == 0 then the expression is guaranteed to
be true.
However, barring arithmetic
overflow, ((x << 2) == (x * 4)) is guaranteed for unsigned x.
Again, not so. For x == -1, no overflow occurs but UB does occur so no
guarantees exist.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Similar Threads


Members online

No members online now.

Forum statistics

Threads
473,769
Messages
2,569,581
Members
45,056
Latest member
GlycogenSupporthealth

Latest Threads

Top