Bit Padding and other questions

vippstar

Hey comp.lang.c

I'm somewhat confused with bit padding.
I tried searching the FAQ, but there isn't a search feature, so I used
google and the search query: site:c-faq.com padding.
I did not find anything relevant to bit padding, only byte padding for
structs, which more or less I understand.

All sections are from n1256

6.2.6.1 p4 says all objects except bit-fields consist of n * CHAR_BIT
bits, where n is sizeof object.
So if an int has padding bits, they are multiples of CHAR_BIT? Is
there a way to find which bits are the padding bits?
Padding bits can only be at the start/end of the object
representation, or anywhere? What happens if I access those padding
bits, by casting the object to (unsigned char *)? What happens if I
change the value of these padding bits?

6.2.6.1 p5 says the object representation does not have to represent
the value of the object type. I think I understand this, for example
with offset:segment pointers, where they could be mapped to the same
area, but have different values? If I understand correctly, == takes
care of this.
In short, == compares values and memcmp() compares object
representations, right?
However, 6.2.6.2 note 43 says that x == y does not necessarily imply
that x and y have the same value.
I don't understand this; in my opinion it contradicts the previous
paragraphs, does it not?

Why would an implementation want padding bits in any object type? Why
cannot these bits be used to increase the value range?
Why does the standard explicitly mention that the byte that getchar() &
family functions read is returned cast to (unsigned char)? I think
it has something to do with 'unsigned char' being the only type that
doesn't have padding bits, but I'm not sure how/why that matters.

Lastly, can the bitshift operators manipulate the padding bits?
For example: int i = x; i <<= y;

All replies much appreciated.
 
santosh

Hey comp.lang.c

I'm somewhat confused with bit padding.
I tried searching the FAQ, but there isn't a search feature,

What about:

so I used
google and the search query: site:c-faq.com padding.
I did not find anything relevant to bit padding,

That's because the CLC FAQ was written much before C99 which first
introduced padding bits.

<I'll leave it to the "real experts" to answer your further questions>
:)
 
Barry Schwarz

Hey comp.lang.c

I'm somewhat confused with bit padding.
I tried searching the FAQ, but there isn't a search feature, so I used
google and the search query: site:c-faq.com padding.
I did not find anything relevant to bit padding, only byte padding for
structs, which more or less I understand.

All sections are from n1256

6.2.6.1 p4 says all objects except bit-fields consist of n * CHAR_BIT
bits, where n is sizeof object.
So if an int has padding bits, they are multiples of CHAR_BIT? Is

No, there is no requirement that any padding bits completely fill a
byte. See 6.2.6.2.
there a way to find which bits are the padding bits?

Possibly but why would you care? One method would be to use an array
of unsigned char of the appropriate size. You can generate each
possible value of the aggregate (manipulating the elements one at a
time), copy (memcpy) the array to an object of the type in question,
and evaluate the object. If the value is the same as any previous
value, then the two different bit patterns represent the same value.
Any bit differences between the two representations are probably only
in padding bits. Beware that some of the bit patterns this technique
generates could be trap representations, and attempting to evaluate
such a pattern invokes undefined behavior.

If a 32 bit integer has 1 padding bit, you would need to loop through
4+ billion array values and save 2+ billion "previous values". So the
real question is not can you but why would you want to.
Padding bits can only be at the start/end of the object
representation, or anywhere? What happens if I access those padding

No restriction.
bits, by casting the object to (unsigned char *)? What happens if I

I assume you meant cast the address of the object to unsigned char*.
By definition, unsigned char has no padding bits and no trap
representations. Since you are not evaluating the object but only
accessing its bytes, there is no problem.
change the value of these padding bits?

If you evaluate the original object after changing any of its bits
using the unsigned char*, there are three possibilities:

If the new bit pattern is a trap representation, you invoke undefined
behavior.

Else if the change is only in padding bits, the value of the
object is unchanged.

Else the object has a new value.
6.2.6.1 p5 says the object representation does not have to represent
the value of the object type. I think I understand this, for example

No, it says the representation does not have to represent *a* value.
If the object has a valid value, then the representation must
represent that value.
with offset:segment pointers, where they could be mapped to the same

I don't think this type of difference is what the section is
discussing.
area, but have different values? If I understand correctly, == takes
care of this.

If they compare equal, they have the same value, just different
representations.
In short, == compares values and memcmp() compares object
representations, right?

Seems right to me.
However, 6.2.6.2 note 43 says that x == y does not necessarily imply
that x and y have the same value.

That is not what it says. It says if x == y when they are evaluated
as type T1, they need not compare equal when evaluated as type T2.
This is a direct result of == ignoring padding bits but types such as
unsigned char not ignoring them (in fact, not even having them).

Consider two 16 bit int objects where the low order bit is padding
(ignore the fact that this violates the minimum size of an int). If
one contains 0x0000 and the other contains 0x0001, both represent the
value 0 and will compare equal as int objects. If you compare the low
order byte of each as unsigned char, one has the value 0 and the other
has the value 1 and they will not compare equal.
I don't understand this; in my opinion it contradicts the previous
paragraphs, does it not?

The only contradiction is your misunderstanding.
Why would an implementation want padding bits in any object type? Why

The example in the standard is for parity. In any event, if the
hardware arithmetic circuitry treats certain bits as non-participants,
then padding bits make the compiler writer's job easier.
cannot these bits be used to increase the value range?

If the circuitry ignores the bits, the compiler would have to generate
a lot of additional code. The resulting bloat in memory and CPU usage
would probably be unpopular.
Why does the standard explicitly mention that the byte that getchar() &
family functions read is returned cast to (unsigned char)? I think
it has something to do with 'unsigned char' being the only type that
doesn't have padding bits, but I'm not sure how/why that matters.

getchar does not return a byte. It returns an int. By forcing the
char that is obtained from the stream to be treated (but not necessarily
via a cast operation) as unsigned, the int will always be non-negative
regardless of whether char is signed or unsigned, so it can never be
mistaken for EOF, which is negative. This significantly increases code
portability.
Lastly, can the bitshift operators manipulate the padding bits?
For example: int i = x; i <<= y;

The shift operators work on values, not representations. The only
guarantee is that a legal shift operation will not result in a trap
representation. Whether any padding bits are changed or unchanged is
unspecified.


Remove del for email
 
lawrence.jones

6.2.6.1 p4 says all objects except bit-fields consist of n * CHAR_BIT
bits, where n is sizeof object.
So if an int has padding bits, they are multiples of CHAR_BIT?

By "they", do you mean the padding bits? If so, the answer is "no".
The int itself must contain an integral multiple of CHAR_BIT bits
(counting the sign bit, if any, the value bits, and the padding bits),
but the individual collections of bits can have any length.
Is there a way to find which bits are the padding bits?
No.

Padding bits can only be at the start/end of the object
representation, or anywhere?

Anywhere, but most often they are at the start, less often at the end.
What happens if I access those padding bits, by casting the object to
(unsigned char *)?

Then you get to see what particular unspecified values they happen to
have at the current time.
What happens if I change the value of these padding bits?

If you change them such that you create a trap representation in the
original int, then accessing its value causes undefined behavior.
Otherwise, the original int maintains its original value.
6.2.6.1 p5 says the object representation does not have to represent
the value of the object type. I think I understand this, for example
with offset:segment pointers, where they could be mapped to the same
area, but have different values?

Not quite -- that's two different object representations for the same
value. What 6.2.6.1p5 is talking about is that some object
representations don't represent *any* value. For example, some
floating-point representations don't assign meanings to all possible bit
patterns (e.g., if the exponent is zero, then the fraction also must be
zero; a non-zero fraction with a zero exponent is not a valid
floating-point value).
If I understand correctly, == takes care of this.

Not necessarily. If a valid program (i.e., no unspecified or undefined
behavior) can produce different representations for the same value, then
== is obliged to "normalize" the values so that they compare equal. If
the only way to produce an alternate representation is by dabbling in
undefined or unspecified behavior, then the compiler is not required to
normalize before comparing.
In short, == compares values and memcmp() compares object
representations, right?
Correct.

However, 6.2.6.2 note 43 says that x == y does not necessarily imply
that x and y have the same value.
I don't understand this; in my opinion it contradicts the previous
paragraphs, does it not?

It's a slippery slope. In particular, floating-point implementations
that support signed zeros are obliged to return true for -0.0 == +0.0,
but math library functions like signbit() treat them differently.
Why would an implementation want padding bits in any object type? Why
cannot these bits be used to increase the value range?

The hardware may not allow it. The canonical example is a machine (the
manufacturer of which escapes me at the moment) that did not have
integer types, only floating-point. Integers were simulated by having
the exponent field of the floating-point number set to zero. There are
also systems that don't support unsigned arithmetic where the unsigned
types are simulated using signed integers with a zero sign bit, making
the sign bit a padding bit for the unsigned types (and that's why
there's no requirement that UINT_MAX be greater than INT_MAX).
Why does the standard explicitly mention that the byte that getchar() &
family functions read is returned cast to (unsigned char)? I think
it has something to do with 'unsigned char' being the only type that
doesn't have padding bits, but I'm not sure how/why that matters.

It matters because you have to have access to *all* the bits in the
bytes in order to be able to read and write arbitrary objects as
collections of bytes.
Lastly, can the bitshift operators manipulate the padding bits?
For example: int i = x; i <<= y;

No. Operators work on the value, not the representation, so the padding
bits are invisible.

-- Larry Jones

Things are never quite as scary when you've got a best friend. -- Calvin
 
Ben Bacarisse

I'm somewhat confused with bit padding.
<snip>

You've had an excellent answer to all of the technical points so I'll
just add a bit to one small question:
Why would an implementation want padding bits in any object type? Why
cannot these bits be used to increase the value range?

When computers were not stamped on wafers, the two main reasons to have
padding bits were cost and speed.

Logic was both expensive and slow, so having more than you need was a
waste. The classic example was the 32-bit IBM 360 and 370 series.
32-bit ints are useful but since no one could afford (or even build)
32 bits' worth of memory (4 GB) there was no point in having address
logic to operate on more than, say, 24-bit addresses. At the same
time, the fastest way to get addresses out of memory was to request
a full 32-bit word so, as a result, all pointers had 8 padding bits.
These are addresses, but pointers are still object types in C.

Similar arguments led some supercomputers[1] to have short integer and
floating-point operations that used faster logic than the full-width
versions. Actually that much was, in fact, common. The key is that on
some systems, due to the design of the memory, getting the fastest
access times sometimes required that these values be stored in objects
that were larger than they needed to be -- i.e. they had padding.

[1] Anyone have a reference? This is "what people always say" but I
have never used such a system, nor seen any documentation for one.
 
vippstar

Hey comp.lang.c

I'm somewhat confused with bit padding.
I tried searching the FAQ, but there isn't a search feature, so I used
google and the search query: site:c-faq.com padding.
I did not find anything relevant to bit padding, only byte padding for
structs, which more or less I understand.
All replies much appreciated.

Thanks, very enlightening replies.
 
