bits and stuff

Joe Laughlin · Jun 12, 2004

#include <stdio.h>

int main()
{
unsigned long num;
unsigned int a = 10, b = 20, c = 30, d = 40;
/* num = 0; */

num |= a << 24;
num |= b << 16;
num |= c << 8;
num |= d << 0;

printf("a = %02.2x\nb = %02.2x\nc = %02.2x\nd = %02.2x\n", a, b, c, d);
printf("num = %08.8x\n", num);

return 0;
}

What I'm trying to do here is pack a, b, c, and d into num.

It works if I set num = 0, but why doesn't it work if I leave that out
(as above)? I know that when num is declared, its memory is full of
garbage, but I thought that packing the ints into it would have
overwritten all the garbage. (Hope that makes sense.)

Thanks,
Joe

Artie Gold · Jun 12, 2004

Joe said:
#include <stdio.h>

int main()
{
unsigned long num;
unsigned int a = 10, b = 20, c = 30, d = 40;
/* num = 0; */

num |= a << 24;

This is the same as writing:
num = num | a << 24;

i.e. you're doing a bitwise `or' of whatever was in `num' and `a'
left-shifted 24 bits. See what the problem is?

num |= b << 16;
num |= c << 8;
num |= d << 0;

printf("a = %02.2x\nb = %02.2x\nc = %02.2x\nd = %02.2x\n", a, b, c, d);
printf("num = %08.8x\n", num);

return 0;
}

What I'm trying to do here is pack a, b, c, and d into num.

It works if I set num = 0, but why doesn't it work if I leave that out
(as above)? I know that when num is declared, its memory is full of
garbage, but I thought that packing the ints into it would have
overwritten all the garbage. (Hope that makes sense.)

See above.

HTH,
--ag
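Artie's point (`num |= x` is `num = num | x`) can be seen with a small sketch. Reading a genuinely uninitialized variable is not something a test can rely on, so this assumes a hypothetical leftover value, 0xF0F0F0F0, standing in for the garbage. The helper name `pack_with_or` is made up for illustration, and the `(unsigned long)` casts keep the shifts well defined even where unsigned int has only 16 bits.

```c
/* Hypothetical helper: packs four 8-bit values into *num using |=,
 * the same way as the original post. Any 1 bit already present in
 * *num survives, because x | y never clears a bit that is set in x. */
static void pack_with_or(unsigned long *num, unsigned int a, unsigned int b,
                         unsigned int c, unsigned int d)
{
    *num |= (unsigned long)a << 24;
    *num |= (unsigned long)b << 16;
    *num |= (unsigned long)c << 8;
    *num |= (unsigned long)d;
}
```

Starting from 0 with 10, 20, 30, 40 gives 0x0A141E28; starting from the stand-in value 0xF0F0F0F0 gives 0xFAF4FEF8, because OR can set bits but never clear them.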

Mark McIntyre · Jun 12, 2004

#include <stdio.h>

int main()
{
unsigned long num;

this is an uninitialised variable. It contains garbage.

num |= a << 24;

here you OR the bits of a with garbage.
GIGO.

What I'm trying to do here is pack a, b, c, and d into num.

This is not guaranteed to work at all - you're assuming that
sizeof(long)==4 which need not be true.

I thought that packing the ints into it would have
overwritten all the garbage. (Hope that makes sense.)

You're ORing it with the garbage.

Jack Klein · Jun 12, 2004

this is an uninitialised variable. It contains garbage.

here you OR the bits of a with garbage.
GIGO.

This is not guaranteed to work at all - you're assuming that
sizeof(long)==4 which need not be true.

No he's not. He's assuming that unsigned long contains at least 32
value bits, and that is guaranteed by the standard.

On the compiler I used at work today, unsigned long has exactly 32
value bits. And sizeof(unsigned long) is 2, not 4.

You're ORing it with the garbage.

Well, that's true.
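Jack's distinction between value bits and sizeof can be checked on a given implementation. A sketch, with made-up helper names `ulong_value_bits` and `ulong_storage_bits`: for an unsigned type with N value bits the maximum value is 2^N - 1, so shifting ULONG_MAX right until it reaches zero counts N, while sizeof times CHAR_BIT gives the storage size, which may be larger if the type has padding bits.

```c
#include <limits.h>
#include <stddef.h>

/* Counts the value bits of unsigned long. ULONG_MAX is 2^N - 1 for
 * N value bits, so the loop body runs exactly N times. */
static int ulong_value_bits(void)
{
    unsigned long max = ULONG_MAX;
    int bits = 0;

    while (max != 0) {
        max >>= 1;
        bits++;
    }
    return bits;
}

/* Storage size of unsigned long in bits; this can exceed the value
 * bits if the implementation uses padding bits. */
static size_t ulong_storage_bits(void)
{
    return sizeof(unsigned long) * CHAR_BIT;
}
```

The standard guarantees the first figure is at least 32. On an implementation like the one Jack describes, sizeof(unsigned long) is 2, which implies CHAR_BIT is at least 16.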

Jack Klein · Jun 12, 2004

On Fri, 11 Jun 2004 23:01:49 GMT, "Joe Laughlin" wrote:

In addition to what others have said, you're risking undefined
behavior in ways they haven't pointed out.

#include <stdio.h>

int main()
{
unsigned long num;
unsigned int a = 10, b = 20, c = 30, d = 40;
/* num = 0; */

num |= a << 24;

The C standard requires that signed and unsigned ints be able to
represent a range of values that requires them to have at least 16
bits. There are indeed implementations where an unsigned int has 16
bits and no more, although this is no longer common in popular desktop
systems. On such a system, shifting an unsigned int left by 24,
or by 16 as below, generates undefined behavior.

This should be written as:

num = (unsigned long)a << 24;

Only the first one should use "=" rather than "|=".

num |= b << 16;

This one also needs the cast to (unsigned long). The other two do
not.
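Putting Jack's two corrections together (assign with "=" first, and cast before the large shifts) gives something like the following. A sketch only: `pack4` is a hypothetical name, and it assumes each input already fits in 8 bits.

```c
/* Builds a 32-bit value from four 8-bit fields (inputs assumed <= 0xFF).
 * a and b are converted to unsigned long *before* shifting, so the
 * shift happens in a type with at least 32 value bits; with a 16-bit
 * unsigned int, a << 24 would be undefined. The first statement uses
 * = so nothing left over in num can survive. */
static unsigned long pack4(unsigned int a, unsigned int b,
                           unsigned int c, unsigned int d)
{
    unsigned long num;

    num  = (unsigned long)a << 24;
    num |= (unsigned long)b << 16;
    num |= c << 8;   /* unsigned int has at least 16 bits, so a shift by 8 is defined */
    num |= d;
    return num;
}
```

For printing, `printf("num = %08lx\n", pack4(10, 20, 30, 40));` prints `num = 0a141e28`. Note the `l` length modifier: the original printf passed an unsigned long to `%x`, which expects an unsigned int.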

CBFalconer · Jun 12, 2004

Mark said:
this is an uninitialised variable. It contains garbage.

here you OR the bits of a with garbage.
GIGO.

This is not guaranteed to work at all - you're assuming that
sizeof(long)==4 which need not be true.

You're ORing it with the garbage.

In addition, the "a << 24" expression above is an unsigned int
expression. It is only converted to unsigned long for the OR into num,
which is too late. If unsigned int happens to be 16 bits you have
undefined behaviour.
Thus that should be written as:

((unsigned long)a << 24)

and similarly for the b value. Next you should worry about
CHAR_BIT being larger than 8, and a,b,c,d potentially containing
values larger than 255.
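One way to handle the last concern, assuming that silently keeping only the low 8 bits of each input is acceptable (rejecting out-of-range input is the other obvious choice), is to mask before shifting. `pack4_masked` is a hypothetical name used only for this sketch.

```c
/* Masks each input to its low 8 bits before shifting, so an
 * out-of-range value cannot spill into a neighbouring field.
 * The fields are defined by the shift counts and the 0xFF mask,
 * not by the width of a char, so CHAR_BIT > 8 does not change
 * the result. */
static unsigned long pack4_masked(unsigned int a, unsigned int b,
                                  unsigned int c, unsigned int d)
{
    return ((unsigned long)(a & 0xFFu) << 24)
         | ((unsigned long)(b & 0xFFu) << 16)
         | ((unsigned long)(c & 0xFFu) << 8)
         |  (unsigned long)(d & 0xFFu);
}
```

With this version, an input of 0x1FF in the top field contributes 0xFF000000 rather than corrupting the value.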

Stephen L. · Jun 12, 2004

Joe said:
#include <stdio.h>

int main()
{
unsigned long num;
unsigned int a = 10, b = 20, c = 30, d = 40;
/* num = 0; */

num |= a << 24;
num |= b << 16;
num |= c << 8;
num |= d << 0;

printf("a = %02.2x\nb = %02.2x\nc = %02.2x\nd = %02.2x\n", a, b, c, d);
printf("num = %08.8x\n", num);

return 0;
}

What I'm trying to do here is pack a, b, c, and d into num.

It works if I set num = 0, but how come it's not working if I leave it out
(like above)? I know that when num is declared, its memory contents is full
of garbage, but I thought that by packing the ints into it would've
overwritten all the garbage? (hope that makes sense)

Thanks,
Joe

Other posters pointed out the issues with
the above.

Have you thought about using a union (let the
compiler do the work for you)?

#include <stdio.h>

int
main()
{
union {
unsigned long n;
unsigned char piece[ sizeof (unsigned long) ];
} num;
unsigned int a = 10, b = 20, c = 30, d = 40;

num.piece[ 0 ] = a;
num.piece[ 1 ] = b;
num.piece[ 2 ] = c;
num.piece[ 3 ] = d;

printf("a = %2.2x\nb = %2.2x\nc = %2.2x\nd = %2.2x\n", a, b, c, d);
printf("num = %8.8lx\n", num.n);

return (0);
}

Of course, if your unsigned ints overflow what will fit
in an unsigned char, you'll get unknown results...

HTH,

Stephen

Chris Torek · Jun 12, 2004

Joe Laughlin wrote:
[code that uses left-shift and bitwise-OR to construct a 32-bit value
from four eight-bit values, with a slight flaw]

Other posters pointed out the issues with
the above.

Have you thought about using a union (let the
compiler do the work for you)?

This method has advantages and disadvantages. Often the
disadvantages outweigh the advantages:

union {
unsigned long n;
unsigned char piece[ sizeof (unsigned long) ];
} num;

num.piece[ 0 ] = a;

[and so on]

Of course, if your unsigned ints overflow what will fit
in an unsigned char, you'll get unknown results...

The disadvantage of using the union trick -- or, equivalently,
using an "unsigned char *" to point to the individual C-bytes that
make up an "unsigned long" -- is that you expose yourself to the
implementation's representation. In particular, on common
implementations today, you now have to worry about:

- sizeof(unsigned long) changing from 4 to 8
- endianness

The advantage of using the union trick is the same as the disadvantage:
you expose yourself to the implementation's representation. If
that is what you *want* to do, go ahead and do it. On the other
hand, if you just want to compose a predictable 32-bit value from
four eight-bit values, the shift-and-bitwise-OR method will always
work. The common concerns above (sizeof(unsigned long) and
endianness) become entirely irrelevant.

CBFalconer · Jun 12, 2004

Stephen L. said:
.... snip ...

Have you thought about using a union (let the
compiler do the work for you)?

int
main()
{
union {
unsigned long n;
unsigned char piece[ sizeof (unsigned long) ];
} num;
unsigned int a = 10, b = 20, c = 30, d = 40;

That approach is inherently unsafe, both for misuse of the union,
and for dependence on byte sex.

Michael Mair · Jun 13, 2004

Hiho,

[union-for-bytewise-access]

That approach is inherently unsafe, both for misuse of the union,
and for dependence on byte sex.

Mmmm, I know about the former but could you or someone else please
expand your answer concerning the latter? I am not even sure what
you mean by byte sex...
I quickly scanned the question list of the FAQ and did not find
one dealing with this. For accessing double bits/bytes/words/whatever,
I would rather look at the memory representation with the
help of pointers but I am not sure whether this is the only, let
alone the best way.

Cheers,
Michael

Mark McIntyre · Jun 14, 2004

No he's not. He's assuming that unsigned long contains at least 32
value bits,

And that each of his unsigned ints has at most 8 relevant value bits. Is
that certain, on all implementations?

and that is guaranteed by the standard.

You're right in that.

On the compiler I used at work today, unsigned long has exactly 32
value bits. And sizeof(unsigned long) is 2, not 4.

I'd be interested to know if it has a conforming hosted implementation, though.

Mark McIntyre · Jun 14, 2004

Have you thought about using a union (let the
compiler do the work for you)?

snip example of packing a union and then unpacking it differently.

Of course, if your unsigned ints overflow what will fit
in an unsigned char, you'll get unknown results...

But reading from a union by accessing a member other than that which you
wrote last is UB anyway. It's a common extension to place meaning on the
behaviour of course, but you can't rely on it.

Mark McIntyre · Jun 14, 2004

Hiho,

[union-for-bytewise-access]

That approach is inherently unsafe, both for misuse of the union,
and for dependence on byte sex.


Mmmm, I know about the former but could you or someone else please
expand your answer concerning the latter? I am not even sure what
you mean by byte sex...

I think CBF means Endianness. See Chris Torek's post.

those who know me have no need of my name · Jun 14, 2004

in comp.lang.c i read:

[union-for-bytewise-access]

That approach is inherently unsafe, both for misuse of the union,
and for dependence on byte sex.


Mmmm, I know about the former but could you or someone else please
expand your answer concerning the latter? I am not even sure what
you mean by byte sex...

which end of the array of bytes is the least significant, i.e., is the
least significant byte [0] or [sizeof object - 1]? and those are not the
only possibilities: if sizeof object is greater than 2, [1] is another,
though it's not common to see these days.

#include <string.h> /* for memcpy */

signed long value = 1;
unsigned char bytes[sizeof value];
memcpy(bytes, &value, sizeof value);
/* is bytes[0], bytes[1] or bytes[sizeof bytes - 1], or some other, the 1? */

today one tends to see 01 00 00 00 (little endian) or 00 00 00 01 (big
endian), which correspond to bytes[0] or bytes[sizeof bytes - 1] having the 1.

and that ignores padding, which is, again, unlikely these days but it is
allowed, and for all anyone knows it will reappear or your program will
have to work on a dinosaur. if there is padding then it may be that none
of the bytes in my example will be a 1, or there may be more than one with
a non-zero value.

oh, and what if sizeof value is 1? are you thinking `how can such a thing
be'? in c it is possible if CHAR_BIT is 32 or larger, as seen on
today's dsps. in that case you aren't accessing octets, which is often
what people want to do with the sort of tricks discussed; rather you are
accessing the one 32-bit byte, hence [0] is all there is, but how you
serialize its octets remains an issue.

all this makes working with internal representations a difficult and
tedious, though not insurmountable, task.
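The memcpy experiment in this post can be wrapped up as a function that reports which byte holds the 1. A sketch only: the name `index_of_one_byte` is made up, and as the post says, padding bits or an unusual byte order can make the answer something other than the two common cases.

```c
#include <string.h>

/* Copies the object representation of (signed long)1 into a byte
 * array and returns the index of the first nonzero byte, or -1 if
 * none is found. Commonly 0 on little-endian hosts and
 * sizeof(long) - 1 on big-endian ones; C also permits other byte
 * orders and padding bits, so callers should not assume only those
 * two answers. */
static int index_of_one_byte(void)
{
    signed long value = 1;
    unsigned char bytes[sizeof value];
    size_t i;

    memcpy(bytes, &value, sizeof value);
    for (i = 0; i < sizeof bytes; i++) {
        if (bytes[i] != 0)
            return (int)i;
    }
    return -1;
}
```

Using memcpy into an unsigned char array inspects the representation without reading a union member other than the one last written.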

CBFalconer · Jun 14, 2004

Michael said:
[union-for-bytewise-access]

That approach is inherently unsafe, both for misuse of the union,
and for dependence on byte sex.


Mmmm, I know about the former but could you or someone else please
expand your answer concerning the latter? I am not even sure what
you mean by byte sex...

Also known as endianness. The order of octets within the
representation of an integer, or other item. I know of at least 3
fairly popular versions for 32 bit integers. The shift, mask, and
add method is independent of this.
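A sketch of the shift-and-mask method applied to serialization: the low 32 bits of a value are written as four octets, most significant first, and read back. Big-endian order is an arbitrary choice here (any fixed order works), and `store_be32` and `load_be32` are hypothetical names. Because the code never looks at the value's representation in memory, the buffer contents are the same on any host.

```c
/* Writes the low 32 bits of v into out[0..3], most significant octet
 * first. Only shifts and masks on the value are used, so the octet
 * order is fixed by this code, not by the host's byte order. Each
 * out[i] holds one octet even if CHAR_BIT is larger than 8. */
static void store_be32(unsigned char out[4], unsigned long v)
{
    out[0] = (unsigned char)((v >> 24) & 0xFFu);
    out[1] = (unsigned char)((v >> 16) & 0xFFu);
    out[2] = (unsigned char)((v >> 8) & 0xFFu);
    out[3] = (unsigned char)(v & 0xFFu);
}

/* Reassembles the value written by store_be32. */
static unsigned long load_be32(const unsigned char in[4])
{
    return ((unsigned long)(in[0] & 0xFFu) << 24)
         | ((unsigned long)(in[1] & 0xFFu) << 16)
         | ((unsigned long)(in[2] & 0xFFu) << 8)
         |  (unsigned long)(in[3] & 0xFFu);
}
```

Since the four fields never overlap, combining them with + instead of | gives the same result.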

Michael Mair · Jun 14, 2004

Thanks for enlightening me

Old Wolf · Jun 16, 2004

Mark McIntyre said:
But reading from a union by accessing a member other than that which you
wrote last is UB anyway. Its a common extension to place meaning on the
behaviour of course, but you can't rely on it.

FWIW, doesn't this make unions worse than useless in portable code?
If there were two (or more) variables that never needed to maintain
their values concurrently, I would expect a compiler to assign them
the same bit of memory anyway.

Richard Bos · Jun 16, 2004

FWIW, doesn't this make unions worse than useless in portable code?
If there were two (or more) variables that never needed to maintain
their values concurrently, I would expect a compiler to assign them
the same bit of memory anyway.

It cannot always tell. Besides, which is the more useful declaration:

int foodvalue(union animal animal);

or

int foodvalue(struct common_animal animal,
struct fish fish, struct bird bird, struct mammal mammal,
enum taxon which_taxon);

I'd prefer the former.

Richard

Mark McIntyre · Jun 16, 2004

FWIW, doesn't this make unions worse than useless in portable code?

Not at all. Take a look at how the original MS Excel toolkit passed cell
contents to and from functions:

struct cell /* tag name added for illustration */
{
char datatype;
union
{
int intdata;
long longdata;
float floatdata;
double doubledata;
// etc
} data; /* member name added for illustration */
};
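The struct above is a tagged union: `datatype` records which member is valid. A self-contained sketch of the same idea follows, with hypothetical names (`struct tagged_cell`, `enum cell_type`, `cell_format`) rather than the real Excel toolkit declarations. Reading only the member the tag says was last stored sidesteps the question of reading a different member than the one written.

```c
#include <stdio.h>

/* Hypothetical tag values: which union member is currently valid. */
enum cell_type { CELL_INT, CELL_LONG, CELL_DOUBLE };

struct tagged_cell {
    enum cell_type type;
    union {
        int intdata;
        long longdata;
        double doubledata;
    } data;
};

/* Formats the member selected by the tag into buf and returns
 * snprintf's result (the length of the full output), or -1 for an
 * unknown tag. Only the member named by c->type is ever read. */
static int cell_format(const struct tagged_cell *c, char *buf, size_t size)
{
    switch (c->type) {
    case CELL_INT:
        return snprintf(buf, size, "%d", c->data.intdata);
    case CELL_LONG:
        return snprintf(buf, size, "%ld", c->data.longdata);
    case CELL_DOUBLE:
        return snprintf(buf, size, "%g", c->data.doubledata);
    }
    return -1;
}
```

The writer sets `type` and the matching member together; every reader switches on `type` first.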
