bits and stuff

Joe Laughlin

#include <stdio.h>

int main()
{
    unsigned long num;
    unsigned int a = 10, b = 20, c = 30, d = 40;
    /* num = 0; */

    num |= a << 24;
    num |= b << 16;
    num |= c << 8;
    num |= d << 0;

    printf("a = %02.2x\nb = %02.2x\nc = %02.2x\nd = %02.2x\n", a, b, c, d);
    printf("num = %08.8x\n", num);

    return 0;
}

What I'm trying to do here is pack a, b, c, and d into num.

It works if I set num = 0, but why doesn't it work if I leave that out
(as above)? I know that when num is declared, its memory is full of
garbage, but I thought that packing the ints into it would have
overwritten all the garbage. (Hope that makes sense.)

Thanks,
Joe
 
Artie Gold

Joe said:
> #include <stdio.h>
>
> int main()
> {
>     unsigned long num;
>     unsigned int a = 10, b = 20, c = 30, d = 40;
>     /* num = 0; */
>
>     num |= a << 24;

This is the same as writing:
num = num | a << 24;

i.e. you're doing a bitwise `or' of whatever was in `num' and `a'
left-shifted 24 bits. See what the problem is?

>     num |= b << 16;
>     num |= c << 8;
>     num |= d << 0;
>
>     printf("a = %02.2x\nb = %02.2x\nc = %02.2x\nd = %02.2x\n", a, b, c, d);
>     printf("num = %08.8x\n", num);
>
>     return 0;
> }
>
> What I'm trying to do here is pack a, b, c, and d into num.
>
> It works if I set num = 0, but how come it's not working if I leave it out
> (like above)? I know that when num is declared, its memory contents is full
> of garbage, but I thought that by packing the ints into it would've
> overwritten all the garbage? (hope that makes sense)

See above.

HTH,
--ag
 
Mark McIntyre

> #include <stdio.h>
>
> int main()
> {
>     unsigned long num;

this is an uninitialised variable. It contains garbage.

>     num |= a << 24;

here you OR the bits of a with garbage.
GIGO.

> What I'm trying to do here is pack a, b, c, and d into num.

This is not guaranteed to work at all - you're assuming that
sizeof(long)==4 which need not be true.

> I thought that by packing the ints into it would've
> overwritten all the garbage? (hope that makes sense)

You're ORing it with the garbage.
 
Jack Klein

> this is an uninitialised variable. It contains garbage.

> here you OR the bits of a with garbage.
> GIGO.

> This is not guaranteed to work at all - you're assuming that
> sizeof(long)==4 which need not be true.

No he's not. He's assuming that unsigned long contains at least 32
value bits, and that is guaranteed by the standard.

On the compiler I used at work today, unsigned long has exactly 32
value bits. And sizeof(unsigned long) is 2, not 4.

> You're ORing it with the garbage.

Well, that's true.
 
Jack Klein

On Fri, 11 Jun 2004 23:01:49 GMT, "Joe Laughlin"

In addition to what others have said, you're risking undefined
behavior in ways they haven't pointed out.
> #include <stdio.h>
>
> int main()
> {
>     unsigned long num;
>     unsigned int a = 10, b = 20, c = 30, d = 40;
>     /* num = 0; */
>
>     num |= a << 24;

The C standard requires that signed and unsigned ints be able to
represent a range of values that requires them to have at least 16
bits. There are indeed implementations where an unsigned int has 16
bits and no more, although this is no longer common on popular
desktop systems. On such a system, shifting an unsigned int left by 24,
or by 16 as below, causes undefined behavior.

This should be written as:

num = (unsigned long)a << 24;

Only the first one should use "=" rather than "|=".

>     num |= b << 16;

This one also needs the cast to (unsigned long). The other two do
not.
 
CBFalconer

Mark said:
> this is an uninitialised variable. It contains garbage.

> here you OR the bits of a with garbage.
> GIGO.

> This is not guaranteed to work at all - you're assuming that
> sizeof(long)==4 which need not be true.

> You're ORing it with the garbage.

In addition, the "a << 24" expression above is an unsigned int
expression. It is only converted to unsigned long for the OR into num,
which is too late. If unsigned int happens to be 16 bits, you have
undefined behaviour. Thus that should be written as:

((unsigned long)a << 24)

and similarly for the b value. Next you should worry about
CHAR_BIT being larger than 8, and a, b, c, d potentially containing
values larger than 255.
 
Stephen L.

Joe said:
> [Joe's original code and question snipped]

Other posters pointed out the issues with
the above.

Have you thought about using a union (let the
compiler do the work for you)?

#include <stdio.h>

int
main()
{
    union {
        unsigned long n;
        unsigned char piece[ sizeof (unsigned long) ];
    } num;
    unsigned int a = 10, b = 20, c = 30, d = 40;

    num.piece[ 0 ] = a;
    num.piece[ 1 ] = b;
    num.piece[ 2 ] = c;
    num.piece[ 3 ] = d;

    printf("a = %2.2x\nb = %2.2x\nc = %2.2x\nd = %2.2x\n", a, b, c, d);
    printf("num = %8.8lx\n", num.n);

    return (0);
}

Of course, if your unsigned ints overflow what will fit
in an unsigned char, you'll get unknown results...


HTH,

Stephen
 
Chris Torek

> Joe Laughlin wrote:
> [code that uses left-shift and bitwise-OR to construct a 32-bit value
> from four eight-bit values, with a slight flaw]
>
> Other posters pointed out the issues with
> the above.
>
> Have you thought about using a union (let the
> compiler do the work for you)?

This method has advantages and disadvantages. Often the
disadvantages outweigh the advantages:

> union {
>     unsigned long n;
>     unsigned char piece[ sizeof (unsigned long) ];
> } num;
>
> num.piece[ 0 ] = a;
> [and so on]
> Of course, if your unsigned ints overflow what will fit
> in an unsigned char, you'll get unknown results...

The disadvantage of using the union trick -- or, equivalently,
using an "unsigned char *" to point to the individual C-bytes that
make up an "unsigned long" -- is that you expose yourself to the
implementation's representation. In particular, on common
implementations today, you now have to worry about:

- sizeof(unsigned long) changing from 4 to 8
- endianness

The advantage of using the union trick is the same as the disadvantage:
you expose yourself to the implementation's representation. If
that is what you *want* to do, go ahead and do it. On the other
hand, if you just want to compose a predictable 32-bit value from
four eight-bit values, the shift-and-bitwise-OR method will always
work. The common concerns above (sizeof(unsigned long) and
endianness) become entirely irrelevant.
 
CBFalconer

Stephen L. said:
.... snip ...

> Have you thought about using a union (let the
> compiler do the work for you)?
>
> int
> main()
> {
>     union {
>         unsigned long n;
>         unsigned char piece[ sizeof (unsigned long) ];
>     } num;
>     unsigned int a = 10, b = 20, c = 30, d = 40;

That approach is inherently unsafe, both for misuse of the union,
and for dependence on byte sex.
 
Michael Mair

Hiho,


> [union-for-bytewise-access]
> That approach is inherently unsafe, both for misuse of the union,
> and for dependence on byte sex.

Mmmm, I know about the former but could you or someone else please
expand your answer concerning the latter? I am not even sure what
you mean by byte sex...
I quickly scanned the question list of the FAQ and did not find
one dealing with this. For accessing double bits/bytes/words/whatever,
I would rather look at the memory representation with the
help of pointers, but I am not sure whether this is the only way,
let alone the best one.


Cheers,
Michael
 
Mark McIntyre

> No he's not. He's assuming that unsigned long contains at least 32
> value bits,

And that each of his unsigned ints has at most 8 relevant value bits. Is
that certain, on all implementations?

> and that is guaranteed by the standard.

You're right in that.

> On the compiler I used at work today, unsigned long has exactly 32
> value bits. And sizeof(unsigned long) is 2, not 4.

I'd be interested to know if it has a conforming hosted implementation,
though :)
 
Mark McIntyre

> Have you thought about using a union (let the
> compiler do the work for you)?

[snip example of packing a union and then unpacking it differently]

> Of course, if your unsigned ints overflow what will fit
> in an unsigned char, you'll get unknown results...

But reading from a union by accessing a member other than the one you
wrote last is UB anyway. It's a common extension to place meaning on the
behaviour, of course, but you can't rely on it.
 
Mark McIntyre

> Hiho,
>
> [union-for-bytewise-access]
> That approach is inherently unsafe, both for misuse of the union,
> and for dependence on byte sex.
>
> Mmmm, I know about the former but could you or someone else please
> expand your answer concerning the latter? I am not even sure what
> you mean by byte sex...

I think CBF means endianness. See Chris Torek's post.
 
those who know me have no need of my name

in comp.lang.c i read:
> [union-for-bytewise-access]
> That approach is inherently unsafe, both for misuse of the union,
> and for dependence on byte sex.
>
> Mmmm, I know about the former but could you or someone else please
> expand your answer concerning the latter? I am not even sure what
> you mean by byte sex...

which end of the array of bytes is the least significant, i.e., is the
least significant byte [0] or [sizeof object - 1]? and those are not the
only possibilities if sizeof object is greater than 2, where [1] is another,
though it's not common to see these days.

signed long value = 1;
unsigned char bytes[sizeof value];
memcpy(bytes, &value, sizeof value);
/* is bytes[0], bytes[1] or bytes[sizeof bytes - 1], or some other, the 1? */

today one tends to see 01 00 00 00 (little endian) or 00 00 00 01 (big
endian), which correspond to b[0] or b[sizeof b - 1] having the 1.

and that ignores padding, which is, again, unlikely these days, but it is
allowed, and for all anyone knows it will reappear or your program will
have to work on a dinosaur. if there is padding then it may be that none
of the bytes in my example will be a 1, or there may be more than one with
a non-zero value.

oh, and what if sizeof value is 1? are you thinking `how can such a thing
be'? in c it is possible if CHAR_BIT is 32 or larger, which is seen on
today's dsps. in that case you aren't accessing octets, which is often
what people want to do with the sort of tricks discussed; rather you are
accessing the one 32-bit byte, hence [0] is all there is, but how you
serialize its octets remains an issue.

all this makes working with internal representations a difficult and
tedious, though not insurmountable, task.
 
CBFalconer

Michael said:
> [union-for-bytewise-access]
> That approach is inherently unsafe, both for misuse of the union,
> and for dependence on byte sex.
>
> Mmmm, I know about the former but could you or someone else please
> expand your answer concerning the latter? I am not even sure what
> you mean by byte sex...

Also known as endianness. The order of octets within the
representation of an integer, or other item. I know of at least 3
fairly popular versions for 32 bit integers. The shift, mask, and
add method is independent of this.
 
Old Wolf

Mark McIntyre said:
> But reading from a union by accessing a member other than that which you
> wrote last is UB anyway. Its a common extension to place meaning on the
> behaviour of course, but you can't rely on it.

FWIW, doesn't this make unions worse than useless in portable code?
If there were two (or more) variables that never needed to maintain
their values concurrently, I would expect a compiler to assign them
the same bit of memory anyway.
 
Richard Bos

> FWIW, doesn't this make unions worse than useless in portable code?
> If there were two (or more) variables that never needed to maintain
> their values concurrently, I would expect a compiler to assign them
> the same bit of memory anyway.

It cannot always tell. Besides, which is the more useful declaration:

int foodvalue(union animal animal);

or

int foodvalue(struct common_animal animal,
struct fish fish, struct bird bird, struct mammal mammal,
enum taxon which_taxon);

I'd prefer the former.

Richard
 
Mark McIntyre

> FWIW, doesn't this make unions worse than useless in portable code?

Not at all. Take a look at how the original MS Excel toolkit passed cell
contents to and from functions:

struct cell   /* the tag "cell" and member name "data" are illustrative */
{
    char datatype;
    union
    {
        int intdata;
        long longdata;
        float floatdata;
        double doubledata;
        /* etc. */
    } data;
};
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,769
Messages
2,569,582
Members
45,057
Latest member
KetoBeezACVGummies

Latest Threads

Top