Access individual bytes of a 4 byte long (optimization)

anon.asdf · Aug 10, 2007

Hi!

On a machine of *given architecture* (in terms of endianness etc.), I
want to access the individual bytes of a long (*once-off*) as fast as
possible.

Is version A, version B, or version C better? Are there other
alternatives?

/**** Version A ******/
{
long mylong = -1;

printf("0x%02x 0x%02x 0x%02x 0x%02x\n", \
(unsigned char) mylong , \
(unsigned char) (mylong >> 8), \
(unsigned char) (mylong >>16), \
(unsigned char) (mylong >>24));
}

/**** Version B ******/
{
long mylong = -1;
unsigned char f_b[4];

*((long *)&f_b) = mylong;

printf("0x%02x 0x%02x 0x%02x 0x%02x\n", f_b[0], f_b[1], f_b[2],
f_b[3]);
}

/**** Version C ******/
{
union align_array_and_long {
unsigned char four_b[4];
long dummy;
};

long mylong = -1;
union align_array_and_long four;

four = (union align_array_and_long) mylong;

printf("0x%02x 0x%02x 0x%02x 0x%02x\n", \
four.four_b[0], \
four.four_b[1], \
four.four_b[2], \
four.four_b[3]);
}

My feeling is that Version C is best.

What can be said about the alignment of array f_b and mylong in
Version B?
(I think that in Version B the array f_b and mylong might be aligned
differently, in which case it is slower than C. If f_b and mylong are
aligned the same way in Version B, is Version B then identical to Version C?)

..
..
..

Now what if one needs to access the individual bytes the *whole time*?
Is A2, B2, C2 or D2 faster?

/**** Version A2 ******/
{
long mylong = -1;
unsigned char b0, b1, b2, b3;

b0 = (unsigned char) mylong;
b1 = (unsigned char) (mylong >> 8);
b2 = (unsigned char) (mylong >>16);
b3 = (unsigned char) (mylong >>24);

// access: b0, b1, b2, b3
}

/**** Version B2 ******/
{
long mylong = -1;
unsigned char f_b[4];

*((long *)&f_b) = mylong;

// access: f_b[0], f_b[1], f_b[2], f_b[3]
}

/**** Version C2 ******/
{
union align_array_and_long {
unsigned char four_b[4];
long dummy;
};

long mylong = -1;
union align_array_and_long four;

four = (union align_array_and_long) mylong;

// access: four.four_b[0], four.four_b[1], four.four_b[2], four.four_b[3]
}

/**** Version D2 ******/
{
struct four_struct {
unsigned char byte0;
unsigned char byte1;
unsigned char byte2;
unsigned char byte3;
};

union align_array_and_long {
struct four_struct four_s;
long dummy;
};

long mylong = -1;
union align_array_and_long four;

four = (union align_array_and_long) mylong;

// access: four.four_s.byte0, four.four_s.byte1, four.four_s.byte2, four.four_s.byte3

}

My feeling is that Version D2 is best: mylong is loaded into four in
one shot (no shifts etc. as in A2).

And in D2 the compiler always knows that we specify exactly which byte
we want:
four.four_s.byte0
This is different in C2: four.four_b[which_byte]
Or is it really different? Are these two equivalent:
four.four_s.byte0 <--> four.four_b[0]?

..
..
..

Version A and A2 are portable in terms of endianness, but the question
is not about portability - it's about optimization for a given
platform.

Thanks.

anon.asdf

Chris Dollin · Aug 10, 2007

On a machine of *given architecture* (in terms of endianness etc.), I
want to access the individual bytes of a long (*once-off*) as fast as
possible.

Is version A, version B, or version C better?

Measure them and find out.

Now what if one needs to access the individual bytes the *whole time*?
Is A2, B2, C2 or D2 faster?

Measure them and find out.

Richard Bos · Aug 10, 2007

On a machine of *given architecture* (in terms of endianness etc.), I
want to access the individual bytes of a long (*once-off*) as fast as
possible.

Is version A, version B, or version C better?

Mu.

Rule one of micro-optimisation:
Don't Do It.
Rule two of micro-optimisation (for experts only!):
Don't Do It Yet.
Rule three of micro-optimisation (only under duress):
Measure, Measure, Measure.

Unless you _know_ that it matters, assume that it doesn't, and write the
clearest code. If you think you do know that it matters, first gather
evidence. Only by measuring which is the fastest will you know which is
the fastest - on your machine, using your implementation, in your
project, under your optimisation settings. And don't be surprised to
find out that you were wrong, and the difference is no more than 0.5%,
with an error of 1%.

Richard
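
To make the advice about measuring concrete, here is a minimal timing sketch, not a rigorous benchmark. The names sum_bytes_shift, sum_bytes_memcpy and time_candidate are made up for illustration, and the code assumes CHAR_BIT is 8 and that unsigned long has no padding bits. An optimising compiler may inline or fold much of this, which is part of what a real measurement has to account for.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical candidate 1: walk the bytes with shift-and-mask. */
static unsigned long sum_bytes_shift(unsigned long v)
{
    unsigned long sum = 0;
    size_t i;

    for (i = 0; i < sizeof v; i++) {
        sum += v & 0xFFUL;   /* take the current low byte */
        v >>= 8;             /* move the next byte down */
    }
    return sum;
}

/* Hypothetical candidate 2: copy the bytes out through memory. */
static unsigned long sum_bytes_memcpy(unsigned long v)
{
    unsigned char b[sizeof v];
    unsigned long sum = 0;
    size_t i;

    memcpy(b, &v, sizeof b);
    for (i = 0; i < sizeof b; i++)
        sum += b[i];
    return sum;
}

/* Run f on reps different inputs and return the CPU time used, in seconds.
   The volatile sink keeps the calls from being discarded as dead code. */
static double time_candidate(unsigned long (*f)(unsigned long),
                             unsigned long reps)
{
    volatile unsigned long sink = 0;
    unsigned long i;
    clock_t start = clock();

    for (i = 0; i < reps; i++)
        sink += f(i);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling time_candidate(sum_bytes_shift, reps) and time_candidate(sum_bytes_memcpy, reps) with a large reps, under the optimisation settings of the real project, gives two CPU times to compare. Repeating the runs shows how large the noise is relative to the difference.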

Ben Bacarisse · Aug 10, 2007

Hi!

On a machine of *given architecture* (in terms of endianness etc.), I
want to access the individual bytes of a long (*once-off*) as fast as
possible.

Is version A, version B, or version C better? Are there other
alternatives?

/**** Version A ******/
{
long mylong = -1;

printf("0x%02x 0x%02x 0x%02x 0x%02x\n", \
(unsigned char) mylong , \
(unsigned char) (mylong >> 8), \
(unsigned char) (mylong >>16), \
(unsigned char) (mylong >>24));
}

/**** Version B ******/
{
long mylong = -1;
unsigned char f_b[4];

*((long *)&f_b) = mylong;

printf("0x%02x 0x%02x 0x%02x 0x%02x\n", f_b[0], f_b[1], f_b[2],
f_b[3]);
}

/**** Version C ******/
{
union align_array_and_long {
unsigned char four_b[4];
long dummy;
};

long mylong = -1;
union align_array_and_long four;

four = (union align_array_and_long) mylong;

printf("0x%02x 0x%02x 0x%02x 0x%02x\n", \
four.four_b[0], \
four.four_b[1], \
four.four_b[2], \
four.four_b[3]);
}

My feeling is that Version C is best.

For the fastest, try:

printf("0x%08lx\n", mylong);

Versions B and C invoke undefined behaviour. The defined way to do
version B is:

void *vp = &mylong;
unsigned char *cp = vp;
/* now do what you want with cp[0] to cp[sizeof long] */

There is no need to lie about having an array. Version C is very
likely to work, but the standard does not guarantee accesses to any
union member other than the last one assigned to (barring the special
exception for "common initial members").

Similar comments apply to your other code fragments.
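
As a sketch of the unsigned char pointer approach described above, the following helper formats the bytes of any object into a caller-supplied buffer. The name format_bytes and its return convention are assumptions made for this example; it uses snprintf, so it needs C99 or later.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: write the n bytes of *obj, lowest address first,
   into out as "0x.. 0x.. ...". Returns 0 on success, -1 if out is too small. */
static int format_bytes(const void *obj, size_t n, char *out, size_t outsz)
{
    const unsigned char *cp = obj;   /* void * converts without a cast */
    size_t i, used = 0;

    if (outsz == 0)
        return -1;
    out[0] = '\0';
    for (i = 0; i < n; i++) {        /* valid indices are 0 to n - 1 */
        int w = snprintf(out + used, outsz - used, "%s0x%02x",
                         i ? " " : "", (unsigned)cp[i]);
        if (w < 0 || (size_t)w >= outsz - used)
            return -1;               /* output would have been truncated */
        used += (size_t)w;
    }
    return 0;
}
```

For a long named mylong, format_bytes(&mylong, sizeof mylong, buf, sizeof buf) fills buf with the bytes in memory order, so the text differs between little-endian and big-endian machines.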

Eric Sosman · Aug 10, 2007

Hi!

On a machine of *given architecture* (in terms of endianness etc.), I
want to access the individual bytes of a long (*once-off*) as fast as
possible.

1) Your question is about your environment -- machine(s),
compiler(s), O/S(es), etc. -- and not about C. Seek a forum
where the experts on your environment hang out.

2) If "as fast as possible" is really your goal, you
should not be using C, nor even assembly. Custom-built
hardware is the way to go. Seek a forum where chip designers
hang out.

3) This is the second time in recent days that you've
given "I want" as the only reason for doing something. You
may not understand it yet, but the context of the "I want"
can often have a huge influence on the speed of whatever code
you wind up with. For example: Is this long just sitting
around in memory, or is it the result of a recent computation
and perhaps still available in a register? Seek a forum where
compiler experts hang out.

anon.asdf · Aug 10, 2007

For the fastest, try:

printf("0x%08lx\n", mylong);

Versions B and C invoke undefined behaviour. The defined way to do
version B is:

void *vp = &mylong;
unsigned char *cp = vp;
/* now do what you want with cp[0] to cp[sizeof long] */

Very good comment, about using a pointer that way!!
Thanks!
anon.asdf

Army1987 · Aug 10, 2007

Hi!

On a machine of *given architecture* (in terms of endianness etc.), I
want to access the individual bytes of a long (*once-off*) as fast as
possible.

Is version A, version B, or version C better? Are there other
alternatives?

/**** Version A ******/
{
long mylong = -1;

printf("0x%02x 0x%02x 0x%02x 0x%02x\n", \
(unsigned char) mylong , \
(unsigned char) (mylong >> 8), \
(unsigned char) (mylong >>16), \
(unsigned char) (mylong >>24));

Right shifts of negative integers are implementation-defined,
and the result needn't have anything to do with endianness.

}

/**** Version B ******/
{
long mylong = -1;
unsigned char f_b[4];

*((long *)&f_b) = mylong;

#include <string.h>
memcpy(f_b, &mylong, 4);
This does the same thing you were trying to do, without the risk
of disasters if f_b doesn't happen to be correctly aligned for a
long.

printf("0x%02x 0x%02x 0x%02x 0x%02x\n", f_b[0], f_b[1], f_b[2],
f_b[3]);
}

/**** Version C ******/
{
union align_array_and_long {
unsigned char four_b[4];
long dummy;
};

long mylong = -1;
union align_array_and_long four;

four = (union align_array_and_long) mylong;

Did you mean four.dummy = -1? You can only cast to a scalar type,
which a union is not.
Also, accessing a member of a union other than the last one
written to is UB, so I think the compiler is allowed to optimize
away an assignment to four.dummy if its value is not used.

printf("0x%02x 0x%02x 0x%02x 0x%02x\n", \
four.four_b[0], \
four.four_b[1], \
four.four_b[2], \
four.four_b[3]);
} [snip]
Version A and A2 are portable in terms of endianness, but the question
is not about portability - it's about optimization for a given platform.

Whoever implemented memcpy on your platform is likely to know what
is more efficient on that specific platform better than you do.
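
The memcpy suggestion can be sketched as a pair of small helpers. The names long_to_bytes and bytes_to_long are invented for this example; the byte order in the array is whatever the machine uses in memory, so this is a sketch for one given platform rather than portable serialisation.

```c
#include <string.h>

/* Hypothetical helper: copy the object representation of value into
   out[0] .. out[sizeof(long) - 1]. The array needs no special alignment. */
static void long_to_bytes(long value, unsigned char out[sizeof(long)])
{
    memcpy(out, &value, sizeof value);
}

/* Hypothetical inverse: rebuild the long from bytes produced above. */
static long bytes_to_long(const unsigned char in[sizeof(long)])
{
    long value;

    memcpy(&value, in, sizeof value);
    return value;
}
```

Many compilers turn a fixed-size memcpy like this into a single load or store when optimising, but whether yours does is something to check in the generated code or by measuring.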

Walter Roberson · Aug 10, 2007

The defined way to do
version B is:

void *vp = &mylong;
unsigned char *cp = vp;
/* now do what you want with cp[0] to cp[sizeof long] */

Make that cp[sizeof long - 1]

Ben Bacarisse · Aug 10, 2007

The defined way to do
version B is:

void *vp = &mylong;
unsigned char *cp = vp;
/* now do what you want with cp[0] to cp[sizeof long] */

Make that cp[sizeof long - 1]

Of course, thanks.

Chris Torek · Aug 10, 2007

On a machine of *given architecture* ...

OK, I give you "MIPS" as the architecture (using the MIPS compilers).

... I want to access the individual bytes of a long (*once-off*)
as fast as possible.

Oops, now you have to decide whether this is a 32-bit MIPS (ILP32
model) or a 64-bit MIPS (I32LP64 model -- i.e., long is eight 8-bit
bytes long).

Is version A, version B, or version C better?

[where A is shift-and-mask, and B and C go through RAM]

On most compilers, version A will be *far* faster than almost
anything else. In fact, since your original code fragment had the
variable set to a constant, if you compile with optimization, the
four or eight extracted sub-parts will also be constants.

Interesting side note: if the architecture is changed to the original
DEC (now Compaq) Alpha, "byte" accesses to RAM are handled in the
compiler by doing full 8-byte machine-word accesses and then using
shift-and-mask instructions, because that is how the machine *has*
to do it. (There are special instructions like "zap" for working
with the eight 8-bit "byte fields" of a register, but loads and
stores are always full 64-bit operations.)

(The MIPS architecture is a lot more common though, as it is found
in various home gaming systems.)
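
A shift-and-mask extraction that does not care whether long is four or eight bytes could look like the sketch below. The function name byte_of is an assumption for illustration. Working on unsigned long avoids the implementation-defined right shift of negative values, and converting a negative long to unsigned long is well defined.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: return byte number index of value, where index 0 is
   the least significant byte, however the machine lays it out in memory.
   The caller must keep index below sizeof(unsigned long). */
static unsigned char byte_of(unsigned long value, size_t index)
{
    return (unsigned char)((value >> (index * CHAR_BIT)) & UCHAR_MAX);
}
```

With a constant argument, as in the original fragments, an optimising compiler can usually reduce each call to a constant, which is the effect described above.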

pete · Aug 10, 2007

Hi!

On a machine of *given architecture* (in terms of endianness etc.), I
want to access the individual bytes of a long (*once-off*) as fast as
possible.

/* BEGIN new.c */

#include <stdio.h>

int main (void)
{
long mylong = 0x12345678;

printf("0x%02x 0x%02x 0x%02x 0x%02x\n",
((unsigned char *)&mylong)[0],
((unsigned char *)&mylong)[1],
((unsigned char *)&mylong)[2],
((unsigned char *)&mylong)[3]);
return 0;
}

/* END new.c */

Walter Roberson · Aug 10, 2007

(e-mail address removed) wrote:

#include <stdio.h>

int main (void)
{
long mylong = 0x12345678;

printf("0x%02x 0x%02x 0x%02x 0x%02x\n",
((unsigned char *)&mylong)[0],
((unsigned char *)&mylong)[1],
((unsigned char *)&mylong)[2],
((unsigned char *)&mylong)[3]);
return 0;
}

What if sizeof(long) > 4 ?

pete · Aug 11, 2007

Walter said:
(e-mail address removed) wrote:

#include <stdio.h>

int main (void)
{
long mylong = 0x12345678;

printf("0x%02x 0x%02x 0x%02x 0x%02x\n",
((unsigned char *)&mylong)[0],
((unsigned char *)&mylong)[1],
((unsigned char *)&mylong)[2],
((unsigned char *)&mylong)[3]);
return 0;
}

What if sizeof(long) > 4 ?

/* BEGIN new.c */

#include <stdio.h>
#include <assert.h>

int main (void)
{
long mylong = 0x12345678;

assert(sizeof(long) == 4);
printf("0x%02x 0x%02x 0x%02x 0x%02x\n",
((unsigned char *)&mylong)[0],
((unsigned char *)&mylong)[1],
((unsigned char *)&mylong)[2],
((unsigned char *)&mylong)[3]);
return 0;
}

/* END new.c */
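
If exactly four bytes are wanted, one alternative to a run-time assert is to check the assumptions at compile time and use a fixed-width type. This is only a sketch: it needs C11 for _Static_assert, uint32_t is an optional type that not every implementation provides, and the helper name u32_bytes is made up here.

```c
#include <limits.h>
#include <stdint.h>

/* Checked at compile time, so a mismatch stops the build instead of
   aborting the program at run time. */
_Static_assert(CHAR_BIT == 8, "this sketch assumes 8-bit bytes");
_Static_assert(sizeof(uint32_t) == 4, "uint32_t should occupy four bytes");

/* Hypothetical helper: copy the four bytes of value, lowest address first. */
static void u32_bytes(uint32_t value, unsigned char out[4])
{
    const unsigned char *cp = (const unsigned char *)&value;
    int i;

    for (i = 0; i < 4; i++)
        out[i] = cp[i];
}
```

The order of the bytes in out still depends on the endianness of the machine; only the count is fixed.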

¬a\\/b · Aug 11, 2007

#include <stdio.h>

int main (void)
{
long mylong = 0x12345678;

printf("0x%02x 0x%02x 0x%02x 0x%02x\n",
((unsigned char *)&mylong)[0],
((unsigned char *)&mylong)[1],
((unsigned char *)&mylong)[2],
((unsigned char *)&mylong)[3]);
return 0;
}

What if sizeof(long) > 4 ?

what if this?

#include <stdio.h>
#include <limits.h>

int main (void)
{long mylong=0x123456789abcdef, prova=0xFF12;
unsigned char *a; int i;

if(CHAR_BIT!=8) return 0;

a= (char*) &mylong;
printf("Valore 0X%x\n",
(unsigned) ((unsigned char*)&prova)[sizeof(long)-1]);

if( ((unsigned char*)&prova)[sizeof(long)-1] == 0x12)
{for(i=0; i<sizeof(long); ++i)
printf("0x%02x ", (unsigned) a[i]);
}
else {for(i=sizeof(long)-1; i>=0; --i)
printf("0x%02x ", (unsigned) a[i]);
}

printf("\n");
return 0;
}

Or this? How many instances of UB do you find?
I find one in the first example and none in the one below.

#include <stdio.h>
#include <limits.h>

int main (void)
{long mylong=0x123456789abcdef;
unsigned long prova, r;
unsigned char *a;
int i;

if(CHAR_BIT!=8) return 0;
prova=0xFF;
for(i=sizeof(long)-1, prova<<=i*8; i>=0 ; prova>>=8, --i)
{r=((unsigned long)mylong & prova)>>(i*8);
printf("0x%02x ", r);
}

printf("\n");
return 0;
}

¬a\\/b · Aug 11, 2007

Or this? How many instances of UB do you find?
I find one in the first example and none in the one below.

UB in the sense that the implementation gives either the correct result
or nothing (CHAR_BIT != 8).

#include <stdio.h>
#include <limits.h>

int main (void)
{long mylong=0x123456789abcdef;
unsigned long prova, r;
unsigned char *a;
int i;

if(CHAR_BIT!=8) return 0;
prova=0xFF;
for(i=sizeof(long)-1, prova<<=i*8; i>=0 ; prova>>=8, --i)
{r=((unsigned long)mylong & prova)>>(i*8);
printf("0x%02x ", r);

OK, OK, that should be: printf("0x%02x ", (unsigned) r);

}

printf("\n");
return 0;
}

Don't take it all too seriously; it is summertime and I had to say something.

Army1987 · Aug 11, 2007

(e-mail address removed) wrote:

#include <stdio.h>

int main (void)
{
long mylong = 0x12345678;

printf("0x%02x 0x%02x 0x%02x 0x%02x\n",
((unsigned char *)&mylong)[0],
((unsigned char *)&mylong)[1],
((unsigned char *)&mylong)[2],
((unsigned char *)&mylong)[3]);
return 0;
}

What if sizeof(long) > 4 ?

#include <stdio.h>
int main(void)
{
long mylong = 0x12345678;
unsigned char *ptr;
for (ptr = (unsigned char *)&mylong; ptr < (unsigned char *)(&mylong + 1); ptr++)
printf("0x%02x ", *ptr);
putchar('\n');
return 0;
}

pete · Aug 11, 2007

¬a\/b said:
UB in the sense that the implementation gives either the correct result
or nothing (CHAR_BIT != 8).

That's implementation-defined behavior.

The result of assigning a 57-bit integer value to a long
that cannot hold it is also implementation-defined.
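
One way to sidestep the out-of-range conversion is to pick the test value according to what long can hold, using LONG_MAX from <limits.h>. A minimal sketch, with the macro WIDE_TEST_VALUE and the function wide_test_value invented for this example:

```c
#include <limits.h>

/* Choose a constant that is guaranteed to fit in long on this platform.
   WIDE_TEST_VALUE is a hypothetical name used only in this sketch. */
#if LONG_MAX >= 0x123456789abcdef
#define WIDE_TEST_VALUE 0x123456789abcdefL   /* long has room for 57 bits */
#else
#define WIDE_TEST_VALUE 0x12345678L          /* fall back to a 32-bit value */
#endif

/* Return the selected value; no out-of-range conversion ever takes place. */
static long wide_test_value(void)
{
    return WIDE_TEST_VALUE;
}
```

The preprocessor evaluates the comparison in the widest integer type, so the test itself is safe even where long is only 32 bits.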

¬a\\/b · Aug 12, 2007

Hi!

On a machine of *given architecture* (in terms of endianness etc.), I
want to access the individual bytes of a long (*once-off*) as fast as
possible.

Is version A, version B, or version C better? Are there other
alternatives?

/**** Version A ******/
{
long mylong = -1;

printf("0x%02x 0x%02x 0x%02x 0x%02x\n", \
(unsigned char) mylong , \
(unsigned char) (mylong >> 8), \
(unsigned char) (mylong >>16), \
(unsigned char) (mylong >>24));
}

This prints the bytes in reverse order.
Would this not be better?

{
long mylong = -1;

if(CHAR_BIT>8 || sizeof(long)!=4) return;

printf("0x%02x 0x%02x 0x%02x 0x%02x\n",
(unsigned char) (mylong >>24),
(unsigned char) (mylong >>16),
(unsigned char) (mylong >> 8),
(unsigned char) mylong
);
}
