Access individual bytes of a 4 byte long (optimization)

anon.asdf

Hi!

On a machine of *given architecture* (in terms of endianness etc.), I
want to access the individual bytes of a long (*once-off*) as fast as
possible.

Is version A, version B, or version C better? Are there other
alternatives?

/**** Version A ******/
{
long mylong = -1;

printf("0x%02x 0x%02x 0x%02x 0x%02x\n", \
(unsigned char) mylong , \
(unsigned char) (mylong >> 8), \
(unsigned char) (mylong >>16), \
(unsigned char) (mylong >>24));
}

/**** Version B ******/
{
long mylong = -1;
unsigned char f_b[4];

*((long *)&f_b) = mylong;

printf("0x%02x 0x%02x 0x%02x 0x%02x\n", f_b[0], f_b[1], f_b[2],
f_b[3]);
}

/**** Version C ******/
{
union align_array_and_long {
unsigned char four_b[4];
long dummy;
};

long mylong = -1;
union align_array_and_long four;

four = (union align_array_and_long) mylong;

printf("0x%02x 0x%02x 0x%02x 0x%02x\n", \
four.four_b[0], \
four.four_b[1], \
four.four_b[2], \
four.four_b[3]);
}


My feeling is that Version C is best.

What can be said about the alignment of array f_b and mylong in
Version B?
(I think that in Version B the array f_b might not be aligned the way
a long must be, in which case it is slower than C. If f_b and mylong
happen to be aligned alike, is Version B then identical to Version C?)

..
..
..

Now what if one needs to access the individual bytes the *whole time*?
Is A2, B2, C2 or D2 faster?

/**** Version A2 ******/
{
long mylong = -1;
unsigned char b0, b1, b2, b3;

b0 = (unsigned char) mylong;
b1 = (unsigned char) (mylong >> 8);
b2 = (unsigned char) (mylong >>16);
b3 = (unsigned char) (mylong >>24);

// access: b0, b1, b2, b3
}

/**** Version B2 ******/
{
long mylong = -1;
unsigned char f_b[4];

*((long *)&f_b) = mylong;

// access: f_b[0], f_b[1], f_b[2], f_b[3]
}



/**** Version C2 ******/
{
union align_array_and_long {
unsigned char four_b[4];
long dummy;
};

long mylong = -1;
union align_array_and_long four;

four = (union align_array_and_long) mylong;

// access: four.four_b[0], four.four_b[1], four.four_b[2],
//         four.four_b[3]
}

/**** Version D2 ******/
{
struct four_struct {
unsigned char byte0;
unsigned char byte1;
unsigned char byte2;
unsigned char byte3;
};

union align_array_and_long {
struct four_struct four_s;
long dummy;
};

long mylong = -1;
union align_array_and_long four;

four = (union align_array_and_long) mylong;

// access: four.four_s.byte0, four.four_s.byte1,
//         four.four_s.byte2, four.four_s.byte3

}

My feeling is that Version D2 is best: mylong is loaded into four in
one shot (no shifts etc. as in A2).

And in D2 the compiler always knows exactly which byte we want:
four.four_s.byte0
This is different in C2: four.four_b[which_byte]
Or is it really different? Are these two equivalent:
four.four_s.byte0 <--> four.four_b[0] ?

..
..
..

Version A and A2 are portable in terms of endianness, but the question
is not about portability - it's about optimization for a given
platform.

Thanks.

anon.asdf
 
Chris Dollin

On a machine of *given architecture* (in terms of endianness etc.), I
want to access the individual bytes of a long (*once-off*) as fast as
possible.

Is version A, version B, or version C better?

Measure them and find out.

Now what if one needs to access the individual bytes the *whole time*?
Is A2, B2, C2 or D2 faster?

Measure them and find out.
 
Richard Bos

On a machine of *given architecture* (in terms of endianness etc.), I
want to access the individual bytes of a long (*once-off*) as fast as
possible.

Is version A, version B, or version C better?

Mu.

Rule one of micro-optimisation:
Don't Do It.
Rule two of micro-optimisation (for experts only!):
Don't Do It Yet.
Rule three of micro-optimisation (only under duress):
Measure, Measure, Measure.

Unless you _know_ that it matters, assume that it doesn't, and write the
clearest code. If you think you do know that it matters, first gather
evidence. Only by measuring which is the fastest will you know which is
the fastest - on your machine, using your implementation, in your
project, under your optimisation settings. And don't be surprised to
find out that you were wrong, and the difference is no more than 0.5%,
with an error of 1%.

Richard
 
Ben Bacarisse

Hi!

On a machine of *given architecture* (in terms of endianness etc.), I
want to access the individual bytes of a long (*once-off*) as fast as
possible.

Is version A, version B, or version C better? Are there other
alternatives?

/**** Version A ******/
{
long mylong = -1;

printf("0x%02x 0x%02x 0x%02x 0x%02x\n", \
(unsigned char) mylong , \
(unsigned char) (mylong >> 8), \
(unsigned char) (mylong >>16), \
(unsigned char) (mylong >>24));
}

/**** Version B ******/
{
long mylong = -1;
unsigned char f_b[4];

*((long *)&f_b) = mylong;

printf("0x%02x 0x%02x 0x%02x 0x%02x\n", f_b[0], f_b[1], f_b[2],
f_b[3]);
}

/**** Version C ******/
{
union align_array_and_long {
unsigned char four_b[4];
long dummy;
};

long mylong = -1;
union align_array_and_long four;

four = (union align_array_and_long) mylong;

printf("0x%02x 0x%02x 0x%02x 0x%02x\n", \
four.four_b[0], \
four.four_b[1], \
four.four_b[2], \
four.four_b[3]);
}


My feeling is the Version C is best.

For the fastest, try:

printf("0x%08lx\n", mylong); /* :) */

Versions B and C invoke undefined behaviour. The defined way to do
version B is:

void *vp = &mylong;
unsigned char *cp = vp;
/* now do what you want with cp[0] to cp[sizeof long] */

There is no need to lie about having an array. Version C is very
likely to work, but the standard does not guarantee accesses to any
union member other than the last one assigned to (barring the special
exception for "common initial members").

Similar comments apply to your other code fragments.
 
Eric Sosman

Hi!

On a machine of *given architecture* (in terms of endianness etc.), I
want to access the individual bytes of a long (*once-off*) as fast as
possible.

1) Your question is about your environment -- machine(s),
compiler(s), O/S(es), etc. -- and not about C. Seek a forum
where the experts on your environment hang out.

2) If "as fast as possible" is really your goal, you
should not be using C, nor even assembly. Custom-built
hardware is the way to go. Seek a forum where chip designers
hang out.

3) This is the second time in recent days that you've
given "I want" as the only reason for doing something. You
may not understand it yet, but the context of the "I want"
can often have a huge influence on the speed of whatever code
you wind up with. For example: Is this long just sitting
around in memory, or is it the result of a recent computation
and perhaps still available in a register? Seek a forum where
compiler experts hang out.
 
anon.asdf

For the fastest, try:

printf("0x%08lx\n", mylong); /* :) */

Versions B and C invoke undefined behaviour. The defined way to do
version B is:

void *vp = &mylong;
unsigned char *cp = vp;
/* now do what you want with cp[0] to cp[sizeof long] */


Very good comment, about using a pointer that way!!
Thanks!
anon.asdf
 
Army1987

Hi!

On a machine of *given architecture* (in terms of endianness etc.), I
want to access the individual bytes of a long (*once-off*) as fast as
possible.

Is version A, version B, or version C better? Are there other
alternatives?

/**** Version A ******/
{
long mylong = -1;

printf("0x%02x 0x%02x 0x%02x 0x%02x\n", \
(unsigned char) mylong , \
(unsigned char) (mylong >> 8), \
(unsigned char) (mylong >>16), \
(unsigned char) (mylong >>24));

Right shifts of negative integers give implementation-defined results,
and that needn't have anything to do with endianness.

}

/**** Version B ******/
{
long mylong = -1;
unsigned char f_b[4];

*((long *)&f_b) = mylong;

#include <string.h>
memcpy(f_b, &mylong, 4);

This does the same thing you were trying to do, without the risk
of disaster if f_b doesn't happen to be correctly aligned for a
long.

printf("0x%02x 0x%02x 0x%02x 0x%02x\n", f_b[0], f_b[1], f_b[2],
f_b[3]);
}

/**** Version C ******/
{
union align_array_and_long {
unsigned char four_b[4];
long dummy;
};

long mylong = -1;
union align_array_and_long four;

four = (union align_array_and_long) mylong;

Did you mean four.dummy = -1? You can only cast to a scalar type,
which a union is not.
Also, reading a member of a union other than the last one written to
is UB, so I think the compiler is allowed to optimize away an
assignment to four.dummy if its value is not used.

printf("0x%02x 0x%02x 0x%02x 0x%02x\n", \
four.four_b[0], \
four.four_b[1], \
four.four_b[2], \
four.four_b[3]);
} [snip]

Version A and A2 are portable in terms of endianness, but the question
is not about portability - it's about optimization for a given platform.

Whoever implemented memcpy on your platform is likely to know better
than you do what is most efficient on that specific platform.
 
Walter Roberson

The defined way to do
version B is:

void *vp = &mylong;
unsigned char *cp = vp;
/* now do what you want with cp[0] to cp[sizeof long] */

Make that cp[sizeof(long) - 1]
 
Chris Torek

On a machine of *given architecture* ...

OK, I give you "MIPS" as the architecture (using the MIPS compilers).

... I want to access the individual bytes of a long (*once-off*)
as fast as possible.

Oops, now you have to decide whether this is a 32-bit MIPS (ILP32
model) or a 64-bit MIPS (I32LP64 model -- i.e., long is eight 8-bit
bytes long).

Is version A, version B, or version C better?

[where A is shift-and-mask, and B and C go through RAM]

On most compilers, version A will be *far* faster than almost
anything else. In fact, since your original code fragment had the
variable set to a constant, if you compile with optimization, the
four or eight extracted sub-parts will also be constants.

Interesting side note: if the architecture is changed to the original
DEC (now Compaq) Alpha, "byte" accesses to RAM are handled in the
compiler by doing full 8-byte machine-word accesses and then using
shift-and-mask instructions, because that is how the machine *has*
to do it. (There are special instructions like "zap" for working
with the eight 8-bit "byte fields" of a register, but loads and
stores are always full 64-bit operations.)

(The MIPS architecture is a lot more common though, as it is found
in various home gaming systems.)
 
pete

Hi!

On a machine of *given architecture* (in terms of endianness etc.), I
want to access the individual bytes of a long (*once-off*) as fast as
possible.


/* BEGIN new.c */

#include <stdio.h>

int main (void)
{
long mylong = 0x12345678;

printf("0x%02x 0x%02x 0x%02x 0x%02x\n",
((unsigned char *)&mylong)[0],
((unsigned char *)&mylong)[1],
((unsigned char *)&mylong)[2],
((unsigned char *)&mylong)[3]);
return 0;
}

/* END new.c */
 
Walter Roberson

(e-mail address removed) wrote:
#include <stdio.h>
int main (void)
{
long mylong = 0x12345678;

printf("0x%02x 0x%02x 0x%02x 0x%02x\n",
((unsigned char *)&mylong)[0],
((unsigned char *)&mylong)[1],
((unsigned char *)&mylong)[2],
((unsigned char *)&mylong)[3]);
return 0;
}

What if sizeof(long) > 4 ?
 
pete

Walter said:
(e-mail address removed) wrote:
#include <stdio.h>
int main (void)
{
long mylong = 0x12345678;

printf("0x%02x 0x%02x 0x%02x 0x%02x\n",
((unsigned char *)&mylong)[0],
((unsigned char *)&mylong)[1],
((unsigned char *)&mylong)[2],
((unsigned char *)&mylong)[3]);
return 0;
}

What if sizeof(long) > 4 ?

/* BEGIN new.c */

#include <stdio.h>
#include <assert.h>

int main (void)
{
long mylong = 0x12345678;

assert(sizeof(long) == 4);
printf("0x%02x 0x%02x 0x%02x 0x%02x\n",
((unsigned char *)&mylong)[0],
((unsigned char *)&mylong)[1],
((unsigned char *)&mylong)[2],
((unsigned char *)&mylong)[3]);
return 0;
}

/* END new.c */
 
¬a\/b

#include <stdio.h>
int main (void)
{
long mylong = 0x12345678;

printf("0x%02x 0x%02x 0x%02x 0x%02x\n",
((unsigned char *)&mylong)[0],
((unsigned char *)&mylong)[1],
((unsigned char *)&mylong)[2],
((unsigned char *)&mylong)[3]);
return 0;
}

What if sizeof(long) > 4 ?

What about this?

#include <stdio.h>
#include <limits.h>

int main (void)
{long mylong=0x123456789abcdef, prova=0xFF12;
unsigned char *a; int i;

if(CHAR_BIT!=8) return 0;

a= (unsigned char*) &mylong;
printf("Value 0X%x\n",
(unsigned) ((unsigned char*)&prova)[sizeof(long)-1]);

/* last byte in memory is the least significant one: big-endian */
if( ((unsigned char*)&prova)[sizeof(long)-1] == 0x12)
{for(i=0; i<sizeof(long); ++i)
printf("0x%02x ", (unsigned) a[i]);
}
else {for(i=sizeof(long)-1; i>=0; --i)
printf("0x%02x ", (unsigned) a[i]);
}

printf("\n");
return 0;
}

Or this? How many instances of UB do you find?
I find one in the first example and none in the one below.

#include <stdio.h>
#include <limits.h>

int main (void)
{long mylong=0x123456789abcdef;
unsigned long prova, r;
unsigned char *a;
int i;

if(CHAR_BIT!=8) return 0;
prova=0xFF;
for(i=sizeof(long)-1, prova<<=i*8; i>=0 ; prova>>=8, --i)
{r=((unsigned long)mylong & prova)>>(i*8);
printf("0x%02x ", r);
}

printf("\n");
return 0;
}
 
¬a\/b

Or this? How many instances of UB do you find?
I find one in the first example and none in the one below.

UB in the sense that the implementation gives either the correct result
or nothing (when CHAR_BIT != 8).
#include <stdio.h>
#include <limits.h>

int main (void)
{long mylong=0x123456789abcdef;
unsigned long prova, r;
unsigned char *a;
int i;

if(CHAR_BIT!=8) return 0;
prova=0xFF;
for(i=sizeof(long)-1, prova<<=i*8; i>=0 ; prova>>=8, --i)
{r=((unsigned long)mylong & prova)>>(i*8);
printf("0x%02x ", (unsigned) r); /* corrected: r must be cast to match %x */
}

printf("\n");
return 0;
}

Don't take it all too seriously; it is summertime and I had to say
something :)
 
Army1987

(e-mail address removed) wrote:
#include <stdio.h>
int main (void)
{
long mylong = 0x12345678;

printf("0x%02x 0x%02x 0x%02x 0x%02x\n",
((unsigned char *)&mylong)[0],
((unsigned char *)&mylong)[1],
((unsigned char *)&mylong)[2],
((unsigned char *)&mylong)[3]);
return 0;
}

What if sizeof(long) > 4 ?
#include <stdio.h>
int main(void)
{
long mylong = 0x12345678;
unsigned char *ptr;
for (ptr = (unsigned char *)&mylong; ptr < (unsigned char *)(&mylong + 1); ptr++)
printf("0x%02x ", *ptr);
putchar('\n');
return 0;
}
 
pete

¬a\/b said:
UB in the sense that the implementation gives either the correct result
or nothing (when CHAR_BIT != 8).

That's implementation-defined behavior.

The result of assigning a 57-bit integer value to a long that is
narrower than that is also implementation-defined.
 
¬a\/b

Hi!

On a machine of *given architecture* (in terms of endianness etc.), I
want to access the individual bytes of a long (*once-off*) as fast as
possible.

Is version A, version B, or version C better? Are there other
alternatives?

/**** Version A ******/
{
long mylong = -1;

printf("0x%02x 0x%02x 0x%02x 0x%02x\n", \
(unsigned char) mylong , \
(unsigned char) (mylong >> 8), \
(unsigned char) (mylong >>16), \
(unsigned char) (mylong >>24));
}

This prints the bytes of the number in reverse order.
Isn't the following better?

{
long mylong = -1;

if(CHAR_BIT>8 || sizeof(long)!=4) return;

printf("0x%02x 0x%02x 0x%02x 0x%02x\n",
(unsigned char) (mylong >>24),
(unsigned char) (mylong >>16),
(unsigned char) (mylong >> 8),
(unsigned char) mylong
);
}
 
