anon.asdf
Hi!
On a machine of *given architecture* (in terms of endianness etc.), I
want to access the individual bytes of a long (*once-off*) as fast as
possible.
Is version A, version B, or version C better? Are there other
alternatives?
/**** Version A ******/
{
long mylong = -1;
printf("0x%02x 0x%02x 0x%02x 0x%02x\n",
(unsigned char) mylong,
(unsigned char) (mylong >> 8),
(unsigned char) (mylong >> 16),
(unsigned char) (mylong >> 24));
}
/**** Version B ******/
{
long mylong = -1;
unsigned char f_b[4];
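/* assumes sizeof(long) == 4; f_b is only guaranteed char alignment,
   and storing a long into a char array through a long * violates
   the strict-aliasing rule */
*((long *)&f_b) = mylong;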
printf("0x%02x 0x%02x 0x%02x 0x%02x\n", f_b[0], f_b[1], f_b[2],
f_b[3]);
}
/**** Version C ******/
{
union align_array_and_long {
unsigned char four_b[4];
long dummy;
};
long mylong = -1;
union align_array_and_long four;
/* standard C: assign the long member (a cast to a union type is a GCC extension) */
four.dummy = mylong;
printf("0x%02x 0x%02x 0x%02x 0x%02x\n",
four.four_b[0],
four.four_b[1],
four.four_b[2],
four.four_b[3]);
}
My feeling is that Version C is best.
What can be said about the alignment of array f_b and mylong in
Version B?
(I think that in Version B, f_b might not be aligned suitably for a
long, in which case B is slower than C. If f_b and mylong are
aligned, is Version B then identical to Version C?)
..
..
..
Now what if one needs to access the individual bytes the *whole time*?
Which is fastest: A2, B2, C2 or D2?
/**** Version A2 ******/
{
long mylong = -1;
unsigned char b0, b1, b2, b3;
b0 = (unsigned char) mylong;
b1 = (unsigned char) (mylong >> 8);
b2 = (unsigned char) (mylong >>16);
b3 = (unsigned char) (mylong >>24);
// access: b0, b1, b2, b3
}
/**** Version B2 ******/
{
long mylong = -1;
unsigned char f_b[4];
/* assumes sizeof(long) == 4; same alignment and strict-aliasing
   caveats as Version B */
*((long *)&f_b) = mylong;
// access: f_b[0], f_b[1], f_b[2], f_b[3]
}
/**** Version C2 ******/
{
union align_array_and_long {
unsigned char four_b[4];
long dummy;
};
long mylong = -1;
union align_array_and_long four;
/* standard C: assign the long member (a cast to a union type is a GCC extension) */
four.dummy = mylong;
// access: four.four_b[0], four.four_b[1], four.four_b[2],
//         four.four_b[3]
}
/**** Version D2 ******/
{
struct four_struct {
unsigned char byte0;
unsigned char byte1;
unsigned char byte2;
unsigned char byte3;
};
union align_array_and_long {
struct four_struct four_s;
long dummy;
};
long mylong = -1;
union align_array_and_long four;
/* standard C: assign the long member (a cast to a union type is a GCC extension) */
four.dummy = mylong;
// access: four.four_s.byte0, four.four_s.byte1,
//         four.four_s.byte2, four.four_s.byte3
}
My feeling is that Version D2 is best: mylong is loaded into four in
one shot (no shifts, etc., as in A2).
And in D2 the compiler always knows exactly which byte we want:
four.four_s.byte0
This is different in C2: four.four_b[which_byte]
Or is it really different? Are these two equivalent:
four.four_s.byte0 <--> four.four_b[0] ?
..
..
..
Versions A and A2 are portable in terms of endianness, but the
question is not about portability; it is about optimization for a
given platform.
Thanks.
anon.asdf