Jacek Dziedzic
Hi!
I'm trying to squeeze a few clock cycles from a tight
loop that profiling shows to be a bottleneck in my program.
I'm at a point where the only thing that matters is
execution speed, not style (within the loop, obviously).
The loop deals with an array:
// this is aligned at cache line boundaries
struct hash_t {
    int id;
    int next;
    double d0;
    double d1;
    double d2;
} hashtable[HASH_MAX];
Within the loop, whenever there is a hashtable miss I
need to store new values into hashtable[index].
Originally this looked like
if(...) {
    hashtable[index].id=some_int;
    hashtable[index].next=some_int2;
    hashtable[index].d0=some_double0;
    hashtable[index].d1=some_double1;
    hashtable[index].d2=some_double2;
}
Now... I'm trying to save a few cycles by doing something
along the lines of
if(...) {
    // point to the first member
    int *hashtable_ptr = reinterpret_cast<int*>(&(hashtable[index].id));
    *(hashtable_ptr++) = some_int; // store into id and move on
    *(hashtable_ptr++) = some_int2; // store into next and move on
    *(hashtable_ptr++) = some_double0; // << --- trouble
    // ...
}
The trouble is that I'm working with a pointer to int
and I want to store a double and, later on, advance the
pointer not by sizeof(int) bytes, but by sizeof(double)
bytes. On my system sizeof(int) is 4, sizeof(double) is 8
and portability is not an issue.
I tried
*((reinterpret_cast<double*>(hashtable_ptr))++) = some_double0;
but I got an error:
"the result of this cast cannot be used as an lvalue".
Why's that? I really need to force the compiler to treat
this pointer as a double* for a moment... I realize I can
have two pointers to the same hashtable entry, one an int*
and one a double*, but I need to save every clock cycle I
can as this loop is executed trillions of times.
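To illustrate what I think the compiler is objecting to, here is a minimal sketch that is not taken from my program (fill3 and its parameters are made up for the example). As far as I understand it, a cast to a pointer type produces a temporary value rather than a variable, so there is nothing for ++ to modify. Storing the cast result in a named pointer gives ++ something to increment:

```cpp
#include <cassert>

// Hypothetical helper: writes three consecutive doubles into the buffer
// that raw points to. The buffer is assumed to be an array of at least
// three doubles.
inline void fill3(void* raw, double a, double b, double c) {
    assert(raw != nullptr);

    // static_cast<double*>(raw)++ is rejected by the compiler: the cast
    // expression is a prvalue (a temporary), and ++ needs a modifiable
    // lvalue to update.

    double* dp = static_cast<double*>(raw);  // named variable, an lvalue
    *dp++ = a;  // store, then advance by sizeof(double)
    *dp++ = b;
    *dp++ = c;
}
```

Here dp is an ordinary local variable, so I would expect an optimizing compiler to keep it in a register, though I have not measured that.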
What's the usual procedure to iterate a pointer through
members of varying types? Perhaps it would be easier with
a char*, but that means advancing by 4 and 8 which,
I suppose, would be slower than plain ++? Or does it boil
down to moving by 4 or 8 offsets at the assembly level too?
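For reference, a minimal sketch of the layout I am assuming, using the 4-byte int and 8-byte double mentioned above (store_entry is a made-up name, and the static_asserts describe my platform, not anything guaranteed). Each member sits at a fixed byte offset from the start of the entry, so taking the entry's address once and using plain member access should need only one base address plus constant offsets:

```cpp
#include <cassert>
#include <cstddef>

struct hash_t {
    int id;
    int next;
    double d0;
    double d1;
    double d2;
};

// Assumptions about my platform: 4-byte int, 8-byte double, no padding.
static_assert(offsetof(hash_t, id)   == 0,  "id at offset 0");
static_assert(offsetof(hash_t, next) == 4,  "next at offset 4");
static_assert(offsetof(hash_t, d0)   == 8,  "d0 at offset 8");
static_assert(offsetof(hash_t, d1)   == 16, "d1 at offset 16");
static_assert(offsetof(hash_t, d2)   == 24, "d2 at offset 24");
static_assert(sizeof(hash_t) == 32, "entry is 32 bytes");

// Hypothetical helper: the entry's address is computed once by the caller,
// and every store below targets a compile-time-constant offset from it.
inline void store_entry(hash_t* e, int id, int next,
                        double d0, double d1, double d2) {
    assert(e != nullptr);
    e->id = id;
    e->next = next;
    e->d0 = d0;
    e->d1 = d1;
    e->d2 = d2;
}
```

In the loop this would be called as store_entry(&hashtable[index], some_int, some_int2, some_double0, some_double1, some_double2). Whether it is any faster than the original five assignments is something I would have to check in the generated assembly.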
thanks in advance,
- J.
PS. Are pointers to different datatypes guaranteed to
be of the same size? If not, then perhaps I need
an assert here and there...
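The kind of check I have in mind would look roughly like this (a sketch only; check_pointer_sizes is a made-up name). As far as I know the language does not promise that int* and double* have the same size, even though they do on common desktop platforms, so a compile-time check states the assumption explicitly:

```cpp
#include <cassert>

// Compile-time checks: the build fails if the assumption does not hold
// on the target platform.
static_assert(sizeof(int*) == sizeof(double*),
              "int* and double* are assumed to have the same size");
static_assert(sizeof(char*) == sizeof(double*),
              "char* and double* are assumed to have the same size");

// Hypothetical runtime variant of the same check, for code that cannot
// use static_assert.
inline void check_pointer_sizes() {
    assert(sizeof(int*) == sizeof(double*));
    assert(sizeof(char*) == sizeof(double*));
}
```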