Jonathan Lee
Hi all,
A petty optimization question:
I'm working on optimizing a multi-precision integer library and
wonder what the fastest way to return two unsigned longs would be.
Part of my code is a schoolbook multiplication algorithm which calls a
word-size multiplication algorithm and returns the double-width result
in an array. For example:
    unsigned long result[2];
    ...
    do {
        mulUL(factor1, factor2, result);
        ...
    } while (still_computing);
In an ideal world, mulUL() is just a wrapper for the assembly language
instructions and it would be great if the two words of "result" stayed
in registers. They're used right away and then discarded.
I thought about writing a small class that contained two unsigned
longs (faking unsigned long long) and writing instead
    ull_t res = mulUL(factor1, factor2);
Or even "constructing" a product
    mul_result_t product(factor1, factor2);
But I figure this will just cost me a stack pointer adjustment every
loop iteration.
Any advice on how to encourage the "best" behaviour here?
--Jonathan
PS The advice about using a profiler can be considered understood. The
more general question still stands: can a small class with a short
lifetime be passed efficiently?