Speeding Up Branches Based on Comparisons Between Floats


rembremading

Hi all,

The "AMD Athlon Processor x86 Code Optimization Guide" gives the following
tips
for "Speeding Up Branches Based on Comparisons Between Floats"

#define FLOAT2INTCAST(f) (*((int *)(&f)))
#define FLOAT2UINTCAST(f) (*((unsigned int *)(&f)))

// comparisons between two floats
if (f1 < f2)   ==>   float t = f1 - f2;
                     if (FLOAT2UINTCAST(t) > 0x80000000U)

if (f1 <= f2)  ==>   float t = f1 - f2;
                     if (FLOAT2INTCAST(t) <= 0)

if (f1 > f2)   ==>   float t = f1 - f2;
                     if (FLOAT2INTCAST(t) > 0)

if (f1 >= f2)  ==>   float t = f1 - f2;
                     if (FLOAT2UINTCAST(t) <= 0x80000000U)
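
[Editor's sketch, not from the guide: the f1 < f2 case wrapped in a self-contained test. The bit reinterpretation is done with memcpy instead of the guide's pointer cast, purely so the snippet is well-defined C; it assumes float and unsigned int are both 32-bit, and float_less is a made-up name. Like the guide's version, it misbehaves for NaN operands and when f1 - f2 underflows to zero.]

#include <stdio.h>
#include <string.h>

/* f1 < f2 via the sign of the difference; bits are read with memcpy
   rather than a pointer cast so the snippet stays well-defined C. */
static int float_less(float f1, float f2)
{
    float t = f1 - f2;
    unsigned int bits;                 /* assumed 32 bits, like float */
    memcpy(&bits, &t, sizeof bits);    /* reinterpret the bits of t   */
    return bits > 0x80000000U;         /* sign bit set and t != -0.0  */
}

int main(void)
{
    printf("%d %d %d\n",
           float_less(1.0f, 2.0f),     /* 1 */
           float_less(2.0f, 1.0f),     /* 0 */
           float_less(3.0f, 3.0f));    /* 0 */
    return 0;
}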

Indeed, I find that this increases the speed of float comparisons (depending on the compiler optimization settings, however). It seems to work only if t is of type float; simply replacing float with double gives erroneous results. How do I have to modify it if I want to work with double variables? (Doing a cast on the double variable first, "float t = (float) d;", before the comparison generally decreases the speed again.)

Best wishes, thanks in advance,
Andreas
 

Bartc

rembremading said:
Hi all,

The "AMD Athlon Processor x86 Code Optimization Guide" gives the following
tips
for "Speeding Up Branches Based on Comparisons Between Floats"

#define FLOAT2INTCAST(f) (*((int *)(&f)))
#define FLOAT2UINTCAST(f) (*((unsigned int *)(&f)))

Have you tried changing these ints to long ints (or whatever is needed for
64 bits)?
// comparisons between two floats
if (f1 < f2)   ==>   float t = f1 - f2;
                     if (FLOAT2UINTCAST(t) > 0x80000000U)

And changing the constant to 0x8000000000000000 (perhaps with whatever
suffix is needed for 64 bits)?
Indeed, I find that this increases the speed of float comparisons (depending on the compiler optimization settings, however).

It seems a lot more complicated than f1 < f2; is it worth the effort? Have you tested how fast doubles are? They might already be faster than using floats.
 

rembremading

Bartc said:
Have you tried changing these ints to long ints (or whatever is needed for
64 bits)?
Yes, I tried this (with long int), but it did not work.
And changing the constant to 0x8000000000000000 (perhaps with whatever
suffix is needed for 64 bits)?
This I did not try, because I am not sure what I have to use here. Any
ideas?
It seems a lot more complicated than f1 < f2; is it worth the effort? Have you tested how fast doubles are? They might already be faster than using floats.

Hard to tell. If I turn optimizations off, it makes a difference with both the Intel and the GNU compiler. Turning optimizations on, on the other hand, makes it hard to distinguish from other effects. (Trivial code without any further operations is simply optimized away by the compilers in that case.)
The speed for just doubles or just floats seems to be the same.
An additional cast cancels the performance gain (and is probably not exactly equivalent to the original conditionals).
 

Bartc

rembremading said:
Yes, I tried this (with long int), but it did not work.

On my compiler I had to use long long int to get 64 bits.
This I did not try, because I am not sure what I have to use here. Any
ideas?

Well, the 0x8000000000000000 I mentioned, and I put ULL at the end.

This simply tests the sign bit in the 64-bit bit pattern of the double value.
It sounds inefficient to me. A brief test did not show any improvement.
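
[Editor's sketch of what putting these two suggestions together might look like. It assumes IEEE-754 doubles and a 64-bit unsigned long long, uses memcpy rather than the guide's pointer cast to stay within defined behavior, and double_less is a made-up name.]

#include <string.h>

/* d1 < d2 via the sign bit (bit 63) of the difference, compared against
   the 0x8000000000000000ULL constant mentioned above. */
static int double_less(double d1, double d2)
{
    double t = d1 - d2;
    unsigned long long bits;              /* assumed to be 64 bits wide */
    memcpy(&bits, &t, sizeof bits);       /* whole 8-byte pattern of t  */
    return bits > 0x8000000000000000ULL;  /* negative and not -0.0      */
}

As with the float version, this is only equivalent to d1 < d2 when neither operand is a NaN and the subtraction does not underflow to zero; and, as noted just above, a brief test showed no improvement.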
 

Kaz Kylheku

rembremading said:
Hi all,

The "AMD Athlon Processor x86 Code Optimization Guide" gives the following
tips
for "Speeding Up Branches Based on Comparisons Between Floats"

Chapter 3 of this document contains mountains of bullshit.

The people at AMD who write these documents should stick to
assembly-language-level advice, since they clearly don't understand
the difference between C and assembly language.
#define FLOAT2INTCAST(f) (*((int *)(&f)))
#define FLOAT2UINTCAST(f) (*((unsigned int *)(&f)))

// comparisons between two floats
if (f1 < f2)   ==>   float t = f1 - f2;
                     if (FLOAT2UINTCAST(t) > 0x80000000U)

The type punning performed here is undefined behavior.

Note also that this document is assuming that you have a braindead compiler
which implements comparisons in a particular way that isn't implemented well on
the Athlon.

The proper way to compensate for this is to correctly use your compiler's
documented extensions for inline assembly language. And of course to
wrap that in #ifdef's so that it's only used for that compiler.
if (f1 <= f2)  ==>   float t = f1 - f2;
                     if (FLOAT2INTCAST(t) <= 0)

if (f1 > f2)   ==>   float t = f1 - f2;
                     if (FLOAT2INTCAST(t) > 0)

if (f1 >= f2)  ==>   float t = f1 - f2;
                     if (FLOAT2UINTCAST(t) <= 0x80000000U)

Indeed, I find that this increases the speed of float comparisons (depending on the compiler optimization settings, however). It seems to work only if t is of type float; simply replacing float with double gives erroneous results.

Is double the same size as int? Is the AMD little-endian or big-endian?
Is the most significant part of a double (the part that can be compared
against 0x800000...) at a lower address or a higher address?
How do I have to modify it if I want to work with double variables? (Doing a cast on the double variable first, "float t = (float) d;", before the comparison generally decreases the speed again.)

The document doesn't make any recommendation about this.
Since they haven't extended their own nonsense to the double type,
it would be silly for anyone else to add his own nonsense.

But if I were to take a stab at it:

/* Kaz heaping bullshit upon bullshit */
#define FLOAT2INTCAST(f) (((int *)(&f + 1))[-1])
#define FLOAT2UINTCAST(f) (((unsigned int *)(&f + 1))[-1])
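
[Editorial aside, not something anyone in the thread suggests: if the point of these macros is simply to inspect the sign of the stored value, C99's signbit() from <math.h> does that in standard C, with no punning and no guess about which word holds the sign. A hedged sketch; double_less_signbit is a made-up name.]

#include <math.h>

/* Sign-of-difference test via C99 signbit(): no type punning, no
   endianness assumption.  Unlike the guide's unsigned comparison,
   signbit() also reports -0.0 as negative, so d1 == -0.0, d2 == +0.0
   answers differently here; NaN operands are mishandled by every
   variant in this thread. */
static int double_less_signbit(double d1, double d2)
{
    return signbit(d1 - d2) != 0;
}

Whether a given compiler turns this into the integer test the guide is after is another question; it only expresses the sign inspection portably.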
 

Tim Prince

rembremading said:
Hi all,

The "AMD Athlon Processor x86 Code Optimization Guide" gives the following
tips
for "Speeding Up Branches Based on Comparisons Between Floats"

#define FLOAT2INTCAST(f) (*((int *)(&f)))
#define FLOAT2UINTCAST(f) (*((unsigned int *)(&f)))

// comparisons between two floats
if (f1 < f2)   ==>   float t = f1 - f2;
                     if (FLOAT2UINTCAST(t) > 0x80000000U)

if (f1 <= f2)  ==>   float t = f1 - f2;
                     if (FLOAT2INTCAST(t) <= 0)

if (f1 > f2)   ==>   float t = f1 - f2;
                     if (FLOAT2INTCAST(t) > 0)

if (f1 >= f2)  ==>   float t = f1 - f2;
                     if (FLOAT2UINTCAST(t) <= 0x80000000U)

Indeed, I find that this increases the speed of float comparisons (depending on the compiler optimization settings, however). It seems to work only if t is of type float; simply replacing float with double gives erroneous results. How do I have to modify it if I want to work with double variables? (Doing a cast on the double variable first, "float t = (float) d;", before the comparison generally decreases the speed again.)

I assume the context for this is the use of compilers which lacked SSE support. SSE2 fixed most of the architectural reasons for introducing such ugly violations of standard C (which has sometimes been cited as the topic of this newsgroup). If you are optimizing for an 8-year-old compiler by departing from standard C, you may have difficulty finding an appropriate forum.
 
