Speeding Up Branches Based on Comparisons Between Floats


rembremading

Hi all,

The "AMD Athlon Processor x86 Code Optimization Guide" gives the following
tips
for "Speeding Up Branches Based on Comparisons Between Floats"

#define FLOAT2INTCAST(f) (*((int *)(&f)))
#define FLOAT2UINTCAST(f) (*((unsigned int *)(&f)))

// comparisons between two floats
if (f1 < f2)   ==>   float t = f1 - f2;
                     if (FLOAT2UINTCAST(t) > 0x80000000U)

if (f1 <= f2)  ==>   float t = f1 - f2;
                     if (FLOAT2INTCAST(t) <= 0)

if (f1 > f2)   ==>   float t = f1 - f2;
                     if (FLOAT2INTCAST(t) > 0)

if (f1 >= f2)  ==>   float t = f1 - f2;
                     if (FLOAT2UINTCAST(t) <= 0x80000000U)
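
[Editor's sketch, not from the guide: the f1 < f2 case wrapped in a self-contained test. The bit reinterpretation is done with memcpy instead of the guide's pointer cast, purely so the snippet is well-defined C; it assumes float and unsigned int are both 32-bit, and float_less is a made-up name. Like the guide's version, it misbehaves for NaN operands and when f1 - f2 underflows to zero.]

#include <stdio.h>
#include <string.h>

/* f1 < f2 via the sign of the difference; bits are read with memcpy
   rather than a pointer cast so the snippet stays well-defined C. */
static int float_less(float f1, float f2)
{
    float t = f1 - f2;
    unsigned int bits;                 /* assumed 32 bits, like float */
    memcpy(&bits, &t, sizeof bits);    /* reinterpret the bits of t   */
    return bits > 0x80000000U;         /* sign bit set and t != -0.0  */
}

int main(void)
{
    printf("%d %d %d\n",
           float_less(1.0f, 2.0f),     /* 1 */
           float_less(2.0f, 1.0f),     /* 0 */
           float_less(3.0f, 3.0f));    /* 0 */
    return 0;
}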

Indeed, I find that this increases the speed of float comparisons (depending on the compiler optimization settings, however). It seems to work only if t is of type float; simply replacing float with double gives erroneous results. How do I have to modify it if I want to work with double variables? (Doing a cast on the double variable first, "float t = (float) d;", before the comparison generally decreases the speed again.)

Best wishes, thanks in advance,
Andreas
 

Bartc

rembremading said:
Hi all,

The "AMD Athlon Processor x86 Code Optimization Guide" gives the following
tips
for "Speeding Up Branches Based on Comparisons Between Floats"

#define FLOAT2INTCAST(f) (*((int *)(&f)))
#define FLOAT2UINTCAST(f) (*((unsigned int *)(&f)))

Have you tried changing these ints to long ints (or whatever is needed for
64 bits)?
// comparisons between two floats
if (f1 < f2)   ==>   float t = f1 - f2;
                     if (FLOAT2UINTCAST(t) > 0x80000000U)

And changing the constant to 0x8000000000000000 (perhaps with whatever
suffix is needed for 64 bits)?
Indeed, I find that this increases the speed of float comparisons (depending on the compiler optimization settings, however).

It seems a lot more complicated than f1 < f2; is it worth the effort? Have you tested how fast doubles are? They might already be faster than using floats.
 

rembremading

Bartc said:
Have you tried changing these ints to long ints (or whatever is needed for
64 bits)?
Yes, I tried this (with long int), but it did not work.
And changing the constant to 0x8000000000000000 (perhaps with whatever
suffix is needed for 64 bits)?
This I did not try, because I am not sure what I have to use here. Any
ideas?
It seems a lot more complicated than f1 < f2; is it worth the effort? Have you tested how fast doubles are? They might already be faster than using floats.

Hard to tell. If I turn optimizations off, it makes a difference with both the Intel and the GNU compiler. Turning optimizations on, on the other hand, makes it hard to distinguish from other effects. (Trivial code without any further operations is simply optimized away by the compilers in that case.)
The speed for just doubles or just floats seems to be the same.
An additional cast cancels the performance gain (and is probably not exactly equivalent to the original conditionals).
 

Bartc

rembremading said:
Yes, I tried this (with long int), but it did not work.

On my compiler I had to use long long int to get 64 bits.
This I did not try, because I am not sure what I have to use here. Any
ideas?

Well, the 0x8000000000000000 I mentioned, and I put ULL at the end.

This simply tests the sign bit in the 64-bit bit pattern of the double value.
It sounds inefficient to me. A brief test did not show any improvement.
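
[Editor's sketch of what putting these two suggestions together might look like. It assumes IEEE-754 doubles and a 64-bit unsigned long long, uses memcpy rather than the guide's pointer cast to stay within defined behavior, and double_less is a made-up name.]

#include <string.h>

/* d1 < d2 via the sign bit (bit 63) of the difference, compared against
   the 0x8000000000000000ULL constant mentioned above. */
static int double_less(double d1, double d2)
{
    double t = d1 - d2;
    unsigned long long bits;              /* assumed to be 64 bits wide */
    memcpy(&bits, &t, sizeof bits);       /* whole 8-byte pattern of t  */
    return bits > 0x8000000000000000ULL;  /* negative and not -0.0      */
}

As with the float version, this is only equivalent to d1 < d2 when neither operand is a NaN and the subtraction does not underflow to zero; and, as noted just above, a brief test showed no improvement.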
 

Kaz Kylheku

rembremading said:
Hi all,

The "AMD Athlon Processor x86 Code Optimization Guide" gives the following
tips
for "Speeding Up Branches Based on Comparisons Between Floats"

Chapter 3 of this document contains mountains of bullshit.

The people at AMD who write these documents should stick to
assembly-language-level advice, since they clearly don't understand
the difference between C and assembly language.
#define FLOAT2INTCAST(f) (*((int *)(&f)))
#define FLOAT2UINTCAST(f) (*((unsigned int *)(&f)))

// comparisons between two floats
if (f1 < f2)   ==>   float t = f1 - f2;
                     if (FLOAT2UINTCAST(t) > 0x80000000U)

The type punning performed here is undefined behavior.

Note also that this document is assuming that you have a braindead compiler
which implements comparisons in a particular way that isn't implemented well on
the Athlon.

The proper way to compensate for this is to correctly use your compiler's
documented extensions for inline assembly language. And of course to
wrap that in #ifdef's so that it's only used for that compiler.
if (f1 <= f2)  ==>   float t = f1 - f2;
                     if (FLOAT2INTCAST(t) <= 0)

if (f1 > f2)   ==>   float t = f1 - f2;
                     if (FLOAT2INTCAST(t) > 0)

if (f1 >= f2)  ==>   float t = f1 - f2;
                     if (FLOAT2UINTCAST(t) <= 0x80000000U)

Indeed, I find that this increases the speed of float comparisons (depending on the compiler optimization settings, however). It seems to work only if t is of type float; simply replacing float with double gives erroneous results.

Is double the same size as int? Is the AMD little-endian or big-endian?
Is the most significant part of a double (the part that can be compared
against 0x800000...) at a lower address or a higher address?
How do I have to modify it if I want to work with double variables? (Doing a cast on the double variable first, "float t = (float) d;", before the comparison generally decreases the speed again.)

The document doesn't make any recommendation about this.
Since they haven't extended their own nonsense to the double type,
it would be silly for anyone else to add his own nonsense.

But if I were to take a stab at it:

/* Kaz heaping bullshit upon bullshit */
#define FLOAT2INTCAST(f) (((int *)(&f + 1))[-1])
#define FLOAT2UINTCAST(f) (((unsigned int *)(&f + 1))[-1])
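
[Editorial aside, not something anyone in the thread suggests: if the point of these macros is simply to inspect the sign of the stored value, C99's signbit() from <math.h> does that in standard C, with no punning and no guess about which word holds the sign. A hedged sketch; double_less_signbit is a made-up name.]

#include <math.h>

/* Sign-of-difference test via C99 signbit(): no type punning, no
   endianness assumption.  Unlike the guide's unsigned comparison,
   signbit() also reports -0.0 as negative, so d1 == -0.0, d2 == +0.0
   answers differently here; NaN operands are mishandled by every
   variant in this thread. */
static int double_less_signbit(double d1, double d2)
{
    return signbit(d1 - d2) != 0;
}

Whether a given compiler turns this into the integer test the guide is after is another question; it only expresses the sign inspection portably.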
 

Tim Prince

rembremading said:
Hi all,

The "AMD Athlon Processor x86 Code Optimization Guide" gives the following
tips
for "Speeding Up Branches Based on Comparisons Between Floats"

#define FLOAT2INTCAST(f) (*((int *)(&f)))
#define FLOAT2UINTCAST(f) (*((unsigned int *)(&f)))

// comparisons between two floats
if (f1 < f2)   ==>   float t = f1 - f2;
                     if (FLOAT2UINTCAST(t) > 0x80000000U)

if (f1 <= f2)  ==>   float t = f1 - f2;
                     if (FLOAT2INTCAST(t) <= 0)

if (f1 > f2)   ==>   float t = f1 - f2;
                     if (FLOAT2INTCAST(t) > 0)

if (f1 >= f2)  ==>   float t = f1 - f2;
                     if (FLOAT2UINTCAST(t) <= 0x80000000U)

Indeed, I find that this increases the speed of float comparisons (depending on the compiler optimization settings, however). It seems to work only if t is of type float; simply replacing float with double gives erroneous results. How do I have to modify it if I want to work with double variables? (Doing a cast on the double variable first, "float t = (float) d;", before the comparison generally decreases the speed again.)

I assume the context for this is the use of compilers which lacked SSE support. SSE2 fixed most of the architectural reasons for introducing such ugly violations of standard C (which has sometimes been cited as the topic of this newsgroup). If you are optimizing for an 8-year-old compiler by departing from standard C, you may have difficulty finding an appropriate forum.
 
