Perl numerics: EPSILON and equality between floats

B

bill

I just recently came across this snippet:

my $EPSILON = 1;
$EPSILON /= 2 while 0.5 + $EPSILON/2 > 0.5;

where $EPSILON was later used, e.g., to compare floats for equality:

sub float_equal {
my ($x, $y) = @_;
return !($x||$y) || ($x+$y) && abs(($x-$y)/($x+$y)) < $EPSILON;
}

(The above is the gist of what I recall, not verbatim production
code, but I think I got it right.)

I'm not very knowledgeable about numerics, so I was wondering
whether this was the best way to compare floats for equality.

I thought that this would be a FAQ, but I couldn't find it.

The first edition of The Perl Cookbook recommends converting both
numbers to the same string format, and then doing a string comparison:

sub equal {
my ($A, $B, $dp) = @_;
return sprintf("%.${dp}g", $A) eq sprintf("%.${dp}g", $B);
}

The main reason I don't like this approach is that, in my experience,
some numerical objects, such as Math::BigFloat and Math::pari,
don't work well with sprintf.

Also, I recall that in C there are pre-defined constants (e.g.
DBL_EPSILON in limits.h) that can be used for this kind of tests.
Is there something similar in Perl?

Many thanks in advance!

bill

P.S. If someone in Perl publishing reads this: a book on Perl
numerics is sorely needed (fundamentals, algorithms, recipes, best
practices) is sorely needed.
 
P

Paul Lalli

bill said:
I'm not very knowledgeable about numerics, so I was wondering
whether this was the best way to compare floats for equality.

Floats should not be compared for "equality". In any language.
I thought that this would be a FAQ, but I couldn't find it.

perldoc -q 999

Paul Lalli
 
X

xhoster

bill said:
I just recently came across this snippet:

my $EPSILON = 1;
$EPSILON /= 2 while 0.5 + $EPSILON/2 > 0.5;

where $EPSILON was later used, e.g., to compare floats for equality:

sub float_equal {
my ($x, $y) = @_;
return !($x||$y) || ($x+$y) && abs(($x-$y)/($x+$y)) < $EPSILON;
}

(The above is the gist of what I recall, not verbatim production
code, but I think I got it right.)

I'm not very knowledgeable about numerics, so I was wondering
whether this was the best way to compare floats for equality.

The best way to compare floats for equality is highly dependent on
why you want to compare floats for equality.

There are no magic bullets.

Xho
 
B

bill

The best way to compare floats for equality is highly dependent on
why you want to compare floats for equality.

That's interesting. I can't think of any examples in which method
X would be better than method Y for some purposes but worse for
others. Can you give an example?

In recent times, I've needed to compare floats in unit tests
(expected vs. obtained). I typically use an approach similar to
the one I described in my OP (i.e. declare two floats as "equal"
if the difference between them is below some tolerance), except
that I just pull the tolerance out of thin air (I usually go for
2**-20, instead of having a halfway defensible procedure for
determining it.

bill
 
A

Anno Siegel

bill said:
I just recently came across this snippet:

my $EPSILON = 1;
$EPSILON /= 2 while 0.5 + $EPSILON/2 > 0.5;

where $EPSILON was later used, e.g., to compare floats for equality:

sub float_equal {
my ($x, $y) = @_;
return !($x||$y) || ($x+$y) && abs(($x-$y)/($x+$y)) < $EPSILON;
}

(The above is the gist of what I recall, not verbatim production
code, but I think I got it right.)

I'm not very knowledgeable about numerics, so I was wondering
whether this was the best way to compare floats for equality.

I thought that this would be a FAQ, but I couldn't find it.

The first edition of The Perl Cookbook recommends converting both
numbers to the same string format, and then doing a string comparison:

sub equal {
my ($A, $B, $dp) = @_;
return sprintf("%.${dp}g", $A) eq sprintf("%.${dp}g", $B);
}

For an ad-hoc approach I simply use "eq" on the float-point arguments
(without sprintf). "If it prints the same, it is the same."
The main reason I don't like this approach is that, in my experience,
some numerical objects, such as Math::BigFloat and Math::pari,
don't work well with sprintf.

For comparison of non-native number formats you'll have to resort to
special methods. Unless the package comes with its own sprintf(),
sprintf-based methods won't work. The epsilon-finding procedure
would probably loop forever.
Also, I recall that in C there are pre-defined constants (e.g.
DBL_EPSILON in limits.h) that can be used for this kind of tests.
Is there something similar in Perl?

Those constants are in POSIX.
P.S. If someone in Perl publishing reads this: a book on Perl
numerics is sorely needed (fundamentals, algorithms, recipes, best
practices) is sorely needed.

Perl doesn't have its own numerics, it uses the native operations
of the C compiler. A book about Perl numerics would have to be one
about computer numerics in general, which abound.

Anno
 
B

bill

In said:
For an ad-hoc approach I simply use "eq" on the float-point arguments
(without sprintf). "If it prints the same, it is the same."

Another objection I have against string-based methods (other than
the fact that it doesn't work well with certain numeric classes)
is the loss of precision incurred by translating to base 10. If
there were a way to do a "binary eq", where the string representation
was in base 2 instead of base 10, your approach would be fine with
me.
Those constants are in POSIX.
Thanks!

If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers.

Is it my imagination or did Google Groups go to the dogs after they
"overhauled it" about a year ago? It's got to be the buggiest of
all Google services.

bill
 
A

Anno Siegel

bill said:
In <[email protected]>


Another objection I have against string-based methods (other than
the fact that it doesn't work well with certain numeric classes)
is the loss of precision incurred by translating to base 10. If
there were a way to do a "binary eq", where the string representation
was in base 2 instead of base 10, your approach would be fine with
me.

But you use rounding, or epsilon fuzz, to get rid of spurious precision.
If you round off only a single decimal digit, you're giving up more
precision than the conversion to decimal costs in the first place.

Something like that "binary eq" could probably be written (in Perl
or otherwise), but what would it gain? In effect, it gives you a
choice of epsilons that are powers of two, instead of powers of ten
like rounding. The practical difference is small.

Anno
 
B

bill

But you use rounding, or epsilon fuzz, to get rid of spurious precision.
If you round off only a single decimal digit, you're giving up more
precision than the conversion to decimal costs in the first place.
Something like that "binary eq" could probably be written (in Perl
or otherwise), but what would it gain? In effect, it gives you a
choice of epsilons that are powers of two, instead of powers of ten
like rounding. The practical difference is small.

The effect is to leave it up to the user to decide, instead of
forcing him/her to rationalize a suboptimal situation after the
fact. The phrase "practical difference" makes a huge blanket
assumption about the task at hand. What's "good enough"? 10%?
1%? 0.0001%? The answer obviously depends on the application, and
my philosophy is to give as much power to the user as possible.
In this case, the best that can be done is to allow a binary
specification of precision, and let the user avail him/herself of
this as desired.

I suppose "binary eq" could be written using pack/unpack, but even
though I've been programming Perl for 8+ years and consider myself
pretty proficient at it, I remain a pack/unpack illiterate. For
some reason, even after I study pack/unpack intensively, its
operations seem mysterious to me, and therefore my minds retains
whatever I learn about it for roughly one nanosecond. Maybe I
should just read the perl source to finally understand it.

bill
 
A

Anno Siegel

bill said:

[snip comparison of floats]
The effect is to leave it up to the user to decide, instead of
forcing him/her to rationalize a suboptimal situation after the
fact. The phrase "practical difference" makes a huge blanket
assumption about the task at hand. What's "good enough"? 10%?
1%? 0.0001%?

Hmmm? Even the rounding approach gives you the choice of exactly
that? I don't get the big diff. The right approach to float
comparison is an epsilon approach, where (in principle) epsilon
must be specified for each comparison. The same goes for rounding,
whether done in decimal or binary. In principle, the number of
places to round must be specified on each comparison. There's your
chance to decide what's good enough. Nothing is being forced on you.

The rounding approach is basically equivalent to epsilon values
that are powers of ten (or two), just more convenient than a
specific comparison procedure. You get fewer choices of epsilon,
but you don't need that many.
The answer obviously depends on the application, and
my philosophy is to give as much power to the user as possible.
In this case, the best that can be done is to allow a binary
specification of precision, and let the user avail him/herself of
this as desired.

Nah... that would depend on the odds and ends of the local representation
of floating point numbers. That's not a good thing for a program to
depend on. Specify the "range of equality" numerically, explicitly
or implicitly through rounding.
I suppose "binary eq" could be written using pack/unpack, but even
though I've been programming Perl for 8+ years and consider myself
pretty proficient at it, I remain a pack/unpack illiterate. For
some reason, even after I study pack/unpack intensively, its
operations seem mysterious to me, and therefore my minds retains
whatever I learn about it for roughly one nanosecond. Maybe I
should just read the perl source to finally understand it.

That's not the way to understand pack/unpack. To learn about them,
play with them. Few people learn all the pack/unpack specifications
by heart, that's something to look up in each case. But it's useful
to acquaint oneself with the basic possibilities, so write a few test
cases and see what you get. It isn't used often, but for some things
it is just the ticket. One of them is reading of fixed-column data,
another (somewhat surprisingly) is counting the one-bits in a bit
string. Pack/unpack might be used in a "binary compare", but not
before we know what exactly it does.

Anno
 
X

xhoster

bill said:
That's interesting. I can't think of any examples in which method
X would be better than method Y for some purposes but worse for
others.

You mean in general, or just in this context?
Can you give an example?

If you are running an optimization, or an interation-until-convergence,
where the score function loses all physical significance after 3 digits but
you converge out to 28 digits, merely because your machine supports that
precision, you are likely to waste hours or decades of computer time for no
purpose.
In recent times, I've needed to compare floats in unit tests
(expected vs. obtained). I typically use an approach similar to
the one I described in my OP (i.e. declare two floats as "equal"
if the difference between them is below some tolerance), except
that I just pull the tolerance out of thin air (I usually go for
2**-20, instead of having a halfway defensible procedure for
determining it.

The defensible tolerance level is the one that makes sense given the
physical interpretation of the floats. Otherwise your algorithm
which gives a perfectly good answer will be deemed wrong simply because
your machine precision is high enough to detect a meaningless difference.
Or an algorithm that gives a fatally wrong answer (on a particular machine)
will be deemed right merely because your machine precision is too low to
realize it is wrong.

Xho
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,767
Messages
2,569,572
Members
45,046
Latest member
Gavizuho

Latest Threads

Top