# Perl numerics: EPSILON and equality between floats

Discussion in 'Perl Misc' started by bill, Aug 11, 2005.

1. ### billGuest

I just recently came across this snippet:

my \$EPSILON = 1;
\$EPSILON /= 2 while 0.5 + \$EPSILON/2 > 0.5;

where \$EPSILON was later used, e.g., to compare floats for equality:

sub float_equal {
my (\$x, \$y) = @_;
return !(\$x||\$y) || (\$x+\$y) && abs((\$x-\$y)/(\$x+\$y)) < \$EPSILON;
}

(The above is the gist of what I recall, not verbatim production
code, but I think I got it right.)

I'm not very knowledgeable about numerics, so I was wondering
whether this was the best way to compare floats for equality.

I thought that this would be a FAQ, but I couldn't find it.

The first edition of The Perl Cookbook recommends converting both
numbers to the same string format, and then doing a string comparison:

sub equal {
my (\$A, \$B, \$dp) = @_;
return sprintf("%.\${dp}g", \$A) eq sprintf("%.\${dp}g", \$B);
}

The main reason I don't like this approach is that, in my experience,
some numerical objects, such as Math::BigFloat and Math:ari,
don't work well with sprintf.

Also, I recall that in C there are pre-defined constants (e.g.
DBL_EPSILON in limits.h) that can be used for this kind of tests.
Is there something similar in Perl?

bill

P.S. If someone in Perl publishing reads this: a book on Perl
numerics is sorely needed (fundamentals, algorithms, recipes, best
practices) is sorely needed.

bill, Aug 11, 2005

2. ### Paul LalliGuest

bill wrote:
> I'm not very knowledgeable about numerics, so I was wondering
> whether this was the best way to compare floats for equality.

Floats should not be compared for "equality". In any language.

> I thought that this would be a FAQ, but I couldn't find it.

perldoc -q 999

Paul Lalli

Paul Lalli, Aug 11, 2005

3. ### Guest

bill <> wrote:
> I just recently came across this snippet:
>
> my \$EPSILON = 1;
> \$EPSILON /= 2 while 0.5 + \$EPSILON/2 > 0.5;
>
> where \$EPSILON was later used, e.g., to compare floats for equality:
>
> sub float_equal {
> my (\$x, \$y) = @_;
> return !(\$x||\$y) || (\$x+\$y) && abs((\$x-\$y)/(\$x+\$y)) < \$EPSILON;
> }
>
> (The above is the gist of what I recall, not verbatim production
> code, but I think I got it right.)
>
> I'm not very knowledgeable about numerics, so I was wondering
> whether this was the best way to compare floats for equality.

The best way to compare floats for equality is highly dependent on
why you want to compare floats for equality.

There are no magic bullets.

Xho

--
Usenet Newsgroup Service \$9.95/Month 30GB

, Aug 11, 2005
4. ### billGuest

In <20050811160034.410\$> writes:

>bill <> wrote:
>> I just recently came across this snippet:
>>
>> my \$EPSILON = 1;
>> \$EPSILON /= 2 while 0.5 + \$EPSILON/2 > 0.5;
>>
>> where \$EPSILON was later used, e.g., to compare floats for equality:
>>
>> sub float_equal {
>> my (\$x, \$y) = @_;
>> return !(\$x||\$y) || (\$x+\$y) && abs((\$x-\$y)/(\$x+\$y)) < \$EPSILON;
>> }
>>
>> (The above is the gist of what I recall, not verbatim production
>> code, but I think I got it right.)
>>
>> I'm not very knowledgeable about numerics, so I was wondering
>> whether this was the best way to compare floats for equality.

>The best way to compare floats for equality is highly dependent on
>why you want to compare floats for equality.

That's interesting. I can't think of any examples in which method
X would be better than method Y for some purposes but worse for
others. Can you give an example?

In recent times, I've needed to compare floats in unit tests
(expected vs. obtained). I typically use an approach similar to
the one I described in my OP (i.e. declare two floats as "equal"
if the difference between them is below some tolerance), except
that I just pull the tolerance out of thin air (I usually go for
2**-20, instead of having a halfway defensible procedure for
determining it.

bill

bill, Aug 11, 2005
5. ### Anno SiegelGuest

bill <> wrote in comp.lang.perl.misc:
>
>
>
>
> I just recently came across this snippet:
>
> my \$EPSILON = 1;
> \$EPSILON /= 2 while 0.5 + \$EPSILON/2 > 0.5;
>
> where \$EPSILON was later used, e.g., to compare floats for equality:
>
> sub float_equal {
> my (\$x, \$y) = @_;
> return !(\$x||\$y) || (\$x+\$y) && abs((\$x-\$y)/(\$x+\$y)) < \$EPSILON;
> }
>
> (The above is the gist of what I recall, not verbatim production
> code, but I think I got it right.)
>
> I'm not very knowledgeable about numerics, so I was wondering
> whether this was the best way to compare floats for equality.
>
> I thought that this would be a FAQ, but I couldn't find it.
>
> The first edition of The Perl Cookbook recommends converting both
> numbers to the same string format, and then doing a string comparison:
>
> sub equal {
> my (\$A, \$B, \$dp) = @_;
> return sprintf("%.\${dp}g", \$A) eq sprintf("%.\${dp}g", \$B);
> }

For an ad-hoc approach I simply use "eq" on the float-point arguments
(without sprintf). "If it prints the same, it is the same."

> The main reason I don't like this approach is that, in my experience,
> some numerical objects, such as Math::BigFloat and Math:ari,
> don't work well with sprintf.

For comparison of non-native number formats you'll have to resort to
special methods. Unless the package comes with its own sprintf(),
sprintf-based methods won't work. The epsilon-finding procedure
would probably loop forever.

> Also, I recall that in C there are pre-defined constants (e.g.
> DBL_EPSILON in limits.h) that can be used for this kind of tests.
> Is there something similar in Perl?

Those constants are in POSIX.

> P.S. If someone in Perl publishing reads this: a book on Perl
> numerics is sorely needed (fundamentals, algorithms, recipes, best
> practices) is sorely needed.

Perl doesn't have its own numerics, it uses the native operations
of the C compiler. A book about Perl numerics would have to be one
about computer numerics in general, which abound.

Anno
--
If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the

Anno Siegel, Aug 12, 2005
6. ### billGuest

In <ddhtkk\$iuq\$-Berlin.DE> -berlin.de (Anno Siegel) writes:

>For an ad-hoc approach I simply use "eq" on the float-point arguments
>(without sprintf). "If it prints the same, it is the same."

Another objection I have against string-based methods (other than
the fact that it doesn't work well with certain numeric classes)
is the loss of precision incurred by translating to base 10. If
there were a way to do a "binary eq", where the string representation
was in base 2 instead of base 10, your approach would be fine with
me.

>> Also, I recall that in C there are pre-defined constants (e.g.
>> DBL_EPSILON in limits.h) that can be used for this kind of tests.
>> Is there something similar in Perl?

>Those constants are in POSIX.

Thanks!

>If you want to post a followup via groups.google.com, don't use
>the broken "Reply" link at the bottom of the article. Click on
>"show options" at the top of the article, then click on the

Is it my imagination or did Google Groups go to the dogs after they
"overhauled it" about a year ago? It's got to be the buggiest of

bill

bill, Aug 12, 2005
7. ### Anno SiegelGuest

bill <> wrote in comp.lang.perl.misc:
> In <ddhtkk\$iuq\$-Berlin.DE>
> -berlin.de (Anno Siegel) writes:
>
> >For an ad-hoc approach I simply use "eq" on the float-point arguments
> >(without sprintf). "If it prints the same, it is the same."

>
> Another objection I have against string-based methods (other than
> the fact that it doesn't work well with certain numeric classes)
> is the loss of precision incurred by translating to base 10. If
> there were a way to do a "binary eq", where the string representation
> was in base 2 instead of base 10, your approach would be fine with
> me.

But you use rounding, or epsilon fuzz, to get rid of spurious precision.
If you round off only a single decimal digit, you're giving up more
precision than the conversion to decimal costs in the first place.

Something like that "binary eq" could probably be written (in Perl
or otherwise), but what would it gain? In effect, it gives you a
choice of epsilons that are powers of two, instead of powers of ten
like rounding. The practical difference is small.

Anno
--
If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the

Anno Siegel, Aug 12, 2005
8. ### billGuest

In <ddi69e\$o30\$-Berlin.DE> -berlin.de (Anno Siegel) writes:

>bill <> wrote in comp.lang.perl.misc:
>> In <ddhtkk\$iuq\$-Berlin.DE>
>> -berlin.de (Anno Siegel) writes:
>>
>> >For an ad-hoc approach I simply use "eq" on the float-point arguments
>> >(without sprintf). "If it prints the same, it is the same."

>>
>> Another objection I have against string-based methods (other than
>> the fact that it doesn't work well with certain numeric classes)
>> is the loss of precision incurred by translating to base 10. If
>> there were a way to do a "binary eq", where the string representation
>> was in base 2 instead of base 10, your approach would be fine with
>> me.

>But you use rounding, or epsilon fuzz, to get rid of spurious precision.
>If you round off only a single decimal digit, you're giving up more
>precision than the conversion to decimal costs in the first place.

>Something like that "binary eq" could probably be written (in Perl
>or otherwise), but what would it gain? In effect, it gives you a
>choice of epsilons that are powers of two, instead of powers of ten
>like rounding. The practical difference is small.

The effect is to leave it up to the user to decide, instead of
forcing him/her to rationalize a suboptimal situation after the
fact. The phrase "practical difference" makes a huge blanket
1%? 0.0001%? The answer obviously depends on the application, and
my philosophy is to give as much power to the user as possible.
In this case, the best that can be done is to allow a binary
specification of precision, and let the user avail him/herself of
this as desired.

I suppose "binary eq" could be written using pack/unpack, but even
though I've been programming Perl for 8+ years and consider myself
pretty proficient at it, I remain a pack/unpack illiterate. For
some reason, even after I study pack/unpack intensively, its
operations seem mysterious to me, and therefore my minds retains
whatever I learn about it for roughly one nanosecond. Maybe I
should just read the perl source to finally understand it.

bill

bill, Aug 12, 2005
9. ### Anno SiegelGuest

bill <> wrote in comp.lang.perl.misc:
> In <ddi69e\$o30\$-Berlin.DE>
> -berlin.de (Anno Siegel) writes:
>
> >bill <> wrote in comp.lang.perl.misc:
> >> In <ddhtkk\$iuq\$-Berlin.DE>
> >> -berlin.de (Anno Siegel) writes:

[snip comparison of floats]

> >Something like that "binary eq" could probably be written (in Perl
> >or otherwise), but what would it gain? In effect, it gives you a
> >choice of epsilons that are powers of two, instead of powers of ten
> >like rounding. The practical difference is small.

>
> The effect is to leave it up to the user to decide, instead of
> forcing him/her to rationalize a suboptimal situation after the
> fact. The phrase "practical difference" makes a huge blanket
> 1%? 0.0001%?

Hmmm? Even the rounding approach gives you the choice of exactly
that? I don't get the big diff. The right approach to float
comparison is an epsilon approach, where (in principle) epsilon
must be specified for each comparison. The same goes for rounding,
whether done in decimal or binary. In principle, the number of
places to round must be specified on each comparison. There's your
chance to decide what's good enough. Nothing is being forced on you.

The rounding approach is basically equivalent to epsilon values
that are powers of ten (or two), just more convenient than a
specific comparison procedure. You get fewer choices of epsilon,
but you don't need that many.

> The answer obviously depends on the application, and
> my philosophy is to give as much power to the user as possible.
> In this case, the best that can be done is to allow a binary
> specification of precision, and let the user avail him/herself of
> this as desired.

Nah... that would depend on the odds and ends of the local representation
of floating point numbers. That's not a good thing for a program to
depend on. Specify the "range of equality" numerically, explicitly
or implicitly through rounding.

> I suppose "binary eq" could be written using pack/unpack, but even
> though I've been programming Perl for 8+ years and consider myself
> pretty proficient at it, I remain a pack/unpack illiterate. For
> some reason, even after I study pack/unpack intensively, its
> operations seem mysterious to me, and therefore my minds retains
> whatever I learn about it for roughly one nanosecond. Maybe I
> should just read the perl source to finally understand it.

That's not the way to understand pack/unpack. To learn about them,
play with them. Few people learn all the pack/unpack specifications
by heart, that's something to look up in each case. But it's useful
to acquaint oneself with the basic possibilities, so write a few test
cases and see what you get. It isn't used often, but for some things
it is just the ticket. One of them is reading of fixed-column data,
another (somewhat surprisingly) is counting the one-bits in a bit
string. Pack/unpack might be used in a "binary compare", but not
before we know what exactly it does.

Anno
--
If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the

Anno Siegel, Aug 12, 2005
10. ### Guest

bill <> wrote:
> In <20050811160034.410\$> writes:
>
> >bill <> wrote:
> >> I just recently came across this snippet:
> >>
> >> my \$EPSILON = 1;
> >> \$EPSILON /= 2 while 0.5 + \$EPSILON/2 > 0.5;
> >>
> >> where \$EPSILON was later used, e.g., to compare floats for equality:
> >>
> >> sub float_equal {
> >> my (\$x, \$y) = @_;
> >> return !(\$x||\$y) || (\$x+\$y) && abs((\$x-\$y)/(\$x+\$y)) < \$EPSILON;
> >> }
> >>
> >> (The above is the gist of what I recall, not verbatim production
> >> code, but I think I got it right.)
> >>
> >> I'm not very knowledgeable about numerics, so I was wondering
> >> whether this was the best way to compare floats for equality.

>
> >The best way to compare floats for equality is highly dependent on
> >why you want to compare floats for equality.

>
> That's interesting. I can't think of any examples in which method
> X would be better than method Y for some purposes but worse for
> others.

You mean in general, or just in this context?

> Can you give an example?

If you are running an optimization, or an interation-until-convergence,
where the score function loses all physical significance after 3 digits but
you converge out to 28 digits, merely because your machine supports that
precision, you are likely to waste hours or decades of computer time for no
purpose.

>
> In recent times, I've needed to compare floats in unit tests
> (expected vs. obtained). I typically use an approach similar to
> the one I described in my OP (i.e. declare two floats as "equal"
> if the difference between them is below some tolerance), except
> that I just pull the tolerance out of thin air (I usually go for
> 2**-20, instead of having a halfway defensible procedure for
> determining it.

The defensible tolerance level is the one that makes sense given the
physical interpretation of the floats. Otherwise your algorithm
which gives a perfectly good answer will be deemed wrong simply because
your machine precision is high enough to detect a meaningless difference.
Or an algorithm that gives a fatally wrong answer (on a particular machine)
will be deemed right merely because your machine precision is too low to
realize it is wrong.

Xho

--