upping numerical precision

rpardee · Mar 16, 2009

Hey All,

I got me this fancy method for classifying documents that basically
does this at one point:

p = 1
words.each do |w|
  p *= calc_prob(w)
end
chi = -2.0 * Math.log(p)

I'm finding that p is often going to 0.0 b/c the numbers returned by
calc_prob are sometimes outlandishly small (or there are just so many
words in the doc that the loop runs long enough to zero out the
variable p). This causes problems for the call to Math.log of course
(e.g., Errno::EDOM).
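
For anyone who wants to reproduce the underflow, here is a minimal sketch. The naive_product helper and the fixed probability of 1.0e-3 are made up for illustration; they stand in for the real calc_prob, which is not shown in this thread. On current Rubies Math.log(0.0) returns -Infinity rather than raising, so the symptom may look different from the Errno::EDOM above.

```ruby
# Hypothetical stand-in for the loop above: multiply a list of
# probabilities together using ordinary Float arithmetic.
def naive_product(probs)
  p = 1.0
  probs.each { |x| p *= x }
  p
end

# 200 words, each with probability 1e-3: the true product is 1e-600,
# far below the smallest positive Float, so the result collapses to 0.0.
probs = Array.new(200, 1.0e-3)
product = naive_product(probs)

puts product                   # 0.0
puts Math.log(product)         # -Infinity on current Rubies
puts(-2.0 * Math.log(product)) # Infinity, so chi is useless
```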

I have tried two things. First, after some desperate flailing on
google I added:

require 'rational'
require 'mathn'

to my script and hoped that ruby would read my mind WRT using
rationals where possible & that rationals would extend the reach of
ruby's arithmetic into the too-outlandishly-small-for-floats range.

When that did not seem to avail me, I put this in my words.each loop:

if p == 0.0 then
  p = Float::MIN
end

That works, but makes me wonder if there's a smarter thing to do w/
those rational and mathn libs to really get the effect I hoped for
just from including them in my script.

Is there?

Many thanks!

-Roy

Siep Korteling · Mar 16, 2009

Are you sure you required 'mathn' before defining your calc_prob method?

big = 10**100

small = 1/big
p small.zero? # true

require 'mathn'
small = 1/big
p small.zero? # false
p small.class # Rational

p(-2.0 * Math.log(small))
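
If mathn is not available (as far as I know it is no longer bundled with newer Rubies), the same effect can be had by building the Rational explicitly. This is only a sketch: tiny_rational is a hypothetical helper, not something from the script being discussed. It also shows a catch: as soon as a Float is mixed in, the result is a Float again, so rationals only help if every factor is a Rational.

```ruby
# Hypothetical helper: build 1 / 10**exponent as an exact Rational
# instead of relying on mathn to change what Integer#/ returns.
def tiny_rational(exponent)
  Rational(1, 10**exponent)
end

small = tiny_rational(100)
puts small.zero?   # false: the value is stored exactly
puts small.class   # Rational

# Convert to Float only at the point where the logarithm is taken.
puts(-2.0 * Math.log(small.to_f))

# Mixing a Float into the arithmetic drops back to Float.
puts((small * 0.5).class)   # Float
```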

hth,

Siep

Roy Pardee · Mar 17, 2009

Thanks for the response! I think the issue may be that I'm not doing
any division--just multiplication. Check it out:

irb(main):001:0> require 'mathn'
=> true
irb(main):002:0> x = 0.5
=> 0.5
irb(main):003:0> 1000.times do
irb(main):004:1* x *= x
irb(main):005:1> end
=> 1000
irb(main):006:0> x
=> 0.0
irb(main):007:0> x.class
=> Float
irb(main):008:0>

But the more I think about it, the more I think I'm fussing over
nothing (ha ha!). I think if my p var goes to zero, I should just set
it = Float::MIN & break out of that loop. My calc_prob method will
only ever return values <= 1, so there's no sense in letting it
continue to spin down the value of p (if you can tell what I'm trying
to say).
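
Here is a rough sketch of that clamp-and-break idea. The name chi_with_floor and the list of probabilities are invented for the example; the real code would call calc_prob per word. One consequence worth knowing: every document that bottoms out gets the same chi, namely -2.0 * Math.log(Float::MIN), which is roughly 1416.8, so such documents can no longer be told apart.

```ruby
# Hypothetical version of the loop with the floor applied:
# once the running product hits 0.0, pin it to Float::MIN and stop.
def chi_with_floor(probs)
  p = 1.0
  probs.each do |x|
    p *= x
    if p == 0.0
      p = Float::MIN   # smallest positive normalized Float
      break            # factors are <= 1, so p cannot grow again
    end
  end
  -2.0 * Math.log(p)
end

puts chi_with_floor([0.5, 0.5])                # small input, no clamping
puts chi_with_floor(Array.new(200, 1.0e-3))    # clamped
puts chi_with_floor(Array.new(5000, 1.0e-3))   # clamped to the same value
```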

Thanks!

-Roy

t3ch.dude · Mar 17, 2009

Roy,

It all depends on how much range of data you want. If you need more
granularity at the tiny end, you can always re-normalize... just
initialize p to be 1e6 or something, rather than 1. Then after the log
you can just subtract the constant exponent to get back to your
original range.
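
A sketch of what that re-normalization might look like, assuming a made-up helper scaled_chi and a scale of 1e6 as suggested above. Note that a constant scale only buys about log10(scale) extra decimal orders of magnitude, so a long enough document will still reach zero.

```ruby
# Hypothetical helper: start the running product at `scale` instead of 1,
# then subtract log(scale) after taking the logarithm.
def scaled_chi(probs, scale = 1.0e6)
  p = scale
  probs.each { |x| p *= x }
  -2.0 * (Math.log(p) - Math.log(scale))
end

# The true product here is 1e-326, which an unscaled Float product
# rounds to 0.0. Starting from 1e6 keeps the running value just above zero.
probs = [1.0e-100, 1.0e-100, 1.0e-100, 1.0e-26]

puts probs.inject(1.0) { |acc, x| acc * x }   # 0.0 without scaling
puts scaled_chi(probs)                        # finite, close to -2 * log(1e-326)
```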

-t3ch.dude

Sander Land · Mar 17, 2009

I don't think you need more precision. Basic math can help you here:
log(a*b) = log(a) + log(b)

so

logp = 0
words.each do |w|
  logp += Math.log( calc_prob(w) )
end
chi = -2.0 * logp
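
To see the difference in practice, here is a small self-contained sketch. The method name chi_from_logs and the fixed probability of 1.0e-3 are assumptions for the demo; in the real script each value would come from calc_prob(w). One thing to watch: if a probability is ever exactly 0, its log is -Infinity on current Rubies, so that case still needs handling.

```ruby
# Hypothetical helper: sum the logs of the probabilities instead of
# multiplying the probabilities and taking one log at the end.
def chi_from_logs(probs)
  logp = 0.0
  probs.each { |x| logp += Math.log(x) }
  -2.0 * logp
end

# 200 words at probability 1e-3: the plain product underflows to 0.0,
# but the sum of logs is an ordinary Float of modest size.
probs = Array.new(200, 1.0e-3)

puts probs.inject(1.0) { |acc, x| acc * x }   # 0.0
puts chi_from_logs(probs)                     # about 2763.1
```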
