Can multi-core improve a single function?


oyster

I mean this
Code:
def fib(n):
    if n<=1:
        return 1
    return fib(n-1)+fib(n-2)

useCore(1)
timeit(fib(500)) #this shows 20 seconds

useCore(2)
timeit(fib(500)) #this shows 10 seconds

Is it possible?

and more, can threads/multi-processors/clusters be used to improve fib?

thanx
 

Steven D'Aprano

I mean this
Code:
def fib(n):
    if n<=1:
        return 1
    return fib(n-1)+fib(n-2)

useCore(1)
timeit(fib(500)) #this shows 20 seconds

useCore(2)
timeit(fib(500)) #this shows 10 seconds

Is it possible?

What does useCore(n) mean?

and more, can threads/multi-processors/clusters be used to improve fib?

No. The best way to improve fib is to improve fib, not to try running it
on faster hardware.

Your algorithm makes *many* recursive calls to fib() for each call:

fib(1) => 1 function call
fib(2) => 3 function calls
fib(3) => 5 function calls
fib(4) => 9 function calls
fib(5) => 15 function calls
fib(6) => 25 function calls
fib(7) => 41 function calls
fib(8) => 67 function calls
fib(9) => 109 function calls
....
fib(34) => 18454929 function calls
fib(35) => 29860703 function calls
fib(36) => 48315633 function calls
....
fib(498) => 1.723e+104 function calls
fib(499) => 2.788e+104 function calls
fib(500) => 4.511e+104 function calls
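
Those counts follow the recurrence calls(n) = 1 + calls(n-1) + calls(n-2),
with calls(0) = calls(1) = 1. A minimal sketch to reproduce the table
(call_count is just an illustrative helper, not part of the original post):

def call_count(n):
    # number of calls the naive recursion makes for fib(n)
    a = b = 1  # calls made by fib(0) and fib(1)
    for _ in range(n - 1):
        a, b = b, 1 + a + b
    return b if n > 1 else 1

for i in range(1, 10):
    print("fib(%d) => %d function calls" % (i, call_count(i)))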

Calling fib(500) the way you do will make more than 450 thousand billion
billion billion billion billion billion billion billion billion billion
billion function calls. You claim that you calculated it in just 20
seconds. On my PC, it takes over a minute to calculate fib(38). I
estimate it would take over five hours to calculate fib(50), and fib(100)
would probably take 16 million years. I call shenanigans.
 

Gerhard Weis

I once saw Haskell code that ran Fibonacci on a 4-core system.

The algorithm itself basically used an extra granularity parameter that
controls down to which n new threads are sparked. E.g. fib(40, 36)
means: calculate fib(40) and spark threads until n=36.
1. divide: fib(n-1), fib(n-2)
2. divide: fib(n-2), fib(n-3)
3. divide: fib(n-3), fib(n-4)

We tried this code on a 4 core machine using only 1 core and all 4 cores.
1 core wallclock: ~10s
4 core wallclock: ~3s

So there is a significant speedup, but I guess this only works out of
the box with Haskell.
I have not tried it, but I think you can simulate Haskell's behaviour
by caching the results of fib(n).
(We tried the same algorithm in Java. While it utilized all 4 cores,
there was no speedup. This is why I think that result caching is the
only way to speed up Fibonacci.)
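
For illustration, a rough Python analogue of the granularity idea above
(a sketch only, using multiprocessing.Pool; this is not the Haskell code
we benchmarked):

from multiprocessing import Pool

def fib(n):
    # the naive recursion; runs sequentially inside each worker
    if n <= 1:
        return 1
    return fib(n - 1) + fib(n - 2)

def leaves(n, threshold):
    # expand fib(n) = fib(n-1) + fib(n-2) until the subproblems reach
    # the granularity threshold; fib(n) is the sum of fib over the leaves
    if n <= threshold:
        return [n]
    return leaves(n - 1, threshold) + leaves(n - 2, threshold)

def parallel_fib(n, threshold):
    pool = Pool()
    try:
        return sum(pool.map(fib, leaves(n, threshold)))
    finally:
        pool.close()

if __name__ == '__main__':
    print(parallel_fib(40, 36))  # like the Haskell fib(40, 36)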

cheers

Gerhard



Steven D'Aprano

I once saw Haskell code that ran Fibonacci on a 4-core system.

The algorithm itself basically used an extra granularity parameter that
controls down to which n new threads are sparked. E.g. fib(40, 36)
means: calculate fib(40) and spark threads until n=36.
1. divide: fib(n-1), fib(n-2)
2. divide: fib(n-2), fib(n-3)
3. divide: fib(n-3), fib(n-4)

We tried this code on a 4 core machine using only 1 core and all 4 cores.
1 core wallclock: ~10s
4 core wallclock: ~3s

Three seconds to calculate fib(40) is terrible. Well, it's a great saving
over ten seconds, but it's still horribly, horribly slow.

Check this out, on a two-core 2.2 GHz system:

7.2956085205078125e-05

That's five orders of magnitude faster. How did I do it? I used a better
algorithm.


def fib(n, a=1, b=1):
    if n==0:
        return a
    elif n==1:
        return b
    return fib(n-1, b, a+b)


And for the record:
0.00083398818969726562

Less than a millisecond, versus millions of years for the OP's algorithm.
I know which I would choose. Faster hardware can only go so far in hiding
the effect of poor algorithms.
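
(The interpreter session around those figures was trimmed in the quoting;
they presumably came from a harness along these lines - a guess, not the
original session:)

import timeit

# time a single call of fib(500); assumes fib is defined in __main__
t = timeit.Timer('fib(500)', 'from __main__ import fib')
print(min(t.repeat(3, 1)))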
 

Gabriel Genellina

On Tue, 10 Feb 2009 06:45:37 -0200, Steven D'Aprano wrote:
Three seconds to calculate fib(40) is terrible. Well, it's a great saving
over ten seconds, but it's still horribly, horribly slow.

Check this out, on a two-core 2.2 GHz system:


7.2956085205078125e-05

That's five orders of magnitude faster. How did I do it? I used a better
algorithm.


def fib(n, a=1, b=1):
    if n==0:
        return a
    elif n==1:
        return b
    return fib(n-1, b, a+b)

I agree with your comment, but this is not the right way to write it in
Python. This style is fine in a language with tail-call optimization --
but fib(1000) raises `RuntimeError: maximum recursion depth exceeded`.

The same thing with the recursion removed can compute the 2090 digits
of fib(10000) without problems:

def fib(n):
    a = b = 1
    while n:
        n, a, b = n-1, b, a+b
    return a
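
For instance (an illustrative check; the digit count follows from
n*log10(phi) ≈ 0.209*n):

print(fib(10))               # 89 under this fib(0) = fib(1) = 1 convention
print(len(str(fib(10000))))  # 2090 digits; no recursion limit involved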
Less than a millisecond, versus millions of years for the OP's algorithm.
I know which I would choose. Faster hardware can only go so far in hiding
the effect of poor algorithms.

Completely agree!
 

Chris Rebert

Three seconds to calculate fib(40) is terrible. Well, it's a great saving
over ten seconds, but it's still horribly, horribly slow.

Check this out, on a two-core 2.2 GHz system:


7.2956085205078125e-05

That's five orders of magnitude faster. How did I do it? I used a better
algorithm.


def fib(n, a=1, b=1):
    if n==0:
        return a
    elif n==1:
        return b
    return fib(n-1, b, a+b)


And for the record:

0.00083398818969726562

Less than a millisecond, versus millions of years for the OP's algorithm.
I know which I would choose. Faster hardware can only go so far in hiding
the effect of poor algorithms.

Indeed. And apparently memoization can hide it also:

$ #memoization added to OP's code
$ python -m timeit -s 'from memoized_naive_recursive_version import fib' 'fib(40)'
1000000 loops, best of 3: 0.639 usec per loop
$ #your code
$ python -m timeit -s 'from your_version import fib' 'fib(40)'
100000 loops, best of 3: 15.4 usec per loop
$ cat memoized_naive_recursive_version.py
def memoize(func):
    cache = {}
    def memoized(*args):
        if args in cache:
            return cache[args]
        else:
            result = func(*args)
            cache[args] = result
            return result
    return memoized

@memoize
def fib(n):
    if n <= 1:
        return 1
    else:
        return fib(n-1) + fib(n-2)

However, the naive recursive version fails by exceeding the maximum
recursion depth if N is too large, whereas your version continues to
succeed for a much higher limit of N.
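
One illustrative workaround for that recursion-depth limit (a sketch,
not from the measurements above): warm the cache bottom-up, so that no
single call recurses deeply:

def fib_big(n):
    # each fib(i) finds fib(i-1) and fib(i-2) already cached, so the
    # recursion stays only a couple of frames deep
    for i in range(n + 1):
        fib(i)
    return fib(n)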

Cheers,
Chris
 

Niklas Norrthon

Three seconds to calculate fib(40) is terrible. Well, it's a great saving
over ten seconds, but it's still horribly, horribly slow.

Check this out, on a two-core 2.2 GHz system:


7.2956085205078125e-05

That's five orders of magnitude faster. How did I do it? I used a better
algorithm.

def fib(n, a=1, b=1):
    if n==0:
        return a
    elif n==1:
        return b
    return fib(n-1, b, a+b)

And for the record:


225591516161936330872512695036072072046011324913758190588638866418474627738686883405015987052796968498626L

According to the common definition of Fibonacci numbers, fib(0) = 0,
fib(1) = 1 and fib(n) = fib(n-1) + fib(n-2) for n > 1. So the number
above is fib(501).
0.00083398818969726562

And now for my version (which admittedly isn't really mine, and returns
slightly incorrect fib(n) for large values of n, due to the limited
floating point precision):
return int(golden_section ** n / sqrt5 + 0.5)
225591516161940121919323945317755919750165306733355143970858461525064153963081278412888159063487104942080L
1.9555558083084179e-05
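
Spelled out (filling in the obvious definitions, which the post elides;
an assumption, since only the return line is shown), the closed-form
version would look something like:

from math import sqrt

sqrt5 = sqrt(5)
golden_section = (1 + sqrt5) / 2  # phi

def fib(n):
    # Binet's formula rounded to the nearest integer; exact only while
    # floating point precision lasts
    return int(golden_section ** n / sqrt5 + 0.5)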


Less than a millisecond, versus millions of years for the OP's algorithm.
I know which I would choose. Faster hardware can only go so far in hiding
the effect of poor algorithms.

I agree 100%

/Niklas Norrthon
 

Gerhard Weis

def fib(n, a=1, b=1):
    if n==0:
        return a
    elif n==1:
        return b
    return fib(n-1, b, a+b)


And for the record:

225591516161936330872512695036072072046011324913758190588638866418474627738686883405015987052796968498626L

0.00083398818969726562
Less than a millisecond, versus millions of years for the OP's algorithm.
I know which I would choose. Faster hardware can only go so far in hiding
the effect of poor algorithms.

I agree 100% with this.

btw. the timings are not that different for the naive recursion in
OP's version and yours.
fib(500) on my machine:
OP's: 0.00116 (far away from millions of years)
This here: 0.000583
Gabriel's while loop: 0.000246

The fastest algorithm I've found does fib(1000000) in 0.25 seconds on my machine.
However, I have not found a way to parallelize this algorithm and I
think it is not possible.
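
One O(log n) method in that speed class is fast doubling, sketched here
with the standard fib(0) = 0, fib(1) = 1 convention (an illustration;
not necessarily the algorithm meant above):

def fast_fib(n):
    # fast doubling: F(2k)   = F(k) * (2*F(k+1) - F(k))
    #                F(2k+1) = F(k)**2 + F(k+1)**2
    def pair(m):
        # returns (F(m), F(m+1))
        if m == 0:
            return (0, 1)
        a, b = pair(m // 2)
        c = a * (2 * b - a)
        d = a * a + b * b
        if m % 2:
            return (d, c + d)
        return (c, d)
    return pair(n)[0]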

To come back to the original question: yes, it is possible, though with
quite a few restrictions. In the case of the naive recursion it was
possible to get a speedup of more than 3x on a 4-core machine. With the
highly optimised version mentioned above, I had no success in utilizing
more than one core.

In conclusion I would say: yes, a single function can profit from
multiple cores, but sometimes a better algorithm delivers better
results than using more cores. However, a slower but parallelizable
algorithm might still outperform the best serial algorithm, given
enough cores :).

Gerhard
 

Steven D'Aprano

btw. the timings are not that different for the naive recursion in OP's
version and yours.
fib(500) on my machine:
OP's: 0.00116 (far away from millions of years)
This here: 0.000583

I don't believe those timings are credible. On my machine, it took a
minute to calculate fib(38), and while my machine isn't exactly the
fastest box possible, nor is it especially slow.

I don't wish to imply that you are deliberately lying, but your result of
0.00116 seconds for the naive version of fib(500) is so unrealistic in my
experience that I believe you must be confused. Perhaps you've timed a
less naive fib() but thought it was the naive version.

Unless somebody can point out an error in my analysis, I'm sticking to my
earlier claim that the naive version of fib(500) requires an unbelievably
huge number of function calls: significantly more than the value of
fib(500) itself. See my earlier post in this thread for details.
 

Steven D'Aprano

According to the common definition of Fibonacci numbers, fib(0) = 0,
fib(1) = 1 and fib(n) = fib(n-1) + fib(n-2) for n > 1. So the number
above is fib(501).

So it is. Oops, an off-by-one error!

Or, if you prefer, it's the right algorithm for a Lucas sequence with
first two values 1 and 1 instead of 0 and 1. :)


And now for my version (which admittedly isn't really mine, and returns
slightly incorrect fib(n) for large values of n, due to the limited
floating point precision).

The floating point version is nice, but it starts giving incorrect
answers relatively early, from n=71. But if you don't need accurate
results (a relative error of 3e-15 for n=71), it is very fast.
 

Gerhard Weis

I don't believe those timings are credible. On my machine, it took a
minute to calculate fib(38), and while my machine isn't exactly the
fastest box possible, nor is it especially slow.

I don't wish to imply that you are deliberately lying, but your result of
0.00116 seconds for the naive version of fib(500) is so unrealistic in my
experience that I believe you must be confused. Perhaps you've timed a
less naive fib() but thought it was the naive version.

Unless somebody can point out an error in my analysis, I'm sticking to my
earlier claim that the naive version of fib(500) requires an unbelievably
huge number of function calls: significantly more than the value of
fib(500) itself. See my earlier post in this thread for details.

I am sorry for the wrong timing; I mixed up the function names. The
naive version partly used your version and partly the naive recursion,
so "less naive" is a good description :)

After fixing it:
naive fib(38): ~40 seconds
 

Cameron Laird

[...]
The floating point version is nice, but it starts giving incorrect
answers relatively early, from n=71. But if you don't need accurate
results (a relative error of 3e-15 for n=71), it is very fast.
[...]
While my personal opinion is that it's silly to characterize an error
of 3e-15 as not "accurate", I think more constructive is to focus on
the fact that the closed-form solution can be touched up to give a
precise integral solution, while retaining its (approximately)
O(log n) run-time cost.
 

Steven D'Aprano

[...]
While my personal opinion is that it's silly to characterize an error of
3e-15 as not "accurate",

That's a *relative* error. The absolute error is 1 for n=71, 59 for n=80,
1196093 for n=100, and 1875662300654090482610609259 for n=200. As far as
I can tell, the absolute error grows without bounds as n increases -- and
it overflows at n=1475.
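
An illustrative way to see those absolute errors, comparing the float
closed form against exact integer arithmetic (a sketch):

from math import sqrt

def fib_exact(n):
    # standard convention: fib(0) = 0, fib(1) = 1
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def fib_float(n):
    sqrt5 = sqrt(5)
    return int(((1 + sqrt5) / 2) ** n / sqrt5 + 0.5)

for n in (70, 71, 80, 100, 200):
    print("%d: %d" % (n, abs(fib_float(n) - fib_exact(n))))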

I agree that a relative error is small, and if your use allows it, then
by all means use the inexact floating point function. But if you need
exact results, then you won't get it using floats.


I think more constructive is to focus on the fact that the closed-form
solution can be touched up to give a precise integral solution, while
retaining its (approximately) O(log n) run-time cost.

I doubt that. Since both phi and sqrt(5) are irrational numbers, it would
require an infinite number of integer digits to get precise integral
solutions unless there was some sort of freakish cancellation. But you
probably could get a very good integral solution which gives exact
results up to whatever limit you wanted, bound only by the amount of
memory and time you were willing to spend.
 

sturlamolden

I mean this
Code:
def fib(n):
    if n<=1:
        return 1
    return fib(n-1)+fib(n-2)


Let's rewrite that slightly and see...



def fib(n, _map=None):
    if not _map: _map = map
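    # NB: only the call that is handed p.map is parallelized; the
    # recursive calls get the default, fall back to the builtin map,
    # and run sequentially inside their worker process.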
    if n > 2:
        return sum(_map(fib, (n-1, n-2)))
    else:
        return 1


if __name__ == '__main__':

    from multiprocessing import Pool
    from time import clock

    p = Pool()
    n = 36

    t0 = clock()
    fib(n, p.map)
    t1 = clock()

    print 'parallel t: %f seconds' % (t1-t0)

    t0 = clock()
    fib(n)
    t1 = clock()

    print 'sequential t: %f seconds' % (t1-t0)


With two cores:

E:\>python fibotest.py
parallel t: 31.300226 seconds
sequential t: 48.070695 seconds


Yes it can!




Sturla Molden
 

Gabriel Genellina

On Feb 10, 7:28 am, oyster <[email protected]> wrote:
Let's rewrite that slightly and see...

def fib(n, _map=None):
    if not _map: _map = map
    if n > 2:
        return sum(_map(fib, (n-1, n-2)))
    else:
        return 1


With two cores:

E:\>python fibotest.py
parallel t: 31.300226 seconds
sequential t: 48.070695 seconds


Yes it can!

Are you kidding?
This is like building a house one brick at a time, but before placing a
new brick, you throw away all the previous work and start again from
scratch. Certainly two workers would help more than one - but the best
way to do it is *NOT* to repeat all that additional work over and over
in the first place.
Even my Pentium I MMX 233MHz can compute fib(36) thousands of times
faster than that with the right algorithm. So I don't see the point in
parallelizing if you're going to get infinitely worse results...
 

Paul Rubin

Gabriel Genellina said:
Even my Pentium I MMX 233MHz can compute fib(36) thousands of times faster
than that with the right algorithm. So I don't see the point in
parallelizing if you're going to get infinitely worse results...

The point is to test the parallelization scheme. Recursive fibonacci
is a standard benchmark for this sort of thing. Sturlamolden's test
may be inspired by a recent similar GHC test:

http://donsbot.wordpress.com/2007/11/29/
 

Cameron Laird

That's a *relative* error. The absolute error is 1 for n=71, 59 for n=80,
1196093 for n=100, and 1875662300654090482610609259 for n=200. As far as
I can tell, the absolute error grows without bounds as n increases -- and
it overflows at n=1475.

I agree that a relative error is small, and if your use allows it, then
by all means use the inexact floating point function. But if you need
exact results, then you won't get it using floats.




I doubt that. Since both phi and sqrt(5) are irrational numbers, it would
require an infinite number of integer digits to get precise integral
solutions unless there was some sort of freakish cancellation. But you
probably could get a very good integral solution which gives exact
results up to whatever limit you wanted, bound only by the amount of
memory and time you were willing to spend.
[...]
There *is* freakish cancellation.

I entirely recognize that Python builds in floating-point calculations
of limited precision. The closed-form expression for Fibonacci, though,
is exact, and can be coded in terms of, for example, Python
infinite-precision integers. If there's sufficient interest, I'll
cheerfully do so, although only after meeting my own deadlines for
February for commitments outside comp.lang.python.
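
For what it's worth, one way to make the cancellation explicit (a
sketch, not Cameron's promised version; standard fib(0) = 0 convention)
is to do the arithmetic exactly in Z[sqrt(5)], representing
a + b*sqrt(5) as the integer pair (a, b):

def zmul(x, y):
    # (a + b*sqrt5)(c + d*sqrt5) = (a*c + 5*b*d) + (a*d + b*c)*sqrt5,
    # kept as integer pairs so nothing is ever rounded
    a, b = x
    c, d = y
    return (a * c + 5 * b * d, a * d + b * c)

def zpow(x, n):
    # exponentiation by squaring: O(log n) multiplications
    result = (1, 0)
    while n:
        if n & 1:
            result = zmul(result, x)
        x = zmul(x, x)
        n >>= 1
    return result

def fib(n):
    # (1 + sqrt5)**n = a + b*sqrt5 with integer a, b; subtracting
    # (1 - sqrt5)**n = a - b*sqrt5 cancels the rational part, so
    # F(n) = (phi**n - psi**n)/sqrt5 = 2*b / 2**n, an exact integer.
    a, b = zpow((1, 1), n)
    return (2 * b) >> n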
 
