How can I speed up a script that iterates over a large range (600 billion)?

J

John Salerno

I'm working on the Project Euler exercises and I'm stumped on problem
3:

"What is the largest prime factor of the number 600851475143 ?"

Now, I've actually written functions to get a list of the factors of
any given number, and then another function to get the prime numbers
from that list. It works fine with small numbers, but when I try to
feed my get_factors function with the above number (600 billion),
naturally it takes forever! But according to the Project Euler
website:

"I've written my program but should it take days to get to the answer?

Absolutely not! Each problem has been designed according to a "one-
minute rule", which means that although it may take several hours to
design a successful algorithm with more difficult problems, an
efficient implementation will allow a solution to be obtained on a
modestly powered computer in less than one minute."

But it definitely takes more than a minute, and I still haven't gotten
it to end yet without just canceling it myself.

Here is what I have so far. Initially the get_factors function just
iterated over the entire range(2, n + 1), but since a number can't
have a factor greater than half of itself, I tried to shorten the
range by doing range(2, n //2), but that still leaves 300 billion
numbers to go through.

def get_factors(number):
factors = [number]

for n in range(2, number // 2):
if number % n == 0:
factors.append(n)

return factors


def get_primes(number_list):
primes = number_list[:]

for n in number_list:
for x in range(2, n):
if n % x == 0:
primes.remove(n)
break

return primes


print(max(get_primes(get_factors(600851475143))))


Also, I want to make it clear that I DO NOT WANT THE ANSWER. I really
want to solve this myself, but the reason I'm asking about it is to
see if there really is some way to change this code so that it can get
an answer in less than one minute, as the website says should be
possible. A hint about what I need to do would be nice, but not an
answer. I just don't see any way to get the factors without iterating
over the entire range of values, though.

Thanks!
 
I

Ian Kelly

Here is what I have so far. Initially the get_factors function just
iterated over the entire range(2, n + 1), but since a number can't
have a factor greater than half of itself, I tried to shorten the
range by doing range(2, n //2), but that still leaves 300 billion
numbers to go through.

Without giving you the answer, I will note that the range can be
further reduced by quite a lot (I'm talking orders of magnitude, not
just by half).
def get_primes(number_list):
   primes = number_list[:]

   for n in number_list:
       for x in range(2, n):
           if n % x == 0:
               primes.remove(n)
               break

   return primes

Also, primality testing and factorization are very similar problems,
and the same range optimization could be applied here as well.

Cheers,
Ian
 
I

Irmen de Jong

I'm working on the Project Euler exercises and I'm stumped on problem
3:

"What is the largest prime factor of the number 600851475143 ?"

Now, I've actually written functions to get a list of the factors of
any given number, and then another function to get the prime numbers
from that list. It works fine with small numbers, but when I try to
feed my get_factors function with the above number (600 billion),
naturally it takes forever!

You need a better algorithm to calculate primes, and iterate over primes
instead of over the full (half, or even better, sqrt(n)) range of
possible values. You also should optimize the stop condition, once you
find that the number can no longer be divided by larger primes you can
stop the loop.

For instance to get the prime factors of the number 1000 you'd iterate
over the prime numbers 2,3,5 and conclude that

1000=2*2*2*5*5*5, so 5 would be the largest prime factor.

No need to try larger primes than 5, let alone go through 1..sqrt(1000).

The Sieve of Eratosthenes is a well known algorithm to calculate primes
with reasonable efficiency.

Irmen
 
M

Mel

John said:
I'm working on the Project Euler exercises and I'm stumped on problem
3:

"What is the largest prime factor of the number 600851475143 ?" [ ... ]
Here is what I have so far. Initially the get_factors function just
iterated over the entire range(2, n + 1), but since a number can't
have a factor greater than half of itself, I tried to shorten the
range by doing range(2, n //2), but that still leaves 300 billion
numbers to go through.

def get_factors(number):
factors = [number]

for n in range(2, number // 2):
if number % n == 0:
factors.append(n)

return factors


def get_primes(number_list):
primes = number_list[:]

for n in number_list:
for x in range(2, n):
if n % x == 0:
primes.remove(n)
break

return primes


print(max(get_primes(get_factors(600851475143))))


Also, I want to make it clear that I DO NOT WANT THE ANSWER. I really
want to solve this myself, but the reason I'm asking about it is to
see if there really is some way to change this code so that it can get
an answer in less than one minute, as the website says should be
possible. A hint about what I need to do would be nice, but not an
answer. I just don't see any way to get the factors without iterating
over the entire range of values, though.

It certainly can be done faster. I ran it against the factor finder that I
wrote, and it popped up the answer

mwilson@tecumseth:~$ bin/factors.py 600851475143
71 839 1471 ...

before I could glance at my watch. factors.py works, as does yours, by
testing for small factors first, but it divides them out as it goes, so it
tends to do its work on smallish numbers. And since the smallest factors
are taken out as soon as possible, they have to be the prime ones.

Good hunting, Mel.
 
I

Irmen de Jong

On 21-06-11 22:10, Irmen de Jong wrote:
[stuff]

I didn't read the last paragraph of John's message until just now, and
now realize that what I wrote is likely way too much information for
what he asked.
I'm sorry. Next time I'll read everything until and including the last
full stop.

Irmen
 
M

MRAB

I'm working on the Project Euler exercises and I'm stumped on problem
3:

"What is the largest prime factor of the number 600851475143 ?"

Now, I've actually written functions to get a list of the factors of
any given number, and then another function to get the prime numbers
from that list. It works fine with small numbers, but when I try to
feed my get_factors function with the above number (600 billion),
naturally it takes forever! But according to the Project Euler
website:

"I've written my program but should it take days to get to the answer?

Absolutely not! Each problem has been designed according to a "one-
minute rule", which means that although it may take several hours to
design a successful algorithm with more difficult problems, an
efficient implementation will allow a solution to be obtained on a
modestly powered computer in less than one minute."

But it definitely takes more than a minute, and I still haven't gotten
it to end yet without just canceling it myself.
[snip]
A non-prime is the product of a prime and another number which may or
may not be a prime. Look for the smallest prime and repeat.

On a modern PC, if it takes more than, say, a second for the given
number, you're doing it wrong. :)
 
J

John Salerno

On 21-06-11 22:10, Irmen de Jong wrote:
[stuff]

I didn't read the last paragraph of John's message until just now, and
now realize that what I wrote is likely way too much information for
what he asked.
I'm sorry. Next time I'll read everything until and including the last
full stop.

Irmen

Don't worry, I was still unclear about what to do after reading all
the responses, even yours! But one thing that made me feel better was
that I wasn't having a Python problem as much as a *math* problem. I
changed my get_factors function to only go as far as the square root
of the number in question, and that yielded an answer immediately. :)

However, even after reading the Wikipedia page about prime numbers and
trial division, I'm still a little unclear as to why the square root
of the number is the upper bound of the range you need to check.
 
V

Vlastimil Brom

2011/6/21 John Salerno said:
However, even after reading the Wikipedia page about prime numbers and
trial division, I'm still a little unclear as to why the square root
of the number is the upper bound of the range you need to check.
--

There are likely be some more elaborated proofs, but it seems
sufficiently evident, that with the factors being the square root you
get some kind of "middle position"; with other factors (e.g. two for
simplicity), one of them could be greater, while the another has to be
smaller; this smaller one would have been discovered already, while
searching continuously until the square root of the given number.

vbr
 
I

Irmen de Jong

On 21-06-11 22:10, Irmen de Jong wrote:
[stuff]

I didn't read the last paragraph of John's message until just now, and
now realize that what I wrote is likely way too much information for
what he asked.
I'm sorry. Next time I'll read everything until and including the last
full stop.

Irmen

Don't worry, I was still unclear about what to do after reading all
the responses, even yours! But one thing that made me feel better was
that I wasn't having a Python problem as much as a *math* problem. I
changed my get_factors function to only go as far as the square root
of the number in question, and that yielded an answer immediately. :)

Ok, cool :)

However, even after reading the Wikipedia page about prime numbers and
trial division, I'm still a little unclear as to why the square root
of the number is the upper bound of the range you need to check.

If there is an exact divisor d >= √n, then the quotient n/d is also a divisor of n, and
that quotient is <= √n. So if we don't find such a quotient before reaching √n, it's not
useful to search for d > √n (because no divisors would be found).

Irmen
 
I

Ian Kelly

Don't worry, I was still unclear about what to do after reading all
the responses, even yours! But one thing that made me feel better was
that I wasn't having a Python problem as much as a *math* problem. I
changed my get_factors function to only go as far as the square root
of the number in question, and that yielded an answer immediately. :)

However, even after reading the Wikipedia page about prime numbers and
trial division, I'm still a little unclear as to why the square root
of the number is the upper bound of the range you need to check.

Careful, note that the greatest prime factor may actually be greater
than the square root. It's just that it's possible to find it without
iterating past the square root. This is because for each p that is a
factor of n, q is also a factor of n, where p * q = n. If p >
sqrt(n), then q < sqrt(n). Therefore you can find p by finding q and
dividing n / q.
 
J

John Salerno

Careful, note that the greatest prime factor may actually be greater
than the square root.  It's just that it's possible to find it without
iterating past the square root.  This is because for each p that is a
factor of n, q is also a factor of n, where p * q = n.  If p >
sqrt(n), then q < sqrt(n).  Therefore you can find p by finding q and
dividing n / q.

Oh! Now it makes sense! That first sentence helped to put it into
perspective, too. The Wikipedia page says more or less the same thing,
but this paragraph just made more sense to me. :)

Thanks for the all the advice everyone. Now I'm on to problem #4, and
I'm stumped again, but that's what's fun! :)
 
T

Terry Reedy

Absolutely not! Each problem has been designed according to a "one-
minute rule", which means that although it may take several hours to
design a successful algorithm with more difficult problems, an
efficient implementation will allow a solution to be obtained on a
modestly powered computer in less than one minute."

That statement is for C, not Python. Python is efficient with human
time, but not machine time. If something really takes a minute in C,
allow yourself at least 10 minutes or even more with plain CPython.
 
P

Paul Rubin

Terry Reedy said:
If something really takes a minute in C,
allow yourself at least 10 minutes or even more with plain CPython.

No. The idea of the Euler problems is to think up sane algorithms to
solve them, not micro-optimize or use low level languages on crappy
algorithms.

n=600851475143
for d in xrange(2,n):
if d*d > n: break
while n%d == 0: n //= d
print n

finishes on my laptop with no noticable pause. The trick is to stop
testing once you hit the square root of the number. There is at least
one extremely obvious optimization I didn't bother with (above 2, only
test odd divisors), that would have doubled the speed on top of that.
 
P

Paul Rubin

John Salerno said:
However, even after reading the Wikipedia page about prime numbers and
trial division, I'm still a little unclear as to why the square root
of the number is the upper bound of the range you need to check.

Suppose p is the smallest divisor of n, and p > sqrt(n).
("Divisor" here means p=1 doesn't count).

p being a divisor of n means there is some q such that n = pq.
That means q = n / p.

If p > sqrt(n) that means that q must be < sqrt(n). But that
contradicts the claim that p is the smallest divisor.

So we know that if there is a divisor at all, it must be <= sqrt(n)
and if we don't find one by then, n must be prime.
 
J

John Salerno

::sigh:: Well, I'm stuck again and it has to do with my get_factors
function again, I think. Even with the slight optimization, it's
taking forever on 20! (factorial, not excitement) :) It's frustrating
because I have the Python right, but I'm getting stuck on the math.

The problem:

"What is the smallest positive number that is evenly divisible by all
of the numbers from 1 to 20?"



Here's the function (it's in the problem3.py file, hence the import
below):

import math

def get_factors(number):
factors = []

for n in range(2, int(math.sqrt(number))):
if number % n == 0:
factors.append(n)
factors.append(number // n)

return factors

And here's my new script for the new exercise:

import math
from problem3 import get_factors

max_num = 20
n = math.factorial(max_num)
factors = get_factors(n)
div_all = []

for x in factors:
for y in range(2, max_num+1):
if x % y != 0:
break
elif y == max_num:
div_all.append(x)

print(min(div_all))

It could easily be that I'm simply approaching it all wrong. I just
thought that maybe using the factorial of the highest number in the
range (in this case, 20) would be an easy way of finding which numbers
to test.
 
P

Paul Rubin

John Salerno said:
It's frustrating because I have the Python right, but I'm getting
stuck on the math....
"What is the smallest positive number that is evenly divisible by all
of the numbers from 1 to 20?"

The answer is lcm [1,2,3, ... 20]. You can figure out how to implement
lcm.

The Euler problems are not really programming exercises. They are
exercises in math and algorithms. Quite a lot of them involve thinking
clever and fast ways to do stuff that would be trivial (but too slow) by
brute force. In general, once you figure out the right algorithm,
writing the code is easy. But you have to be fairly mathematically
attuned, to have any chance of spotting the algorithm.

If you want programming exercises that are less mathematical, there are
some nice ones at rubyquiz.com. They are intended for Ruby but of
course you can solve them in Python.
 
M

MRAB

::sigh:: Well, I'm stuck again and it has to do with my get_factors
function again, I think. Even with the slight optimization, it's
taking forever on 20! (factorial, not excitement) :) It's frustrating
because I have the Python right, but I'm getting stuck on the math.

The problem:

"What is the smallest positive number that is evenly divisible by all
of the numbers from 1 to 20?"
You don't need factorials, just remember that each of the numbers can
be expressed as the product of a multiset of prime factors.
 
J

John Salerno

John Salerno said:
It's frustrating because I have the Python right, but I'm getting
stuck on the math....
"What is the smallest positive number that is evenly divisible by all
of the numbers from 1 to 20?"

The answer is lcm [1,2,3, ... 20].  You can figure out how to implement
lcm.

The Euler problems are not really programming exercises.  They are
exercises in math and algorithms.  Quite a lot of them involve thinking
clever and fast ways to do stuff that would be trivial (but too slow) by
brute force.  In general, once you figure out the right algorithm,
writing the code is easy.  But you have to be fairly mathematically
attuned, to have any chance of spotting the algorithm.

If you want programming exercises that are less mathematical, there are
some nice ones at rubyquiz.com.  They are intended for Ruby but of
course you can solve them in Python.

Thanks. So far they are helping me with Python too, but definitely not
as much as more general exercises would, I'm sure. The part about
writing the code is fun, but once that's done, I seem to end up stuck
with an inefficient implementation because I don't know the math
tricks behind the problem.

I'll check out rubyquiz.com. I've been searching for some Python
exercises to do but haven't found too many sites with them, at least
not in such a nice and organized way as Project Euler.
 
M

Mel

John said:
::sigh:: Well, I'm stuck again and it has to do with my get_factors
function again, I think. Even with the slight optimization, it's
taking forever on 20! (factorial, not excitement) :) It's frustrating
because I have the Python right, but I'm getting stuck on the math.

The problem:

"What is the smallest positive number that is evenly divisible by all
of the numbers from 1 to 20?"



Here's the function (it's in the problem3.py file, hence the import
below):

import math

def get_factors(number):
factors = []

for n in range(2, int(math.sqrt(number))):
if number % n == 0:
factors.append(n)
factors.append(number // n)

return factors

And here's my new script for the new exercise:

import math
from problem3 import get_factors

max_num = 20
n = math.factorial(max_num)
factors = get_factors(n)
div_all = []

for x in factors:
for y in range(2, max_num+1):
if x % y != 0:
break
elif y == max_num:
div_all.append(x)

print(min(div_all))

It could easily be that I'm simply approaching it all wrong. I just
thought that maybe using the factorial of the highest number in the
range (in this case, 20) would be an easy way of finding which numbers
to test.

These are almost "trick questions" in a way, because of the math behind
them. If the question were "What is the tallest high-school student in
Scranton, PA?" then searching a population for the property would be the
only way to go. BUT you can also build up the answer knowing the
factorization of all the numbers up to 20.

Mel.
 
B

Benjamin Kaplan

That statement is for C, not Python. Python is efficient with human time,
but not machine time. If something really takes a minute in C, allow
yourself at least 10 minutes or even more with plain CPython.

Python is the second most popular language on Project Euler, at 14358
users compared to 15897 who use C/C++. I'm pretty sure they don't
assume you use C. Although Python's longs do make some of the early
problems really really easy.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,744
Messages
2,569,484
Members
44,904
Latest member
HealthyVisionsCBDPrice

Latest Threads

Top