How can I speed up a script that iterates over a large range (600 billion)?

J

John Salerno

John said:
::sigh:: Well, I'm stuck again and it has to do with my get_factors
function again, I think. Even with the slight optimization, it's
taking forever on 20! (factorial, not excitement)  :) It's frustrating
because I have the Python right, but I'm getting stuck on the math.
The problem:
"What is the smallest positive number that is evenly divisible by all
of the numbers from 1 to 20?"
Here's the function (it's in the problem3.py file, hence the import
below):
import math
def get_factors(number):
    factors = []
    for n in range(2, int(math.sqrt(number))):
        if number % n == 0:
            factors.append(n)
            factors.append(number // n)
    return factors
And here's my new script for the new exercise:
import math
from problem3 import get_factors
max_num = 20
n = math.factorial(max_num)
factors = get_factors(n)
div_all = []
for x in factors:
    for y in range(2, max_num+1):
        if x % y != 0:
            break
        elif y == max_num:
            div_all.append(x)

It could easily be that I'm simply approaching it all wrong. I just
thought that maybe using the factorial of the highest number in the
range (in this case, 20) would be an easy way of finding which numbers
to test.

These are almost "trick questions" in a way, because of the math behind
them.  If the question were "What is the tallest high-school student in
Scranton, PA?" then searching a population for the property would be the
only way to go.  BUT you can also build up the answer knowing the
factorization of all the numbers up to 20.

        Mel.

I think you're right. I just read the next problem and it is similar
in style, i.e. the example solution involves a small set of numbers
which I can write a script for and it would execute within a second,
but when I expand the script to include the larger set of numbers for
the problem, it then takes ages to execute, making me feel like the
trick isn't to figure out the right code, but the right math.
 
T

Terry Reedy


No what? My conditional statement is correct. It turns out not to apply
here. Great.
The idea of the Euler problems is to think up sane algorithms to
solve them, not micro-optimize or use low level languages on crappy
algorithms.

n=600851475143
for d in xrange(2,n):
if d*d> n: break
while n%d == 0: n //= d
print n

finishes on my laptop with no noticable pause.

If the best C program for a problem takes 10 seconds or more, then
applying the same 1 minute limit to Python is insane, and contrary to
the promotion of good algorithm thinking.
 
P

Paul Rubin

Terry Reedy said:
If the best C program for a problem takes 10 seconds or more, then
applying the same 1 minute limit to Python is insane, and contrary to
the promotion of good algorithm thinking.

The Euler problems are all designed to be doable in 1 minute or less and
the Euler project started in 2001, when personal computers were probably
10x slower than they are today. So they shouldn't take more than 6
seconds on a modern computer if you're thoughtful. On a reasonable
modern computer (maybe even a 2001 computer), they should all be doable
in < 1 minute in python, probably well under. They can probably all be
done in under 1 second in C.

The "largest prime factor of 600851475143" problem we're discussing took
0.017 user cpu seconds in completely unoptimized python on my laptop
using the second-most-naive algoritm possible, including loading the
interpreter from the command line. Executing an empty file takes the
same amount of time. The algorithm would probably be >10x faster in C
with a little bit of tweaking. The problems are about math cleverness,
not CPU resources. Some of them are pretty hard, but if your solution
is taking more than a minute in Python, it means you should be looking
for a better algorithm, not a faster language.
 
C

Chris Torek

Now that the exercise has been solved...

Instead of "really short code to solve the problem", how about
some "really long code"? :)

I was curious about implementing prime factorization as a generator,
using a prime-number generator to come up with the factors, and
doing memoization of the generated primes to produce a program that
does what "factor" does, e.g.:

$ factor 99999999999999999
99999999999999999: 3 3 2071723 5363222357

The python code is rather too slow for this particular number (I
gave up on it finding 5363222357) but works quite well on 600851475143,
or even, e.g., 12186606004219:

$ python factor.py 600851475143 12186606004219
600851475143: 71 839 1471 6857
12186606004219: 2071723 5882353

While searching for speed hacks I came across a few interesting
tricks, particularly TZ<omega>TZIOY's mod-30 scheme (erat3) at
stackoverflow.com (see URLs in the comments), which only really
works well in Python 2.7 and later (using itertools.compress).
Here is the end result (run with 2.5 and 2.7, I have no 3.x installed
anywhere convenient, and of course the print calls would change):

import itertools
import sys

def count(start = 0, step = 1):
"""
Yield count starting from <start> and counting up by <step>.
Same as itertools.count() except for the <step> argument, and
allowing non-"int" arguments.

Python 2.7 and later provides this directly, via itertools.

Note: it's usually faster to use itertools.islice(itertools.count(...))
than to run the "while True" loop below, so we do that if we can.
"""
if (sys.version_info[0] > 2 or
(sys.version_info[0] == 2 and sys.version_info[1] >= 7)):
return itertools.count(start, step)
if isinstance(start, int) and isinstance(step, int):
if step == 1:
return itertools.count(start)
if 1 < step < 5: # 5 upper bound is a guess
return itertools.islice(itertools.count(start), 0, None, step)
def f(start, step):
while True:
yield start
start += step
return f(start, step)

# Sieve of Eratosthenes-based prime-number generator.
#
# Based on David Eppstein's for python 2.3(?) and subsequent
# discussion -- see
# http://code.activestate.com/recipes/117119-sieve-of-eratosthenes/
#
# See also:
# http://oreilly.com/pub/a/python/excerpt/pythonckbk_chap1/index1.html?page=last
#
# http://stackoverflow.com/questions/...or-of-prime-numbers-in-python/3796442#3796442
def primes():
"""
Yields sequence of prime numbers via Sieve of Eratosthenes.
"""
# Keep the state from the last time we abandoned the
# generator, in primes.primes and primes.D. We can then
# very quickly re-yield previously-saved primes. We're
# going to skip even numbers below, we so prime the
# "primes so far" list with 2.
#
# For the fancy (erat3) version, we need to pre-provide
# 2, 3, and 5, and pre-load D. Having done that we can
# start either version at 7.
try:
primes.primes
except AttributeError:
primes.primes = [2, 3, 5]
for p in primes.primes:
yield p
# OK, now we need a mapping holding known-composite values
# (numbers "struck from the sieve").
try:
D = primes.D
except AttributeError:
D = primes.D = { 9: 3, 25: 5 }
# D[q] exists if q is composite; its value is the first prime
# number that proves that q is composite. (However, only odd
# numbers appear in D.)
q = p + 2 # where we start sieve-ing, below
if sys.version_info[0] == 2 and sys.version_info[1] < 7:
for q in count(q, 2):
p = D.pop(q, None)
if p is None:
# q was not marked composite, so it's prime; moreover,
# q-squared is composite (and odd) and its first prime
# factor is q.
primes.primes.append(q)
D[q * q] = q
yield q
else:
# q is composite, p is its first prime factor -- e.g.,
# q=9 and p=3, or q=15 and p=3. Extend the sieve:
# find first natural number k where q + 2kp is not already
# already known as composite. (e.g., if q=9 and p=3, k=1
# so that we mark D[15] as composite, with 3 as its first
# prime factor.) Note that we are skipping even numbers,
# so p and q are both odd, so q+p is even, q+2p is odd,
# q+3p is even, q+4p is odd, etc. We don't need to mark
# even-numbered composites in D, which is why we only look
# at q + 2kp.
twop = p + p
x = q + twop # next odd multiple of p
while x in D: # skip already-known composites
x += twop
D[x] = p
else:
# 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35
MASK = (1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0)
MODULOS = frozenset( (1, 7, 11, 13, 17, 19, 23, 29) )

# If we started counting from 7, we'd want:
# itertools.compress(itertools.count(7,2), itertools.cycle(MASK))
# But we start counting from q which means we need to clip off
# the first ((q - 7) % 30) // 2 items:
offset = ((q - 7) % 30) // 2
for q in itertools.compress(itertools.count(q, 2),
itertools.islice(itertools.cycle(MASK), offset, None, 1)):
p = D.pop(q, None)
if p is None:
D[q * q] = q
primes.primes.append(q)
yield q
else:
twop = p + p
x = q + twop
while x in D or (x % 30) not in MODULOS:
x += twop
D[x] = p

def factors(num):
"""
Return all the prime factors of the given number.
"""
if num < 0:
num = -num
if num < 2:
return
for p in primes():
q, r = divmod(num, p)
while r == 0:
yield p
if q == 1:
return
num = q
q, r = divmod(num, p)

if __name__ == '__main__':
for arg in (sys.argv[1:] if len(sys.argv) > 1 else ['600851475143']):
try:
arg = int(arg)
except ValueError, error:
print error
else:
print '%d:' % arg,
for fac in factors(arg):
print fac,
sys.stdout.flush()
print
 
P

Paul Rubin

Chris Torek said:
def primes():
"""
Yields sequence of prime numbers via Sieve of Eratosthenes.
"""

I think this has the same order-complexity but is enormously slower in
practice, and runs out of recursion stack after a while. Exercise: spot
the recursion.

from itertools import islice, ifilter, count

def primes():
a = count(2)
while True:
p = a.next()
yield p
a = ifilter(lambda t,p=p: t%p, a)

# print first 100 primes
print list(islice(primes(), 100))
 
I

Ian Kelly

I was curious about implementing prime factorization as a generator,
using a prime-number generator to come up with the factors, and
doing memoization of the generated primes to produce a program that
does what "factor" does, e.g.:

This is a generator-based sieve I wrote a while back to solve the
PRIME1 problem at SPOJ. The problem is to generate all the prime
numbers within specified ranges, where the numbers are great enough
that a full sieve would run out of memory, and the ranges are wide
enough that a O(sqrt(n)) test on each number would simply take too
long:

https://www.spoj.pl/problems/PRIME1/

The script is not terribly impressive from a technical standpoint, but
what tickles me about it is its bootstrappiness; the set that the
"primes" generator checks to determine whether each number is prime is
actually built from the output of the generator, which itself contains
no actual primality-testing logic. Hope you like it:

8<--------------------------------------------------------------------
import math

def primes(m, n):
# Yield all the primes in the range [m, n), using the nonprimes set
# as a reference. Except for 2, only odd integers are considered.
if m <= 2:
yield 2
m = 3
elif m % 2 == 0:
m += 1 # Force m to be odd.
for p in xrange(m, n, 2):
if p not in nonprimes:
yield p

# Read all the bounds to figure out what we need to store.
bounds = [map(int, raw_input().split(' ')) for t in xrange(input())]
limit = max(n for (m, n) in bounds)
sqrt_limit = int(math.sqrt(limit))

# Mark odd multiples of primes as not prime. Even multiples
# do not need to be marked since primes() won't try them.
nonprimes = set()
for p in primes(3, sqrt_limit+1):
# Mark odd nonprimes within the base range. p*3 is the first
# odd multiple of p; p+p is the increment to get to the next
# odd multiple.
nonprimes.update(xrange(p*3, sqrt_limit+1, p+p))

# Mark odd nonprimes within each of the requested ranges.
for (m, n) in bounds:
# Align m to the first odd multiple of p in the range
# (or the last odd multiple before the range).
m -= (m % (p + p) - p)
m = max(m, p*3)
nonprimes.update(xrange(m, n+1, p+p))

# Generate and write the primes over each input range.
first = True
for (m, n) in bounds:
if not first:
print
first = False
for p in primes(m, n+1):
print p
8<--------------------------------------------------------------------
 
A

Anny Mous

Chris said:
Now that the exercise has been solved...

Instead of "really short code to solve the problem", how about
some "really long code"? :)

I was curious about implementing prime factorization as a generator,
using a prime-number generator to come up with the factors, and
doing memoization of the generated primes to produce a program that
does what "factor" does, e.g.:

$ factor 99999999999999999
99999999999999999: 3 3 2071723 5363222357

The python code is rather too slow for this particular number (I
gave up on it finding 5363222357)

It shouldn't take more than a few seconds to factorise 10**17-1 even in pure
Python. On my not-very-powerful PC, using a home-brew pure-Python function
(code to follow), I get all four factors in under four seconds.

In comparison, I ran your script for over five minutes before giving up, it
still hadn't found the fourth factor.

Don't be disappointed though, you're in illustrious company. Getting the
Sieve of Eratosthenes *badly* wrong is one of the most common mistakes,
second only to getting involved in land wars in Asia. See this paper by
Melissa O'Neill:

http://www.cs.hmc.edu/~oneill/papers/Sieve-JFP.pdf

What people often describe as the Sieve of Eratosthenes is frequently the
Sieve of Euler, which while mathematically elegant, is computationally
crap. As O'Neill calls it, the Unfaithful Sieve.

In my home-brew primes module, I have this to say about three common
algorithms, including the classic Unfaithful Sieve, Turner's algorithm:


# === Prime number generators to avoid ===

# The following three prime generators are supplied for educational
# purposes, specifically as a terrible warning on how NOT to do it.
#
# Their performance starts at bad in the case of trial_division(), falls to
# terrible with turner(), and ends up with the utterly awful
# naive_division(). None of these three have acceptable performance; they
# are barely tolerable even for the first 100 primes. Their only advantage
# is that they are somewhat easy to understand.


Here's my factors() function. Adapting it to be a generator is left as an
exercise.


def factors(n):
"""factors(integer) -> [list of factors]
[3, 7, 11]

Returns the (mostly) prime factors of integer n. For negative integers,
-1 is included as a factor. If n is 0 or 1, [n] is returned as the only
factor. Otherwise all the factors will be prime.
"""
if n != int(n):
raise ValueError('argument must be an integer')
elif n in (0, 1, -1):
return [n]
elif n < 0:
return [-1] + factors(-n)
assert n >= 2
result = []
for p in sieve():
if p*p > n: break
while n % p == 0:
result.append(p)
n //= p
if n != 1:
result.append(n)
return result


def sieve():
"""Yield prime integers efficiently.

This uses the Sieve of Eratosthenes, modified to generate the primes
lazily rather than the traditional version which operates on a fixed
size array of integers.
"""
# This implementation based on a version by Melissa E. O'Neill,
# as reported by Gerald Britton:
# http://mail.python.org/pipermail/python-list/2009-January/1188529.html
innersieve = sieve()
prevsq = 1
table = {}
i = 2
while True:
if i in table:
prime = table
del table
nxt = i + prime
while nxt in table:
nxt += prime
table[nxt] = prime
else:
yield i
if i > prevsq:
j = next(innersieve)
prevsq = j**2
table[prevsq] = j
i += 1


I don't claim that my version of the sieve above will break any
computational records, but it performs quite well for my needs.
 
C

Chris Angelico

           prime = table
           del table


I don't fully understand your algorithm, but I think these two lines
can be rewritten as:
prime=table.pop(i)

Interesting algo. A recursive generator, not sure I've seen one of those before.

ChrisA
 
M

MRAB

Now that the exercise has been solved...

Instead of "really short code to solve the problem", how about
some "really long code"? :)

I was curious about implementing prime factorization as a generator,
using a prime-number generator to come up with the factors, and
doing memoization of the generated primes to produce a program that
does what "factor" does, e.g.:

$ factor 99999999999999999
99999999999999999: 3 3 2071723 5363222357

The python code is rather too slow for this particular number (I
gave up on it finding 5363222357) but works quite well on 600851475143,
or even, e.g., 12186606004219:

$ python factor.py 600851475143 12186606004219
600851475143: 71 839 1471 6857
12186606004219: 2071723 5882353
[snip]
This code isn't particularly efficient, but it's fast enough:

import math

n = 99999999999999999
limit = math.sqrt(n)
test = 2
factors = []
while test <= limit:
if n % test == 0:
factors.append(test)
n //= test
limit = math.sqrt(n)
else:
test += 1

factors.append(n)

print(factors)
 
I

Ian Kelly

def sieve():
   """Yield prime integers efficiently.

   This uses the Sieve of Eratosthenes, modified to generate the primes
   lazily rather than the traditional version which operates on a fixed
   size array of integers.
   """
   # This implementation based on a version by Melissa E. O'Neill,
   # as reported by Gerald Britton:
   # http://mail.python.org/pipermail/python-list/2009-January/1188529.html
   innersieve = sieve()
   prevsq = 1
   table  = {}
   i = 2
   while True:
       if i in table:
           prime = table
           del table
           nxt = i + prime
           while nxt in table:
               nxt += prime
           table[nxt] = prime
       else:
           yield i
           if i > prevsq:
               j = next(innersieve)
               prevsq = j**2
               table[prevsq] = j
       i += 1


This appears to have a small bug in it, but perhaps it doesn't matter.
At the "yield i" statement, it is possible that i > prevsq. It could
be that i == next(innersieve)**2, in which case i is yielded as prime
despite being composite. This would then cause additional errors
further along.

The only way this could happen would be if there were two consecutive
primes such that all the numbers between their squares are composite,
thereby failing to add the next prime to the table until after its
square has been reached. This seems a rather unlikely scenario, but I
can't say for certain that it never happens.

The Haskell version does not contain this flaw, as far as I can tell.

Cheers,
Ian
 
I

Irmen de Jong

Thanks. So far they are helping me with Python too, but definitely not
as much as more general exercises would, I'm sure. The part about
writing the code is fun, but once that's done, I seem to end up stuck
with an inefficient implementation because I don't know the math
tricks behind the problem.

I'll check out rubyquiz.com. I've been searching for some Python
exercises to do but haven't found too many sites with them, at least
not in such a nice and organized way as Project Euler.

There's also SPOJ; http://www.spoj.pl/problems/classical/
It has tons of problems, many math related, but also silly ones like this:
https://www.spoj.pl/problems/JAVAC/
and perhaps even useful ones like this:
https://www.spoj.pl/problems/ONP/

Irmen
 
T

Terry Reedy

The Euler problems

are not the only algorithm problems posted on the web.
the Euler project started in 2001, when personal computers were probably
10x slower than they are today.

Yes, and I am still using a 2003 XP/Athlon machine, which is adequate
for running my Python programs, though not recent games.
 
S

Slaunger

As a general note concerning the use of Python on Project Euler, and
the one minute guideline.

For problems 1-100, each problem is easily solved in less than 1
minute processing time *if* the algorithms and math is done "right"
and with thought.

My project Euler scripts solves the first 100 problems with an average
of 0.91 secs/problem on a 4 y old std business Laptop running 32 bit
Win XP. Of these, one problem takes 18 secs.

For some of the later problems it certainly becomes very difficult to
do all problems within 1 minute if you use Python on an ordinary
processing platform. There you need to resort to a compiled language
like C, C++, or dedicated mathematical software packages, which
implement complex mathematical functions using highly efficient native
libraries.

Kim
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,770
Messages
2,569,584
Members
45,075
Latest member
MakersCBDBloodSupport

Latest Threads

Top