Prime number generator


C

Chris Angelico

And now for something completely different.

I knocked together a prime number generator, just for the fun of it,
that works like a Sieve of Eratosthenes but unbounded. It keeps track
of all known primes and the "next composite" that it will produce -
for instance, after yielding 13, the prime map will be {2: 20, 3: 18,
5: 20, 7: 21, 11: 22, 13: 26}, each one mapped to the first multiple
greater than 13.

Notable in the algorithm is an entire lack of division, or even
multiplication. Everything is done with addition.

So, a few questions. Firstly, is there a stdlib way to find the key
with the lowest corresponding value? In the above map, it would return
3, because 18 is the lowest value in the list. I want to do this with
a single pass over the dictionary. Secondly, can the "while
i<smallest... i+=1" loop become a for...range? It's almost asking for
it, but not quite there. Thirdly, is there any sort of half-sane
benchmark that I can compare this code to? And finally, whose wheel
did I reinvent here? What name would this algorithm have?

Code tested on Python 3.3, would probably run fine on pretty much any
Python that supports yield, though I don't have a Py2.2 to test from
__future__ import generators on!

ChrisA

# -- start --
def primes():
"""Generate an infinite series of prime numbers."""
i=2
yield 2
prime={2:2} # Map a prime number to its next composite (but bootstrap with 2:2)
while True:
# Find the smallest value in prime[] and its key.
# Is there a standard library way to do this??
# (If two values are equal smallest, either can be returned.)
prm=None
for p,val in prime.items():
if prm is None or val<smallest:
prm,smallest=p,val
prime[prm]+=prm
while i<smallest:
yield i
prime=i+i
i+=1
if i==smallest: i+=1

gen=primes()
for i in range(30):
print(next(gen),end="\t") # Star Trek?
print()
# -- end --
 
Ad

Advertisements

B

Bas

]
So, a few questions. Firstly, is there a stdlib way to find the key
with the lowest corresponding value? In the above map, it would return
3, because 18 is the lowest value in the list. I want to do this with
a single pass over the dictionary.

In [1]: prime = {2: 20, 3: 18, 5: 20, 7: 21, 11: 22, 13: 26}

In [2]: smallest_key = min(prime.iteritems(), key=lambda k_v: k_v[1])[0]

In [3]: smallest_key
Out[3]: 3

Still trying to figure out your algorithm ...
 
C

Chris Angelico

]
So, a few questions. Firstly, is there a stdlib way to find the key
with the lowest corresponding value? In the above map, it would return
3, because 18 is the lowest value in the list. I want to do this with
a single pass over the dictionary.

In [1]: prime = {2: 20, 3: 18, 5: 20, 7: 21, 11: 22, 13: 26}

In [2]: smallest_key = min(prime.iteritems(), key=lambda k_v: k_v[1])[0]

In [3]: smallest_key
Out[3]: 3

Well, that does answer the question. Unfortunately the use of lambda
there has a severe performance cost (roughly doubles the total run
time, when I ask for the thousandth prime), without majorly improving
readability. I'll bear it in mind if there's a way to make that work
on either readability or performance, but otherwise, I'll stick with
the explicit loop. Thanks anyway!
Still trying to figure out your algorithm ...

It's pretty simple. (That's a bad start, I know!) Like the Sieve of
Eratosthenes, it locates prime numbers, then deems every multiple of
them to be composite. Unlike the classic sieve, it does the "deem"
part in parallel. Instead of marking all the multiples of 2 first,
then picking three and marking all the multiples of 3, then 5, etc,
this function records the fact that it's up to (say) 42 in marking
multiples of 2, and then when it comes to check if 43 is prime or not,
it moves to the next multiple of 2. This requires memory to store the
previously-known primes, similarly to other methods, but needs no
multiplication or division.

ChrisA
 
B

bas

Well, that does answer the question. Unfortunately the use of lambda
there has a severe performance cost [ ...]
If you care about speed, you might want to check the heapq module. Removingthe smallest item and inserting a new item in a heap both cost O(log(N)) time, while finding the minimum in a dictionary requires iterating over the whole dictionary, which cost O(N) time.

(untested)
#before loop
from heapq import *
primes = [(2,2)] #heap of tuples (multiple, prime). start with 1 item, sono need for heapify

#during loop
smallest, prm = heappop(primes)
heappush(primes, (smallest+prm, prm))

#when new prime found
heappush(primes, (i+i, i))
Still trying to figure out your algorithm ...
It's pretty simple. [...]
I understand what you are trying, but it is not immediately clear to me that it works correctly if for example a smallest factor appears twice in the list. I don't have time for it now, but it for sure can be simplified. The while loop, for example, can be replaced by an if, since it will never execute more than once (since if i is prime > 2, i+1 will be divisible by 2)
 
S

Steven D'Aprano

And now for something completely different.

I knocked together a prime number generator, just for the fun of it,
that works like a Sieve of Eratosthenes but unbounded. [...]
So, a few questions. Firstly, is there a stdlib way to find the key with
the lowest corresponding value? In the above map, it would return 3,
because 18 is the lowest value in the list.

Not from a dict, but with a list you would probably use the bisect module.

I want to do this with a
single pass over the dictionary. Secondly, can the "while i<smallest...
i+=1" loop become a for...range? It's almost asking for it, but not
quite there. Thirdly, is there any sort of half-sane benchmark that I
can compare this code to? And finally, whose wheel did I reinvent here?
What name would this algorithm have?

I can't directly answer that question, but I can make a shameless plug
for this:

https://pypi.python.org/pypi/pyprimes

If your prime generator performs better than, or worse than, all of those
in the above module, I may have to steal it from you :)
 
C

Chris Angelico

Well, that does answer the question. Unfortunately the use of lambda
there has a severe performance cost [ ...]
If you care about speed, you might want to check the heapq module. Removing the smallest item and inserting a new item in a heap both cost O(log(N))time, while finding the minimum in a dictionary requires iterating over the whole dictionary, which cost O(N) time.

Ehh, speed isn't the ultimate. I was just trying to avoid something
that worked out ridiculously slow (a Python function call IS quite
slow). I haven't profiled the code to find out where the bulk of the
time is spent, but switching in the lambda-based version doubled total
run time, so I didn't like it :)
(untested)
#before loop
from heapq import *
primes = [(2,2)] #heap of tuples (multiple, prime). start with 1 item, so no need for heapify

#during loop
smallest, prm = heappop(primes)
heappush(primes, (smallest+prm, prm))

#when new prime found
heappush(primes, (i+i, i))

Ahh, that's the bit I should have thought of! Of course.

My original thought experiment had involved basically a really long
list, like the classic Sieve but getting longer as time moves on, with
composites replaced by None and primes with their next-markers, which
I then collapsed to a dict. Always I was thinking in terms of indexing
with the prime to get its next composite. Here's the code involving
heapq:

# -- start --
def primes():
"""Generate an infinite series of prime numbers."""
from heapq import heappush,heappop
i=2
yield 2
prime=[(2,2)] # Heap
while True:
smallest, prm = heappop(prime)
heappush(prime, (smallest+prm, prm))
while i<smallest:
yield i
heappush(prime, (i+i, i))
i+=1
if i==smallest: i+=1

gen=primes()
print([next(gen) for i in range(10)])
for i in range(1000):
next(gen) # Star Trek?
print("The next prime number is:",next(gen))
# -- end --

And that's significantly shorter, clearer, AND faster than the original. Thanks!
Still trying to figure out your algorithm ...
It's pretty simple. [...]
I understand what you are trying, but it is not immediately clear to me that it works correctly if for example a smallest factor appears twice in the list. I don't have time for it now, but it for sure can be simplified. The while loop, for example, can be replaced by an if, since it will never execute more than once (since if i is prime > 2, i+1 will be divisible by 2)

Ah, good point. Again, I originally was starting from 1, so the while
loop was necessary to catch 2 and 3 in the first run. When I changed
it to start at 2 and explicitly yield it first, I didn't notice to
change that.

It works correctly with the smallest multiple appearing twice because
it won't yield primes until the smallest value is higher than the
current next-prime. So when it's just yielded 11, for instance, both
the 2 and 3 slots hold 12; advancing one of those does nothing,
advancing the other allows the bottom loop to notice that 13 is now
lower than the minimum value (which will then be 14).

ChrisA
 
Ad

Advertisements

C

Chris Angelico

I can't directly answer that question, but I can make a shameless plug
for this:

https://pypi.python.org/pypi/pyprimes

If your prime generator performs better than, or worse than, all of those
in the above module, I may have to steal it from you :)

Ha, that's what I was hoping for. My algorithm outperforms several of
yours! Look!

Rosuav: 0.10868923284942639
awful_primes: 16.55546780386893
naive_primes1: 2.6105180965737276
naive_primes2: 1.358270674066116
trial_division: 0.06926075800136644
turner: 0.5736550315752424
croft: 0.007141969160883832
sieve: 0.01786707528470899
cookbook: 0.014790147909859996
wheel: 0.015050236831779529

Okay, so I outperform only algorithms listed as toys... :| Here's
similar figures, going to a higher cutoff (I kept it low for the sake
of awful_primes) and using only the algos designed for real use:

Rosuav: 2.1318494389650082
croft: 0.11628042111497416
sieve: 0.26868582459502566
cookbook: 0.21551174800149164
wheel: 0.4761577239565362

I didn't seriously think that this would outperform mathematically
proven and efficient algorithms, it was just a toy. But at least now I
know: It's roughly 20 times slower than a leading algo. And that's
after the bas-suggested improvement of using a heap.

But hey. If you want the code, you're welcome to it - same with anyone
else. Here's the heap version as used in the above timings. MIT
license.

def primes():
"""Generate an infinite series of prime numbers."""
from heapq import heappush,heappop
i=2
yield 2
prime=[(2,2)] # Heap
while True:
smallest, prm = heappop(prime)
heappush(prime, (smallest+prm, prm))
if i<smallest:
yield i
heappush(prime, (i+i, i))
i+=1
if i==smallest: i+=1
# ----

Enjoy!

ChrisA
 
J

Joshua Landau

Well, that does answer the question. Unfortunately the use of lambda
there has a severe performance cost [ ...]
If you care about speed, you might want to check the heapq module. Removing the smallest item and inserting a new item in a heap both cost O(log(N)) time, while finding the minimum in a dictionary requires iterating over the whole dictionary, which cost O(N) time.

Actually, because it's a list under the hood I'd imagine push and pop
still take O(n) time :/.
Ehh, speed isn't the ultimate. I was just trying to avoid something
that worked out ridiculously slow (a Python function call IS quite
slow). I haven't profiled the code to find out where the bulk of the
time is spent, but switching in the lambda-based version doubled total
run time, so I didn't like it :)
(untested)
#before loop
from heapq import *
primes = [(2,2)] #heap of tuples (multiple, prime). start with 1 item,so no need for heapify

#during loop
smallest, prm = heappop(primes)
heappush(primes, (smallest+prm, prm))

#when new prime found
heappush(primes, (i+i, i))

Ahh, that's the bit I should have thought of! Of course.

My original thought experiment had involved basically a really long
list, like the classic Sieve but getting longer as time moves on, with
composites replaced by None and primes with their next-markers, which
I then collapsed to a dict. Always I was thinking in terms of indexing
with the prime to get its next composite. Here's the code involving
heapq:

# -- start --
def primes():
"""Generate an infinite series of prime numbers."""
from heapq import heappush,heappop
i=2
yield 2
prime=[(2,2)] # Heap
while True:
smallest, prm = heappop(prime)
heappush(prime, (smallest+prm, prm))
while i<smallest:
yield i
heappush(prime, (i+i, i))
i+=1
if i==smallest: i+=1

gen=primes()
print([next(gen) for i in range(10)])
for i in range(1000):
next(gen) # Star Trek?
print("The next prime number is:",next(gen))
# -- end --

And that's significantly shorter, clearer, AND faster than the original. Thanks!

AFAICT, that's exactly my code but using a less-efficient storage
medium and a *much* more efficient sorting mechanism. It'd be
interesting what could happen if heapq didn't reject blists -- they
have better efficiency for changing list sizes (which this code does a
lot of).

Thanks for the heads-up on heapq, by the way -- it's new to me and a
blimmin' good idea.

PS: It's faster to use heapreplace(...) than
heappop(...);heappush(...) but it only saves a few %.
 
I

Ian Kelly

Actually, because it's a list under the hood I'd imagine push and pop
still take O(n) time :/.

It shouldn't. You can implement push by appending the new item and
then getting it into the right place by performing O(log n) swaps.
Likewise for pop, you can update the heap with O(log n) swaps and then
removing the tail element. Appending or removing the tail element of
a list is amortized O(1).
PS: It's faster to use heapreplace(...) than
heappop(...);heappush(...) but it only saves a few %.

The problem with heapreplace here is that the value to be pushed
is dependent on the value returned by pop.
 
J

Joshua Landau

It shouldn't. You can implement push by appending the new item and
then getting it into the right place by performing O(log n) swaps.
Likewise for pop, you can update the heap with O(log n) swaps and then
removing the tail element. Appending or removing the tail element of
a list is amortized O(1).

Genius. Bas replied off-list that it won't matter too much anyway as
they're over-allocated, but this makes tons of sense.
The problem with heapreplace here is that the value to be pushed
is dependent on the value returned by pop.

That's fine because indexing is much faster than pop. The "few %" was
a real statistic from working, timed code.
 
A

Albert van der Horst

And now for something completely different.

I knocked together a prime number generator, just for the fun of it,
that works like a Sieve of Eratosthenes but unbounded. It keeps track
of all known primes and the "next composite" that it will produce -
for instance, after yielding 13, the prime map will be {2: 20, 3: 18,
5: 20, 7: 21, 11: 22, 13: 26}, each one mapped to the first multiple
greater than 13.

Notable in the algorithm is an entire lack of division, or even
multiplication. Everything is done with addition.

So, a few questions. Firstly, is there a stdlib way to find the key
with the lowest corresponding value? In the above map, it would return
3, because 18 is the lowest value in the list. I want to do this with
a single pass over the dictionary. Secondly, can the "while
i<smallest... i+=1" loop become a for...range? It's almost asking for
it, but not quite there. Thirdly, is there any sort of half-sane
benchmark that I can compare this code to? And finally, whose wheel
did I reinvent here? What name would this algorithm have?

Notice that all values from i on are possibly present.
So you are better off with a list indexed by forthcoming i's and
each item containing a list of primes. What you do then, more or less,
is keep track of all dividers of primes to be.
This promises to be reasonable efficient.
I've done a similar thing in Forth.

I've also done a slightly different but related parallel program on a
multi-processor Forth machine where each processor takes care of one
prime.

There is an unpleasant fact about this kind of generators.
If you want to go unbounded, you've no choice but remember all
primes. If you go bounded, you need to remember 168 up til 1M,
say sqrt(limit)/log(limit). This dramatic difference (and the lack
of processors) leads one quickly to decide for some upper bound.
Code tested on Python 3.3, would probably run fine on pretty much any
Python that supports yield, though I don't have a Py2.2 to test from
__future__ import generators on!

I had problems with the print statement on a 2 version, fixed easy
enough.
ChrisA

# -- start --
def primes():
"""Generate an infinite series of prime numbers."""
i=2
yield 2
prime={2:2} # Map a prime number to its next composite (but bootstrap with 2:2)
while True:
# Find the smallest value in prime[] and its key.
# Is there a standard library way to do this??
# (If two values are equal smallest, either can be returned.)
prm=None
for p,val in prime.items():
if prm is None or val<smallest:
prm,smallest=p,val
prime[prm]+=prm
while i<smallest:
yield i
prime=i+i
i+=1
if i==smallest: i+=1

gen=primes()
for i in range(30):
print(next(gen),end="\t") # Star Trek?
print()
# -- end --
 
Ad

Advertisements

B

bryanjugglercryptographer

Chris said:
It's pretty simple. (That's a bad start, I know!) Like the Sieve of
Eratosthenes, it locates prime numbers, then deems every multiple of
them to be composite. Unlike the classic sieve, it does the "deem"
part in parallel. Instead of marking all the multiples of 2 first,
then picking three and marking all the multiples of 3, then 5, etc,
this function records the fact that it's up to (say) 42 in marking
multiples of 2, and then when it comes to check if 43 is prime or not,
it moves to the next multiple of 2. This requires memory to store the
previously-known primes, similarly to other methods, but needs no
multiplication or division.

Knuth points to the method, using a priority queue, in exercise 15 of section 5.2.3 of /Sorting and Searching/, and credits it to "B. A. Chartres".
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top