Python 3.3 vs. MSDOS Basic

John Immarino · Feb 18, 2013

I coded a Python solution for Problem #14 on the Project Euler website. I was very surprised to find that it took 107 sec. to run even though it's a pretty simple program. I also coded an equivalent solution for the problem in the old MSDOS basic. (That's the 16 bit app of 1980s vintage.) It ran in 56 sec. Is there a flaw in my coding, or is Python really this slow in this particular application. MSDOS Basic usually runs at a snails pace compared to Python.

Below is the problem and the code:

The following iterative sequence is defined for the set of positive integers:

n â†’ n/2 (n is even)
n â†’ 3n + 1 (n is odd)

Using the rule above and starting with 13, we generate the following sequence:
13 â†’ 40 â†’ 20 â†’ 10 â†’ 5 â†’ 16 â†’ 8 â†’ 4 â†’ 2 â†’ 1

It can be seen that this sequence (starting at 13 and finishing at 1) contains 10 terms. Although it has not been proved yet (Collatz Problem), it is thought that all starting numbers finish at 1.

Which starting number, under one million, produces the longest chain?

NOTE: Once the chain starts the terms are allowed to go above one million.

max=0
m=0
while m<=1000000:
m+=1
count=0
n=m
while n!=1:
count+=1
if n%2==0:
n=n//2
else:
n=3*n+1
if count>max:
max=count
num=m
print(num,max)

Ian Kelly · Feb 18, 2013

I coded a Python solution for Problem #14 on the Project Euler website. Iwas very surprised to find that it took 107 sec. to run even though it's apretty simple program. I also coded an equivalent solution for the problem in the old MSDOS basic. (That's the 16 bit app of 1980s vintage.) It ranin 56 sec. Is there a flaw in my coding, or is Python really this slow in this particular application. MSDOS Basic usually runs at a snails pace compared to Python.

Well, I don't see anything that looks especially slow in that code,
but the algorithm that you're using is not very efficient. I rewrote
it using dynamic programming (details left as an exercise), which got
the runtime down to about 4 seconds.

Chris Angelico · Feb 18, 2013

I coded a Python solution for Problem #14 on the Project Euler website. Iwas very surprised to find that it took 107 sec. to run even though it's apretty simple program. I also coded an equivalent solution for the problem in the old MSDOS basic. (That's the 16 bit app of 1980s vintage.) It ranin 56 sec. Is there a flaw in my coding, or is Python really this slow in this particular application. MSDOS Basic usually runs at a snails pace compared to Python.

BASIC does a lot less. If you wrote an 8086 assembly language
interpreter in Python, it'd run fairly slowly too

Python isn't
really the world's best language for number crunching inside a machine
word; though if this were a major project, I would recommend looking
into Cython, as it lets you translate a few critical portions of your
code to C while leaving the rest in Python.

In order to get some useful stats, I added a little timing code to
your original; on my Windows XP laptop, running Python 3.3, your
version took 212.64 seconds to get to a result (namely, 837799 with a
count of 524).

Here's how I'd code it:

import time
start=time.time()
max=0
for m in range(1,1000001):
n=m
count=0
while n>1:
if n%2: n=3*n+1
else: n//=2
count+=1
if count>max: max,num=count,m
if not m&16383: print("->",m,count)
print(num,max)
print(time.time()-start)

(You'll see the same timing information that I added to yours. It adds
immeasurably to the run-time, and gives some early idea of how it's
going.)

Running under Python 2.6, both your version and mine take about 90
seconds to run. But under Python 3.3, where (among other things)
range() yields values lazily, my version is significantly faster than
yours. BUT! Both versions, under 3.3, are significantly *slower* than
under 2.6. My first thought is that it's because Py2 has different
types for 'int' and 'long', and Py3 doesn't (effectively, everything's
a long), so I added an L suffix to every number and ran each of them
under 2.6 again. Seems that was the bulk of the difference, though not
all.

Pythonistas, does this count as a regression, or is Python
sufficiently "not a number crunching language" that we don't care?

(range = my code, as above; while = original version with a C-style
loop counter)
range py3: 171.07846403121948
while py3: 212.64104509353638
range py2: 87.859000206
while py2: 86.4059998989
range py2 longs: 190.530999899
while py2 longs: 176.125999928

For comparison purposes, I also coded up the equivalent in Pike.
Pike's a very similar language to Python, but with a C-like syntax,
and certain optimizations - including, significantly to this exercise,
an integer type that sits within a machine word if it can (though
it'll happily go arbitrary precision when it's needed to). It pretends
to the programmer that it's a Py3-style "everything's an int", but
underneath, functions more like Py2 with separate short and long
types. The result: 22.649 seconds to reach the same conclusion.

How long did your BASIC version take, and how long did the Python
version on the same hardware?

This sort of pure number crunching isn't really where a modern high
level language shines. You'll come to *really* appreciate Python as
soon as you start working with huge arrays, dictionaries, etc. This is
a job for C, really.

ChrisA

Chris Angelico · Feb 18, 2013

How long did your BASIC version take, and how long did the Python
version on the same hardware?

Oops, my bad, you already posted the figures

And I forgot to ask:
Which Python version didyou use?

ChrisA

Chris Angelico · Feb 18, 2013

Oops, my bad, you already posted the figures And I forgot to ask:
Which Python version didyou use?

ChrisA

Doh. I'm having a great day of not reading properly, today. (I blame
checking mail on the bus, it took me over an hour to read this one
message and I'd forgotten the subject line by the time I got to the
end.) Python 3.3, right there in the header. Disregard me!

ChrisA

Chris Angelico · Feb 18, 2013

Well, I don't see anything that looks especially slow in that code,
but the algorithm that you're using is not very efficient. I rewrote
it using dynamic programming (details left as an exercise), which got
the runtime down to about 4 seconds.

Did it involve a dictionary, mapping a value to its count, so that any
time you hit a value you've seen, you can short-cut it? That was my
first optimization consideration, though I didn't implement it in any
version, so as to keep the timings comparable.

ChrisA

Ian Kelly · Feb 18, 2013

Did it involve a dictionary, mapping a value to its count, so that any
time you hit a value you've seen, you can short-cut it? That was my
first optimization consideration, though I didn't implement it in any
version, so as to keep the timings comparable.

Ayup.

Alexander Blinne · Feb 19, 2013

Am 18.02.2013 20:13, schrieb John Immarino:

I coded a Python solution for Problem #14 on the Project Euler website. I was very surprised to find that it took 107 sec. to run even though it's a pretty simple program. I also coded an equivalent solution for the problem in the old MSDOS basic. (That's the 16 bit app of 1980s vintage.) It ran in 56 sec. Is there a flaw in my coding, or is Python really this slow in this particular application. MSDOS Basic usually runs at a snails pace compared to Python.

max=0
m=0
while m<=1000000:
m+=1
count=0
n=m
while n!=1:
count+=1
if n%2==0:
n=n//2
else:
n=3*n+1
if count>max:
max=count
num=m
print(num,max)

I cannot compare my timings with basic but python 2.7.3 and python 3.2.3
are both equally slow hier (~50 sec).
pypy is a lot faster (only some old version 1.7.0, current versions
should be faster still) with about 5 sec.

The following C-Program:

#include <stdio.h>

int main(void) {

int max = 0;
int m = 0;
long int n;
int count;
int num;

while(m<=1000000) {
m++;
n = m;
count = 0;

while(n != 1) {
count++;
if(n % 2 == 0) {
n = n / 2;
}
else {
n = n*3 + 1;
}
}

if(count > max) {
max = count;
num = m;
}
}

printf("%d, %d\n", num, max);
}

Does the job in just under 1 sec.

Greetings
Alexander

Dennis Lee Bieber · Feb 19, 2013

I coded a Python solution for Problem #14 on the Project Euler website. I was very surprised to find that it took 107 sec. to run even though it's a pretty simple program. I also coded an equivalent solution for the problem in the old MSDOS basic. (That's the 16 bit app of 1980s vintage.) It ran in 56 sec. Is there a flaw in my coding, or is Python really this slow in this particular application. MSDOS Basic usually runs at a snails pace compared to Python.

max=0

"max" is a bad name -- it masks the built-in max() function

m=0
while m<=1000000:
m+=1

Since "m" is only modified here and has a value of 1 for the first
pass through, you can replace those three lines with

for m in xrange(1, 1000001): #python 2.x, just use range() for 3.x

count=0
n=m

while n!=1:
count+=1
if n%2==0:
n=n//2
else:
n=3*n+1

Avoid the comparison to 0 by reversing the then/else actions... Any
0 result is false.

-=-=-=-=-
import time

mx = 0

start = time.time()
for m in xrange(1, 1000001):
count = 0
n = m
while n > 1:
count += 1
if n % 2: # 0 means false
n = 3 * n + 1
else:
n = n // 2

if count > mx:
mx, num = count, m

end = time.time()

print num, mx
print end-start
-=-=-=-=-
Microsoft Windows XP [Version 5.1.2600]
(C) Copyright 1985-2001 Microsoft Corp.

E:\UserData\Wulfraed\My Documents>cd "Python Progs"

E:\UserData\Wulfraed\My Documents\Python Progs>Script1.py
837799 524
83.2030000687

E:\UserData\Wulfraed\My Documents\Python Progs>

Terry Reedy · Feb 19, 2013

I coded a Python solution for Problem #14 on the Project Euler
website. I was very surprised to find that it took 107 sec. to run
even though it's a pretty simple program. I also coded an equivalent
solution for the problem in the old MSDOS basic. (That's the 16 bit
app of 1980s vintage.) It ran in 56 sec. Is there a flaw in my
coding, or is Python really this slow in this particular application.
MSDOS Basic usually runs at a snails pace compared to Python.

I find this surprising too. I am also surprised that it even works,
given that the highest intermediate value is about 57 billion and I do
not remember that Basic had infinite precision ints.

The following iterative sequence is defined for the set of positive
integers:

n â†’ n/2 (n is even) n â†’ 3n + 1 (n is odd)

Note that if n is odd, 3n + 1 is even (and not 1!), so one may take two
steps with (3n + 1)/2.

Using the rule above and starting with 13, we generate the following
sequence: 13 â†’ 40 â†’ 20 â†’ 10 â†’ 5 â†’16 â†’ 8 â†’ 4 â†’ 2 â†’ 1

It can be seen that this sequence (starting at 13 and finishing at 1)
contains 10 terms. Although it has not been proved yet (Collatz
Problem), it is thought that all starting numbers finish at 1.
https://en.wikipedia.org/wiki/Collatz_conjecture

Which starting number, under one million, produces the longest
chain?

I suppose 'print(837799)' would not count as a proper solution.

NOTE: Once the chain starts the terms are allowed to go above one
million.

Here is my slightly revised code with timings on a good, 20 month old
win 7 machine.

from time import time
start = time()

num, max = 0, 0
for m in range(1, 1000001):
n = m
count=0
while n !=1:
if n & 1: #n % 2:
n = (3*n + 1) // 2
count += 2
else:
n = n//2
count += 1
if count > max:
num = m
max = count

print(num, max , time()-start)

# original: 837799, 524 steps, 53.9 secs
# for ... range: 52.3
# reverse inner if 49.0
# double step 39.1
# n & 1 instead of n % 2 for test: 36.0, 36.0, 35.9
# n>>1 instead of n//2: 34.7, 36.1, 36.2;
# this may be internally optimized, so skip

I do not see any fluff left to remove, unless one takes the major step
of saving already calculated values in an array.

Since the highest intermediate value of n is 56991483520 (445245965
*2**7, from adding "if n > maxn: maxn = n" to the odd branch, before
dividing by 2), the array would have to be limited to a much lower
value, say a few million.

John Immarino · Feb 19, 2013

Doh. I'm having a great day of not reading properly, today. (I blame

checking mail on the bus, it took me over an hour to read this one

message and I'd forgotten the subject line by the time I got to the

end.) Python 3.3, right there in the header. Disregard me!

ChrisA

Thanks,Chris. I'm a newbie to Python and didn't realize that it's not as good at number crunching as some of the others. It does seem to do better than Basic with numbers in lists as opposed to arrays in Basic.

John Immarino · Feb 19, 2013

Doh. I'm having a great day of not reading properly, today. (I blame

checking mail on the bus, it took me over an hour to read this one

message and I'd forgotten the subject line by the time I got to the

end.) Python 3.3, right there in the header. Disregard me!

ChrisA

Thanks,Chris. I'm a newbie to Python and didn't realize that it's not as good at number crunching as some of the others. It does seem to do better than Basic with numbers in lists as opposed to arrays in Basic.

John Immarino · Feb 19, 2013

max=0

Click to expand...

"max" is a bad name -- it masks the built-in max() function

m=0

Click to expand...

while m<=1000000:

Click to expand...

m+=1

Click to expand...

Since "m" is only modified here and has a value of 1 for the first

pass through, you can replace those three lines with

for m in xrange(1, 1000001): #python 2.x, just use range() for 3.x

count=0

while n!=1:

if n%2==0:

n=3*n+1

Click to expand...

Avoid the comparison to 0 by reversing the then/else actions... Any

0 result is false.

-=-=-=-=-

import time

mx = 0

start = time.time()

for m in xrange(1, 1000001):

count = 0

n = m

while n > 1:

count += 1

if n % 2: # 0 means false

n = 3 * n + 1

else:

n = n // 2

if count > mx:

mx, num = count, m

end = time.time()

print num, mx

print end-start

-=-=-=-=-

Microsoft Windows XP [Version 5.1.2600]

(C) Copyright 1985-2001 Microsoft Corp.

E:\UserData\Wulfraed\My Documents>cd "Python Progs"

E:\UserData\Wulfraed\My Documents\Python Progs>Script1.py

837799 524

83.2030000687

E:\UserData\Wulfraed\My Documents\Python Progs>

--

Wulfraed Dennis Lee Bieber AF6VN

(e-mail address removed) HTTP://wlfraed.home.netcom.com/

Thanks, your suggestions are well taken.

John Immarino · Feb 19, 2013

max=0

Click to expand...

"max" is a bad name -- it masks the built-in max() function

m=0

Click to expand...

while m<=1000000:

Click to expand...

m+=1

Click to expand...

Since "m" is only modified here and has a value of 1 for the first

pass through, you can replace those three lines with

for m in xrange(1, 1000001): #python 2.x, just use range() for 3.x

count=0

while n!=1:

if n%2==0:

n=3*n+1

Click to expand...

Avoid the comparison to 0 by reversing the then/else actions... Any

0 result is false.

-=-=-=-=-

import time

mx = 0

start = time.time()

for m in xrange(1, 1000001):

count = 0

n = m

while n > 1:

count += 1

if n % 2: # 0 means false

n = 3 * n + 1

else:

n = n // 2

if count > mx:

mx, num = count, m

end = time.time()

print num, mx

print end-start

-=-=-=-=-

Microsoft Windows XP [Version 5.1.2600]

(C) Copyright 1985-2001 Microsoft Corp.

E:\UserData\Wulfraed\My Documents>cd "Python Progs"

E:\UserData\Wulfraed\My Documents\Python Progs>Script1.py

837799 524

83.2030000687

E:\UserData\Wulfraed\My Documents\Python Progs>

--

Wulfraed Dennis Lee Bieber AF6VN

(e-mail address removed) HTTP://wlfraed.home.netcom.com/

Thanks, your suggestions are well taken.

Chris Angelico · Feb 19, 2013

Thanks,Chris. I'm a newbie to Python and didn't realize that it's not as good at number crunching as some of the others. It does seem to do better than Basic with numbers in lists as opposed to arrays in Basic.

Yes, Python is excellent at data handling. I'll cheerfully use Python
to manipulate huge lists or arrays, and its performance at that is
usually well within the "good enough" range (for instance, anything
that manipulates the file system will be waiting on my disks, not on
Python). It's an excellent tool in the toolkit, just not the one
solution to everything. (Nothing's that!)

ChrisA

Nick Mellor · Feb 19, 2013

Hi John,

Thanks for the problem. I've been writing Python for about 4 years now and am beginning to feel like I'm writing much better Python code.

Python does fine on this problem if you play to its strengths. The following uses dictionary lookups to store previously computed sequence lengths, thus saving a lot of work. The problem is very "sparse", i.e. there are huge gaps between numbers that are actually used in the solution, making dictionaries a better fit than lists.

This code crosses the line in under 3s on a 64-bit laptop. MS-DOS BASIC anyone?

I tried precomputing powers of 2 and multiples of 2, but to my surprise it made very little difference to timings. Even though precomputing n//2 is fast, I think again this is because the problem is sparse and the time the computer saves is not offset by the cost of precomputing many multiples of 2 that are never needed.

Best wishes,

Nick

And the winner is 837799 with sequence length 524
Time (s): 2.924168109893799
Sequence is:
[837799, 2513398, 1256699, 3770098, 1885049, 5655148, 2827574, 1413787, 4241362, 2120681, 6362044, 3181022, 1590511, 4771534, 2385767, 7157302, 3578651, 10735954, 5367977, 16103932, 8051966, 4025983, 12077950, 6038975, 18116926, 9058463, 27175390, 13587695, 40763086, 20381543, 61144630, 30572315, 91716946, 45858473, 137575420, 68787710, 34393855, 103181566, 51590783, 154772350, 77386175, 232158526, 116079263, 348237790, 174118895, 522356686, 261178343, 783535030, 391767515, 1175302546, 587651273, 1762953820, 881476910, 440738455, 1322215366, 661107683, 1983323050, 991661525, 2974984576, 1487492288, 743746144, 371873072, 185936536, 92968268, 46484134, 23242067, 69726202, 34863101, 104589304, 52294652, 26147326, 13073663, 39220990, 19610495, 58831486, 29415743, 88247230, 44123615, 132370846, 66185423, 198556270, 99278135, 297834406, 148917203, 446751610, 223375805, 670127416, 335063708, 167531854, 83765927, 251297782, 125648891, 376946674, 188473337, 565420012, 282710006, 141355003, 424065010, 212032505, 636097516, 318048758, 159024379,477073138, 238536569, 715609708, 357804854, 178902427, 536707282, 268353641, 805060924, 402530462, 201265231, 603795694, 301897847, 905693542, 452846771, 1358540314, 679270157, 2037810472, 1018905236, 509452618, 254726309, 764178928, 382089464, 191044732, 95522366, 47761183, 143283550, 71641775, 214925326, 107462663, 322387990, 161193995, 483581986, 241790993, 725372980, 362686490, 181343245, 544029736, 272014868, 136007434, 68003717, 204011152,102005576, 51002788, 25501394, 12750697, 38252092, 19126046, 9563023, 28689070, 14344535, 43033606, 21516803, 64550410, 32275205, 96825616, 48412808,24206404, 12103202, 6051601, 18154804, 9077402, 4538701, 13616104, 6808052, 3404026, 1702013, 5106040, 2553020, 1276510, 638255, 1914766, 957383, 2872150, 1436075, 4308226, 2154113, 6462340, 3231170, 1615585, 4846756, 2423378, 1211689, 3635068, 1817534, 908767, 2726302, 1363151, 4089454, 2044727, 6134182, 3067091, 9201274, 4600637, 13801912, 6900956, 3450478, 1725239, 5175718, 2587859, 7763578, 3881789, 11645368, 5822684, 2911342, 1455671, 4367014, 2183507, 6550522, 3275261, 9825784, 4912892, 2456446, 1228223, 3684670,1842335, 5527006, 2763503, 8290510, 4145255, 12435766, 6217883, 18653650, 9326825, 27980476, 13990238, 6995119, 20985358, 10492679, 31478038, 15739019, 47217058, 23608529, 70825588, 35412794, 17706397, 53119192, 26559596, 13279798, 6639899, 19919698, 9959849, 29879548, 14939774, 7469887, 22409662, 11204831, 33614494, 16807247, 50421742, 25210871, 75632614, 37816307, 113448922, 56724461, 170173384, 85086692, 42543346, 21271673, 63815020, 31907510, 15953755, 47861266, 23930633, 71791900, 35895950, 17947975, 53843926, 26921963, 80765890, 40382945, 121148836, 60574418, 30287209, 90861628, 45430814, 22715407, 68146222, 34073111, 102219334, 51109667, 153329002, 76664501, 229993504, 114996752, 57498376, 28749188, 14374594, 7187297, 21561892, 10780946, 5390473, 16171420, 8085710, 4042855, 12128566, 6064283, 18192850, 9096425, 27289276, 13644638, 6822319, 20466958, 10233479, 30700438, 15350219, 46050658, 23025329, 69075988, 34537994, 17268997, 51806992, 25903496, 12951748, 6475874, 3237937, 9713812, 4856906, 2428453, 7285360, 3642680, 1821340, 910670, 455335, 1366006, 683003, 2049010, 1024505, 3073516, 1536758, 768379, 2305138, 1152569, 3457708, 1728854, 864427, 2593282, 1296641, 3889924, 1944962, 972481, 2917444, 1458722, 729361, 2188084, 1094042, 547021, 1641064, 820532, 410266, 205133, 615400, 307700, 153850, 76925, 230776, 115388, 57694, 28847, 86542, 43271, 129814, 64907, 194722, 97361, 292084, 146042, 73021, 219064, 109532, 54766, 27383, 82150, 41075, 123226, 61613, 184840, 92420, 46210, 23105, 69316, 34658, 17329, 51988, 25994, 12997, 38992, 19496, 9748, 4874, 2437, 7312, 3656, 1828, 914, 457, 1372, 686, 343, 1030, 515, 1546, 773, 2320, 1160, 580, 290, 145, 436, 218, 109, 328, 164, 82, 41, 124, 62, 31, 94, 47, 142, 71, 214, 107, 322, 161, 484, 242, 121, 364, 182, 91, 274, 137, 412, 206, 103, 310, 155, 466, 233, 700, 350, 175, 526, 263, 790, 395, 1186, 593, 1780, 890, 445, 1336, 668, 334, 167, 502, 251, 754, 377, 1132,566, 283, 850, 425, 1276, 638, 319, 958, 479, 1438, 719, 2158, 1079, 3238,1619, 4858, 2429, 7288, 3644, 1822, 911, 2734, 1367, 4102, 2051, 6154, 3077, 9232, 4616, 2308, 1154, 577, 1732, 866, 433, 1300, 650, 325, 976, 488, 244, 122, 61, 184, 92, 46, 23, 70, 35, 106, 53, 160, 80, 40, 20, 10, 5, 16, 8, 4, 2, 1]
Sparsity calculations...
Computed sequence lengths 2168611
Largest term: 56991483520
Test range: 1 1000000
Biggest gap: 4508198208
Sparsity: 0.00175%

# If True, will precompute powers of 2 and multiples of 2
# in practice this made little difference on 64-bit hardware
OPTIMISE = True

def build_sequence(n):
"""return sequence as a list given the starting number
Uses the trail of data left by compute_sequence"""
tmp = compute_sequence(n)
sequence = []
while n:
sequence.append(n)
n = next_num[n]
return sequence

def compute_sequence(n):
"""lazily compute sequences for Collatz problem"""
if n in seqlength:
return seqlength[n]
if n not in next_num:
# NOTE: (some) evens are pre-computed
next_num[n] = 3 * n + 1 if n % 2 else n // 2
seqlength[n] = 1 + compute_sequence(next_num[n])
return seqlength[n]

import time
start = time.time()

highest_number = int(1000000)
highest_term = highest_number * 3 + 1
highest_term += 1 if highest_term % 2 else 0

next_num = {2:1}
if OPTIMISE:
# quickly pre-compute (some of) the evens (used for n = n//2 if n is even)
# how many should we precompute? Any mathematicians?
doubles = range(2, highest_term, 2)
numbers = range(1, highest_term//2)
next_num = dict(zip(doubles, numbers))
# mark 1 as the end-point of any sequence
next_num[1] = 0

# initialise the sequence lengths
seqlength = {}
seqlength[1] = 0
seqlength[2] = 1
if OPTIMISE:
# powers of 2 are trivial: 2**n has sequence length n
n = 2
pwr = 4
while pwr < highest_term:
seqlength[pwr] = n
pwr = pwr * 2
n += 1
max_length = 0
for n in range(3, highest_number + 1):
length = compute_sequence(n)
if length > max_length:
max_length = length
winning_number = n
print ("And the winner is {0} with sequence length {1}".format(winning_number, max_length))
end = time.time()
print ("Time (s): ", (end-start))

print ("Sequence is:")
print (build_sequence(winning_number))

# Sparsity calculation
sorted_seqlengths = sorted(seqlength.keys())
print ("Sparsity calculations...")
print ("Computed sequence lengths", len(seqlength))
largest_term = sorted_seqlengths[-1]
print ("Largest term: ", largest_term)
print ("Test range: ", 1, highest_number)
gaps = (second - first for first, second in zip(sorted_seqlengths[0:-1], sorted_seqlengths[1:]))
biggest_gap = 0
for n in gaps:
if biggest_gap < n:
biggest_gap = n
print ("Biggest gap: ", n)
print ("Sparsity: {0:.5f}%".format(highest_number / largest_term * 100))

Nick Mellor · Feb 19, 2013

Hi John,

Thanks for the problem. I've been writing Python for about 4 years now and am beginning to feel like I'm writing much better Python code.

Python does fine on this problem if you play to its strengths. The following uses dictionary lookups to store previously computed sequence lengths, thus saving a lot of work. The problem is very "sparse", i.e. there are huge gaps between numbers that are actually used in the solution, making dictionaries a better fit than lists.

This code crosses the line in under 3s on a 64-bit laptop. MS-DOS BASIC anyone?

I tried precomputing powers of 2 and multiples of 2, but to my surprise it made very little difference to timings. Even though precomputing n//2 is fast, I think again this is because the problem is sparse and the time the computer saves is not offset by the cost of precomputing many multiples of 2 that are never needed.

Best wishes,

Nick

And the winner is 837799 with sequence length 524
Time (s): 2.924168109893799
Sequence is:
[837799, 2513398, 1256699, 3770098, 1885049, 5655148, 2827574, 1413787, 4241362, 2120681, 6362044, 3181022, 1590511, 4771534, 2385767, 7157302, 3578651, 10735954, 5367977, 16103932, 8051966, 4025983, 12077950, 6038975, 18116926, 9058463, 27175390, 13587695, 40763086, 20381543, 61144630, 30572315, 91716946, 45858473, 137575420, 68787710, 34393855, 103181566, 51590783, 154772350, 77386175, 232158526, 116079263, 348237790, 174118895, 522356686, 261178343, 783535030, 391767515, 1175302546, 587651273, 1762953820, 881476910, 440738455, 1322215366, 661107683, 1983323050, 991661525, 2974984576, 1487492288, 743746144, 371873072, 185936536, 92968268, 46484134, 23242067, 69726202, 34863101, 104589304, 52294652, 26147326, 13073663, 39220990, 19610495, 58831486, 29415743, 88247230, 44123615, 132370846, 66185423, 198556270, 99278135, 297834406, 148917203, 446751610, 223375805, 670127416, 335063708, 167531854, 83765927, 251297782, 125648891, 376946674, 188473337, 565420012, 282710006, 141355003, 424065010, 212032505, 636097516, 318048758, 159024379,477073138, 238536569, 715609708, 357804854, 178902427, 536707282, 268353641, 805060924, 402530462, 201265231, 603795694, 301897847, 905693542, 452846771, 1358540314, 679270157, 2037810472, 1018905236, 509452618, 254726309, 764178928, 382089464, 191044732, 95522366, 47761183, 143283550, 71641775, 214925326, 107462663, 322387990, 161193995, 483581986, 241790993, 725372980, 362686490, 181343245, 544029736, 272014868, 136007434, 68003717, 204011152,102005576, 51002788, 25501394, 12750697, 38252092, 19126046, 9563023, 28689070, 14344535, 43033606, 21516803, 64550410, 32275205, 96825616, 48412808,24206404, 12103202, 6051601, 18154804, 9077402, 4538701, 13616104, 6808052, 3404026, 1702013, 5106040, 2553020, 1276510, 638255, 1914766, 957383, 2872150, 1436075, 4308226, 2154113, 6462340, 3231170, 1615585, 4846756, 2423378, 1211689, 3635068, 1817534, 908767, 2726302, 1363151, 4089454, 2044727, 6134182, 3067091, 9201274, 4600637, 13801912, 6900956, 3450478, 1725239, 5175718, 2587859, 7763578, 3881789, 11645368, 5822684, 2911342, 1455671, 4367014, 2183507, 6550522, 3275261, 9825784, 4912892, 2456446, 1228223, 3684670,1842335, 5527006, 2763503, 8290510, 4145255, 12435766, 6217883, 18653650, 9326825, 27980476, 13990238, 6995119, 20985358, 10492679, 31478038, 15739019, 47217058, 23608529, 70825588, 35412794, 17706397, 53119192, 26559596, 13279798, 6639899, 19919698, 9959849, 29879548, 14939774, 7469887, 22409662, 11204831, 33614494, 16807247, 50421742, 25210871, 75632614, 37816307, 113448922, 56724461, 170173384, 85086692, 42543346, 21271673, 63815020, 31907510, 15953755, 47861266, 23930633, 71791900, 35895950, 17947975, 53843926, 26921963, 80765890, 40382945, 121148836, 60574418, 30287209, 90861628, 45430814, 22715407, 68146222, 34073111, 102219334, 51109667, 153329002, 76664501, 229993504, 114996752, 57498376, 28749188, 14374594, 7187297, 21561892, 10780946, 5390473, 16171420, 8085710, 4042855, 12128566, 6064283, 18192850, 9096425, 27289276, 13644638, 6822319, 20466958, 10233479, 30700438, 15350219, 46050658, 23025329, 69075988, 34537994, 17268997, 51806992, 25903496, 12951748, 6475874, 3237937, 9713812, 4856906, 2428453, 7285360, 3642680, 1821340, 910670, 455335, 1366006, 683003, 2049010, 1024505, 3073516, 1536758, 768379, 2305138, 1152569, 3457708, 1728854, 864427, 2593282, 1296641, 3889924, 1944962, 972481, 2917444, 1458722, 729361, 2188084, 1094042, 547021, 1641064, 820532, 410266, 205133, 615400, 307700, 153850, 76925, 230776, 115388, 57694, 28847, 86542, 43271, 129814, 64907, 194722, 97361, 292084, 146042, 73021, 219064, 109532, 54766, 27383, 82150, 41075, 123226, 61613, 184840, 92420, 46210, 23105, 69316, 34658, 17329, 51988, 25994, 12997, 38992, 19496, 9748, 4874, 2437, 7312, 3656, 1828, 914, 457, 1372, 686, 343, 1030, 515, 1546, 773, 2320, 1160, 580, 290, 145, 436, 218, 109, 328, 164, 82, 41, 124, 62, 31, 94, 47, 142, 71, 214, 107, 322, 161, 484, 242, 121, 364, 182, 91, 274, 137, 412, 206, 103, 310, 155, 466, 233, 700, 350, 175, 526, 263, 790, 395, 1186, 593, 1780, 890, 445, 1336, 668, 334, 167, 502, 251, 754, 377, 1132,566, 283, 850, 425, 1276, 638, 319, 958, 479, 1438, 719, 2158, 1079, 3238,1619, 4858, 2429, 7288, 3644, 1822, 911, 2734, 1367, 4102, 2051, 6154, 3077, 9232, 4616, 2308, 1154, 577, 1732, 866, 433, 1300, 650, 325, 976, 488, 244, 122, 61, 184, 92, 46, 23, 70, 35, 106, 53, 160, 80, 40, 20, 10, 5, 16, 8, 4, 2, 1]
Sparsity calculations...
Computed sequence lengths 2168611
Largest term: 56991483520
Test range: 1 1000000
Biggest gap: 4508198208
Sparsity: 0.00175%

# If True, will precompute powers of 2 and multiples of 2
# in practice this made little difference on 64-bit hardware
OPTIMISE = True

def build_sequence(n):
"""return sequence as a list given the starting number
Uses the trail of data left by compute_sequence"""
tmp = compute_sequence(n)
sequence = []
while n:
sequence.append(n)
n = next_num[n]
return sequence

def compute_sequence(n):
"""lazily compute sequences for Collatz problem"""
if n in seqlength:
return seqlength[n]
if n not in next_num:
# NOTE: (some) evens are pre-computed
next_num[n] = 3 * n + 1 if n % 2 else n // 2
seqlength[n] = 1 + compute_sequence(next_num[n])
return seqlength[n]

import time
start = time.time()

highest_number = int(1000000)
highest_term = highest_number * 3 + 1
highest_term += 1 if highest_term % 2 else 0

next_num = {2:1}
if OPTIMISE:
# quickly pre-compute (some of) the evens (used for n = n//2 if n is even)
# how many should we precompute? Any mathematicians?
doubles = range(2, highest_term, 2)
numbers = range(1, highest_term//2)
next_num = dict(zip(doubles, numbers))
# mark 1 as the end-point of any sequence
next_num[1] = 0

# initialise the sequence lengths
seqlength = {}
seqlength[1] = 0
seqlength[2] = 1
if OPTIMISE:
# powers of 2 are trivial: 2**n has sequence length n
n = 2
pwr = 4
while pwr < highest_term:
seqlength[pwr] = n
pwr = pwr * 2
n += 1
max_length = 0
for n in range(3, highest_number + 1):
length = compute_sequence(n)
if length > max_length:
max_length = length
winning_number = n
print ("And the winner is {0} with sequence length {1}".format(winning_number, max_length))
end = time.time()
print ("Time (s): ", (end-start))

print ("Sequence is:")
print (build_sequence(winning_number))

# Sparsity calculation
sorted_seqlengths = sorted(seqlength.keys())
print ("Sparsity calculations...")
print ("Computed sequence lengths", len(seqlength))
largest_term = sorted_seqlengths[-1]
print ("Largest term: ", largest_term)
print ("Test range: ", 1, highest_number)
gaps = (second - first for first, second in zip(sorted_seqlengths[0:-1], sorted_seqlengths[1:]))
biggest_gap = 0
for n in gaps:
if biggest_gap < n:
biggest_gap = n
print ("Biggest gap: ", n)
print ("Sparsity: {0:.5f}%".format(highest_number / largest_term * 100))

Terry Reedy · Feb 19, 2013

Running under Python 2.6, both your version and mine take about 90
seconds to run. But under Python 3.3, where (among other things)
range() yields values lazily, my version is significantly faster than
yours. BUT! Both versions, under 3.3, are significantly *slower* than
under 2.6. My first thought is that it's because Py2 has different
types for 'int' and 'long', and Py3 doesn't (effectively, everything's
a long), so I added an L suffix to every number and ran each of them
under 2.6 again. Seems that was the bulk of the difference, though not
all.

Pythonistas, does this count as a regression, or is Python
sufficiently "not a number crunching language" that we don't care?

Both. This brute-force algorithm is almost pure number crunching. This
is the sort of thing pypy and cython are good at speeding up. (I leave
out numpy only because it is not an array-oriented problem.)

I put a counter in the inner loop of my improved version the does
(3*n+1)//2 in one step and got 87 826 478 in 40 seconds (without the
counter). That is 2 million loops per second and each loop does a
compare, one or two integer ops, and creates and releases one or two ints.

If I were doing a lot of int crunching like this with CPython and were
building my own interpreter, I would greatly expand the range of
pre-allocated 'small' ints to avoid some of the repeated allocation and
de-allocation. On a multi-gibibyte machine, allocating up to 1000000
instead of 256 would be feasible.

As Ian noted, an intelligent algorithm in CPython can match pypy and is
in the ballpark of C, but is much easier to write in Python than C. It
is possible that Ian's code could be improved further. A pre-allocated
arrray + dict might be faster. Whenever an odd value is filled in,
powers of 2 times that value can also be.

Anssi Saari · Feb 19, 2013

John Immarino said:
I coded a Python solution for Problem #14 on the Project Euler
website. I was very surprised to find that it took 107 sec. to run
even though it's a pretty simple program. I also coded an equivalent
solution for the problem in the old MSDOS basic. (That's the 16 bit
app of 1980s vintage.)

Just out of curiosity, can you post the basic version as well?

Serhiy Storchaka · Feb 19, 2013

max=0
m=0
while m<=1000000:
m+=1
count=0
n=m
while n!=1:
count+=1
if n%2==0:
n=n//2
else:
n=3*n+1
if count>max:
max=count
num=m
print(num,max)

Some minor tips:

1. Use range() for m iteration.
2. Instead of "if n%2==0:" use just "if n%2:".
3. Convert all you code to a function. Python is a little faster with
locals than with globals.

In sum all this tips will speedup your code about 2x.

And one big tip:

Use cashing (and recursion). This will speedup your code more than 10x.

Python battle game help	2	Feb 23, 2023
Python 3.3, gettext and Unicode problems	0	Dec 31, 2012
Python code problem	2	Apr 23, 2023
python 3.3 bz2 decompression testing results	0	Jun 16, 2012
C program: memory leak/ segmentation fault/ memory limit exceeded	0	Nov 12, 2022
Python point location of intersect between two lines	0	Feb 28, 2018
Trouble with prediction code, for the life of me I can't figure out why it isnt running properly. Help would be appreciated.	0	Jul 8, 2023
Minimum Total Difficulty	0	Nov 15, 2023

Python 3.3 vs. MSDOS Basic

John Immarino

Ian Kelly

Chris Angelico

Chris Angelico

Chris Angelico

Chris Angelico

Ian Kelly

Alexander Blinne

Dennis Lee Bieber

Terry Reedy

John Immarino

John Immarino

John Immarino

John Immarino

Chris Angelico

Nick Mellor

Nick Mellor

Terry Reedy

Anssi Saari

Serhiy Storchaka

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads