# arithmetic persistence

M

#### Marc Girod

A Sunday topic...
I bought not long ago a deck of cards with mathematical puzzles
(question on the face, answer on the back) by Martin Gardner.
One puzzle dealt with the issue of /persistence/ in the mathematical
sense.
Take an integer (decimal representation).
Take the product of the digits its representation is made of.
This gets you a new, smaller, number.
Recurse until the representation takes a single digit.
The persistence is the number of steps it took.

Unclear? Sorry I gave the deck to a friend.
But 0 is the first number of persistence 0.
10 is the first of p(1).
25 is the first of p(2).
39 is the first of p(3).
77 is the first of p(4).
Martin Gardner's question was: what is the first number of p(5)?

After some time of failing to get an answer by just thinking, I wrote
a perl script: p1
-8<----------------------
#!/usr/bin/perl -w

use strict;
use vars qw(\$ver);

sub prod {
my @d = @_;
my \$p = 1;
\$p = \$p * \$_ for @d;
return \$p;
}

sub pers {
my (\$s, \$i) = @_;
if (\$i) {
push @{\$i}, \$s;
} else {
\$i = [];
print " \$s: " if \$ver;
}
my @d = \$s =~ /(\d)/g;
if (@d > 1) {
my \$p = prod @d;
return pers(\$p, \$i);
}
print scalar @{\$i}, ' (', join(' ', @{\$i}), ")\n" if \$ver;
return scalar @{\$i};
}

my (\$i, \$n, \$p) = (1, 1, 0);
while (\$i < 10000000) {
\$p = pers(\$i);
if (\$p == \$n) {
\$ver = 1;
pers(\$i);
\$ver = 0;
\$n++;
}
\$i++;
}
-8<-----------------

It gave me the result, but also a timing (on my laptop, looking
for 10 millions integers, and getting the first result of p(8)):

real 5m41.736s
user 5m40.175s
sys 0m0.108s

I thought that it was pretty lousy, and could be optimized, by
This brought in the question dealt with in a recent thread, of the
efficiency of hashes.
Trying to limit the size of the hash I use, I came up with the
following p2:
-8<-----------------------
#!/usr/bin/perl -w

use strict;
use vars qw(\$ver %c \$h);

sub prod {
my @d = @_;
my \$p = 1;
\$p = \$p * \$_ for @d;
return \$p;
}

sub pers {
my (\$s, \$i) = @_;
print " \$s: " if \$ver and !\$i;
my @d = \$s =~ /(\d)/g;
if (@d > 1) {
@d = sort @d;
if (\$d) {
shift @d while @d and \$d == 1;
if (@d) {
my \$k = join '', @d;
if (\$c{\$k}) {
if (defined \$i) { push @{\$i}, @{\$c{\$k}} } else { \$i = \$c{\$k} }
my \$r = scalar @{\$i};
print " \$r", ' (', join(' ', @{\$i}), ")\n" if \$ver;
\$h++;
return \$r;
}
my \$p = prod @d;
return pers(\$p, []) unless defined \$i;
push @{\$i}, \$s;
my \$r = pers(\$p, \$i);
\$c{\$k} = \$i;
return \$r;
} else {
if (defined \$i) {push @{\$i}, \$s} else { \$i = [] }
return pers(1, \$i);
}
} else {
if (defined \$i) {push @{\$i}, \$s} else { \$i = [] }
return pers(0, \$i);
}
}
if (defined \$i) {push @{\$i}, \$s} else { \$i = [] }
my \$r = scalar @{\$i};
print " \$r", ' (', join(' ', @{\$i}), ")\n" if \$ver;
return \$r;
}

my (\$i, \$n, \$p) = (1, 1, 0);
# \$ver = 1;
while (\$i < 10000000) {
\$p = pers(\$i);
if (\$p == \$n) {
\$ver = 1;
pers(\$i);
\$ver = 0;
\$n++;
}
\$i++;
}
print "#keys: ", scalar keys %c, "\nHits: \$h\n";
-8<----------------

The point is I was disappointed with the result:

real 4m32.386s
user 4m30.334s
sys 0m0.124s

Especially because 1 billion trials takes more time than 10 times 100
millions.
The numbers are larger, of course...
But, how can one do better?
The word 'persistence' makes it a bit awkward to search Google...
This script also gets soon into the 32 bit limit. Getting beyond is a
new challenge.

My resulting hash (for 10 millions) is not huge, and it was used:
#keys: 324
Hits: 1916050

Marc

W

#### Willem

Marc Girod wrote:
) A Sunday topic...
) I bought not long ago a deck of cards with mathematical puzzles
) (question on the face, answer on the back) by Martin Gardner.
) One puzzle dealt with the issue of /persistence/ in the mathematical
) sense.
) Take an integer (decimal representation).
) Take the product of the digits its representation is made of.
) This gets you a new, smaller, number.
) Recurse until the representation takes a single digit.
) The persistence is the number of steps it took.
)
) Unclear? Sorry I gave the deck to a friend.
) But 0 is the first number of persistence 0.
) 10 is the first of p(1).
) 25 is the first of p(2).
) 39 is the first of p(3).
) 77 is the first of p(4).
) Martin Gardner's question was: what is the first number of p(5)?
)
) After some time of failing to get an answer by just thinking, I wrote
) a perl script: p1
) -8<----------------------
) #!/usr/bin/perl -w

) -8<-----------------
)
) It gave me the result, but also a timing (on my laptop, looking
) for 10 millions integers, and getting the first result of p(8)):
)
) real 5m41.736s
) user 5m40.175s
) sys 0m0.108s
)
) I thought that it was pretty lousy, and could be optimized, by
) caching the already done calculations.
) This brought in the question dealt with in a recent thread, of the
) efficiency of hashes.
) Trying to limit the size of the hash I use, I came up with the
) following p2:

A hash ? What results were you caching ??

) But, how can one do better?

Cache the right thing. IE, the persistence of all numbers lower than x.
You already use recursion to calculate persistence(x).
Now, if you replace the recursive call with a lookup in the cache,
you'll get each result in a single step.

) The word 'persistence' makes it a bit awkward to search Google...
) This script also gets soon into the 32 bit limit. Getting beyond is a
) new challenge.

Where do you get into the 32 bit limit ??

Here's an example program I whipped up just now:
On my box, it finds P(8) in 15 seconds. (check: it's 2677889)

use strict;
use warnings;
use List::Util qw(reduce);

my @persis;
my \$found = 0;

for (0 .. 9) { \$persis[\$_] = 0; }

for (my \$i = 10; \$found < 8; \$i++) {
my \$prod = reduce { \$a * \$b } split('', \$i);
\$persis[\$i] = my \$pers = \$persis[\$prod] + 1;
if (\$pers > \$found) {
\$found = \$pers;
print "\$i is the first of p(\$pers)\n";
}
}

SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT

S

#### sln

Where do you get into the 32 bit limit ??

Here's an example program I whipped up just now:
On my box, it finds P(8) in 15 seconds. (check: it's 2677889)

use strict;
use warnings;
use List::Util qw(reduce);

my @persis;
my \$found = 0;

for (0 .. 9) { \$persis[\$_] = 0; }

for (my \$i = 10; \$found < 8; \$i++) {
my \$prod = reduce { \$a * \$b } split('', \$i);
\$persis[\$i] = my \$pers = \$persis[\$prod] + 1;
if (\$pers > \$found) {
\$found = \$pers;
print "\$i is the first of p(\$pers)\n";
}
}

10 is the first of p(1)
25 is the first of p(2)
39 is the first of p(3)
77 is the first of p(4)
679 is the first of p(5)
6788 is the first of p(6)
68889 is the first of p(7)
2677889 is the first of p(8)
26888999 is the first of p(9)

Anyone care to run this out to p(20)

-sln

M

#### Marc Girod

A hash ?  What results were you caching ??

The decomposition of unique products.
OK: I was doing extra work there to get some extra output.
Not strictly needed for my problem statement.
My output gave:

10: 1 (0)
25: 2 (10 0)
39: 3 (27 14 4)
77: 4 (49 36 18 8)
679: 5 (378 168 48 32 6)
6788: 6 (2688 768 336 45 20 0)
68889: 7 (27648 2688 768 336 45 20 0)
2677889: 8 (338688 27648 2688 768 336 45 20 0)
Cache the right thing.  IE, the persistence of all numbers lower than x..

I was trying to get one step smarter, there...
trying to skip saving duplicates results for numbers
with the same digits in different order.
You already use recursion to calculate persistence(x).
Now, if you replace the recursive call with a lookup in the cache,
you'll get each result in a single step.

I was doing that inside the function.
Where do you get into the 32 bit limit ??

First number with persistence 10?
Here's an example program I whipped up just now:
On my box, it finds P(8) in 15 seconds.  (check: it's 2677889)

Your box is faster than mine: I get 44 s.
Your loop is different, you stop after 2.7 million when I go to 10.
And you get a different low level calculation.

But OK, putting those into my algorithm, I get down to 1m13, so still
about 1.5 times slower than you.
OK, I strip my collecting the decomposition...
Now, we are comparing the same things, and I get 31s.
It is not as clean as yours, granted:

#!/usr/bin/perl -w

use strict;
use vars qw(\$ver %c);
use List::Util qw(reduce);
use vars qw(\$a \$b);

sub pers {
my \$s = shift;
my @d = split('', \$s);
if (@d > 1) {
@d = sort @d;
if (\$d) {
shift @d while @d and \$d == 1;
if (@d) {
my \$k = join '', @d;
return \$c{\$k} if \$c{\$k};
my \$p = reduce { \$a * \$b } @d;
\$c{\$k} = pers(\$p, 1) + 1;
return \$c{\$k};
}
}
return 1;
}
return 0;
}

my (\$i, \$n, \$p) = (1, 1, 0);
while (\$n < 9) {
\$p = pers(\$i);
if (\$p == \$n) {
print " \$i: \$n\n";
\$n++;
}
\$i++;
}

Marc

W

#### Willem

) 10 is the first of p(1)
) 25 is the first of p(2)
) 39 is the first of p(3)
) 77 is the first of p(4)
) 679 is the first of p(5)
) 6788 is the first of p(6)
) 68889 is the first of p(7)
) 2677889 is the first of p(8)
) 26888999 is the first of p(9)
)
) Anyone care to run this out to p(20)

Hell, I didn't have enough memory to get p(10) like this.
Even in C, allocating one byte for one result.

Perhaps a smarter approach would help. Or a stupider one... ^^;;

My perl brute force approach took 25 seconds to find p(8).
Which isn't that much more than the 15 it took the DP version.

BTW: A simple C program took less than 2 seconds to find p(9).
Adding a simple pruning cut that in half.

NB: The pruning version is even faster than the caching version.
Now running it for p(10)...

I guess the OP was using some very slow programming techniques.

If anyone cares, I can explain how the pruning version works.

SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT

M

#### Marc Girod

First number with persistence 10?

OK. I missed that. Perl gets nicely above that.
And you get a different low level calculation.

This didn't make much difference.
My prod was maybe even slightly faster than using reduce.
And using a global regexp marginally slower than split.

Marc

M

#### Marc Girod

I guess the OP was using some very slow programming techniques.

Don't guess, look at the code!
I wasted time mostly to get some extra data in the printout.
The rest was using a different stop condition (testing n numbers,
instead of looking for the first number with a given persistence).
And I have a slower box.

For the rest, my algorithm was already a pruning version of yours.
If anyone cares, I can explain how the pruning version works.

Yes, please. And compare with mine!

Marc

W

#### Willem

Marc Girod wrote:
)
)> A hash ? ?What results were you caching ??
)
) The decomposition of unique products.
) OK: I was doing extra work there to get some extra output.
) Not strictly needed for my problem statement.
) My output gave:
)
) 10: 1 (0)
) 25: 2 (10 0)
) 39: 3 (27 14 4)
) 77: 4 (49 36 18 8)
) 679: 5 (378 168 48 32 6)
) 6788: 6 (2688 768 336 45 20 0)
) 68889: 7 (27648 2688 768 336 45 20 0)
) 2677889: 8 (338688 27648 2688 768 336 45 20 0)
)
)> Cache the right thing. ?IE, the persistence of all numbers lower than x.
)
) I was trying to get one step smarter, there...
) trying to skip saving duplicates results for numbers
) with the same digits in different order.

Oh I see. Does that help ? I would imagine that looking up
the results in an array would give a big speedup.

)> Here's an example program I whipped up just now:
)> On my box, it finds P(8) in 15 seconds. ?(check: it's 2677889)
)
) Your box is faster than mine: I get 44 s.
) Your loop is different, you stop after 2.7 million when I go to 10.
) And you get a different low level calculation.
)
) But OK, putting those into my algorithm, I get down to 1m13, so still
) about 1.5 times slower than you.

Ah, so the low level calculation was slowing you down a lot ?

) OK, I strip my collecting the decomposition...
) Now, we are comparing the same things, and I get 31s.
) It is not as clean as yours, granted:

Now that it's stripped down I see what you were doing.
The whole decomposition thing was throwing me a loop.
Quite clever! And, as it seems, indeed faster.
You cut out any number that has a 0 anywhere, also.

Like this, you're skipping a lot of calculations indeed,
but at the cost of sorting the digits.

By the way, here's a simple version that's marginally faster
even, and doesn't require lots of memory. It uses a simple
pruning trick to cut off calculation when it knows that a
result isn't good enough.

use strict;
use warnings;
use List::Util qw(reduce);

my \$found = 0;
my \$fnum = 0;

for (my \$i = 10; \$found < 8; \$i++) {
my \$prod = reduce { \$a * \$b } split('', \$i);
next if (\$prod < \$fnum);
my \$cnt = 1;
while (\$prod >= 10) {
\$prod = reduce { \$a * \$b } split('', \$prod);
\$cnt++;
}
if (\$cnt > \$found) {
\$found = \$cnt;
\$fnum = \$i;
print "\$i is the first of p(\$cnt)\n";
}
}

I also wrote this in C, using 64-bit ints, and it turns out that
3778888999 is the first of p(10), which my box found in 2m50.

SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT

S

#### sln

Marc Girod wrote:
)
)> A hash ? ?What results were you caching ??
)
) The decomposition of unique products.
) OK: I was doing extra work there to get some extra output.
) Not strictly needed for my problem statement.
) My output gave:
)
) 10: 1 (0)
) 25: 2 (10 0)
) 39: 3 (27 14 4)
) 77: 4 (49 36 18 8)
) 679: 5 (378 168 48 32 6)
) 6788: 6 (2688 768 336 45 20 0)
) 68889: 7 (27648 2688 768 336 45 20 0)
) 2677889: 8 (338688 27648 2688 768 336 45 20 0)
)
)> Cache the right thing. ?IE, the persistence of all numbers lower than x.
)
) I was trying to get one step smarter, there...
) trying to skip saving duplicates results for numbers
) with the same digits in different order.

Oh I see. Does that help ? I would imagine that looking up
the results in an array would give a big speedup.

)> Here's an example program I whipped up just now:
)> On my box, it finds P(8) in 15 seconds. ?(check: it's 2677889)
)
) Your box is faster than mine: I get 44 s.
) Your loop is different, you stop after 2.7 million when I go to 10.
) And you get a different low level calculation.
)
) But OK, putting those into my algorithm, I get down to 1m13, so still
) about 1.5 times slower than you.

Ah, so the low level calculation was slowing you down a lot ?

) OK, I strip my collecting the decomposition...
) Now, we are comparing the same things, and I get 31s.
) It is not as clean as yours, granted:

Now that it's stripped down I see what you were doing.
The whole decomposition thing was throwing me a loop.
Quite clever! And, as it seems, indeed faster.
You cut out any number that has a 0 anywhere, also.

Like this, you're skipping a lot of calculations indeed,
but at the cost of sorting the digits.

By the way, here's a simple version that's marginally faster
even, and doesn't require lots of memory. It uses a simple
pruning trick to cut off calculation when it knows that a
result isn't good enough.

use strict;
use warnings;
use List::Util qw(reduce);

my \$found = 0;
my \$fnum = 0;

for (my \$i = 10; \$found < 8; \$i++) {
my \$prod = reduce { \$a * \$b } split('', \$i);
next if (\$prod < \$fnum);
my \$cnt = 1;
while (\$prod >= 10) {
\$prod = reduce { \$a * \$b } split('', \$prod);
\$cnt++;
}
if (\$cnt > \$found) {
\$found = \$cnt;
\$fnum = \$i;
print "\$i is the first of p(\$cnt)\n";
}
}

I also wrote this in C, using 64-bit ints, and it turns out that
3778888999 is the first of p(10), which my box found in 2m50.

Thank you. I must analyze this pruning.

-sln

W

#### Willem

Marc Girod wrote:
)
)> I guess the OP was using some very slow programming techniques.
)
) Don't guess, look at the code!
) I wasted time mostly to get some extra data in the printout.

That's silly. You can recalculate the extra data when needed.

) The rest was using a different stop condition (testing n numbers,
) instead of looking for the first number with a given persistence).
) And I have a slower box.

The 5m you quoted, was that for the n numbers ?

) For the rest, my algorithm was already a pruning version of yours.
)
)> If anyone cares, I can explain how the pruning version works.
)
) Yes, please. And compare with mine!

Well, it's quite simple. You use a simple brute force algorithm,
but if the first product calculation turns out to be smaller than
the last smallest persistence you found, you go to the next number.

In other words, when looking for a number with P(10), then the
first product has to be in P(9), so if it's smaller than the
smallest P(9), you can conclude that it's not in P(10).

It turns out that multiplying the simple way, using arithmetics,
is faster even in Perl.

use strict;
use warnings;

my \$found = 0;
my \$fnum = 0;

for (my \$i = 10; \$found < 10; \$i++) {
my \$prod = multiply(\$i);
next if (\$prod < \$fnum);
my \$cnt = 1;
while (\$prod >= 10) {
\$prod = multiply(\$prod);
\$cnt++;
}
if (\$cnt > \$found) {
\$found = \$cnt;
\$fnum = \$i;
print "\$i is the first
of p(\$cnt)\n";
}
}

sub multiply
{
my \$num = shift;
my \$prod = 1;
while (\$prod && \$num) {
\$prod *= (\$num % 10);
\$num = int(\$num / 10);
}
return \$prod;
}

SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT

M

#### Marc Girod

Like this, you're skipping a lot of calculations indeed,
but at the cost of sorting the digits.

....which is a rising cost, and ends up being prohibitive...
By the way, here's a simple version that's marginally faster
even, and doesn't require lots of memory.  It uses a simple
pruning trick to cut off calculation when it knows that a
result isn't good enough.

Yes, a much simpler idea, indeed.
I have to get out of my first mindset of getting the value anyway.
I also wrote this in C, using 64-bit ints, and it turns out that
3778888999 is the first of p(10), which my box found in 2m50.

And I am nowhere near this, of course.
Thanks,
Marc

W

#### Willem

Marc Girod wrote:
)
)> Like this, you're skipping a lot of calculations indeed,
)> but at the cost of sorting the digits.
)
) ...which is a rising cost, and ends up being prohibitive...

Nah. What's prohibitive is the memory footprint.

)> By the way, here's a simple version that's marginally faster
)> even, and doesn't require lots of memory. ?It uses a simple
)> pruning trick to cut off calculation when it knows that a
)> result isn't good enough.
)
) Yes, a much simpler idea, indeed.
) I have to get out of my first mindset of getting the value anyway.
)
)> I also wrote this in C, using 64-bit ints, and it turns out that
)> 3778888999 is the first of p(10), which my box found in 2m50.
)
) And I am nowhere near this, of course.

Well, this is almost all arithmetics, so Perl just doesn't compare.
0.8 seconds to find P(9) in C, versus 1m36 in Perl.

SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT

M

#### Marc Girod

Nah.  What's prohibitive is the memory footprint.

But this rises quite slowly...

324 keys in the hash for 10 millions
459 for 100
596 for 1 billion...

I'd need to profile.
Well, this is almost all arithmetics, so Perl just doesn't compare.
0.8 seconds to find P(9) in C, versus 1m36 in Perl.

Interesting. I was not aware of this ratio.
Thanks.

Marc

I

#### Ilya Zakharevich

Oh I see. Does that help ? I would imagine that looking up
the results in an array would give a big speedup.

Me too. But it looks like there is little speedup or even slowdown; see below.
use strict;
use warnings;
use List::Util qw(reduce);

my \$found = 0;
my \$fnum = 0;

for (my \$i = 10; \$found < 8; \$i++) {
my \$prod = reduce { \$a * \$b } split('', \$i);
next if (\$prod < \$fnum);
my \$cnt = 1;
while (\$prod >= 10) {
\$prod = reduce { \$a * \$b } split('', \$prod);
\$cnt++;
}
if (\$cnt > \$found) {
\$found = \$cnt;
\$fnum = \$i;
print "\$i is the first of p(\$cnt)\n";
}
}

On my machine what is below is almost an order of magnitude better.
It also allows tuning (first arg is the target for \$found [8 above]);
second arg gives size of cache in decimal digits (should be at least
half of the size of the answer). On machine arguments 8 4, 8 5, 8 6
finish in very similar time - this means that benefits of caching are
eaten by not being able to prune when caching...

Hope this helps,
Ilya

#!/usr/bin/perl -w
use strict;
use List::Util qw(reduce);

my \$found = 0;
my \$fnum = 0;
my \$lim = shift;
my \$cache_lim10 = shift;
my \$cache_lim = 10**\$cache_lim10;
my \$rex_lim = '.' x \$cache_lim10;

my (@prod, @perc, \$prod, \$p1, \$p2, \$cnt, \$i);

sub report_size (\$\$) {
my (\$i, \$cnt) = (shift, shift);
\$found = \$cnt;
\$fnum = \$i;
print "\$i is the first of p(\$cnt)\n";
}

\$prod[\$_] = \$_, \$perc[\$_] = 0 for 0..9;

\$#prod = \$#perc = \$cache_lim;

for my \$i (10..\$cache_lim-1) { # Round 1: cache, no pruning
if (\$i =~ /0/) {
\$prod = \$prod[\$i] = 0;
} else {
\$prod = \$prod[\$i] = (\$i%10) * \$prod[int(\$i/10)];
}
report_size(\$i, \$p1)
if (\$p1 = \$perc[\$i] = \$perc[\$prod] + 1) > \$found;
}

LOOP: # Round 2: non-hashing, pruning
for (my \$i = \$cache_lim; \$found < \$lim; \$i++) {
next if \$i =~ /0/;
\$prod = \$prod[\$i % \$cache_lim]*\$prod[int(\$i / \$cache_lim)];
next if \$prod < \$fnum; # Prune
\$cnt = 1;
while (\$prod >= \$cache_lim) {
next LOOP if \$prod =~ /0/;
\$prod = \$prod[\$prod % \$cache_lim]*\$prod[int(\$prod / \$cache_lim)];
++\$cnt;
}
\$cnt += \$perc[\$prod];
report_size(\$i, \$cnt) if \$cnt > \$found;
}

M

#### Marc Girod

On my machine what is below is almost an order of magnitude better.

Just want to acknowledge.
Thanks.
Marc

M

#### Martin Str|mberg

Thank you for an interesting problem and solutions. However nobody has
so far mentioned the Memoize perl module, which I guess would help
speed up the original solution. But perhaps not the pruning ones.