bearophileHUGS
This post is not about practical stuff, so if you have little time,
you may ignore it.
This is a task of the rosettacode.org site:
http://www.rosettacode.org/wiki/Non_Continuous_Subsequences
A subsequence contains some subset of the elements of this sequence,
in the same order. A continuous subsequence is one in which no
elements are missing between the first and last elements of the
subsequence. The task is to enumerate all non-continuous subsequences
for a given sequence.
Translating the Scheme code to Python was easy (and I think the result
is rather more readable than the Scheme version):
def ncsub(seq, s=0):
    if seq:
        x = seq[:1]
        xs = seq[1:]
        p2 = s % 2
        p1 = not p2
        return [x + ys for ys in ncsub(xs, s + p1)] + ncsub(xs, s + p2)
    else:
        return [[]] if s >= 3 else []
Output:

ncsub(range(1, 4)) [[1, 3]]
ncsub(range(1, 5)) [[1, 2, 4], [1, 3, 4], [1, 3], [1, 4], [2, 4]]
ncsub(range(1, 6)) [[1, 2, 3, 5], [1, 2, 4, 5], [1, 2, 4], [1, 2, 5],
[1, 3, 4, 5], [1, 3, 4], [1, 3, 5], [1, 3], [1, 4, 5], [1, 4], [1, 5],
[2, 3, 5], [2, 4, 5], [2, 4], [2, 5], [3, 5]]
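That output can be cross-checked by brute force: enumerate all index
combinations and keep those whose indices do not form a contiguous run.
This is a sketch of mine, not code from the post, and ncsub_brute is a
hypothetical name:

```python
from itertools import combinations

def ncsub_brute(seq):
    """All non-continuous subsequences of seq, by brute force.

    A subsequence is continuous exactly when its chosen indices
    span as many slots as it has elements.
    """
    seq = list(seq)
    n = len(seq)
    result = []
    for k in range(2, n + 1):  # a gap needs at least 2 elements
        for idx in combinations(range(n), k):
            if idx[-1] - idx[0] + 1 != k:  # the indices have a gap
                result.append([seq[i] for i in idx])
    return result
```

It agrees with ncsub() on the counts (1 sublist for range(1, 4), 5 for
range(1, 5), 16 for range(1, 6)), although the ordering differs.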
To test its speed I use this:
from sys import argv

def ncsub(seq, s=0):
    if seq:
        x = seq[:1]
        xs = seq[1:]
        p2 = s % 2
        p1 = not p2
        return [x + ys for ys in ncsub(xs, s + p1)] + ncsub(xs, s + p2)
    else:
        return [[]] if s >= 3 else []

import psyco; psyco.bind(ncsub)
print len( ncsub(range(1, int(argv[1]))) )
On a 2 GHz CPU it needs 6.4s with n=20 (the output contains 524_097
sublists!), and 7.8s without Psyco, so I think the speed isn't bad.
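That 524_097 figure can be sanity-checked with a closed form (my own
derivation, not from the post): a sequence of n elements has 2^n
subsequences, of which n*(n+1)/2 are non-empty contiguous runs; adding
the empty one, the non-continuous ones number 2^n - n*(n+1)/2 - 1.

```python
def ncsub_count(n):
    # 2**n subsequences in total, minus the continuous ones:
    # the empty subsequence plus n*(n+1)//2 contiguous runs
    # (n runs of length 1, n-1 of length 2, ..., 1 of length n)
    return 2**n - n * (n + 1) // 2 - 1
```

range(1, 20) has 19 elements, and ncsub_count(19) is 524288 - 190 - 1 =
524097, matching the count above.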
With the libs I have written for D, translating the Python code is not
difficult (with n=20 it runs in 3.72s with the latest DMD compiler):
import d.all;

T[][] ncsub(T)(T[] seq, int s=0) {
    if (seq.length) {
        T[] xs = seq[1..$];
        int p2 = s % 2;
        int p1 = !p2;
        return map((T[] ys){return seq[0] ~ ys;}, ncsub(xs, s+p1)) ~
               ncsub(xs, s+p2);
    } else
        return s >= 3 ? [DA!(T)] : null;
}

void main() {
    foreach (m; xrange(4, 7))
        putr(ncsub(range(1, m)));
}
But with normal D the program is short enough anyway, though it is
slower to run (4.7s with n=20) and faster to compile (about 0.1s with
full optimizations and about 0.07s without):
import std.stdio: writefln;

T[][] ncsub(T)(T[] seq, int s=0) {
    if (seq.length) {
        T[][] aux;
        foreach (ys; ncsub(seq[1..$], s + !(s % 2)))
            aux ~= seq[0] ~ ys;
        return aux ~ ncsub(seq[1..$], s + s % 2);
    } else
        return s >= 3 ? [new T[0]] : null;
}

void main() {
    writefln(ncsub([1, 2, 3]));
    writefln(ncsub([1, 2, 3, 4]));
    writefln(ncsub([1, 2, 3, 4, 5]));
}
The Scheme version is eager; it comes from the first Haskell version,
which I think is lazy.
In Python the handling of lazy iterables feels almost bolted-on
compared to Haskell: for example, Haskell's lazy lists don't get
exhausted as Python iterators do (because they are immutable), and
while you build a lazy list you can refer to the items already
produced. But in many cases you can write lazy code in Python too,
even if it may be harder. So I have tried to write a lazy version in
Python, hoping to speed up the code by avoiding the creation and
joining of most or all of the sublists, but I have failed so far.
(Once I have a lazy Python version, I can probably write a short lazy
version in D too, since my D libs contain most of the itertools module
as well.)
In my failed attempts I have used chain() to join the sublists,
islice() to slice their items, iter() to make the handling uniform
when the input seq is a Python list instead of an xrange, and so on.
The:

    if seq:

can be replaced by something like:

    try:
        x = seq2.next()
    except StopIteration:
        ...
    else:
        ...
If you have some time to waste, you may suggest how to write a lazy
version in Python.
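One possible direction is to turn ncsub() itself into a generator.
This sketch (ncsub_lazy is a hypothetical name, not code from the
post) keeps the same recursion and the same output order, and yields
the sublists one at a time instead of building and concatenating the
big outer lists; each individual sublist is still created eagerly,
though, so it is only a partial answer:

```python
def ncsub_lazy(seq, s=0):
    """Yield the non-continuous subsequences of seq lazily,
    in the same order as the eager ncsub()."""
    seq = list(seq)  # accept lists, (x)range, or any iterable
    if seq:
        x = seq[:1]
        xs = seq[1:]
        p2 = s % 2
        p1 = 1 - p2
        # first branch: keep the head element
        for ys in ncsub_lazy(xs, s + p1):
            yield x + ys
        # second branch: skip the head element
        for ys in ncsub_lazy(xs, s + p2):
            yield ys
    elif s >= 3:
        yield []
```

list(ncsub_lazy([1, 2, 3])) gives [[1, 3]], like the eager version,
and the first results arrive without materializing the whole output.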
Bye,
bearophile