Non Continuous Subsequences

Discussion in 'Python' started by bearophileHUGS@lycos.com, Jul 30, 2008.

  1. Guest

    This post is not about practical stuff, so if you have little time,
    you may ignore it.

    This is a task of the rosettacode.org site:
    http://www.rosettacode.org/wiki/Non_Continuous_Subsequences

    A subsequence contains some subset of the elements of this sequence,
    in the same order. A continuous subsequence is one in which no
    elements are missing between the first and last elements of the
    subsequence. The task is to enumerate all non-continuous subsequences
    for a given sequence.

    Translating the Scheme code to Python was easy (and I think this is
    quite more readable than the Scheme version):

    def ncsub(seq, s=0):
    if seq:
    x = seq[:1]
    xs = seq[1:]
    p2 = s % 2
    p1 = not p2
    return [x + ys for ys in ncsub(xs, s + p1)] + ncsub(xs, s +
    p2)
    else:
    return [[]] if s >= 3 else []

    Output:

    >>> ncsub(range(1, 4))

    [[1, 3]]
    >>> ncsub(range(1, 5))

    [[1, 2, 4], [1, 3, 4], [1, 3], [1, 4], [2, 4]]
    >>> ncsub(range(1, 6))

    [[1, 2, 3, 5], [1, 2, 4, 5], [1, 2, 4], [1, 2, 5], [1, 3, 4, 5], [1,
    3, 4], [1, 3, 5], [1, 3], [1, 4, 5], [1, 4], [1, 5], [2, 3, 5], [2, 4,
    5], [2, 4], [2, 5], [3, 5]]


    To test its speed I use this:

    from sys import argv

    def ncsub(seq, s=0):
    if seq:
    x = seq[:1]
    xs = seq[1:]
    p2 = s % 2
    p1 = not p2
    return [x + ys for ys in ncsub(xs, s + p1)] + ncsub(xs, s +
    p2)
    else:
    return [[]] if s >= 3 else []

    import psyco; psyco.bind(ncsub)
    print len( ncsub(range(1, int(argv[1]))) )


    On a 2 GHz CPU it needs 6.4s with n=20 (the output contains 524_097
    sublists!), and 7.8s without Psyco, so I think the speed isn't bad.

    With the libs I have written for D, translating the Python code is not
    difficult (with n=20 it runs in 3.72s with the last DMD compiler):

    import d.all;

    T[][] ncsub(T)(T[] seq, int s=0) {
    if (seq.length) {
    T[] xs = seq[1..$];
    int p2 = s % 2;
    int p1 = !p2;
    return map((T[] ys){return seq[0]~ys;}, ncsub(xs, s+p1)) ~
    ncsub(xs, s+p2);
    } else
    return s >= 3 ? [DA!(T)] : null;
    }

    void main() {
    foreach (m; xrange(4, 7))
    putr(ncsub(range(1, m)));
    }


    But with normal D the program is short enough anyway (but a slower to
    run (4.7s with n=20) and faster to compile, about 0.1s with full
    optimizations and about 0.07s without):

    import std.stdio: writefln;

    T[][] ncsub(T)(T[] seq, int s=0) {
    if (seq.length) {
    T[][] aux;
    foreach (ys; ncsub(seq[1..$], s + !(s % 2)))
    aux ~= seq[0] ~ ys;
    return aux ~ ncsub(seq[1..$], s + s % 2);
    } else
    return s >= 3 ? [new T[0]] : null;
    }

    void main() {
    writefln(ncsub([1, 2, 3]));
    writefln(ncsub([1, 2, 3, 4]));
    writefln(ncsub([1, 2, 3, 4, 5]));
    }


    The Scheme version is eager, it comes from the first Haskell version,
    that I think is lazy.

    In Python the management of lazy iterables feels almost bolted-on
    compared to Haskell, for example in Haskell lazy iterables don't
    exhaust like in Python (because they are immutable), and while you
    create a lazy list you can refer to the items already created.

    But in many cases you can create a lazy code in Python too, even if
    that may be harder. So I have tried to create a lazy version for
    Python, hoping to speed up the code avoiding the creation and joining
    of most/all sublists, but I have failed so far (once I have created a
    lazy Python version, I can probably create a short lazy version in D
    too, my D libs contain most of itertools module too).

    In my failed attempts I have used chain() to join the sublists,
    islice() to slice their items, and iter() to make the management more
    uniform when the input seq is a Python list instead of an xrange, etc.

    The:

    if seq:

    can be replaced by something like:

    try:
    x = seq2.next()
    except StopIteration:
    ...
    else:
    ...

    If you have some time to waste you may suggest me how to write a lazy
    version in Python :)

    Bye,
    bearophile
     
    , Jul 30, 2008
    #1
    1. Advertising

  2. On Wed, 30 Jul 2008 09:32:25 -0700, bearophileHUGS wrote:

    > This post is not about practical stuff, so if you have little time,
    > you may ignore it.
    >
    > This is a task of the rosettacode.org site:
    > http://www.rosettacode.org/wiki/Non_Continuous_Subsequences
    >
    > A subsequence contains some subset of the elements of this sequence,
    > in the same order. A continuous subsequence is one in which no
    > elements are missing between the first and last elements of the
    > subsequence. The task is to enumerate all non-continuous subsequences
    > for a given sequence.
    >
    > Translating the Scheme code to Python was easy (and I think this is
    > quite more readable than the Scheme version):
    >
    > def ncsub(seq, s=0):
    > if seq:
    > x = seq[:1]
    > xs = seq[1:]
    > p2 = s % 2
    > p1 = not p2
    > return [x + ys for ys in ncsub(xs, s + p1)] + ncsub(xs, s +
    > p2)
    > else:
    > return [[]] if s >= 3 else []
    >
    > Output:
    >
    >>>> ncsub(range(1, 4))

    > [[1, 3]]
    >>>> ncsub(range(1, 5))

    > [[1, 2, 4], [1, 3, 4], [1, 3], [1, 4], [2, 4]]
    >>>> ncsub(range(1, 6))

    > [[1, 2, 3, 5], [1, 2, 4, 5], [1, 2, 4], [1, 2, 5], [1, 3, 4, 5], [1,
    > 3, 4], [1, 3, 5], [1, 3], [1, 4, 5], [1, 4], [1, 5], [2, 3, 5], [2, 4,
    > 5], [2, 4], [2, 5], [3, 5]]
    >
    > <snip>
    >
    > But in many cases you can create a lazy code in Python too, even if
    > that may be harder. So I have tried to create a lazy version for
    > Python, hoping to speed up the code avoiding the creation and joining
    > of most/all sublists, but I have failed so far (once I have created a
    > lazy Python version, I can probably create a short lazy version in D
    > too, my D libs contain most of itertools module too).
    >
    > In my failed attempts I have used chain() to join the sublists,
    > islice() to slice their items, and iter() to make the management more
    > uniform when the input seq is a Python list instead of an xrange, etc.


    Try this:

    import itertools

    def ncsub(seq, s = 0):
    if seq:
    x = seq[:1]
    xs = seq[1:]
    p2 = s % 2
    p1 = 1 - p2
    return itertools.chain(
    itertools.imap(lambda ys: x + ys, ncsub(xs, s + p1)),
    ncsub(xs, s + p2))
    else:
    return [[]] if s >= 3 else []

    >>> ncsub(range(1, 4))

    <itertools.chain object at 0xb7de528c>
    >>> list(ncsub(range(1, 4)))

    [[1, 3]]
    >>> list(ncsub(range(1, 5)))

    [[1, 2, 4], [1, 3, 4], [1, 3], [1, 4], [2, 4]]
    >>> list(ncsub(range(1, 6)))

    [[1, 2, 3, 5], [1, 2, 4, 5], [1, 2, 4], [1, 2, 5], [1, 3, 4, 5], [1,
    3, 4], [1, 3, 5], [1, 3], [1, 4, 5], [1, 4], [1, 5], [2, 3, 5], [2, 4,
    5], [2, 4], [2, 5], [3, 5]]
     
    Bruce Frederiksen, Jul 31, 2008
    #2
    1. Advertising

  3. Guest

    Bruce Frederiksen:

    Your solution is a bit different from what I was thinking about (I was
    thinking about a generator function, with yield), but it works.

    This line:
    > return itertools.chain(
    > itertools.imap(lambda ys: x + ys, ncsub(xs, s + p1)),
    > ncsub(xs, s + p2))


    Can be written in a simpler way:
    return chain((x + ys for ys in ncsub(xs, s + p1)), ncsub(xs, s + p2))

    I'll think more about all this,
    bye and thank you,
    bearophile
     
    , Jul 31, 2008
    #3
  4. Mensanator Guest

    On Jul 30, 11:32 am, wrote:
    > This post is not about practical stuff, so if you have little time,
    > you may ignore it.
    >
    > This is a task of the rosettacode.org site:http://www.rosettacode.org/wiki/Non_Continuous_Subsequences
    >
    > A subsequence contains some subset of the elements of this sequence,
    > in the same order. A continuous subsequence is one in which no
    > elements are missing between the first and last elements of the
    > subsequence. The task is to enumerate all non-continuous subsequences
    > for a given sequence.
    >


    That's equivalent to asking which n-bit binary numbers have
    at least one 0 bracketed by 1s, isn't it?

    import gmpy
    import re

    for i in xrange(32):
    s = gmpy.digits(i,2).zfill(5) # convert to base 2 (padded)
    m = re.search('10+1',s) # at least one 0 between 1s
    if m:
    z = [j+1 for j,i in enumerate(s) if i=='1']
    print z

    ## [3, 5]
    ## [2, 5]
    ## [2, 4]
    ## [2, 4, 5]
    ## [2, 3, 5]
    ## [1, 5]
    ## [1, 4]
    ## [1, 4, 5]
    ## [1, 3]
    ## [1, 3, 5]
    ## [1, 3, 4]
    ## [1, 3, 4, 5]
    ## [1, 2, 5]
    ## [1, 2, 4]
    ## [1, 2, 4, 5]
    ## [1, 2, 3, 5]
     
    Mensanator, Jul 31, 2008
    #4
  5. Kay Schluehr Guest

    Here is an evil imperative, non-recursive generator:

    def ncsub(seq):
    n = len(seq)
    R = xrange(n+1)
    for i in xrange(1,2**n):
    S = []
    nc = False
    for j in R:
    k = i>>j
    if k == 0:
    if nc:
    yield S
    break
    elif k%2 == 1:
    S.append(seq[j])
    elif S:
    nc = True

    And here is a cython version without yield expression which won't work
    with cython:

    def ncsub(seq):
    n = len(seq)
    cdef int i,j,k,nc
    res = []
    R = xrange(n+1)
    for i in xrange(1,2**n):
    S = []
    nc = False
    for j in R:
    k = i>>j
    if k == 0:
    if nc:
    res.append(S)
    break
    elif k%2 == 1:
    S.append(seq[j])
    elif S:
    nc = True
    return res


    Application: list(ncsub(range(19)))

    On my 1.5GHz notebook your original version needed 11.2s ( psyco
    version = 9.5s), my pure Python generator got it in 7.75s and the
    cython version in 2.3s.

    [Yes, I have too much time...]
     
    Kay Schluehr, Jul 31, 2008
    #5
  6. Guest

    Kay Schluehr:

    >[Yes, I have too much time...]


    Thank you for your code. If you allow me to, I may put some code
    derived from yours back into the rosettacode.org site.


    >Here is an evil imperative, non-recursive generator:


    I like imperative, non-recursive code :)

    If you count the number of items of the results, you find the sequence
    A002662:
    www.research.att.com/~njas/sequences/A002662
    The series grows as:
    lambda n: sum(C(n, k) for k in xrange(3, n+1))

    I have modified your code a little, here and there, so I can show it
    again. Here are some timings.

    Note:
    n = 19 => result = 261_972 subs
    n = 21 => result = 1_048_365 subs

    Timings, seconds, best of 3:
    N=19 N=21
    v1: 2.51 17.88
    v2: 0.63 3.58
    v3: 2.47 10.65
    v4: 0.61 3.55
    v5: 0.84 5.45
    v6: 0.64 2.67
    v7: 0.58 3.07
    v8: 0.44 1.83
    v9: 0.07 0.21
    v10: 2.22 9.58

    v9 computes the 67_108_512 subs of n=27 in 14.54 s.

    The versions:
    v1) original eager Python+Psyco version
    v2) eager Python+Psyco version
    v3) lazy Python version
    v4) eager Python+Psyco version plus optimization
    v5) eager D version with my libs
    v6) lazy D version with my libs
    v7) eager D version without my libs plus optimization
    v8) lazy D version without my libs
    v9) lazy D version without my libs, no allocations
    10) lazy Python version, no allocations

    Used DMD compiler V.1.033 with:
    -O -release -inline
    Python 2.5.2, Psyco 1.5.2.

    Some comments:
    - The current Python GC manages memory more efficienty than the
    current default D GC. This is a known issue, D is a much younger
    language, and there are various different situations (not just
    regarding its GC) where it's slower (for example regarding I/O,
    associative arrays, etc).
    - Being compiled, and having a very light iteration protocol, D is
    much faster in the lazy version. Avoiding to allocate new memory
    reduces total time a lot.
    - In D when n is small the eager version is faster, but things change
    with n is large, because this time the allocation time and the cache
    misses start to dominate.
    - The Python version doesn't improve much when you know much long the
    result is, while the D version improves significantly. This because
    the list append in Python is much more efficient than in D. Python is
    a much more refined language, and despite D supposed to be a fast
    compiled "to the metal" system language, it's often not faster. For
    example Python dicts are often faster than D AAs. D is currently in
    quick development so its data structures and many details aren't
    optimized. Python is much older and its data structures are well
    optized, lot of smart people has made them fast.
    - If you avoid memory allocation in D you can generally go quite fast.
    - Comparing different languages like this is useful for D, because
    it's young, and doing such comparisons I have found many performance
    problems in it, sometimes they have even being discussed, understood,
    addressed.
    - In D another possible output is an uint (integer not signed always
    32 bits), where the bits = 1 are the items in the current subset. This
    may be fast enough.

    Bye,
    bearophile

    -------------------------

    THE CODE:

    # 1) original eager Python+Psyco version

    from sys import argv

    def ncsub(seq, s=0):
    if seq:
    x = seq[:1]
    xs = seq[1:]
    p2 = s % 2
    p1 = not p2
    return [x + ys for ys in ncsub(xs, s + p1)] + ncsub(xs, s +
    p2)
    else:
    return [[]] if s >= 3 else []

    import psyco; psyco.bind(ncsub)
    n = 10 if len(argv) < 2 else int(argv[1])
    print len( ncsub(range(1, n)) )

    -------------------------

    # 2) eager Python+Psyco version

    from sys import argv

    def ncsub(seq):
    n = len(seq)
    res = []
    for i in xrange(1, 2 ** n):
    S = []
    nc = False
    for j in xrange(n + 1):
    k = i >> j
    if k == 0:
    if nc:
    res.append(S)
    break
    elif k % 2:
    S.append(seq[j])
    elif S:
    nc = True
    return res

    import psyco; psyco.bind(ncsub)
    n = 10 if len(argv) < 2 else int(argv[1])
    print len( ncsub(range(1, n)) )

    -------------------------

    # 3) lazy Python version
    # Psyco not used because makes it slower

    from sys import argv

    def leniter(iterator):
    nelements = 0
    for _ in iterator:
    nelements += 1
    return nelements

    def ncsub(seq):
    n = len(seq)
    for i in xrange(1, 2 ** n):
    S = []
    nc = False
    for j in xrange(n + 1):
    k = i >> j
    if k == 0:
    if nc:
    yield S
    break
    elif k % 2:
    S.append(seq[j])
    elif S:
    nc = True

    n = 10 if len(argv) < 2 else int(argv[1])
    print leniter( ncsub(range(1, n)) )

    -------------------------

    # 4) eager Python+Psyco version plus optimization

    def C(n, k):
    result = 1
    for d in xrange(1, k+1):
    result *= n
    n -= 1
    result /= d
    return result

    # www.research.att.com/~njas/sequences/A002662
    nsubs = lambda n: sum(C(n, k) for k in xrange(3, n+1))

    def ncsub(seq):
    n = len(seq)
    result = [None] * nsubs(n)
    pos = 0

    for i in xrange(1, 2 ** n):
    S = []
    nc = False
    for j in xrange(n + 1):
    k = i >> j
    if k == 0:
    if nc:
    result[pos] = S
    pos += 1
    break
    elif k % 2:
    S.append(seq[j])
    elif S:
    nc = True
    return result

    from sys import argv
    import psyco; psyco.full()

    n = 10 if len(argv) < 2 else int(argv[1])
    print len( ncsub(range(1, n)) )

    -------------------------

    // 5) eager D version with my libs
    // all the D code is generic (templated)

    import d.all, std.conv;

    T[][] ncsub(T)(T[] seq) {
    int n = len(seq);
    T[][] result;
    auto R = xrange(n + 1);
    foreach (i; xrange(1, 1 << n)) {
    T[] S;
    bool nc = false;
    foreach (j; R) {
    int k = i >> j;
    if (k == 0) {
    if (nc)
    result ~= S;
    break;
    } else if (k % 2)
    S ~= seq[j];
    else if (S.length)
    nc = true;
    }
    }
    return result;
    }

    void main(string[] args) {
    int n = args.length == 2 ? toInt(args[1]) : 10;
    putr( len( ncsub(range(1, n)) ) );
    }

    -------------------------

    // 6) lazy D version with my libs

    import d.all, std.conv;

    struct Ncsub(T) {
    T[] seq;
    void generator() {
    int n = len(seq);
    foreach (i; xrange(1, 1 << n)) {
    T[] S;
    bool nc = false;
    foreach (j; xrange(n + 1)) {
    int k = i >> j;
    if (k == 0) {
    if (nc)
    yield(S);
    break;
    } else if (k % 2)
    S ~= seq[j];
    else if (S.length)
    nc = true;
    }
    }
    }
    mixin Generator!(T[]);
    }

    void main(string[] args) {
    int n = args.length == 2 ? toInt(args[1]) : 10;
    putr( len( Ncsub!(int)(range(1, n)) ) );
    }

    -------------------------

    // 7) eager D version without my libs plus optimization

    import std.stdio: putr = writefln;
    import std.conv: toInt;

    long C(long n, long k) {
    long result = 1;
    for (long d = 1; d < k+1; d++) {
    result *= n;
    n--;
    result /= d;
    }
    return result;
    }

    long nsubs(long n) {
    // www.research.att.com/~njas/sequences/A002662
    long tot = 0;
    for (int k = 3; k <= n; k++)
    tot += C(n, k);
    return tot;
    }

    T[][] ncsub(T)(T[] seq) {
    auto result = new T[][nsubs(seq.length)];
    int pos;
    for (int i = 1; i < (1 << seq.length); i++) {
    T[] S;
    bool nc = false;
    for (int j; j < seq.length + 1; j++) {
    int k = i >> j;
    if (k == 0) {
    if (nc)
    result[pos++] = S;
    break;
    } else if (k % 2)
    S ~= seq[j];
    else if (S.length)
    nc = true;
    }
    }
    return result;
    }

    void main(string[] args) {
    int n = args.length == 2 ? toInt(args[1]) : 10;

    auto range = new int[n - 1];
    foreach (i, ref el; range)
    el = i + 1;

    putr( ncsub(range).length );
    }

    -------------------------

    // 8) lazy D version without my libs

    import std.conv: toInt;
    import std.stdio: putr = writefln;

    struct Ncsub(T) {
    T[] seq;

    int opApply(int delegate(ref int[]) dg) {
    int result, n = seq.length;

    OUTER:
    for (int i = 1; i < (1 << seq.length); i++) {
    T[] S;
    bool nc = false;
    for (int j; j < seq.length + 1; j++) {
    int k = i >> j;
    if (k == 0) {
    if (nc) {
    result = dg(S);
    if (result)
    break OUTER;
    }
    break;
    } else if (k % 2)
    S ~= seq[j];
    else if (S.length)
    nc = true;
    }
    }

    return result;
    }
    }

    void main(string[] args) {
    int n = args.length == 2 ? toInt(args[1]) : 10;

    auto range = new int[n - 1];
    foreach (i, ref el; range)
    el = i + 1;

    int count;
    foreach (sub; Ncsub!(int)(range))
    count++;
    putr(count);
    }

    -------------------------

    // 9) lazy D version without my libs, no allocations
    // the slicing S[0..len_S] doesn't copy memory

    import std.conv: toInt;
    import std.stdio: putr = writefln;

    struct Ncsub(T) {
    T[] seq;

    int opApply(int delegate(ref int[]) dg) {
    int result, n = seq.length;
    auto S = new int[n];

    OUTER:
    for (int i = 1; i < (1 << seq.length); i++) {
    int len_S;
    bool nc = false;
    for (int j; j < seq.length + 1; j++) {
    int k = i >> j;
    if (k == 0) {
    if (nc) {
    T[] auxS = S[0 .. len_S];
    result = dg(auxS);
    if (result)
    break OUTER;
    }
    break;
    } else if (k % 2)
    S[len_S++] = seq[j];
    else if (len_S)
    nc = true;
    }
    }

    return result;
    }
    }

    void main(string[] args) {
    int n = args.length == 2 ? toInt(args[1]) : 10;

    auto range = new int[n - 1];
    foreach (i, ref el; range)
    el = i + 1;

    int count;
    foreach (sub; Ncsub!(int)(range))
    count++;
    putr(count);
    }

    -------------------------

    # 10) lazy Python version, no allocations

    from sys import argv

    def leniter(iterator):
    nelements = 0
    for _ in iterator:
    nelements += 1
    return nelements

    def ncsub(seq, S):
    n = len(seq)
    for i in xrange(1, 2 ** n):
    lenS = 0
    nc = False
    for j in xrange(n + 1):
    k = i >> j
    if k == 0:
    if nc:
    yield lenS
    break
    elif k % 2:
    S[lenS] = seq[j]
    lenS += 1
    elif lenS:
    nc = True

    n = 10 if len(argv) < 2 else int(argv[1])
    s = [None] * (n-1)
    print leniter( ncsub(range(1, n), s) )

    -------------------------
     
    , Aug 1, 2008
    #6
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Replies:
    13
    Views:
    2,375
    Thomas Weidenfeller
    Jan 16, 2006
  2. Steven Bethard

    grouping subsequences with BIO tags

    Steven Bethard, Apr 21, 2005, in forum: Python
    Replies:
    6
    Views:
    323
    Bengt Richter
    Apr 23, 2005
  3. Matthew Wilson
    Replies:
    4
    Views:
    382
    Tim Chase
    Sep 29, 2006
  4. candide
    Replies:
    6
    Views:
    238
    candide
    Apr 1, 2011
  5. Eli Bendersky

    subsequences

    Eli Bendersky, Apr 17, 2006, in forum: Ruby
    Replies:
    14
    Views:
    238
    Daniel Schierbeck
    Apr 19, 2006
Loading...

Share This Page