code for Computer Language Shootout

Discussion in 'Python' started by Jacob Lee, Mar 16, 2005.

  1. Jacob Lee

    Jacob Lee Guest

    There are a bunch of new tests up at shootout.alioth.debian.org for which
    Python does not yet have code. I've taken a crack at one of them, a task
    to print the reverse complement of a gene transcription. Since there are a
    lot of minds on this newsgroup that are much better at optimization than
    I, I'm posting the code I came up with to see if anyone sees any
    opportunities for substantial improvement. Without further ado:

    table = string.maketrans('ACBDGHK\nMNSRUTWVY', 'TGVHCDM\nKNSYAAWBR')

    def show(s):
    i = 0
    for char in s.upper().translate(table)[::-1]:
    if i == 60:
    print
    i = 0
    sys.stdout.write(char)
    i += 1
    print

    def main():
    seq = ''
    for line in sys.stdin:
    if line[0] == '>' or line[0] == ';':
    if seq != '':
    show(seq)
    seq = ''
    print line,
    else:
    seq += line[:-1]
    show(seq)

    main()


    Making seq into a list instead of a string (and using .extend instead of
    the + operator) didn't give any speed improvements. Neither did using a
    dictionary instead of the translate function, or using reversed() instead
    of s[::-1]. The latter surprised me, since I would have guessed using an
    iterator to be more efficient. Since the shootout also tests memory usage,
    should I be using reversed for that reason? Does anyone have any other
    ideas to optimize this code?

    By the way - is there a good way to find out the maximum memory a program
    used (in the manner of the "time" command)? Other than downloading and
    running the shootout benchmark scripts, of course.

    --
    Jacob Lee
    | www.nearestneighbor.net
    Jacob Lee, Mar 16, 2005
    #1
    1. Advertising

  2. Jacob Lee

    Robert Kern Guest

    Jacob Lee wrote:

    > By the way - is there a good way to find out the maximum memory a program
    > used (in the manner of the "time" command)? Other than downloading and
    > running the shootout benchmark scripts, of course.


    Inserting appropriate pauses with raw_input() and recording the memory
    usage using top(1) is a quick and dirty way.

    --
    Robert Kern


    "In the fields of hell where the grass grows high
    Are the graves of dreams allowed to die."
    -- Richard Harter
    Robert Kern, Mar 16, 2005
    #2
    1. Advertising

  3. Jacob Lee wrote:
    > There are a bunch of new tests up at shootout.alioth.debian.org for which
    > Python does not yet have code. I've taken a crack at one of them, a task
    > to print the reverse complement of a gene transcription. Since there are a
    > lot of minds on this newsgroup that are much better at optimization than
    > I, I'm posting the code I came up with to see if anyone sees any
    > opportunities for substantial improvement. Without further ado:
    >
    > table = string.maketrans('ACBDGHK\nMNSRUTWVY', 'TGVHCDM\nKNSYAAWBR')
    >

    string.translate is a good idea. Note you can handle upper-casing and
    translation in one operation by adding a mapping from lower case to the
    complement i.e.,

    table = string.maketrans('ACBDGHK\nMNSRUTWVYacbdghkmnsrutwvy',
    'TGVHCDM\nKNSYAAWBRTGVHCDMKNSYAAWBR')

    > def show(s):
    > i = 0
    > for char in s.upper().translate(table)[::-1]:



    > if i == 60:
    > print
    > i = 0
    > sys.stdout.write(char)
    > i += 1
    > print



    > def main():
    > seq = ''
    > for line in sys.stdin:
    > if line[0] == '>' or line[0] == ';':
    > if seq != '':
    > show(seq)
    > seq = ''
    > print line,
    > else:
    > seq += line[:-1]
    > show(seq)
    >
    > main()
    >


    This looks unwieldy - especially writing to sys.stdout oe character at a time.
    I may not have understood the spec exactly, but isn't this the same as:

    for line in sys.stdin:
    if line and (line[0] == ">" or line[0] == ";"):
    print line
    else:
    print line.translate(table)

    HTH

    Michael
    Michael Spencer, Mar 16, 2005
    #3
  4. Jacob Lee

    Robert Kern Guest

    Here's my solution to the problem[1]:

    [1] http://shootout.alioth.debian.org/benchmark.php?test=revcomp

    import sys
    import string

    basetable = string.maketrans('ACBDGHKMNSRUTWVYacbdghkmnsrutwvy',
    'TGVHCDMKNSYAAWBRTGVHCDMKNSYAAWBR')
    def revcomp(seqlines, linelength=60, basetable=basetable):
    seq = ''.join(seqlines)
    complement = seq.translate(basetable)
    revcomplement = complement[::-1]
    for i in xrange(0, len(revcomplement), linelength):
    print revcomplement[i:i+linelength]

    def main():
    seqlines = []
    for line in sys.stdin:
    if line.startswith('>') or line.startswith(';'):
    if seqlines:
    revcomp(seqlines)
    sys.stdout.write(line)
    seqlines = []
    else:
    seqlines.append(line.strip())
    revcomp(seqlines)

    if __name__ == '__main__':
    main()

    --
    Robert Kern


    "In the fields of hell where the grass grows high
    Are the graves of dreams allowed to die."
    -- Richard Harter
    Robert Kern, Mar 16, 2005
    #4
  5. Jacob Lee wrote:
    > There are a bunch of new tests up at shootout.alioth.debian.org for which
    > Python does not yet have code. I've taken a crack at one of them, a task
    > to print the reverse complement of a gene transcription. Since there are a
    > lot of minds on this newsgroup that are much better at optimization than
    > I, I'm posting the code I came up with to see if anyone sees any
    > opportunities for substantial improvement. Without further ado:
    >
    > table = string.maketrans('ACBDGHK\nMNSRUTWVY', 'TGVHCDM\nKNSYAAWBR')
    >
    > def show(s):
    > i = 0
    > for char in s.upper().translate(table)[::-1]:
    > if i == 60:
    > print
    > i = 0
    > sys.stdout.write(char)
    > i += 1
    > print
    >
    > def main():
    > seq = ''
    > for line in sys.stdin:
    > if line[0] == '>' or line[0] == ';':
    > if seq != '':
    > show(seq)
    > seq = ''
    > print line,
    > else:
    > seq += line[:-1]
    > show(seq)
    >
    > main()


    Don't know if this is faster for your data, but I think you could also
    write this as (untested):

    # table as default argument value so you don't have to do
    # a global lookup each time it's used

    def show(seq, table=string.maketrans('ACBDGHK\nMNSRUTWVY',
    'TGVHCDM\nKNSYAAWBR')
    seq = seq.upper().translate(table)[::-1]
    # print string in slices of length 60
    for i in range(0, len(seq), 60):
    print seq[i:i+60]

    def main():
    seq = []
    # alias methods to avoid repeated lookup
    join = ''.join
    append = seq.append
    for line in sys.stdin:
    # note only one "line[0]" by using "in" test
    if line[0] in ';>':
    # note no need to check if seq is empty; show now prints
    # nothing for an empty string
    show(join(seq))
    print line,
    del seq[:]
    else:
    append(line[:-1])


    > Making seq into a list instead of a string (and using .extend instead of
    > the + operator) didn't give any speed improvements. Neither did using a
    > dictionary instead of the translate function, or using reversed() instead
    > of s[::-1]. The latter surprised me, since I would have guessed using an
    > iterator to be more efficient. Since the shootout also tests memory usage,
    > should I be using reversed for that reason?


    reversed() won't save you any memory -- you're already loading the
    entire string into memory anyway.


    Interesting tidbit:
    del seq[:]
    tests faster than
    seq = []

    $ python -m timeit -s "lst = range(1000)" "lst = []"
    10000000 loops, best of 3: 0.159 usec per loop

    $ python -m timeit -s "lst = range(1000)" "del lst[:]"
    10000000 loops, best of 3: 0.134 usec per loop

    It's probably the right way to go in this case anyway -- no need to
    create a new empty list each time.

    STeVe
    Steven Bethard, Mar 16, 2005
    #5
  6. Jacob Lee

    Jacob Lee Guest

    On Tue, 15 Mar 2005 21:38:48 -0800, Michael Spencer wrote:

    > string.translate is a good idea. Note you can handle upper-casing and
    > translation in one operation by adding a mapping from lower case to the
    > complement i.e.,
    >
    > table = string.maketrans('ACBDGHK\nMNSRUTWVYacbdghkmnsrutwvy',
    > 'TGVHCDM\nKNSYAAWBRTGVHCDMKNSYAAWBR')
    >

    Good call.

    > This looks unwieldy - especially writing to sys.stdout oe character at a
    > time. I may not have understood the spec exactly, but isn't this the
    > same as:
    >
    > for line in sys.stdin:
    > if line and (line[0] == ">" or line[0] == ";"):
    > print line
    > else:
    > print line.translate(table)
    >
    >

    That's the immediately obvious solution, but it doesn't actually fulfill
    the problem requirements. What if your last line is less than 60
    characters long? You no longer will be displaying the input in reverse
    order. Otherwise you'd be right - my solution would be unnecessarily
    unwieldy (and the problem would be much simpler...) .

    --
    Jacob Lee
    | www.nearestneighbor.net
    Jacob Lee, Mar 16, 2005
    #6
  7. Jacob Lee

    Jacob Lee Guest

    On Tue, 15 Mar 2005 22:45:48 -0700, Steven Bethard wrote:

    > # table as default argument value so you don't have to do
    > # a global lookup each time it's used
    >
    > def show(seq, table=string.maketrans('ACBDGHK\nMNSRUTWVY',
    > 'TGVHCDM\nKNSYAAWBR')
    > seq = seq.upper().translate(table)[::-1]
    > # print string in slices of length 60
    > for i in range(0, len(seq), 60):
    > print seq[i:i+60]
    >
    > def main():
    > seq = []
    > # alias methods to avoid repeated lookup
    > join = ''.join
    > append = seq.append
    > for line in sys.stdin:
    > # note only one "line[0]" by using "in" test
    > if line[0] in ';>':
    > # note no need to check if seq is empty; show now prints
    > # nothing for an empty string
    > show(join(seq))
    > print line,
    > del seq[:]
    > else:
    > append(line[:-1])
    >


    Wow - that ran about 10 times faster on a 10MB input file. The significant
    change was the for loop inside the show function: your method avoids the
    increment and if statement and of course has 60x fewer iterations total.

    > reversed() won't save you any memory -- you're already loading the
    > entire string into memory anyway.
    >
    >
    > Interesting tidbit:
    > del seq[:]
    > tests faster than
    > seq = []
    >
    > $ python -m timeit -s "lst = range(1000)" "lst = []"
    > 10000000 loops, best of 3: 0.159 usec per loop
    >
    > $ python -m timeit -s "lst = range(1000)" "del lst[:]"
    > 10000000 loops, best of 3: 0.134 usec per loop
    >
    > It's probably the right way to go in this case anyway -- no need to
    > create a new empty list each time.


    Fascinating - I hadn't thought about that.

    Besides redoing that loop, the remaining optimizations produce less than a
    10% speed-up; in particular, adding the function aliases increases the
    number of lines of code (another benchmark in the shootout), and I would
    imagine that the organizers don't really want that type of special
    optimization (no developer is going to write that in their programs
    unless they have really time-critical code, so this is the sort of hit
    that Python really should take in the contest as penalty for being so
    dynamically nifty ;)). So here's a tentative contest version of the code:

    import sys
    import string

    def show(seq, table=string.maketrans('ACBDGHK\nMNSRUTWVYacbdghkmnsrutwvy',
    'TGVHCDM\nKNSYAAWBRTGVHCDMKNSYAAWBR')):
    seq = seq.translate(table)[::-1]
    for i in range(0, len(seq), 60):
    print seq[i:i+60]

    def main():
    seq = []
    for line in sys.stdin:
    if line[0] in ';>':
    show(''.join(seq))
    print line,
    del seq[:]
    else:
    seq.append(line[:-1])
    show(''.join(seq))

    main()

    --
    Jacob Lee
    | www.nearestneighbor.net
    Jacob Lee, Mar 16, 2005
    #7
  8. Jacob Lee wrote:
    > So here's a tentative contest version of the code:
    >
    > import sys
    > import string
    >
    > def show(seq, table=string.maketrans('ACBDGHK\nMNSRUTWVYacbdghkmnsrutwvy',
    > 'TGVHCDM\nKNSYAAWBRTGVHCDMKNSYAAWBR')):
    > seq = seq.translate(table)[::-1]
    > for i in range(0, len(seq), 60):
    > print seq[i:i+60]
    >
    > def main():
    > seq = []
    > for line in sys.stdin:
    > if line[0] in ';>':
    > show(''.join(seq))
    > print line,
    > del seq[:]
    > else:
    > seq.append(line[:-1])
    > show(''.join(seq))
    >
    > main()


    Looks pretty good. (And yes, I definitely prefer the unaliased ''.join
    and seq.append for readability's sake. Glad to know they try to grade
    on that too.) =)

    Only one other suggestion: "range" in the show function should probably
    be "xrange". "range" is going to create an actual list of however many
    integers, while "xrange" will just create the integers as needed.
    "xrange" thus will be more memory friendly. (It's also occasionally
    faster, though this depends a lot on the rest of the code).

    STeVe
    Steven Bethard, Mar 16, 2005
    #8
  9. Jacob Lee wrote:
    > On Tue, 15 Mar 2005 21:38:48 -0800, Michael Spencer wrote:
    >
    >
    >>string.translate is a good idea. Note you can handle upper-casing and
    >>translation in one operation by adding a mapping from lower case to the
    >>complement i.e.,
    >>
    >>table = string.maketrans('ACBDGHK\nMNSRUTWVYacbdghkmnsrutwvy',
    >> 'TGVHCDM\nKNSYAAWBRTGVHCDMKNSYAAWBR')
    >>

    >
    > Good call.
    >
    >
    >>This looks unwieldy - especially writing to sys.stdout oe character at a
    >>time. I may not have understood the spec exactly, but isn't this the
    >>same as:
    >>
    >>for line in sys.stdin:
    >> if line and (line[0] == ">" or line[0] == ";"):
    >> print line
    >> else:
    >> print line.translate(table)
    >>
    >>

    >
    > That's the immediately obvious solution, but it doesn't actually fulfill
    > the problem requirements. What if your last line is less than 60
    > characters long? You no longer will be displaying the input in reverse
    > order. Otherwise you'd be right - my solution would be unnecessarily
    > unwieldy (and the problem would be much simpler...) .
    >

    yes it would be, wouldn't it ;-)

    How about this then:

    basetable = string.maketrans('ACBDGHKMNSRUTWVYacbdghkmnsrutwvy',
    'TGVHCDMKNSYAAWBRTGVHCDMKNSYAAWBR')

    from collections import deque
    from itertools import islice

    def output(seq, linelength = 60):
    if seq:
    iterseq = iter(seq)
    while iterseq:
    print "".join(islice(iterseq,linelength))

    def revcomp(input = sys.stdin):
    seqlines = deque()
    for line in input:
    if line[0] in ">;":
    output(seqlines)
    print line,
    seqlines.clear()
    else:
    seqlines.extendleft(line.translate(basetable, "\n\r"))
    output(seqlines)


    It would be nice to inline the output function, and re-arrange the iteration so
    that EOF triggers the same suite as line[0] in ">;"

    Michael
    Michael Spencer, Mar 16, 2005
    #9
  10. Jacob Lee

    F. Petitjean Guest

    Le Tue, 15 Mar 2005 23:21:02 -0800, Michael Spencer a écrit :
    > Jacob Lee wrote:
    >> On Tue, 15 Mar 2005 21:38:48 -0800, Michael Spencer wrote:
    >>
    >> Good call.
    >>
    >>

    >
    > How about this then:
    >
    > basetable = string.maketrans('ACBDGHKMNSRUTWVYacbdghkmnsrutwvy',
    > 'TGVHCDMKNSYAAWBRTGVHCDMKNSYAAWBR')
    >
    > from collections import deque
    > from itertools import islice
    >
    > def output(seq, linelength = 60):
    > if seq:
    > iterseq = iter(seq)
    > while iterseq:
    > print "".join(islice(iterseq,linelength))

    I suppose you mean :
    print "".join( str(item) for item in islice(iterseq, linelength) )
    # using python 2.4 genexp
    >
    > def revcomp(input = sys.stdin):
    > seqlines = deque()
    > for line in input:
    > if line[0] in ">;":
    > output(seqlines)
    > print line,
    > seqlines.clear()
    > else:
    > seqlines.extendleft(line.translate(basetable, "\n\r"))
    > output(seqlines)
    >
    >
    > It would be nice to inline the output function, and re-arrange the iteration so
    > that EOF triggers the same suite as line[0] in ">;"
    >
    > Michael
    >
    F. Petitjean, Mar 16, 2005
    #10
  11. Consider keeping the alias for append because it occurs in the
    innermost loop. For maximum readability, write: addline = seq.append

    Move the ''.join() to the show() function. That eliminates a little
    redundancy.

    The test dataset doesn't use the semi-colon comment field. So,
    consider reversing ';>' to '>;'.


    Raymond Hettinger
    Raymond Hettinger, Mar 16, 2005
    #11
  12. F. Petitjean wrote:
    > Le Tue, 15 Mar 2005 23:21:02 -0800, Michael Spencer a écrit :
    >


    >>
    >>def output(seq, linelength = 60):
    >> if seq:
    >> iterseq = iter(seq)
    >> while iterseq:
    >> print "".join(islice(iterseq,linelength))

    >
    > I suppose you mean :
    > print "".join( str(item) for item in islice(iterseq, linelength) )
    > # using python 2.4 genexp
    >

    No, as written, given the seq is a sequence of single character strings already
    (changing the signature might clarify that):

    def output(charseq, linelength = 60):
    if charseq:
    iterseq = iter(charseq)
    while iterseq:
    print "".join(islice(iterseq,linelength))

    >>> output("*" * 120, 40)

    ****************************************
    ****************************************
    ****************************************
    >>> output(['*'] * 120, 40)

    ****************************************
    ****************************************
    ****************************************
    >>>


    Cheers
    Michael
    Michael Spencer, Mar 16, 2005
    #12
  13. Michael Spencer wrote:
    > def output(seq, linelength = 60):
    > if seq:
    > iterseq = iter(seq)
    > while iterseq:
    > print "".join(islice(iterseq,linelength))


    Worth noting: "while iterseq" only works because for this case, you have
    a list iterator, which provides a __len__ method. This approach will
    not work when you have an iterator that does not provide a __len__
    method. For a nice infinite printing loop, try:

    def gen():
    yield '+'*100
    yield '*'*100

    output(gen())

    STeVe
    Steven Bethard, Mar 16, 2005
    #13
  14. Steven Bethard wrote:
    > Michael Spencer wrote:
    >
    >> def output(seq, linelength = 60):
    >> if seq:
    >> iterseq = iter(seq)
    >> while iterseq:
    >> print "".join(islice(iterseq,linelength))

    >
    >
    > Worth noting: "while iterseq" only works because for this case, you have
    > a list iterator, which provides a __len__ method.


    Thanks! I had noted that the file iterator didn't behave like this, but I hadn't
    deduced the reason. Unfortunately, the above construct, while cute, is also
    not terribly speedy.

    >>> print "\n".join(body[-index:-index-linelength:-1]

    ... for index in xrange(1, len(body), linelength))

    is ugly but much faster with an already-existing string

    So, my second attempt is:

    from itertools import groupby

    def revcomp2(input = sys.stdin, linelength = 60):
    basetable = string.maketrans('ACBDGHKMNSRUTWVYacbdghkmnsrutwvy',
    'TGVHCDMKNSYAAWBRTGVHCDMKNSYAAWBR')

    def record(item):
    return item[0] in ">;"

    for header, body in groupby(input, record):
    body = "".join(body)
    if header:
    print body,
    else:
    body = body.translate(basetable, "\n\r")
    print "\n".join(body[-index:-index-linelength:-1]
    for index in xrange(1, len(body), linelength))


    Michael
    Michael Spencer, Mar 16, 2005
    #14
  15. Jacob Lee

    Guest

    We've made it somewhat easier to contribute programs.

    No need to subscribe to the mailing-list.
    No need for a user-id or login.

    See the FAQ "How can I contribute a program?"
    http://shootout.alioth.debian.org/faq.php
    , Mar 29, 2005
    #15
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Delaney, Timothy C (Timothy)

    RE: code for Computer Language Shootout

    Delaney, Timothy C (Timothy), Mar 16, 2005, in forum: Python
    Replies:
    3
    Views:
    305
    Raymond Hettinger
    Mar 17, 2005
  2. Replies:
    14
    Views:
    498
    Fredrik Lundh
    Nov 30, 2005
  3. Phil Tomson
    Replies:
    25
    Views:
    300
    Gawnsoft
    Jun 20, 2004
  4. Isaac Gouy
    Replies:
    0
    Views:
    109
    Isaac Gouy
    Aug 3, 2004
  5. Replies:
    9
    Views:
    182
    Isaac Gouy
    Mar 29, 2005
Loading...

Share This Page