Performance issue

Discussion in 'Python' started by Tom Carrick, Apr 2, 2005.

  1. Tom Carrick

    Tom Carrick Guest

    Hi,

    In my attempted learning of python, I've decided to recode an old
    anagram solving program I made in C++. The C++ version runs in less
    than a second, while the python takes 30 seconds. I'm not willing to
    think it's just python being slow, so I was hoping someone could find
    a faster way of doing this. Also, I was wondering if there was a more
    builtin, or just nicer way of converting a string to a list (or using
    the sort function on a list) than making a function for it.

    The words.txt here is just a copy of FreeBSD's /usr/share/dict/words

    Anyway, the code:

    import string

    # Need a function to convert a string to a list to be
    # able to use the sort() function
    def string2list(s):
    l = []
    for i in range(0, len(s)):
    l.append(s)
    return l

    words = []
    found = []

    anagram = raw_input("Find anagrams of word: ")

    f = open('words.txt', 'r')
    file = f.read()
    f.close()

    words = file.splitlines()

    sorted_anagram = anagram.lower()
    sorted_anagram = string2list(anagram)
    sorted_anagram.sort(lambda x, y: cmp(x, y))


    while words:
    if len(words[0]) == len(sorted_anagram):
    wordlist = string2list(words[0])
    wordlist.sort(lambda x, y: cmp(x, y))
    sorted_wordlist = wordlist
    if sorted_anagram == sorted_wordlist:
    found.append(words[0])
    del words[0]

    print "Anagrams of " + anagram + ": "
    while found:
    print found[0] + " "
    del found[0]
     
    Tom Carrick, Apr 2, 2005
    #1
    1. Advertising

  2. Tom Carrick wrote:
    > Hi,
    >
    > In my attempted learning of python, I've decided to recode an old
    > anagram solving program I made in C++. The C++ version runs in less
    > than a second, while the python takes 30 seconds. I'm not willing to
    > think it's just python being slow, so I was hoping someone could find
    > a faster way of doing this.


    I like your attitude, not thinking that it's just Python that is slow :)

    > Also, I was wondering if there was a more
    > builtin, or just nicer way of converting a string to a list (or using
    > the sort function on a list) than making a function for it.


    String to list: list("irmen") # --> ['i','r','m','e','n']
    Sorted list: sorted("irmen") # --> ['e', 'i', 'm', 'n', 'r']
    (the latter works in Python 2.4+)

    >
    > The words.txt here is just a copy of FreeBSD's /usr/share/dict/words
    >
    > Anyway, the code:
    >
    > import string
    >
    > # Need a function to convert a string to a list to be
    > # able to use the sort() function
    > def string2list(s):
    > l = []
    > for i in range(0, len(s)):
    > l.append(s)
    > return l


    .... see above... just replace string2list(s) with sorted(s)

    >
    > words = []
    > found = []
    >
    > anagram = raw_input("Find anagrams of word: ")
    >
    > f = open('words.txt', 'r')
    > file = f.read()
    > f.close()


    Style: don't use 'file' as a variable name, you're hiding the
    builtin 'file' function

    >
    > words = file.splitlines()


    You can obtain this list without reading the file in its entirety,
    by using the readlines method of file objects:

    words=open("words.txt").readlines()

    >
    > sorted_anagram = anagram.lower()
    > sorted_anagram = string2list(anagram)
    > sorted_anagram.sort(lambda x, y: cmp(x, y))


    The lambda is optional and only slows it down :)
    But to get a sorted list of letters, just use sorted(s)
    if you're on Python 2.4+

    > while words:
    > if len(words[0]) == len(sorted_anagram):
    > wordlist = string2list(words[0])
    > wordlist.sort(lambda x, y: cmp(x, y))
    > sorted_wordlist = wordlist


    (same here.. replacing this by sorted(words[0]) probably
    will speed it up rather significantly, partly because
    it avoids the creation of those temporary lists)

    > if sorted_anagram == sorted_wordlist:
    > found.append(words[0])
    > del words[0]
    >
    > print "Anagrams of " + anagram + ": "
    > while found:
    > print found[0] + " "
    > del found[0]


    print " ".join(found)


    Cheers

    --Irmen de Jong
     
    Irmen de Jong, Apr 2, 2005
    #2
    1. Advertising

  3. In <>, Tom Carrick
    wrote:

    > […] Also, I was wondering if there was a more
    > builtin, or just nicer way of converting a string to a list (or using
    > the sort function on a list) than making a function for it.


    Use the `list()` builtin on the string and *just* the `sort()` method::

    In [2]: characters = list('hello')

    In [3]: characters
    Out[3]: ['h', 'e', 'l', 'l', 'o']

    In [4]: characters.sort()

    In [5]: characters
    Out[5]: ['e', 'h', 'l', 'l', 'o']

    > sorted_anagram = anagram.lower()
    > sorted_anagram = string2list(anagram)
    > sorted_anagram.sort(lambda x, y: cmp(x, y))


    sorted_anagram = list(anagram.lower())
    sorted_anagram.sort()

    > while words:
    > if len(words[0]) == len(sorted_anagram):
    > wordlist = string2list(words[0])
    > wordlist.sort(lambda x, y: cmp(x, y))
    > sorted_wordlist = wordlist
    > if sorted_anagram == sorted_wordlist:
    > found.append(words[0])
    > del words[0]


    And here's the performance issue. Deleting the first element of a list
    results in moving all remaining elements one index down. Better iterate
    over the words in a for loop::

    for word in words:
    # use `word` instead of `word[0]` in the loop body.
    ...

    Ciao,
    Marc 'BlackJack' Rintsch
     
    Marc 'BlackJack' Rintsch, Apr 2, 2005
    #3
  4. In <424ec521$0$149$4all.nl>, Irmen de Jong wrote:

    >> words = file.splitlines()

    >
    > You can obtain this list without reading the file in its entirety,
    > by using the readlines method of file objects:
    >
    > words=open("words.txt").readlines()


    This leaves the newline characters at the end of each line while
    `str.splitlines()` removes them.

    Ciao,
    Marc 'BlackJack' Rintsch
     
    Marc 'BlackJack' Rintsch, Apr 2, 2005
    #4
  5. "Tom Carrick" <> schrieb im Newsbeitrag
    news:...
    | Hi,
    |
    | In my attempted learning of python, I've decided to recode an old
    | anagram solving program I made in C++. The C++ version runs in less
    | than a second, while the python takes 30 seconds. I'm not willing to
    | think it's just python being slow, so I was hoping someone could find
    | a faster way of doing this. Also, I was wondering if there was a more
    | builtin, or just nicer way of converting a string to a list (or using
    | the sort function on a list) than making a function for it.
    |
    | The words.txt here is just a copy of FreeBSD's /usr/share/dict/words
    |
    | Anyway, the code:
    |
    | import string

    You're importing string, but never use it, so you can omit that line.

    |
    | # Need a function to convert a string to a list to be
    | # able to use the sort() function
    | def string2list(s):
    | l = []
    | for i in range(0, len(s)):
    | l.append(s)
    | return l

    No need to write your own function. list(s) already does the trick.

    |
    | words = []
    | found = []
    |
    | anagram = raw_input("Find anagrams of word: ")
    |
    | f = open('words.txt', 'r')
    | file = f.read()
    | f.close()


    I don't have a copy of words.txt, but here's what I would try
    (untested):

    anagram = raw_input("Find anagrams of word: ")
    sorted_anagram = list(sorted(anagram.lower()))
    # If you're Python is pre 2.4 ise
    # sorted_anagram = list(anagram.lower())
    # sorted_anagram.sort() #--sort list in place


    found = []
    # assuming "words.txt" contains a word per line
    # iterate over the lines of the file

    for line in open("/path/to/words.txt"):
    word = line[:-1] # Get rid of trailing newline
    sorted_word = list(sorted(word.lower()))
    if sorted_word == sorted_anagram:
    found.append(word)
    if found:
    print "Anagrams of %s:" % anagram
    for w in found:
    print w
    else:
    print "No anagrams for %s" % anagram


    --

    Vincent Wehren
     
    vincent wehren, Apr 2, 2005
    #5
  6. Tom Carrick

    Thomas Rast Guest

    Tom Carrick <> writes:

    > In my attempted learning of python, I've decided to recode an old
    > anagram solving program I made in C++. The C++ version runs in less
    > than a second, while the python takes 30 seconds.


    Indeed, your program can be improved to run about ten times as fast,
    which (on my system, with 96274 entries in /usr/share/dict/words) is
    below a second.

    In general you should try to move the loops into C code, i.e. use
    built-in functions instead of long 'for' blocks.

    Some comments on your version:

    > import string
    >
    > # Need a function to convert a string to a list to be
    > # able to use the sort() function
    > def string2list(s): [snipped]


    list() achieves the same thing a lot faster.

    > words = []


    You do not need to initialize 'words' here, as you're overwriting it a
    few lines afterwards.

    > found = []
    >
    > anagram = raw_input("Find anagrams of word: ")
    >
    > f = open('words.txt', 'r')
    > file = f.read()
    > f.close()
    >
    > words = file.splitlines()


    Try to avoid assigning to the names of built-in functions if you can.
    Names like 'file', 'list', 'dict', 'map' etc. are often an obvious
    choice, but overwriting them means that you don't "just" know what a
    later use refers to.

    > sorted_anagram = anagram.lower()
    > sorted_anagram = string2list(anagram)
    > sorted_anagram.sort(lambda x, y: cmp(x, y))


    Unless you *really* have to, don't use comparison functions with
    sort(), as they slow the operation considerably. In this (as in most)
    cases, a plain sorted_anagram.sort() does the trick, and in version
    2.4 you can achieve custom sort orders with the optional 'key'
    argument. The sorted() built-in also comes in handy here.

    > while words:
    > if len(words[0]) == len(sorted_anagram):
    > wordlist = string2list(words[0])
    > wordlist.sort(lambda x, y: cmp(x, y))
    > sorted_wordlist = wordlist
    > if sorted_anagram == sorted_wordlist:
    > found.append(words[0])
    > del words[0]


    Avoid this style of looping at all times! Removing the first element
    of a list is O(n), so looping through the whole list as above is
    O(n**2). In most cases you should use a for loop:

    for word in words:
    # do something

    which is O(n) of course. If you do have to loop destructively, pop()
    from the end (which is the default) like so:

    while words:
    word = words.pop()
    # do something

    This is also O(n), because removing the *last* element of a list is
    O(1) (amortized; I suppose the implementation will occasionally shrink
    the underlying array at linear cost).

    > print "Anagrams of " + anagram + ": "
    > while found:
    > print found[0] + " "
    > del found[0]


    I assume you meant not to print a newline between the words, which
    'print' does by default. The best solution in that case is "
    ".join(found).

    A better version (2.4+ only):

    -- 8< -- 8< --
    anagram = raw_input("Find anagrams of word: ")

    words = open('words.txt', 'r')

    sorted_anagram = sorted(anagram.lower())

    found = []

    for word in words.read().splitlines():
    if len(word) == len(anagram) and sorted(word) == sorted_anagram:
    found.append(word)

    print "Anagrams of %s: %s" % (anagram, ' '.join(found))
    -- >8 -- >8 --

    Interestingly, the length comparison makes quite a difference! I
    removed it at first, thinking it was unnecessary. Here are some
    timings:

    * Your original version (for comparison):

    $ time echo stop | python2.4 anagram_slow.py
    [...]
    real 0m9.090s
    user 0m8.790s
    sys 0m0.013s

    * Your version, but with the O(n**2) loop replaced by an O(n) 'for':

    $ time echo stop | python2.4 anagram_forloop.py
    [...]
    real 0m0.221s
    user 0m0.134s
    sys 0m0.014s

    * My version but with the length comparison removed:

    $ time echo stop | python2.4 anagram_no_lencmp.py
    [...]
    real 0m0.408s
    user 0m0.353s
    sys 0m0.010s

    * My version as above:

    $ time echo stop | python2.4 anagram_fast.py
    [...]
    real 0m0.144s
    user 0m0.099s
    sys 0m0.008s

    Hope that helps :)

    - Thomas

    --
    If you want to reply by mail, substitute my first and last name for
    'foo' and 'bar', respectively, and remove '.invalid'.
     
    Thomas Rast, Apr 2, 2005
    #6
  7. Re: Can I play too?

    Scott David Daniels wrote:

    > if __name__ == '__main__':
    > import sys
    > main(sys.argv[1:] or ['anagrams.py'])


    This is *exactly* the kind of testcases I'm looking for to test
    the soon-to-be-released pyvm. Great! I'll be back with results.


    For now, a fast anagrams.py is
    --------------------------------------
    import sys

    WORDS = [ i.rstrip () for i in file ('/usr/share/dict/words') ]

    def findana (anagram):
    sorted_anagram = sorted(anagram.lower())
    len_anagram = len (anagram)
    found = [ word for word in WORDS if len(word)==len_anagram and
    sorted(word)==sorted_anagram ]
    print "Anagrams of %s: %s" % (anagram, ' '.join(found))

    for i in sys.argv [1:]:
    findana (i)
    -----------------------------------------


    And timings....

    time python anagram.pyc stop step words lots pool eat fast slow lamp
    cold door xyzzy
    Anagrams of stop: opts post pots spot stop tops
    Anagrams of step: pest pets sept step
    Anagrams of words: sword words
    Anagrams of lots: lost lots slot
    Anagrams of pool: loop polo pool
    Anagrams of eat: ate eat tea
    Anagrams of fast: fast fats
    Anagrams of slow: lows owls slow
    Anagrams of lamp: lamp palm
    Anagrams of cold: clod cold
    Anagrams of door: door odor
    Anagrams of xyzzy:

    real 0m1.491s
    user 0m1.390s
    sys 0m0.040s

    time pyvm anagram.pyc stop step words lots pool eat fast slow lamp cold
    door xyzzy
    Anagrams of stop: opts post pots spot stop tops
    Anagrams of step: pest pets sept step
    Anagrams of words: sword words
    Anagrams of lots: lost lots slot
    Anagrams of pool: loop polo pool
    Anagrams of eat: ate eat tea
    Anagrams of fast: fast fats
    Anagrams of slow: lows owls slow
    Anagrams of lamp: lamp palm
    Anagrams of cold: clod cold
    Anagrams of door: door odor
    Anagrams of xyzzy:

    real 0m0.923s
    user 0m0.760s
    sys 0m0.070s


    -------
    Stelios
     
    stelios xanthakis, Apr 2, 2005
    #7
  8. Can I play too?

    Thomas Rast wrote:
    > Tom Carrick <> writes:
    >>In my attempted learning of python, I've decided to recode an old
    >>anagram solving program I made in C++. The C++ version runs in less
    >>than a second, while the python takes 30 seconds.

    >
    > Indeed, your program can be improved to run about ten times as fast, ...
    > <great stuff>


    This problem inspired an "all anagrams" program. Using it I was able
    to find the largest anagram group in Shakespeare's first folio in about
    the time you originally found anagrams for an individual word.

    7: owers = rowse = sower = sowre = swore = woers = worse

    ====

    def words(source):
    for line in source:
    for word in line.split():
    yield word


    def all_anagrams(words):
    seen = dict()

    for word in words:
    word = word.lower()
    if word not in seen:
    dorw = ''.join(sorted(word))
    try:
    seen[dorw].append(word)
    except KeyError:
    seen[dorw] = [word]
    if word == dorw:
    continue
    seen[word] = ()
    for group in seen.itervalues():
    if len(group) > 1:
    yield -len(group), sorted(group) # conveniently sortable


    def main(sources):
    for filename in sources:
    dictionary = open(filename, 'r')
    print "All anagrams from %s:" % filename
    try:
    for nsize, group in sorted(all_anagrams(words(dictionary))):
    print '%2s: %s' % (-nsize, ' = '.join(group))
    finally:
    dictionary.close()
    print



    if __name__ == '__main__':
    import sys
    main(sys.argv[1:] or ['anagrams.py'])
     
    Scott David Daniels, Apr 2, 2005
    #8
  9. In <>, Tom Carrick
    wrote:

    > In my attempted learning of python, I've decided to recode an old
    > anagram solving program I made in C++. The C++ version runs in less
    > than a second, while the python takes 30 seconds. I'm not willing to
    > think it's just python being slow, so I was hoping someone could find
    > a faster way of doing this. Also, I was wondering if there was a more
    > builtin, or just nicer way of converting a string to a list (or using
    > the sort function on a list) than making a function for it.
    >
    > The words.txt here is just a copy of FreeBSD's /usr/share/dict/words


    Here's my attempt which builds an anagram dictionary ("sorted word" ->
    list of anagrams) for fast lookup of anagrams::

    #!/usr/bin/env python2.4
    from itertools import imap, ifilter

    WORDS = '/usr/share/dict/words'

    def make_anagram_map(words):
    anagram_map = dict()
    for word in imap(lambda w: w.strip().lower(), words):
    sorted_word = ''.join(sorted(list(word)))
    anagram_map.setdefault(sorted_word, list()).append(word)

    return dict(ifilter(lambda x: len(x[1]) > 1, anagram_map.iteritems()))


    def main():
    words_file = open(WORDS, 'r')
    anagram_map = make_anagram_map(words_file)
    words_file.close()

    while True:
    word = raw_input('Find anagrams of word (just enter to end): ')
    if not word:
    break
    try:
    print anagram_map[''.join(sorted(list(word.strip().lower())))]
    except KeyError:
    print 'No anagrams found for %r' % word

    # # Print all anagrams sorted by number of anagrams.
    # print '\n'.join(map(str, sorted(anagram_map.values(), key=len)))
    # print len(anagram_map)


    if __name__ == '__main__':
    main()


    Ciao,
    Marc 'BlackJack' Rintsch
     
    Marc 'BlackJack' Rintsch, Apr 2, 2005
    #9
  10. Marc 'BlackJack' Rintsch wrote:
    > def make_anagram_map(words):
    > anagram_map = dict()
    > for word in imap(lambda w: w.strip().lower(), words):
    > sorted_word = ''.join(sorted(list(word)))
    > anagram_map.setdefault(sorted_word, list()).append(word)
    >
    > return dict(ifilter(lambda x: len(x[1]) > 1, anagram_map.iteritems()))


    Or if you're afraid of map and filter like me, you can try:

    def make_anagram_map(words):
    anagram_map = {}
    for word in (w.strip().lower() for w in words):
    anagram_map.setdefault(''.join(sorted(word)), []).append(word)
    return dict(sortedword_wordlist
    for sortedword_wordlist in anagram_map.iteritems()
    if len(sortedword_wordlist[1]) > 1)


    py> make_anagram_map(['owers', 'pest', 'rowse', 'pets', 'sower', 'step'])
    {'epst': ['pest', 'pets', 'step'], 'eorsw': ['owers', 'rowse', 'sower']}

    STeVe
     
    Steven Bethard, Apr 2, 2005
    #10
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. **ham

    GDI+ performance issue

    **ham, Dec 12, 2004, in forum: ASP .Net
    Replies:
    10
    Views:
    7,973
    Steve McLellan
    Dec 14, 2004
  2. jm
    Replies:
    1
    Views:
    511
    alien2_51
    Dec 12, 2003
  3. Paul King

    ASP.NET performance issue

    Paul King, Jul 1, 2004, in forum: ASP .Net
    Replies:
    17
    Views:
    790
    Scott Allen
    Jul 2, 2004
  4. Akshay Kumar

    ASP.NET Session Management Performance issue

    Akshay Kumar, Jul 2, 2004, in forum: ASP .Net
    Replies:
    1
    Views:
    454
    Scott Allen
    Jul 2, 2004
  5. Software Engineer
    Replies:
    0
    Views:
    332
    Software Engineer
    Jun 10, 2011
Loading...

Share This Page