Testing for membership speed

Discussion in 'Python' started by Guest, Jun 23, 2004.

  1. Guest

    Guest Guest

    I just ran this stuff for my own knowledge. Though it might be
    useful to some other people to know and maybe spark a discussion.

    I needed a fast way to test for membership, so naturally the
    choices were the builtin containers: lists, dictionaries, and
    tuples. The following is the test code and results:

    import timeit

    lst_i=timeit.Timer('random.randrange(10000) in l','import
    random; l=range(10000)')
    dct_i=timeit.Timer('l.has_key(random.randrange(10000))','import
    random; l=dict([(i,None) for i in xrange(10000)])')
    tup_i=timeit.Timer('random.randrange(10000) in l','import
    random; l=tuple(range(10000))')

    lst_str=timeit.Timer('md5.md5(str(random.randrange(10000))).hexdigest()
    in l','import random,md5; l=[md5.md5(str(i)).hexdigest() for i
    in xrange(10000)]')
    dct_str=timeit.Timer('l.has_key(md5.md5(str(random.randrange(10000))).hexdigest())','import
    random,md5; l=dict([(md5.md5(str(i)).hexdigest(),None) for i
    in xrange(10000)])')
    tup_str=timeit.Timer('md5.md5(str(random.randrange(10000))).hexdigest()
    in l','import random,md5; l=tuple([md5.md5(str(i)).hexdigest()
    for i in xrange(10000)])')

    print 'Integer lookup'
    r=lst_i.repeat(100,100); print 'list:',min(r),max(r);
    r=dct_i.repeat(100,100); print 'dict:',min(r),max(r);
    r=tup_i.repeat(100,100); print 'tupl:',min(r),max(r);
    print 'String lookup'
    r=lst_str.repeat(100,100); print 'list:',min(r),max(r);
    r=dct_str.repeat(100,100); print 'dict:',min(r),max(r);
    r=tup_str.repeat(100,100); print 'tupl:',min(r),max(r);

    [[[Ran on IRIX64 6.5, 24 processors, 12G Memory, 4G Swap, this
    code only uses one processor at %100 the full length of the run]]]
    Python 2.2.3 (#1, Nov 25 2003, 18:58:21) [C] on irix646-n32
    Type "help", "copyright", "credits" or "license" for more
    information.
    >>> ## working on region in file

    /usr/tmp/python-119959673PMu.py...
    Integer lookup
    list: 0.126830816269 0.160212993622
    dict: 0.00362300872803 0.00385618209839
    tupl: 0.119297981262 0.161748170853
    String lookup
    list: 0.142526865005 0.188524961472
    dict: 0.00711393356323 0.00760197639465
    tupl: 0.143892049789 0.186873912811
    >>>


    The results are conclusive, go for dictionaries. But this
    surprised me a little, anyone have insight as to why?

    I was also surprised that tuples and lists scored exactly the
    same. I was hoping that immutable tuples might earn it some
    speed over lists.

    I will eventually need this for testing strings. So the
    doubling of speed for strings over integers for dictionaries
    is a little alarming. Lists and tuples only saw a modest increase.

    Thank you in advance for any clever tricks you suggest.

    -Brian
    Guest, Jun 23, 2004
    #1
    1. Advertising

  2. Guest

    Larry Bates Guest

    The dictionaries are faster because they are indexed.
    Lists/tuples must be searched serially from the
    beginning.

    Dictionary indexes are hashed and the
    hashing algorithm has more to do on a string
    than on an integer (see first article link
    below for explanation).

    You could check into pre-hashing your strings and using
    these as integer keys. It will take longer to build
    the dictionary, but searching should be faster.

    Here are some articles that might interest you:

    http://effbot.org/zone/python-hash.htm

    http://mail.python.org/pipermail/python-dev/2003-June/036556.html

    http://www.egenix.com/files/python/mxTextTools.html

    Searching list or tuple serially from the beginning
    should take approximately the same time. Nothing
    about the mutability can help the search speed.

    HTH,
    Larry Bates
    Syscon, Inc.


    <> wrote in message
    news:...
    > I just ran this stuff for my own knowledge. Though it might be
    > useful to some other people to know and maybe spark a discussion.
    >
    > I needed a fast way to test for membership, so naturally the
    > choices were the builtin containers: lists, dictionaries, and
    > tuples. The following is the test code and results:
    >
    > import timeit
    >
    > lst_i=timeit.Timer('random.randrange(10000) in l','import
    > random; l=range(10000)')
    > dct_i=timeit.Timer('l.has_key(random.randrange(10000))','import
    > random; l=dict([(i,None) for i in xrange(10000)])')
    > tup_i=timeit.Timer('random.randrange(10000) in l','import
    > random; l=tuple(range(10000))')
    >
    > lst_str=timeit.Timer('md5.md5(str(random.randrange(10000))).hexdigest()
    > in l','import random,md5; l=[md5.md5(str(i)).hexdigest() for i
    > in xrange(10000)]')
    >

    dct_str=timeit.Timer('l.has_key(md5.md5(str(random.randrange(10000))).hexdig
    est())','import
    > random,md5; l=dict([(md5.md5(str(i)).hexdigest(),None) for i
    > in xrange(10000)])')
    > tup_str=timeit.Timer('md5.md5(str(random.randrange(10000))).hexdigest()
    > in l','import random,md5; l=tuple([md5.md5(str(i)).hexdigest()
    > for i in xrange(10000)])')
    >
    > print 'Integer lookup'
    > r=lst_i.repeat(100,100); print 'list:',min(r),max(r);
    > r=dct_i.repeat(100,100); print 'dict:',min(r),max(r);
    > r=tup_i.repeat(100,100); print 'tupl:',min(r),max(r);
    > print 'String lookup'
    > r=lst_str.repeat(100,100); print 'list:',min(r),max(r);
    > r=dct_str.repeat(100,100); print 'dict:',min(r),max(r);
    > r=tup_str.repeat(100,100); print 'tupl:',min(r),max(r);
    >
    > [[[Ran on IRIX64 6.5, 24 processors, 12G Memory, 4G Swap, this
    > code only uses one processor at %100 the full length of the run]]]
    > Python 2.2.3 (#1, Nov 25 2003, 18:58:21) [C] on irix646-n32
    > Type "help", "copyright", "credits" or "license" for more
    > information.
    > >>> ## working on region in file

    > /usr/tmp/python-119959673PMu.py...
    > Integer lookup
    > list: 0.126830816269 0.160212993622
    > dict: 0.00362300872803 0.00385618209839
    > tupl: 0.119297981262 0.161748170853
    > String lookup
    > list: 0.142526865005 0.188524961472
    > dict: 0.00711393356323 0.00760197639465
    > tupl: 0.143892049789 0.186873912811
    > >>>

    >
    > The results are conclusive, go for dictionaries. But this
    > surprised me a little, anyone have insight as to why?
    >
    > I was also surprised that tuples and lists scored exactly the
    > same. I was hoping that immutable tuples might earn it some
    > speed over lists.
    >
    > I will eventually need this for testing strings. So the
    > doubling of speed for strings over integers for dictionaries
    > is a little alarming. Lists and tuples only saw a modest increase.
    >
    > Thank you in advance for any clever tricks you suggest.
    >
    > -Brian
    >
    Larry Bates, Jun 23, 2004
    #2
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Ham

    I need speed Mr .Net....speed

    Ham, Oct 28, 2004, in forum: ASP .Net
    Replies:
    6
    Views:
    2,317
    Antony Baula
    Oct 29, 2004
  2. efiedler
    Replies:
    1
    Views:
    2,017
    Tim Ward
    Oct 9, 2003
  3. Replies:
    2
    Views:
    2,269
    Howard
    Apr 28, 2004
  4. Replies:
    2
    Views:
    330
    Christopher Benson-Manica
    Apr 28, 2004
  5. Weng Lei-QCH1840
    Replies:
    1
    Views:
    178
    Thomas
    Aug 15, 2003
Loading...

Share This Page