Fast Data Comparison (dict v. list. v string)

Discussion in 'Python' started by Scott Brady Drummonds, Apr 14, 2004.

  1. Hi, everyone,

    I'm a relative novice at Python and am working on some optimizations in my
    first Python project. At its core, this tool I'm writing needs to perform
    many comparisons to fixed-size lists of fixed-length strings.

    Currently, my implementation uses dictionaries to store each string. I
    maintain a separate "key-mapping" dictionary that maps keys from one
    dictionary to the other. Then, all I have to do to compare the two
    dictionaries is as follows:
    for metaKey in keyMap.keys():
    if dict1[metaKey] != dict2[keyMap[metaKey]]:
    # Do some processing

    Since the sizes of the dictionaries never change, I tried implementing this
    using lists. The solution looks something like this (and assumes that a
    pre-processing phase has sorted the contents of each list so their indexes
    are the same):
    for i in len(list1):
    if list1 != list2:
    # Do some processing

    As it turns out, this implementation appears to be about 33% faster than the
    dictionary-based one. Now, assuming that the datum being stored at each
    index can fit into one character, I could do a string-based implementation
    like this:
    for i in len(string1):
    if string1 != string:
    # Do some processing

    This last solution actually runs about the same as the dictionary, which
    takes 50% longer than the list implementation.

    Now, my questions are:
    1) Does anyone have another suggestion as to how I can organize these data
    so that I can compare many elements many times?
    2) Is there a string comparison operator that will return which indexes
    have different values? Maybe it would be faster than the iterative
    comparison approach for the third implementation.
    3) Since my data are changing between the successive executions of the code
    snippets above, I need a way of having the other parts of the program update
    it. But, it appears that strings are constant as I can't assign individual
    characters with "string1 = '0'". Is there any way around this?

    Thanks!
    Scott

    --
    Remove ".nospam" from the user ID in my e-mail to reply via e-mail.
     
    Scott Brady Drummonds, Apr 14, 2004
    #1
    1. Advertising

  2. Scott Brady Drummonds

    Larry Bates Guest

    Scott,

    You should take a close look at list comprehensions
    and possibly sets (new in V2.3). I'm not all that
    familiar with sets, but your problem sounds like
    they might come in handy.

    Example list comprehension:

    list1notinlist2=[x for x in list1 if x not in list2]
    for item in list1notinlist:
    # Do some processing

    Note: In Python if you find yourself using
    indexes into lists, there is "almost" always a
    better way (at least that is my experience).

    Larry Bates
    Syscon, Inc.

    "Scott Brady Drummonds" <> wrote in
    message news:c5k8rp$ql6$...
    > Hi, everyone,
    >
    > I'm a relative novice at Python and am working on some optimizations in my
    > first Python project. At its core, this tool I'm writing needs to perform
    > many comparisons to fixed-size lists of fixed-length strings.
    >
    > Currently, my implementation uses dictionaries to store each string. I
    > maintain a separate "key-mapping" dictionary that maps keys from one
    > dictionary to the other. Then, all I have to do to compare the two
    > dictionaries is as follows:
    > for metaKey in keyMap.keys():
    > if dict1[metaKey] != dict2[keyMap[metaKey]]:
    > # Do some processing
    >
    > Since the sizes of the dictionaries never change, I tried implementing

    this
    > using lists. The solution looks something like this (and assumes that a
    > pre-processing phase has sorted the contents of each list so their indexes
    > are the same):
    > for i in len(list1):
    > if list1 != list2:
    > # Do some processing
    >
    > As it turns out, this implementation appears to be about 33% faster than

    the
    > dictionary-based one. Now, assuming that the datum being stored at each
    > index can fit into one character, I could do a string-based implementation
    > like this:
    > for i in len(string1):
    > if string1 != string:
    > # Do some processing
    >
    > This last solution actually runs about the same as the dictionary, which
    > takes 50% longer than the list implementation.
    >
    > Now, my questions are:
    > 1) Does anyone have another suggestion as to how I can organize these

    data
    > so that I can compare many elements many times?
    > 2) Is there a string comparison operator that will return which indexes
    > have different values? Maybe it would be faster than the iterative
    > comparison approach for the third implementation.
    > 3) Since my data are changing between the successive executions of the

    code
    > snippets above, I need a way of having the other parts of the program

    update
    > it. But, it appears that strings are constant as I can't assign

    individual
    > characters with "string1 = '0'". Is there any way around this?
    >
    > Thanks!
    > Scott
    >
    > --
    > Remove ".nospam" from the user ID in my e-mail to reply via e-mail.
    >
    >
     
    Larry Bates, Apr 15, 2004
    #2
    1. Advertising

  3. "Larry Bates" <> wrote in message
    news:...
    > Scott,
    >
    > You should take a close look at list comprehensions
    > and possibly sets (new in V2.3). I'm not all that
    > familiar with sets, but your problem sounds like
    > they might come in handy.


    Excellent suggestion! Thanks so much for the idea.

    Scott
     
    Scott Brady Drummonds, Apr 15, 2004
    #3
  4. Scott Brady Drummonds

    george young Guest

    "Scott Brady Drummonds" <> wrote in message news:<c5k8rp$ql6$>...
    > Hi, everyone,
    >
    > I'm a relative novice at Python and am working on some optimizations in my
    > first Python project. At its core, this tool I'm writing needs to perform
    > many comparisons to fixed-size lists of fixed-length strings.
    >
    > Currently, my implementation uses dictionaries to store each string. I
    > maintain a separate "key-mapping" dictionary that maps keys from one
    > dictionary to the other. Then, all I have to do to compare the two
    > dictionaries is as follows:
    > for metaKey in keyMap.keys():
    > if dict1[metaKey] != dict2[keyMap[metaKey]]:
    > # Do some processing
    >
    > Since the sizes of the dictionaries never change, I tried implementing this
    > using lists. The solution looks something like this (and assumes that a
    > pre-processing phase has sorted the contents of each list so their indexes
    > are the same):
    > for i in len(list1):
    > if list1 != list2:
    > # Do some processing
    >
    > As it turns out, this implementation appears to be about 33% faster than the
    > dictionary-based one. Now, assuming that the datum being stored at each
    > index can fit into one character, I could do a string-based implementation
    > like this:
    > for i in len(string1):
    > if string1 != string:
    > # Do some processing
    >
    > This last solution actually runs about the same as the dictionary, which
    > takes 50% longer than the list implementation.
    >
    > Now, my questions are:
    > 1) Does anyone have another suggestion as to how I can organize these data
    > so that I can compare many elements many times?
    > 2) Is there a string comparison operator that will return which indexes
    > have different values? Maybe it would be faster than the iterative
    > comparison approach for the third implementation.
    > 3) Since my data are changing between the successive executions of the code
    > snippets above, I need a way of having the other parts of the program update
    > it. But, it appears that strings are constant as I can't assign individual
    > characters with "string1 = '0'". Is there any way around this?


    You might also take a look at psyco: http://psyco.sourceforge.net. It is
    very easy to use, non-intrusive, and claims large performance improvements for
    code like this. I have hopes that things like psyco can let us keep code
    in the simplest and *clearest* form, and let external optimization mechanisms
    make it fast enough when needed.

    -- George Young
     
    george young, Apr 16, 2004
    #4
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Skip Montanaro
    Replies:
    0
    Views:
    428
    Skip Montanaro
    Aug 15, 2003
  2. Alexander Kozlovsky

    dict!ident as equivalent of dict["ident"]

    Alexander Kozlovsky, May 21, 2006, in forum: Python
    Replies:
    5
    Views:
    379
    Alexander Kozlovsky
    May 22, 2006
  3. Paul Melis

    dict.has_key(x) versus 'x in dict'

    Paul Melis, Dec 6, 2006, in forum: Python
    Replies:
    48
    Views:
    1,351
    Kent Johnson
    Dec 15, 2006
  4. Juha Nieminen
    Replies:
    22
    Views:
    1,072
    Kai-Uwe Bux
    Oct 12, 2007
  5. macm
    Replies:
    11
    Views:
    627
    Alexander Gattin
    Nov 11, 2010
Loading...

Share This Page