Are lists at least as efficient as dictionaries?

Discussion in 'Python' started by Narendra C. Tulpule, Aug 29, 2003.

  1. Hi,
    if you know the Python internals, here is a newbie question for you.
    If I have a list with 100 elements, each element being a long string,
    is it more efficient to maintain it as a dictionary (with a key = a
    string from the list and value = None) for the purpose of insertion
    and removal?
    Basically, if Python really implements lists as linked lists but
    dictionaries as hash tables, it may well be that hashing a key takes
    negligible time as compared to comparing it against every list
    element.
    Oh and here is (as a non-sequiter) something I don't understand
    either:
    >>> x = ([],)
    >>> x[0] += ['something']

    Traceback (most recent call last):
    File "<stdin>", line 1, in ?
    TypeError: object doesn't support item assignment
    >>> x

    (['something'],) <---- complained but did it anyway??
    >>> x[0].append('and another thing') <------- no complaint!
    >>> x

    (['something', 'and another thing'],)
     
    Narendra C. Tulpule, Aug 29, 2003
    #1
    1. Advertising

  2. Narendra C. Tulpule

    Peter Otten Guest

    Narendra C. Tulpule wrote:

    > If I have a list with 100 elements, each element being a long string,
    > is it more efficient to maintain it as a dictionary (with a key = a
    > string from the list and value = None) for the purpose of insertion
    > and removal?


    Wild guess: I suppose that both implementations will not significantly
    affect the overall speed of your application, e.g. if the strings are
    *really* large (as opposed to the list of *only* 100 elements), reading
    from disk will take much longer than inserting into the list, even at
    arbitrary positions.

    Also, note that no particular order is preserved for the keys in a
    dictionary.

    And now for something completely different:

    >>>> x = ([],)
    >>>> x[0] += ['something']

    > Traceback (most recent call last):
    > File "<stdin>", line 1, in ?
    > TypeError: object doesn't support item assignment


    += calls the list.__iadd__(self, other) method, which seems to be
    implemented as

    def __iadd__(self, other):
    self.append(other)
    return self

    The call of this method succeds, but the following assignment fails, because
    tuples are immutable. This could only be remedied if all assignments

    a = a

    were silently ignored, or, if the += operator would not perform an
    assignment, which has the disadvantage that it would no longer work for
    immutables, so that
    >>> i = 1
    >>> i += 1
    >>> print i

    1 # admittedly faked
    >>>


    You could change __iadd__() to

    def __iadd__(self, other):
    return self + other

    but then sane behaviour in one special case comes at the cost of always
    creating a copy of a potentially large (say more than 100 items :) list.

    By the way, one topic per post is always a good idea :)

    Peter
     
    Peter Otten, Aug 29, 2003
    #2
    1. Advertising

  3. Narendra C. Tulpule wrote:

    > Hi,
    > if you know the Python internals, here is a newbie question for you.
    > If I have a list with 100 elements, each element being a long string,
    > is it more efficient to maintain it as a dictionary (with a key = a
    > string from the list and value = None) for the purpose of insertion
    > and removal?


    If you use a dictionary, there will be no "intrinsic" ordering -- you
    may as well use sets, then. "Insertion" for a list would thus just
    be an .append, very fast. "removal" may be O(N) for a list, if you
    need to search through it for occurrences.

    > Basically, if Python really implements lists as linked lists but
    > dictionaries as hash tables, it may well be that hashing a key takes
    > negligible time as compared to comparing it against every list
    > element.


    Python's lists are actually vectors, dicts are indeed hash tables.
    Hashing a long string does take some time, but the value may be
    cached (depending on the Python implementation) so that said time
    need be spent only once for a given string object (but separate
    string objects will spend the time multiply, even if it so happens
    that their values coincide).

    All in all, there's no serious alternative to benchmarking both
    possibilities in a realistic scenario. Quite likely you may find
    out that -- for sensible frequencies of insertiom / removal --
    the time doesn't matter for your application overall (dominated
    by I/O or other issues), and then you can use the simplest rather
    than the fastest approach. At least 9 times out of 10 you will
    be happiest with simplicity. If you don't care about ordering,
    Python 2.3's sets are likely the simplest data structure (they
    are implemented in terms of dictionaries, thus pretty fast too).


    Alex
     
    Alex Martelli, Aug 29, 2003
    #3
  4. Narendra C. Tulpule

    Jp Calderone Guest

    On Fri, Aug 29, 2003 at 10:07:20AM +0200, Peter Otten wrote:
    > Narendra C. Tulpule wrote:
    >

    [ snip]
    >
    > And now for something completely different:
    >
    > >>>> x = ([],)
    > >>>> x[0] += ['something']

    > > Traceback (most recent call last):
    > > File "<stdin>", line 1, in ?
    > > TypeError: object doesn't support item assignment

    >
    > += calls the list.__iadd__(self, other) method, which seems to be
    > implemented as
    >
    > def __iadd__(self, other):
    > self.append(other)
    > return self


    self.extend(other), actually. But that's the basic idea, yea.

    >
    > The call of this method succeds, but the following assignment fails, because
    > tuples are immutable. This could only be remedied if all assignments
    >
    > a = a
    >
    > were silently ignored, or, if the += operator would not perform an
    > assignment, which has the disadvantage that it would no longer work for
    > immutables, so that
    > >>> i = 1
    > >>> i += 1
    > >>> print i

    > 1 # admittedly faked
    > >>>


    += could simply be syntactic sugar for a call to __add__ and then an
    assignment. This would work for mutable and immutable objects.

    >
    > You could change __iadd__() to
    >
    > def __iadd__(self, other):
    > return self + other
    >
    > but then sane behaviour in one special case comes at the cost of always
    > creating a copy of a potentially large (say more than 100 items :) list.


    True, except that list.extend() exists.

    Jp

    --
    "There is no reason for any individual to have a computer in their
    home."
    -- Ken Olson, President of DEC, World Future Society
    Convention, 1977
     
    Jp Calderone, Aug 29, 2003
    #4
  5. Narendra C. Tulpule

    John Roth Guest

    "Narendra C. Tulpule" <> wrote in message
    news:...
    > Hi,
    > if you know the Python internals, here is a newbie question for you.
    > If I have a list with 100 elements, each element being a long string,
    > is it more efficient to maintain it as a dictionary (with a key = a
    > string from the list and value = None) for the purpose of insertion
    > and removal?


    Lists are more efficient for lookup by index, dictionaries are
    more efficient for insertion (except at the end, where Python
    maintains extra slots for just this case), removal and lookup
    by key. Lists are also much more efficient for sequential
    traversal.

    John Roth
     
    John Roth, Aug 29, 2003
    #5
  6. Narendra C. Tulpule

    Jp Calderone Guest

    On Fri, Aug 29, 2003 at 10:24:37AM -0700, Chad Netzer wrote:
    > On Fri, 2003-08-29 at 07:54, Jp Calderone wrote:
    >
    > > += could simply be syntactic sugar for a call to __add__ and then an
    > > assignment. This would work for mutable and immutable objects.

    >
    > But it loses the advantage that some objects would otherwise have of
    > being able to mutate in place, without allocating a new object (ie. very
    > large matrix additions).
    >


    But as you removed from my original post, list.extend() exists. All one
    has to do to retain the existing functionality of __iadd__ is name the
    method something else, then call it. All the advantages, none of the
    confusing or difficult to track semantics.

    Jp
     
    Jp Calderone, Aug 30, 2003
    #6
  7. Narendra C. Tulpule

    Chad Netzer Guest

    On Sat, 2003-08-30 at 00:03, Jp Calderone wrote:

    > But as you removed from my original post, list.extend() exists. All one
    > has to do to retain the existing functionality of __iadd__ is name the
    > method something else, then call it. All the advantages, none of the
    > confusing or difficult to track semantics.


    True, however when working with large Matrices, I much prefer A += B, to
    A.add(B), and the performance advantages with __iadd__ can be
    substantial.

    I prefer the original posters suggestion that self assignment be
    ignored. But I see your point about the more difficult semantics of
    __iadd__.


    --
    Chad Netzer
     
    Chad Netzer, Aug 30, 2003
    #7
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. omission9
    Replies:
    13
    Views:
    780
    Ben Finney
    Jan 27, 2004
  2. =?UTF-8?B?w4FuZ2VsIEd1dGnDqXJyZXogUm9kcsOtZ3Vleg==

    List of lists of lists of lists...

    =?UTF-8?B?w4FuZ2VsIEd1dGnDqXJyZXogUm9kcsOtZ3Vleg==, May 8, 2006, in forum: Python
    Replies:
    5
    Views:
    410
    =?UTF-8?B?w4FuZ2VsIEd1dGnDqXJyZXogUm9kcsOtZ3Vleg==
    May 15, 2006
  3. lysdexia
    Replies:
    6
    Views:
    508
    John Machin
    Dec 2, 2007
  4. Brandon
    Replies:
    12
    Views:
    492
    Brandon
    Aug 15, 2008
  5. AAaron123
    Replies:
    0
    Views:
    601
    AAaron123
    Oct 3, 2008
Loading...

Share This Page