Are lists at least as efficient as dictionaries?

Narendra C. Tulpule · Aug 29, 2003

Hi,
if you know the Python internals, here is a newbie question for you.
If I have a list with 100 elements, each element being a long string,
is it more efficient to maintain it as a dictionary (with a key = a
string from the list and value = None) for the purpose of insertion
and removal?
Basically, if Python really implements lists as linked lists but
dictionaries as hash tables, it may well be that hashing a key takes
negligible time as compared to comparing it against every list
element.
Oh and here is (as a non-sequiter) something I don't understand
either:

x = ([],)
x[0] += ['something']

Click to expand...

Click to expand...

Traceback (most recent call last):

File said:
x (['something'],) <---- complained but did it anyway??
x[0].append('and another thing') <------- no complaint!
x

Click to expand...

Click to expand...

(['something', 'and another thing'],)

Peter Otten · Aug 29, 2003

Narendra said:
If I have a list with 100 elements, each element being a long string,
is it more efficient to maintain it as a dictionary (with a key = a
string from the list and value = None) for the purpose of insertion
and removal?

Wild guess: I suppose that both implementations will not significantly
affect the overall speed of your application, e.g. if the strings are
*really* large (as opposed to the list of *only* 100 elements), reading
from disk will take much longer than inserting into the list, even at
arbitrary positions.

Also, note that no particular order is preserved for the keys in a
dictionary.

And now for something completely different:

x = ([],)
x[0] += ['something']

Click to expand...

Click to expand...

Traceback (most recent call last):
File "<stdin>", line 1, in ?
TypeError: object doesn't support item assignment

+= calls the list.__iadd__(self, other) method, which seems to be
implemented as

def __iadd__(self, other):
self.append(other)
return self

The call of this method succeds, but the following assignment fails, because
tuples are immutable. This could only be remedied if all assignments

a = a

were silently ignored, or, if the += operator would not perform an
assignment, which has the disadvantage that it would no longer work for
immutables, so that
You could change __iadd__() to

def __iadd__(self, other):
return self + other

but then sane behaviour in one special case comes at the cost of always
creating a copy of a potentially large (say more than 100 items

list.

By the way, one topic per post is always a good idea

Peter

Alex Martelli · Aug 29, 2003

Narendra said:
Hi,
if you know the Python internals, here is a newbie question for you.
If I have a list with 100 elements, each element being a long string,
is it more efficient to maintain it as a dictionary (with a key = a
string from the list and value = None) for the purpose of insertion
and removal?

If you use a dictionary, there will be no "intrinsic" ordering -- you
may as well use sets, then. "Insertion" for a list would thus just
be an .append, very fast. "removal" may be O(N) for a list, if you
need to search through it for occurrences.

Basically, if Python really implements lists as linked lists but
dictionaries as hash tables, it may well be that hashing a key takes
negligible time as compared to comparing it against every list
element.

Python's lists are actually vectors, dicts are indeed hash tables.
Hashing a long string does take some time, but the value may be
cached (depending on the Python implementation) so that said time
need be spent only once for a given string object (but separate
string objects will spend the time multiply, even if it so happens
that their values coincide).

All in all, there's no serious alternative to benchmarking both
possibilities in a realistic scenario. Quite likely you may find
out that -- for sensible frequencies of insertiom / removal --
the time doesn't matter for your application overall (dominated
by I/O or other issues), and then you can use the simplest rather
than the fastest approach. At least 9 times out of 10 you will
be happiest with simplicity. If you don't care about ordering,
Python 2.3's sets are likely the simplest data structure (they
are implemented in terms of dictionaries, thus pretty fast too).

Alex

Jp Calderone · Aug 29, 2003

Narendra C. Tulpule wrote:
[ snip]

And now for something completely different:

x = ([],)
x[0] += ['something']

Click to expand...

Traceback (most recent call last):
File "<stdin>", line 1, in ?
TypeError: object doesn't support item assignment

Click to expand...

+= calls the list.__iadd__(self, other) method, which seems to be
implemented as

def __iadd__(self, other):
self.append(other)
return self

self.extend(other), actually. But that's the basic idea, yea.

The call of this method succeds, but the following assignment fails, because
tuples are immutable. This could only be remedied if all assignments

a = a

were silently ignored, or, if the += operator would not perform an
assignment, which has the disadvantage that it would no longer work for
immutables, so that

+= could simply be syntactic sugar for a call to __add__ and then an
assignment. This would work for mutable and immutable objects.

You could change __iadd__() to

def __iadd__(self, other):
return self + other

but then sane behaviour in one special case comes at the cost of always
creating a copy of a potentially large (say more than 100 items list.

True, except that list.extend() exists.

Jp

John Roth · Aug 29, 2003

Narendra C. Tulpule said:
Hi,
if you know the Python internals, here is a newbie question for you.
If I have a list with 100 elements, each element being a long string,
is it more efficient to maintain it as a dictionary (with a key = a
string from the list and value = None) for the purpose of insertion
and removal?

Lists are more efficient for lookup by index, dictionaries are
more efficient for insertion (except at the end, where Python
maintains extra slots for just this case), removal and lookup
by key. Lists are also much more efficient for sequential
traversal.

John Roth

Jp Calderone · Aug 30, 2003

But it loses the advantage that some objects would otherwise have of
being able to mutate in place, without allocating a new object (ie. very
large matrix additions).

But as you removed from my original post, list.extend() exists. All one
has to do to retain the existing functionality of __iadd__ is name the
method something else, then call it. All the advantages, none of the
confusing or difficult to track semantics.

Jp

Chad Netzer · Aug 30, 2003

But as you removed from my original post, list.extend() exists. All one
has to do to retain the existing functionality of __iadd__ is name the
method something else, then call it. All the advantages, none of the
confusing or difficult to track semantics.

True, however when working with large Matrices, I much prefer A += B, to
A.add(B), and the performance advantages with __iadd__ can be
substantial.

I prefer the original posters suggestion that self assignment be
ignored. But I see your point about the more difficult semantics of
__iadd__.

lists as an efficient implementation of large two-dimensionalarrays(!)	0	Feb 2, 2010
Taking list as an input from Python to C	1	Jul 21, 2022
updating dictionaries from/to dictionaries	12	Aug 11, 2008
nested dictionaries and functions in data structures.	0	Jan 7, 2014
Efficient lookup in list of dictionaries	2	Dec 5, 2005
Changing ' to " in printed representation of dictionaries	0	Dec 2, 2010
What is the most efficient way to compare similar contents in two lists?	12	Jun 13, 2011
Fwd: Re: Comparing Dictionaries	1	Jul 30, 2007

Are lists at least as efficient as dictionaries?

Narendra C. Tulpule

Peter Otten

Alex Martelli

Jp Calderone

John Roth

Jp Calderone

Chad Netzer

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads