Structures

Steven D'Aprano · Nov 3, 2008

For all practical purposes, dicts have almost constant access time (at
least with any half-decent __hash__ Â method).

Click to expand...

Hash tables are slick, but their hash function is their weakest link.

[ hash( 2**x ) for x in range( 0, 256, 32 ) ]

Click to expand...

Click to expand...

[1, 1, 1, 1, 1, 1, 1, 1]

That's not the hash function of a dict. Dicts don't have a hash function:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: dict objects are unhashable

What you're showing is the hash function of ints, and you've discovered
that there are some collisions. This is hardly surprising. There are an
infinite number of ints, and only a finite number of values that they can
hash to, so there have to be some collisions. It's worth looking at the
values that collide:

[ 2**x for x in range( 0, 256, 32 ) ]

Click to expand...

Click to expand...

[1, 4294967296L, 18446744073709551616L, 79228162514264337593543950336L,
340282366920938463463374607431768211456L,
1461501637330902918203684832716283019655932542976L,
6277101735386680763835789423207666416102355444464034512896L,
26959946667150639794667015087019630673637144422540572481103610249216L]

So if you have a dict with keys equal to those numbers, and do a lookup
on one of them, you'll get O(N) performance instead of O(1) *for those
keys that collide*. I think I can live with that.

If you think you can come up with a better hash function, feel free to
subclass int.

Such an index gives you linear-time performance in finding elements. Ha!

In any real application, all your keys are highly unlikely to collide in
that fashion.

Plus, your worst-case insertion will cause a copy of the entire table.

Explain please.

There aren't any mappings based on balanced trees, unfortunately.

I'm not sure why you think O(log N) is better than O(1). The major
advantage of tree structures is that, unlike hash tables, they are
ordered, not that they don't have collisions.

Steven D'Aprano · Nov 3, 2008

Names are more descriptive than "magic numbers" as indices. See for
example the "named tuple" returned by `os.stat()`.

I have no objection to named tuples, but I've sometimes missed having an
equivalent to the Pascal record or C struct: essentially a named mutable
tuple. Here's a quick way to get one:

def record(**kwargs):
"""Lightweight named mutable tuple equivalent."""
class Record(object):
__slots__ = kwargs.keys()
x = Record()
for key, value in kwargs.items():
setattr(x, key, value)
return x

It needs more work to be a full-fledged record, e.g. __eq__ and __str__,
but it's a start.

Glenn Linderman · Nov 4, 2008

I don't know even half of those. What about Perl? Does anyone know
that?

structs in Perl are generally implemented as hashes, which is similar to
a Python dict.

Michele Simionato · Nov 4, 2008

Note that classes, by default, are based on a contained dict! There are
games to be played with slots that can apparently improve that. I am
not yet experienced enough with Python to know if a slot is as fast as a
C struct, but perhaps it is.

No, slots have nothing to do with speed, they are a memory
optimization. IMO slots fall in the category of premature optimization
and it was a mistake to include them in the language
(the functionality should have been made available only at the C-
level). As of now, lots of people abuse slots without realizing their
disadvantages (for instance, they break multiple inheritance).

Bruno Desthuilliers · Nov 4, 2008

Joe Strout a écrit :

Classes.

or objects, or dicts, depending on concrete use case, personnal
preferences and current phase of the moon.

Bruno Desthuilliers · Nov 4, 2008

Paulo J. Matos a écrit :
(snip)

However, I wouldn't dare to say Python needs structures to be a good
language, or anything similar. My question was more directed to : if
there aren't structures in Python, what do Pythonists use instead?
(I have seen dicts might be an alternative,

Yes, and the most obvious one - at least when all you need is a kind of
data transfert object. Else, well, C++ structs are just a survival of a
C feature kept for compatibility reasons - technically speaking, you
just don't need structs when you have classes.

but as I said in previous
post, they seem to flexible [making them a canon to shoot a fly,

Err... I'm afraid you'll find Python way to flexible and dangerous,
then. You may not be aware of the fact that you can add / remove /
replace objects attributes (and methods) at runtime ? FWIW, and a couple
corner cases set asides, Python objects are mostly glorified dicts.

and
they probably lack constant-time access, right?]

AFAIK, it's almost constant-time access. But you won't get anything
better using classes, since attributes are stored in a dict anyway.

bearophileHUGS · Nov 4, 2008

Michele Simionato:

No, slots have nothing to do with speed, they are a memory optimization.

In many languages, often in Python too, the less memory you use/
allocate the faster you go.

In fact slots allow a speed increase too (in new style classes):

from timeit import default_timer as clock

class C1(object):
__slots__ = ["a", "b"]
def __init__(self, a, b):
self.a = a
self.a = b

class C2(object):
def __init__(self, a, b):
self.a = a
self.a = b

def main(N, test):
t0 = clock()

if test == 1:
[C1(ab, ab) for ab in xrange(N)]
elif test == 2:
[C2(ab, ab) for ab in xrange(N)]

t1 = clock()
print round(t1 - t0, 2)

main(N=700*1000, test=1)

Core duo 2 GHz:
test=1 ==> 1.06 s
test=2 ==> 3.0 s

(700*1000 is the way I have found to write the 700_000 I was talking
about, until we'll have a syntax for it.)

Bye,
bearophile

Michele Simionato · Nov 4, 2008

Michele Simionato:

No, slots have nothing to do with speed, they are a memory optimization..

Click to expand...

In many languages, often in Python too, the less memory you use/
allocate the faster you go.

In fact slots allow a speed increase too (in new style classes):

from timeit import default_timer as clock

class C1(object):
__slots__ = ["a", "b"]
def __init__(self, a, b):
self.a = a
self.a = b

class C2(object):
def __init__(self, a, b):
self.a = a
self.a = b

def main(N, test):
t0 = clock()

if test == 1:
[C1(ab, ab) for ab in xrange(N)]
elif test == 2:
[C2(ab, ab) for ab in xrange(N)]

t1 = clock()
print round(t1 - t0, 2)

main(N=700*1000, test=1)

Core duo 2 GHz:
test=1 ==> 1.06 s
test=2 ==> 3.0 s

(700*1000 is the way I have found to write the 700_000 I was talking
about, until we'll have a syntax for it.)

Bye,
bearophile

I did not expect such a large difference in instantiation time.
However I was thinking about
access time, and then the difference is not impressive (~20-25%):

from timeit import default_timer as clock

class C1(object):
__slots__ = ["a", "b"]
def __init__(self, a, b):
self.a = a
self.b = b

class C2(object):
def __init__(self, a, b):
self.a = a
self.b = b

def main(N, test):
t0 = clock()
if test == 'with slots':
c = C1(1, 2)
for _ in xrange(N):
(c.a, c.b)
elif test == 'without slots':
c = C2(1, 2)
for _ in xrange(N):
(c.a, c.b)
t1 = clock()
print test, round(t1 - t0, 3)

main(N=700*1000, test='with slots') # 0.152s
main(N=700*1000, test='without slots') # 0.195s

I mantain that one should use slots only as last resort, if the
speedup really justify having nonstandard classes.

Arnaud Delobelle · Nov 4, 2008

Marc 'BlackJack' Rintsch said:
Names are more descriptive than "magic numbers" as indices. See for
example the "named tuple" returned by `os.stat()`.

Ciao,
Marc 'BlackJack' Rintsch

I'm reminded of a 'struct' decorator I made some time ago:

def struct(f):
classname = f.__name__
prop_names = f.func_code.co_varnames[:f.func_code.co_argcount]
def _new(cls, *args, **kwargs):
return tuple.__new__(cls, f(*args, **kwargs))
def _repr(self):
return '%s%s' % (type(self).__name__, tuple(self))
def prop(i):
return property(lambda self: self)
attrs = { '__slots__': (), '__new__': _new, '__repr__': _repr }
attrs.update((name, prop(i)) for i, name in enumerate(prop_names))
return type(classname, (tuple,), attrs)

The advantage over namedtuple is that you don't have to repeat the name
of the structure and you can easily define default values. The drawback
is that it involved some mild hackery (It won't work as is in Python 3
as f.func_code changed to f.__code__). It worked like this:
.... def Point(x=0, y=0): return float(x), float(y)
....Traceback (most recent call last):
True

It's immutable but it would be easy to make it mutable by making it
inherit from list instead of tuple.

You could even have some processing:
.... def Interval(start, end, size=None):
.... size = end - start
.... return start, end, size
.... (2.5, 6.0, 3.5)

George Sakkis · Nov 4, 2008

I have no objection to named tuples, but I've sometimes
missed having an equivalent to the Pascal record or C
struct: essentially a named mutable tuple.

+1. FWIW, I posted a recipe along these lines, following closely the
functionality and implementation of namedtuple: http://code.activestate.com/recipes/576555/

George

Beginner coding	3	Jun 20, 2022
Self-learning (Java) isn't working. Need an in-person boot camp. Recommendations?	0	Feb 10, 2022
Small JS Countdown timer where user has to type string of numbers to stop it and win	8	Jun 16, 2024
Mapping My Path in Java Web Development: Crafting a Detailed Roadmap	0	Mar 9, 2024
How to host data visualization beginner friendly?	1	Aug 10, 2023
Nested structures question	3	Jan 12, 2011
New To Javascript - Accessing Data	3	Nov 26, 2023
nested dictionaries and functions in data structures.	0	Jan 7, 2014

Structures

Steven D'Aprano

Steven D'Aprano

Glenn Linderman

Michele Simionato

Bruno Desthuilliers

Bruno Desthuilliers

bearophileHUGS

Michele Simionato

Arnaud Delobelle

George Sakkis

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads