Structures

S

Steven D'Aprano

For all practical purposes, dicts have almost constant access time (at
least with any half-decent __hash__  method).

Hash tables are slick, but their hash function is their weakest link.
[ hash( 2**x ) for x in range( 0, 256, 32 ) ]
[1, 1, 1, 1, 1, 1, 1, 1]

That's not the hash function of a dict. Dicts don't have a hash function:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: dict objects are unhashable


What you're showing is the hash function of ints, and you've discovered
that there are some collisions. This is hardly surprising. There are an
infinite number of ints, and only a finite number of values that they can
hash to, so there have to be some collisions. It's worth looking at the
values that collide:
[ 2**x for x in range( 0, 256, 32 ) ]
[1, 4294967296L, 18446744073709551616L, 79228162514264337593543950336L,
340282366920938463463374607431768211456L,
1461501637330902918203684832716283019655932542976L,
6277101735386680763835789423207666416102355444464034512896L,
26959946667150639794667015087019630673637144422540572481103610249216L]

So if you have a dict with keys equal to those numbers, and do a lookup
on one of them, you'll get O(N) performance instead of O(1) *for those
keys that collide*. I think I can live with that.

If you think you can come up with a better hash function, feel free to
subclass int.

Such an index gives you linear-time performance in finding elements. Ha!

In any real application, all your keys are highly unlikely to collide in
that fashion.


Plus, your worst-case insertion will cause a copy of the entire table.

Explain please.

There aren't any mappings based on balanced trees, unfortunately.

I'm not sure why you think O(log N) is better than O(1). The major
advantage of tree structures is that, unlike hash tables, they are
ordered, not that they don't have collisions.
 
S

Steven D'Aprano

Names are more descriptive than "magic numbers" as indices. See for
example the "named tuple" returned by `os.stat()`.


I have no objection to named tuples, but I've sometimes missed having an
equivalent to the Pascal record or C struct: essentially a named mutable
tuple. Here's a quick way to get one:

def record(**kwargs):
"""Lightweight named mutable tuple equivalent."""
class Record(object):
__slots__ = kwargs.keys()
x = Record()
for key, value in kwargs.items():
setattr(x, key, value)
return x


It needs more work to be a full-fledged record, e.g. __eq__ and __str__,
but it's a start.
 
G

Glenn Linderman

I don't know even half of those. What about Perl? Does anyone know
that?

structs in Perl are generally implemented as hashes, which is similar to
a Python dict.
 
M

Michele Simionato

Note that classes, by default, are based on a contained dict!  There are
games to be played with slots that can apparently improve that.  I am
not yet experienced enough with Python to know if a slot is as fast as a
C struct, but perhaps it is.  

No, slots have nothing to do with speed, they are a memory
optimization. IMO slots fall in the category of premature optimization
and it was a mistake to include them in the language
(the functionality should have been made available only at the C-
level). As of now, lots of people abuse slots without realizing their
disadvantages (for instance, they break multiple inheritance).
 
B

Bruno Desthuilliers

Paulo J. Matos a écrit :
(snip)
However, I wouldn't dare to say Python needs structures to be a good
language, or anything similar. My question was more directed to : if
there aren't structures in Python, what do Pythonists use instead?
(I have seen dicts might be an alternative,

Yes, and the most obvious one - at least when all you need is a kind of
data transfert object. Else, well, C++ structs are just a survival of a
C feature kept for compatibility reasons - technically speaking, you
just don't need structs when you have classes.
but as I said in previous
post, they seem to flexible [making them a canon to shoot a fly,

Err... I'm afraid you'll find Python way to flexible and dangerous,
then. You may not be aware of the fact that you can add / remove /
replace objects attributes (and methods) at runtime ? FWIW, and a couple
corner cases set asides, Python objects are mostly glorified dicts.
and
they probably lack constant-time access, right?]

AFAIK, it's almost constant-time access. But you won't get anything
better using classes, since attributes are stored in a dict anyway.
 
B

bearophileHUGS

Michele Simionato:
No, slots have nothing to do with speed, they are a memory optimization.

In many languages, often in Python too, the less memory you use/
allocate the faster you go.

In fact slots allow a speed increase too (in new style classes):

from timeit import default_timer as clock

class C1(object):
__slots__ = ["a", "b"]
def __init__(self, a, b):
self.a = a
self.a = b

class C2(object):
def __init__(self, a, b):
self.a = a
self.a = b

def main(N, test):
t0 = clock()

if test == 1:
[C1(ab, ab) for ab in xrange(N)]
elif test == 2:
[C2(ab, ab) for ab in xrange(N)]

t1 = clock()
print round(t1 - t0, 2)

main(N=700*1000, test=1)

Core duo 2 GHz:
test=1 ==> 1.06 s
test=2 ==> 3.0 s

(700*1000 is the way I have found to write the 700_000 I was talking
about, until we'll have a syntax for it.)

Bye,
bearophile
 
M

Michele Simionato

Michele Simionato:
No, slots have nothing to do with speed, they are a memory optimization..

In many languages, often in Python too, the less memory you use/
allocate the faster you go.

In fact slots allow a speed increase too (in new style classes):

from timeit import default_timer as clock

class C1(object):
    __slots__ = ["a", "b"]
    def __init__(self, a, b):
        self.a = a
        self.a = b

class C2(object):
    def __init__(self, a, b):
        self.a = a
        self.a = b

def main(N, test):
    t0 = clock()

    if test == 1:
        [C1(ab, ab) for ab in xrange(N)]
    elif test == 2:
        [C2(ab, ab) for ab in xrange(N)]

    t1 = clock()
    print round(t1 - t0, 2)

main(N=700*1000, test=1)

Core duo 2 GHz:
test=1 ==> 1.06 s
test=2 ==> 3.0 s

(700*1000 is the way I have found to write the 700_000 I was talking
about, until we'll have a syntax for it.)

Bye,
bearophile

I did not expect such a large difference in instantiation time.
However I was thinking about
access time, and then the difference is not impressive (~20-25%):

from timeit import default_timer as clock

class C1(object):
__slots__ = ["a", "b"]
def __init__(self, a, b):
self.a = a
self.b = b

class C2(object):
def __init__(self, a, b):
self.a = a
self.b = b

def main(N, test):
t0 = clock()
if test == 'with slots':
c = C1(1, 2)
for _ in xrange(N):
(c.a, c.b)
elif test == 'without slots':
c = C2(1, 2)
for _ in xrange(N):
(c.a, c.b)
t1 = clock()
print test, round(t1 - t0, 3)

main(N=700*1000, test='with slots') # 0.152s
main(N=700*1000, test='without slots') # 0.195s

I mantain that one should use slots only as last resort, if the
speedup really justify having nonstandard classes.
 
A

Arnaud Delobelle

Marc 'BlackJack' Rintsch said:
Names are more descriptive than "magic numbers" as indices. See for
example the "named tuple" returned by `os.stat()`.

Ciao,
Marc 'BlackJack' Rintsch

I'm reminded of a 'struct' decorator I made some time ago:

def struct(f):
classname = f.__name__
prop_names = f.func_code.co_varnames[:f.func_code.co_argcount]
def _new(cls, *args, **kwargs):
return tuple.__new__(cls, f(*args, **kwargs))
def _repr(self):
return '%s%s' % (type(self).__name__, tuple(self))
def prop(i):
return property(lambda self: self)
attrs = { '__slots__': (), '__new__': _new, '__repr__': _repr }
attrs.update((name, prop(i)) for i, name in enumerate(prop_names))
return type(classname, (tuple,), attrs)


The advantage over namedtuple is that you don't have to repeat the name
of the structure and you can easily define default values. The drawback
is that it involved some mild hackery (It won't work as is in Python 3
as f.func_code changed to f.__code__). It worked like this:
.... def Point(x=0, y=0): return float(x), float(y)
....Traceback (most recent call last):
True

It's immutable but it would be easy to make it mutable by making it
inherit from list instead of tuple.

You could even have some processing:
.... def Interval(start, end, size=None):
.... size = end - start
.... return start, end, size
.... (2.5, 6.0, 3.5)
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
474,265
Messages
2,571,071
Members
48,771
Latest member
ElysaD

Latest Threads

Top