dicts, instances, containers, slotted instances, et cetera.

ocschwar

Hi, all.

I have an application that creates, manipulates, and finally
archives on disk 10^6 instances of an object that in CS/DB terms is
best described as a relation.

It has 8 members, all of them common Python datatypes. 6 of these are
set once and then not modified. 2 are modified around 4 times before
the instance's archiving. Large collections (of small lists) of these
objects are created, iterated through, and sorted using any and all of
the 8 members as sorting keys.

It neither has nor needs custom methods.

I used a simple dictionary to create the application prototype. Now I
need to speed things up.
I first tried changing to a new style class, with __slots__, __init__,
__getstate__ and __setstate__ (for pickling) and was shocked to see
things SLOW down over dictionaries.

So of these options, where should I go first to satisfy my need for
speed?

0. Back to dict
1. old style class
2. new style class
3. new style class, with __slots__, with or without some nuance I'm
missing.
4. tuple, with constants to mark the indices
5. namedtuple
6. other...
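For concreteness, the in-Python options listed above might look like the following minimal sketch. The field names (`name`, `hits`, `score`) and the class names are hypothetical stand-ins, since the post does not name the 8 members, and only three fields are shown.

```python
from collections import namedtuple

# Option 0: plain dict
rec_dict = {"name": "a17", "hits": 4, "score": 0.5}


# Option 2: new-style class (on Python 3 every class is new-style)
class Record(object):
    def __init__(self, name, hits, score):
        self.name = name
        self.hits = hits
        self.score = score


# Option 3: the same class with __slots__, which drops the per-instance __dict__
class SlottedRecord(object):
    __slots__ = ("name", "hits", "score")

    def __init__(self, name, hits, score):
        self.name = name
        self.hits = hits
        self.score = score


# Option 4: tuple, with constants to mark the indices
NAME, HITS, SCORE = range(3)
rec_tuple = ("a17", 4, 0.5)

# Option 5: named tuple (immutable; _replace returns a modified copy)
RecordNT = namedtuple("RecordNT", ["name", "hits", "score"])
rec_nt = RecordNT("a17", 4, 0.5)
rec_nt_updated = rec_nt._replace(hits=5)

rec_obj = Record("a17", 4, 0.5)
rec_slotted = SlottedRecord("a17", 4, 0.5)
```

Note that options 4 and 5 are immutable, so the two members that get modified would require building a new tuple on each change.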
 
Aaron Brady

Hello, quoting myself from another thread today:

There is the 'shelve' module. You could create a shelf that tells you
the filename of the 5 other ones. A million keys should be no
problem, I guess. (It's standard library.) All your keys have to be
strings, though, and all your values have to be pickleable. If that's
a problem, yes you will need ZODB or Django (I understand), or another
relational DB.

There is currently no way to store live objects.
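A minimal sketch of the shelve usage described above, assuming a hypothetical record and a throwaway temporary directory. It only illustrates the two constraints mentioned: string keys and picklable values.

```python
import os
import shelve
import tempfile

# Hypothetical record; any picklable value works.
record = {"name": "a17", "hits": 4, "score": 0.5}

# Throwaway location for the shelf file(s).
path = os.path.join(tempfile.mkdtemp(), "records")

# Keys must be strings; values are pickled when assigned.
shelf = shelve.open(path)
try:
    shelf["a17"] = record
finally:
    shelf.close()

# Reopening the shelf reads the record back from disk.
shelf = shelve.open(path)
try:
    restored = shelf["a17"]
finally:
    shelf.close()
```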
 
Diez B. Roggisch

Use a database? Or *maybe* a C-extension wrapped by ctypes.

Diez
 
ocschwar

The problem is NOT archiving these objects. That works fine.

It's the computations I'm using these things for that are slow, and
that failed to speed up using __slots__.

What I need is something that will speed up getattr() or its
equivalent, and to a lesser degree setattr() or its equivalent.
 
ocschwar

(e-mail address removed) wrote:

Use a database? Or *maybe* a C-extension wrapped by ctypes.

Diez

I can't port the entire app to be a stored database procedure.

ctypes, maybe. I just find it odd that there's no quick answer on the
fastest way in Python to implement a mapping in this context.
 
Diez B. Roggisch

The problem is NOT archiving these objects. That works fine.

I know. But if they are sorted to various criteria, doing that inside a
DB might also be faster. That was the point I wanted to make.

Diez
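As a rough illustration of sorting inside a database, here is a sketch using the standard library's sqlite3 with an in-memory database. The table name, column names, and rows are hypothetical, and whether this beats in-process sorting for the real workload would have to be measured.

```python
import sqlite3

# Hypothetical three-column stand-in for the 8-member relation.
rows = [("a17", 4, 0.5), ("b02", 1, 0.9), ("c33", 7, 0.1)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (name TEXT, hits INTEGER, score REAL)")
conn.executemany("INSERT INTO records VALUES (?, ?, ?)", rows)

# An index on a column can let the database return rows in that order
# without sorting them again on every query.
conn.execute("CREATE INDEX idx_score ON records (score)")

# Sort by any column by changing the ORDER BY clause.
by_score = conn.execute(
    "SELECT name, hits, score FROM records ORDER BY score"
).fetchall()
by_hits_desc = conn.execute(
    "SELECT name FROM records ORDER BY hits DESC"
).fetchall()
conn.close()
```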
 
Steven D'Aprano

The problem is NOT archiving these objects. That works fine.

It's the computations I'm using these things for that are slow, and that
failed to speed up using __slots__.

You've profiled and discovered that the computations are slow, not the
archiving?

What parts of the computations are slow?

What I need is something that will speed up getattr() or its equivalent,
and to a lesser degree setattr() or its equivalent.

As you've found, __slots__ is not that thing.
[Interactive timing session truncated in the archive. Surviving fragments: __slots__ = 'a', a = 1, a = 1, and a timing result of 0.11414718627929688.]
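A comparison of this kind can be sketched with the timeit module. This is not the session from the post; the names `Plain`, `Slotted`, and `time_expr` are hypothetical, and the `globals` argument to `timeit.Timer` assumes Python 3.5 or later. The absolute numbers and their ordering depend on the interpreter version and machine, so none are claimed here.

```python
import timeit


class Plain(object):
    def __init__(self):
        self.a = 1


class Slotted(object):
    __slots__ = ("a",)

    def __init__(self):
        self.a = 1


plain = Plain()
slotted = Slotted()
mapping = {"a": 1}


def time_expr(expr, namespace, number=100000):
    """Return the best of three timings, in seconds, for evaluating expr."""
    timer = timeit.Timer(expr, globals=namespace)
    return min(timer.repeat(repeat=3, number=number))


# Time a single read of "a" through each representation.
results = {
    "instance attribute": time_expr("obj.a", {"obj": plain}),
    "slotted attribute": time_expr("obj.a", {"obj": slotted}),
    "dict lookup": time_expr("obj['a']", {"obj": mapping}),
}

for label, seconds in sorted(results.items(), key=lambda item: item[1]):
    print("%-20s %.4f s" % (label, seconds))
```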


One micro-optimization you can do is something like this:

for i in xrange(1000000):
    obj.y = obj.x + 3*obj.x**2
    obj.x = obj.y - obj.x
    # 12 name lookups per iteration


Becomes:


y = None
x = obj.x
try:
    for i in xrange(1000000):
        y = x + 3*x**2
        x = y - x
        # 6 name lookups per iteration
finally:
    obj.y = y
    obj.x = x
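The same transformation as a self-contained sketch that can be run and checked. The `Record` class, the function names, and the modulus are assumptions added for illustration: the modulus only keeps the integers small so the loop stays cheap, and `range` is used so the code also runs on Python 3.

```python
MODULUS = 1000003  # arbitrary; keeps the values from growing without bound


class Record(object):
    def __init__(self, x):
        self.x = x
        self.y = None


def update_via_attributes(obj, n):
    """Read and write obj.x and obj.y on every iteration."""
    for _ in range(n):
        obj.y = (obj.x + 3 * obj.x ** 2) % MODULUS
        obj.x = (obj.y - obj.x) % MODULUS


def update_via_locals(obj, n):
    """Copy the attributes into locals, loop, then write them back once."""
    y = obj.y
    x = obj.x
    try:
        for _ in range(n):
            y = (x + 3 * x ** 2) % MODULUS
            x = (y - x) % MODULUS
    finally:
        # Written back even if the loop raises, as in the original snippet.
        obj.y = y
        obj.x = x


a = Record(2)
b = Record(2)
update_via_attributes(a, 1000)
update_via_locals(b, 1000)
print((a.x, a.y) == (b.x, b.y))
```

Both functions leave the object in the same state; only the number of attribute lookups inside the loop differs.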


Unless you've profiled and have evidence that the bottleneck is attribute
access, my bet is that the problem is some other aspect of the
computation. In general, your intuition about what's fast and what's slow
in Python will be misleading if you're used to other languages. E.g. in C
comparisons are fast and moving data is slow, but in Python comparisons
are slow and moving data is fast.
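One place where this shows up is sorting. A minimal sketch, assuming hypothetical dict records: passing `key=` extracts one sort key per record up front, so the sort compares the extracted values rather than running Python-level comparison code for every pair.

```python
from operator import itemgetter

# Hypothetical records; the thread's real objects have 8 members.
records = [
    {"name": "a17", "hits": 4, "score": 0.5},
    {"name": "b02", "hits": 1, "score": 0.9},
    {"name": "c33", "hits": 4, "score": 0.1},
]

# The key function is called once per record, not once per comparison.
by_score = sorted(records, key=itemgetter("score"))

# Several fields at once: itemgetter returns a tuple, compared left to right.
by_hits_then_score = sorted(records, key=itemgetter("hits", "score"))
```

For attribute-based records, `operator.attrgetter` plays the same role as `itemgetter` does here.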
 
Michele Simionato

I just find it odd that there's no quick answer on the
fastest way in Python to implement a mapping in this context.

A Python dict is as fast as you can get. If that is not enough, your
only choice is to try something at the C level, which may give the
desired speedup or not. Good luck!

Michele Simionato
 
James Stroud

I can't port the entire app to be a stored database procedure.

Perhaps I underestimate what you mean by this, but you may want to look
at PyTables (http://www.pytables.org/moin/HowToUse).

ctypes, maybe. I just find it odd that there's no quick answer on the
fastest way in Python to implement a mapping in this context.

Your explanation of where your prototype is slow is a little unclear. If
your data is largely numerical, you may want to rethink your
organization and use a numeric package. I did something similar and saw
an order of magnitude speed increase by switching from Python data types
to numpy combined with careful tuning of how I managed the data.

You may have to spend more time on this than you would like, but if you
really put some thought into it and grind at your organization, you can
probably get a significant performance increase.

James

--
James Stroud
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095

http://www.jamesstroud.com
 
