dicts, instances, containers, slotted instances, et cetera.

Discussion in 'Python' started by ocschwar@gmail.com, Jan 28, 2009.

  1. Guest

    Hi, all.

    I have an application that creates, manipulates, and finally
    archives on disk 10^6 instances of an object that in CS/DB terms is
    best described as a relation.

    It has 8 members, all of them common Python datatypes. 6 of these are
    set once and then not modified. 2 are modified around 4 times before
    the instance's archiving. Large collections (of small lists) of these
    objects are created, iterated through, and sorted using any and all of
    the 8 members as sorting keys.

    It neither has nor needs custom methods.

    I used a simple dictionary to create the application prototype. Now I
    need to speed things up.
    I first tried changing to a new-style class, with __slots__, __init__,
    __getstate__ & __setstate__ (for pickling) and was shocked to see
    things SLOW down over dictionaries.

    So of these options, where should I go first to satisfy my need for
    speed?

    0. Back to dict
    1. old style class
    2. new style class
    3. new style class, with __slots__, with or without some nuance I'm
    missing.
    4. tuple, with constants to mark the indices
    5. namedtuple
    6. other...
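
    For concreteness, here is a minimal sketch of how options 0, 3, 4 and 5 might look, each sorted on one member. The eight field names and the sample rows are made up for illustration (the post does not list the real members), and the code is written for Python 3.

```python
from collections import namedtuple
from operator import attrgetter, itemgetter

# Hypothetical field names; the thread never lists the actual 8 members.
FIELDS = ("id", "name", "kind", "source", "created", "owner", "score", "state")


# Option 0: plain dict keyed by field name
def make_dict(values):
    return dict(zip(FIELDS, values))


# Option 3: new-style class with __slots__ (no per-instance __dict__)
class SlottedRecord(object):
    __slots__ = FIELDS

    def __init__(self, *values):
        for field, value in zip(FIELDS, values):
            setattr(self, field, value)


# Option 4: plain tuple, with constants marking the indices
ID, NAME, KIND, SOURCE, CREATED, OWNER, SCORE, STATE = range(len(FIELDS))

# Option 5: namedtuple (a tuple subclass with named, read-only fields)
TupleRecord = namedtuple("TupleRecord", FIELDS)

# Made-up sample data
rows = [
    (2, "b", "x", "s1", 20090128, "ann", 0.5, "new"),
    (1, "c", "y", "s2", 20090127, "bob", 0.9, "new"),
    (3, "a", "x", "s1", 20090129, "cy", 0.1, "done"),
]

# Any member can serve as the sort key via itemgetter / attrgetter
dicts = sorted((make_dict(r) for r in rows), key=itemgetter("name"))
slotted = sorted((SlottedRecord(*r) for r in rows), key=attrgetter("score"))
tuples = sorted(rows, key=itemgetter(ID))
named = sorted((TupleRecord(*r) for r in rows), key=attrgetter("created"))
```

    One thing to keep in mind: tuples and namedtuples are immutable, so the 2 members that change would need a new tuple each time (for a namedtuple, via `_replace()`).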
    ocschwar, Jan 28, 2009
    #1

  2. Aaron Brady Guest

    On Jan 28, 2:38 pm, ocschwar wrote:
    > [...]
    > So of these options, where should I go first to satisfy my need for
    > speed?


    Hello, quoting myself from another thread today:

    There is the 'shelve' module. You could create a shelf that tells you
    the filename of the 5 other ones. A million keys should be no
    problem, I guess. (It's standard library.) All your keys have to be
    strings, though, and all your values have to be pickleable. If that's
    a problem, yes you will need ZODB or Django (I understand), or another
    relational DB.

    There is currently no way to store live objects.
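
    A minimal sketch of the shelve usage described above, assuming Python 3. The record contents and the temporary file location are placeholders, not anything from the original application.

```python
import os
import shelve
import tempfile

# Hypothetical record; any picklable value can be stored.
record = {"id": 42, "name": "example", "score": 0.5}

# Throwaway location for the shelf file(s)
path = os.path.join(tempfile.mkdtemp(), "records")

db = shelve.open(path)
try:
    db[str(record["id"])] = record  # keys must be strings
finally:
    db.close()

db = shelve.open(path)
try:
    restored = db["42"]  # the value is unpickled on access
finally:
    db.close()
```

    What comes back is an unpickled copy, so changes made to `restored` are not saved unless it is assigned back to the shelf.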
    Aaron Brady, Jan 28, 2009
    #2

  3. ocschwar wrote:
    > [...]
    > So of these options, where should I go first to satisfy my need for
    > speed?

    Use a database? Or *maybe* a C-extension wrapped by ctypes.

    Diez
    Diez B. Roggisch, Jan 28, 2009
    #3
  4. Guest

    On Jan 28, 4:50 pm, Aaron Brady wrote:
    > There is the 'shelve' module. You could create a shelf that tells you
    > the filename of the 5 other ones. [...]
    >
    > There is currently no way to store live objects.



    The problem is NOT archiving these objects. That works fine.

    It's the computations I'm using these things for that are slow, and
    that failed to speed up using __slots__.

    What I need is something that will speed up getattr() or its
    equivalent, and to a lesser degree setattr() or its equivalent.
    ocschwar, Jan 28, 2009
    #4
  5. Guest

    On Jan 28, 5:21 pm, "Diez B. Roggisch" wrote:
    > Use a database? Or *maybe* a C-extension wrapped by ctypes.
    >
    > Diez


    I can't port the entire app to be a stored database procedure.

    ctypes, maybe. I just find it odd that there's no quick answer on the
    fastest way in Python to implement a mapping in this context.
    ocschwar, Jan 28, 2009
    #5
  6. ocschwar wrote:
    > The problem is NOT archiving these objects. That works fine.


    I know. But if they are sorted to various criteria, doing that inside a
    DB might also be faster. That was the point I wanted to make.
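
    As a rough sketch of what sorting inside a DB could look like, here is an in-memory SQLite table using the standard library's sqlite3 module (Python 3). The three-column schema is a made-up stand-in for the 8-member relation, and whether this beats sorting in Python would have to be measured.

```python
import sqlite3

# Hypothetical three-column schema standing in for the 8-member relation.
rows = [(2, "b", 0.5), (1, "c", 0.9), (3, "a", 0.1)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rec (id INTEGER, name TEXT, score REAL)")
conn.executemany("INSERT INTO rec VALUES (?, ?, ?)", rows)

# Optional: an index can help when the same column is sorted on repeatedly.
conn.execute("CREATE INDEX rec_score ON rec (score)")

# Let the database do the ordering on whichever column is needed.
by_score = conn.execute(
    "SELECT id, name, score FROM rec ORDER BY score"
).fetchall()
by_name = conn.execute(
    "SELECT id, name, score FROM rec ORDER BY name"
).fetchall()
conn.close()
```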

    Diez
    Diez B. Roggisch, Jan 28, 2009
    #6
  7. On Wed, 28 Jan 2009 15:20:41 -0800, ocschwar wrote:

    > The problem is NOT archiving these objects. That works fine.
    >
    > It's the computations I'm using these things for that are slow, and that
    > failed to speed up using __slots__.


    You've profiled and discovered that the computations are slow, not the
    archiving?

    What parts of the computations are slow?


    > What I need is something that will speed up getattr() or its equivalent,
    > and to a lesser degree setattr() or its equivalent.


    As you've found, __slots__ is not that thing.

    >>> from timeit import Timer
    >>> class Slotted(object):
    ...     __slots__ = ('a',)
    ...     def __init__(self):
    ...         self.a = 1  # a class-level "a = 1" would conflict with the slot
    ...
    >>> class Unslotted(object):
    ...     def __init__(self):
    ...         self.a = 1
    ...
    >>> t1 = Timer('x.a', 'from __main__ import Slotted; x = Slotted()')
    >>> t2 = Timer('x.a', 'from __main__ import Unslotted; x = Unslotted()')
    >>>
    >>> min(t1.repeat(10))
    0.1138761043548584
    >>> min(t2.repeat(10))
    0.11414718627929688
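
    The same kind of measurement can be put in a script that covers the other candidates from the original question as well. This is only a sketch for Python 3; the class names and the iteration counts are arbitrary choices, and the numbers will differ from machine to machine, so none are quoted here.

```python
from collections import namedtuple
from timeit import Timer


class Slotted(object):
    __slots__ = ("a",)

    def __init__(self):
        self.a = 1


class Unslotted(object):
    def __init__(self):
        self.a = 1


Named = namedtuple("Named", "a")

# label -> (statement to time, namespace holding the object under test)
candidates = {
    "dict": ("x['a']", {"x": {"a": 1}}),
    "slotted": ("x.a", {"x": Slotted()}),
    "unslotted": ("x.a", {"x": Unslotted()}),
    "namedtuple": ("x.a", {"x": Named(1)}),
    "tuple": ("x[0]", {"x": (1,)}),
}

results = {}
for label, (stmt, namespace) in candidates.items():
    timer = Timer(stmt, globals=namespace)
    # Take the best of several runs to reduce noise from other processes.
    results[label] = min(timer.repeat(repeat=5, number=100000))

# Print fastest first
for label, seconds in sorted(results.items(), key=lambda item: item[1]):
    print("%-10s %.4f s per 100000 reads" % (label, seconds))
```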


    One micro-optimization you can do is something like this:

    for i in xrange(1000000):
        obj.y = obj.x + 3*obj.x**2
        obj.x = obj.y - obj.x
        # 12 name lookups per iteration


    Becomes:


    y = None
    x = obj.x
    try:
        for i in xrange(1000000):
            y = x + 3*x**2
            x = y - x
            # 6 name lookups per iteration
    finally:
        obj.y = y
        obj.x = x
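
    A self-contained way to check that the two forms agree, written for Python 3. The `Point` class is a hypothetical stand-in for the record type, and the modulus is added here only so the numbers stay small; neither is part of the snippet above.

```python
MOD = 1000003  # keeps the values bounded; not part of the original snippet


class Point(object):  # hypothetical stand-in for the record type
    def __init__(self, x):
        self.x = x
        self.y = None


def update_via_attributes(obj, n):
    # Every read and write goes through an attribute lookup on obj.
    for _ in range(n):
        obj.y = (obj.x + 3 * obj.x ** 2) % MOD
        obj.x = (obj.y - obj.x) % MOD


def update_via_locals(obj, n):
    # Copy the attributes into locals once, work on the locals in the loop.
    y = obj.y
    x = obj.x
    try:
        for _ in range(n):
            y = (x + 3 * x ** 2) % MOD
            x = (y - x) % MOD
    finally:
        # Write back once, even if the loop raises part-way through.
        obj.y = y
        obj.x = x


a, b = Point(2), Point(2)
update_via_attributes(a, 1000)
update_via_locals(b, 1000)
```

    Both objects should end up in the same state; whether the second form is measurably faster for a given workload is something to time rather than assume.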


    Unless you've profiled and have evidence that the bottleneck is attribute
    access, my bet is that the problem is some other aspect of the
    computation. In general, your intuition about what's fast and what's slow
    in Python will be misleading if you're used to other languages. E.g. in C
    comparisons are fast and moving data is slow, but in Python comparisons
    are slow and moving data is fast.
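
    For reference, a minimal profiling sketch using the standard library's cProfile and pstats (Python 3). The `workload` and `sort_records` functions are invented placeholders for the real computation.

```python
import cProfile
import io
import pstats


def sort_records(records):
    # Sorting dicts with a key function: one key call per element.
    return sorted(records, key=lambda r: (r["b"], r["a"]))


def workload():
    # Placeholder for the real computation being investigated.
    records = [{"a": i % 97, "b": i % 13} for i in range(20000)]
    for _ in range(5):
        sort_records(records)


profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

# Collect the report in a string, showing the top entries by cumulative time.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

    The rows with the largest cumulative time show where the run actually spends its time, which may or may not turn out to be attribute access.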


    --
    Steven
    Steven D'Aprano, Jan 29, 2009
    #7
  8. On Jan 29, 12:23 am, ocschwar wrote:

    > I just find it odd that there's no quick answer on the
    > fastest way in Python to implement a mapping in this context.


    A Python dict is as fast as you can get. If that is not enough, your
    only choice is to try something at the C level, which may give the
    desired speedup or not. Good luck!

    Michele Simionato
    Michele Simionato, Jan 29, 2009
    #8
  9. James Stroud Guest

    ocschwar wrote:
    > I can't port the entire app to be a stored database procedure.


    Perhaps I underestimate what you mean by this, but you may want to look
    at PyTables (http://www.pytables.org/moin/HowToUse).

    > ctypes, maybe. I just find it odd that there's no quick answer on the
    > fastest way in Python to implement a mapping in this context.


    Your explanation of where your prototype is slow is a little unclear. If
    your data is largely numerical, you may want to rethink your
    organization and use a numeric package. I did something similar and saw
    an order of magnitude speed increase by switching from Python data types
    to numpy combined with careful tuning of how I managed the data.

    You may have to spend more time on this than you would like, but if you
    really put some thought into it and grind at your organization, you can
    probably get a significant performance increase.

    James

    --
    James Stroud
    UCLA-DOE Institute for Genomics and Proteomics
    Box 951570
    Los Angeles, CA 90095

    http://www.jamesstroud.com
    James Stroud, Jan 29, 2009
    #9