Why do custom objects take so much memory?


jsanshef

Hi,

after a couple of days of script debugging, I found that some
assumptions I was making about the memory footprint of my classes are
not true. I wrote a simple script to isolate the problem:

import gc

class MyClass:
    def __init__(self, s):
        self.mystring = s

mylist = []
for i in range(1024*1024):
    mylist.append(MyClass(str(i)))  # allocation
# stage 1
mylist = None
gc.collect()
# stage 2

I measured the memory consumption of the script at #stage 1 and
#stage 2 and obtained:
#stage1 -> 238MB
#stage2 -> 15MB

That means every object is around 223 bytes in size!!!! That's too
much considering it only contains a string with a maximum size of 7
chars.

If you change the allocation line to store just the string instead:

mylist.append(str(i))  # allocation

the numbers decrease substantially to:
#stage1 -> 47.6MB
#stage2 -> 15MB
(so this time we can say string vars occupy around 32 bytes....still a
lot, isn't it?)

So, what's exactly going on behind the scenes? Why is using custom
objects SO expensive? What other ways of creating structures can be
used (cheaper in memory usage)?

Thanks a lot in advance!
 

Chris Mellon

jsanshef said:
Hi,

after a couple of days of script debugging, I found that some
assumptions I was making about the memory footprint of my classes are
not true. I wrote a simple script to isolate the problem:

import gc

class MyClass:
    def __init__(self, s):
        self.mystring = s

mylist = []
for i in range(1024*1024):
    mylist.append(MyClass(str(i)))  # allocation
# stage 1
mylist = None
gc.collect()
# stage 2

I measured the memory consumption of the script at #stage 1 and
#stage 2 and obtained:
#stage1 -> 238MB
#stage2 -> 15MB

That means every object is around 223 bytes in size!!!! That's too
much considering it only contains a string with a maximum size of 7
chars.

Classes are fairly heavyweight - in your case you've got the size of
the PyObject struct, a dictionary for instance attributes (which is
itself another PyObject), the string object (yet another PyObject),
and the actual string data.
If you change the allocation line to store just the string instead:

mylist.append(str(i))  # allocation

the numbers decrease substantially to:
#stage1 -> 47.6MB
#stage2 -> 15MB
(so this time we can say string vars occupy around 32 bytes....still a
lot, isn't it?)

String objects don't have dictionaries and are smaller than "regular"
Python objects.
So, what's exactly going on behind the scenes? Why is using custom
objects SO expensive? What other ways of creating structures can be
used (cheaper in memory usage)?

If you're worried about per-instance memory costs Python is probably
not the language for your purposes. On the other hand, odds are that
you actually don't need to worry so much.

You can reduce the size of new-style classes (inherit from object) by
quite a bit if you use __slots__ to eliminate the per-instance dictionary.
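
To make that per-instance dict concrete, here is a minimal sketch (the
instance value is made up for illustration):

class MyClass:
    def __init__(self, s):
        self.mystring = s

o = MyClass('123')
print o.__dict__  # {'mystring': '123'} -- every instance carries one of these dicts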
 

Hrvoje Niksic

jsanshef said:
That means every object is around 223 bytes in size!!!! That's too
much considering it only contains a string with a maximum size of 7
chars.

The list itself consumes 4 MB because it stores 1 million PyObject
pointers. It possibly consumes more due to overallocation, but let's
ignore that.

Each object takes 36 bytes itself: 4 bytes refcount + 4 bytes type ptr
+ 4 bytes dict ptr + 4 bytes weakptr + 12 bytes gc overhead. That's
not counting malloc overhead, which should be low since objects aren't
malloced individually. Each object requires a dict, which consumes
additional 52 bytes of memory (40 bytes for the dict struct plus 12
for gc). That's 88 bytes per object, not counting malloc overhead.

Then there's string allocation: your average string is 6 chars long;
add to that one additional char for the terminating zero. The string
struct takes up 20 bytes + string length, rounded to nearest
alignment. For your average case, that's 27 bytes, rounded (I assume) to 28.
You also allocate 1024*1024 integers which are never freed (they're
kept on a free list), and each of which takes up at least 12 bytes.

All that adds up to 128 bytes per object, dispersed over several
different object types. It doesn't surprise me that Python is eating
200+ MB of memory.
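
A quick way to sanity-check figures like these (a sketch; note that
sys.getsizeof() only appeared in Python 2.6, so this won't run on 2.5,
and the exact numbers vary by build and platform):

import sys

class MyClass:
    def __init__(self, s):
        self.mystring = s

o = MyClass(str(123456))
print sys.getsizeof(o)           # the bare instance struct
print sys.getsizeof(o.__dict__)  # the per-instance attribute dict
print sys.getsizeof(o.mystring)  # the string object itself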
So, what's exactly going on behind the scenes? Why is using custom
objects SO expensive? What other ways of creating structures can be
used (cheaper in memory usage)?

Thanks a lot in advance!

Use a new-style class and set __slots__:

class MyClass(object):
    __slots__ = ('mystring',)
    def __init__(self, s):
        self.mystring = s

That brings down memory consumption to ~80MB, by cutting down the size
of the instance and removing the per-instance dict.
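
The flip side, as a short sketch: a slotted instance has no __dict__ at
all, so arbitrary attributes can no longer be attached to it:

o = MyClass('123')
print o.mystring              # '123'
print hasattr(o, '__dict__')  # False -- no per-instance dict
o.other = 1                   # raises AttributeError: no slot for 'other'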
 

Aahz

Chris Mellon said:
You can reduce the size of new-style classes (inherit from object) by
quite a bit if you use __slots__ to eliminate the per-instance dictionary.

You can also reduce your functionality quite a bit by using __slots__.
Someday I'll have time to write up a proper page about why you shouldn't
use __slots__....
 

Carl Banks

Aahz said:
You can also reduce your functionality quite a bit by using __slots__.
Someday I'll have time to write up a proper page about why you shouldn't
use __slots__....

Shouting absolute commands without full understanding of the situation
is not going to help anyone.

The OP wanted to minimize memory usage, exactly the intended use of
slots. Without knowing more about the OP's situation, I don't think
you or I or Chris Mellon can be sure it's not right for the OP's
situation.

You're obviously smart and highly expert in Python--I just wish you
would be more constructive with your advice.


Carl Banks
 

Steven D'Aprano

Hrvoje Niksic said:
Each object takes 36 bytes itself: 4 bytes refcount + 4 bytes type ptr +
4 bytes dict ptr + 4 bytes weakptr + 12 bytes gc overhead. That's not
counting malloc overhead, which should be low since objects aren't
malloced individually. Each object requires a dict, which consumes
additional 52 bytes of memory (40 bytes for the dict struct plus 12 for
gc). That's 88 bytes per object, not counting malloc overhead.

And let's not forget that if you're running on a 64-bit system, you can
double the size of every pointer.
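
(A one-liner sketch to check which kind of build you are running:)

import struct
print struct.calcsize('P')  # size of a C pointer: 4 on 32-bit builds, 8 on 64-bit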

Is there a canonical list of how much memory Python objects take up? Or a
canonical algorithm?

Or failing either of those, a good heuristic?

Then there's string allocation: your average string is 6 chars long; add
to that one additional char for the terminating zero.

Are you sure about that? If Python strings are zero terminated, how does
Python deal with this?
'a\0string'[1]
'\x00'
 

Robert Kern

Steven said:
And let's not forget that if you're running on a 64-bit system, you can
double the size of every pointer.

Is there a canonical list of how much memory Python objects take up? Or a
canonical algorithm?

Here is Martin v. Löwis giving a few pointers (pardon the pun):

http://mail.python.org/pipermail/python-list/2002-March/135223.html
http://groups.google.de/group/comp.lang.python/msg/b9afcfc2e1de5b05?hl=de
Or failing either of those, a good heuristic?

I thought there was a tool that tried to estimate this using hints from the
type, but Googling has availed me not.
Are you sure about that?

Yes. Look at Include/stringobject.h:

"""
Type PyStringObject represents a character string. An extra zero byte is
reserved at the end to ensure it is zero-terminated, but a size is
present so strings with null bytes in them can be represented. This
is an immutable object type.
"""
If Python strings are zero terminated, how does
Python deal with this?
'a\0string'[1]
'\x00'

It stores a length separate from the value. The 0-termination is a courtesy to C
APIs that expect 0-terminated strings. It does not define the end of the Python
string, though.
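
A quick illustration of that, as a sketch:

s = 'a\0string'
print len(s)      # 8 -- the length comes from the stored size, not strlen()
print repr(s[1])  # '\x00' -- embedded NUL bytes are perfectly legal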

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
 

Hrvoje Niksic

Steven D'Aprano said:
And let's not forget that if you're running on a 64-bit system, you
can double the size of every pointer.

And of Py_ssize_t's, longs, ints with padding (placed between two
pointers). Also note the price of 8-byte struct alignment.
Is there a canonical list of how much memory Python objects take up?
Or a canonical algorithm?

Or failing either of those, a good heuristic?

For built-in types, you need to look at the code of each individual
object. For user types, you can approximate by calculations such as
the above.
Then there's string allocation: your average string is 6 chars
long; add to that one additional char for the terminating zero.

Are you sure about that? If Python strings are zero terminated, how
does Python deal with this?
'a\0string'[1]
'\x00'

Python strings are zero-terminated so that the pointer to the string's
data can be passed to various C APIs (this is standard practice; C++
strings do it too). Python doesn't rely on zero termination to
calculate string length, so len('a\0string') will do the right thing,
but the string will internally store 'a\0string\0'.
 

jsanshef

Thank you all for your useful comments and suggestions!! They're a
great starting point to redesign my script completely ;)


Cheers!
 

Colin J. Williams

Hrvoje said:
And of Py_ssize_t's, longs, ints with padding (placed between two
pointers). Also note the price of 8-byte struct alignment.


For built-in types, you need to look at the code of each individual
object. For user types, you can approximate by calculations such as
the above.

It would be helpful if there were a tabulation of the memory cost for
each built-in type.

Colin W.
Then there's string allocation: your average string is 6 chars
long; add to that one additional char for the terminating zero.
Are you sure about that? If Python strings are zero terminated, how
does Python deal with this?
'a\0string'[1]
'\x00'

Python strings are zero-terminated so the pointer to string's data can
be passed to the various C APIs (this is standard practice, C++
strings do it too.) Python doesn't rely on zero termination to
calculate string length. So len('a\0string') will do the right thing,
but the string will internally store 'a\0string\0'.
 

Aahz

Carl Banks said:
Shouting absolute commands without full understanding of the situation
is not going to help anyone.

Maybe not, but at least it will get people to stop for a bit.
The OP wanted to minimize memory usage, exactly the intended usage of
slots. Without knowing more about the OP's situation, I don't think
you or I or Chris Mellon can be sure it's not right for the OP's
situation.

The whole point about warning against __slots__ is that you should never
use them unless you are certain they're the best solution. Consider what
happens when the OP wants to subclass this __slots__-using class.
Avoiding __slots__ will almost never harm anyone, so I feel completely
comfortable sticking with a blanket warning.
 

Chris Mellon

Aahz said:
Maybe not, but at least it will get people to stop for a bit.

No, it will just make people ignore you, because you give
inappropriate advice forbidding reasonable solutions without a
rational justification.
The whole point about warning against __slots__ is that you should never
use them unless you are certain they're the best solution. Consider what
happens when the OP wants to subclass this __slots__-using class.

Nothing. Subclasses of a class with __slots__ get a dict just like
anything else, unless they also define slots. Why do you think you can
subclass object to get something you can stick arbitrary attributes
on?
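
A minimal sketch of that point (class names are illustrative):

class Slotted(object):
    __slots__ = ('a',)

class Sub(Slotted):  # defines no __slots__ of its own
    pass

s = Sub()
s.anything = 1       # fine: Sub instances get a __dict__ again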
Avoiding __slots__ will almost never harm anyone, so I feel completely
comfortable sticking with a blanket warning.

Barking out your blanket warning in a thread on *the exact use case
slots were implemented to address* just makes you look like a mindless
reactionary. Stick to posting your warning in threads where people ask
how to stop "other people" from setting attributes on "their"
instances.
 

John Nagle

Chris Mellon said:
Barking out your blanket warning in a thread on *the exact use case
slots were implemented to address* just makes you look like a mindless
reactionary. Stick to posting your warning in threads where people ask
how to stop "other people" from setting attributes on "their"
instances.

Agreed.

I'd like to hear more about what kind of performance gain can be
obtained from "__slots__". I'm looking into ways of speeding up
HTML parsing via BeautifulSoup. If a significant speedup can be
obtained when navigating large trees of small objects, that's worth
quite a bit to me.

John Nagle
SiteTruth
 

Fredrik Lundh

John said:
I'd like to hear more about what kind of performance gain can be
obtained from "__slots__". I'm looking into ways of speeding up
HTML parsing via BeautifulSoup. If a significant speedup can be
obtained when navigating large trees of small objects, that's worth
quite a bit to me.

The following micro-benchmarks are from Python 2.5 on a Core Duo
machine. C0 is an old-style class, C1 is a new-style class, C2 is a
new-style class using __slots__:

# read access
$ timeit -s "import q; o = q.C0(); o.attrib = 1" "o.attrib"
10000000 loops, best of 3: 0.133 usec per loop
$ timeit -s "import q; o = q.C1(); o.attrib = 1" "o.attrib"
10000000 loops, best of 3: 0.184 usec per loop
$ timeit -s "import q; o = q.C2(); o.attrib = 1" "o.attrib"
10000000 loops, best of 3: 0.161 usec per loop

# write access
$ timeit -s "import q; o = q.C0(); o.attrib = 1" "o.attrib = 1"
10000000 loops, best of 3: 0.15 usec per loop
$ timeit -s "import q; o = q.C1(); o.attrib = 1" "o.attrib = 1"
1000000 loops, best of 3: 0.217 usec per loop
$ timeit -s "import q; o = q.C2(); o.attrib = 1" "o.attrib = 1"
1000000 loops, best of 3: 0.209 usec per loop

$ more q.py
class C0:
    pass

class C1(object):
    pass

class C2(object):
    __slots__ = ["attrib"]

Your mileage may vary.
> I'm looking into ways of speeding up HTML parsing via BeautifulSoup.

The solution to that is spelled "lxml".

</F>
 

MrJean1

My mileage does vary, see this older post

<http://mail.python.org/pipermail/python-list/2004-May/261985.html>

Similar figures are shown with Python 2.5, both for 32- and 64-bit.

/Jean Brouwers


 

Chris Mellon

Fredrik Lundh said:
John said:
I'd like to hear more about what kind of performance gain can be
obtained from "__slots__". I'm looking into ways of speeding up
HTML parsing via BeautifulSoup. If a significant speedup can be
obtained when navigating large trees of small objects, that's worth
quite a bit to me.

The following micro-benchmarks are from Python 2.5 on a Core Duo
machine. C0 is an old-style class, C1 is a new-style class, C2 is a
new-style class using __slots__:

# read access
$ timeit -s "import q; o = q.C0(); o.attrib = 1" "o.attrib"
10000000 loops, best of 3: 0.133 usec per loop
$ timeit -s "import q; o = q.C1(); o.attrib = 1" "o.attrib"
10000000 loops, best of 3: 0.184 usec per loop
$ timeit -s "import q; o = q.C2(); o.attrib = 1" "o.attrib"
10000000 loops, best of 3: 0.161 usec per loop

# write access
$ timeit -s "import q; o = q.C0(); o.attrib = 1" "o.attrib = 1"
10000000 loops, best of 3: 0.15 usec per loop
$ timeit -s "import q; o = q.C1(); o.attrib = 1" "o.attrib = 1"
1000000 loops, best of 3: 0.217 usec per loop
$ timeit -s "import q; o = q.C2(); o.attrib = 1" "o.attrib = 1"
1000000 loops, best of 3: 0.209 usec per loop

$ more q.py
class C0:
    pass

class C1(object):
    pass

class C2(object):
    __slots__ = ["attrib"]

Your mileage may vary.


Here are my timings of object creation time. Note that as you add
slots, the speed advantage disappears.

C:\>python -m timeit -s "from objs import C0 as c" "c()"
1000000 loops, best of 3: 0.298 usec per loop

C:\>python -m timeit -s "from objs import C1 as c" "c()"
10000000 loops, best of 3: 0.172 usec per loop

C:\>python -m timeit -s "from objs import C2 as c" "c()"
10000000 loops, best of 3: 0.164 usec per loop

C:\>python -m timeit -s "from objs import C3 as c" "c()"
1000000 loops, best of 3: 0.302 usec per loop

#objs.py
import string

class C0:
    pass

class C1(object):
    pass

class C2(object):
    __slots__ = ["foo"]

class C3(object):
    __slots__ = list(string.ascii_letters)
 

MrJean1

You are correct. Mea culpa.

/Jean Brouwers



Fredrik Lundh said:
unless I'm missing something, you're measuring object creation time.

I'm measuring attribute access time (the topic was "navigating large
trees of small objects", not building them).

</F>
 

John Nagle

Fredrik said:
The following micro-benchmarks are from Python 2.5 on a Core Duo
machine. C0 is an old-style class, C1 is a new-style class, C2 is a
new-style class using __slots__:

# read access
$ timeit -s "import q; o = q.C0(); o.attrib = 1" "o.attrib"
10000000 loops, best of 3: 0.133 usec per loop
$ timeit -s "import q; o = q.C1(); o.attrib = 1" "o.attrib"
10000000 loops, best of 3: 0.184 usec per loop
$ timeit -s "import q; o = q.C2(); o.attrib = 1" "o.attrib"
10000000 loops, best of 3: 0.161 usec per loop

# write access
$ timeit -s "import q; o = q.C0(); o.attrib = 1" "o.attrib = 1"
10000000 loops, best of 3: 0.15 usec per loop
$ timeit -s "import q; o = q.C1(); o.attrib = 1" "o.attrib = 1"
1000000 loops, best of 3: 0.217 usec per loop
$ timeit -s "import q; o = q.C2(); o.attrib = 1" "o.attrib = 1"
1000000 loops, best of 3: 0.209 usec per loop

Not much of a win there. Thanks.
The solution to that is spelled "lxml".

I may eventually have to go to a non-Python solution.
But I've finally made enough robustness fixes to BeautifulSoup that
it's usable on large numbers of real-world web sites. (Only two
exceptions in the last 100,000 web sites processed. If you want
to exercise your HTML parser on hard cases, run hostile-code web
sites through it.)

John Nagle
 
