Any problems with *lots* of attributes?


Frank Millman

Hi all

This is a question about the performance trade-off between two methods
of storing data. One method is to create a separate attribute for each
piece of data, and get/set the value according to its name. Another
method is to create a list, and get/set the value according to its
position in the list. In my limited tests, the first method is
quicker.
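
For example, here is a simplified sketch of the two approaches - the
ColumnAttr/ColumnList classes and the fields (name, coltype, size) are
made up for illustration, not my real column definitions -

class ColumnAttr:                     # one attribute per piece of data
    def __init__(self, name, coltype, size):
        self.name = name
        self.coltype = coltype
        self.size = size

class ColumnList:                     # all data in a single list
    # positions: 0 = name, 1 = coltype, 2 = size
    def __init__(self, name, coltype, size):
        self.data = [name, coltype, size]

c1 = ColumnAttr('code', 'char', 10)
print c1.name                         # get value by attribute name

c2 = ColumnList('code', 'char', 10)
print c2.data[0]                      # get value by list position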

My application is database-intensive. I have a Table class which is
instantiated for each table that is opened. I have a Column class
which is instantiated for each column in each table. The number of
attributes for each Column can be up to about 20. It is possible for
the number of column instances at any one time to run into the
hundreds, therefore the total number of attributes can run into
thousands.

At the moment I store the data for each column in a single Column
attribute, as a list. If I switch to creating a separate Column
attribute for each piece of data, is there a danger that the number of
attributes may eventually reach a point where any performance gain is
negated or even reversed?

Any advice will be much appreciated.

Frank Millman
 

Harald Massa

Frank Millman said:
This is a question about the performance trade-off between two methods
of storing data. One method is to create a separate attribute for each
piece of data, and get/set the value according to its name.

"Attributes of Objects" are essentially dictionaries. Access to elements of
dictionaries is O(1), meaning: not depending on the length of the
dictionary. (assuming there are no hash collisions)

You should try to compare performace with numeric indizes to Tuples instead
of lists, cause tuples are invariant they may be faster.
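
For example, a rough comparison along those lines might look like this
(just a sketch - the values 100, 200, 300 and the repeat counts are
arbitrary):

from timeit import Timer

# compare indexed access into a list vs. a tuple
list_t  = Timer('l[0]', 'l = [100, 200, 300]')
tuple_t = Timer('t[0]', 't = (100, 200, 300)')

print 'list :', min(list_t.repeat(3, 1000000))
print 'tuple:', min(tuple_t.repeat(3, 1000000))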
 

Aahz

Frank Millman said:
This is a question about the performance trade-off between two methods
of storing data. One method is to create a separate attribute for each
piece of data, and get/set the value according to its name. Another
method is to create a list, and get/set the value according to its
position in the list. In my limited tests, the first method is quicker.

That doesn't make any sense. While lists and dicts are both essentially
O(1) for access, the constant is larger for dicts. Please show us your
testing code.

OTOH, there's certainly nothing wrong with using dicts; they are the
single most fundamental Python data type after strings. (All Python
name/attribute access is built on top of dicts.)
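
A quick illustration, using a made-up Column class - instance
attributes normally live in the instance's __dict__:

class Column:
    def __init__(self, name, size):
        self.name = name
        self.size = size

c = Column('code', 10)
print c.__dict__                # e.g. {'name': 'code', 'size': 10}
print c.__dict__['name']        # attribute access goes through this dict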
--
Aahz ([email protected]) <*> http://www.pythoncraft.com/

"The joy of coding Python should be in seeing short, concise, readable
classes that express a lot of action in a small amount of clear code --
not in reams of trivial code that bores the reader to death." --GvR
 

Frank Millman

Aahz said:
That doesn't make any sense. While lists and dicts are both essentially
O(1) for access, the constant is larger for dicts. Please show us your
testing code.
Code shown below, but here is a summary -

print f.getval_from_attr()
print f.getval_from_list()
print f.setval_in_attr(1000)
print f.setval_in_list(1000)

Results, to 2 decimal places, are as follows -

0.32
0.45
0.34
0.44
Aahz also said:
OTOH, there's certainly nothing wrong with using dicts; they are the
single most fundamental Python data type after strings. (All Python
name/attribute access is built on top of dicts.)

This is exactly my point, but I may be misunderstanding how it works
internally. If I have a class with 20 attributes, it will have a
dictionary with 20 entries. If I have hundreds of instances of the
class, do I end up with hundreds of dictionaries (or the equivalent
internal structures), and if so, does this not lead to memory bloat?
If I store the data in a list, there will only be one dictionary entry
per instance. On the other hand, I suppose there must be some
reference to each of the elements of each list, so maybe there is an
equivalent overhead. Is this a valid concern, or am I worrying about
nothing?

Thanks for your input. Test program is shown below.

Frank

-------------------------------------

from time import time

class frank:
    def __init__(self, a, b, c, l):
        self.a = a      # an attribute
        self.b = b      # an attribute
        self.c = c      # an attribute
        self.l = l      # a list

    def getval_from_attr(self):
        start = time()
        for i in xrange(1000000):
            aa = self.a
        return (time() - start)

    def getval_from_list(self):
        start = time()
        for i in xrange(1000000):
            aa = self.l[0]
        return (time() - start)

    def setval_in_attr(self, val):
        start = time()
        for i in xrange(1000000):
            self.a = val
        return (time() - start)

    def setval_in_list(self, val):
        start = time()
        for i in xrange(1000000):
            self.l[0] = val
        return (time() - start)

f = frank(100,200,300,[100,200,300])
print f.getval_from_attr()
print f.getval_from_list()
print f.setval_in_attr(1000)
print f.setval_in_list(1000)

-------------------------------------

Results, to 2 decimal places, are as follows -

0.32
0.45
0.34
0.44
 

Paul Prescod

Frank said:
This is exactly my point, but I may be misunderstanding how it works
internally. If I have a class with 20 attributes, it will have a
dictionary with 20 entries. If I have hundreds of instances of the
class, do I end up with hundreds of dictionaries (or the equivalent
internal structures), and if so, does this not lead to memory bloat?
If I store the data in a list, there will only be one dictionary entry
per instance. On the other hand, I suppose there must be some
reference to each of the elements of each list, so maybe there is an
equivalent overhead.

20 entries * 100 instances = 2000 entries in 100 dictionaries
20 entries * 100 instances = 2000 list indexes in 100 lists

I really don't expect a big memory difference in normal use. I
personally would not worry about it until I actually experienced a
performance problem. Even before you code your app you know the size
of the database you care about: why not simulate it and measure the
performance?
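
Something along these lines would give you a rough idea (purely a
sketch - the 20 attributes and 500 instances are just placeholders for
whatever numbers your app really uses):

from time import time

class Column:
    def __init__(self, values):
        for n, v in enumerate(values):      # 20 made-up attributes
            setattr(self, 'attr%d' % n, v)

start = time()
columns = [Column(range(20)) for i in xrange(500)]
print 'create :', time() - start

start = time()
for c in columns:
    for n in xrange(20):
        aa = getattr(c, 'attr%d' % n)
print 'access :', time() - start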

Paul Prescod
 

John Roth

Frank Millman said:
Code shown below, but here is a summary -

print f.getval_from_attr()
print f.getval_from_list()
print f.setval_in_attr(1000)
print f.setval_in_list(1000)

Results, to 2 decimal places, are as follows -

0.32
0.45
0.34
0.44


This is exactly my point, but I may be misunderstanding how it works
internally. If I have a class with 20 attributes, it will have a
dictionary with 20 entries. If I have hundreds of instances of the
class, do I end up with hundreds of dictionaries (or the equivalent
internal structures), and if so, does this not lead to memory bloat?
If I store the data in a list, there will only be one dictionary entry
per instance. On the other hand, I suppose there must be some
reference to each of the elements of each list, so maybe there is an
equivalent overhead. Is this a valid concern, or am I worrying about
nothing?

I believe the memory overhead for dictionaries is four times that for
lists on a per-item basis; however, both data structures allocate
extra space so that they don't have to reallocate too frequently
(a very expensive operation).
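
If you want to check the raw container overhead yourself, something
like this gives a rough per-item figure (note that it relies on
sys.getsizeof, which only appeared in Python 2.6, and it measures only
the containers themselves, not the keys or values):

import sys

l = range(20)
d = dict(('key%d' % n, n) for n in range(20))

per_list_item = (sys.getsizeof(l) - sys.getsizeof([])) / 20.0
per_dict_item = (sys.getsizeof(d) - sys.getsizeof({})) / 20.0

print 'list, per item :', per_list_item, 'bytes'
print 'dict, per item :', per_dict_item, 'bytes'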
 

Aahz

John Roth said:
I believe the memory overhead for dictionaries is four times that for
lists on a per-item basis; however, both data structures allocate
extra space so that they don't have to reallocate too frequently
(a very expensive operation).

Should be only double; where do you get four times? (Note that I'm
talking strictly about the dict itself, not the space consumed by the
keys.)
--
Aahz ([email protected]) <*> http://www.pythoncraft.com/

"The joy of coding Python should be in seeing short, concise, readable
classes that express a lot of action in a small amount of clear code --
not in reams of trivial code that bores the reader to death." --GvR
 

Aahz

Frank Millman said:
This is exactly my point, but I may be misunderstanding how it works
internally. If I have a class with 20 attributes, it will have a
dictionary with 20 entries. If I have hundreds of instances of the
class, do I end up with hundreds of dictionaries (or the equivalent
internal structures), and if so, does this not lead to memory bloat?
If I store the data in a list, there will only be one dictionary entry
per instance. On the other hand, I suppose there must be some reference
to each of the elements of each list, so maybe there is an equivalent
overhead. Is this a valid concern, or am I worrying about nothing?

You're mostly worrying about nothing; your analysis is essentially
correct.
def getval_from_list(self):
    start = time()
    for i in xrange(1000000):
        aa = self.l[0]
    return (time() - start)

Try this instead:

def getval_from_list(self):
    l = self.l
    start = time()
    for i in xrange(1000000):
        aa = l[0]
    return (time() - start)

Your original code pays the penalty for *both* dict/attribute lookup and
list lookup.
--
Aahz ([email protected]) <*> http://www.pythoncraft.com/

"The joy of coding Python should be in seeing short, concise, readable
classes that express a lot of action in a small amount of clear code --
not in reams of trivial code that bores the reader to death." --GvR
 

Michael Hudson

Aahz said:
Should be only double; where do you get four times? (Note that I'm
talking strictly about the dict itself, not the space consumed by the
keys.)

The hash is stored in the dictentry struct, which gets you to three
times; I suppose it's possible alignment would get you four but I
don't think so.
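
As a back-of-envelope figure (a sketch only, based on the CPython 2.x
structs and assuming the hash field is roughly pointer-sized):

import struct

word = struct.calcsize('P')     # size of a pointer on this build

# dict slot : me_hash + me_key + me_value  (see dictobject.h)
# list slot : one PyObject* per item       (see listobject.h)
print 'dict entry:', 3 * word, 'bytes, before over-allocation'
print 'list item :', 1 * word, 'bytes, before over-allocation'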

Cheers,
mwh
 

Frank Millman

Aahz said:
You're mostly worrying about nothing; your analysis is essentially
correct.

Thanks a lot, everybody, for the replies. I will proceed on the
assumption that I do not need to worry about this.

I have to change a lot of things in my app if I switch from lists to
attributes, and I was nervous of doing all the changes and then
finding that I had a problem. I am now reasonably confident that I
will not have a problem, and the code should be cleaner and faster
when I have finished, so I will get stuck in.

Thanks again

Frank
 
