Any problems with *lots* of attributes?


Frank Millman

Hi all

This is a question about the performance trade-off between two methods
of storing data. One method is to create a separate attribute for each
piece of data, and get/set the value according to its name. Another
method is to create a list, and get/set the value according to its
position in the list. In my limited tests, the first method is
quicker.
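
For example, here is a simplified sketch of the two approaches - the
ColumnAttr/ColumnList classes and the fields (name, coltype, size) are
made up for illustration, not my real column definitions -

class ColumnAttr:                     # one attribute per piece of data
    def __init__(self, name, coltype, size):
        self.name = name
        self.coltype = coltype
        self.size = size

class ColumnList:                     # all data in a single list
    # positions: 0 = name, 1 = coltype, 2 = size
    def __init__(self, name, coltype, size):
        self.data = [name, coltype, size]

c1 = ColumnAttr('code', 'char', 10)
print c1.name                         # get value by attribute name

c2 = ColumnList('code', 'char', 10)
print c2.data[0]                      # get value by list position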

My application is database-intensive. I have a Table class which is
instantiated for each table that is opened. I have a Column class
which is instantiated for each column in each table. The number of
attributes for each Column can be up to about 20. It is possible for
the number of column instances at any one time to run into the
hundreds, therefore the total number of attributes can run into
thousands.

At the moment I store the data for each column in a single Column
attribute, as a list. If I switch to creating a separate Column
attribute for each piece of data, is there a danger that the number of
attributes may eventually reach a point where any performance gain is
negated or even reversed?

Any advice will be much appreciated.

Frank Millman
 

Harald Massa

Frank Millman said:
This is a question about the performance trade-off between two methods
of storing data. One method is to create a separate attribute for each
piece of data, and get/set the value according to its name.

"Attributes of Objects" are essentially dictionaries. Access to elements of
dictionaries is O(1), meaning: not depending on the length of the
dictionary. (assuming there are no hash collisions)

You should try to compare performace with numeric indizes to Tuples instead
of lists, cause tuples are invariant they may be faster.
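
For example, a rough comparison along those lines might look like this
(just a sketch - the values 100, 200, 300 and the repeat counts are
arbitrary):

from timeit import Timer

# compare indexed access into a list vs. a tuple
list_t  = Timer('l[0]', 'l = [100, 200, 300]')
tuple_t = Timer('t[0]', 't = (100, 200, 300)')

print 'list :', min(list_t.repeat(3, 1000000))
print 'tuple:', min(tuple_t.repeat(3, 1000000))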
 

Aahz

Frank Millman said:
This is a question about the performance trade-off between two methods
of storing data. One method is to create a separate attribute for each
piece of data, and get/set the value according to its name. Another
method is to create a list, and get/set the value according to its
position in the list. In my limited tests, the first method is quicker.

That doesn't make any sense. While lists and dicts are both essentially
O(1) for access, the constant is larger for dicts. Please show us your
testing code.

OTOH, there's certainly nothing wrong with using dicts; they are the
single most fundamental Python data type after strings. (All Python
name/attribute access is built on top of dicts.)
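
A quick illustration, using a made-up Column class - instance
attributes normally live in the instance's __dict__:

class Column:
    def __init__(self, name, size):
        self.name = name
        self.size = size

c = Column('code', 10)
print c.__dict__                # e.g. {'name': 'code', 'size': 10}
print c.__dict__['name']        # attribute access goes through this dict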
--
Aahz ([email protected]) <*> http://www.pythoncraft.com/

"The joy of coding Python should be in seeing short, concise, readable
classes that express a lot of action in a small amount of clear code --
not in reams of trivial code that bores the reader to death." --GvR
 

Frank Millman

Aahz said:
That doesn't make any sense. While lists and dicts are both essentially
O(1) for access, the constant is larger for dicts. Please show us your
testing code.
Code shown below, but here is a summary -

print f.getval_from_attr()
print f.getval_from_list()
print f.setval_in_attr(1000)
print f.setval_in_list(1000)

Results, to 2 decimal places, are as follows -

0.32
0.45
0.34
0.44
Aahz also said:
OTOH, there's certainly nothing wrong with using dicts; they are the
single most fundamental Python data type after strings. (All Python
name/attribute access is built on top of dicts.)

This is exactly my point, but I may be misunderstanding how it works
internally. If I have a class with 20 attributes, it will have a
dictionary with 20 entries. If I have hundreds of instances of the
class, do I end up with hundreds of dictionaries (or the equivalent
internal structures), and if so, does this not lead to memory bloat?
If I store the data in a list, there will only be one dictionary entry
per instance. On the other hand, I suppose there must be some
reference to each of the elements of each list, so maybe there is an
equivalent overhead. Is this a valid concern, or am I worrying about
nothing?

Thanks for your input. Test program is shown below.

Frank

-------------------------------------

from time import time

class frank:
    def __init__(self, a, b, c, l):
        self.a = a      # an attribute
        self.b = b      # an attribute
        self.c = c      # an attribute
        self.l = l      # a list

    def getval_from_attr(self):
        start = time()
        for i in xrange(1000000):
            aa = self.a
        return (time() - start)

    def getval_from_list(self):
        start = time()
        for i in xrange(1000000):
            aa = self.l[0]
        return (time() - start)

    def setval_in_attr(self, val):
        start = time()
        for i in xrange(1000000):
            self.a = val
        return (time() - start)

    def setval_in_list(self, val):
        start = time()
        for i in xrange(1000000):
            self.l[0] = val
        return (time() - start)

f = frank(100,200,300,[100,200,300])
print f.getval_from_attr()
print f.getval_from_list()
print f.setval_in_attr(1000)
print f.setval_in_list(1000)

-------------------------------------

Results, to 2 decimal places, are as follows -

0.32
0.45
0.34
0.44
 

Paul Prescod

Frank said:
This is exactly my point, but I may be misunderstanding how it works
internally. If I have a class with 20 attributes, it will have a
dictionary with 20 entries. If I have hundreds of instances of the
class, do I end up with hundreds of dictionaries (or the equivalent
internal structures), and if so, does this not lead to memory bloat?
If I store the data in a list, there will only be one dictionary entry
per instance. On the other hand, I suppose there must be some
reference to each of the elements of each list, so maybe there is an
equivalent overhead.

20 entries * 100 instances = 2000 entries in 100 dictionaries
20 entries * 100 instances = 2000 list indexes in 100 lists

I really don't expect a big memory difference in normal use. I
personally would not worry about it until I actually experienced a
performance problem. Even before you code your app you know the size
of the database you care about: why not simulate it and measure the
performance?
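
Something along these lines would give you a rough idea (purely a
sketch - the 20 attributes and 500 instances are just placeholders for
whatever numbers your app really uses):

from time import time

class Column:
    def __init__(self, values):
        for n, v in enumerate(values):      # 20 made-up attributes
            setattr(self, 'attr%d' % n, v)

start = time()
columns = [Column(range(20)) for i in xrange(500)]
print 'create :', time() - start

start = time()
for c in columns:
    for n in xrange(20):
        aa = getattr(c, 'attr%d' % n)
print 'access :', time() - start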

Paul Prescod
 

John Roth

Frank Millman said:
Code shown below, but here is a summary -

print f.getval_from_attr()
print f.getval_from_list()
print f.setval_in_attr(1000)
print f.setval_in_list(1000)

Results, to 2 decimal places, are as follows -

0.32
0.45
0.34
0.44


This is exactly my point, but I may be misunderstanding how it works
internally. If I have a class with 20 attributes, it will have a
dictionary with 20 entries. If I have hundreds of instances of the
class, do I end up with hundreds of dictionaries (or the equivalent
internal structures), and if so, does this not lead to memory bloat?
If I store the data in a list, there will only be one dictionary entry
per instance. On the other hand, I suppose there must be some
reference to each of the elements of each list, so maybe there is an
equivalent overhead. Is this a valid concern, or am I worrying about
nothing?

I believe the memory overhead for dictionaries is four times that for
lists on a per-item basis; however, both data structures allocate
extra space so that they don't have to reallocate too frequently
(a very expensive operation).
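
If you want to check the raw container overhead yourself, something
like this gives a rough per-item figure (note that it relies on
sys.getsizeof, which only appeared in Python 2.6, and it measures only
the containers themselves, not the keys or values):

import sys

l = range(20)
d = dict(('key%d' % n, n) for n in range(20))

per_list_item = (sys.getsizeof(l) - sys.getsizeof([])) / 20.0
per_dict_item = (sys.getsizeof(d) - sys.getsizeof({})) / 20.0

print 'list, per item :', per_list_item, 'bytes'
print 'dict, per item :', per_dict_item, 'bytes'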
 

Aahz

John Roth said:
I believe the memory overhead for dictionaries is four times that for
lists on a per-item basis; however, both data structures allocate
extra space so that they don't have to reallocate too frequently
(a very expensive operation).

Should be only double; where do you get four times? (Note that I'm
talking strictly about the dict itself, not the space consumed by the
keys.)
--
Aahz ([email protected]) <*> http://www.pythoncraft.com/

"The joy of coding Python should be in seeing short, concise, readable
classes that express a lot of action in a small amount of clear code --
not in reams of trivial code that bores the reader to death." --GvR
 

Aahz

Frank Millman said:
This is exactly my point, but I may be misunderstanding how it works
internally. If I have a class with 20 attributes, it will have a
dictionary with 20 entries. If I have hundreds of instances of the
class, do I end up with hundreds of dictionaries (or the equivalent
internal structures), and if so, does this not lead to memory bloat?
If I store the data in a list, there will only be one dictionary entry
per instance. On the other hand, I suppose there must be some reference
to each of the elements of each list, so maybe there is an equivalent
overhead. Is this a valid concern, or am I worrying about nothing?

You're mostly worrying about nothing; your analysis is essentially
correct.
def getval_from_list(self):
    start = time()
    for i in xrange(1000000):
        aa = self.l[0]
    return (time() - start)

Try this instead:

def getval_from_list(self):
    l = self.l
    start = time()
    for i in xrange(1000000):
        aa = l[0]
    return (time() - start)

Your original code pays the penalty for *both* dict/attribute lookup and
list lookup.
--
Aahz ([email protected]) <*> http://www.pythoncraft.com/

"The joy of coding Python should be in seeing short, concise, readable
classes that express a lot of action in a small amount of clear code --
not in reams of trivial code that bores the reader to death." --GvR
 

Michael Hudson

Aahz said:
Should be only double; where do you get four times? (Note that I'm
talking strictly about the dict itself, not the space consumed by the
keys.)

The hash is stored in the dictentry struct, which gets you to three
times; I suppose it's possible alignment would get you four but I
don't think so.
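
As a back-of-envelope figure (a sketch only, based on the CPython 2.x
structs and assuming the hash field is roughly pointer-sized):

import struct

word = struct.calcsize('P')     # size of a pointer on this build

# dict slot : me_hash + me_key + me_value  (see dictobject.h)
# list slot : one PyObject* per item       (see listobject.h)
print 'dict entry:', 3 * word, 'bytes, before over-allocation'
print 'list item :', 1 * word, 'bytes, before over-allocation'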

Cheers,
mwh
 

Frank Millman

Aahz said:
You're mostly worrying about nothing; your analysis is essentially
correct.

Thanks a lot, everybody, for the replies. I will proceed on the
assumption that I do not need to worry about this.

I have to change a lot of things in my app if I switch from lists to
attributes, and I was nervous of doing all the changes and then
finding that I had a problem. I am now reasonably confident that I
will not have a problem, and the code should be cleaner and faster
when I have finished, so I will get stuck in.

Thanks again

Frank
 
