Memory Usage of Strings

Amit Dev

I'm observing a strange memory usage pattern with strings. Consider
the following session. Idea is to create a list which holds some
strings so that cumulative characters in the list is 100MB.

>>> l = []
>>> for i in xrange(100000):
...     l.append(str(i) * (1000/len(str(i))))

This uses around 100MB of memory as expected and 'del l' will clear that.

>>> l = []
>>> for i in xrange(20000):
...     l.append(str(i) * (5000/len(str(i))))

This is using 165MB of memory. I really don't understand where the
additional memory usage is coming from.

If I reduce the string size, it remains high till it reaches around
1000. In that case it is back to 100MB usage.

Python 2.6.4 on FreeBSD.

Regards,
Amit
 
John Gordon

Amit Dev said:
I'm observing a strange memory usage pattern with strings. Consider
the following session. Idea is to create a list which holds some
strings so that cumulative characters in the list is 100MB.

>>> l = []
>>> for i in xrange(100000):
...     l.append(str(i) * (1000/len(str(i))))

This uses around 100MB of memory as expected and 'del l' will clear that.

>>> l = []
>>> for i in xrange(20000):
...     l.append(str(i) * (5000/len(str(i))))

This is using 165MB of memory. I really don't understand where the
additional memory usage is coming from.
If I reduce the string size, it remains high till it reaches around
1000. In that case it is back to 100MB usage.

I don't know anything about the internals of python storage -- overhead,
possible merging of like strings, etc. but some simple character counting
shows that these two loops do not produce the same number of characters.

The first loop produces:

Ten single-digit values of i which are repeated 1000 times for a total of
10000 characters;

Ninety two-digit values of i which are repeated 500 times for a total of
45000 characters;

Nine hundred three-digit values of i which are repeated 333 times for a
total of 299700 characters;

Nine thousand four-digit values of i which are repeated 250 times for a
total of 2250000 characters;

Ninety thousand five-digit values of i which are repeated 200 times for
a total of 18000000 characters.

All that adds up to a grand total of 20604700 characters.

Or, to condense the above long-winded text in table form:

range          num  digits  1000/len(str(i))  total chars
0-9             10       1              1000        10000
10-99           90       2               500        45000
100-999        900       3               333       299700
1000-9999     9000       4               250      2250000
10000-99999  90000       5               200     18000000
                                                 ========
                           grand total chars     20604700

The second loop yields this table:

range          num  digits  5000/len(str(i))  total bytes
0-9             10       1              5000        50000
10-99           90       2              2500       225000
100-999        900       3              1666      1499400
1000-9999     9000       4              1250     11250000
10000-19999  10000       5              1000     10000000
                                                 ========
                           grand total chars     23024400

The two loops do not produce the same numbers of characters, so I'm not
surprised they do not consume the same amount of storage.

P.S.: Please forgive me if I've made some basic math error somewhere.
 
Amit Dev

sum(map(len, l)) => 99998200 for 1st case and 99999100 for 2nd case.
Roughly 100MB as I mentioned.
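
For anyone who wants to check the character counts without building the full lists, here is a minimal sketch. It assumes the second loop runs over 20000 values (as in the tables and the later reproduction in this thread), it is written for Python 3 (so `//` replaces the Python 2 integer `/`), and the helper name `tally` is made up for illustration. The step that matters is that each string's length is the number of digits times the repeat count, not the repeat count alone.

```python
from collections import defaultdict


def tally(n, target):
    """Count characters in str(i) * (target // len(str(i))) for i in range(n).

    Returns ({digit_count: total_chars}, grand_total) without keeping
    the strings themselves in memory.
    """
    per_digits = defaultdict(int)
    for i in range(n):
        digits = len(str(i))
        # Each string holds digits * (target // digits) characters.
        per_digits[digits] += digits * (target // digits)
    return dict(per_digits), sum(per_digits.values())


# Assumed loop parameters: 100000 strings of ~1000 chars, 20000 of ~5000.
first_bands, first_total = tally(100000, 1000)
second_bands, second_total = tally(20000, 5000)

print(first_bands, first_total)
print(second_bands, second_total)
```

Under those assumptions both totals land within 2,000 characters of 100,000,000, so the two lists carry essentially the same amount of string data.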

 
Terry Reedy

??

Python 2.7.1 (r271:86832, Nov 27 2010, 17:19:03) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> L = []
>>> for i in xrange(100000):
...     L.append(str(i) * (1000 / len(str(i))))
...
>>> sys.getsizeof(L)
824464

This is only the size of the list object and does not include the sum of
sizes of the string objects. With 8-byte pointers, 824464 == 8*100000 +
(small bit of overhead) + extra space (for list to grow without
reallocation and copy).

>>> L = []
>>> for i in xrange(20000):
...     L.append(str(i) * (5000 / len(str(i))))
...
>>> sys.getsizeof(L)
178024

== 8*20000 + extra
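
Since `sys.getsizeof` on a list reports only the list object itself, one way to get closer to the real footprint is to add the sizes of the string objects as well. The following is a rough sketch for Python 3 (so `//` instead of the Python 2 `/`); the helper name `measure` and the reduced count of 1000 strings are choices made here for illustration. The byte counts are CPython implementation details that vary by version and platform, and they do not include allocator overhead or fragmentation, so they will not match what the operating system reports for the process.

```python
import struct
import sys


def measure(n, target):
    """Build the list from the thread and report three numbers:
    bytes for the list object alone, bytes for all string objects,
    and the raw character count.
    """
    items = [str(i) * (target // len(str(i))) for i in range(n)]
    list_bytes = sys.getsizeof(items)                     # pointer array + header
    string_bytes = sum(sys.getsizeof(s) for s in items)   # per-object sizes
    payload = sum(len(s) for s in items)                  # characters only
    return list_bytes, string_bytes, payload


# Size of one pointer on this build (8 on a 64-bit interpreter).
pointer_size = struct.calcsize("P")

# A smaller n than in the thread keeps the example quick to run.
list_bytes, string_bytes, payload = measure(1000, 1000)
print("list object alone:", list_bytes)
print("string objects   :", string_bytes)
print("characters       :", payload)
```

The list figure stays close to the pointer size times the number of elements, while the string total is the character count plus a fixed per-object header for each string.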
 
