Memory Usage of Strings

Discussion in 'Python' started by Amit Dev, Mar 16, 2011.

  1. Amit Dev

    Amit Dev Guest

    I'm observing a strange memory usage pattern with strings. Consider
    the following session. The idea is to create a list which holds some
    strings so that the total number of characters in the list is 100MB.

    >>> l = []
    >>> for i in xrange(100000):
    ...     l.append(str(i) * (1000/len(str(i))))
    ...

    This uses around 100MB of memory as expected and 'del l' will clear that.


    >>> l = []
    >>> for i in xrange(20000):
    ...     l.append(str(i) * (5000/len(str(i))))
    ...

    This is using 165MB of memory. I really don't understand where the
    additional memory usage is coming from.

    If I reduce the string size, memory usage stays high until the size
    gets down to around 1000. At that point it is back to 100MB.

    Python 2.6.4 on FreeBSD.
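    One way to separate what Python itself asks for from what the process ends up holding is the standard `tracemalloc` module (Python 3.4+), which records the size of each allocation request and therefore excludes any rounding done by the underlying malloc. A minimal sketch, assuming Python 3 rather than the 2.6.4 used here; `build` and `traced_bytes` are hypothetical helper names:

```python
import tracemalloc


def build(n, width):
    """Rebuild the list from the session: str(i) repeated to roughly `width` chars.

    Python 3 needs // here; Python 2's / already did integer division on ints.
    """
    return [str(i) * (width // len(str(i))) for i in range(n)]


def traced_bytes(n, width):
    """Return (total characters, bytes traced by tracemalloc while the list is alive)."""
    tracemalloc.start()
    try:
        data = build(n, width)
        # `current` covers blocks allocated since start() that are still alive:
        # the list, every string in it, and any small incidental allocations.
        current, _peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return sum(map(len, data)), current
```

    Calling `traced_bytes(100000, 1000)` and `traced_bytes(20000, 5000)` and comparing each result with the process size reported by the OS should indicate whether the extra 65MB is attributable to the string objects themselves or to something below them.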

    Regards,
    Amit
    Amit Dev, Mar 16, 2011

  2. John Gordon

    John Gordon Guest

    In <> Amit Dev <> writes:

    > I'm observing a strange memory usage pattern with strings. Consider
    > the following session. The idea is to create a list which holds some
    > strings so that the total number of characters in the list is 100MB.
    >
    > >>> l = []
    > >>> for i in xrange(100000):
    > ...     l.append(str(i) * (1000/len(str(i))))
    > ...
    >
    > This uses around 100MB of memory as expected and 'del l' will clear that.
    >
    > >>> l = []
    > >>> for i in xrange(20000):
    > ...     l.append(str(i) * (5000/len(str(i))))
    > ...
    >
    > This is using 165MB of memory. I really don't understand where the
    > additional memory usage is coming from.
    >
    > If I reduce the string size, memory usage stays high until the size
    > gets down to around 1000. At that point it is back to 100MB.

    I don't know anything about the internals of Python storage (overhead,
    possible merging of like strings, etc.), but some simple character counting
    shows that these two loops do not produce the same number of characters.

    The first loop produces:

    Ten single-digit values of i which are repeated 1000 times for a total of
    10000 characters;

    Ninety two-digit values of i which are repeated 500 times for a total of
    45000 characters;

    Nine hundred three-digit values of i which are repeated 333 times for a
    total of 299700 characters;

    Nine thousand four-digit values of i which are repeated 250 times for a
    total of 2250000 characters;

    Ninety thousand five-digit values of i which are repeated 200 times for
    a total of 18000000 characters.

    All that adds up to a grand total of 20604700 characters.

    Or, to condense the above long-winded text in table form:

    range          num  digits  1000/len(str(i))  total chars
    0-9             10       1              1000        10000
    10-99           90       2               500        45000
    100-999        900       3               333       299700
    1000-9999     9000       4               250      2250000
    10000-99999  90000       5               200     18000000
                                                     ========
                                  grand total chars  20604700

    The second loop yields this table:

    range          num  digits  5000/len(str(i))  total chars
    0-9             10       1              5000        50000
    10-99           90       2              2500       225000
    100-999        900       3              1666      1499400
    1000-9999     9000       4              1250     11250000
    10000-19999  10000       5              1000     10000000
                                                     ========
                                  grand total chars  23024400

    The two loops do not produce the same numbers of characters, so I'm not
    surprised they do not consume the same amount of storage.

    P.S.: Please forgive me if I've made some basic math error somewhere.

    --
    John Gordon                   A is for Amy, who fell down the stairs
                 B is for Basil, assaulted by bears
                                -- Edward Gorey, "The Gashlycrumb Tinies"
    John Gordon, Mar 16, 2011

  3. Amit Dev

    Amit Dev Guest

    sum(map(len, l)) => 99999100 for the 1st case and 99998200 for the 2nd case.
    Roughly 100MB in both cases, as I mentioned.
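    The character totals can be checked without building the 100MB lists. Each string is `str(i)` repeated `width // len(str(i))` times, so a digit group contributes count * digits * repeats characters: the same grouping as the tables above, with the extra factor for the number of digits. A small sketch in Python 3 syntax (it should also run on 2.7); `digit_groups` and `total_chars` are hypothetical names:

```python
def digit_groups(n, width):
    """Yield (first, last, count, digits, repeats, chars) per digit length in range(n)."""
    first, digits = 0, 1
    while first < n:
        last = min(10 ** digits, n) - 1  # largest value with this many digits, capped at n - 1
        count = last - first + 1
        repeats = width // digits        # integer division, as in the Python 2 session
        yield first, last, count, digits, repeats, count * digits * repeats
        first, digits = last + 1, digits + 1


def total_chars(n, width):
    """Characters in [str(i) * (width // len(str(i))) for i in range(n)], without building it."""
    return sum(row[-1] for row in digit_groups(n, width))


if __name__ == "__main__":
    for n, width in [(100000, 1000), (20000, 5000)]:
        for row in digit_groups(n, width):
            print("%d-%d count=%d digits=%d repeats=%d chars=%d" % row)
        print("total: %d" % total_chars(n, width))
```

    For the two sessions this gives 99,999,100 and 99,998,200 characters respectively, so the totals are essentially equal.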

    On Wed, Mar 16, 2011 at 11:21 PM, John Gordon <> wrote:
    > The two loops do not produce the same numbers of characters, so I'm not
    > surprised they do not consume the same amount of storage.
    Amit Dev, Mar 16, 2011
  4. Terry Reedy

    Terry Reedy Guest

    On 3/16/2011 3:51 PM, Santoso Wijaya wrote:
    > ??
    >
    > Python 2.7.1 (r271:86832, Nov 27 2010, 17:19:03) [MSC v.1500 64 bit (AMD64)] on win32
    > Type "help", "copyright", "credits" or "license" for more information.
    > >>> import sys
    > >>> L = []
    > >>> for i in xrange(100000):
    > ...     L.append(str(i) * (1000 / len(str(i))))
    > ...
    > >>> sys.getsizeof(L)
    > 824464


    This is only the size of the list object; it does not include the sizes
    of the string objects. With 8-byte pointers, 824464 == 8*100000 +
    (a small fixed overhead) + extra space (so the list can grow without
    reallocating and copying).
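    To include the strings, their sizes have to be added separately. A minimal sketch; `deep_size` is a hypothetical helper, and like `sys.getsizeof` it only reports what each object accounts for itself, not any rounding done by the memory allocator:

```python
import sys


def deep_size(strings):
    """sys.getsizeof of the list plus each distinct object it references.

    Objects referenced more than once are counted once (tracked by id()).
    This still excludes allocator overhead such as malloc size rounding.
    """
    total = sys.getsizeof(strings)
    seen = set()
    for s in strings:
        if id(s) not in seen:
            seen.add(id(s))
            total += sys.getsizeof(s)
    return total
```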

    > >>> L = []
    > >>> for i in xrange(20000):
    > ...     L.append(str(i) * (5000 / len(str(i))))
    > ...
    > >>> sys.getsizeof(L)
    > 178024


    Likewise, 178024 == 8*20000 + overhead + extra space.
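    The same arithmetic can be run in reverse: subtracting the size of an empty list and dividing by the pointer size gives the number of slots the list has reserved. A sketch assuming CPython, where `list.__sizeof__` is a fixed header plus one pointer per allocated slot; `list_capacity` is a hypothetical helper:

```python
import struct
import sys

POINTER_SIZE = struct.calcsize("P")  # 8 bytes on a 64-bit build


def list_capacity(lst):
    """Number of slots CPython has allocated for lst (may exceed len(lst)).

    Relies on CPython's list.__sizeof__ being a fixed header plus one
    pointer per allocated slot; other implementations may differ.
    """
    return (sys.getsizeof(lst) - sys.getsizeof([])) // POINTER_SIZE
```

    For a list grown by `append`, `list_capacity(L) - len(L)` is the spare room described above as extra space for growth.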

    --
    Terry Jan Reedy
    Terry Reedy, Mar 16, 2011
