Finding size of Variable

Ayushi Dalmia

Feb 5, 2014

#21

Ayushi Dalmia said:

On 2014-02-04 14:21, Dave Angel wrote:

To get the "total" size of a list of strings, try (untested):

a = sys.getsizeof(mylist)
for item in mylist:
    a += sys.getsizeof(item)

I always find this sort of accumulation weird (well, at least in
Python; it's the *only* way in many other languages) and would write
it as

a = getsizeof(mylist) + sum(getsizeof(item) for item in mylist)

-tkc

This also doesn't give the true size. I did the following:

import sys
data = []
f = open('stopWords.txt', 'r')
for line in f:
    line = line.split()
    data.extend(line)
print(sys.getsizeof(data))

Did you actually READ either of my posts or Tim's? For a
container, you can't just use getsizeof on the container.

a = sys.getsizeof(data)
for item in data:
    a += sys.getsizeof(item)
print(a)

Yes, I did. I now understand how to find the size.
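
For reference, a minimal sketch of the accumulation discussed above, written for Python 3. The helper name list_total_size and the sample words are illustrative assumptions, not from the thread, and the stop-word file itself is not read here. The result is still an estimate: a string object that appears in the list twice is counted twice.

```python
import sys

def list_total_size(items):
    """Shallow size of the list plus the size of each element, in bytes."""
    return sys.getsizeof(items) + sum(sys.getsizeof(item) for item in items)

# Hypothetical sample standing in for the words read from stopWords.txt.
words = "a an the of to in".split()

print("list only:   ", sys.getsizeof(words))
print("list + items:", list_total_size(words))
```

The first number covers only the list's array of references; the second adds the string objects those references point to.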

Mark Lawrence

Feb 5, 2014

#22

On 05/02/2014 14:33, Ayushi Dalmia wrote:

Please stop sending double line spaced messages, just follow the
instructions here https://wiki.python.org/moin/GoogleGroupsPython to
prevent this happening, thanks.

Dennis Lee Bieber

Feb 6, 2014

#23

My guess is that if you split a 4K file into words, then put the words
into a list, you'll probably end up with 6-8K in memory.

I'd guess rather more; Python strings have a fair bit of fixed
overhead, so with a whole lot of small strings, it will get more
costly.

>>> sys.version
'3.4.0b2 (v3.4.0b2:ba32913eb13e, Jan 5 2014, 16:23:43) [MSC v.1600 32 bit (Intel)]'
>>> sys.getsizeof("asdf")
29

>>> import sys
>>> indata = "221B or not to be seeing you again"
>>> sys.getsizeof(indata)
67
>>> worddata = indata.split()
>>> worddata
['221B', 'or', 'not', 'to', 'be', 'seeing', 'you', 'again']
>>> sys.getsizeof(worddata) + sum(sys.getsizeof(wd) for wd in worddata)
451

That's a 7X expansion for just splitting a single line into a list of
words.
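
The same comparison can be repeated on any interpreter with a short script. This is only a sketch: the absolute numbers depend on the Python version and on whether the build is 32-bit or 64-bit, so no expected output is shown. Using the size of an empty string as the "fixed overhead" is my assumption about what to measure, not something stated in the thread.

```python
import sys

indata = "221B or not to be seeing you again"
worddata = indata.split()

# Size of the single string versus the list plus every word in it.
single = sys.getsizeof(indata)
as_list = sys.getsizeof(worddata) + sum(sys.getsizeof(wd) for wd in worddata)

# Approximate fixed cost per string on this interpreter: an empty str.
per_string_overhead = sys.getsizeof("")

print("one string:      ", single)
print("list of words:   ", as_list)
print("expansion ratio:  %.1fx" % (as_list / single))
print("overhead per str:", per_string_overhead)
```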

wxjmfauth

Feb 6, 2014

#24

On Wednesday, 5 February 2014 12:44:47 UTC+1, Chris Angelico wrote:

My guess is that if you split a 4K file into words, then put the words
into a list, you'll probably end up with 6-8K in memory.

I'd guess rather more; Python strings have a fair bit of fixed
overhead, so with a whole lot of small strings, it will get more
costly.

>>> sys.version
'3.4.0b2 (v3.4.0b2:ba32913eb13e, Jan 5 2014, 16:23:43) [MSC v.1600 32 bit (Intel)]'
>>> sys.getsizeof("asdf")
29

"Stop words" tend to be short, rather than long, words, so I'd look at
an average of 2-3 letters per word. Assuming they're separated by
spaces or newlines, that means there'll be roughly a thousand of them
in the file, for about 25K of overhead. A bit less if the words are
longer, but still quite a bit. (Byte strings have slightly less
overhead, 17 bytes apiece, but still quite a bit.)

ChrisA

>>> sum([sys.getsizeof(c) for c in ['a']])
26
>>> sum([sys.getsizeof(c) for c in ['a', 'a EURO']])
68
>>> sum([sys.getsizeof(c) for c in ['a', 'a EURO', 'aa EURO']])
112
>>> sum([sys.getsizeof(c) for c in ['a', 'a EURO', 'aa EURO', 'aaa EURO']])
158
>>> sum([sys.getsizeof(c) for c in ['a', 'a EURO', 'aa EURO', 'aaa EURO', 'aaaaaaaaaaaaaaaaaaaa EURO']])
238

>>> sum([sys.getsizeof(c.encode('utf-32-be')) for c in ['a']])
21
>>> sum([sys.getsizeof(c.encode('utf-32-be')) for c in ['a', 'a EURO']])
46
>>> sum([sys.getsizeof(c.encode('utf-32-be')) for c in ['a', 'a EURO', 'aa EURO']])
75
>>> sum([sys.getsizeof(c.encode('utf-32-be')) for c in ['a', 'a EURO', 'aa EURO', 'aaa EURO']])
108
>>> sum([sys.getsizeof(c.encode('utf-32-be')) for c in ['a', 'a EURO', 'aa EURO', 'aaa EURO', 'aaaaaaaaaaaaaaaaaaaa EURO']])
209

>>> sum([sys.getsizeof(c) for c in ['a', 'a EURO', 'aa EURO']*3])
336
>>> sum([sys.getsizeof(c) for c in ['aa EURO aa EURO']*3])
150
>>> sum([sys.getsizeof(c.encode('utf-32')) for c in ['a', 'a EURO', 'aa EURO']*3])
261
>>> sum([sys.getsizeof(c.encode('utf-32')) for c in ['aa EURO aa EURO']*3])
135

jmf

Ned Batchelder

Feb 6, 2014

#25

>>> sum([sys.getsizeof(c) for c in ['a', 'a EURO', 'aa EURO']*3])
336
>>> sum([sys.getsizeof(c) for c in ['aa EURO aa EURO']*3])
150
>>> sum([sys.getsizeof(c.encode('utf-32')) for c in ['a', 'a EURO', 'aa EURO']*3])
261
>>> sum([sys.getsizeof(c.encode('utf-32')) for c in ['aa EURO aa EURO']*3])
135

jmf

JMF, we've told you I-don't-know-how-many-times to stop this.
Seriously: think hard about what your purpose is in sending these absurd
benchmarks. I guarantee you are not accomplishing it.

wxjmfauth

Feb 6, 2014

#26

On Thursday, 6 February 2014 12:10:08 UTC+1, Ned Batchelder wrote:

>>> sum([sys.getsizeof(c) for c in ['a', 'a EURO', 'aa EURO']*3])
336
>>> sum([sys.getsizeof(c) for c in ['aa EURO aa EURO']*3])
150
>>> sum([sys.getsizeof(c.encode('utf-32')) for c in ['a', 'a EURO', 'aa EURO']*3])
261
>>> sum([sys.getsizeof(c.encode('utf-32')) for c in ['aa EURO aa EURO']*3])
135

jmf

JMF, we've told you I-don't-know-how-many-times to stop this.
Seriously: think hard about what your purpose is in sending these absurd
benchmarks. I guarantee you are not accomplishing it.

Sorry, I'm only pointing out that you may lose memory when
working with short strings, as was explained.
I really, very really, do not see what is absurd
or obscure in:
37

I apologize for the " a EURO" which should have
been a real "EURO". No idea what happened.

jmf

wxjmfauth

Feb 6, 2014

#27

Some mysterious problem with the "euro".
Let's take a real "French" char.
37

or a "German" char, ẞẞẞẞẞ
37

Steven D'Aprano

Feb 8, 2014

#28

Sorry, I'm only pointing out that you may lose memory when working with
short strings, as was explained. I really, very really, do not see what is
absurd or obscure in:

37

Why do you care about NINE bytes? The least amount of memory in any PC
that I know about is 500000000 bytes, more than fifty million times more.
And you are whinging about wasting nine bytes?

If you care about that lousy nine bytes, Python is not the language for
you. Go and program in C, where you can spend ten or twenty times longer
programming, but save nine bytes in every string.

Nobody cares about your memory "benchmark" except you. Python is not
designed to save memory, Python is designed to use as much memory as
needed to give the programmer an easier job. In C, I can store a single
integer in a single byte. In Python, horror upon horrors, it takes 14
bytes!!!

py> sys.getsizeof(1)
14

We consider it A GOOD THING that Python spends memory for programmer
convenience and safety. Python looks for memory optimizations when it can
save large amounts of memory, not utterly trivial amounts. So in a Python
wide build, a ten-thousand block character string requires a little bit
more than 40KB. In Python 3.3, that can be reduced to only 10KB for a
purely Latin-1 string, or 20K for a string without any astral characters.
That's the sort of memory savings that are worthwhile, reducing memory
usage by 75%.

Could Python save memory by using UTF-8? Yes. But it would cost
complexity and time, strings would be even slower than they are now. That
is not a trade-off that the core developers have chosen to make, and I
agree with them.
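
The 10KB / 20KB / 40KB figures above can be checked with a small sketch. It assumes CPython 3.3 or later, where the flexible string representation applies; the three sample characters are my own choice, not from the thread, and the exact header size added on top of the character data varies by version and platform.

```python
import sys

N = 10000

ascii_text = "a" * N             # fits in 1 byte per character
bmp_text = "\u20ac" * N          # EURO SIGN: outside Latin-1, 2 bytes per character
astral_text = "\U0001F600" * N   # outside the BMP, 4 bytes per character

for label, text in [("ASCII", ascii_text),
                    ("BMP", bmp_text),
                    ("astral", astral_text)]:
    print("%-7s %6d bytes" % (label, sys.getsizeof(text)))
```

Each string has the same length; only the widest character in it decides how many bytes per character the whole string uses.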

Ethan Furman

Feb 8, 2014

#29

That is not a trade-off that the core developers have chosen to make,
and I agree with them.

Even though you haven't broken all the build-bots yet, you can still stop saying "them".

Mark Lawrence

Feb 8, 2014

#30

Why do you care about NINE bytes? The least amount of memory in any PC
that I know about is 500000000 bytes, more than fifty million times more.
And you are whinging about wasting nine bytes?

If you care about that lousy nine bytes, Python is not the language for
you. Go and program in C, where you can spend ten or twenty times longer
programming, but save nine bytes in every string.

Nobody cares about your memory "benchmark" except you. Python is not
designed to save memory, Python is designed to use as much memory as
needed to give the programmer an easier job. In C, I can store a single
integer in a single byte. In Python, horror upon horrors, it takes 14
bytes!!!

py> sys.getsizeof(1)
14

We consider it A GOOD THING that Python spends memory for programmer
convenience and safety. Python looks for memory optimizations when it can
save large amounts of memory, not utterly trivial amounts. So in a Python
wide build, a ten-thousand block character string requires a little bit
more than 40KB. In Python 3.3, that can be reduced to only 10KB for a
purely Latin-1 string, or 20K for a string without any astral characters.
That's the sort of memory savings that are worthwhile, reducing memory
usage by 75%.

Could Python save memory by using UTF-8? Yes. But it would cost
complexity and time, strings would be even slower than they are now. That
is not a trade-off that the core developers have chosen to make, and I
agree with them.

This is a C +1 to save memory when compared against this Python +1

Rustom Mody

Feb 9, 2014

#31

One could argue that if you're parsing a particular file, a very large one, those 9 bytes can go into the optimization of parsing the aforementioned file. Of course, we have faster processors, so why care?
Because it goes into the optimization of the code one is 'developing' in python.

Yes... There are cases when python is an inappropriate language to use...
So???

It's good to get a bit of context here.

loop:
jmf says python is inappropriate.
Someone asks him: Is it? In what case?
jmf: No answer
After a delay of few days jmp to start of loop

[BTW: In my book this classic trolling]

Chris Angelico

Feb 9, 2014

#32

I didn't say she couldn't optimize in another language, and was just
prototyping in Python. I just said she was optimizing her python
code...dufus.

And there are a *lot* of cases where that is inappropriate language to
use. Please don't.

ChrisA

Ned Batchelder

Feb 9, 2014

#33

On Sat, Feb 8, 2014 at 8:25 PM, Rustom Mody wrote:

One could argue that if you're parsing a particular file, a very
large one, those 9 bytes can go into the optimization of
parsing the aforementioned file. Of course, we have faster processors,
so why care?
Because it goes into the optimization of the code one is
'developing' in python.

Yes... There are cases when python is an inappropriate language to
use...
So???

I didn't say she couldn't optimize in another language, and was just
prototyping in Python. I just said she was optimizing her python
code...dufus.

Please keep the discussion respectful. Misunderstandings are easy, I
suspect this is one of them. There's no reason to start calling people
names.

It's good to get a bit of context here.

loop:
jmf says python is inappropriate.
Someone asks him: Is it? In what case?
jmf: No answer
After a delay of few days jmp to start of loop

loop:
mov head,up_your_ass
push repeat
pop repeat
jmp loop

Please keep in mind the Code of Conduct:

http://www.python.org/psf/codeofconduct

Thanks.

[BTW: In my book this classic trolling]
--

And the title of this book would be..."Pieces of Cliche Bullshit
Internet Arguments for Dummies"


David Hutto

Feb 9, 2014

#34

Maybe I'll just roll my fat, bald, troll arse out from under the bridge,
and comment back, off list, next time.

Ned Batchelder

Feb 9, 2014

#35

Maybe I'll just roll my fat, bald, troll arse out from under the bridge,
and comment back, off list, next time.

I'm not sure what happened in this thread. It might be that you think
Rustom Mody was referring to you when he said, "BTW: In my book this
classic trolling." I don't think he was, I think he was referring to JMF.

In any case, perhaps it would be best to just take a break?

Rustom Mody

Feb 9, 2014

#36

I'm not sure what happened in this thread. It might be that you think
Rustom Mody was referring to you when he said, "BTW: In my book this
classic trolling." I don't think he was, I think he was referring to JMF.

Of course!
And given the turn of this thread, we must hand it to jmf for being even better at trolling than I thought

See the first para
http://en.wikipedia.org/wiki/Troll_(Internet)

wxjmfauth

Feb 10, 2014

#37

On Saturday, 8 February 2014 03:48:12 UTC+1, Steven D'Aprano wrote:

We consider it A GOOD THING that Python spends memory for programmer
convenience and safety. Python looks for memory optimizations when it can
save large amounts of memory, not utterly trivial amounts. So in a Python
wide build, a ten-thousand block character string requires a little bit
more than 40KB. In Python 3.3, that can be reduced to only 10KB for a
purely Latin-1 string, or 20K for a string without any astral characters.
That's the sort of memory savings that are worthwhile, reducing memory
usage by 75%.

In its attempt to save memory, Python only succeeds in
doing worse than any of the utf* coding schemes.

---

Python does not save memory at all. A str (unicode string)
uses less memory only - and only - because and when one
explicitly uses characters which consume less memory.

Not only is the memory gain zero, Python falls back to the
worst case.
4000048

The opposite of what the utf8/utf16 do!
2000025

jmf

Asaf Las

Feb 10, 2014

#38

On Monday, February 10, 2014 4:07:14 PM UTC+2, (e-mail address removed) wrote:
Interesting
here you get string type
and here bytes

Why?

Mark Lawrence

Feb 10, 2014

#39

On Monday, February 10, 2014 4:07:14 PM UTC+2, (e-mail address removed) wrote:
Interesting

here you get string type

and here bytes

Why?

Please don't feed this particular troll, he's spent 18 months driving us
nuts with his nonsense.

Tim Chase

Feb 10, 2014

#40

Python does not save memory at all. A str (unicode string)
uses less memory only - and only - because and when one
explicitly uses characters which consume less memory.

Not only is the memory gain zero, Python falls back to the
worst case.

4000048

If Python used UTF-32 for EVERYTHING, then all three of those cases
would be 4000048, so it clearly disproves your claim that "python
does not save memory at all".

The opposite of what the utf8/utf16 do!

2000025

However, as pointed out repeatedly, string indexing in fixed-width
encodings is O(1), while indexing into variable-width encodings (e.g.
UTF-8/UTF-16) is O(N). The FSR gives the benefit of O(1) indexing
while saving space when a string doesn't need a full 32-bit
width.
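
To illustrate the O(N) point, here is a rough sketch of what indexing by character position looks like when the data is held as UTF-8 bytes. The function names utf8_char_at and _utf8_width are hypothetical helpers written for this example, and the code assumes the input is valid UTF-8. It is not how CPython stores strings; it only shows why a variable-width encoding has to be walked from the start.

```python
def _utf8_width(lead_byte):
    """Number of bytes in the UTF-8 sequence that starts with lead_byte."""
    if lead_byte < 0x80:
        return 1
    if lead_byte >= 0xF0:
        return 4
    if lead_byte >= 0xE0:
        return 3
    return 2  # lead bytes 0xC0..0xDF


def utf8_char_at(data, index):
    """Return the character at position index in UTF-8 encoded bytes.

    Each character occupies 1 to 4 bytes, so the only way to find the
    index-th one is to step over every character before it: O(N).
    """
    if index < 0:
        raise IndexError("negative index not supported in this sketch")
    pos = 0
    for _ in range(index):
        if pos >= len(data):
            raise IndexError("index out of range")
        pos += _utf8_width(data[pos])
    if pos >= len(data):
        raise IndexError("index out of range")
    return data[pos:pos + _utf8_width(data[pos])].decode("utf-8")


text = "na\u00efve \u20ac \U0001F600!"
encoded = text.encode("utf-8")

# Indexing the str is a direct offset computation; the byte walk must agree.
for i in range(len(text)):
    assert utf8_char_at(encoded, i) == text[i]
print(ascii(utf8_char_at(encoded, 6)))
```

With a fixed-width layout the position of character i is simply i times the width, which is the property the FSR keeps.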

-tkc
