maximum length of a list & tuple


Lupe

hi,

I'm finishing a small program which uses recursion, never more than 100 levels deep.

Each time the function is called, it compares an element of a list (itself a
list) to the other elements, which adds up to millions of comparisons.

The program works for short lists but not for longer ones. I think
there are two possible problems: either there isn't enough disk space to hold
the lists as the program executes, or there is a finite manageable list
length that the program is reaching.

hence my question:

is there a limit? If so, what is it? I'm using list.append().
What about tuples?

I'll rewrite that part of the code and work around the problem, since I
only need a few elements of the generated list, but I'm still curious about
the answers.

Thanks in advance

Lupe
 

Ed Suominen

I think the limit is 1000 recursion levels by default, but you shouldn't ever
need to get anywhere near that. Use generators to break up the iteration of
your list comparisons.
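For instance, a rough sketch (the pairs generator, the toy data, and the
equality test below are just stand-ins for whatever comparison you actually
do; sys.getrecursionlimit() shows the current cap):

import sys
print sys.getrecursionlimit()              # the default cap, usually 1000

def pairs(items):
    # Yield one pair of elements at a time instead of materializing
    # every combination in a list; memory use stays flat.
    for i in xrange(len(items)):
        for j in xrange(i + 1, len(items)):
            yield items[i], items[j]

data = [[1, 2], [3, 4], [1, 2]]            # toy data; yours is much larger
print [(a, b) for a, b in pairs(data) if a == b]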

-Ed Suominen
 

Larry Bates

I've constructed lists with 100,000+ elements and
they have worked flawlessly.

FYI,
Larry Bates
Syscon, Inc.
 

Josiah Carlson

I just would like to know if there is any limit to a list or tuple.

It depends on how much memory you have.

Run the following and the last thing it prints out is your limit...

c = 1
while c < 2**32:
    try:
        d = [1]*c
        print c
        c *= 2
    except:
        break


- Josiah
 

Tim Roberts

Josiah Carlson said:
I just would like to know if there is any limit to a list or tuple.

It depends on how much memory you have.

Run the following and the last thing it prints out is your limit...

c = 1
while c < 2**32:
    try:
        d = [1]*c
        print c
        c *= 2
    except:
        break

Interesting experiment. I get 64M on my 384MB machine, which suggests 4
bytes per list entry. Of course, now my swap file is packed full, and all
my apps are running slowly while they page themselves back in...
 

Josiah Carlson

Run the following and the last thing it prints out is your limit...
c = 1
while c < 2**32:
    try:
        d = [1]*c
        print c
        c *= 2
    except:
        break


Interesting experiment. I get 64M on my 384MB machine, which suggests 4
bytes per list entry. Of course, now my swap file is packed full, and all
my apps are running slowly while they page themselves back in...

Far more than 4 bytes per list entry. 4 bytes to store the integers
themselves, but since integers are Python objects, and lists are arrays
of pointers to the objects they contain, there is quite a bit of extra
stuff attached.

I can't remember exactly how much extra stuff, but it can be found by
looking through the sources. You could check the amount of memory used
before, and the memory used at peak, and divide the difference by your 64M
entries to find out how much is used per entry. I just did, but I'll also
leave it as an exercise for the reader.

- Josiah
 

Peter Hansen

Josiah said:
Run the following and the last thing it prints out is your limit...

c = 1
while c < 2**32:
    try:
        d = [1]*c
        print c
        c *= 2
    except:
        break



Interesting experiment. I get 64M on my 384MB machine, which suggests 4
bytes per list entry. Of course, now my swap file is packed full, and all
my apps are running slowly while they page themselves back in...


Far more than 4 bytes per list entry. 4 bytes to store the integers
themselves, but since integers are Python objects, and lists are arrays
of pointers to the objects they contain, there is quite a bit of extra
stuff attached.

Actually, unless I'm mistaken that stores only one integer, and a whole
lot of references to it... which are just pointers (4 bytes each).

The technique is probably flawed in other ways, however, as it
is likely subject to memory fragmentation and other problems.

-Peter
 

Josiah Carlson

Actually, unless I'm mistaken that stores only one integer, and a whole
lot of references to it... which are just pointers (4 bytes each).

The technique is probably flawed in other ways, however, as it
is likely subject to memory fragmentation and other problems.

You can't just store the integer. How would you differentiate between
an integer in a list and a pointer? Answer: you must use PyIntObjects.
Use the source.

According to intobject.c, intobject.h and object.h, the structure of
every intobject (after the macros have been expanded) is as follows:

typedef struct {
    int ob_refcnt;
    struct _typeobject *ob_type;
    long ob_ival;
} PyIntObject;

Each of those fields is 4 bytes on a 32-bit machine (12 bytes in all), plus
the pointer from the list (another 4 bytes), which leaves us with a total of
16 bytes per integer object in the list.

Checking the memory usage of Python before and after creating a
1-million-integer list, I get a /very/ consistent ~16 bytes per entry. That
would be 12 bytes for the object, and 4 bytes for each pointer in the
list. There is some additional overhead that you'd notice in
intobject.c and/or intobject.h, but generally it's 16 bytes per integer.

As for memory fragmentation, considering how memory paging and
segmentation works, that's all underlying stuff that the OS deals with.
I could go on as to why it doesn't matter in this case (considering
the large mallocs necessary for the creation of large lists), but I'll
leave that as an exercise to the reader.


Again, from the source, 12 bytes for the object, 4 bytes for the pointer
in the list, total of 16 bytes per intobject in a list.
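In newer Pythons (2.6 and later) one could also check the per-object cost
directly with sys.getsizeof; a quick sketch, assuming a 32-bit build:

import sys                       # sys.getsizeof() exists from Python 2.6 on

x = 1
print sys.getsizeof(x)           # 12 on a 32-bit build: refcount + type pointer + value
print sys.getsizeof([x] * 1000)  # list header plus roughly 4 bytes (one pointer) per slot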

- Josiah
 

Peter Hansen

Josiah said:
You can't just store the integer. How would you differentiate between
an integer in a list and a pointer? Answer: you must use PyIntObjects.
Use the source.

Python does not recognize anything called "pointers" at the language
level, only internally.

What I was saying is that the only PyIntObject created was one with
the ob_ival of 1. Then a list containing one pointer to it was
created. Then it was replicated "c" times. Only one integer.

Please refute that claim rather than just arguing a bunch of other
points, none of which I dispute but all of which I claim are irrelevant
to the question at hand.
Checking the memory usage of Python before and after creating a
1-million-integer list, I get a /very/ consistent ~16 bytes per entry. That
would be 12 bytes for the object, and 4 bytes for each pointer in the
list.

Please post the expression you are using, for comparison.
I could go on as to why it doesn't matter in this case (considering the
large mallocs necessary for the creation of large lists), but I'll leave
that as an exercise to the reader.

Please go on about it. I'd like to see where you discuss what happens
to the memory that was previously held by the earlier lists, as the loop
iterates and creates exponentially larger lists.
Again, from the source, 12 bytes for the object, 4 bytes for the pointer
in the list, total of 16 bytes per intobject in a list.

Nice. So each entry in the list is 4 bytes (as I indicated). And if
they all point to a single instance of the integer 1, how many extra
bytes do you think that allocates?

-Peter
 

Peter Hansen

Peter said:
Please post the expression you are using, for comparison.

By the way, in case it wasn't clear, we have a disagreement (and perhaps
a misunderstanding) only about the meaning of the original code posted.
You believe it is allocating many integers. I believe it allocates
only a single one. Perhaps one of us is wrong. I hadn't read the code
in great detail, and still haven't, and perhaps that's my mistake.
Depending on your reply, I'll actually go back and look at it closely
and perhaps execute it myself and see the result (in terms of the list
produced, not in terms of the memory consumption). Maybe you will
do the same and we'll identify which of us is mistaken...

-Peter
 

Josiah Carlson

You can't just store the integer. How would you differentiate between [...]

Python does not recognize anything called "pointers" at the language
level, only internally.

What I was saying is that the only PyIntObject created was one with
the ob_ival of 1. Then a list containing one pointer to it was
created. Then it was replicated "c" times. Only one integer.

Yeah, I was thinking of the case of generating a sequence of integers
via range. In that case, range(N) produces 12-byte intobjects and 4-byte
pointers. In the case of [1]*c, you are indeed right: one object gets
created, and c pointers to that object are generated for the list.

Seems I got off topic and we are both right.

- Josiah
 

Bengt Richter

Peter said:
Please post the expression you are using, for comparison.

By the way, in case it wasn't clear, we have a disagreement (and perhaps
a misunderstanding) only about the meaning of the original code posted.
You believe it is allocating many integers. I believe it allocates
only a single one. Perhaps one of us is wrong. I hadn't read the code
in great detail, and still haven't, and perhaps that's my mistake.
Depending on your reply, I'll actually go back and look at it closely
and perhaps execute it myself and see the result (in terms of the list
produced, not in terms of the memory consumption). Maybe you will
do the same and we'll identify which of us is mistaken...
>>> dict([(id(x),x) for x in [1]*100])
{7946208: 1}
>>> dict([(id(x),x) for x in range(5)])
{7939088: 0, 7946208: 1, 7943184: 4, 7945200: 2, 7944192: 3}
>>> dict([(id(x),x) for x in range(5)*10])
{7939088: 0, 7946208: 1, 7943184: 4, 7945200: 2, 7944192: 3}
>>> for v in (-6, -5, 0, 1, 99, 100):
...     print dict([(id(x),x) for x in [v for i in xrange(5)]])
...
{8053508: -6}
{7936720: -5}
{7939088: 0}
{7946208: 1}
{8042496: 99}
{8166456: 100}
>>> for v in (-6, -5, 0, 1, 99, 100):
...     print dict([(id(x),x) for x in [v*1 for i in xrange(5)]])
...
{8166384: -6, 8166444: -6, 8166492: -6, 8166360: -6, 8166468: -6}
{7936720: -5}
{7939088: 0}
{7946208: 1}
{8042496: 99}
{8166360: 100, 8166468: 100, 8166480: 100, 8166348: 100, 8166492: 100}

Regards,
Bengt Richter
 

Steve

Lupe said:
I just would like to know if there is any limit to a list or tuple.

Yes. Since the universe only contains something like
10**85 particles of matter, the upper limit to a list
is roughly 2**(10**85) bits.

Sorry, I couldn't resist :-D

You can store as many things in a list as you have
memory for. I suppose there is probably some
fundamental limit due to 32 bit addressing issues, but
are you sure you are really storing so many items you
are hitting that limit? I'd guess you aren't.

For example, the other day I was playing around with
Python, and just for a lark I created a list with
300,000,000 instances of None. It took about five
minutes, but it worked :)

Going back to your original problem:
> Each time the function is called, it compares an
> element of a list (which is a list too) to other
> elements, which all amounts to millions of
> comparisons.

Are you sure there isn't a more efficient algorithm for
what you are trying to do?
> The program is working for short lists but not for
> longer ones.

What do you mean it isn't working? Does it not finish? Is it
too slow? Or does it give the wrong answer?

If it is giving a wrong answer, there is a bug in your
code. If it is running too slowly, chances are there is
a better algorithm that will fix it.
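For example, if those millions of comparisons really boil down to "have I
seen this element before?", a dictionary keyed on a tuple of each element
does the whole job in a single pass. A minimal sketch (the toy data and the
duplicate test are made up; adapt them to whatever your comparison actually
is):

def find_duplicates(rows):
    # rows is a list of lists; lists aren't hashable, so use
    # tuple(row) as the dictionary key.
    seen = {}
    dupes = []
    for row in rows:
        key = tuple(row)
        if key in seen:
            dupes.append(row)
        else:
            seen[key] = True
    return dupes

print find_duplicates([[1, 2], [3, 4], [1, 2]])    # prints [[1, 2]]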
 
S

Steve

Lupe said:
I just would like to know if there is any limit to a list or tuple.

Yes. Since the universe only contains something like
10**85 particles of matter, the upper limit to a list
is roughly 2**(10**85) bits.

Sorry, I couldn't resist :-D

You can store as many things in a list as you have
memory for. I suppose there is probably some
fundamental limit due to 32 bit addressing issues, but
are you sure you are really storing so many items you
are hitting that limit? I'd guess you aren't.

For example, the other day I was playing around with
Python, and just for a lark I created a list with
300,000,000 instances of None. It took about five
minutes, but it worked :)

Going back to your original problem:
> Each time the function is called, it compares an
> element of a list (which is a list too) to other
> elements, which all amounts to millions of
> comparisons.

Are you sure there isn't a more efficient algorithm for
what you are trying to do?
> The program is working for short lists but not for
> longer ones.

What do you mean it is working? It doesn't finish? It
is too slow? Or it gives the wrong answer?

If it is giving a wrong answer, there is a bug in your
code. If it is running too slowly, chances are there is
a better algorithm that will fix it.
 
