Iterator length


bearophileHUGS

Often I need to know the length of an iterator. Here is a stupid example:

len() isn't able to tell it:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: object of type 'generator' has no len()

This is a bad solution, because it may need too much memory (among other problems):

This is a simple solution in a modern Python:
50
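A sketch of the two approaches just mentioned, assuming Python 3 and a hypothetical 50-item generator (the original snippets are not shown here). The first builds a whole list only to measure it; the second counts items with a generator expression and keeps only a running total:

```python
def make_gen():
    # Hypothetical example generator yielding 50 items.
    return (x * x for x in range(50))

# Builds a full 50-element list in memory, then throws it away.
n_list = len(list(make_gen()))

# Counts items one at a time; only the running sum is kept.
n_sum = sum(1 for _ in make_gen())

print(n_list, n_sum)  # 50 50
```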

This is a faster solution (and Psyco helps even more):

def leniter(iterator):
    """leniter(iterator): return the length of an iterator,
    consuming it."""
    if hasattr(iterator, "__len__"):
        return len(iterator)
    nelements = 0
    for _ in iterator:
        nelements += 1
    return nelements
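An alternative sketch, not the poster's code: it pushes the counting loop into C by zipping the iterator with itertools.count() and draining the pairs into a zero-length collections.deque. This is the same idea used by ilen() in the third-party more_itertools package. The name leniter_fast is hypothetical, and the code assumes Python 3:

```python
from collections import deque
from itertools import count

def leniter_fast(iterator):
    """Return the number of items in iterator, consuming it."""
    if hasattr(iterator, "__len__"):
        return len(iterator)
    counter = count()
    # zip() pulls one item from iterator, then one number from counter,
    # and stops as soon as iterator is exhausted; a deque with maxlen=0
    # discards every pair as it arrives.
    deque(zip(iterator, counter), maxlen=0)
    # counter was advanced once per item, so its next value is the length.
    return next(counter)

print(leniter_fast(x for x in range(1000)))  # 1000
print(leniter_fast(iter([])))                # 0
print(leniter_fast("abc"))                   # 3
```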

Is it a good idea to extend the built-in len function to cover such
situations too?

Bye,
bearophile
 

George Sakkis

Is this a rhetorical question? If not, try this:
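George's snippet is missing here. Assuming his point was that leniter() never returns when given an infinite iterator such as itertools.count(), a safer hypothetical variant (Python 3) caps the count with itertools.islice() and reports when the cap is exceeded:

```python
from itertools import count, islice

def leniter_bounded(iterator, limit):
    """Count up to limit items; return None if there are more than limit.

    Hypothetical helper: it still consumes up to limit + 1 items.
    """
    n = sum(1 for _ in islice(iterator, limit + 1))
    return n if n <= limit else None

print(leniter_bounded(iter(range(10)), 100))  # 10
print(leniter_bounded(count(), 100))          # None (infinite iterator)
```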

George
 

bearophileHUGS

George Sakkis:
Is this a rhetorical question? If not, try this:

It wasn't a rhetorical question.


What's your point? Maybe you mean that it consumes the given iterator?
I am aware of that; it's written in the function docstring too. But
sometimes you don't need the elements of a given iterator, you just
need to know how many elements it has. A very simple example:

s = "aaabbbbbaabbbbbb"
from itertools import groupby
print [(h,leniter(g)) for h,g in groupby(s)]
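The same run-length count in Python 3 syntax, as a sketch that uses a generator expression in place of leniter() so each group is counted without being stored:

```python
from itertools import groupby

s = "aaabbbbbaabbbbbb"
# groupby() yields (key, group_iterator) pairs, one per run of equal
# characters; each group is counted and discarded without building a list.
runs = [(h, sum(1 for _ in g)) for h, g in groupby(s)]
print(runs)  # [('a', 3), ('b', 5), ('a', 2), ('b', 6)]
```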

Bye,
bearophile
 

Ben Finney

But sometimes you don't need the elements of a given iterator, you
just need to know how many elements it has.

AFAIK, the iterator protocol doesn't allow for that.

Bear in mind, too, that there's no way to tell from outside that an
iterator even has a finite length; also, many finite-length iterators
have termination conditions that preclude knowing the number of
iterations until the termination condition actually happens.
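A small illustration of that point (the example is illustrative, not from the thread, and assumes Python 3): the iterator below is finite, but the only way to learn how many items it yields is to run it until takewhile() sees the first square that is not below 100:

```python
from itertools import count, takewhile

# Squares of 0, 1, 2, ... while they stay below 100.
squares = takewhile(lambda x: x < 100, (n * n for n in count()))

# Nothing in the iterator object reveals its length in advance;
# counting it means consuming it up to the termination condition.
n = sum(1 for _ in squares)
print(n)  # 10  (0, 1, 4, ..., 81)
```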
 

Gabriel Genellina

At said:
def leniter(iterator):
    """leniter(iterator): return the length of an iterator,
    consuming it."""
    if hasattr(iterator, "__len__"):
        return len(iterator)
    nelements = 0
    for _ in iterator:
        nelements += 1
    return nelements

Is it a good idea to extend the functionalities of the built-in len
function to cover such situation too?

I don't think so, because it may consume the iterator, and that's a
big side effect that one would not expect from the builtin len().
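The side effect is easy to demonstrate. In the sketch below (Python 3, with a generator expression standing in for any iterator), counting an iterator leaves it empty. itertools.tee() can keep a second copy, but tee buffers every item until the copy reads it, so it gives up the memory savings that motivated leniter() in the first place:

```python
from itertools import tee

it = (c for c in "hello")
n_first = sum(1 for _ in it)   # counting consumes all 5 items...
leftover = list(it)            # ...so nothing is left afterwards
print(n_first, leftover)       # 5 []

it = (c for c in "hello")
counted, kept = tee(it)        # two independent iterators; don't use 'it' again
n_second = sum(1 for _ in counted)  # tee buffers the items for 'kept' meanwhile
kept_items = list(kept)
print(n_second, kept_items)    # 5 ['h', 'e', 'l', 'l', 'o']
```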


--
Gabriel Genellina
Softlab SRL






 

Steven D'Aprano

What's your point? Maybe you mean that it consumes the given iterator?
I am aware of that, it's written in the function docstring too. But
sometimes you don't need the elements of a given iterator, you just
need to know how many elements it has. A very simple example:

s = "aaabbbbbaabbbbbb"
from itertools import groupby
print [(h,leniter(g)) for h,g in groupby(s)]

s isn't an iterator. It's a sequence, a string, and an iterable, but not
an iterator.

I hope you know what sequences and strings are :)

An iterable is anything that can be iterated over -- it includes sequences
and iterators.

An iterator, on the other hand, is something that implements the iterator
protocol: it has a next() method, which raises StopIteration when it's done.
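A minimal class implementing that protocol, as a sketch. One version assumption: in Python 3 the method is spelled __next__() and the built-in next() calls it, whereas the Python 2 code in this thread uses a next() method. The class name Countdown is hypothetical:

```python
class Countdown:
    """Hypothetical iterator yielding start, start-1, ..., 1, then stopping."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator returns itself from __iter__.
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration  # signals that the iterator is exhausted
        value = self.current
        self.current -= 1
        return value

print(list(Countdown(3)))  # [3, 2, 1]
```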
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AttributeError: 'str' object has no attribute 'next'

An iterator should return itself if you pass it to iter():
True
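A sketch of the same check in Python 3, contrasting a string (an iterable) with the iterator obtained from it; the attribute tested is __next__, the Python 3 spelling of next():

```python
s = "aaabbbbbaabbbbbb"
it = iter(s)

print(iter(it) is it)           # True: an iterator returns itself
print(iter(s) is s)             # False: a string hands out a new iterator
print(hasattr(s, "__next__"))   # False: strings are not iterators
print(hasattr(it, "__next__"))  # True
```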

You've said that you understand that taking the len of an iterator will
consume it, and that you don't think that matters. It might not matter
in a tiny percentage of cases, but it will certainly matter the rest of
the time!

And let's not forget, in general you CAN'T calculate the length of an
iterator, not even in theory:

import random

def randnums():
    # Stops only if random() happens to return exactly this value.
    while random.random() != 0.123456789:
        yield "Not finished yet"
    yield "Finished"

What should len(randnums()) return?

One last thing which people forget... iterators can have a length, the
same as any other object, if they have a __len__ method:
16
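As a hypothetical sketch (Python 3; the class name CharIterator is an assumption), here is an iterator over a string that also defines __len__, so the built-in len() works on it. Reporting the number of items still remaining, rather than the original total, is one design choice among several:

```python
class CharIterator:
    """Hypothetical iterator over a string that also supports len()."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.pos >= len(self.text):
            raise StopIteration
        ch = self.text[self.pos]
        self.pos += 1
        return ch

    def __len__(self):
        # Number of items not yet consumed.
        return len(self.text) - self.pos

it = CharIterator("aaabbbbbaabbbbbb")
before = len(it)
next(it)
after = len(it)
print(before, after)  # 16 15
```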

So, if you want the length of an arbitrary iterator, just call len()
and deal with the exception.
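A sketch of that advice in Python 3; the helper name len_or_none is hypothetical, and returning None for objects without __len__ is one choice among several:

```python
def len_or_none(obj):
    """Return len(obj), or None if obj does not support len()."""
    try:
        return len(obj)
    except TypeError:
        # Generators and most other iterators raise TypeError here;
        # the failed len() call does not consume anything.
        return None

print(len_or_none([1, 2, 3]))            # 3
print(len_or_none(x for x in range(3)))  # None
```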
 

bearophileHUGS

Steven D'Aprano:
s = "aaabbbbbaabbbbbb"
from itertools import groupby
print [(h,leniter(g)) for h,g in groupby(s)]

s isn't an iterator. It's a sequence, a string, and an iterable, but not
an iterator.

If you look more closely you can see that I use leniter() on g, not on s.
g is the iterator whose length I need to compute.

I hope you know what sequences and strings are :)

Well, I still know little about the C implementation of CPython
iterators :)

But I agree with the rest of what you say: iterators can be very
general things, and there are too many drawbacks and dangers, so it's
better to keep leniter() as a function separate from len(), for
specialized use.

Bye and thank you,
bearophile
 

Steven D'Aprano

Steven D'Aprano:
s = "aaabbbbbaabbbbbb"
from itertools import groupby
print [(h,leniter(g)) for h,g in groupby(s)]

s isn't an iterator. It's a sequence, a string, and an iterable, but not
an iterator.

If you look better you can see that I use the leniter() on g, not on s.
g is the iterator I need to compute the len of.


Oops, yes you're right. But since g is not an arbitrary iterator, one can
easily do this:

print [(h,len(list(g))) for h,g in groupby(s)]

No need for a special function.


Well, I know little still about the C implementation of CPython
iterators :)

But I agree with the successive things you say, iterators may be very
general things, and there are too many drawbacks/dangers, so it's
better to keep leniter() as a function separated from len(), with
specialized use.

I don't think it's better to have leniter() at all. If you, the iterator
creator, know enough about the iterator to be sure it has a predictable
length, you know how to calculate it. Otherwise, iterators in general
don't have a predictable length even in principle.
 

bearophileHUGS

Steven D'Aprano:
since g is not an arbitrary iterator, one can easily do this:
print [(h,len(list(g))) for h,g in groupby(s)]
No need for a special function.

If you look at my first post you can see that I showed that solution
too, but it creates a list that may be long and use a lot of memory,
only to throw it away each time. I think that's a bad solution. It also
goes against the philosophy of iterators, which were created partly to
avoid building full lists of items.

If you, the iterator
creator, know enough about the iterator to be sure it has a predictable
length, you know how to calculate it.

I don't agree: sometimes I know an iterator is finite, but I don't know
how many elements it yields (and sometimes there are a lot of them).
See the simple example with groupby.

Bye,
bearophile
 
