itertools.ilen?

Terry Reedy

Jeremy Fincher said:
Sometimes I find myself simply wanting the length of an iterator.

An iterator is a function/method that traverses (or possibly
generates) a sequence. The sequence has a length (actual or
potential) but the iterator does not.

Jeremy Fincher said:
For example, to collect some (somewhat useless ;))
statistics about a program of mine, I've got code like this:

objs = gc.get_objects()
classes = len([obj for obj in objs if inspect.isclass(obj)])
functions = len([obj for obj in objs if inspect.isroutine(obj)])
modules = len([obj for obj in objs if inspect.ismodule(obj)])
dicts = len([obj for obj in objs if type(obj) == types.DictType])
lists = len([obj for obj in objs if type(obj) == types.ListType])
tuples = len([obj for obj in objs if type(obj) ==
types.TupleType])

Alternative: initialize six counters to 0. Scan list once and update
appropriate counter.

Jeremy Fincher said:
Now, obviously I can (and will, now that 2.3 is officially released :))
replace the list comprehensions with itertools.ifilter, but I need an
itertools.ilen to find the length of such iterators.

You mean the associated sequence.

Jeremy Fincher said:
I can imagine such a need arises in more useful situations than this, but
this is the particular case that brought the need to mind.

The Python code is simple, obviously:

def ilen(iterator):
    i = 0
    for _ in iterator:
        i += 1
    return i

But it's a pity to use itertools' super-fast iterators and have to use slow,
raw Python to determine their length :)

If you mean a c-coded counter (which would not be an iterator itself)
equivalent to the above, that could be done. Perhaps len() could be
upgraded/extended to accept an iterator and count when it can't get a
__len__ method to call. The main downside is that iterators are
sometimes destructive (run once only).
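
The "run once only" caveat is easy to demonstrate. The following is a minimal sketch in present-day Python 3, where the built-in filter is lazy in the way itertools.ifilter was in 2.3. The helper name ilen comes from the thread; everything else is illustrative.

```python
def ilen(iterable):
    """Count items by consuming the iterable (pure-Python sketch)."""
    return sum(1 for _ in iterable)

# filter() returns a one-shot iterator in Python 3.
evens = filter(lambda x: x % 2 == 0, range(10))

first_count = ilen(evens)   # walks the iterator to its end
second_count = ilen(evens)  # nothing is left to count

print(first_count, second_count)
```

The first call reports 5 and the second reports 0, because counting exhausted the iterator. A len() that counted silently would have the same side effect.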

In the meanwhile, is this really a bottleneck for you? or merely the
'pity' of a program running in 1 sec when 0.1 is possible?

Terry J. Reedy
 
Jeremy Fincher

Sometimes I find myself simply wanting the length of an iterator. For
example, to collect some (somewhat useless ;)) statistics about a program
of mine, I've got code like this:

objs = gc.get_objects()
classes = len([obj for obj in objs if inspect.isclass(obj)])
functions = len([obj for obj in objs if inspect.isroutine(obj)])
modules = len([obj for obj in objs if inspect.ismodule(obj)])
dicts = len([obj for obj in objs if type(obj) == types.DictType])
lists = len([obj for obj in objs if type(obj) == types.ListType])
tuples = len([obj for obj in objs if type(obj) == types.TupleType])

Now, obviously I can (and will, now that 2.3 is officially released :))
replace the list comprehensions with itertools.ifilter, but I need an
itertools.ilen to find the length of such iterators.

I can imagine such a need arises in more useful situations than this, but
this is the particular case that brought the need to mind.

The Python code is simple, obviously:

def ilen(iterator):
    i = 0
    for _ in iterator:
        i += 1
    return i

But it's a pity to use itertools' super-fast iterators and have to use slow,
raw Python to determine their length :)

Jeremy
 
Michael Peuser

Terry Reedy said:
An iterator is a function/method that traverses (or possibly
generates) a sequence. The sequence has a length (actual or
potential) but the iterator does not.

Very well explained. There are lots of useful generators with unlimited
sequences.

- random generators

- def achilles():
      n = 1.
      while 1:
          yield n
          n = n / 2

- def schoenberg():
      from random import shuffle
      cycle = range(12)
      while 1:
          shuffle(cycle)
          for i in cycle:
              yield i


There is no way to determine whether such generators will come to an
end - The Halting Problem for Turing Machines ;-)
Thus there will never be a safe len(iterator).

Kindly
Michael
 
Michael Peuser

Another solution could be to implement custom length methods. However I see
no graceful way to do it with the quite tricky implementation (yield is the
only hint!) of 2.3.

It would definitely be easy with 2.2 "by hand" function factories (def
__iter__(), def next()): just def __len__() in addition and find the fastest
implementation.
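
A hand-written iterator that also reports how many items remain might look like the sketch below. It is written for Python 3, where the method is spelled __next__ (in 2.2 it was next). The class name Countdown is made up for illustration.

```python
class Countdown:
    """Iterator over start, start-1, ..., 1 that knows its remaining length."""

    def __init__(self, start):
        self.remaining = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.remaining <= 0:
            raise StopIteration
        value = self.remaining
        self.remaining -= 1
        return value

    def __len__(self):
        # Number of items not yet produced; shrinks as the iterator advances.
        return self.remaining


c = Countdown(3)
length_before = len(c)   # nothing consumed yet
first_item = next(c)     # consumes one item
length_after = len(c)    # one fewer remains
```

This only works when the iterator can compute its remaining length cheaply, which is not the case for a generator function.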

Kindly
Michael
 
Jeremy Fincher

Terry said:
An iterator is a function/method that traverses (or possibly
generates) a sequence. The sequence has a length (actual or
potential) but the iterator does not.

Even some sequences don't have a length; consider (Lisp terminology)
"improper lists," where the cdr points to a cell earlier in the list. Or
any class with a somehow non-terminating __len__.

Terry said:
Alternative: initialize six counters to 0. Scan list once and update
appropriate counter.

Yes, that works in this particular case, and is probably a superior
solution.

Terry said:
If you mean a c-coded counter (which would not be an iterator itself)
equivalent to the above, that could be done. Perhaps len() could be
upgraded/extended to accept an iterator and count when it can't get a
__len__ method to call. The main downside is that iterators are
sometimes destructive (run once only).

That's why I don't think such a change should be made to len(); *all*
iterators are destructive and len() silently destroying them doesn't seem
generally useful enough to justify the potential for mistakes.

Terry said:
In the meanwhile, is this really a bottleneck for you? or merely the
'pity' of a program running in 1 sec when 0.1 is possible?

The whole of itertools really seems to exist because of the "pity" of taking
efficient iterators and turning them into lists in order to do any
significant manipulation of them. In that case, I would imagine the pity
of having to turn an iterator into a sequence in order to determine the
length of the underlying sequence would be reason enough.

Jeremy
 
Jeremy Fincher

Michael said:
There is no way to determine whether such generators will come to an
end - The Halting Problem for Turing Machines ;-)
Thus there will never be a safe len(iterator).

But then, there's no way to determine whether any given class' __len__ will
terminate, so you've got the same problem with len.

Granted, it's more likely to manifest itself with iterators and ilen than
with sequences and len, but if it's really an issue, ilen could take an
optional "max" argument for declaring a counter ilen isn't to exceed.
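
One reading of the optional "max" argument is a cap after which counting simply stops. The sketch below makes that assumption and is written for Python 3. The parameter is called limit here to avoid shadowing the built-in max, and returning the cap itself (rather than raising an error) is a choice made for this example, not something the thread specifies.

```python
from itertools import count, islice

def ilen(iterable, limit=None):
    """Count items, stopping after `limit` items if a limit is given."""
    if limit is not None:
        # islice stops pulling items once `limit` of them have been seen.
        iterable = islice(iterable, limit)
    return sum(1 for _ in iterable)

finite = ilen(iter(range(5)), limit=1000)  # fewer items than the cap
capped = ilen(count(), limit=1000)         # infinite iterator, cut off at the cap
```

With this version a caller cannot tell "exactly limit items" apart from "more than limit items".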

Jeremy
 
Raymond Hettinger

Jeremy Fincher said:
Sometimes I find myself simply wanting the length of an iterator. For
example, to collect some (somewhat useless ;)) statistics about a program
of mine, I've got code like this:

objs = gc.get_objects()
classes = len([obj for obj in objs if inspect.isclass(obj)])
functions = len([obj for obj in objs if inspect.isroutine(obj)])
modules = len([obj for obj in objs if inspect.ismodule(obj)])
dicts = len([obj for obj in objs if type(obj) == types.DictType])
lists = len([obj for obj in objs if type(obj) == types.ListType])
tuples = len([obj for obj in objs if type(obj) == types.TupleType])

Now, obviously I can (and will, now that 2.3 is officially released :))
replace the list comprehensions with itertools.ifilter, but I need an
itertools.ilen to find the length of such iterators.

I can imagine such a need arises in more useful situations than this, but
this is the particular case that brought the need to mind.

The Python code is simple, obviously:

def ilen(iterator):
    i = 0
    for _ in iterator:
        i += 1
    return i

But it's a pity to use itertools' super-fast iterators and have to use slow,
raw Python to determine their length :)


For your application, it is not hard to build an itertools version:
.... return sum(itertools.imap(predicate, seqn))
.... return x&1 == 0
.... return type(x) == types.TupleType
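
The interpreter session above appears to have lost its def lines. A self-contained sketch of the same idea follows, written for Python 3, where the built-in map is lazy and plays the role of itertools.imap. The function names count_if, is_even and is_tuple are guesses made for illustration, not recovered from the post.

```python
def count_if(predicate, seqn):
    """Count the items of seqn for which predicate returns True."""
    # map() is lazy in Python 3; True and False add up as 1 and 0,
    # so the predicate must return a real bool (or 0/1).
    return sum(map(predicate, seqn))

def is_even(x):
    return x & 1 == 0

def is_tuple(x):
    return type(x) is tuple

even_count = count_if(is_even, range(10))
tuple_count = count_if(is_tuple, [(), [], (1, 2), "s"])
```

No intermediate list is built, and the counting loop runs inside sum() rather than in a Python-level for statement.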


Raymond Hettinger
 
Christos TZOTZIOY Georgiou

Jeremy Fincher said:
objs = gc.get_objects()
classes = len([obj for obj in objs if inspect.isclass(obj)])
functions = len([obj for obj in objs if inspect.isroutine(obj)])
modules = len([obj for obj in objs if inspect.ismodule(obj)])
dicts = len([obj for obj in objs if type(obj) == types.DictType])
lists = len([obj for obj in objs if type(obj) == types.ListType])
tuples = len([obj for obj in objs if type(obj) == types.TupleType])

Another way to count objects:

# code start
import types, gc

type2key = {
    types.ClassType: "classes",
    types.FunctionType: "functions",
    types.MethodType: "functions",
    types.ModuleType: "modules",
    types.DictType: "dicts",
    types.ListType: "lists",
    types.TupleType: "tuples"
}

sums = {
    "classes": 0, "functions": 0, "modules": 0, "dicts": 0,
    "lists": 0, "tuples": 0
}

for obj in gc.get_objects():
    try:
        sums[type2key[type(obj)]] += 1
    except KeyError:
        pass
# code end

This code is intended to be <2.3 compatible.
 
Duncan Booth

Christos TZOTZIOY Georgiou said:
Another way to count objects:

# code start
import types, gc

type2key = {
    types.ClassType: "classes",
    types.FunctionType: "functions",
    types.MethodType: "functions",
    types.ModuleType: "modules",
    types.DictType: "dicts",
    types.ListType: "lists",
    types.TupleType: "tuples"
}

sums = {
    "classes": 0, "functions": 0, "modules": 0, "dicts": 0,
    "lists": 0, "tuples": 0
}

for obj in gc.get_objects():
    try:
        sums[type2key[type(obj)]] += 1
    except KeyError:
        pass
# code end

I'm just curious, why did you decide to map the types to strings instead of
just using the types themselves?
e.g.
if type(obj) not in sums:
    sums[type(obj)] = 1
else:
    sums[type(obj)] += 1

print typ.__name__, count


instance 525
tuple 4273
class 162
getset_descriptor 14
traceback 2
wrapper_descriptor 165
list 258
module 71
instance method 279
function 1222
weakref 18
dict 1647
method_descriptor 82
member_descriptor 75
frame 18
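
Only fragments of the tallying code survive above. In Python 3 the same per-type tally can be sketched with collections.Counter, which did not exist in 2.3. The function name tally_types is made up for this example.

```python
import gc
from collections import Counter

def tally_types(objects):
    """Map each type to the number of objects of exactly that type."""
    return Counter(type(obj) for obj in objects)

if __name__ == "__main__":
    # Print one "name count" line per type, as in the listing above.
    for typ, count in tally_types(gc.get_objects()).items():
        print(typ.__name__, count)
```

Keying on the type object itself means no type has to be listed in advance, at the cost of losing the friendlier plural labels.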
 
Jeremy Fincher

Duncan said:
I'm just curious, why did you decide to map the types to strings instead
of just using the types themselves?

So I can pluralize them in my output.

Jeremy
 
Christos TZOTZIOY Georgiou

Duncan said:
I'm just curious, why did you decide to map the types to strings instead of
just using the types themselves?
e.g.
if type(obj) not in sums:
    sums[type(obj)] = 1
else:
    sums[type(obj)] += 1

Just because the initial code treated functions and methods as the same;
also to be output-friendly. I offered code with similar functionality,
only more concise; it wasn't code for my own use :)
 
