itertools.ilen?

Discussion in 'Python' started by Terry Reedy, Aug 7, 2003.

  1. "Jeremy Fincher" <fincher.*@osu.edu> wrote in message
    news:bgsqj5$228$-state.edu...
    > Sometimes I find myself simply wanting the length of an iterator.


    An iterator is a function/method that traverses (or possibly
    generates) a sequence. The sequence has a length (actual or
    potential) but the iterator does not.

    > For example, to collect some (somewhat useless ;))
    > statistics about a program of mine, I've got code like this:
    >
    > objs = gc.get_objects()
    > classes = len([obj for obj in objs if inspect.isclass(obj)])
    > functions = len([obj for obj in objs if inspect.isroutine(obj)])
    > modules = len([obj for obj in objs if inspect.ismodule(obj)])
    > dicts = len([obj for obj in objs if type(obj) == types.DictType])
    > lists = len([obj for obj in objs if type(obj) == types.ListType])
    > tuples = len([obj for obj in objs if type(obj) == types.TupleType])

    Alternative: initialize six counters to 0. Scan the list once and update
    the appropriate counter.

    > Now, obviously I can (and will, now that 2.3 is officially released :))
    > replace the list comprehensions with itertools.ifilter, but I need an
    > itertools.ilen to find the length of such iterators.

    You mean the associated sequence.

    > I can imagine such a need arises in more useful situations than this, but
    > this is the particular case that brought the need to mind.
    >
    > The Python code is simple, obviously:
    >
    > def ilen(iterator):
    >     i = 0
    >     for _ in iterator:
    >         i += 1
    >     return i
    >
    > But it's a pity to use itertools' super-fast iterators and have to use slow,
    > raw Python to determine their length :)


    If you mean a c-coded counter (which would not be an iterator itself)
    equivalent to the above, that could be done. Perhaps len() could be
    upgraded/extended to accept an iterator and count when it can't get a
    __len__ method to call. The main downside is that iterators are
    sometimes destructive (run once only).

    In the meanwhile, is this really a bottleneck for you? or merely the
    'pity' of a program running in 1 sec when 0.1 is possible?

    Terry J. Reedy
    Terry Reedy, Aug 7, 2003
    #1

  2. Sometimes I find myself simply wanting the length of an iterator. For
    example, to collect some (somewhat useless ;)) statistics about a program
    of mine, I've got code like this:

    objs = gc.get_objects()
    classes = len([obj for obj in objs if inspect.isclass(obj)])
    functions = len([obj for obj in objs if inspect.isroutine(obj)])
    modules = len([obj for obj in objs if inspect.ismodule(obj)])
    dicts = len([obj for obj in objs if type(obj) == types.DictType])
    lists = len([obj for obj in objs if type(obj) == types.ListType])
    tuples = len([obj for obj in objs if type(obj) == types.TupleType])

    Now, obviously I can (and will, now that 2.3 is officially released :))
    replace the list comprehensions with itertools.ifilter, but I need an
    itertools.ilen to find the length of such iterators.

    I can imagine such a need arises in more useful situations than this, but
    this is the particular case that brought the need to mind.

    The Python code is simple, obviously:

    def ilen(iterator):
        i = 0
        for _ in iterator:
            i += 1
        return i

    But it's a pity to use itertools' super-fast iterators and have to use slow,
    raw Python to determine their length :)
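
    For readers on a current Python, here is a minimal sketch of an ilen that
    keeps the counting loop out of Python bytecode. It assumes Python 3 (lazy
    zip) and collections.deque, neither of which existed in 2.3, so it is an
    illustration of the idea and not something that was available when this
    thread was written. The name ilen is taken from the post above.

```python
from collections import deque
from itertools import count


def ilen(iterable):
    """Count the items in iterable, consuming it in the process."""
    counter = count()
    # zip pulls one item from iterable, then one from counter, per step.
    # A zero-length deque throws the pairs away as they arrive.
    deque(zip(iterable, counter), maxlen=0)
    # counter was advanced once per item, so its next value is the count.
    return next(counter)


evens = ilen(n for n in range(10) if n % 2 == 0)
```

    Because zip asks the first argument for an item before touching the
    counter, the counter is not advanced when the iterable runs out.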

    Jeremy
    Jeremy Fincher, Aug 7, 2003
    #2

  3. "Terry Reedy" <> wrote in message
    news:...
    >
    > "Jeremy Fincher" <fincher.*@osu.edu> wrote in message
    > news:bgsqj5$228$-state.edu...
    > > Sometimes I find myself simply wanting the length of an iterator.
    >
    > An iterator is a function/method that traverses (or possibly
    > generates) a sequence. The sequence has a length (actual or
    > potential) but the iterator does not.
    >


    Very well explained. There are lots of useful generators with unlimited
    sequences.

    - random generators

    - def achilles():
          N = 1.
          while 1:
              yield N
              N = N / 2

    - def schoenberg():
          # assumes: from random import shuffle
          cycle = range(12)
          while 1:
              shuffle(cycle)
              for i in cycle:
                  yield i


    There is no way to determine whether such generators will come to an
    end - The Halting Problem for Turing Machines ;-)
    Thus there will never be a safe len(iterator).

    Kindly
    Michael
    Michael Peuser, Aug 7, 2003
    #3
  4. Another solution could be to implement custom length methods. However I see
    no graceful way to do it with the quite tricky implementation (yield is the
    only hint!) of 2.3.

    It would definitely be easy with 2.2 "by hand" function factories (def
    __iter__(), def next()), just def len() in addition and find the fastest
    implementation.

    Kindly
    Michael

    "Jeremy Fincher" <fincher.*@osu.edu> wrote in message
    news:bgt56e$7no$-state.edu...
    > Michael Peuser wrote:
    > > There is no way to determine whether such generators will come to an
    > > end - The Halting Problem for Turing Machines ;-)
    > > Thus there will never be a safe len(iterator).
    >
    > But then, there's no way to determine whether any given class' __len__ will
    > terminate, so you've got the same problem with len.
    >
    > Granted, it's more likely to manifest itself with iterators and ilen than
    > with sequences and len, but if it's really an issue, ilen could take an
    > optional "max" argument for declaring a counter ilen isn't to exceed.
    >
    > Jeremy
    Michael Peuser, Aug 7, 2003
    #4
  5. Terry Reedy wrote:
    > An iterator is a function/method that traverses (or possibly
    > generates) a sequence. The sequence has a length (actual or
    > potential) but the iterator does not.


    Even some sequences don't have a length; consider (Lisp terminology)
    "improper lists," where the cdr points to a cell earlier in the list. Or
    any class with a somehow non-terminating __len__.

    > Alternative: initialize six counters to 0. Scan list once and update
    > appropriate counter.


    Yes, that works in this particular case, and is probably a superior
    solution.
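
    The single-pass alternative might look roughly like the sketch below. It
    is written for Python 3, so the exact-type checks use dict, list and tuple
    in place of types.DictType and friends; the function name count_kinds and
    the dictionary keys are made up for illustration.

```python
import gc
import inspect


def count_kinds(objs):
    """Tally six categories in one pass over objs."""
    counts = {"classes": 0, "functions": 0, "modules": 0,
              "dicts": 0, "lists": 0, "tuples": 0}
    for obj in objs:
        # No object passes more than one of these tests, so elif is enough.
        if inspect.isclass(obj):
            counts["classes"] += 1
        elif inspect.isroutine(obj):
            counts["functions"] += 1
        elif inspect.ismodule(obj):
            counts["modules"] += 1
        elif type(obj) is dict:
            counts["dicts"] += 1
        elif type(obj) is list:
            counts["lists"] += 1
        elif type(obj) is tuple:
            counts["tuples"] += 1
    return counts


stats = count_kinds(gc.get_objects())
```

    This walks the object list once where the original walks it six times,
    and it never builds the intermediate lists.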

    > If you mean a c-coded counter (which would not be an iterator itself)
    > equivalent to the above, that could be done. Perhaps len() could be
    > upgraded/extended to accept an iterator and count when it can't get a
    > __len__ method to call. The main downside is that iterators are
    > sometimes destructive (run once only).


    That's why I don't think such a change should be made to len(); *all*
    iterators are destructive and len() silently destroying them doesn't seem
    generally useful enough for the potential for mistake.
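
    A small illustration of the point about destructiveness, assuming Python 3
    and the pure-Python ilen from the original post: counting a generator uses
    it up, so a second count (or any later loop over it) sees nothing.

```python
def ilen(iterator):
    i = 0
    for _ in iterator:
        i += 1
    return i


evens = (n for n in range(10) if n % 2 == 0)

first = ilen(evens)      # walks the generator to the end
second = ilen(evens)     # the generator is already exhausted
leftover = list(evens)   # nothing left for later consumers either
```

    Here first is 5, while second is 0 and leftover is empty.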

    > In the meanwhile, is this really a bottleneck for you? or merely the
    > 'pity' of a program running in 1 sec when 0.1 is possible?


    The whole of itertools really seems to exist because of the "pity" of taking
    efficient iterators and turning them into lists in order to do any
    significant manipulation of them. In that case, I would imagine the pity
    of having to turn an iterator into a sequence in order to determine the
    length of the underlying sequence would be reason enough.

    Jeremy
    Jeremy Fincher, Aug 7, 2003
    #5
  6. Michael Peuser wrote:
    > There is no way to determine whether such generators will come to an
    > end - The Halting Problem for Turing Machines ;-)
    > Thus there will never be a safe len(iterator).


    But then, there's no way to determine whether any given class' __len__ will
    terminate, so you've got the same problem with len.

    Granted, it's more likely to manifest itself with iterators and ilen than
    with sequences and len, but if it's really an issue, ilen could take an
    optional "max" argument for declaring a counter ilen isn't to exceed.
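
    One way the optional "max" argument could work, sketched for Python 3.
    The post does not say what should happen when the limit is exceeded, so
    raising ValueError is an assumption made here; returning the limit would
    be another reasonable choice.

```python
from itertools import count, islice


def ilen(iterable, max=None):
    """Count items, refusing to go past max (shadows the builtin on purpose
    to match the argument name suggested in the post)."""
    if max is not None:
        # Look at no more than max + 1 items, so an endless iterator
        # cannot keep the loop running forever.
        iterable = islice(iterable, max + 1)
    n = 0
    for _ in iterable:
        n += 1
    if max is not None and n > max:
        raise ValueError("iterator yielded more than %d items" % max)
    return n


try:
    ilen(count(), max=1000)   # count() never ends on its own
except ValueError as exc:
    message = str(exc)
```

    With the limit in place, the call on the endless count() returns control
    after 1001 items.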

    Jeremy
    Jeremy Fincher, Aug 7, 2003
    #6
  7. "Jeremy Fincher"
    > Sometimes I find myself simply wanting the length of an iterator. For
    > example, to collect some (somewhat useless ;)) statistics about a program
    > of mine, I've got code like this:
    >
    > objs = gc.get_objects()
    > classes = len([obj for obj in objs if inspect.isclass(obj)])
    > functions = len([obj for obj in objs if inspect.isroutine(obj)])
    > modules = len([obj for obj in objs if inspect.ismodule(obj)])
    > dicts = len([obj for obj in objs if type(obj) == types.DictType])
    > lists = len([obj for obj in objs if type(obj) == types.ListType])
    > tuples = len([obj for obj in objs if type(obj) == types.TupleType])
    >
    > Now, obviously I can (and will, now that 2.3 is officially released :))
    > replace the list comprehensions with itertools.ifilter, but I need an
    > itertools.ilen to find the length of such iterators.
    >
    > I can imagine such a need arises in more useful situations than this, but
    > this is the particular case that brought the need to mind.
    >
    > The Python code is simple, obviously:
    >
    > def ilen(iterator):
    >     i = 0
    >     for _ in iterator:
    >         i += 1
    >     return i
    >
    > But it's a pity to use itertools' super-fast iterators and have to use slow,
    > raw Python to determine their length :)



    For your application, it is not hard to build an itertools version:

    >>> import itertools, types
    >>> def countif(predicate, seqn):
    ...     return sum(itertools.imap(predicate, seqn))
    ...
    >>> def isEven(x):
    ...     return x&1 == 0
    ...
    >>> countif(isEven, xrange(1000000))
    500000
    >>> def isTuple(x):
    ...     return type(x) == types.TupleType
    ...

    >>> tuples = countif(isTuple, objs)
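
    For anyone reading this on Python 3, a rough translation of the same idea:
    itertools.imap and xrange are gone, and the builtin map and range are
    already lazy. The snake_case predicate names are my own. As in the
    session above, this relies on the predicate returning True/False (or 1/0),
    since sum adds the results directly.

```python
def countif(predicate, iterable):
    # map is lazy in Python 3, so no intermediate list is built.
    return sum(map(predicate, iterable))


def is_even(x):
    return x & 1 == 0


def is_tuple(x):
    return type(x) is tuple


even_count = countif(is_even, range(1000000))
tuple_count = countif(is_tuple, [(), [], (1, 2), {}, "abc"])
```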



    Raymond Hettinger
    Raymond Hettinger, Aug 7, 2003
    #7
  8. On Thu, 07 Aug 2003 03:10:10 -0400, rumours say that Jeremy Fincher
    <fincher.*@osu.edu> might have written:

    > objs = gc.get_objects()
    > classes = len([obj for obj in objs if inspect.isclass(obj)])
    > functions = len([obj for obj in objs if inspect.isroutine(obj)])
    > modules = len([obj for obj in objs if inspect.ismodule(obj)])
    > dicts = len([obj for obj in objs if type(obj) == types.DictType])
    > lists = len([obj for obj in objs if type(obj) == types.ListType])
    > tuples = len([obj for obj in objs if type(obj) == types.TupleType])


    Another way to count objects:

    # code start
    import types, gc

    type2key = {
        types.ClassType: "classes",
        types.FunctionType: "functions",
        types.MethodType: "functions",
        types.ModuleType: "modules",
        types.DictType: "dicts",
        types.ListType: "lists",
        types.TupleType: "tuples"
    }

    sums = {
        "classes": 0, "functions": 0, "modules": 0, "dicts": 0,
        "lists": 0, "tuples": 0
    }

    for obj in gc.get_objects():
        try:
            sums[type2key[type(obj)]] += 1
        except KeyError:
            pass
    # code end

    This code is intended to be <2.3 compatible.
    --
    TZOTZIOY, I speak England very best,
    Microsoft Security Alert: the Matrix began as open source.
    Christos TZOTZIOY Georgiou, Aug 7, 2003
    #8
  9. Christos "TZOTZIOY" Georgiou <> wrote in
    news::

    > Another way to count objects:
    >
    > # code start
    > import types, gc
    >
    > type2key = {
    >     types.ClassType: "classes",
    >     types.FunctionType: "functions",
    >     types.MethodType: "functions",
    >     types.ModuleType: "modules",
    >     types.DictType: "dicts",
    >     types.ListType: "lists",
    >     types.TupleType: "tuples"
    > }
    >
    > sums = {
    >     "classes": 0, "functions": 0, "modules": 0, "dicts": 0,
    >     "lists": 0, "tuples": 0
    > }
    >
    > for obj in gc.get_objects():
    >     try:
    >         sums[type2key[type(obj)]] += 1
    >     except KeyError:
    >         pass
    > # code end
    >


    I'm just curious, why did you decide to map the types to strings instead of
    just using the types themselves?
    e.g.

    >>> import gc
    >>> sums = {}
    >>> for obj in gc.get_objects():
    ...     if type(obj) not in sums:
    ...         sums[type(obj)] = 1
    ...     else:
    ...         sums[type(obj)] += 1
    ...
    >>> for typ, count in sums.iteritems():
    ...     print typ.__name__, count
    ...
    instance 525
    tuple 4273
    class 162
    getset_descriptor 14
    traceback 2
    wrapper_descriptor 165
    list 258
    module 71
    instance method 279
    function 1222
    weakref 18
    dict 1647
    method_descriptor 82
    member_descriptor 75
    frame 18
    >>>
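
    On a current Python the same tally can be written with
    collections.Counter, which did not exist in 2.3. This is only a sketch of
    the equivalent idea; the helper name count_types is hypothetical, and the
    actual numbers will differ from the session above on every run.

```python
import gc
from collections import Counter


def count_types(objs):
    """Tally objects by the name of their exact type in a single pass."""
    return Counter(type(obj).__name__ for obj in objs)


sums = count_types(gc.get_objects())

# Show the five most common type names and their counts.
for name, n in sums.most_common(5):
    print(name, n)
```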


    --
    Duncan Booth
    int month(char *p){return(124864/((p[0]+p[1]-p[2]&0x1f)+1)%12)["\5\x8\3"
    "\6\7\xb\1\x9\xa\2\0\4"];} // Who said my code was obscure?
    Duncan Booth, Aug 8, 2003
    #9
  10. Duncan Booth wrote:
    > I'm just curious, why did you decide to map the types to strings instead
    > of just using the types themselves?


    So I can pluralize them in my output.

    Jeremy
    Jeremy Fincher, Aug 8, 2003
    #10
  11. On Fri, 8 Aug 2003 10:02:39 +0000 (UTC), rumours say that Duncan Booth
    <> might have written:

    >I'm just curious, why did you decide to map the types to strings instead of
    >just using the types themselves?
    >e.g.
    >
    >>>> import gc
    >>>> sums = {}
    >>>> for obj in gc.get_objects():
    >     if type(obj) not in sums:
    >         sums[type(obj)] = 1
    >     else:
    >         sums[type(obj)] += 1


    Just because the initial code treated functions and methods as the same,
    and also to be output-friendly. I offered code with similar functionality,
    only more concise; it wasn't code for my own use :)
    --
    TZOTZIOY, I speak England very best,
    Microsoft Security Alert: the Matrix began as open source.
    Christos TZOTZIOY Georgiou, Aug 19, 2003
    #11