itertools.flatten()? and copying generators/iterators.

Discussion in 'Python' started by Francis Avila, Oct 28, 2003.

  1. Below is an implementation of a 'flattening' recursive generator (take a nested
    iterator and remove all its nesting). Is this possibly general and useful
    enough to be included in itertools? (I know *I* wanted something like it...)

    Very basic examples:

    >>> rl = [1, [2, 3, [4, 5], '678', 9]]
    >>> list(flatten(rl))
    [1, 2, 3, 4, 5, '6', '7', '8', 9]
    >>> notstring = lambda obj: not isinstance(obj, type(''))
    >>> list(flatten(rl, notstring))
    [1, 2, 3, 4, 5, '678', 9]
    >>> isstring = lambda obj: not notstring(obj)
    >>> list(flatten(rl, isstring))
    [1, [2, 3, [4, 5], '678', 9]]
    >>> # The string is within a list, so we never descend that far.
    >>> car_is_2 = lambda obj: isinstance(obj, type([])) and obj[0] == 2
    >>> list(flatten(rl, car_is_2))
    [1, 2, 3, [4, 5], '678', 9]
    >>> rls = ['Here', 'are', ['some', ['nested'], 'strings']]
    >>> list(flatten(rls))
    ['H', 'e', 'r', 'e', 'a', 'r', 'e', 's', 'o', 'm', 'e', 'n', 'e', 's', 't',
    'e', 'd', 's', 't', 'r', 'i', 'n', 'g', 's']
    >>> list(flatten(rls, notstring))
    ['Here', 'are', 'some', 'nested', 'strings']
    >>> rli = iter([1, 2, iter(['abc', iter('ABC')]), 4])
    >>> list(flatten(rli))
    [1, 2, 'a', 'b', 'c', 'A', 'B', 'C', 4]
    >>> list(flatten(rli, notstring))
    []
    >>> # rli is an iterator, remember!
    >>> rli = iter([1, 2, iter(['abc', iter('ABC')]), 4])
    >>> list(flatten(rli, notstring))
    [1, 2, 'abc', 'A', 'B', 'C', 4]
    >>> # The following I'm not sure what to do about...
    >>> empty = [1, [], 3]
    >>> emptyiter = [1, iter([]), 3]
    >>> list(flatten(empty))
    [1, [], 3]
    >>> list(flatten(emptyiter))
    [1, 3]
    >>>


    I tried hard to get it to work with iterator and generator objects, too, and
    it mostly does. However, I'm having some problems determining whether a
    given object will iterate infinitely, if that object is already a
    generator/iterator. Basically, I'm unable to copy an iterator (why?). See
    isemptyorself() below for details. Aside from that, I'm just generally
    unsure what the proper behavior should be when an iterator/generator is
    encountered.

    Also, why is the iterator type not included in the types module or described
    in the language reference (Python 2.2)?

    --- Code ---


    def isiterable(obj):
        try: iter(obj)
        except TypeError: return False
        else: return True

    def isemptyorself(iterable):
        """True if iterable yields nothing or itself."""
        it = iter(iterable)

        # KLUDGE! This test must modify the object in order to test
        # it. This means that a value of iterable will be discarded!
        # Most likely, iterable is itself an iterator or generator,
        # because id(iter(GENR or ITER)) == id(GENR or ITER).
        # Unfortunately, we can't copy generators and iterators using
        # the copy module, so we must just assume that this iterator
        # doesn't yield itself or nothing....

        if it is iterable:
            return False

        try: res = it.next()
        except StopIteration: # Yields nothing
            return True
        else:
            if res == iterable: # Yields itself
                return True
        return False

    def flatten(iterable, isnested=None):
        """Iterate items in iterable, descending into nested items.

        isnested is a function that returns true if the element of
        iterable should be descended into. The default is to
        consider nested anything that iter() thinks is iterable (unless
        doing so would cause an infinite recursion).

        """
        if isnested is None:
            isnested = lambda obj: True # Always descend

        for item in iterable:
            if isiterable(item) and not isemptyorself(item) \
               and isnested(item):
                for subitem in flatten(item, isnested):
                    yield subitem
            else:
                yield item


    --
    Francis Avila
     
    Francis Avila, Oct 28, 2003
    #1

  2. [Francis Avila]
    > Below is an implementation of a 'flattening' recursive generator (take a nested
    > iterator and remove all its nesting). Is this possibly general and useful
    > enough to be included in itertools? (I know *I* wanted something like it...)


    Interesting post!

    I'll add your idea to the list of itertool suggestions received so far.
    My first take is that it is more likely to be added to the "recipes" in
    the examples section.

    Core itertools should be primitive building blocks that combine with one
    another to make other tools. Also, I'm trying to keep the toolset as small
    as possible by excluding new tools that can be easily and efficiently coded
    in pure python.

    I would code it like this:

    def flatten(s):
        try:
            iter(s)
        except TypeError:
            yield s
        else:
            for elem in s:
                for subelem in flatten(elem):
                    yield subelem

    As your examples show, it does have some surprising behavior in that
    strings get split apart instead of staying intact. That is easily taken care of
    by an AtomicString subclass:

    class AtomicString(str):
        def __iter__(self):
            raise TypeError

    a = [1, 2, AtomicString('abc'), 4]
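
    For instance (an illustrative check, not part of the original post), flattening
    that list leaves the AtomicString whole:

    >>> list(flatten(a))
    [1, 2, 'abc', 4]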



    > >>> # The following I'm not sure what to do about...
    > >>> empty = [1, [], 3]
    > >>> emptyiter = [1, iter([]), 3]


    The above code definition handles this without a problem.
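
    For illustration (not from the original post), both empty containers simply
    contribute nothing to the output, unlike the original flatten(), which yielded
    the empty list itself:

    >>> list(flatten([1, [], 3]))
    [1, 3]
    >>> list(flatten([1, iter([]), 3]))
    [1, 3]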



    > I'm having some problems determining whether a
    > given object will iterate infinitely, if that object is already a
    > generator/iterator.


    In general, that is not knowable.



    > Basically, I'm unable to copy an iterator (why?).


    Alex Martelli just wrote a PEP about making many iterators copyable.
    See www.python.org/peps/pep-0323.html

    Until that is adopted, your best bet is to use tee():

    from itertools import count

    def tee(iterable):
        "Return two independent iterators from a single iterable"
        def gen(next, data={}, cnt=[0]):
            dpop = data.pop
            for i in count():
                if i == cnt[0]:
                    item = data[i] = next()
                    cnt[0] += 1
                else:
                    item = dpop(i)
                yield item
        next = iter(iterable).next
        return (gen(next), gen(next))
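
    A small usage sketch (mine, not part of the recipe), assuming the tee() above
    and a finite iterator:

    >>> it = iter([1, 2, 3])
    >>> a, b = tee(it)
    >>> a.next(), a.next(), b.next()
    (1, 2, 1)
    >>> list(b)
    [2, 3]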


    > Also, why is the iterator type not included in the types module or described
    > in the language reference (Python 2.2)?


    There is no iterator type. Iterators can be any object that supports the
    iterator protocol: __iter__() and next(). Generators, container iterators,
    and each of the itertools define their own type.


    This was a long but highly instructive pair of posts. I hope it becomes
    widely read. Your work ventured into previously uncharted
    territory and touched on a number of frontier issues like copyability,
    determining whether something is iterable, the various iterator types,
    and the deep mathematical subject of determining whether a given
    process is finite.


    Raymond Hettinger
     
    Raymond Hettinger, Oct 28, 2003
    #2

  3. Raymond Hettinger wrote:

    > Core itertools should be primitive building blocks that combine with one
    > another to make other tools. Also, I'm trying to keep the toolset as
    > small as possible by excluding new tools that can be easily and
    > efficiently coded in pure python.
    >
    > I would code it like this:
    >
    > def flatten(s):
    >     try:
    >         iter(s)
    >     except TypeError:
    >         yield s
    >     else:
    >         for elem in s:
    >             for subelem in flatten(elem):
    >                 yield subelem
    >
    > As your examples show, it does have some surprising behavior in that
    > strings get split apart instead of staying intact. That is easily taken
    > care of by an AtomicString subclass:
    >
    > class AtomicString(str):
    >     def __iter__(self):
    >         raise TypeError
    >
    > a = [1, 2, AtomicString('abc'), 4]
    >
    >
    >
    >> >>> # The following I'm not sure what to do about...
    >> >>> empty = [1, [], 3]
    >> >>> emptyiter = [1, iter([]), 3]

    >


    I suggest a minor modification:

    def flatten(s, toiter=iter):
        try:
            it = toiter(s)
        except TypeError:
            yield s
        else:
            for elem in it:
                for subelem in flatten(elem, toiter):
                    yield subelem

    def keepstrings(seq):
        if isinstance(seq, basestring):
            raise TypeError
        return iter(seq)

    sample = [1, 2, [3, "abc def".split()], 4]
    print sample
    print list(flatten(sample, keepstrings))
    print list(flatten([1, [], 3]))
    print list(flatten([1, iter([]), 3]))

    The above handles strings in a way that is nonintrusive on client code.
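
    For reference, those four prints should produce something like this
    (reconstructed output, not copied from the original post):

    [1, 2, [3, ['abc', 'def']], 4]
    [1, 2, 3, 'abc', 'def', 4]
    [1, 3]
    [1, 3]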

    Peter
     
    Peter Otten, Oct 28, 2003
    #3
  4. Peter Otten wrote:
    ...
    > def keepstrings(seq):
    >     if isinstance(seq, basestring):
    >         raise TypeError
    >     return iter(seq)

    ...
    > The above handles strings in a way that is nonintrusive on client code.


    Yes, very nice. I'd make keepstrings the default (one RARELY wants
    to treat strings as nonatomic) AND replace the typetest with an
    attempt to do a seq+'' (if it raises, seq ain't a string),
    but that's just me:).
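
    Spelled out as a helper, the test described here might look something like
    this (a sketch; the name isstringlike is illustrative, not from the post):

    def isstringlike(obj):
        # True if obj supports concatenation with '' (i.e., looks like a string).
        try:
            obj + ''
        except TypeError:
            return False
        return True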


    Alex
     
    Alex Martelli, Oct 28, 2003
    #4
  5. "Alex Martelli" <> wrote in message
    news:Tesnb.57573$...
    > Peter Otten wrote:
    > ...
    > > def keepstrings(seq):
    > > if isinstance(seq, basestring):
    > > raise TypeError
    > > return iter(seq)

    > ...
    > > The above handles strings in a way that is nonintrusive on client code.

    >
    > Yes, very nice. I'd make keepstrings the default (one RARELY wants
    > to treat strings as nonatomic) AND replace the typetest with an
    > attempt to do a seq+'' (if it doesn't raise, seq ain't a string),
    > but that's just me:).
    >
    > Alex
    >


    try: seq+'' seems more expensive. Is it just to handle the case where
    someone makes a string-alike that doesn't derive from str? (Actually, I
    guess that's about all pre-2.2 code.)

    Is it expensive enough to even matter?
    --
    Francis Avila
     
    Francis Avila, Oct 28, 2003
    #5
  6. "Raymond Hettinger" <> wrote in message
    news:niqnb.17753$...
    > [Francis Avila]


    > > Also, why is the iterator type not included in the types module or

    described
    > > in the language reference (Python 2.2)?

    >
    > There is no iterator type. Iterators can be any object that supports the
    > iterator
    > protocol: __iter__() and next(). Generators, container iterators, and

    each of
    > the itertools define their own type.


    Actually (Python 2.2):

    >>> IteratorType = type(iter(()))
    >>> IteratorType
    <type 'iterator'>
    >>> from types import GeneratorType
    >>> GeneratorType
    <type 'generator'>
    >>> GeneratorType == IteratorType
    0
    >>> isinstance(IteratorType, GeneratorType)
    0
    >>> isinstance(GeneratorType, IteratorType)
    0
    >>> dir(GeneratorType)
    ['__class__', '__delattr__', '__doc__', '__getattribute__', '__hash__',
    '__init__', '__iter__', '__new__', '__reduce__', '__repr__', '__setattr__',
    '__str__', 'gi_frame', 'gi_running', 'next']
    >>> dir(IteratorType)
    ['__class__', '__delattr__', '__doc__', '__getattribute__', '__hash__',
    '__init__', '__iter__', '__new__', '__reduce__', '__repr__', '__setattr__',
    '__str__', 'next']
    >>> def g():
    ...     yield None
    ...
    >>> G = g()
    >>> type(G)
    <type 'generator'>
    >>> G
    <generator object at 0x0156F1C0>
    >>> iter(G)
    <generator object at 0x0156F1C0>
    >>> I = iter(())
    >>> type(I)
    <type 'iterator'>
    >>> I
    <iterator object at 0x014BAC80>
    >>> iter(I)
    <iterator object at 0x014BAC80>
    >>>


    The generator seems to be an actual execution frame, but an iterator is only a
    type that supports the iterator protocol. I think from this that an iterator
    needs to store all its elements prior to yielding them, whereas a generator
    does not. For this reason, an iterator will always terminate eventually.
    For these two reasons, I think it's useful to distinguish iterators and
    generators.

    I think I've clarified it a bit more to myself. An iterator iterates
    infinitely if it yields itself; otherwise it doesn't. For generators, it is
    unknowable.

    Are classes supporting the iterator protocol always generators? Would they
    *ever* be an iterator? (According to type(), I mean.) Is there any other way
    to get a strict iterator other than applying iter() to a sequence? (I
    noticed you can't subclass the iterator type.)

    The language reference (or maybe it's the module reference?) states that
    generators yield iterator objects. That's not exactly true: they yield
    generator objects which support the iterator protocol, but not iterator
    objects. And again, iterator types are not mentioned as a separate type in
    the language reference, which strikes me as odd.

    Perhaps a guru can clarify the relationship between generators and
    iterators?
    --
    Francis Avila
     
    Francis Avila, Oct 28, 2003
    #6
  7. Francis Avila wrote:

    > try: seq+'' seems more expensive. Is it just to handle the case where
    > someone makes a string-alike that doesn't derive from str? (Actually, I
    > guess that's about all pre-2.2 code.)
    >
    > Is it expensive enough to even matter?
    > --
    > Francis Avila


    If you think that the concatenation is costly, as I did when I read your
    remark - it's not:

    >>> smartSnakes = "don't rattle"
    >>> smartSnakes is (smartSnakes + "")
    True

    Peter
     
    Peter Otten, Oct 28, 2003
    #7
  8. "Peter Otten" <> wrote in message
    news:bnlp0m$d9l$01$-online.com...
    > Francis Avila wrote:
    >
    > > try: seq+'' seems more expensive. Is it just to handle the case where
    > > someone makes a string-alike that doesn't derive from str? (Actually, I
    > > guess that's about all pre-2.2 code.)
    > >
    > > Is it expensive enough to even matter?
    > > --
    > > Francis Avila

    >
    > I you think that the concatenation is costly, as I did when I read your
    > remark - it's not:
    >
    > >>> smartSnakes = "don't rattle"
    > >>> smartSnakes is (smartSnakes + "")

    > True
    >
    > Peter


    Actually, I meant the exception testing. Isn't Alex suggesting something
    like?:

    try:
        seq+''
    except:
        seq is not a string
    else:
        seq is a string

    --
    Francis Avila
     
    Francis Avila, Oct 28, 2003
    #8
  9. Francis Avila wrote:
    ...
    > Actually, I meant the exception testing. Isn't Alex suggesting something
    > like?:
    >
    > try:
    >     seq+''
    > except:
    >     seq is not a string
    > else:
    >     seq is a string


    Right. Yes, if you have a lot of nonstrings this IS going to be
    substantially slower than isinstance tests. As usual, setting up
    a representative benchmark and measuring the effect is best.

    Of course, it's not easy to decide what IS representative, but
    let's give it a try:

    # Peter Otten's suggestion (simplified/flattened out)
    def flatten_po(s):
        try:
            if isinstance(s, basestring):
                raise TypeError
            else:
                it = iter(s)
        except TypeError:
            yield s
        else:
            for elem in it:
                for subelem in flatten_po(elem):
                    yield subelem

    # my slight preference
    def flatten_am(s):
        try:
            s+''
        except TypeError:
            try: it = iter(s)
            except TypeError:
                yield s
            else:
                for elem in s:
                    for subelem in flatten_am(elem):
                        yield subelem
        else:
            yield s

    # an example tree -- a mix of strings, items, some nesting
    subtree = ['foo']*18 + [1,2,3]*6
    tree = [ subtree*10, [ subtree * 8 ] ]

    # functions to benchmark
    def process(flatten, tree=tree):
        return list(flatten(tree))

    def pro_po(): return process(flatten_po)
    def pro_am(): return process(flatten_am)


    Now, with this setup, I measure:

    [alex@lancelot bo]$ timeit.py -c -s'import fl' 'fl.pro_po()'
    100 loops, best of 3: 9.4e+03 usec per loop
    [alex@lancelot bo]$ timeit.py -c -s'import fl' 'fl.pro_am()'
    100 loops, best of 3: 8.3e+03 usec per loop

    ...which is a reminder of WHY it's best to always measure --
    I fully expected pro_am to be slower, as it's more general
    (handles UserString instances etc), and I was just trying
    to check _how_ much slower it was...!

    Upon reflection, the problem is with flatten_po's elegant
    'raise TypeError', which merges strings with noniterables
    BUT only at the cost of an exception. Changing the
    preamble of flatten_po to:

    try:
        if isinstance(s, basestring):
            yield s
            return
        ...

    improves its timeit.py-measured performance to 5.9e+03 usec.
    So, on this sample of data, the _po version is either 13%
    slower or 29% faster than the _am.
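
    Spelled out in full, the modified version would presumably read as follows
    (a reconstruction of the change described above, not code from the post):

    def flatten_po(s):
        try:
            if isinstance(s, basestring):
                yield s
                return
            it = iter(s)
        except TypeError:
            yield s
        else:
            for elem in it:
                for subelem in flatten_po(elem):
                    yield subelem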

    Now, fortunately, it's an EXTREMELY unlikely event that
    performance differences of this order of magnitude should
    deeply influence our choice of algorithms! Twice as slow,
    well, one wants to keep an eye on that; 30% or less, it's
    unlikely to matter except at the very heart of some key
    bottleneck -- and, honestly, how often will flattening
    overhead be THE bottleneck of your application...? Note
    that in this case we're basically doing no processing with
    the items in the flattened sequence, just stuffing them into
    flat lists. Do any processing whatsoever, and that will
    probably dwarf the infrastructural overhead of the flattening
    procedure itself.

    The key thing is watching for difference in big-O behavior --
    and, here, we have none... just somewhat different (NOT
    by whole orders of magnitude!) multiplicative constants.
    THAT kind of performance issue is one we can live with
    more often than not! Which, in turn, is what makes the
    ordinary use of exceptions generally tolerable from the
    point of view of performance, except in the very hottest
    bottlenecks...


    Alex
     
    Alex Martelli, Oct 28, 2003
    #9
  10. Francis Avila wrote:
    ...
    > Perhaps a guru can clarify the relationship between generators and
    > iterators?


    Sure: the use of "iterators" or "iterator objects" in the Python docs
    does NOT mean "objects that are instance of <type 'iterator'>", it
    means "objects that satisfy the iterator protocol".

    The situation may be a bit less confusing in today's Python, just
    because the typenames involved have been changed a bit in a few
    important special cases:

    >>> iter(())
    <tupleiterator object at 0x402deecc>
    >>> iter([])
    <listiterator object at 0x402dedec>
    >>> iter('')
    <iterator object at 0x402dee8c>

    These types are unrelated -- each only inherits from object, as
    you can check by viewing their __bases__ -- and besides the small
    performance improvement that results from each of these iterator
    objects 'knowing' the exact type of sequence it's iterating on,
    the name diversification may help avoid the understandable kind
    of confusion you may have been led into by older Python versions.
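
    Since there is no single type to test against, the practical check in
    2.2/2.3 is duck typing; something like this sketch (illustrative only,
    names are mine):

    def supports_iteration(obj):
        """True if iter(obj) can produce an iterator."""
        try:
            iter(obj)
        except TypeError:
            return False
        return True

    def is_iterator(obj):
        """True if obj itself follows the iterator protocol."""
        return hasattr(obj, 'next') and hasattr(obj, '__iter__')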


    Alex
     
    Alex Martelli, Oct 28, 2003
    #10
  11. Francis Avila wrote:
    ...
    > try: seq+'' seems more expensive. Is it just to handle the case where
    > someone makes a string-alike that doesn't derive from str? (Actually, I


    As per my other post, it sure SEEMS more expensive - whether it IS or
    not depends on details:). Yes, it's basically to try and treat as a
    string any class that looks like one.

    > guess that's about all pre-2.2 code.)


    Nope, UserString is still alive and well in 2.2 and 2.3 -- and doesn't
    subclass basestring (I don't know why). Yes, if you write stringlike
    classes in 2.2 and later, subclassing basestring, just as a "flag" since
    quite a few spots may specialcase instances of basestring, IS becoming
    advisable, I suspect.

    > Is it expensive enough to even matter?


    Nope (in the sample case I measured), again see my other post for
    details. But perhaps my old-style insistence against isinstance
    abuse is becoming misplaced in this one case -- basestring exists
    ONLY for the purpose of allowing easy isinstance checks, after all,
    it offers no useful functionality per se. So, I should maybe
    resign myself to the inevitable...

    ...and hope a similar type 'basenumber', to become the "just flagging"
    baseclass of int, long, float and complex, doesn't emerge...:)


    Alex
     
    Alex Martelli, Oct 28, 2003
    #11
  12. On Tue, 28 Oct 2003 14:19:40 GMT, Alex Martelli <> wrote:

    >Francis Avila wrote:
    > ...
    >> Actually, I meant the exception testing. Isn't Alex suggesting something
    >> like?:
    >>
    >> try:
    >>     seq+''
    >> except:
    >>     seq is not a string
    >> else:
    >>     seq is a string

    >
    >Right. Yes, if you have a lot of nonstrings this IS going to be
    >substantially slower than isinstance tests. As usual, setting up
    >a representative benchmark and measuring the effect is best.
    >
    >Of course, it's not easy to decide what IS representative, but
    >let's give it a try:
    >

    I am a worrywart, I guess, but my reaction is 'Nicht finger-poken
    in der blinken-machinen' etc ;-) IOW, without knowing exactly what kind of
    blinken-machine seq is, I would fear triggering an unknown side effect
    with seq+'', so I would not like such a test as part of library code without
    at least having it documented with shout-caps in the doc strings.

    Regards,
    Bengt Richter
     
    Bengt Richter, Oct 28, 2003
    #12
  13. Alex Martelli wrote:

    > The key thing is watching for difference in big-O behavior --
    > and, here, we have none... just somewhat different (NOT
    > by whole orders of magnitude!) multiplicative constants.
    > THAT kind of performance issue is one we can live with
    > more often than not! Which, in turn, is what makes the
    > ordinary use of exceptions generally tolerable from the
    > point of view of performance, except in the very hottest
    > bottlenecks...


    You are right, but let me play a little with the code anyway:

    def flatten_am(s):
        try:
            s+''
        except TypeError:
            try: it = iter(s)
            except TypeError:
                yield s
            else:
                for elem in s:
                    for subelem in flatten_am(elem):
                        yield subelem
        else:
            yield s


    def flatten_dict(s, isScalar):
        try:
            scalar = isScalar[s.__class__]
        except KeyError:
            t = s.__class__
            try:
                s = iter(s)
            except TypeError:
                scalar = isScalar[t] = True
            else:
                scalar = isScalar[t] = False

        if scalar:
            yield s
        else:
            for elem in s:
                for subelem in flatten_dict(elem, isScalar):
                    yield subelem

    # an example tree -- a mix of strings, items, some nesting
    subtree = ['foo']*18 + [1,2,3]*6
    tree = [ subtree*10, [ subtree * 8 ] ]

    def pro_am():
        for f in flatten_am(tree): pass

    def pro_dict():
        for f in flatten_dict(tree, {"".__class__: True}): pass

    assert list(flatten_am(tree)) == list(flatten_dict(tree, {"".__class__: True}))


    This gives for flatten_am():

    ...> timeit.py -c -s"import fl" "fl.pro_am()"
    100 loops, best of 3: 5.7e+03 usec per loop

    versus

    ...> timeit.py -c -s"import fl" "fl.pro_dict()"
    1000 loops, best of 3: 1.37e+03 usec per loop

    for flatten_dict(). So you get some speed up at the cost of uglier though
    still quite readable code and under the assumption that either all or no
    instances of a class are iterable.
    Of course the idea works with both isinstance() and a + "" but seems
    somewhat more in the spirit of isinstance().

    Peter
     
    Peter Otten, Oct 29, 2003
    #13
  14. Peter Otten wrote:
    ...
    > def flatten_dict(s, isScalar):
    >     try:
    >         scalar = isScalar[s.__class__]
    >     except KeyError:
    >         t = s.__class__
    >         try:
    >             s = iter(s)
    >         except TypeError:
    >             scalar = isScalar[t] = True
    >         else:
    >             scalar = isScalar[t] = False
    >
    >     if scalar:
    >         yield s
    >     else:
    >         for elem in s:
    >             for subelem in flatten_dict(elem, isScalar):
    >                 yield subelem


    Neat: "caching" (aka memoizing) IS a good general technique
    for optimization, and this is a cool and un-obvious application.


    Alex
     
    Alex Martelli, Oct 29, 2003
    #14
  15. "Peter Otten" <> wrote in message
    news:bnonsi$l5a$05$-online.com...
    >
    > def flatten_dict(s, isScalar):
    > try:
    > scalar = isScalar[s.__class__]
    > except KeyError:

    ....
    > t = s.__class__


    Why doesn't the above fail (AttributeError) for classic classes?

    ....
    > scalar = isScalar[t] = True
    > else:
    > scalar = isScalar[t] = False


    This part I never would have thought to do. Clever!

    ...
    > for flatten_dict(). So you get some speed up at the cost of uglier though
    > still quite readable code and under the assumption that either all or no
    > instances of a class are iterable.


    That's not always a good assumption to make, if we want to decide whether to
    iterate based upon some property of the object (which is actually how I
    originally discovered I needed a "flatten").

    Perhaps we can combine both conditions and caching optimizations?


    def flatten_fastcond(s, isScalar, itercond=lambda o: None):
        try:
            scalar = isScalar[s.__class__]
        except KeyError:

            # If itercond has something to say about s, use *it*, and don't
            # add the lookup to isScalar.

            iterate = itercond(s)
            if not iterate is None:
                scalar = iterate
                if not scalar:
                    s = iter(s)
            else:
                t = s.__class__
                try:
                    s = iter(s)
                except TypeError:
                    scalar = isScalar[t] = True
                else:
                    scalar = isScalar[t] = False

        if scalar:
            yield s
        else:
            for elem in s:
                for subelem in flatten_dict(elem, isScalar):
                    yield subelem


    It's usually true that all instances of a given class are iterable or not,
    and it's less inconvenient to add an entry to a dictionary than to write a
    conditional test (it's also faster). If we move the itercond block above the
    'try', we'll slow down this more common case, but speed things up if s is a
    homogeneous structure where itercond does all the work of deciding. There
    are probably other things I could say against it if I thought about it, and
    we're fast enough anyway.
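
    To make the itercond idea concrete, here is one possible predicate (an
    illustrative sketch mirroring the earlier car_is_2 example, not code from
    the post). It returns None to defer to the per-class cache, True to treat
    the object as a scalar, and False to force descent:

    def car_is_2_cond(obj):
        # No opinion about non-lists: let the class cache decide.
        if not isinstance(obj, list):
            return None
        # Descend into lists whose first element is 2; keep all other lists whole.
        return not (obj and obj[0] == 2)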

    As expected, given the same test tree (which doesn't exercise itercond), the
    speed is about the same.

    >python timeit.py -c -s"import flatten" "flatten.pro_fastcond()"
    100 loops, best of 3: 6.29e+003 usec per loop

    >python timeit.py -c -s"import flatten" "flatten.pro_dict()"
    100 loops, best of 3: 6.29e+003 usec per loop

    I'll make up some fancy tree later to test out itercond.
    --
    Francis Avila
     
    Francis Avila, Oct 30, 2003
    #15
  16. "Francis Avila" <> wrote in message
    news:...
    > As expected, given the same test tree (which doesn't exercise itercond),

    the
    > speed is about the same.


    *Ahem*:

    > def flatten_fastcond(s, isScalar, itercond=lambda o: None):

    ....
    > for subelem in flatten_dict(elem, isScalar):
    > yield subelem


    I thought it was a little strange that it was *exactly* the same, given the
    function call...


    Corrected (still not much difference, though):

    ...>python timeit.py -c -s"import flatten" "flatten.pro_dict()"
    100 loops, best of 3: 7.22e+003 usec per loop

    ...>python timeit.py -c -s"import flatten" "flatten.pro_fastcond()"
    100 loops, best of 3: 7.33e+003 usec per loop


    > I'll make up some fancy tree later to test out itercond.


    Well, should have made it sooner. I would have noticed an empty list...
    --
    Francis Avila
     
    Francis Avila, Oct 30, 2003
    #16
  17. multithreaded PyObject_callFunction

    i've embedded python into our product and i'm not sure how to handle this situation. the following code is a
    simplification of a function that calls a python function. i've removed all error checking to further simplify the
    problem. the following code works correctly:

    void foo(context_t ctx) {
        PyObject *py_interp = NULL;
        PyObject *py_mod = NULL;
        PyObject *py_call_foo = NULL;
        const char *mod;

        PyEval_AcquireLock();
        py_interp = get_interpreter(ctx);
        PyThreadState_Swap(py_interp);

        mod = get_module(ctx);
        py_mod = PyImport_ImportModule(mod);
        py_call_foo = PyObject_GetAttrString(py_mod, "py_foo");
        PyObject_CallFunction(py_call_foo, NULL);

        Py_XDECREF(py_interp);
        Py_XDECREF(py_mod);
        Py_XDECREF(py_call_foo);

        PyThreadState_Swap(NULL);
        PyEval_ReleaseLock();
    }

    now i have a situation where foo may be called at the same time in different c threads. furthermore, the call to py_foo
    might take forever to execute and foo is then required to return immediately. so i modified the code to something like
    this where foo starts a threaded function foo_threaded in a new thread so foo can return immediately.

    void foo_threaded(void *args) {
        context_t ctx = (context_t)args;  /* assuming the context is passed in via args */
        PyObject *py_interp = NULL;
        PyObject *py_mod = NULL;
        PyObject *py_call_foo = NULL;
        const char *mod;

        PyEval_AcquireLock();
        py_interp = get_interpreter(ctx);
        PyThreadState_Swap(py_interp);

        mod = get_module(ctx);
        py_mod = PyImport_ImportModule(mod);
        py_call_foo = PyObject_GetAttrString(py_mod, "py_foo");
        PyObject_CallFunction(py_call_foo, NULL);

        Py_XDECREF(py_interp);
        Py_XDECREF(py_mod);
        Py_XDECREF(py_call_foo);

        PyThreadState_Swap(NULL);
        PyEval_ReleaseLock();
    }


    void foo(context_t ctx) {
        Thread thread;
        thread = new_thread(foo_threaded);
        thread.start();
    }

    this seems like the wrong approach to me. if foo gets called a second time but the call to py_foo hasn't returned yet,
    the interpreter is still locked. so how can another py_foo be called? the 2nd time foo is called the call to
    get_interpreter may or may not return the same interpreter. what am i missing?

    thanks for any help.

    bryan
     
    Bryan, Oct 30, 2003
    #17
  18. Re: multithreaded PyObject_callFunction

    sorry, posted to the wrong thread

    bryan
     
    Bryan, Oct 30, 2003
    #18
  19. Francis Avila wrote:

    >> for flatten_dict(). So you get some speed up at the cost of uglier though
    >> still quite readable code and under the assumption that either all or no
    >> instances of a class are iterable.

    >
    > That's not always a good assumption to make, if we want to decide whether
    > to iterate based upon some property of the object (which is actually how I
    > originally discovered I needed a "flatten").
    >
    > Perhaps we can combine both conditions and caching optimizations?


    I didn't work this out, but I suppose the way to go is to allow three states
    (ALWAYS, NEVER, NOCACHE) in the dictionary and only invoke itercond() on
    cache misses and NOCACHE values.
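
    A sketch of that three-state idea (my interpretation of the suggestion, not
    Peter's code): the dictionary caches ALWAYS/NEVER per class, and itercond()
    is consulted only on cache misses or for classes marked NOCACHE.

    ALWAYS, NEVER, NOCACHE = "always scalar", "never scalar", "ask itercond"

    def flatten_tristate(s, states, itercond=lambda o: None):
        cls = s.__class__
        state = states.get(cls)
        if state is ALWAYS:
            scalar = True
        elif state is NEVER:
            scalar = False
        else:
            # Cache miss or NOCACHE: consult itercond first.
            scalar = itercond(s)
            if scalar is not None:
                # itercond decides per instance, so never cache this class.
                states[cls] = NOCACHE
            else:
                try:
                    iter(s)
                except TypeError:
                    scalar = True
                    states[cls] = ALWAYS
                else:
                    scalar = False
                    states[cls] = NEVER
        if scalar:
            yield s
        else:
            for elem in s:
                for subelem in flatten_tristate(elem, states, itercond):
                    yield subelem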

    Peter
     
    Peter Otten, Oct 30, 2003
    #19
  20. "Peter Otten" <> wrote in message
    news:bnqih2$v3c$01$-online.com...
    > Francis Avila wrote:
    >
    > >> for flatten_dict(). So you get some speed up at the cost of uglier though
    > >> still quite readable code and under the assumption that either all or no
    > >> instances of a class are iterable.
    > >
    > > That's not always a good assumption to make, if we want to decide whether
    > > to iterate based upon some property of the object (which is actually how I
    > > originally discovered I needed a "flatten").
    > >
    > > Perhaps we can combine both conditions and caching optimizations?
    >
    > I didn't work this out, but I suppose the way to go is to allow three states
    > (ALWAYS, NEVER, NOCACHE) in the dictionary and only invoke itercond() on
    > cache misses and NOCACHE values.
    >
    > Peter


    Hah, just finished writing the post the moment you
     
    Francis Avila, Oct 30, 2003
    #20
