flatten() time trial mystery. (or, 101 ways to flatten a nested list using generators)

Discussion in 'Python' started by Francis Avila, Nov 2, 2003.

  1. A few days ago (see the 'itertools.flatten()?' thread from October 28) I
    became obsessed with refactoring a recursive generator that yielded the
    leaves of nested iterables. When the dust settled, I had many flatten
    functions at hand.

    So I had to time them. Results below.

    History of the functions (from flattrial.py):

    # There are three basic features:
    # 1. Can specify a function that determines iterability.
    # 2. Can specify class-iterability.
    # 3. Can modify sequence before iterating.

    ##Function: Features Supported : Author
    ##
    ##flatten_fa: 1 : Francis Avila
    ##flatten_po: 1,2,3 : Peter Otten
    ##flatten_po2: None : Alex Martelli channeling Peter Otten
    ##flatten_am: None : Alex Martelli
    ##flatten_dict: 2 : Peter Otten
    ##flatten_fastcond: 1,2 : Francis Avila
    ##flatten_itercond: 1,2,3 : Francis Avila
    ##flatten_dictcond: 1,2,3 : Francis Avila
    ##flatten_dictdef: 1,2,3 : Francis Avila
    ##flatten_trydictdef: 1,2,3 : Francis Avila
    ##flatten_fastdictdef: 1,2,3 : Francis Avila

    Tree test flattens tree:
    subtree = ['foo']*18 + [1,2,3]*6
    tree = [ subtree*10, [ subtree * 8 ] ]
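    For readers without flattrial.py, here is a minimal sketch (not one of the
    timed functions) of a flatten with feature 1, written for modern Python
    where str covers both string types. Applied to the tree above it yields
    36*10 + 36*8 = 648 leaves:

    ```python
    def flatten(iterable, is_iterable=None):
        """Yield the leaves of a nested iterable (feature 1: the
        caller may supply a predicate deciding what to descend into)."""
        if is_iterable is None:
            def is_iterable(obj):
                # Default: anything iter() accepts, except strings,
                # which would otherwise recurse character by character.
                if isinstance(obj, str):
                    return False
                try:
                    iter(obj)
                except TypeError:
                    return False
                return True
        for elem in iterable:
            if is_iterable(elem):
                for leaf in flatten(elem, is_iterable):
                    yield leaf
            else:
                yield elem

    subtree = ['foo'] * 18 + [1, 2, 3] * 6   # 36 elements
    tree = [subtree * 10, [subtree * 8]]
    assert len(list(flatten(tree))) == 648   # 36*10 + 36*8
    ```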

    Node test flattens nodetree:
    class Node(object):
        def __init__(self, label=None, data=None, children=()):
            self.children = children
            self.label = label
            self.data = data
        def __iter__(self):
            return iter(self.children)

    leaves = [Node(chr(i+65),i) for i in range(10)]
    branches = [Node(chr(i+65), i, leaves) for i in range(10,30)]
    nodetree = [Node(chr(i+65), i, branches) for i in range(30,50)]
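    Note what makes the Node test interesting: every Node is iterable (a leaf
    just iterates over an empty tuple), so a flatten that descends into
    anything iter() accepts drains nodetree to nothing. That is what feature 2
    (class-iterability) is for. A hypothetical demonstration:

    ```python
    class Node(object):
        def __init__(self, label=None, data=None, children=()):
            self.children = children
            self.label = label
            self.data = data
        def __iter__(self):
            return iter(self.children)

    def flatten_naive(iterable):
        # Descends into anything iter() accepts.
        for elem in iterable:
            try:
                it = iter(elem)
            except TypeError:
                yield elem
            else:
                for leaf in flatten_naive(it):
                    yield leaf

    leaves = [Node(chr(i + 65), i) for i in range(10)]
    branches = [Node(chr(i + 65), i, leaves) for i in range(10, 30)]
    nodetree = [Node(chr(i + 65), i, branches) for i in range(30, 50)]

    # Leaf Nodes are empty iterables, so nothing survives; a per-class
    # iterability rule is needed to treat leaf Nodes as atoms.
    assert list(flatten_naive(nodetree)) == []
    ```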

    Results (Python 2.2):

    C:\Docs>python flattrial.py
    Tree Tests (100 reps):
    02.113544 fa
    03.129147 po
    02.845054 po2
    02.587387 am
    00.643718 dict
    00.648371 fastcond
    00.724689 itercond
    00.791277 dictcond
    01.006224 dictdef
    00.833452 trydictdef
    00.776937 fastdictdef
    Node Tests (10 reps):
    02.877818 po
    02.633231 itercond
    00.878554 dictcond
    01.040838 dictdef
    00.897504 trydictdef
    00.864411 fastdictdef

    I'd post flattrial.py, but it's about 500 lines and I don't have any web
    space to put it up. Besides, I'm not sure anyone is interested. :)

    A mystery, though. I did not expect dictdef (my magnum opus) to be as slow
    as it was, so I investigated. I went to the obvious first: using dict.get
    is quite a bit slower than using try: x = dict[key]; except KeyError: x =
    default. This is rather inexplicable....
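    A hypothetical micro-benchmark of the two lookup styles, for anyone who
    wants to reproduce this on their own build:

    ```python
    import timeit

    # d.get pays an attribute lookup plus a Python-level method call on
    # every access, whereas d[key] is a single fast bytecode and the
    # try/except machinery costs nothing unless the key actually misses.
    d = {str: False}

    def with_get(key=str, default=None):
        return d.get(key, default)

    def with_try(key=str, default=None):
        try:
            return d[key]
        except KeyError:
            return default

    print(timeit.timeit(with_get, number=100000))
    print(timeit.timeit(with_try, number=100000))
    ```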

    It was still noticeably slower than dictcond. So, I made fastdictdef, which
    emulated dictcond more closely by not allowing the default handler to modify
    the sequence passed to it:

    def defaulthandler(seq):
        try:
            it = iter(seq)
        except TypeError:
            return False, seq
        else:
            return True, it
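    A quick standalone check of the handler's contract (the function is
    restated here so the snippet runs on its own): it returns a flag plus
    either an iterator over the sequence or the original object untouched.

    ```python
    def defaulthandler(seq):
        # Returns (is_iterable, value): an iterator on success,
        # the original object unchanged on failure.
        try:
            it = iter(seq)
        except TypeError:
            return False, seq
        else:
            return True, it

    flag, it = defaulthandler([1, 2])
    assert flag and list(it) == [1, 2]
    flag, val = defaulthandler(7)
    assert (flag, val) == (False, 7)
    ```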

    def flatten_fastdictdef(iterable, get_iterbility=None):
        #Note defaulthandler is no longer an argument.
        if get_iterbility is None:
            get_iterbility = {''.__class__: False, u''.__class__: False}

        # In dictdef, the following try-except is:
        # iterbility = get_iterbility.get(iterable.__class__, defaulthandler)
        try:
            iterbility = get_iterbility[iterable.__class__]
        except KeyError:
            #Following added to avoid a function call
            #Was:
            # iterbility = defaulthandler
            #
            # if iterbility is defaulthandler:
            #     iterbility, iterable = defaulthandler(iterable)
            #     get_iterbility[iterable.__class__] = iterbility
            #Now:
            t = iterable.__class__
            try:
                iterable = iter(iterable)
            except TypeError:
                iterbility = get_iterbility[t] = False
            else:
                iterbility = get_iterbility[t] = True

        if callable(iterbility):
            iterbility, iterable = iterbility(iterable)

        if not iterbility:
            yield iterable
        else:
            for elem in iterable:
                for subelem in flatten_fastdictdef(elem, get_iterbility):
                    yield subelem


    This gave the results you see above: even faster than dictcond. The thing
    is, I don't have any idea why. The function call overhead doesn't seem to
    be enough to explain the difference between the try version of dictdef and
    fastdictdef. Nor the name rebinding (which is very fast). Anyone have any
    ideas?

    Second, is the speed gain of fastdictdef over trydictdef worth losing the
    ability to specify a defaulthandler that can dictate what goes into the
    cache dictionary and modify sequences? (I know, "it depends", but in the
    general case, is that a feature anyone would ever *need*? I can't see how.)

    --
    Francis Avila
     