merge & de-duplicate lists

Discussion in 'Python' started by Alan Little, Oct 8, 2003.

  1. Alan Little

    Alan Little Guest

    I need to merge and de-duplicate some lists, and I have some code
    which works but doesn't seem particularly elegant. I was wondering if
    somebody could point me at a cleaner way to do it.

    Here's my function:

    +++++++++++++++++++

    from sets import Set

    def mergeLists(initialList, sourceOfAdditionalLists,
                   nameOfMethodToCall):
        workingSet = Set(initialList)
        for s in sourceOfAdditionalLists:
            getList = s.__getattribute__(nameOfMethodToCall)
            workingSet = workingSet.union(Set(
                callable(getList) and getList() or getList))
        return workingSet

    ++++++++++++++

    Two questions - passing the *name* of the method to call, and then
    looking it up for each object in the list of extra sources (all of
    which need to be new-style objects - not a problem in my application)
    seems inelegant. My "sourcesOfAdditionalLists" are normally all of the
    same class - is there something I can bind at class level that
    automagically refers to instance-level attributes when invoked?

    Second (hopefully clearer & less obscure) question : is
    sets.Set.union() an efficient way to do list de-duplication? Seems
    like the obvious tool for the job.
     
    Alan Little, Oct 8, 2003
    #1

  2. Alex Martelli

    Alan Little wrote:

    > I need to merge and de-duplicate some lists, and I have some code
    > which works but doesn't seem particularly elegant. I was wondering if
    > somebody could point me at a cleaner way to do it.
    >
    > Here's my function:
    >
    > +++++++++++++++++++
    >
    > from sets import Set
    >
    > def mergeLists(initialList, sourceOfAdditionalLists,
    >                nameOfMethodToCall):
    >     workingSet = Set(initialList)
    >     for s in sourceOfAdditionalLists:
    >         getList = s.__getattribute__(nameOfMethodToCall)


    The usual way to write this line would be:

        getList = getattr(s, nameOfMethodToCall)

    >         workingSet = workingSet.union(Set(
    >             callable(getList) and getList() or getList))
    >     return workingSet
    >
    > ++++++++++++++
    >
    > Two questions - passing the *name* of the method to call, and then
    > looking it up for each object in the list of extra sources (all of
    > which need to be new-style objects - not a problem in my application)
    > seems inelegant. My "sourcesOfAdditionalLists" are normally all of the
    > same class - is there something I can bind at class level that
    > automagically refers to instance-level attributes when invoked?


    I'm not quite sure what you mean here. You can of course play
    around with descriptors, e.g. properties or your own custom ones.
    But that 'normally' is worrisome -- what happens in the not-so-
    normal cases where one or two items are of a different class...?


    > Second (hopefully clearer & less obscure) question : is
    > sets.Set.union() an efficient way to do list de-duplication? Seems
    > like the obvious tool for the job.


    Well, .union must make a new set each time, and then you throw
    away the old one. This is inefficient in much the same way that
    concatenating a list of lists would be if you coded it as:

        result = baselist
        for otherlist in otherlists:
            result = result + otherlist

    Here, too, the + makes a new list each time, and the assignment
    throws away the old one. In-place updates remove this
    inefficiency: result.extend(otherlist) for list concatenation, and

        workingSet.update(otherlist)

    for your case (don't bother explicitly making a Set out of
    the otherlist; update can take any iterable).
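
    The difference between the two approaches can be sketched as follows.
    This is only an illustration: it assumes a modern Python in which the
    built-in set type stands in for sets.Set, and the function names
    merge_with_union and merge_in_place are made up for the example.

```python
def merge_with_union(base, others):
    """Rebuild the set on every pass: each union() copies what was collected so far."""
    result = set(base)
    for other in others:
        result = result.union(other)  # new set each time, old one discarded
    return result


def merge_in_place(base, others):
    """Grow a single set: update() adds elements to the existing object."""
    result = set(base)
    for other in others:
        result.update(other)  # accepts any iterable, no intermediate set needed
    return result


# Lists, tuples and ranges can all be merged without converting them first.
lists = [[1, 2, 3], (3, 4), range(4, 6)]
merged = merge_in_place([0, 1], lists)
print(sorted(merged))  # [0, 1, 2, 3, 4, 5]
```

    Both functions return the same set; the in-place version simply avoids
    building and discarding an intermediate set on every iteration.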

    Overall, I would code your function (including some
    renamings for clarity, and the removal of a very error-prone,
    obscure and useless and/or idiom; just imagine what
    would happen if getList() returned an EMPTY list...) as
    follows:

    def mergeLists(initialList, sourceOfAdditionalLists,
                   nameOfAttribute):
        workingSet = Set(initialList)
        for s in sourceOfAdditionalLists:
            # The attribute may be a list or a method returning one
            getList = getattr(s, nameOfAttribute)
            if callable(getList):
                getList = getList()
            # In-place update: no new set is built per iteration
            workingSet.update(getList)
        return workingSet
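
    The empty-list case mentioned above can be reproduced with a small
    sketch. The class Source and its method getItems are hypothetical names
    invented for the illustration, and the conditional expression used for
    comparison assumes a Python version newer than the one in this thread.

```python
class Source:
    """Hypothetical object whose method returns an empty list."""

    def getItems(self):
        return []


attr = getattr(Source(), "getItems")

# The and/or idiom: callable(attr) is True and attr() is [], which is falsy,
# so the expression falls through to the right-hand side of "or" and yields
# the bound method itself instead of the empty list.
via_and_or = callable(attr) and attr() or attr

# The explicit test returns the call result even when it is empty.
via_if = attr() if callable(attr) else attr

print(via_and_or is attr)  # True: still the method, not a list
print(via_if)              # []
```

    Passing the leftover method to a set constructor then fails with a
    TypeError, because a bound method is not iterable.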


    Alex
     
    Alex Martelli, Oct 8, 2003
    #2

  3. anton muhin

    anton muhin Guest

    Alan Little wrote:
    > I need to merge and de-duplicate some lists, and I have some code
    > which works but doesn't seem particularly elegant. I was wondering if
    > somebody could point me at a cleaner way to do it.
    >
    > Here's my function:
    >
    > +++++++++++++++++++
    >
    > from sets import Set
    >
    > def mergeLists(initialList, sourceOfAdditionalLists,
    >                nameOfMethodToCall):
    >     workingSet = Set(initialList)
    >     for s in sourceOfAdditionalLists:
    >         getList = s.__getattribute__(nameOfMethodToCall)
    >         workingSet = workingSet.union(Set(
    >             callable(getList) and getList() or getList))
    >     return workingSet
    >
    > ++++++++++++++
    >
    > Two questions - passing the *name* of the method to call, and then
    > looking it up for each object in the list of extra sources (all of
    > which need to be new-style objects - not a problem in my application)
    > seems inelegant. My "sourcesOfAdditionalLists" are normally all of the
    > same class - is there something I can bind at class level that
    > automagically refers to instance-level attributes when invoked?

    If they are all of the same class, you could just introduce a method to
    call that returns your lists.

    BTW, your design might not be ideal. I personally would rather split
    this function in two: one that merges lists and a second that produces
    the lists to merge (this approach might help with the problem above).
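
    One way such a split might look is sketched below. It is not taken from
    the thread: the names collect_lists, merge_unique and Shelf are
    hypothetical, and the built-in set of a modern Python is assumed in
    place of sets.Set.

```python
def collect_lists(sources, attribute_name):
    """Yield one list per source, calling the attribute if it is a method."""
    for source in sources:
        value = getattr(source, attribute_name)
        yield value() if callable(value) else value


def merge_unique(initial, lists):
    """Merge any number of iterables into one set of distinct items."""
    merged = set(initial)
    for items in lists:
        merged.update(items)
    return merged


class Shelf:
    """Hypothetical source object used only for this example."""

    def __init__(self, titles):
        self.titles = titles

    def getTitles(self):
        return self.titles


shelves = [Shelf(["a", "b"]), Shelf(["b", "c"])]
result = merge_unique(["a"], collect_lists(shelves, "getTitles"))
print(sorted(result))  # ['a', 'b', 'c']
```

    With this layout merge_unique knows nothing about attribute names, so it
    can be reused with lists that come from anywhere.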

    regards,
    anton.
     
    anton muhin, Oct 8, 2003
    #3
  4. Alan Little

    Alan Little Guest

    Thanks for the advice guys. I've only been playing with python in my
    spare time for a few weeks, and am hugely impressed with how clean and
    fun and powerful it all is (compared to digging in the java and oracle
    mines, which is what I do in the day job). Getting this sort of
    informed critique of my ideas is great.
     
    Alan Little, Oct 8, 2003
    #4
  5. Alan Little

    Alan Little Guest

    Alex wrote:

    > > Two questions - passing the *name* of the method to call, and then
    > > looking it up for each object in the list of extra sources (all of
    > > which need to be new-style objects - not a problem in my application)
    > > seems inelegant. My "sourcesOfAdditionalLists" are normally all of the
    > > same class - is there something I can bind at class level that
    > > automagically refers to instance-level attributes when invoked?

    >
    > I'm not quite sure what you mean here. You can of course play
    > around with descriptors, e.g. properties or your own custom ones.
    > But that 'normally' is worrisome -- what happens in the not-so-
    > normal cases where one or two items are of a different class...?


    What I was getting at was that I was thinking I would like to be able
    to call my function something like this (pseudocode):

    def f (listOfObjects, method) :
        for o in listOfObjects :
            o.method # passed method gets invoked on the instances

    l = [a,b,c] # collection of objects of known class C
    result = f(l, C.method) # call with method of the class

    Stashing the method name in a string is a level of indirection that
    somehow felt to me like it *ought* to be unnecessary, but I think I
    was just thinking in non-pythonic function pointer terms. Which
    doesn't work in a world where (a) we don't know if two objects of the
    same class have the same methods, and (b) we have the flexibility to
    write functions that will work with anything that has a method of the
    requisite name, regardless of its type.

    Just learning out loud here (with the help of your Nutshell book, I
    might add).
     
    Alan Little, Oct 8, 2003
    #5
  6. anton muhin

    anton muhin Guest

    Alan Little wrote:
    > def f (listOfObjects, method) :
    >     for o in listOfObjects :
    >         o.method # passed method gets invoked on the instances
    >
    > l = [a,b,c] # collection of objects of known class C
    > result = f(l, C.method) # call with method of the class
    >


    Maybe the following might interest you:

    1.

    class A(object):
        def fooA(self):
            return "fooA"

    class B(object):
        def fooB(self):
            return "fooB"

    a, b = A(), B()

    # Wrap each call in a function that takes the target object
    invoke_fooA = lambda obj: obj.fooA()
    invoke_fooB = lambda obj: obj.fooB()

    print(invoke_fooA(a))
    print(invoke_fooB(b))

    2. If the class is the same you can use:

    class A(object):
        def foo(self):
            return "A.foo"

    # Take the method from the class; it expects the instance as its argument
    method = A.foo

    a = A()

    print(method(a))
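
    Applied to the f(l, C.method) call from the question, the same idea
    might look like the sketch below. The class C and its getItems method
    are hypothetical, and operator.methodcaller is assumed to be available
    (it was added to the standard library after this thread was written).

```python
from operator import methodcaller


class C:
    """Hypothetical class with a method that returns a list."""

    def __init__(self, items):
        self.items = items

    def getItems(self):
        return self.items


def f(objects, method):
    """Apply the given callable to each object and collect the results."""
    return [method(o) for o in objects]


objs = [C([1, 2]), C([2, 3])]

# Pass the function taken from the class; it receives each instance as self.
print(f(objs, C.getItems))                # [[1, 2], [2, 3]]

# methodcaller looks the method up by name on each object, so it also
# works for objects of unrelated classes that share the method name.
print(f(objs, methodcaller("getItems")))  # [[1, 2], [2, 3]]
```

    The first call ties the caller to class C, while the second keeps the
    duck-typing flexibility of looking the method up by name.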

    HTH,
    anton.
     
    anton muhin, Oct 8, 2003
    #6
