Re: Py2.3: Feedback on Sets (fwd)

Discussion in 'Python' started by David Mertz, Aug 16, 2003.

  1. David Mertz

    David Mertz Guest

    > * Is there a compelling need for additional set methods like
    > Set.powerset() and Set.isdisjoint(s) or are the current
    > offerings sufficient?


    I confess that I have not used sets for anything beyond testing. I love
    the concept, but I just haven't had the need yet (especially in
    something where I want to require 2.3).

    The mention of Set.powerset() above is quite interesting to me. It
    feels both exciting and dangerous :).

    As we all know, the size of the powerset of S, for len(S)==N, is 2**N.
    Seems like it would be really easy to run into some long runtimes and
    memory usage. Then again, power set really is a fundamental operation
    on sets. And making users rewrite it each time is error prone.

    So I think I would advocate it, but with a fairly harsh warning in the
    documentation about complexity issues.

    Yours, David...

    --
    mertz@ | The specter of free information is haunting the `Net! All the
    gnosis | powers of IP- and crypto-tyranny have entered into an unholy
    ..cx | alliance...ideas have nothing to lose but their chains. Unite
    | against "intellectual property" and anti-privacy regimes!
    -------------------------------------------------------------------------
    X-Shameless-Plug: Buy Text Processing in Python: http://tinyurl.com/jskh
     
    David Mertz, Aug 16, 2003
    #1
    1. Advertising

  2. David Mertz

    Aahz Guest

    In article <>,
    David Mertz <> wrote:
    >
    >The mention of Set.powerset() above is quite interesting to me. It
    >feels both exciting and dangerous :).
    >
    >As we all know, the size of the powerset of S, for len(S)==N, is 2**N.
    >Seems like it would be really easy to run into some long runtimes and
    >memory usage. Then again, power set really is a fundamental operation
    >on sets. And making users rewrite it each time is error prone.


    Maybe make it an iterator? <0.3 wink>
    --
    Aahz () <*> http://www.pythoncraft.com/

    This is Python. We don't care much about theory, except where it intersects
    with useful practice. --Aahz
     
    Aahz, Aug 16, 2003
    #2
    1. Advertising

  3. David Mertz

    Gregor Lingl Guest

    David Mertz schrieb:

    >>* Is there a compelling need for additional set methods like
    >> Set.powerset() and Set.isdisjoint(s) or are the current
    >> offerings sufficient?

    >
    >
    > I confess that I have not used sets for anything beyond testing. I love
    > the concept, but I just haven't had the need yet (especially in
    > something where I want to require 2.3).
    >
    > The mention of Set.powerset() above is quite interesting to me. It
    > feels both exciting and dangerous :).


    I think it would be more general and versatile to provide a method for
    the Cantorproduct of two sets, set1.cantorproduct(set2), which could be
    used to define Set.powerset.
    Additionally implementing this as __mul__ and __pow__ methods, so you
    could write set1*set2 and set**n would be funny.

    Regards, Gregor

    >
    > As we all know, the size of the powerset of S, for len(S)==N, is 2**N.
    > Seems like it would be really easy to run into some long runtimes and
    > memory usage. Then again, power set really is a fundamental operation
    > on sets. And making users rewrite it each time is error prone.
    >
    > So I think I would advocate it, but with a fairly harsh warning in the
    > documentation about complexity issues.
    >
    > Yours, David...
    >
    > --
    > mertz@ | The specter of free information is haunting the `Net! All the
    > gnosis | powers of IP- and crypto-tyranny have entered into an unholy
    > .cx | alliance...ideas have nothing to lose but their chains. Unite
    > | against "intellectual property" and anti-privacy regimes!
    > -------------------------------------------------------------------------
    > X-Shameless-Plug: Buy Text Processing in Python: http://tinyurl.com/jskh
    >
    >
     
    Gregor Lingl, Aug 16, 2003
    #3
  4. David Mertz

    Paul Miller Guest

    (David Mertz) wrote in message news:<>...
    > > * Is there a compelling need for additional set methods like
    > > Set.powerset() and Set.isdisjoint(s) or are the current
    > > offerings sufficient?

    >


    > As we all know, the size of the powerset of S, for len(S)==N, is 2**N.
    > Seems like it would be really easy to run into some long runtimes and
    > memory usage. Then again, power set really is a fundamental operation
    > on sets. And making users rewrite it each time is error prone.


    How is it error prone? if P is the power set operation, and S and X
    are sets, then X in P(S) is equivalent to X is a subset of S. I don't
    think that's all that error prone. The times when you will want to
    iterate over the *entire* power set of a set are probably very few,
    simply because of complexity issues. I can't see iterating over the
    whole P(S) where S contains more than about 100 elements being time
    feasible.
     
    Paul Miller, Aug 16, 2003
    #4
  5. [Raymond]
    > > * Is there a compelling need for additional set methods like
    > > Set.powerset() and Set.isdisjoint(s) or are the current
    > > offerings sufficient?


    [David Mertz]
    > I confess that I have not used sets for anything beyond testing. I love
    > the concept, but I just haven't had the need yet (especially in
    > something where I want to require 2.3).
    >
    > The mention of Set.powerset() above is quite interesting to me. It
    > feels both exciting and dangerous :).



    There would need to be compelling use cases before it would be
    added to the API.

    How about just adding a cut-and-paste recipe to the docs:

    from sets import Set
    def powerset(iterable):
    data = list(iterable)
    for i in xrange(2**len(data)):
    yield Set([e for j,e in enumerate(data) if i&(1<<j)])

    >>> list(powerset('abc'))

    [Set([]), Set(['a']), Set(['b']), Set(['a', 'b']), Set(['c']), Set(['a', 'c']),
    Set(['c', 'b']), Set(['a', 'c', 'b'])]


    Raymond Hettinger
     
    Raymond Hettinger, Aug 16, 2003
    #5
  6. David Mertz

    David Mertz Guest

    | (David Mertz) wrote
    |> Then again, power set really is a fundamental operation
    |> on sets. And making users rewrite it each time is error prone.

    (Paul Miller) wrote previously:
    |How is it error prone? if P is the power set operation, and S and X
    |are sets, then X in P(S) is equivalent to X is a subset of S. I don't
    |think that's all that error prone.

    This looks more like a definition of a method Set.ispowerset(), not one
    of Set.powerset(). The actual generation of P(S) is the part I think
    programmers could do wrong. I'm not saying that they might not usually
    do it right... but mistakes happen.

    Raymond Hettinger also wrote:
    |from sets import Set
    |def powerset(iterable):
    | data = list(iterable)
    | for i in xrange(2**len(data)):
    | yield Set([e for j,e in enumerate(data) if i&(1<<j)])

    Hmmm... I should have checked the docs first. A sample implementation
    obviously helps. That said, this isn't REALLY an implementation of
    powerset. It returns an iterator, not a set. Now, sure... an iterator
    is a BETTER thing to get, for lots of reasons. But I'm not sure it
    lives up to the function name.

    Yours, David...

    --
    Keeping medicines from the bloodstreams of the sick; food from the bellies
    of the hungry; books from the hands of the uneducated; technology from the
    underdeveloped; and putting advocates of freedom in prisons. Intellectual
    property is to the 21st century what the slave trade was to the 16th.
     
    David Mertz, Aug 17, 2003
    #6
  7. David Mertz wrote:
    ...
    > Raymond Hettinger also wrote:
    > |from sets import Set
    > |def powerset(iterable):
    > | data = list(iterable)
    > | for i in xrange(2**len(data)):
    > | yield Set([e for j,e in enumerate(data) if i&(1<<j)])
    >
    > Hmmm... I should have checked the docs first. A sample implementation
    > obviously helps. That said, this isn't REALLY an implementation of
    > powerset. It returns an iterator, not a set. Now, sure... an iterator
    > is a BETTER thing to get, for lots of reasons. But I'm not sure it
    > lives up to the function name.


    Surely writing

    def therealpowerset(iterable):
    return Set(powerset(iterable))

    (or just inlining the Set call) isn't beyond the abilities of most
    prospective users. Just like one calls list(x) on any iterable x
    if one needs specifically a list, so does one call Set(x) if one
    needs specifically a set. Sure, the name is debatable, and maybe
    'subsetsof' would be neater. But I think this is quibbling.

    IMHO, one 'real' issue with this function is that it behaves strangely
    (to me) when iterable has duplicated elements -- namely, the
    resulting iterator also has duplications. Changing the single
    statement that is currently:
    data = list(iterable)
    into
    data = Set(iterable)
    would make duplications in 'iterable' irrelevant instead.



    Alex
     
    Alex Martelli, Aug 17, 2003
    #7
  8. "Alex Martelli" <> wrote in message
    news:...
    > David Mertz wrote:
    > ...
    > > Raymond Hettinger also wrote:
    > > |from sets import Set
    > > |def powerset(iterable):
    > > | data = list(iterable)
    > > | for i in xrange(2**len(data)):
    > > | yield Set([e for j,e in enumerate(data) if i&(1<<j)])
    > >
    > > Hmmm... I should have checked the docs first. A sample implementation
    > > obviously helps. That said, this isn't REALLY an implementation of
    > > powerset. It returns an iterator, not a set. Now, sure... an iterator
    > > is a BETTER thing to get, for lots of reasons. But I'm not sure it
    > > lives up to the function name.

    >
    > Surely writing
    >
    > def therealpowerset(iterable):
    > return Set(powerset(iterable))
    >
    > (or just inlining the Set call) isn't beyond the abilities of most
    > prospective users. Just like one calls list(x) on any iterable x
    > if one needs specifically a list, so does one call Set(x) if one
    > needs specifically a set. Sure, the name is debatable, and maybe
    > 'subsetsof' would be neater. But I think this is quibbling.
    >
    > IMHO, one 'real' issue with this function is that it behaves strangely
    > (to me) when iterable has duplicated elements -- namely, the
    > resulting iterator also has duplications. Changing the single
    > statement that is currently:
    > data = list(iterable)
    > into
    > data = Set(iterable)
    > would make duplications in 'iterable' irrelevant instead.


    Good call.
    Do you guys want the revised version added to the docs
    or perhaps saved as an ASPN recipe?

    >>> from sets import Set
    >>> def powersetiterator(iterable):

    .... data = Set(iterable)
    .... for i in xrange(2**len(data)):
    .... yield Set([e for j,e in enumerate(data) if i&(1<<j)])

    >>> Set(powersetiterator('abc'))

    Set([ImmutableSet([]), ImmutableSet(['a']), ImmutableSet(['c']),
    ImmutableSet(['b']), ImmutableSet(['c', 'b']), ImmutableSet(['a', 'c']),
    ImmutableSet(['a', 'b']), ImmutableSet(['a', 'c', 'b'])])


    Raymond Hettinger
     
    Raymond Hettinger, Aug 17, 2003
    #8
  9. David Mertz

    David Mertz Guest

    |> Raymond Hettinger:
    |> |def powerset(iterable): ...

    |David Mertz wrote:
    |> Hmmm... I should have checked the docs first. A sample implementation
    |> obviously helps. That said, this isn't REALLY an implementation of
    |> powerset.

    Alex Martelli <> wrote previously:
    |def therealpowerset(iterable):
    | return Set(powerset(iterable))

    Sure. But naming the first thing 'subsetsof()' (or 'ipowerset()' to
    match some of the new itertools functions) and the second just
    'powerset()' seems better to me. Or maybe the second should be named:
    powerset_but_use_at_your_own_risk().

    However, my point above is that fewer users will lookup a sample
    implementation than would use a module function. And somewhere in that
    difference, some of them will do something wrong.

    |IMHO, one 'real' issue with this function is that it behaves strangely
    |(to me) when iterable has duplicated elements
    | data = list(iterable) --> data = Set(iterable)

    Yeah :). Which points out that I should have actually READ Raymond's
    function, rather than just cut-and-paste it. This seems to point out
    that even the Python developers are able to make mistakes in
    implementing powerset(). Which again suggests a module function
    (implemented correctly) is worth having.

    Yours, David...

    --
    _/_/_/ THIS MESSAGE WAS BROUGHT TO YOU BY: Postmodern Enterprises _/_/_/
    _/_/ ~~~~~~~~~~~~~~~~~~~~[]~~~~~~~~~~~~~~~~~~~~~ _/_/
    _/_/ The opinions expressed here must be those of my employer... _/_/
    _/_/_/_/_/_/_/_/_/_/ Surely you don't think that *I* believe them! _/_/
     
    David Mertz, Aug 17, 2003
    #9
  10. [Alex Martelli]
    > |IMHO, one 'real' issue with this function is that it behaves strangely
    > |(to me) when iterable has duplicated elements
    > | data = list(iterable) --> data = Set(iterable)


    [David Mertz]
    > Yeah :). Which points out that I should have actually READ Raymond's
    > function, rather than just cut-and-paste it. This seems to point out
    > that even the Python developers are able to make mistakes in
    > implementing powerset(). Which again suggests a module function
    > (implemented correctly) is worth having.


    It wasn't a mistake.

    I do prefer Alex's improvement because it has a weaker precondition,
    but the basic code and generation method was dead-on.

    The real issue with adding a module function or Set method for
    powerset is the lack of compelling use cases; without those, it is
    simply a cute example for the docs or an ASPN recipe.

    Also, if Tim were to chime-in, I think he would abstract the
    problem and say that the algorithm falls in the domain of
    combinatorics (for which he has written a module) and that
    powersets are just a specific case of transforming a collection
    of items into a collection of all possible sub-collections.


    Raymond Hettinger
     
    Raymond Hettinger, Aug 17, 2003
    #10
  11. David Mertz

    David Mertz Guest

    |[David Mertz]
    |> Yeah :). Which points out that I should have actually READ Raymond's
    |> function, rather than just cut-and-paste it. This seems to point out
    |> that even the Python developers are able to make mistakes in
    |> implementing powerset().

    |It wasn't a mistake.
    |I do prefer Alex's improvement because it has a weaker precondition,
    |but the basic code and generation method was dead-on.

    Well... I'm not sure about dead-on. For example:

    % cat time_powerset.py
    from sets import Set
    from time import clock

    def rh_powerset(iterable):
    data = list(iterable)
    for i in xrange(2**len(data)):
    yield Set([e for j,e in enumerate(data) if i&(1<<j)])

    def am_powerset(iterable):
    data = Set(iterable)
    for i in xrange(2**len(data)):
    yield Set([e for j,e in enumerate(data) if i&(1<<j)])

    start = clock()
    print Set(rh_powerset('aaaaaaaaaaaaaaaaaaa'))
    print 'rh_powerset(): %.4f seconds' % (clock()-start)

    start = clock()
    print Set(am_powerset('aaaaaaaaaaaaaaaaaaa'))
    print 'rh_powerset(): %.4f seconds' % (clock()-start)

    What do you reckon gets output? How about:

    % python time_powerset.py
    Set([ImmutableSet([]), ImmutableSet(['a'])])
    rh_powerset(): 74.9403 seconds
    Set([ImmutableSet([]), ImmutableSet(['a'])])
    rh_powerset(): 0.0000 seconds

    So Raymond's version is AT LEAST 750,000 times slower for this
    particular iterable than Alex' :). (Alex' is just too quick to time on
    my machine, I get zero when I extend the decimals in the time display).
    Actually, the ratios quickly get even worse if I add a couple more
    "a"s... until Raymond's raises an exception (and Alex' keeps producing
    the right answer too quickly to measure.

    I know Python isn't primarily focused on speed. But somewhere around
    the point where one function is a million times slower than an
    algorithmically equivalent one, I start to think the latter is a
    smidgeon broken.

    --
    mertz@ _/_/_/_/_/_/_/ THIS MESSAGE WAS BROUGHT TO YOU BY:_/_/_/_/ v i
    gnosis _/_/ Postmodern Enterprises _/_/ s r
    ..cx _/_/ MAKERS OF CHAOS.... _/_/ i u
    _/_/_/_/_/ LOOK FOR IT IN A NEIGHBORHOOD NEAR YOU_/_/_/_/_/ g s
     
    David Mertz, Aug 18, 2003
    #11
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Raymond Hettinger

    Py2.3: Feedback on Sets

    Raymond Hettinger, Aug 12, 2003, in forum: Python
    Replies:
    21
    Views:
    708
    Christos TZOTZIOY Georgiou
    Aug 20, 2003
  2. Delaney, Timothy C (Timothy)

    RE: Py2.3: Feedback on Sets

    Delaney, Timothy C (Timothy), Aug 14, 2003, in forum: Python
    Replies:
    3
    Views:
    339
    Gary Feldman
    Aug 18, 2003
  3. Beni Cherniavsky

    Re: Py2.3: Feedback on Sets

    Beni Cherniavsky, Aug 17, 2003, in forum: Python
    Replies:
    10
    Views:
    568
    Seo Sanghyeon
    Aug 28, 2003
  4. Beni Cherniavsky

    Re: Py2.3: Feedback on Sets

    Beni Cherniavsky, Aug 20, 2003, in forum: Python
    Replies:
    1
    Views:
    315
    Christos TZOTZIOY Georgiou
    Aug 22, 2003
  5. Holger Joukl

    py2.1->py2.3.3 __getattr__ confusion

    Holger Joukl, Jul 2, 2004, in forum: Python
    Replies:
    1
    Views:
    312
    Peter Otten
    Jul 2, 2004
Loading...

Share This Page