Re: Py2.3: Feedback on Sets

Discussion in 'Python' started by Beni Cherniavsky, Aug 17, 2003.

  1. In comp.lang.python, you wrote:
    >I've gotten lots of feedback on the itertools module
    >but have not heard a peep about the new sets module.
    >
    >* Are you overjoyed/outraged by the choice of | and &
    > as set operators (instead of + and *)?
    >

    ``&`` and ``|`` are good. In math too, the union/intersection operators are
    very similar to the or/and opearators. The later are overloaded as
    addition/mulitplication in boolead algebra only because the or/and operators
    are so similar and inconvenient. This overloading is quite misguided because
    these operators don't define a field or even a ring (xor/and would do that).

    >* Is the support for sets of sets necessary for your work
    > and, if so, then is the implementation sufficiently
    > powerful?
    >

    I used sets very heavily in my last project, including nested sets. The
    implementation is powerful enough. The concept of sets themselves became too
    weak at some moment (I needed to associate data with each element) but I got
    so used to the powerful union/intersection/etc. operations that I
    implemented__ them over dicts...

    __ http://www.technion.ac.il/~cben/python/dsets.py

    I suspect that the set/dict boundary will be crossed quite frequently by
    people; it's hard to know beforehand that you'll never want to asssociate
    values with the keys. Besides, set operators are a very powerful and
    high-level way to manipulate dicts. So perhaps it'd be a good idea to
    integrate sets with dicts more along the lines of my dsets module. So that
    dicts grow in power as well.

    The biggest lesson from my dsets module is that for set operations on dicts,
    you really want to choose how to handle collisions (both sets have the same
    key so two values contend for the result). And you want to choose it
    per-operation. It's OK if you can only do it on named methods but not on
    overloaded operators (there is no good syntax for that).

    My dsets module treats iterables (including sets) as dsets whose values are
    `None`. This direction also suggests that sets could be true dicts: when you
    omit the value in a dict initializer, it defaults to `None`. But after some
    thinking I think it would be suboptimal. One reason: it means you should to
    choose a collision function even when working with sets whose value are
    meaningless, which makes little sense. I now think it'd be better to separate
    them (by presence of the __getitem__ method) and only give meaning to
    collision functions if both arguments have values. So:

    - ``<set> & <set>`` means what it means now.
    - ``<dict> & <set>`` means subset of dict.
    - ``<dict>.intersection(<dict>, <collision_func>) means what it means in
    dsets.

    But there are more complications:

    - ``<dict> | <set>`` must mean union of sets because for keys not in <dict>
    you have no value.
    - Should ``<dict> | <dict>``, ``<dict> & <dict>``, etc. raise exception on
    collisions (as now) or simply return sets?

    I'll try to release a working design soon. Feedback most welcome.

    >* Is there a compelling need for additional set methods like
    > Set.powerset() and Set.isdisjoint(s) or are the current
    > offerings sufficient?
    >

    Generating the full powerset seems pretty useless. `isdisjoint` sounds like a
    good idea since it can be implemented more effeciently than ``not a & b``.

    >* Does the performance meet your expectations?
    >

    I think so. Almost half of my program's time is spent in set.py according to
    the profiler but when I optimized the sets module (and my dsets) with psyco,
    things got several percent slower. Is this a good sign ?-)

    >* Do you care that sets can only contain hashable elements?
    >

    No more than I care for dicts containing only hashable keys. How else could
    you define it?

    What I do like is the auto-copying of mutable sets when used a set items; I
    wouldn't mind something like this to be extended into other Python types
    (primarily dicts).

    >* How about the design constraint that the argument to most
    > set methods must be another Set (as opposed to any iterable)?
    >

    I'd go for any iterable where it's enough and require __contains__ where
    that's needed. I think together this should cover all method's needs. I
    think it's better not to add artificail checks, so that other set-alike types
    can be used with sets (primarily mapping or set-alike types).

    >* Are the docs clear? Can you suggest improvements?
    >

    It's been a lot of time since I had to read them which is probably a good
    sign.

    >* Are sets helpful in your daily work or does the need arise
    > only rarely?
    >

    Helpful. Since they were released on 2.2, I feel free to import sets whenever
    it reflects my meaning clearer than dicts, even when I don't use any fancy
    operations. I find the code more readable this way.

    And sets of sets (or ImmutableSets as dict keys) are very helpful when you
    need them.

    --
    Beni Cherniavsky <>
     
    Beni Cherniavsky, Aug 17, 2003
    #1
    1. Advertising

  2. Beni Cherniavsky

    Just Guest

    In article <>,
    (Beni Cherniavsky) wrote:

    > What I do like is the auto-copying of mutable sets when used a set items; I
    > wouldn't mind something like this to be extended into other Python types
    > (primarily dicts).


    Me too. I once proposed this on python-dev but there didn't seem to be
    much interest:

    http://mail.python.org/pipermail/python-dev/2003-February/033043.html

    (I must admit, the second case I mention there has more or less gone
    away in the meantime, leaving only the C implementation for sets...)

    Just
     
    Just, Aug 17, 2003
    #2
    1. Advertising

  3. Beni Cherniavsky

    Borcis Guest

    Michael Hudson wrote:
    > + isn't totally silly (strings and lists are
    > monoids, at least :). list * int, OTOH, seems a bit smelly to me
    > (though I'm not sure what a better spelling would be...).


    But strings/lists form a non-commutative monoid, so list * list
    would be better imho, ans a corresponding list ** int
     
    Borcis, Aug 18, 2003
    #3
  4. Beni Cherniavsky

    Borcis Guest

    Michael Hudson wrote:
    >
    >>>+ isn't totally silly (strings and lists are
    >>>monoids, at least :). list * int, OTOH, seems a bit smelly to me
    >>>(though I'm not sure what a better spelling would be...).

    >>
    >>But strings/lists form a non-commutative monoid, so list * list
    >>would be better imho, and a corresponding list ** int

    >
    > Hmm, you're right of course,, but for some reason I prefer +. Maybe
    > it's the commutation with len()?


    renaming it to log() would make the *2+ homomorphism natural.

    >
    > Anyhoo, this is all pointless gassing :)


    Right.
     
    Borcis, Aug 18, 2003
    #4
  5. In article <>,
    Michael Hudson <> wrote:

    > Yup! Well, *, anyway. I'd prefer a Haskell-style ++ for
    > concatenation, but + isn't totally silly (strings and lists are
    > monoids, at least :). list * int, OTOH, seems a bit smelly to me
    > (though I'm not sure what a better spelling would be...).


    Multiplication by (nonnegative) integers is a pretty standard thing to
    do in monoids, and means close to what Python's list*int syntax does:
    add the thing to itself that many times.

    I'm not sure why multiplying a list by a negative number produces the
    empty list instead of an exception, though.

    --
    David Eppstein http://www.ics.uci.edu/~eppstein/
    Univ. of California, Irvine, School of Information & Computer Science
     
    David Eppstein, Aug 18, 2003
    #5
  6. Beni Cherniavsky

    Borcis Guest

    David Eppstein wrote:
    >
    > Multiplication by (nonnegative) integers is a pretty standard thing to
    > do in monoids, and means close to what Python's list*int syntax does:
    > add the thing to itself that many times.


    "God is a superfluous hypothesis, I need just an initial condition"

    >
    > I'm not sure why multiplying a list by a negative number produces the
    > empty list instead of an exception, though.
    >


    If it is illogical for it not to raise an exception, then in an appropriate
    universe the theory is logically permitted that some other value than the
    empty list is more logical as a result, than the empty list itself.

    The-anthropy-of-the-universe-is-everrising-ly-yours-ly
     
    Borcis, Aug 18, 2003
    #6
  7. In article <>, Borcis <> wrote:

    > > I'm not sure why multiplying a list by a negative number produces the
    > > empty list instead of an exception, though.

    >
    > If it is illogical for it not to raise an exception, then in an appropriate
    > universe the theory is logically permitted that some other value than the
    > empty list is more logical as a result, than the empty list itself.


    The appropriate theory for this being the free group generated by all
    possible Python objects?

    --
    David Eppstein http://www.ics.uci.edu/~eppstein/
    Univ. of California, Irvine, School of Information & Computer Science
     
    David Eppstein, Aug 18, 2003
    #7
  8. Beni Cherniavsky

    Bob Gailer Guest


    > > I'm not sure why multiplying a list by a negative number produces the
    > > empty list instead of an exception, though.


    Actually the list has a negative # of elements in it. It is just hard to
    show this using repr. But each of these elements uses a negative amount of
    memory, so there is more RAM available for the rest of your program. You
    just have to avoid specifying a negative # too large in magnitude, or you
    will get a memory underflow.

    Bob Gailer

    303 442 2625


    ---
    Outgoing mail is certified Virus Free.
    Checked by AVG anti-virus system (http://www.grisoft.com).
    Version: 6.0.506 / Virus Database: 303 - Release Date: 8/1/2003
     
    Bob Gailer, Aug 18, 2003
    #8
  9. Beni Cherniavsky

    Andrew Dalke Guest

    Bob Gailer:
    > You just have to avoid specifying a negative # too large in magnitude,
    > or you will get a memory underflow.


    Actually, there's a limit in how much negative memory you can get

    >>> d*(-sys.maxint-1)

    []
    >>> d*(-sys.maxint-2)

    Traceback (most recent call last):
    File "<interactive input>", line 1, in ?
    OverflowError: long int too large to convert to int
    >>>


    Andrew
     
    Andrew Dalke, Aug 19, 2003
    #9
  10. Beni Cherniavsky

    Rick Thomas Guest

    list * int is by analogy with vector space operations. Vector1 * Scalar
    -> Vector2

    Rick


    Michael Hudson wrote:
    > list * int, OTOH, seems a bit smelly to me
    > (though I'm not sure what a better spelling would be...).
     
    Rick Thomas, Aug 28, 2003
    #10
  11. Michael Hudson wrote:
    >> list * int, OTOH, seems a bit smelly to me
    >> (though I'm not sure what a better spelling would be...).


    Rick Thomas <> wrote:
    > list * int is by analogy with vector space operations.
    > Vector1 * Scalar -> Vector2


    Huh? list * int does repetition. How does it relate to scalar
    multiplication of vectors?

    Seo Sanghyeon
     
    Seo Sanghyeon, Aug 28, 2003
    #11
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Raymond Hettinger

    Py2.3: Feedback on Sets

    Raymond Hettinger, Aug 12, 2003, in forum: Python
    Replies:
    21
    Views:
    731
    Christos TZOTZIOY Georgiou
    Aug 20, 2003
  2. Delaney, Timothy C (Timothy)

    RE: Py2.3: Feedback on Sets

    Delaney, Timothy C (Timothy), Aug 14, 2003, in forum: Python
    Replies:
    3
    Views:
    351
    Gary Feldman
    Aug 18, 2003
  3. David Mertz

    Re: Py2.3: Feedback on Sets (fwd)

    David Mertz, Aug 16, 2003, in forum: Python
    Replies:
    10
    Views:
    541
    David Mertz
    Aug 18, 2003
  4. Beni Cherniavsky

    Re: Py2.3: Feedback on Sets

    Beni Cherniavsky, Aug 20, 2003, in forum: Python
    Replies:
    1
    Views:
    324
    Christos TZOTZIOY Georgiou
    Aug 22, 2003
  5. Holger Joukl

    py2.1->py2.3.3 __getattr__ confusion

    Holger Joukl, Jul 2, 2004, in forum: Python
    Replies:
    1
    Views:
    320
    Peter Otten
    Jul 2, 2004
Loading...

Share This Page