Py2.3: Feedback on Sets

Raymond Hettinger · Aug 12, 2003

I've gotten lots of feedback on the itertools module
but have not heard a peep about the new sets module.

* Are you overjoyed/outraged by the choice of | and &
as set operators (instead of + and *)?

* Is the support for sets of sets necessary for your work
and, if so, then is the implementation sufficiently
powerful?

* Is there a compelling need for additional set methods like
Set.powerset() and Set.isdisjoint(s) or are the current
offerings sufficient?

* Does the performance meet your expectations?

* Do you care that sets can only contain hashable elements?

* How about the design constraint that the argument to most
set methods must be another Set (as opposed to any iterable)?

* Are the docs clear? Can you suggest improvements?

* Are sets helpful in your daily work or does the need arise
only rarely?

User feedback is essential to determining the future direction
of sets (whether it will be implemented in C, change API,
and/or be given supporting language syntax).

Raymond Hettinger

Troels Therkelsen · Aug 12, 2003

I've gotten lots of feedback on the itertools module
but have not heard a peep about the new sets module.

I would have to say that while I have looked at the sets module and read its
documentation, I have not used it much (more below).

* Are you overjoyed/outraged by the choice of | and &
as set operators (instead of + and *)?

I actually prefer the | & syntax because it makes more intuitive sense to
me, as the operators work identically to how they do when using them as
binary number operators.
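
A small sketch of that analogy, assuming the built-in set type of later Python versions rather than the 2.3 sets.Set class (the helper bit_positions is hypothetical, written only for this illustration):

```python
def bit_positions(n):
    """Return the set of bit positions that are 1 in the integer n."""
    return {i for i in range(n.bit_length()) if (n >> i) & 1}

a_bits, b_bits = 0b0110, 0b0011
a_set, b_set = bit_positions(a_bits), bit_positions(b_bits)

# | on integers ORs the bits; | on sets takes the union.
union_matches = bit_positions(a_bits | b_bits) == (a_set | b_set)
# & on integers ANDs the bits; & on sets takes the intersection.
intersection_matches = bit_positions(a_bits & b_bits) == (a_set & b_set)
```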

[snip]

* Do you care that sets can only contain hashable elements?
No.

[snip]

* Are the docs clear? Can you suggest improvements?

The docs for the sets module, like most of the Python docs, are very good.
The example helps, too.

* Are sets helpful in your daily work or does the need arise
only rarely?

I rarely need sets explicitly, but sometimes need the logic that they offer.
For example, concatenating two lists so that the resulting list has only
unique elements can be done with set logic. However, as it is another module
written in Python, and I only need the logic, I usually don't go through with
the overhead of creating the Set classes, etc. If it were better integrated
(i.e., a set() builtin type constructor like int(), str(), etc.), then I would
feel less of a reservation about using sets. I know this reservation isn't
founded in facts, but more in the feeling of trusting builtin types more than
custom classes provided by a module.
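
As a rough illustration of the list-merging case, here is a sketch that assumes the set() builtin which later Python versions did add (the list contents are made up):

```python
first = ["a", "b", "c"]
second = ["b", "c", "d"]

# Union of both lists; duplicates collapse, and the order is not guaranteed.
unique = set(first) | set(second)

# Sorting gives a list with a predictable order.
unique_list = sorted(unique)
```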

User feedback is essential to determining the future direction
of sets (whether it will be implemented in C, change API,
and/or be given supporting language syntax).

Don't know if my feedback provided above helps, but you asked for it ;-)

Regards,

Troels Therkelsen

Carl Banks · Aug 12, 2003

Raymond said:
I've gotten lots of feedback on the itertools module
but have not heard a peep about the new sets module.

* Are you overjoyed/outraged by the choice of | and &
as set operators (instead of + and *)?

I slightly favor | and &.

* Is the support for sets of sets necessary for your work
and, if so, then is the implementation sufficiently
powerful?

* Is there a compelling need for additional set methods like
Set.powerset() and Set.isdisjoint(s) or are the current
offerings sufficient?

I imagine isdisjoint would be useful, although it's easy enough to use
not (s&t).

* Does the performance meet your expectations?

* Do you care that sets can only contain hashable elements?

* How about the design constraint that the argument to most
set methods must be another Set (as opposed to any iterable)?

* Are the docs clear? Can you suggest improvements?

Yeah: in the library reference, the table entry for s.union(t) should
say "synonym of s|t" instead of repeating the description. This is
especially true because it's not clear from a simple glance whether
"s.union(t)" goes with "s|t" or "s&t", because it sits right between
the two. Better yet, I would change it to a three-column table
(operation, synonym, result).

* Are sets helpful in your daily work or does the need arise
only rarely?

I haven't used sets yet (or Python 2.3), but I expect to use them a
lot. However, I imagine my typical use would be efficient testing for
membership. I have maybe half a dozen places where I use a dictionary
for that now.
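
A minimal sketch of the two membership idioms side by side, assuming the built-in set type of later Python versions; the names and values are illustrative only:

```python
# Older idiom: a dictionary whose values are ignored.
seen_dict = {"spam": None, "eggs": None}

# The same collection expressed as a set.
seen_set = {"spam", "eggs"}

# Both are hash-based, so membership tests look the same.
in_dict = "spam" in seen_dict
in_set = "spam" in seen_set
missing = "ham" in seen_set
```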

Istvan Albert · Aug 12, 2003

Raymond Hettinger wrote:

First of all, thanks for the work on it, I need to use sets
in my work all the time. I had written my own
(simplistic) implementation but that adds another layer
of headaches when distributing programs since then
I have to distribute multiple modules.

Sometimes I ended up with a little set function in every
big module. Pretty silly. For me sets are a greatly useful
addition.

* Is the support for sets of sets necessary for your work
and, if so, then is the implementation sufficiently
powerful?

One pattern that I constantly need is to remove duplicates from
a sequence. I don't know if this is an often enough used pattern to
warrant an API change, for me it would be most useful if I could
get the contents of a set as a sequence right away, without having to
explicitly code it.

> * Are you overjoyed/outraged by the choice of | and & as
> set operators (instead of + and *)?

I think that since you have - as a difference operator it
would make sense to also have + as a union operator. Takes nothing
away from |. The & operator is the right one, * would not be appropriate
IMO.

* Do you care that sets can only contain hashable elements?

I don't really care, on the other hand, it might be better to call the
class HashSet, so that it conveys right away that it uses hashing
to store the elements.

* Are the docs clear? Can you suggest improvements?

I wondered whether it would be better to specify the immutability
of the class at the constructor level.

Then there is the update method. It feels a little bit redundant
since there is an add() method that seems to be doing the same thing
only that add() adds only one element at a time.
Would it be possible to have add() handle all additions, iterable or
not, then scrap update() altogether.

Then just by looking at the docs, it feels a little bit confusing to
have discard() and remove() do essentially the same thing but only one
of them raising an exception. Which one? I already forgot. I don't know
which one I would prefer though.

Another aspect that I did not understand: what is the difference between
update() and union_update()?

The long winded method names, such as difference_update() also feel
redundant when one can achieve the same thing with the -= operator. I
would drop these and instead show in the docs how to accomplish these
with the operators. Would considerably cut down on the documentation,
and apparent complexity.

I'm a big fan of having the minimal number of methods as long as it is
easy to obtain the result.

For example, a method like x.issubset(y) is the same as not (x-y), so it
may not be all that necessary, just a thought.

* Are sets helpful in your daily work or does the need arise
only rarely?

I use them very often and they are extremely useful.

thanks again,

Istvan.

Radovan Garabik · Aug 13, 2003

Raymond Hettinger said:
I've gotten lots of feedback on the itertools module
but have not heard a peep about the new sets module.

* Are you overjoyed/outraged by the choice of | and &
as set operators (instead of + and *)?

I would prefer to have + in addition to |. I do not care
about *, but IMHO + is so intuitive and natural that
it is a pity not to have it (you can add lists, tuples,
strings with +, but not sets???)

* Is the support for sets of sets necessary for your work
and, if so, then is the implementation sufficiently
powerful?

So far my code used dictionaries with values set to None,
I expect that I will use sets soon, it will be more logical
and the code more readable.

* Are sets helpful in your daily work or does the need arise
only rarely?

Daily work, production code, but so far I have used lists or
dictionaries instead.
I am somewhat afraid of performance until sets are implemented
in C, though.

--
-----------------------------------------------------------
| Radovan Garabík http://melkor.dnp.fmph.uniba.sk/~garabik/ |
| __..--^^^--..__ garabik @ kassiopeia.juls.savba.sk |
-----------------------------------------------------------
Antivirus alert: file .signature infected by signature virus.
Hi! I'm a signature virus! Copy me into your signature file to help me spread!

Russell E. Owen · Aug 13, 2003

So far my code used dictionaries with values set to None,
I expect that I will use sets soon, it will be more logical
and the code more readable.

Same here.

I don't rely on sets heavily (I do have a few implemented as
dictionaries with value=None) and am not yet ready to make my users
upgrade to Python 2.3.

I suspect the upgrade issue will significantly slow the incorporation of
sets and the other new modules, but that over time they're likely to
become quite popular. I am certainly looking forward to using sets and
csv.

I think it'd speed the adoption of new modules if they were explicitly
written to be compatible with one previous generation of Python (and
documented as such) so users could manually include them with their code
until the current generation of Python had a bit more time to be adopted.

I'm not saying this should be a rule, only suggesting it as a useful
goal. Presumably it'd be easy with some modules and not worth the work
in some cases.

-- Russell

Chris Reedy · Aug 13, 2003

Raymond -

Well now that you ask ...

Raymond said:
I've gotten lots of feedback on the itertools module
but have not heard a peep about the new sets module.

* Are you overjoyed/outraged by the choice of | and &
as set operators (instead of + and *)?

I think that choice appeals to me more than + and * (which are already
more overloaded than I would like). I haven't seen any suggestions that
I liked better.

* Is the support for sets of sets necessary for your work
and, if so, then is the implementation sufficiently
powerful?

Yes I need it (desperately). Generally works as I need. However, see
more comments below.

* Is there a compelling need for additional set methods like
Set.powerset() and Set.isdisjoint(s) or are the current
offerings sufficient?

I haven't felt the need yet. So far I've been satisfied with:

if x & y:

as opposed to

if not x.isdisjoint(y):
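
For comparison, a sketch of both spellings using the built-in set type of later Python versions, where an isdisjoint() method did eventually appear (Python 2.6 onward); the sample sets are made up:

```python
x = {1, 2, 3}
y = {3, 4}
z = {7, 8}

# Truth-testing the intersection: builds the set {3}, then checks it is non-empty.
overlaps_xy = bool(x & y)

# isdisjoint() answers the opposite question and returns a bool directly.
disjoint_xy = x.isdisjoint(y)
disjoint_xz = x.isdisjoint(z)
```

Note that the two tests are negations of each other: "x & y" is true exactly when x.isdisjoint(y) is false.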

* Does the performance meet your expectations?

So far. However, so far, I haven't been trying to meet any demanding
performance requirements.

* Do you care that sets can only contain hashable elements?

This is an interesting question. In particular, I have found myself on
more than one occasion doing the following:

for x in interesting_objects:
    x.foo = Set()
while something_to_do:
    somex.foo |= something_I_just_computed
for x in interesting_objects:
    x.foo = ImmutableSet(x.foo)
build_some_more_sets(somex.foo)

I'm not sure whether I like having to go back and change all my sets to
Immutable ones after I've finished the computation. (Or whether I just
ought to make x.foo immutable all the time.) I did appreciate the
ImmutableSet type, since it allows me to flag to myself that I don't
expect a set to change further.
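
The build-then-freeze pattern can be sketched with the later built-in types, assuming frozenset plays the role of ImmutableSet; the dictionary of groups stands in for the x.foo attributes and is purely illustrative:

```python
# Phase 1: accumulate into ordinary mutable sets.
groups = {"a": set(), "b": set()}
groups["a"] |= {1, 2}
groups["b"] |= {2, 3}

# A mutable set is unhashable, so it cannot be an element of another set.
try:
    {groups["a"]}
    mutable_allowed = True
except TypeError:
    mutable_allowed = False

# Phase 2: freeze each set once the computation is finished.
frozen = {name: frozenset(members) for name, members in groups.items()}

# Frozen sets are hashable, so a set of sets now works.
set_of_sets = set(frozen.values())
```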

* How about the design constraint that the argument to most
set methods must be another Set (as opposed to any iterable)?

In some cases I've run into that. Since I can create a set with any
iterable I've been able to do:

set op Set(iterable)

I think I might be interested in using a general iterable if that would
get me some advantage (maybe significantly faster).

* Are the docs clear? Can you suggest improvements?

No problems here. However, my background is math, and I've never had
problems with documentation (I started my career learning IBM mainframe
assembly language programming from the reference manuals) so I don't
think I'm a good test case.

* Are sets helpful in your daily work or does the need arise
only rarely?

I'm working on a project where they are critical. If it hadn't been
supplied I would have implemented one myself. I was using the backported
version of the set module with 2.2 before 2.3 came out.

User feedback is essential to determining the future direction
of sets (whether it will be implemented in C, change API,
and/or be given supporting language syntax).

Raymond Hettinger

Chris

Skip Montanaro · Aug 13, 2003

Russell> I suspect the upgrade issue will significantly slow the
Russell> incorporation of sets and the other new modules, but that over
Russell> time they're likely to become quite popular. I am certainly
Russell> looking forward to using sets and csv.

The csv module (and the _csv module which underpins it) should work with
2.2.3. If they don't, please file a bug report.

Russell> I think it'd speed the adoption of new modules if they were
Russell> explicitly written to be compatible with one previous
Russell> generation of Python (and documented as such) so users could
Russell> manually include them with their code until the current
Russell> generation of Python had a bit more time to be adopted.

That was the intention with the csv module. I wonder if some limitations to
use of sets with 2.2.x could be gotten around by adding a __future__ import?
Maybe itertools is also needed.

Russell> I'm not saying this should be a rule, only suggesting it as a
Russell> useful goal. Presumably it'd be easy with some modules and not
Russell> worth the work in some cases.

Yes, that's a worthwhile goal.

Skip

Andrew Dalke · Aug 14, 2003

Gary Feldman:

Also, I'd like to see "iterable must be <some type spec>",
though this is a general flaw in the Python doc and is perhaps
biased by my C/C++ background where you'd never dream
of doing a reference manual without explicitly indicating the
types of every parameter.

Python uses what is sometimes called "duck typing" (meaning,
if it quacks like a duck...). Lots of objects are iterable - strings,
lists, sets, dict (keys), and user-defined classes. Since you
prefer C++, think of Python more akin to templates. Templates
expect the objects templated on to have certain properties (can
be "+"ed, can be dereferenced, has a method named "xyz") and
not that they have given types.
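
A short sketch of what "iterable" means in practice, using the built-in set constructor of later Python versions; unique_items and Countdown are hypothetical names invented for this example:

```python
def unique_items(iterable):
    """Collect the distinct elements of any iterable into a set."""
    return set(iterable)

class Countdown:
    """User-defined iterable that yields n, n-1, ..., 1."""
    def __init__(self, n):
        self.n = n

    def __iter__(self):
        return iter(range(self.n, 0, -1))

from_string = unique_items("abba")
from_list = unique_items([1, 1, 2])
from_dict = unique_items({"k": 1, "j": 2})   # iterating a dict yields its keys
from_custom = unique_items(Countdown(3))
```

None of the arguments share a base type; each one only has to support iteration.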

Personally, I have a hard time imagining where I'd want
[remove]. If I really cared, I could check beforehand, so I think
I'd just always use discard.

I'm the other way around. I find it hard to imagine where
I would call discard. If I want to remove an element from a
set then I want to know right away if that element isn't there.
It's been handy for tracking down bugs in my code.
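
The difference between the two methods, sketched with the built-in set type of later Python versions (which kept both names):

```python
s = {"a", "b"}

# discard(): a missing element is silently ignored.
s.discard("z")

# remove(): a missing element raises KeyError.
try:
    s.remove("z")
    raised = False
except KeyError:
    raised = True
```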

5.12.2
engineering_management = engineers & programmers

Actually, I don't like that example because there is too
much text to read through to see the actual symbols used.

PS I suppose I should mention my strongest pet peeve
with the Python documentation, which is the practice of
putting the member functions on a different page than
the class overview. But that's not your issue, either.

And I confess that I like to see everything on one page
and not split up between several pages. That way I can
use my browser's search facility.

Andrew

Raymond Hettinger · Aug 14, 2003

Skip Montanaro said:
Russell> I suspect the upgrade issue will significantly slow the
Russell> incorporation of sets and the other new modules, but that over
Russell> time they're likely to become quite popular. I am certainly
Russell> looking forward to using sets and csv.

The csv module (and the _csv module which underpins it) should work with
2.2.3. If they don't, please file a bug report.

Russell> I think it'd speed the adoption of new modules if they were
Russell> explicitly written to be compatible with one previous
Russell> generation of Python (and documented as such) so users could
Russell> manually include them with their code until the current
Russell> generation of Python had a bit more time to be adopted.

That was the intention with the csv module. I wonder if some limitations to
use of sets with 2.2.x could be gotten around by adding a __future__ import?
Maybe itertools is also needed.

In the documentation for the itertools module, I intentionally included
pure Python versions of each tool to make backporting easy. You
can cut and paste the documentation into a module with
from __future__ import generators and have a Py2.2 version of
itertools that would enable the sets module to run just fine.
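
To give a flavor of those pure Python equivalents, here is a sketch of a generator-based stand-in for itertools.chain. It is written in the style of the equivalents shown in the itertools documentation, but it is not copied from the sets backport, and on Python 2.2 it would additionally need the from __future__ import generators line at the top of the module:

```python
import itertools

def chain(*iterables):
    """Pure-Python stand-in for itertools.chain."""
    for iterable in iterables:
        for element in iterable:
            yield element

result = list(chain("ab", [1, 2]))
reference = list(itertools.chain("ab", [1, 2]))
```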

Still, why not upgrade to Py2.3? The bug fixes were all ported to 2.2.3
and into Py2.3 so that the essential differences are the new modules
and some minor language improvements.

Raymond Hettinger

Michael Hudson · Aug 14, 2003

Raymond Hettinger said:
I've gotten lots of feedback on the itertools module
but have not heard a peep about the new sets module.

* Are you overjoyed/outraged by the choice of | and &
as set operators (instead of + and *)?

I'd actually rather sets didn't overload any operators at all, but
appreciate that this may be a minority position.

| and & is the only sane choice, however.

* Is the support for sets of sets necessary for your work
and, if so, then is the implementation sufficiently
powerful?

I don't use them as much as I should, I suspect.

* Is there a compelling need for additional set methods like
Set.powerset() and Set.isdisjoint(s) or are the current
offerings sufficient?

I've not reached for something and not found it there yet.

* Does the performance meet your expectations?

My uses so far have not had even the faintest of performance demands,
so, yes.

* Do you care that sets can only contain hashable elements?

Not yet.

Cheers,
mwh

Bob Gailer · Aug 14, 2003

After giving blanket approval to the docs I now add:

I have a mission to set some new guidelines for Python documentation.
Perhaps this is a good place to start.
Example - currently we have:

class Set( [iterable])
Constructs a new empty Set object. If the optional iterable parameter is
supplied, updates the set with elements obtained from iteration. All of the
elements in iterable should be immutable or be transformable to an
immutable using the protocol described in section 5.12.3
<http://www.python.org/doc/current/lib/immutable-transforms.html#immutable-transforms>.

Problems:
The result of Set appears to be an empty Set object. The fact that it might
be filled is hidden in the parameter description.
The parameter description itself is hidden in the paragraph, making it
harder to find, especially when the reader is in a hurry.

Some suggested guidelines to improve readability and understandability:
1 - label each paragraph so we know what it is about
2 - have a function paragraph that briefly but completely describes the
function
3 - have labeled sections for things that can be so grouped (e.g. parameters)
4 - start the description of each thing in a new paragraph.

Example:

class Set( [iterable])
function: Constructs a new empty Set object and optionally fills it.
parameters:
    iterable [optional] if supplied, updates the set with elements
        obtained from iteration. All of the elements in iterable should
        be immutable or be transformable to an immutable using the
        protocol described in section 5.12.3
        <http://www.python.org/doc/current/lib/immutable-transforms.html#immutable-transforms>.

What do you think? If this layout is appealing, let's use the set docs as a
starting point to model this approach. I for one am willing to apply this
model to the rest of the set docs, and help update other docs, but not all
of them.

BTW I also have a problem with the term "Common uses". "Common" suggests
that these are better, or more frequent. I suggest "Some examples of
application of sets".

I also agree with the suggestion that operations that are synonymous be so
indicated in the table.

Bob Gailer
303 442 2625

Russell E. Owen · Aug 14, 2003

Skip Montanaro said:
Russell> I suspect the upgrade issue will significantly slow the
Russell> incorporation of sets and the other new modules, but that over
Russell> time they're likely to become quite popular. I am certainly
Russell> looking forward to using sets and csv.

The csv module (and the _csv module which underpins it) should work with
2.2.3. If they don't, please file a bug report.

That's excellent news. It might be worth adding it to the documentation,
e.g. "new in version 2.3 but compatible with version 2.2.x" (surely x is
1 (with True/False) or 0 (without), or was there really some needed
feature change in 2.2.3?).

That was the intention with the csv module. I wonder if some limitations to
use of sets with 2.2.x could be gotten around by adding a __future__ import?
Maybe itertools is also needed.

That is an interesting question. Mind you, I have no idea if sets is
compatible with 2.2.x or not; I didn't try since it wasn't documented
and I didn't want to risk missing some obscure bug.

-- Russell

John Baxter · Aug 15, 2003

"Andrew Dalke said:
I read some mention of using "|" instead of "+", so I knew
to use it. I would have liked +, but not *. I know the logic
for thinking * but & doesn't have the other connotations
* has (like [1] * 2, "a"*9)

* Is the support for sets of sets necessary for your work
and, if so, then is the implementation sufficiently
powerful?

After years of using Python without sets, I hand built a specialized
intersection a couple of months ago. Knowing the Sets module was
coming, I did only what I needed at that moment, and didn't bother
optimizing it (it takes a few seconds to do what I need...removing a
second or two isn't useful). (I worked around a "need" for difference
by changing the input generation in the overall problem.)

So..."necessary" is too strong here, but "a good thing" is certainly
apt. If I only get to choose yes or no for "necessary" the answer is
"yes".

--John

Raymond Hettinger · Aug 15, 2003

"Istvan Albert"

One pattern that I constantly need is to remove duplicates from
a sequence. I don't know if this is an often enough used pattern to
warrant an API change, for me it would be most useful if I could
get the contents of a set as a sequence right away, without having to
explicitly code it.

['a', 'r', 'b', 'c', 'd']
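
The statement that produced the list above did not survive in this copy of the thread. As a sketch of the general idea only, assuming the built-in set type of later Python versions, a set can be turned into a sequence like this:

```python
word = "abracadabra"

# list() on a set gives the distinct elements in an arbitrary order.
as_list = list(set(word))

# sorted() gives the same elements in a predictable order.
as_sorted = sorted(set(word))

# If first-seen order matters, dict.fromkeys keeps it (Python 3.7+).
first_seen = list(dict.fromkeys(word))
```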

I wondered whether it would be better to specify the immutability
of the class at the constructor level.

ImmutableSet is available as a constructor.

Then there is the update method. It feels a little bit redundant
since there is an add() method that seems to be doing the same thing
only that add() adds only one element at a time.
Would it be possible to have add() handle all additions, iterable or
not, then scrap update() altogether.

Not really.
Set.update() is for vectorizing high volume additions.
There is some analogy to list.append() vs. list.extend().
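
The analogy can be sketched as follows with the built-in set and list types of later Python versions:

```python
s = set()
s.add("ab")        # adds the single element "ab"
s.update("cd")     # iterates its argument: adds "c" and "d"

items = []
items.append("ab")   # appends the single element "ab"
items.extend("cd")   # iterates its argument: appends "c" and "d"

# add() must treat its argument as one element, even when it is iterable.
pairs = set()
pairs.add((1, 2))
```

The last two lines suggest why a single add() could not cover both cases: a tuple or string argument would be ambiguous between "one element" and "many elements".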

have discard() and remove() do essentially the same thing but only one
of them raising an exception. Which one? I already forgot. I don't know
which one I would prefer though.

Will clarify the docs.

Another aspect that I did not understand, what is difference between
update() and union_update().

update() works with any iterable and union_update() only with another Set.
If the API is liberalized to allow any iterable for most operations, then
the distinction will vanish.
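
For reference, this is how the any-iterable question looks in the built-in set type of later Python versions, which is an assumption relative to the 2.3 sets.Set API discussed here: the named methods accept any iterable, while the operators still require a set on both sides.

```python
s = {1, 2}

# Named method: any iterable is accepted.
via_method = s.union([2, 3])

# Operator: a plain list on the right-hand side raises TypeError.
try:
    s | [2, 3]
    operator_accepts_list = True
except TypeError:
    operator_accepts_list = False

# Converting first makes the operator form work.
via_operator = s | set([2, 3])
```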

The long winded method names, such as difference_update() also feel
redundant when one can achieve the same thing with the -= operator. I
would drop these and instead show in the docs how to accomplish these
with the operators. Would considerably cut down on the documentation,
and apparent complexity.

That is a good thought; however,
some find a.union(b) to be more readable than a|b
and some find that a.symmetric_difference is more memorable than a^b.

For example, a method like x.issubset(y) is the same as not (x-y), so it
may not be all that necessary, just a thought.

Granted. However:

* issubset has an early-out algorithm and consumes constant memory.
In contrast, x-y builds a whole new set that is then thrown away.
* issubset and issuperset are somewhat basic set operations
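
A sketch of the early-out idea, not the actual sets.py implementation; both function names are hypothetical and the built-in set type of later Python versions is assumed:

```python
def issubset_early_out(x, y):
    """Return True if every element of x is in y, stopping at the first miss."""
    for element in x:
        if element not in y:
            return False      # early out: no further elements are examined
    return True

def issubset_via_difference(x, y):
    """Same answer, but x - y allocates a full intermediate set first."""
    return not (x - y)

small = {1, 2}
big = {1, 2, 3}
```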

I use them very often and they are extremely useful.

Me too.

Raymond Hettinger

Terry Reedy · Aug 15, 2003

Raymond Hettinger said:
"Istvan Albert"

I agree that this is confusing -- like having both str.find and
str.index. I would prefer one delete function with an optional param
'silent' to switch its 'not there' response from the default (either
True or False, according to what seems to be the more common usage) to
the other choice. (I know, I should have read the draft more carefully
and commented last fall -- but this seems like the sort of redundancy
that Guido wants to remove in 3.0.)

Terry J. Reedy

Gerrit Holl · Aug 15, 2003

Raymond said:
Subject: Py2.3: Feedback on Sets

* Do you care that sets can only contain hashable elements?

This is the only disadvantage for me.

For the rest, I am happy about it. I am already using it a lot
in places where I used lists before, but where a Set is much
better (no order, no duplicates, it really *is* a set).

User feedback is essential to determining the future direction
of sets (whether it will be implemented in C, change API,
and/or be given supporting language syntax).

I really like them. I would also like to be able to do
{elem for elem in set if foo(elem)} to construct a subset.
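
That exact syntax was later adopted as the set comprehension (Python 2.7 and 3.0 onward). A sketch, with foo as a made-up predicate and a differently named variable so the builtin set is not shadowed:

```python
def foo(elem):
    """Hypothetical predicate: keep even numbers."""
    return elem % 2 == 0

numbers = {1, 2, 3, 4, 5, 6}

# Set comprehension: builds the subset of elements that satisfy foo().
subset = {elem for elem in numbers if foo(elem)}
```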

Gerrit.

Raymond Hettinger · Aug 15, 2003

"Russell E. Owen"

I don't rely on sets heavily (I do have a few implemented as
dictionaries with value=None) and am not yet ready to make my users
upgrade to Python 2.3.

I suspect the upgrade issue will significantly slow the incorporation of
sets and the other new modules, but that over time they're likely to
become quite popular. I am certainly looking forward to using sets and
csv.

I think it'd speed the adoption of new modules if they were explicitly
written to be compatible with one previous generation of Python (and
documented as such) so users could manually include them with their code
until the current generation of Python had a bit more time to be adopted.

Wish granted!

The sets module now will run under Py2.2.
It should be available for download from CVS after 24 hours:
http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/python/python/dist/src/Lib/sets.py

Raymond Hettinger

Raymond Hettinger · Aug 16, 2003

"Gary Feldman"

I haven't used them yet, but since I'm working my way through
the docs in general, I thought I'd check them out and comment.

All of the issues you found have been fixed (except for the discussion of
what an iterable parameter means -- that will be addressed elsewhere).

Raymond Hettinger

Raymond Hettinger · Aug 17, 2003

"John Smith"

Suggestion: How about adding Set.isProperSubset() and
Set.isProperSuperset()?

We have them in operator form: a<b and a>b.
Spelling them out did not seem to add much value.
This is doubly true because some people read it
as s.isProperSubsetOf(t) and others read it as
s.hasTheProperSubset(t).
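
The operator forms, sketched with the built-in set type of later Python versions:

```python
a = {1, 2}
b = {1, 2, 3}

proper_subset = a < b        # a is contained in b and differs from b
proper_superset = b > a      # the same relation read from the other side
self_proper = a < a          # a set is never a proper subset of itself
self_subset = a <= a         # but it is a (non-proper) subset of itself
```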

Raymond Hettinger

Thanks for this wonderful module. I've been working on data mining and
machine learning area using Python. Set operations are very important to me.

Great. You'll love it even more when I implement it in C.

Raymond Hettinger
