Help with sets

B. M. Whealton · Oct 5, 2010

Hello all,
So, I started learning python just recently. I got inspired
by a project that related to the semantic web. I can see why this would
be a language chosen for the applications that help to build the
semantic web. It is interesting to see that indeed python does have
structures that make it well suited for this kind of application, or I
seem to be able to see how certain structures in python do stand out as
unique and related to concepts I am reading about in relation to the
Semantic web. I did get a bit confused in reading about the concept of
sets in python and why you would use them instead of a dictionary for
example.
For example, here we have:
self._pos = {predicate:{object:set([subject])}}
This is said to be a dictionary, containing dictionaries, which in turn
contain sets. I don't know if it would be possible to explain why one
would use a set, especially in this context without showing more. I was
a bit stumped by this for some odd reason. I don't know if other
languages lack certain structures and features. It's been a long time
since I did C programming too.
Anyway, this would be used an a Class as follows:

class SimpleGraph:
def __init__(self):
self._spo = {}
self._pos = {}
self._osp = {}

This is to create indexes that are permutations of the pattern subject,
predicate object. So, using this:

self._pos = {predicate:{object:set([subject])}}

We have the first dictionary keyed off the first term, the
second dictionary keyed off the second term, and the set containing the
third terms(note terms plural). I guess I somewhat understand that sets
are used to test for membership. Cannot that be done with dictionaries,
though?
Also, if you ran this for loop, below, what is being yielded,
the key or the value in the dictionary? I refer to this for loop here:
for subject in self._pos[predicate][ojbect]: yield (subject, predicate,
object)

Thanks for any help and insights into this,
Bruce
--

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
B. M. Whealton - Web Developer, Programmer, Owner:
Future Wave Designs: http://FutureWaveDesigns.com
"On Being a Poet and Other Existential Ideas":>
http://WordSaladPoetryMagazine.com/words/
Publisher of Word Salad Poetry Magazine:
http://WordSaladPoetryMagazine.com/ - Submit your poetry
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Steven D'Aprano · Oct 5, 2010

I did get a bit confused in reading about the concept of
sets in python and why you would use them instead of a dictionary for
example.

Why would you use a spoon instead of a paper clip? Why would you use a
hat-stand instead of a pencil sharpener?

Sets aren't an alternative to dictionaries. They have a completely
different purpose.

It so happens that sets and dicts *in Python* happen to share some
implementation details, and they happen to share similar syntax, but
don't be fooled by these incidental details. They could be implemented
differently.

Dicts are used when you need a 1:1 mapping between a key and a value:

{name: address} # each person has exactly one home address

Sets are used when you have a list of items, but you don't care about the
order of the items, only whether or not something is in the list.

Since lists are ordered, searching a list to see if something is in it
can be slow. Sets give up order so that they can make membership testing
blindingly fast:

0.0022180080413818359

For example, here we have:
self._pos = {predicate:{object:set([subject])}} This is said to be a
dictionary, containing dictionaries, which in turn contain sets. I don't
know if it would be possible to explain why one would use a set,
especially in this context without showing more. I was a bit stumped by
this for some odd reason. I don't know if other languages lack certain
structures and features.

Of course they do. That's why we have more than one programming language.
Earlier versions of Python didn't have sets. Standard Pascal doesn't have
dicts. Neither Fortran nor COBOL supported recursion, at least early on.

It's been a long time since I did C
programming too.
Anyway, this would be used an a Class as follows:

class SimpleGraph:
def __init__(self):
self._spo = {}
self._pos = {}
self._osp = {}

This is to create indexes that are permutations of the pattern subject,
predicate object. So, using this:

self._pos = {predicate:{object:set([subject])}}

Sounds complicated. What are the indexes for? Why not just have the
permutations directly?

We have the first dictionary keyed off the first term, the
second dictionary keyed off the second term, and the set containing the
third terms(note terms plural). I guess I somewhat understand that sets
are used to test for membership. Cannot that be done with dictionaries,
though?
Also, if you ran this for loop, below, what is being yielded,
the key or the value in the dictionary? I refer to this for loop here:
for subject in self._pos[predicate][ojbect]: yield (subject, predicate,
object)

What is yielded is exactly what you tell Python to yield: a tuple
containing three items (subject, predicate, object), whatever they are.

Paul Rudin · Oct 5, 2010

B. M. Whealton said:
I did get a bit confused in reading about the concept of sets in
python and why you would use them instead of a dictionary for example.

Use a set when something is naturally modelled as a set... it's a
collection of unordered objects that you can test for membership, form
unions, intersections etc. reasonably efficiently.

Use a dictionary when you want something that's naturally modelled as a
dictionary - i.e. you have a unordered collection of keys each of which
has a value associated with it and you want reasonably efficient lookup
of those values from the keys.

Certainly you can model a set as a dictionary, but that's likely to be
less efficient than using a set, and quite possibly you'll need to roll
your own operations on your sets, whereas they're provided for the built
in implementation.

(Of course if you know more information about your problem domain you
might have a better representation than the built in set... for example
you might imagine an application where you want to form sets of objects
from a fixed, relatively small, universe. Then you could perhaps model a
set as a bit array, and lots of set operations can then be done by very
fast bit fiddling operations.)

nn · Oct 5, 2010

Semantic web. I did get a bit confused in reading about the concept of

sets in python and why you would use them instead of a dictionary for

Sets are faster and more convenient to do intersections, unions,
differences. They also use less space than dictionaries. Finally they
also help conveying the intent of the code.

Many things can be emulated using dictionaries. In languages like Lua
there are no sets, and no tuples, and no lists*. This makes sense if
your goal is to create a very small embedded language. Using specific
structures on the other hand has the benefit that operations common to
that structure are more convenient to express and that those
operations are optimized for space and time.

*"Tables are the sole data structuring mechanism in Lua; they can be
used to represent ordinary arrays, symbol tables, sets, records,
graphs, trees, etc." http://www.lua.org/manual/5.1/manual.html#2.2

Terry Reedy · Oct 5, 2010

Sets aren't an alternative to dictionaries. They have a completely
different purpose.

A dict/mapping is a specialized set -- a set of ordered pairs in which
each first member (the 'key') only appears once as a first member. The
set union of two mappings may not be a mapping, which is why dicts have
a specialized .update method instead of .union.

It so happens that sets and dicts *in Python* happen to share some
implementation details, and they happen to share similar syntax, but
don't be fooled by these incidental details. They could be implemented
differently.

Dicts are used when you need a 1:1 mapping between a key and a value:

The mapping does not have to be 1:1, which means that each value also
appears only once, so that the mapping can be inverted. Values can
appear more than once, which is to say, many keys can map to one value.
so the mapping is many:1. Collections (sets or lists) are used as values
to implement 1:many mappings.

Lawrence D'Oliveiro · Oct 7, 2010

Certainly you can model a set as a dictionary, but that's likely to be
less efficient than using a set, and quite possibly you'll need to roll
your own operations on your sets, whereas they're provided for the built
in implementation.

Did you know that applying the â€œsetâ€ or â€œfrozensetâ€ functions to a dict
return a set of its keys?

Seems a bit dodgy, somehow.

Gregory Ewing · Oct 9, 2010

Lawrence said:
Did you know that applying the â€œsetâ€ or â€œfrozensetâ€ functions to a dict
return a set of its keys?

Seems a bit dodgy, somehow.

That's just a consequence of the fact that dicts produce their
keys when iterated over, and the set constructor iterates over
whatever you give it.

Lawrence D'Oliveiro · Oct 11, 2010

That's just a consequence of the fact that dicts produce their
keys when iterated over, and the set constructor iterates over
whatever you give it.

Hmm. It seems that â€œiter(<dict>)â€ iterating over the keys has been around a
long time. But a dict has both keys and values: why are language constructs
treating them so specially as to grab the keys and throw away the values?

Is this an artifact of older directions in the language evolution, before
sets were introduced?

Ethan Furman · Oct 11, 2010

Lawrence said:
Hmm. It seems that â€œiter(<dict>)â€ iterating over the keys has been around a
long time. But a dict has both keys and values: why are language constructs
treating them so specially as to grab the keys and throw away the values?

What's so special about it? If you want the value, ask for it;
iterating over the dict without asking specifically for the values does
not give them. Furthermore, if you did get the key:value pairs how
would you store them in a set such that you could query the set to see
if a key were there?

~Ethan~

Robert Kern · Oct 11, 2010

Hmm. It seems that â€œiter(<dict>)â€ iterating over the keys has been around a
long time. But a dict has both keys and values: why are language constructs
treating them so specially as to grab the keys and throw away the values?

Language constructs are not treating anything specially much less "throwing
away" anything. The language construct in question does exactly the same thing
with every object nowadays: call the .__iter__() method to get the iterator and
call .next() on that iterator until it raises StopIteration. It is the
responsibility of the dict object itself to decide how it wants to be iterated over.

The reasoning for this decision is spelled out in the PEP introducing the
iterator feature:

http://www.python.org/dev/peps/pep-0234/

"""
- There has been a long discussion about whether

for x in dict: ...

should assign x the successive keys, values, or items of the
dictionary. The symmetry between "if x in y" and "for x in y"
suggests that it should iterate over keys. This symmetry has been
observed by many independently and has even been used to "explain"
one using the other. This is because for sequences, "if x in y"
iterates over y comparing the iterated values to x. If we adopt
both of the above proposals, this will also hold for
dictionaries.

The argument against making "for x in dict" iterate over the keys
comes mostly from a practicality point of view: scans of the
standard library show that there are about as many uses of "for x
in dict.items()" as there are of "for x in dict.keys()", with the
items() version having a small majority. Presumably many of the
loops using keys() use the corresponding value anyway, by writing
dict[x], so (the argument goes) by making both the key and value
available, we could support the largest number of cases. While
this is true, I (Guido) find the correspondence between "for x in
dict" and "if x in dict" too compelling to break, and there's not
much overhead in having to write dict[x] to explicitly get the
value.

For fast iteration over items, use "for key, value in
dict.iteritems()". I've timed the difference between

for key in dict: dict[key]

and

for key, value in dict.iteritems(): pass

and found that the latter is only about 7% faster.

Resolution: By BDFL pronouncement, "for x in dict" iterates over
the keys, and dictionaries have iteritems(), iterkeys(), and
itervalues() to return the different flavors of dictionary
iterators.
"""

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco

Gregory Ewing · Oct 12, 2010

Robert said:
The reasoning for this decision is spelled out in the PEP introducing
the iterator feature:

http://www.python.org/dev/peps/pep-0234/

There's also the question of whether 'if x in dict' should
compare keys only or both keys and values. This was also
hotly debated back when dicts were given an 'in' operator
(before that you had to use dict.haskey(x)). Can't remember
all the arguments, but it was definitely thought about.

Steve Howell · Oct 12, 2010

There's also the question of whether 'if x in dict' should
compare keys only or both keys and values. This was also
hotly debated back when dicts were given an 'in' operator
(before that you had to use dict.haskey(x)). Can't remember
all the arguments, but it was definitely thought about.

The PEP seems to refer to such discussion, although no there is no
specific footnote to the mailing list archives.

- Regarding "if key in dict": there is no doubt that the
dict.has_key(x) interpretation of "x in dict" is by far the
most useful interpretation, probably the only useful one. There
has been resistance against this because "x in list" checks
whether x is present among the values, while the proposal makes
"x in dict" check whether x is present among the keys. Given
that the symmetry between lists and dictionaries is very weak,
this argument does not have much weight.

Symmetry is always a tricky balance in programming languages. Python
discounts symmetry considerations with respect to lists and
dictionaries when it comes to "in" (see above), while it goes for a
certain symmetry between "if" and "for" with respect to the "in"
keyword and dictionaries (see below):

While this is true, I (Guido) find the correspondence between
"for x in
dict" and "if x in dict" too compelling to break, and there's
not
much overhead in having to write dict[x] to explicitly get the
value.

There is no major inconsistency here; "symmetry" is subjective and
needs to be applied with a heavy dose of pragmatism.

Lawrence D'Oliveiro · Oct 12, 2010

In message

Steve said:
Symmetry is always a tricky balance in programming languages.

Is that what we used to call â€œorthogonalityâ€?

D'Arcy J.M. Cain · Oct 12, 2010

Is that what we used to call â€œorthogonalityâ€?

No, orthogonality is something else. "Orthogonal" means "perpendicular
to." See the Wikipedia article for a discussion of the word's
application to computer science.

Lawrence D'Oliveiro · Oct 13, 2010

D'Arcy said:
No, orthogonality is something else. "Orthogonal" means "perpendicular
to."

The appropriate meaning is â€˜being able to combine independentlyâ€ (as in the
orthogonal decomposition of a Fourier transform). An example of contemporary
usage, from the â€œRevised Report on the Algorithmic Language ALGOL 68â€ (1974,
I think):

0.1.2 Orthogonal design
The number of independent primitive concepts has been minimized in
order that the language be easy to describe, to learn, and to implement.
On the other hand, these concepts have been applied â€œorthogonallyâ€ in
order to maximize the expressive power of the language while trying to
avoid deleterious superï¬‚uities.

So â€œorthogonalityâ€ has to do with use of minimum number of primitive
components (operators, object types) in a maximum number of meaningful
combinations.

Steve Howell · Oct 13, 2010

The appropriate meaning is â€˜being able to combine independentlyâ€ (as in the
orthogonal decomposition of a Fourier transform). An example of contemporary
usage, from the â€œRevised Report on the Algorithmic Language ALGOL 68â€ (1974,
I think):

Â Â 0.1.2 Orthogonal design
Â Â The number of independent primitive concepts has been minimized in
Â Â order that the language be easy to describe, to learn, and to implement.
Â Â On the other hand, these concepts have been applied â€œorthogonallyâ€ in
Â Â order to maximize the expressive power of the language while trying to
Â Â avoid deleterious superï¬‚uities.

So â€œorthogonalityâ€ has to do with use of minimum number of primitive
components (operators, object types) in a maximum number of meaningful
combinations.

Lawrence, I was actually talking about symmetry, not orthogonality. I
was making the observation that Python doesn't always strive for
symmetry as its #1 driving concern (as well it shouldn't). I think
your definition of orthogonality is more what Python is about. I'd
say that symmetry and orthogonality go hand in hand for most
situations. Symmetry often helps minimize the number of primitive
components, for example. But sometimes symmetry arguments can be
forced or just be made on a weak foundation.

MRAB · Oct 13, 2010

Lawrence, I was actually talking about symmetry, not orthogonality. I
was making the observation that Python doesn't always strive for
symmetry as its #1 driving concern (as well it shouldn't). I think
your definition of orthogonality is more what Python is about. I'd
say that symmetry and orthogonality go hand in hand for most
situations. Symmetry often helps minimize the number of primitive
components, for example. But sometimes symmetry arguments can be
forced or just be made on a weak foundation.

Someone (Niklaus Wirth?) once said something about how a simpler
language isn't necessarily easier to learn. Pascal, for example, makes
a distinction between procedures and function, which can help to
prevent some bugs, and Python forbids assignment in expressions, which
also helps to prevent some bugs.

Lawrence D'Oliveiro · Oct 14, 2010

In message

Steve said:
Lawrence, I was actually talking about symmetry, not orthogonality.

So whatâ€™s the difference?

Steve Howell · Oct 14, 2010

In message

So what’s the difference?

I don't think there's a big difference between the two--you snipped
away my comment about them often going hand in hand, so I will repeat
it here that they do often go hand in hand.

I guess a lot depends on how you define "symmetry." Is your
definition of "symmetry" equivalent to your definition of
"orthogonality"?

I don't have a precise definition of "symmetry" myself, I will admit.
Perhaps for lack of a better word, I throw around the term "symmetry"
for situations where analogous constructs get used in slightly
different contexts. The example in this discussion is the "in"
keyword. You can use "in" for strings, sets, lists, and dictionaries,
which in and of itself is an example of "symmetry" in my mind.
Another example is the relationship of the "+" operator to the "*"
operator. With numbers "*" connotes repeated use of "+", as it does
with strings, even though the underlying operations for "+" are not
exactly equivalent. String star-ification is repeated concatenation,
whereas integer star-ification is repeated addition, but since
concatenation and addition are both represented by "+", it creates
some sort of symmetry to have "*" as the string-concatenation-
repetition operator, even if it is not something folks commonly
encounter outside of programming languages.

Lawrence D'Oliveiro · Oct 14, 2010

In message

Steve said:
I guess a lot depends on how you define "symmetry." Is your
definition of "symmetry" equivalent to your definition of
"orthogonality"?

No idea. Itâ€™s just that the example being discussed in this thread seemed to
come under the old term â€œorthogonalityâ€, so I wondered why a different term
was being used.

So far no-one has come up with a good excuse for using a different term.

Creating books with Sets	1	Sep 27, 2022
Dictionary used to build a Triple Store	1	Jan 7, 2010
Help with Loop	0	Mar 30, 2023
Numpy Array of Sets	7	May 24, 2014
Newbie help - Programming the Semantic Web with Python	0	Jul 10, 2011
Help with Github???	2	Jan 6, 2024
Combining sets/dictionaries	7	Sep 22, 2009
random.sample with large weighted sample-sets?	0	Feb 16, 2014

Help with sets

B. M. Whealton

Steven D'Aprano

Paul Rudin

nn

Terry Reedy

Lawrence D'Oliveiro

Gregory Ewing

Lawrence D'Oliveiro

Ethan Furman

Robert Kern

Gregory Ewing

Steve Howell

Lawrence D'Oliveiro

D'Arcy J.M. Cain

Lawrence D'Oliveiro

Steve Howell

MRAB

Lawrence D'Oliveiro

Steve Howell

Lawrence D'Oliveiro

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads