Details about pythons set implementation

A

Achim Domma

Hi,

I'm interested in details about how sets are implemented in python.
They seem to be quite fast and I found some remarks who state, that
the implementation is highly optimized. I need to implemented sets in
C/C++ and need a starting point on how to do it right. Could somebody
give me a starting point?

regards,
Achim
 
H

Hrvoje Niksic

Achim Domma said:
I'm interested in details about how sets are implemented in python.
They seem to be quite fast and I found some remarks who state, that
the implementation is highly optimized. I need to implemented sets
in C/C++ and need a starting point on how to do it right. Could
somebody give me a starting point?

You can simply look at the implementation, Objects/setobject.c in the
Python source code. Most that it's mostly copy-paste from the dict
implementation (dictobject.c) and that both are quite involved and
optimized for the use by Python. They're not general implementation
of sets from use in C.

The "highly optimized" remarks should be understood in the context of
Python, not in the context of other C and C++ set libraries. I don't
know how well Python sets compare to other set libraries, but I doubt
that it's much faster than the median (which "highly optimized" could
be understood to imply).

BTW if you're using C++, why not simply use std::set? If you need it
called from C, you can wrap the needed methods in a C-accessible API.
 
S

Sion Arrowsmith

Hrvoje Niksic said:
BTW if you're using C++, why not simply use std::set?

Because ... how to be polite about this? No, I can't. std::set is
crap. The implementation is a sorted sequence -- if you're lucky,
this is a heap or a C array, and you've got O(log n) performance.
But the real killer is that requirement for a std::set<T> is that
T::eek:perator< exists. Which means, for instance, that you can't
have a set of complex numbers....
 
B

bukzor

Because ... how to be polite about this? No, I can't. std::set is
crap. The implementation is a sorted sequence -- if you're lucky,
this is a heap or a C array, and you've got O(log n) performance.
But the real killer is that requirement for a std::set<T> is that
T::eek:perator< exists. Which means, for instance, that you can't
have a set of complex numbers....

--
\S -- (e-mail address removed) --http://www.chaos.org.uk/~sion/
"Frankly I have no feelings towards penguins one way or the other"
-- Arthur C. Clarke
her nu becomeþ se bera eadward ofdun hlæddre heafdes bæce bump bump bump

Why cant you implement < for complex numbers? Maybe I'm being naive,
but isn't this the normal definition?
a + bi < c + di iff sqrt(a**2 + b**2) < sqrt(c**2, d**2)

How do you implement a set without sorting?

Are you expecting better than O(log n)?

--Buck
 
A

Arnaud Delobelle

On Jan 4, 5:08 pm, Sion Arrowsmith <[email protected]>
wrote:
[...]
But the real killer is that requirement for a std::set<T> is that
T::eek:perator< exists. Which means, for instance, that you can't
have a set of complex numbers....

This is really OT but IIRC, std::set<T> is actually
std::set< T, std::less<T> >

So make your own less_complex() function (for example, use
lexicographic order), std::set<complex, less_complex> should give you
a set of complex numbers (sorry if some syntax is incorrect I haven't
done any C++ for a while :)
 
D

Diez B. Roggisch

bukzor said:
Why cant you implement < for complex numbers? Maybe I'm being naive,
but isn't this the normal definition?
a + bi < c + di iff sqrt(a**2 + b**2) < sqrt(c**2, d**2)

How do you implement a set without sorting?

Are you expecting better than O(log n)?

Of course, hashing does O(1) (most of the time, with a sane hash of course.)

Diez
 
P

Piet van Oostrum

bukzor said:
B> Why cant you implement < for complex numbers? Maybe I'm being naive,
B> but isn't this the normal definition?
B> a + bi < c + di iff sqrt(a**2 + b**2) < sqrt(c**2, d**2)

There doesn't exist a `normal' definition of < for the complex numbers. For
example you would expect that x<y and z<u would imply that x+z<y+u and
similar thing for other operators. This is impossible for complex numbers.
Now if you use it only for set building then you could relax the
constraints and use for example the lexicographical order.
 
S

Steven D'Aprano

Why cant you implement < for complex numbers? Maybe I'm being naive, but
isn't this the normal definition?
a + bi < c + di iff sqrt(a**2 + b**2) < sqrt(c**2, d**2)

No, it is not. Ordered comparisons are not defined for complex numbers.
Which is bigger, 4+2j or 2+4j?

How do you implement a set without sorting?

With a hash table.

Or if you are willing to limit yourself to sets of small integers, you
can implement it using bit flipping. E.g. 5 is an element of the set if
bit 5 is on.

Are you expecting better than O(log n)?

Sure.
 
A

Arnaud Delobelle

No, it is not.

Mathematically speaking, it is an order, and a very useful one. Only
it's a partial order, which means that if a != b then they are not
necessarily ordered (e.g. 1 and i). This is not ideal for sorting.
What you need is a total order, where any two distinct elements are
comparable.
Ordered comparisons are not defined for complex numbers.

You can't define a total order which is compatible with addition and
multiplication, but this restriction is not required to implement sets
of complex numbers. Any total order will do, the easiest to pick
being lexicographical as already pointed out, i.e.

a+bi < c + di iff a < c or a == c and b < d
Which is bigger, 4+2j or 2+4j?

4+2j, lexicographically!
 
R

r.grimm

Because ... how to be polite about this? No, I can't. std::set is
crap. The implementation is a sorted sequence -- if you're lucky,
this is a heap or a C array, and you've got O(log n) performance.
But the real killer is that requirement for a std::set<T> is that
T::eek:perator< exists. Which means, for instance, that you can't
have a set of complex numbers....

--

Hallo and Sorry for being OT.
As Arnaud pointed out, you must only overload the < Operator for the
requested type.
Something like
bool operator < ( const Type& fir, const Type& sec )....
similar to python with __lt__ .
The rest of magic will be done by the compiler/interpreter.
Assoziative Arrays (set,map,multi_set,multi_map) in the classical STL
are implemented as binary trees. Therefore the keys must be comparable
and the access time is O(log n ).
To get a dictionary with O(1), the most STL implementation support a
extension called hash_set.
The new standard TR1 support unsorted_set ... . You can download it
from www.boost.org. Newer gcc runtimes also including the new
subnamespace tr1.
There is no need to implement set in c++ to get O(1).


Greetings Rainer
 
B

bukzor

No, it is not. Ordered comparisons are not defined for complex numbers.
Which is bigger, 4+2j or 2+4j?


With a hash table.

Or if you are willing to limit yourself to sets of small integers, you
can implement it using bit flipping. E.g. 5 is an element of the set if
bit 5 is on.


Sure.

Good Answers!
 
B

bearophileHUGS

Sion Arrowsmith:
Because ... how to be polite about this? No, I can't. std::set is
crap. The implementation is a sorted sequence

What about using hash_map instead? You can use it with GCC too (but
you have to use a trick if you want to use string keys).

Bye,
bearophile
 
A

Albert van der Horst

Mathematically speaking, it depends what you require from ordering.
Mostly (and what we need for fast lookup) we want transitivity:
A>=B & B>=C => A>=C

Requiring transitivity you are right with your concern
about the OP's version of <. Although in mathematics you can
define things as you like, a non-transitive < is frowned upon.

We can make a useful ordering by defining
A<=B : if RE(A) /= RE(B) then RE(A) <= RE(B)
else IM(A) <= IM(B)

So for the OP's example: 4+2i is bigger.

This is sufficient to put a bunch of complex numbers in an
array, sort the array, and have a lookup using binary search.
(Or a heap: a heap is better for frequent additions and deletions.)

Or if you can live with O(n) implementations are trivial.

The same applies to sets of complex numbers the C++ way with the
above definition of for < for complex numbers.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,764
Messages
2,569,564
Members
45,040
Latest member
papereejit

Latest Threads

Top