Numpy Array of Sets

Luis José Novoa

Hi All,

Hope you're doing great. One quick question. I am defining an array of sets using numpy as:

a=array([set([])]*3)

Now, if I want to add an element to the set in, let's say, a[0], and I use the .add(4) operation, the result is:

array([set([4]), set([4]), set([4])], dtype=object)

which I do not want. If I use the union operator

a[0] = a[0] | set([4])

then I obtain what I want:

array([set([4]), set([]), set([])], dtype=object)

Can anyone explain why this happens?

Thank you very much.
 
Robert Kern

Same reason why you shouldn't make a list of lists like so: [[]]*3

https://docs.python.org/2/faq/programming.html#how-do-i-create-a-multidimensional-list

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
 
Wolfgang Maier

>> Hi All,
>>
>> Hope you're doing great. One quick question. I am defining an array of
>> sets using numpy as:
>>
>> a=array([set([])]*3)

This has nothing to do with numpy; the problem lies exclusively in your
innermost expression [set([])]*3.

>> Now, if I want to add an element to the set in, let's say, a[0], and I
>> use the .add(4) operation, the result is:

With .add you are modifying the *existing* set.

>> array([set([4]), set([4]), set([4])], dtype=object)
>>
>> which I do not want. If I use the union operator
>>
>> a[0] = a[0] | set([4])

Here you are forming a *new* set and putting it in a[0], replacing the old
set at that position.

>> then I obtain what I want:
>>
>> array([set([4]), set([]), set([])], dtype=object)
>>
>> Can anyone explain why this happens?
>
> Same reason why you shouldn't make a list of lists like so: [[]]*3
>
> https://docs.python.org/2/faq/programming.html#how-do-i-create-a-multidimensional-list

The above link explains the underlying problem.
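
The difference between modifying and replacing can be checked in plain Python, without numpy. The following is only a minimal sketch; the variable names are made up for illustration and are not from the original posts.

```python
# [set()] * 3 creates ONE set and stores three references to it.
shared = [set()] * 3
all_same_object = all(s is shared[0] for s in shared)

# .add() mutates that single set, so the change shows up in every slot.
shared[0].add(4)
after_add = [set(s) for s in shared]      # snapshot copies for inspection

# The union operator builds a NEW set; assigning it rebinds only slot 0.
rebound = [set()] * 3
rebound[0] = rebound[0] | {4}
after_union = [set(s) for s in rebound]

# A comprehension calls set() once per slot, so nothing is shared.
independent = [set() for _ in range(3)]
independent[0].add(4)
after_independent_add = [set(s) for s in independent]
```

With the comprehension, .add() only affects the slot it is called on, which is the behaviour the original question was after.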

Best,
Wolfgang
 
LJ

Wolfgang, thank you very much for your reply.

Following the example in the link, the problem appears:
A = [[0]*2]*3
A
[[0, 0], [0, 0], [0, 0]]
A[0][0] = 5
A
[[5, 0], [5, 0], [5, 0]]

Now, if I use a numpy array:
d = array([[0]*2]*3)
d
array([[0, 0],
       [0, 0],
       [0, 0]])
d[0][0] = 5
d
array([[5, 0],
       [0, 0],
       [0, 0]])


What is the difference here?

Thank you,
 
Peter Otten

LJ said:
> Wolfgang, thank you very much for your reply.
>
> Following the example in the link, the problem appears:
> A = [[0]*2]*3

You can see this as a shortcut for

value = 0
inner = [value, value]
A = [inner, inner, inner]

When the value is mutable (like your original set) a modification of the
value shows in all six entries. Likewise if you change the `inner` list the
modification shows in all three rows.

> A
> [[0, 0], [0, 0], [0, 0]]
> A[0][0] = 5
> A
> [[5, 0], [5, 0], [5, 0]]
>
> Now, if I use a numpy array:
> d=array([[0]*2]*3)
> d
> array([[0, 0],
>        [0, 0],
>        [0, 0]])
> d[0][0] = 5
> d
> array([[5, 0],
>        [0, 0],
>        [0, 0]])
>
> What is the difference here?

Basically, a numpy array doesn't reference the lists; it uses them to
determine the required shape of the array. A simplified implementation might
be

class Array:
    def __init__(self, data):
        # The nested lists are only used to determine the shape ...
        self.shape = (len(data), len(data[0]))
        # ... and their values are copied into one flat list.
        self._data = []
        for row in data:
            self._data.extend(row)

    def __getitem__(self, index):
        # Map a (row, column) pair to an offset in the flat list.
        y, x = index
        return self._data[y * self.shape[1] + x]

With that approach you may only see simultaneous changes of multiple entries
when using mutable values.
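
To make that last point concrete, here is a hedged sketch built on the same toy idea. The class name FlatArray and its __setitem__ method are additions for illustration only; this is not how numpy is actually implemented.

```python
class FlatArray:
    """Toy 2-D container that copies values out of nested lists."""

    def __init__(self, data):
        self.shape = (len(data), len(data[0]))
        self._data = []
        for row in data:
            # Copies the references in each row; the row lists are dropped.
            self._data.extend(row)

    def _offset(self, index):
        y, x = index
        return y * self.shape[1] + x

    def __getitem__(self, index):
        return self._data[self._offset(index)]

    def __setitem__(self, index, value):
        self._data[self._offset(index)] = value


# Assignment rebinds one slot, so the shared inner list no longer matters.
ints = FlatArray([[0] * 2] * 3)
ints[0, 0] = 5
first_column = [ints[y, 0] for y in range(3)]

# Every slot still refers to the one set object, so mutating it shows everywhere.
sets = FlatArray([[set()] * 2] * 3)
sets[0, 0].add(4)
all_cells = [sets[y, x] for y in range(3) for x in range(2)]
```

Here first_column ends up with the 5 in the first row only, while all six entries of all_cells are the same set containing 4.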
 
LJ

Thank you for the reply.

So, as long as I access and modify the elements of, for example,

a=array([[set([])]*4]*3)


as (for example):

a[0][1] = a[0][1] | set([1,2])

or:

a[0][1]=set([1,2])

then I should have no problems?
 
Peter Otten

As long as you set (i.e. replace) elements you're fine, but modifying them
in place means trouble. You can prevent accidental modification by using
immutable values -- in your case frozenset:

b = numpy.array([[frozenset()]*4]*3)
b[0,0].update("123")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'frozenset' object has no attribute 'update'
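
With frozensets, the only way to change a cell is to store a new value in it. A minimal sketch of that pattern, assuming numpy is imported as np (the variable names are illustrative):

```python
import numpy as np

# Every cell initially refers to the same immutable, empty frozenset.
b = np.array([[frozenset()] * 4] * 3)

# An update builds a new frozenset and stores it in that one cell.
b[0, 1] = b[0, 1] | {1, 2}

changed = b[0, 1]     # the cell that was replaced
untouched = b[0, 0]   # a neighbouring cell
```

Since nothing can mutate the shared empty frozenset, sharing it between cells is harmless.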

Or you take the obvious approach and ensure that there are no shared values.
I don't know if there's a canonical way to do this in numpy, but the
following works, since |= stores a new set in every position:

a = numpy.array([[set()]*3]*4)
a |= set()
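
For comparison, a more explicit way to avoid shared values is to fill an empty object array cell by cell. This is only a sketch under the assumption that numpy is imported as np; the helper name make_set_array is hypothetical and not part of numpy.

```python
import numpy as np


def make_set_array(shape):
    """Return an object array in which every cell holds its own empty set."""
    arr = np.empty(shape, dtype=object)
    for index in np.ndindex(*shape):
        arr[index] = set()   # a fresh set per cell, nothing is shared
    return arr


a = make_set_array((4, 3))
a[0, 1].add(1)   # in-place modification is now safe

distinct_objects = len({id(s) for s in a.flat})
```

After this, only a[0, 1] contains 1, and distinct_objects equals the number of cells.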
 
