Bounding box on clusters in a 2D list

S

superprad

If I have

ex: x = [[1,1,1,0,0,0,0,0,0,0,0,1,1,1,1,0,0,0],
[1,1,1,0,1,1,0,0,0,0,0,1,1,1,1,0,0,0],
[1,1,1,0,1,1,1,0,1,1,0,1,1,1,0,0,0,0],
[0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0]]
what I want is a boundingbox over the region where we find clusters of
1's.So for instance in the above list first 3 roes and colums have 1's
so the area of that box is 3x3
so my final array should have an array of approx areas of clusters of
1's like
area = [ 9,4 ....]
Hope I am clear with my question.
 
B

bearophileHUGS

I hope this is what you need, sometimes understanding the question is
one of the hardest parts :)

If you can use a graph data structure, you can create a graph, and then
you can find the lenght of all its connected components (indentations
reduced to 2 spaces):

.. def mat2graph(g, m, good=None, w=1):
.. "Function docs..."
.. g.clear()
.. if m and isinstance(m, (list, tuple)):
.. if good != None:
.. m = [[good(e) for e in row] for row in m]
.. maxy = len(m)-1
.. maxx = len(m[0])-1
.. add = g.addBiArc
.. for y, row in enumerate(m):
.. for x, e in enumerate(row):
.. if e:
.. g.addNode((x,y)) # Necessary for isolated nodes.
.. if x>0 and m[y][x-1]: add( (x,y), (x-1,y), w=w)
.. if x<maxx and m[y][x+1]: add( (x,y), (x+1,y), w=w)
.. if y>0 and m[y-1][x]: add( (x,y), (x,y-1), w=w)
.. if y<maxy and m[y+1][x]: add( (x,y), (x,y+1), w=w)
..
.. from graph import Graph
.. g = Graph()
.. m = [[1,1,1,0,0,0,0,0,0,0,0,1,1,1,1,0,0,0],
.. [1,1,1,0,1,1,0,0,0,0,0,1,1,1,1,0,0,0],
.. [1,1,1,0,1,1,1,0,1,1,0,1,1,1,0,0,0,0],
.. [0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0]]
.. mat2graph(g, m)
.. print map(len, g.connectedComponents())

Something similar to mat2graph will probably become a method of Graph.
A quite similar program can be used for other python graph libraries.

If you cannot use a graph data structure, of if you need to go faster,
you can use a flood fill algorithm tuned for this problem:


.. def connected(m, foreground=1, background=0):
.. def floodFill4(m, r,c, newCol, oldCol):
.. """Simplest 4-connected flood fill algorithm, there are much
.. faster non-recursive ones. I've modified this from:
.. http://www.student.kuleuven.ac.be/~m0216922/CG/floodfill.html"""
.. if c>=0 and c<maxc and r>=0 and r<maxr and m[r][c]==oldCol:
.. m[r][c] = newCol
.. floodFill4(m, r+1, c, newCol, oldCol)
.. floodFill4(m, r-1, c, newCol, oldCol)
.. floodFill4(m, r, c+1, newCol, oldCol)
.. floodFill4(m, r, c-1, newCol, oldCol)
..
.. maxr = len(m)
.. maxc = len(m[0])
.. newCol = 2
.. oldCol = foreground
.. for r in xrange(maxr):
.. for c in xrange(maxc):
.. if m[r][c] == oldCol:
.. floodFill4(m, r,c, newCol=newCol, oldCol=oldCol)
.. newCol += 1
.. # Frequences, this can be become a standard python command
.. freq = {}
.. for row in m:
.. for e in row:
.. if e in freq:
.. freq[e] += 1
.. else:
.. freq[e] = 1
.. del freq[background]
.. return freq.values()
..
.. m = [[1,1,1,0,0,0,0,0,0,0,0,1,1,1,1,0,0,0],
.. [1,1,1,0,1,1,0,0,0,0,0,1,1,1,1,0,0,0],
.. [1,1,1,0,1,1,1,0,1,1,0,1,1,1,0,0,0,0],
.. [0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0]]
.. print connected(m)


There are much faster ways to do flood fills:
http://www.codeproject.com/gdi/QuickFill.asp
I think Python PIL doesn't support flood fills, and I think doing it
with Numarray is slower, and it's useful only to reduce the used
memory.
This connected program is slow and raw, and it uses recursivity a lot,
so it can crash the interpreter. You can use a data structure to
implement the stack yourself, to speed this program up. But for small
matrices I think this connected is enough ^_^

Bear hugs,
Bearophile
 
B

Bengt Richter

If I have

ex: x = [[1,1,1,0,0,0,0,0,0,0,0,1,1,1,1,0,0,0],
[1,1,1,0,1,1,0,0,0,0,0,1,1,1,1,0,0,0],
[1,1,1,0,1,1,1,0,1,1,0,1,1,1,0,0,0,0],
[0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0]]
what I want is a boundingbox over the region where we find clusters of
1's.So for instance in the above list first 3 roes and colums have 1's
so the area of that box is 3x3
so my final array should have an array of approx areas of clusters of
1's like
area = [ 9,4 ....]
Where does the 4 come from?
Hope I am clear with my question.
Not quite clear on if "clusters" have to be rectangular and fill their bounding boxes
or whether

[[0, 0, 0, 0, 0],
+-----+
[0,|1, 1,|0, 0],
| |
[0,|0, 1,|0, 0],
+-----+
[0, 0, 0, 0, 0]]

is a cluster with three 1's. Or indeed whether "bounding boxes" have to be rectangular.
I.e., what about diagonal clusters?
,__,
[[0, 0,|1,|0, 0],
_/ _/ ,__,
[0,/1,/0, 0,|1],
_/ _/ _/ /
[1,/0, 0,/1,/0],
_/ /
[0, 0,/1,/0, 0]]

Or should that be
+-------------+
[[|0, 0, 1, 0, 0|],
| |
[|0, 1, 0, 0, 1|],
| |
[|1, 0, 0, 1, 0|],
| |
[|0, 0, 1, 0, 0|]]
+-------------+

since two 3x3 squares around the length-3 diagonals would overlap.

Regards,
Bengt Richter
 
S

superprad

Richter,yes what I am looking for is for cluster with rectangular
bounding boxes as you explained in the first figure.
 
S

superprad

hi Bearphile! That really gives me an idea.Thanks much for that. Yes as
you said the algorithm reaches a maximium recursion depth for larger
sets i tried.I still have a question. if
m = [[0,0,0,0],[0,1,1,0,0],[0,0,1,0,0],[0,0,0,0]]

all it does is count the number of 1's and return us the number in that
cluster. But if I want the box to be considered as one cluster as
[[0, 0, 0, 0, 0],
+-----+
[0,|1, 1,|0, 0],
| |
[0,|0, 1,|0, 0],
+-----+
[0, 0, 0, 0, 0]]


because it could be that the 0 in between the cluster of ones is a part
of the group.So the area of this boundingbox is 4 rather than 3. Do you
see where I am heading
 
B

Bengt Richter

Richter,yes what I am looking for is for cluster with rectangular
bounding boxes as you explained in the first figure.
Sorry, not your fault, but I'm still not clear on diagonal connection/separation.
E.g., (removing punctuation ;-) is this 3 boxes or do the top left and bottom right
connect at their touching corners and thus become bounded with a rectangle that then
also includes the little box at the top right?

+-------+ +---+
0 | 1 1 | 0 | 1 |
| | +---+
0 | 1 1 | 0 0
+-------+?------+
0 0 0 ? 1 1 |
| |
0 0 0 | 0 1 |
+-------+

+-------+ +---+ +-------+???+---+ +---------------+
0 | 1 1 | 0 | 1 | 0 | 1 1 | 0 | 1 | 0 | 1 1 0 1 |
| | +---+ | | +---+ | |
0 | 1 1 | 0 0 0 | 1 1 | 0 0 ? 0 | 1 1 0 0 |
+-------+?------+ =>  | +-------+ => | |
0 0 0 ? 1 1 | 0 | 0 0 1 1 | 0 | 0 0 1 1 |
| | | | | |
0 0 0 | 0 1 | 0 | 0 0 0 1 | 0 | 0 0 0 1 |
+-------+ +---------------+ +---------------+

And what about nested boxes?

+-------------------+
0 | 1 1 1 1 1 |
| |
0 | 1 0 0 0 1 |
| +---+ |
0 | 1 0 | 1 | 0 1 |
| +---+ |
0 | 1 0 0 0 1 |
| |
0 | 1 1 1 1 1 |
+-------------------+

Regards,
Bengt Richter
 
B

bearophileHUGS

(e-mail address removed):
hi Bearphile! That really gives me an idea.Thanks much for that. Yes as
you said the algorithm reaches a maximium recursion depth for larger
sets i tried.

You can improve the little flood filling function, avoiding the "bad"
Python recursivity.

Do you see where I am heading

Your problem still looks a bit ill defined to me (see Bengt Richter's
nested boxes case), but you can probably modify this part of my
connected():

.. freq = {}
.. for row in m:
.. for e in row:
.. if e in freq:
.. freq[e] += 1
.. else:
.. freq[e] = 1

For the *values* of the freq dictionary you can use the 4 coordinates
of the bounding box of the key-specific cluster, that is updating its
far left, far right, etc. coordinates, with something like this:

box[e] = ( max(x, box[e][0]), min(x, box[e][1]), max(y, box[e][2]),
min(y, bbox[e][3] )

Or something faster/more correct. It's not difficult, I presume.
(connected() debug: if you want to keep the original matrix m, at the
end of the connected() you can set at 1 all its values different from
0.)

Bear hugs,
Bearophile
 
S

superprad

bearphile, Is there a way I could do the floodfill rather iteratively
than recursive. It somehow breaks although I made some optimisations to
the code.
 
S

superprad

Just was wondering how to integrate the floodfill with the stack.I
guess it will get rid of recursion problem. You mean read all the
elements of the array to a stack and then push and pop based on
conditions?
 
B

bearophileHUGS

In my first post you can find two URLs with better flood filling
algorithms. You can also modify the easy recursive function, using a
list-based stack intead of recursive calls.

Bye,
Bearophile
 
S

superprad

Bearophile,I have written the floodfill in python with stack. But i
think I am making something wrong I dont get what i need.Need your
opinion.the code follows

def connected(m, foreground=1, background=0):
def floodFill4(m, r,c, newCol, oldCol):
if newCol == oldCol:
print "empty stack"
return -1
while(m[r].pop(c)):
m[r][c] = newCol
if r+1 < maxr and m[r+1][c]==oldCol:
if m[r+1][c] == 0:return
if r-1 >= 0 and m[r-1][c]==oldCol:
if m[r-1][c] == 0:return
if c+1 < maxc and m[r][c+1]==oldCol:
if m[r][c+1] == 0:return
if c-1 >= 0 and m[r][c-1]==oldCol:
if m[r][c-1] == 0:return

maxr = len(m)
maxc = len(m[0])
newCol = 2
oldCol = foreground
for r in xrange(maxr):
for c in xrange(maxc):
if m[r][c] == oldCol:
floodFill4(m, r,c, newCol=newCol, oldCol=oldCol)
newCol += 1
 
L

Lonnie Princehouse

def floodFill4(m, r, c, newCol, oldCol):
stack = [ (r, c) ]
while stack:
r, c = stack.pop()
if c >=0 and c < maxc and r >=0 and r< maxr and m[r][c]==oldCol:
m[r][c] = newCol
stack += [ (r+1, c), (r-1, c), (r, c+1), (r, c-1) ]
 
B

bearophileHUGS

Then you can probably use something like this:

.. def boxesArea(m, foreground=1, background=0):
.. maxr = len(m)
.. maxc = len(m[0])
.. newCol = 2
.. oldCol = foreground
.. for r,row in enumerate(m):
.. for c,e in enumerate(row):
.. if e == oldCol:
.. # Flood fill by Lonnie Princehouse
.. stack = [ (r, c) ]
.. while stack:
.. r, c = stack.pop()
.. if c>=0 and c<maxc and r>=0 and r<maxr and
m[r][c]==oldCol:
.. m[r][c] = newCol
.. stack += [ (r+1, c), (r-1, c), (r, c+1), (r,
c-1) ]
.. newCol += 1
.. box = {}
.. for r,row in enumerate(m):
.. for c,e in enumerate(row):
.. if e in box:
.. box[e] = ( min(c, box[e][0]), min(r, box[e][1]),
.. max(c, box[e][2]), max(r, box[e][3] ) )
.. else:
.. box[e] = (c, r, c, r)
.. del box[background]
.. return [(b[2]-b[0]+1)*(b[3]-b[1]+1) for b in box.values()]
..
.. try: import psyco
.. except ImportError: pass
.. else: psyco.bind(boxesArea)

Note that this version modifies the given m still.

Hugs,
Bearophile
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,768
Messages
2,569,575
Members
45,054
Latest member
LucyCarper

Latest Threads

Top