clustering in C

Discussion in 'C Programming' started by querypk@gmail.com, May 29, 2005.

  1. Guest

    Can someone help me with C programming here.
    If I have
    x is a 2D array of 0's and 1's
    ex: x = array([1,1,1,0,0,0,0,0,0,0,0,1,1,1,1,0,0,0],
    [1,1,1,0,1,1,0,0,0,0,0,1,1,1,1,0,0,0],
    [1,1,1,0,1,1,1,0,1,1,0,1,1,1,0,0,0,0],
    [0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0])
    what I want is a boundingbox over the region where we find clusters of
    1's.So for instance in the above list first 3 roes and colums have 1's
    so the area of that box is 3x3
    so my final array should have an array of clusters of
    1's like
    coords = [ (x1,y1,x2,y2), ....]
    where x1,y1 -> top left point
    x2,y2 -> bottom right point of that rectangle.
    Hope I am clear with my question.

    [0, 0, 0, 0, 0,
    +-----+
    0,|1, 1,|0, 0,
    | |
    0,|0, 1,|0, 0,
    +-----+
    0, 0, 0, 0, 0]

    something like above. The resultant array should have the area of the
    each such box. SOmething like a flood fill algorithm.
    I would like the algorithm to return all the rectangles .Not just the
    minimum ones.
     
    , May 29, 2005
    #1
    1. Advertising

  2. jburgy Guest

    wrote:
    > Can someone help me with C programming here.

    <snip>

    Hi there "querypk",

    google for Hoshen-Kopelman algorithm...

    Jan
     
    jburgy, May 29, 2005
    #2
    1. Advertising

  3. Malcolm Guest

    <> wrote
    > Can someone help me with C programming here.
    > If I have
    > x is a 2D array of 0's and 1's
    > ex: x = array([1,1,1,0,0,0,0,0,0,0,0,1,1,1,1,0,0,0],
    > [1,1,1,0,1,1,0,0,0,0,0,1,1,1,1,0,0,0],
    > [1,1,1,0,1,1,1,0,1,1,0,1,1,1,0,0,0,0],
    > [0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0])
    > what I want is a boundingbox over the region where we find clusters of
    > 1's.So for instance in the above list first 3 roes and colums have 1's
    > so the area of that box is 3x3
    > so my final array should have an array of clusters of
    > 1's like
    > coords = [ (x1,y1,x2,y2), ....]
    > where x1,y1 -> top left point
    > x2,y2 -> bottom right point of that rectangle.
    > Hope I am clear with my question.
    >
    > [0, 0, 0, 0, 0,
    > +-----+
    > 0,|1, 1,|0, 0,
    > | |
    > 0,|0, 1,|0, 0,
    > +-----+
    > 0, 0, 0, 0, 0]
    >
    > something like above. The resultant array should have the area of the
    > each such box. SOmething like a flood fill algorithm.
    > I would like the algorithm to return all the rectangles .Not just the
    > minimum ones.
    >

    First thing to do is to allocate a temporary array of integers. Set the ones
    to one and the zeroes to zero.

    Then iterate over the array. When you hit a one, do a flood fill, filling
    with 2, 3, 4 etc, like this

    void floodfill(int * array, int x, int y, int width, int height, int fill)
    {
    if(x < 0 || x >= width)
    return;
    if(y < 0 || y >= height)
    return;
    if(array[y * width + x] == fill)
    return;
    if(array[y * width + x] == 0)
    return;
    array[y * width + x] = fill;
    floodfill(array, x + 1, y, width, height, fill);
    floodfill(array, x - 1, y, width, height, fill);
    floodfill(array, x, y + 1, width, height, fill);
    floodfill(array, x, y -1, width, height, fill);
    }

    The resulting array should be a list of zeroes, and clusters from 2 up. Also
    keep a count of how many clusters you have found.

    Now allocate an array of rectangle structures, and an array of flags. Set
    all the falgs to zero.

    Iterate over the array a second time. When you hit a non-zero, check the
    flag. If it is clear, you have the top point of the cluster. If the array is
    relatively small and time isn't critical, it is probably easiest to iterate
    from the top left going down columns to find the left point, to iterate from
    the bottom up to find the bottom point, and from the right going up columns
    to find the right point.
    (If you have a huge array with small clusters you can search the cluster
    using the floodfill again, and store maximum and minimum points).

    Then set the flag to mark the fact that you have found the rectnagle for
    that cluster, and continue until you reach the end of the array.
     
    Malcolm, May 29, 2005
    #3
  4. Guest

    hi jburgy,
    that kind of helped I tied to modify the code so that instead of
    clustering 1's it would cluster 0's just to better understand the
    algorithm..But I guess I am making some mistake and it does'nt do the
    revers of what it does now. Can you help here. basically I am trying to
    understand. what should I change in the C code to make it work for 0
    clustering instead of 1's..

    int hoshen_kopelman(int **matrix, int m, int n) {

    uf_initialize(m * n / 2);

    /* scan the matrix */

    for (int i=0; i<m; i++)
    for (int j=0; j<n; j++)
    if (!matrix[j]) { // if occupied ...

    int up = (i==0 ? 0 : matrix[i-1][j]); // look up
    printf("%d\n",up);
    int left = (j==0 ? 0 : matrix[j-1]); // look left

    switch (!!up + !!left) {

    case 0:
    matrix[j] = uf_make_set(); // a new cluster
    break;

    case 1: // part of an existing cluster
    matrix[j] = max(up,left); // whichever is nonzero is
    labelled
    break;

    case 2: // this site binds two clusters
    matrix[j] = uf_union(up, left);
    break;
    }

    }

    /* apply the relabeling to the matrix */

    /* This is a little bit sneaky.. we create a mapping from the
    canonical labels
    determined by union/find into a new set of canonical labels, which
    are
    guaranteed to be sequential. */

    int *new_labels = calloc(sizeof(int), n_labels); // allocate array,
    initialized to zero

    for (int i=0; i<m; i++)
    for (int j=0; j<n; j++)
    if (matrix[j]) {
    int x = uf_find(matrix[j]);
    if (new_labels[x] == 3) {
    new_labels[0]++;
    new_labels[x] = new_labels[0];
    }
    matrix[j] = new_labels[x];
    }

    int total_clusters = new_labels[0];

    free(new_labels);
    uf_done();

    return total_clusters;
    }
    I canges the if(matrix[j]) to if(!matrix[j]) but it did not
    update the other clusters.

    http://splorg.org:8080/people/tobin/projects/hoshenkopelman/hk.c
     
    , Jun 2, 2005
    #4
  5. jburgy Guest

    wrote:
    > hi jburgy,
    > that kind of helped I tied to modify the code so that instead of
    > clustering 1's it would cluster 0's just to better understand the
    > algorithm..But I guess I am making some mistake and it does'nt do the
    > revers of what it does now. Can you help here. basically I am trying to
    > understand. what should I change in the C code to make it work for 0
    > clustering instead of 1's..
    >

    <snip>

    Hi superprad,

    Did you honestly believe you could change a single statement in Tobin's
    code and it would magically do what you want? If programming were that
    simple, we probably wouldn't discuss it so much... If you looked
    carefully, Tobin overwrites the 1's in 'matrix' with the appropriate
    cluster label. That's a common trick in implementations of the
    Hoshen-Kopelman algorithm. Unfortunately for you, it also means that
    you can't simply switch 0 and 1. What you could do is preprocess your
    matrix thusly:

    for (i = 0; i < m; i++)
    for (j = 0; j < n; j++)
    matrix[j] = 1 - matrix[j];

    before you feed it into the subroutine. Otherwise, you could take a
    simpler implementation (sorry Tobin, but I find yours unnecessarily
    complicated), sit down, have a beer, and only get up when you
    completely understand what it does. Then, and only then, can you start
    hacking it for your need. I'll look around home for a simpler
    implementation.

    Good luck,

    Jan
     
    jburgy, Jun 2, 2005
    #5
  6. Guest

    Hi Jan,
    I did preprocess my matrix replacing 1's with 0's. But the mistake i
    did was I also modified the code for that '!' condition and some more
    changes. That was the reason probably my resultant matrix was not
    giving what I expected.But if the kept the code un modified and pre
    processed the matrix alone it worked. Thanks for that. Yes as you said
    I have to take more time to understand what exactly is going on.
    Definitely a simpler implementation would really help.Thanks again for
    your valuable time and advice.
     
    , Jun 2, 2005
    #6
  7. Guest

    One more question. If I pre process the matrix then what happens is we
    are replacing 1's with 0's and otherway. So does Hoshen-Kopelman
    algorithm always replaces the 1's ie as i read the occupied cell. What
    if you want to replace un occupied cells as i discussed the previous
    post. And what is the necessity for 'union and find'. Cant we just
    scan through the matrix and look for the 0's or 1's and compare the
    neighbors and replace the neighbors with the same label as the parent.
     
    , Jun 2, 2005
    #7
  8. jburgy Guest

    wrote:
    > One more question. If I pre process the matrix then what happens is we
    > are replacing 1's with 0's and otherway. So does Hoshen-Kopelman
    > algorithm always replaces the 1's ie as i read the occupied cell. What
    > if you want to replace un occupied cells as i discussed the previous
    > post. And what is the necessity for 'union and find'. Cant we just
    > scan through the matrix and look for the 0's or 1's and compare the
    > neighbors and replace the neighbors with the same label as the parent.


    I reread you original question. Why did you want to modify Tobin's code
    in the first place? It already labels clusters of 1's exactly like you
    demand. It doesn't return a list of cluster coordinates because that's
    not a well-defined question (consider the following situation):

    000010000
    000111000
    001110000
    000111100
    000001000

    what are the "corners" of this cluster of ones? In case I wasn't clear
    enough in my first post, here's what the Hoshen-Kopelman algorithm does
    (and if it's not what you're after, please accept my apologies)

    110000000 110000000
    100001110 100002220
    000001110 -> 000002220
    110000100 330000200
    011000000 330000000

    That's why it's called cluster labeling.

    Yours,

    Jan
     
    jburgy, Jun 2, 2005
    #8
  9. Guest

    I understand what you say. Ok let me explain clearly. I think
    Hoshen-Kopelman algorithm is what I might need but ok consider this...
    _ _ _ _ _ _ _
    |2222| 1|3333 | 000010000
    |2221| 1|1333 | <------ 000111000
    |2211| 1|3333 | 001110000
    - - - - - --------
    111111111 000111100
    _ _ _ _ _ _
    |44444|1|555| 000001000
    ---------- ------
    Considering that we are looking for un occupied cells (clustering 0's
    rather than 1's)...
    this is one cluster . and corners of cluster are in this case
    {1,1,4,4}(topleft and bottom right corners of this rectangle)something
    like a bounding box...
    so that the resultant array will have
    { {1,1,4,4},{-,-,-,-} .........includes all rectangles. If there is a
    cluster inside another cluster even that is considered as a seperate
    rectangle...
     
    , Jun 2, 2005
    #9
  10. jburgy Guest

    wrote:
    > I understand what you say. Ok let me explain clearly. I think
    > Hoshen-Kopelman algorithm is what I might need but ok consider this...
    >


    Dude, you need to work on your ascii art! Alright, this is what I
    understood: you define the bounding box of an irregularly shaped
    cluster as the smallest rectangle which inscribes it entirely, don't
    you. For example

    111111111 111111111
    111111111 1+-----+1
    111 11111 1|1 111|1
    11 1 111 is bounded 1| 1 1|1
    11 11 as follows 1| |1
    1111 111 1|11 1|1
    111111111 1+-----+1
    111111111 111111111

    Is that correct? Assuming it is, here's what you need to do:

    1. make a backup copy of your matrix 'coz we're gonna overwrite it
    2. swap its 0's and 1's
    3. label the clusters of 1's (which now represent /empty/ cells)
    4. allocate as many quadruples (for bbox coordinates) as you have
    clusters, initialize them as follows (xmin=ymin=0, xmax=ncols,
    ymax=nrows)
    5. do a one-pass scan of the labeled matrix, updating the bbox
    coordinates of the corresponding cluster each time you encounter one

    That should be pretty simple. My initial guess was still right, step 3
    does the "hard" work, 1 and 2 some pre-processing and 4 and 5 some
    post-processing. I've gotta ask now: why exactly are you doing this?

    Jan

    P.S. By now, we are completely off-topic on comp.lang.c (I'm surprised
    we haven't been flamed yet), you better follow-up on comp.programming.
     
    jburgy, Jun 2, 2005
    #10
  11. Guest

    hi thanks for the help. Just to let you know that I got it working
    thanks for that. I get the Idea on HK,
     
    , Jun 2, 2005
    #11
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Tito

    Clustering and FrameWork.Net

    Tito, Jul 7, 2003, in forum: ASP .Net
    Replies:
    0
    Views:
    369
  2. Samuel Stammbach
    Replies:
    0
    Views:
    704
    Samuel Stammbach
    Aug 14, 2003
  3. Jing Zheng

    SOAP clustering

    Jing Zheng, Oct 9, 2003, in forum: Java
    Replies:
    0
    Views:
    393
    Jing Zheng
    Oct 9, 2003
  4. 1992

    Clustering & Tomcat

    1992, Nov 22, 2003, in forum: Java
    Replies:
    0
    Views:
    356
  5. 1992

    Clustering & Tomcat

    1992, Nov 22, 2003, in forum: Java
    Replies:
    0
    Views:
    375
Loading...

Share This Page