Clustering technique

Discussion in 'Python' started by Luca, Dec 22, 2009.

1. Luca (Guest)

Dear all, excuse me if I post a simple question. I am trying to find
software or an algorithm that can "cluster" simple data in an Excel sheet.

Example:
         Variable a   Variable b   Variable c
Case 1       1            0            0
Case 2       0            1            1
Case 3       1            0            0
Case 4       1            1            0
Case 5       0            1            1

The system should recognize that there are 3 possible clusters:

the first has the cases with Variable a true,
the second has the cases with Variables b and c true,
the third is "all the rest".

         Variable a   Variable b   Variable c

Case 1       1            0            0
Case 3       1            0            0

Case 2       0            1            1
Case 5       0            1            1

Case 4       1            1            0

Luca, Dec 22, 2009

2. Jon Clements (Guest)

On Dec 22, 11:12 am, Luca wrote:
> Dear all, excuse me if I post a simple question. I am trying to find
> software or an algorithm that can "cluster" simple data in an Excel sheet.
> [...]

See xlrd for a library that can read Excel workbooks (but not Excel 2007 files yet).

Or, export as CSV...

Then using either the csv module/xlrd (both well documented) or any
other way of reading the data, you effectively want to end up with
something like this:

rows = [
    # A        B  C  D
    ['Case 1', 1, 0, 0],
    ['Case 2', 0, 1, 1],
    ['Case 3', 1, 0, 0],
    ['Case 4', 1, 1, 0],
    ['Case 5', 0, 1, 1],
]
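
As a rough illustration of the CSV route, the sketch below builds that 'rows' list with the csv module. The column layout, the header row, and the helper name load_rows are assumptions made for the example; the inline text stands in for a file exported from the sheet.

```python
import csv
import io

# Assumed layout of the exported sheet: a header row, then one case per line.
# A real script would open the exported .csv file instead of using this string.
CSV_TEXT = """Case,Variable a,Variable b,Variable c
Case 1,1,0,0
Case 2,0,1,1
Case 3,1,0,0
Case 4,1,1,0
Case 5,0,1,1
"""


def load_rows(fileobj):
    """Return [[case_name, b, c, d], ...] with the flags converted to int."""
    reader = csv.reader(fileobj)
    next(reader)  # skip the header row
    return [[row[0]] + [int(value) for value in row[1:]] for row in reader]


rows = load_rows(io.StringIO(CSV_TEXT))
print(rows)
```

With a real export, something like open('export.csv', newline='') would take the place of the StringIO object (the file name is hypothetical).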

One approach is to sort 'rows' by columns B, C and D. This brings
identical rows next to each other in the list. You then need an
iterator to group them: take a look at itertools.groupby.
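
A minimal sketch of the sort-then-group idea, assuming Python 3 and the 'rows' layout described in this post. The names key and clusters are chosen for the example. Note that this groups rows whose B, C and D values are exactly identical, which matches the example data; it does not measure similarity between rows.

```python
from itertools import groupby
from operator import itemgetter

rows = [
    ['Case 1', 1, 0, 0],
    ['Case 2', 0, 1, 1],
    ['Case 3', 1, 0, 0],
    ['Case 4', 1, 1, 0],
    ['Case 5', 0, 1, 1],
]

# The key is the (B, C, D) tuple of each row.
key = itemgetter(1, 2, 3)

# groupby only merges adjacent items, so the rows must be sorted by the same key first.
clusters = [
    (pattern, [row[0] for row in group])
    for pattern, group in groupby(sorted(rows, key=key), key=key)
]

for pattern, cases in clusters:
    print(pattern, cases)
# (0, 1, 1) ['Case 2', 'Case 5']
# (1, 0, 0) ['Case 1', 'Case 3']
# (1, 1, 0) ['Case 4']
```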

Another is to use a defaultdict(list) from the collections module:
loop over the rows, using B, C & D as the key and appending A to the
list stored under that key.
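
The defaultdict variant might look like the sketch below, again assuming Python 3 and the same 'rows' layout. No sorting is needed here, and the groups come out in the order their pattern is first seen.

```python
from collections import defaultdict

rows = [
    ['Case 1', 1, 0, 0],
    ['Case 2', 0, 1, 1],
    ['Case 3', 1, 0, 0],
    ['Case 4', 1, 1, 0],
    ['Case 5', 0, 1, 1],
]

# Maps each (B, C, D) pattern to the list of case names that share it.
clusters = defaultdict(list)
for name, *values in rows:
    clusters[tuple(values)].append(name)

for pattern, cases in clusters.items():
    print(pattern, cases)
# (1, 0, 0) ['Case 1', 'Case 3']
# (0, 1, 1) ['Case 2', 'Case 5']
# (1, 1, 0) ['Case 4']
```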

hth
Jon.

Jon Clements, Dec 22, 2009

3. Taliesin Nuin (Guest)

Luca wrote:
> Dear all, excuse me if I post a simple question. I am trying to find
> software or an algorithm that can "cluster" simple data in an Excel sheet.
> [...]

Luca,

How many newsgroups and lists have you posted this on? I just answered
this question on the PHP mailing list. If I'd seen your post here I
would have written you a different, Pythonic answer (or left it to the
other poster who has already given an answer). I appreciate that you
want a reply, but it is *not good* to cross-post all over the place and
waste a lot of people's time. Somebody takes the time and effort here to
answer your questions whilst at the same time others are duplicating the
effort elsewhere.

If you need a solution specific to a certain language, then ask on that
newsgroup. If you're interested in a general answer, then ask on a more
general newsgroup such as comp.programming or comp.theory.

Taliesin Nuin.

Taliesin Nuin, Dec 23, 2009