Clustering technique

Luca

Dear all, excuse me if I post a simple question. I am trying to find
software or an algorithm that can "cluster" simple data in an Excel sheet.

Example:
          Variable a   Variable b   Variable c
Case 1        1            0            0
Case 2        0            1            1
Case 3        1            0            0
Case 4        1            1            0
Case 5        0            1            1


The system recognizes that there are 3 possible clusters:

the first with the cases that have Variable a as true,
the second with the cases that have Variables b and c as true,
the third is "all the rest".

          Variable a   Variable b   Variable c

Case 1        1            0            0
Case 3        1            0            0

Case 2        0            1            1
Case 5        0            1            1

Case 4        1            1            0


Thank you in advance
 
Jon Clements

If you haven't already, download and install xlrd from http://www.python-excel.org.
It is a library that can read Excel workbooks (but not the 2007 format yet).

Or, export as CSV...

Then using either the csv module/xlrd (both well documented) or any
other way of reading the data, you effectively want to end up with
something like this:

rows = [
    # A        B  C  D
    ['Case 1', 1, 0, 0],
    ['Case 2', 0, 1, 1],
    ['Case 3', 1, 0, 0],
    ['Case 4', 1, 1, 0],
    ['Case 5', 0, 1, 1],
]
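
If you go the CSV route, a rough sketch of getting to that list could look like the following. The CSV text, the header row, and the helper name load_rows are all assumptions for illustration; with a real export you would pass an open file instead of the in-memory string.

```python
import csv
import io

# Hypothetical CSV export of the sheet (assumed layout: a header row,
# then one row per case). With a real file you would use
# open("cases.csv", newline="") instead of io.StringIO.
csv_text = """Case,Variable a,Variable b,Variable c
Case 1,1,0,0
Case 2,0,1,1
Case 3,1,0,0
Case 4,1,1,0
Case 5,0,1,1
"""


def load_rows(f):
    """Return a list of [name, b, c, d] lists with the flags as ints."""
    reader = csv.reader(f)
    next(reader)  # skip the header row
    # Skip blank records; keep the case name and convert the flags to int.
    return [[rec[0]] + [int(v) for v in rec[1:]] for rec in reader if rec]


rows = load_rows(io.StringIO(csv_text))
print(rows)
```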

One approach is to sort 'rows' by B,C & D. This will bring the
identical elements adjacent to each other in the list. Then you need
an iterator to group them... take a look at itertools.groupby.
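
A minimal sketch of the sort-then-group approach, assuming the rows list has the shape shown above. Note that this only groups cases whose B, C and D values are exactly identical; it is not a general clustering algorithm. The names key and clusters are just illustrative.

```python
from itertools import groupby
from operator import itemgetter

rows = [
    ['Case 1', 1, 0, 0],
    ['Case 2', 0, 1, 1],
    ['Case 3', 1, 0, 0],
    ['Case 4', 1, 1, 0],
    ['Case 5', 0, 1, 1],
]

# Key on columns B, C and D (indexes 1 to 3); column A is the case name.
key = itemgetter(1, 2, 3)

# groupby only merges adjacent items, so the rows must be sorted
# by the same key first.
clusters = [
    (pattern, [row[0] for row in group])
    for pattern, group in groupby(sorted(rows, key=key), key=key)
]

for pattern, cases in clusters:
    print(pattern, cases)
```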

Another is to use a defaultdict(list), found in the collections module,
and just loop over the rows, again with B, C & D as the key and A being
appended to the list.
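
The defaultdict variant might look something like this, again assuming rows has the shape shown earlier. It needs no sorting, and the groups come out in the order each pattern is first seen. The variable names are only for illustration.

```python
from collections import defaultdict

rows = [
    ['Case 1', 1, 0, 0],
    ['Case 2', 0, 1, 1],
    ['Case 3', 1, 0, 0],
    ['Case 4', 1, 1, 0],
    ['Case 5', 0, 1, 1],
]

# Map each (B, C, D) pattern to the list of case names that share it.
clusters = defaultdict(list)
for name, *values in rows:
    # Lists are not hashable, so the key is converted to a tuple.
    clusters[tuple(values)].append(name)

for pattern, cases in clusters.items():
    print(pattern, cases)
```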

hth
Jon.
 
Taliesin Nuin

Luca,

How many news groups and lists have you posted this on? I just answered
this question on the PHP mailing list. If I'd seen your post here I
would have written you a different, Pythonic answer (or left it to the
other poster who has already given an answer). I appreciate that you
want a reply, but it is *not good* to cross-post all over the place and
waste a lot of people's time. Somebody takes the time and effort here to
answer your questions whilst at the same time others are duplicating the
effort elsewhere.

If you need a solution specific to a certain language, then ask on that
news group. If you're interested in a general answer then ask on a more
general news group such as comp.programming or comp.theory.

Taliesin Nuin.
 
