Hal Fulton
This is actually related to an interesting problem given to me in college
by one of my professors.
I'm thinking of digging out that disk and resurrecting the problem as a
quiz. So I won't go into detail on it here.
The original problem:
- 100 text fragments come from four different sources.
- There are 25 from each source.
- They're now scrambled.
- The sources are not labeled or named as such (their order
does not matter).
- Do whatever textual or statistical analysis you deem appropriate,
and separate them into four buckets, approximating their original
buckets as well as you can.
The question I'm asking here is related only to the evaluation of the
quality of a given result:
Given four bins e,f,g,h (the results of our guesswork) and given
the four original bins a,b,c,d -- evaluate how well we did.
For simplicity, assume these bins are just arrays of integers.
Note that:
- each bin has exactly 25 items
- the union of a,b,c,d is 0..99
- the union of e,f,g,h is also 0..99
- there is no special correspondence between a and e,
b and f, etc.
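
To make the setup concrete, here is a minimal sketch of the data and of a
check for the constraints listed above. The function name
is_valid_partition and both sample partitions are made up for illustration;
they are not part of the original problem.

```python
def is_valid_partition(bins, n_items=100, bin_size=25):
    """Return True if the bins are equal-sized, disjoint, and cover 0..n_items-1."""
    if any(len(b) != bin_size for b in bins):
        return False
    flat = [x for b in bins for x in b]
    # Sorting the flattened bins must give exactly 0, 1, ..., n_items-1.
    return sorted(flat) == list(range(n_items))


# Hypothetical "original" bins a, b, c, d: consecutive blocks of 25 integers.
original = [list(range(start, start + 25)) for start in range(0, 100, 25)]

# Hypothetical "guessed" bins e, f, g, h: items grouped by remainder mod 4.
guessed = [list(range(r, 100, 4)) for r in range(4)]

print(is_valid_partition(original), is_valid_partition(guessed))
```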
So my real questions are:
1. How would you solve the problem of evaluating the correctness
of the partitioning? This would have to be "fuzzy" in some
sense, of course -- maybe a Float answer between 0 and 1.
2. And the meta-question: Is the "correctness" ambiguous? I think
not, but I haven't examined it thoroughly.
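
One possible way to make question 1 concrete (a sketch, not a claim that
this is the right measure): since the bins carry no labels, try every way
of pairing guessed bins with original bins, and report the fraction of
items that land in the matching bin under the best pairing. With four bins
that is only 24 pairings. The name partition_score is made up for
illustration.

```python
from itertools import permutations


def partition_score(original, guessed):
    """Fraction of items placed correctly under the best bin-to-bin pairing."""
    orig_sets = [set(b) for b in original]
    guess_sets = [set(b) for b in guessed]
    total = sum(len(s) for s in orig_sets)
    best = 0
    # Each permutation pairs original bin i with guessed bin perm[i].
    for perm in permutations(range(len(guess_sets))):
        overlap = sum(len(orig_sets[i] & guess_sets[j]) for i, j in enumerate(perm))
        best = max(best, overlap)
    return best / total
```

A score of 1.0 means the two partitions are identical up to the naming of
the bins. Note that under this measure the score cannot fall below 0.25
for four equal bins, because the average overlap across all pairings is
already a quarter of the items; rescaling to the full 0..1 range would be
a further design choice.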
Cheers,
Hal