Solve a statistics problem

Seth Brundle · May 11, 2007

Say I have a hash with peoples names and a quality score, like their free
throw percentage.

Mathematically what is the most accurate way to divide the list into the two
most balanced teams?

Obviously you could do something simple like alternate in descending order,
but this isnt guaranteed to produce the most accurate result.

Ideally, I would like to be able to balance based on the mean, median, or
both.

Code isnt necessary but of course appreciated - but if you have a link to
the fundementals of solving such a problem thats fine to - I want to learn
to do it for myself.

Paul Lalli · May 11, 2007

Say I have a hash with peoples names and a quality score, like their free
throw percentage.

Mathematically what is the most accurate way to divide the list into the two
most balanced teams?

This question has nothing to do with Perl. It would be the same
question regardless of what language you're using to implment your
solution. You're asking a mathematics question. You're more likely
to get good answers in a group that discusses mathematics.

Paul Lalli

Seth Brundle · May 11, 2007

thanks for the spamcop spam.
heres your cookie.

Paul Lalli said:
This question has nothing to do with Perl. It would be the same
question regardless of what language you're using to implment your
solution. You're asking a mathematics question. You're more likely
to get good answers in a group that discusses mathematics.

Paul Lalli

xhoster · May 11, 2007

Seth Brundle said:
Say I have a hash with peoples names and a quality score, like their free
throw percentage.

Mathematically what is the most accurate way to divide the list into the
two most balanced teams?

Mathematically, enumerate all possible sets of teams, compute the
balancedness of each, and take the one with the most balancedness. There
are various combinatoric modules and sample-code for Perl, but which one
you want probably depends on what exactly constitutes a legal pair of
teams.1

Obviously you could do something simple like alternate in descending
order, but this isnt guaranteed to produce the most accurate result.

It is not guaranteed to produce the *optimal* result. Accuracy probably
doesn't enter into it.

Ideally, I would like to be able to balance based on the mean, median, or
both.

Do both teams have to be the same size? If so, then balancing on mean is
the same as balancing on sum, right? That sounds like a variant of the
well-known bin-packing or knap-sack problems. For the median, that depends
strongly on whether the teams have to be divided equally, and also on
whether the number on each team is going to be odd or even, and on how you
define median in the case of even-membered teams.

Code isnt necessary but of course appreciated - but if you have a link to
the fundementals of solving such a problem thats fine to - I want to
learn to do it for myself.

knap-sack and bin-packing are very widely discussed fundamentals. Google
on those terms, you will find more than you care for on them.

Xho

Seth Brundle · May 11, 2007

Unless you have gross outliers, you will probably do well by sorting

the list by score, then you would take (unshift) pairs of people with
close scores (that is, take pair of persons 1 and 2 (two worst

thanks for the reply, but that solution is the one I referred to (basically
a phys-ed style draft pick) that I dont believe to be the most accurate.

Yes, the outcome I seek is that after the sort the two teams will have the
closest average (mean) mathematically possible.

However, it would be cool if I could do it by median as well, or both, just
for comparison.

Seth Brundle · May 11, 2007

knap-sack and bin-packing are very widely discussed fundamentals. Google
on those terms, you will find more than you care for on them.

Excellent thanks!
I see there is actually a perl-Algorithm-Knapsack module.

Nothing better then a unique keyword to help you find what your looking
for - thanks!

xhoster · May 11, 2007

Ignoramus6365 said:
Try enumerating all possibilities for two teams of 20 people each.

That would be 2^40 possibilities.

Well, more like 2^39. But even less if the teams have to be balanced in

size as well as in said:
1 trillion of them or so.

He said "mathematically", not "algorithmically". Since when are
mathematicians concerned about run time?

Xho

Michele Dondi · May 11, 2007

solution. You're asking a mathematics question. You're more likely
to get good answers in a group that discusses mathematics.

Along with some answers from crackpots who claim to have a simple FLT
proof, that is! ;-)

Michele

Seth Brundle · May 11, 2007

1 trillion of them or so.

He said "mathematically", not "algorithmically". Since when are
mathematicians concerned about run time?

hehe I think you will find a lot of them are just as efficiency-obsessed any
programmer.

Seth Brundle · May 11, 2007

``Obviously you could do something simple like alternate in descending

order, but this isnt guaranteed to produce the most accurate result.''

My suggestion does not merely alternate in descending order, it
randomizes the pairs. Yours guarantees that one team will be always
better than another. Whereas mine does not.

Sorry I misunderstood - interesting option!

Seth Brundle · May 11, 2007

knap-sack and bin-packing are very widely discussed fundamentals. Google

on those terms, you will find more than you care for on them.

Interesting, the knapsack and binpack solutions (which as I understand it
from reviewing these modules), seem to both be the exact same problem
domain, but that domain seems significantly different from my problem.

Basically, they are solving the problems of how to fill x number of y-sized
bins so there is the least amount of space left in the containers.

However, I dont think it necessarily guarantees the bins are filled equally,
as the last bin may be just the leftovers from the last-efficiently-filled
bin.

But I will play with them.

Michele Dondi · May 11, 2007

thanks for the reply, but that solution is the one I referred to (basically
a phys-ed style draft pick) that I dont believe to be the most accurate.

Yes, the outcome I seek is that after the sort the two teams will have the
closest average (mean) mathematically possible.

Yours is still a mathematical, and admittedly interesting, question.
Like Paul Lalli rightly pointed out, you'd better ask in a
mathematics, statistics or algorithm newsgroup or forum. Personally,
since I have no idea how to specifically help you with this particular
problem, I can only keep the discussion moderately on topic by
pointing you to a suitable CPAN search:

http://search.cpan.org/search?mode=module&query=statistics

See if any module can help you. Other than this, I could only write
some brute force approach for your, which I suppose is not what you're
after.

Michele

xhoster · May 11, 2007

Seth Brundle said:
Interesting, the knapsack and binpack solutions (which as I understand
it from reviewing these modules), seem to both be the exact same problem
domain, but that domain seems significantly different from my problem.

Depending on the exact nature of your problem, there may be ways to map it
into knapsack or bin-packing. There are several small-seeming
clarifications you haven't yet provided which have a very big impact on how
and whether this can be done.

Basically, they are solving the problems of how to fill x number of
y-sized bins so there is the least amount of space left in the
containers.

If you pack your objects into a bin whose capacity is one half of the total
score of all the objects, then the arrangement what maximizes the score
of the bin without going over the capacity will be the same arrangement
that makes the total of two bins closest to each other.

Anyway, when you make off-topic posts and then insult the regulars who
point this out to you, I'm less inclined to go much more out of my way to
help out.

Xho

Michele Dondi · May 11, 2007

He said "mathematically", not "algorithmically". Since when are
mathematicians concerned about run time?

Well, they are. In fact they began thinkering about this kinda stuff
even before actual computers were really built.

Michele

Tad McClellan · May 12, 2007

Seth Brundle said:
thanks for the spamcop spam.

Thanks for so clearly identifying your lack of respect
for all of the people here.

heres your cookie.

Here's your *plonk*!

Michele Dondi · May 12, 2007

See if any module can help you. Other than this, I could only write
some brute force approach for your, which I suppose is not what you're
after.

After all, you got me intrigued so I cooked up a quick'n'dirty brute
force attempt. Please note that this lacks even the obvious
optimization of only searching half of the actual search space. I was
about to add that, but then I realized how really ugly and messy the
thingie is: it was not false modesty, it would reallty rather need a
complete rewrite, probably in terms of OO code. Anyway, hereafter it
is. Use at your own risk!

#!/usr/bin/perl -l

use strict;
use warnings;
use List::Util qw/sum min max/;

my @names=qw/alice bob carl darla eddie fred
gina hank kate jay larry mandy nick
noah pete randy stu terry/;
my %score=map { $_ => rand 1000_000} @names;
my $len=@names;

{
my $maxi=$#names;
my $verytot=sum values %score;

my ($t1,$t2);
sub preprocess {
my $v=shift;
($t1,$t2)=();
my $num=0;
for (0..$maxi) {
my $vec=vec $v, $_, 1;
push @{ $vec ? $t1 : $t2}, $names[$_];
$num += $vec;
}
$num;
}

my ($othertot,$othernum);
sub avg {
my $num=preprocess(shift);
my $tot=sum @score{@$t1};
($othernum,$othertot)=($len-$num,$verytot-$tot);
$tot/$num;
}

sub oavg () { $othertot/$othernum }

sub teams(\@\@) {
@{ $_[0] }=@$t1;
@{ $_[1] }=@$t2;
}
}

my $diff=abs(max(values %score)-min(values %score));
for (1..2**$len-2) {
my $v=pack 'C*', do {
my @v;
while ($_) {
push @v, $_%256;
$_ >>= 8;
}
@v;
};
my $avg1=avg $v;
my $avg2=oavg;
my (@t1,@t2);
teams(@t1,@t2);
next unless (my $tdiff=abs($avg1-$avg2))<$diff;
$diff=$tdiff;
print '-' x 72, "\n";
print <<"EOT";
Team #1: [@t1]
Team #2: [@t2]
Avg #1: $avg1 - Avg #2: $avg2 - Diff: $diff
EOT
}

__END__

Michele

brundlefly76 · May 25, 2007

thanks for the spamcop spam.

Thanks for so clearly identifying your lack of respect
for all of the people here.

Thanks for speaking on their behalf.

Josef Moellers · May 25, 2007

Thanks for speaking on their behalf.

In this case, Ted speaks for me, too.
You asked a question which was inappropriate in this newsgroup. You have
been given advice that it may be more helpful to ask in another
newsgroup. You have then been insulting.
Do you regularly go to the butcher to ask for bread or to the dentist to
have a wart taken care of? What to you say to them if they point out
that they won't be of much help.

If you had the math worked out and were looking for an ingenious way to
speed up the 95x57 matrix calculations involved in Perl, you're welcome
to show your code and we're likely to help.

As it stands, there's not much we can do.

seth brundle · May 26, 2007

Thanks for so clearly identifying your lack of respect

Thanks for speaking on their behalf.

In this case, Ted speaks for me, too.

Maybe you two can get together in a chat room or something and talk about
how this experience made you feel.

skywriter14 · May 26, 2007

Maybe you two can get together in a chat room or something and talk about
how this experience made you feel.

Idiot!

Is a Metaclass the appropriate way to solve this problem?	0	Aug 7, 2013
Using PERL to solve a specific bioinformatics problem	0	Jun 21, 2007
Interpreting Web statistics	10	Feb 20, 2006
Correct Identation/Contex can solve the too many compiler error messages problem when a closing brac	107	Sep 9, 2005
C newbie problem	3	Jun 28, 2008
Interesting challenge - can it be done?	4	Aug 23, 2008
opinion: comp lang docs style	10	Jan 4, 2011
The devolution of English language and slothful c.l.p behaviors exposed!	50	Jan 24, 2012

Solve a statistics problem

Seth Brundle

Paul Lalli

Seth Brundle

xhoster

Seth Brundle

Seth Brundle

xhoster

Michele Dondi

Seth Brundle

Seth Brundle

Seth Brundle

Michele Dondi

xhoster

Michele Dondi

Tad McClellan

Michele Dondi

brundlefly76

Josef Moellers

seth brundle

skywriter14

Ask a Question

Similar Threads

Staff online

Members online

Forum statistics

Latest Threads