Solve a statistics problem

S

Seth Brundle

Say I have a hash with peoples names and a quality score, like their free
throw percentage.

Mathematically what is the most accurate way to divide the list into the two
most balanced teams?

Obviously you could do something simple like alternate in descending order,
but this isnt guaranteed to produce the most accurate result.

Ideally, I would like to be able to balance based on the mean, median, or
both.

Code isnt necessary but of course appreciated - but if you have a link to
the fundementals of solving such a problem thats fine to - I want to learn
to do it for myself.
 
P

Paul Lalli

Say I have a hash with peoples names and a quality score, like their free
throw percentage.

Mathematically what is the most accurate way to divide the list into the two
most balanced teams?

This question has nothing to do with Perl. It would be the same
question regardless of what language you're using to implment your
solution. You're asking a mathematics question. You're more likely
to get good answers in a group that discusses mathematics.

Paul Lalli
 
S

Seth Brundle

thanks for the spamcop spam.
heres your cookie.


Paul Lalli said:
This question has nothing to do with Perl. It would be the same
question regardless of what language you're using to implment your
solution. You're asking a mathematics question. You're more likely
to get good answers in a group that discusses mathematics.

Paul Lalli
 
X

xhoster

Seth Brundle said:
Say I have a hash with peoples names and a quality score, like their free
throw percentage.

Mathematically what is the most accurate way to divide the list into the
two most balanced teams?

Mathematically, enumerate all possible sets of teams, compute the
balancedness of each, and take the one with the most balancedness. There
are various combinatoric modules and sample-code for Perl, but which one
you want probably depends on what exactly constitutes a legal pair of
teams.1
Obviously you could do something simple like alternate in descending
order, but this isnt guaranteed to produce the most accurate result.

It is not guaranteed to produce the *optimal* result. Accuracy probably
doesn't enter into it.
Ideally, I would like to be able to balance based on the mean, median, or
both.

Do both teams have to be the same size? If so, then balancing on mean is
the same as balancing on sum, right? That sounds like a variant of the
well-known bin-packing or knap-sack problems. For the median, that depends
strongly on whether the teams have to be divided equally, and also on
whether the number on each team is going to be odd or even, and on how you
define median in the case of even-membered teams.
Code isnt necessary but of course appreciated - but if you have a link to
the fundementals of solving such a problem thats fine to - I want to
learn to do it for myself.

knap-sack and bin-packing are very widely discussed fundamentals. Google
on those terms, you will find more than you care for on them.

Xho
 
S

Seth Brundle

Unless you have gross outliers, you will probably do well by sorting
the list by score, then you would take (unshift) pairs of people with
close scores (that is, take pair of persons 1 and 2 (two worst

thanks for the reply, but that solution is the one I referred to (basically
a phys-ed style draft pick) that I dont believe to be the most accurate.

Yes, the outcome I seek is that after the sort the two teams will have the
closest average (mean) mathematically possible.

However, it would be cool if I could do it by median as well, or both, just
for comparison.
 
S

Seth Brundle

knap-sack and bin-packing are very widely discussed fundamentals. Google
on those terms, you will find more than you care for on them.

Excellent thanks!
I see there is actually a perl-Algorithm-Knapsack module.

Nothing better then a unique keyword to help you find what your looking
for - thanks!
 
X

xhoster

Ignoramus6365 said:
Try enumerating all possibilities for two teams of 20 people each.

That would be 2^40 possibilities.

Well, more like 2^39. But even less if the teams have to be balanced in
size as well as in said:
1 trillion of them or so.

He said "mathematically", not "algorithmically". Since when are
mathematicians concerned about run time?

Xho
 
M

Michele Dondi

solution. You're asking a mathematics question. You're more likely
to get good answers in a group that discusses mathematics.

Along with some answers from crackpots who claim to have a simple FLT
proof, that is! ;-)


Michele
 
S

Seth Brundle

1 trillion of them or so.
He said "mathematically", not "algorithmically". Since when are
mathematicians concerned about run time?

hehe I think you will find a lot of them are just as efficiency-obsessed any
programmer.
 
S

Seth Brundle

``Obviously you could do something simple like alternate in descending
order, but this isnt guaranteed to produce the most accurate result.''

My suggestion does not merely alternate in descending order, it
randomizes the pairs. Yours guarantees that one team will be always
better than another. Whereas mine does not.

Sorry I misunderstood - interesting option!
 
S

Seth Brundle

knap-sack and bin-packing are very widely discussed fundamentals. Google
on those terms, you will find more than you care for on them.

Interesting, the knapsack and binpack solutions (which as I understand it
from reviewing these modules), seem to both be the exact same problem
domain, but that domain seems significantly different from my problem.

Basically, they are solving the problems of how to fill x number of y-sized
bins so there is the least amount of space left in the containers.

However, I dont think it necessarily guarantees the bins are filled equally,
as the last bin may be just the leftovers from the last-efficiently-filled
bin.

But I will play with them.
 
M

Michele Dondi

thanks for the reply, but that solution is the one I referred to (basically
a phys-ed style draft pick) that I dont believe to be the most accurate.

Yes, the outcome I seek is that after the sort the two teams will have the
closest average (mean) mathematically possible.

Yours is still a mathematical, and admittedly interesting, question.
Like Paul Lalli rightly pointed out, you'd better ask in a
mathematics, statistics or algorithm newsgroup or forum. Personally,
since I have no idea how to specifically help you with this particular
problem, I can only keep the discussion moderately on topic by
pointing you to a suitable CPAN search:

http://search.cpan.org/search?mode=module&query=statistics

See if any module can help you. Other than this, I could only write
some brute force approach for your, which I suppose is not what you're
after.


Michele
 
X

xhoster

Seth Brundle said:
Interesting, the knapsack and binpack solutions (which as I understand
it from reviewing these modules), seem to both be the exact same problem
domain, but that domain seems significantly different from my problem.

Depending on the exact nature of your problem, there may be ways to map it
into knapsack or bin-packing. There are several small-seeming
clarifications you haven't yet provided which have a very big impact on how
and whether this can be done.
Basically, they are solving the problems of how to fill x number of
y-sized bins so there is the least amount of space left in the
containers.

If you pack your objects into a bin whose capacity is one half of the total
score of all the objects, then the arrangement what maximizes the score
of the bin without going over the capacity will be the same arrangement
that makes the total of two bins closest to each other.

Anyway, when you make off-topic posts and then insult the regulars who
point this out to you, I'm less inclined to go much more out of my way to
help out.

Xho
 
M

Michele Dondi

He said "mathematically", not "algorithmically". Since when are
mathematicians concerned about run time?

Well, they are. In fact they began thinkering about this kinda stuff
even before actual computers were really built.


Michele
 
M

Michele Dondi

See if any module can help you. Other than this, I could only write
some brute force approach for your, which I suppose is not what you're
after.

After all, you got me intrigued so I cooked up a quick'n'dirty brute
force attempt. Please note that this lacks even the obvious
optimization of only searching half of the actual search space. I was
about to add that, but then I realized how really ugly and messy the
thingie is: it was not false modesty, it would reallty rather need a
complete rewrite, probably in terms of OO code. Anyway, hereafter it
is. Use at your own risk!


#!/usr/bin/perl -l

use strict;
use warnings;
use List::Util qw/sum min max/;

my @names=qw/alice bob carl darla eddie fred
gina hank kate jay larry mandy nick
noah pete randy stu terry/;
my %score=map { $_ => rand 1000_000} @names;
my $len=@names;

{
my $maxi=$#names;
my $verytot=sum values %score;

my ($t1,$t2);
sub preprocess {
my $v=shift;
($t1,$t2)=();
my $num=0;
for (0..$maxi) {
my $vec=vec $v, $_, 1;
push @{ $vec ? $t1 : $t2}, $names[$_];
$num += $vec;
}
$num;
}

my ($othertot,$othernum);
sub avg {
my $num=preprocess(shift);
my $tot=sum @score{@$t1};
($othernum,$othertot)=($len-$num,$verytot-$tot);
$tot/$num;
}

sub oavg () { $othertot/$othernum }

sub teams(\@\@) {
@{ $_[0] }=@$t1;
@{ $_[1] }=@$t2;
}
}

my $diff=abs(max(values %score)-min(values %score));
for (1..2**$len-2) {
my $v=pack 'C*', do {
my @v;
while ($_) {
push @v, $_%256;
$_ >>= 8;
}
@v;
};
my $avg1=avg $v;
my $avg2=oavg;
my (@t1,@t2);
teams(@t1,@t2);
next unless (my $tdiff=abs($avg1-$avg2))<$diff;
$diff=$tdiff;
print '-' x 72, "\n";
print <<"EOT";
Team #1: [@t1]
Team #2: [@t2]
Avg #1: $avg1 - Avg #2: $avg2 - Diff: $diff
EOT
}

__END__


Michele
 
J

Josef Moellers

Thanks for speaking on their behalf.

In this case, Ted speaks for me, too.
You asked a question which was inappropriate in this newsgroup. You have
been given advice that it may be more helpful to ask in another
newsgroup. You have then been insulting.
Do you regularly go to the butcher to ask for bread or to the dentist to
have a wart taken care of? What to you say to them if they point out
that they won't be of much help.

If you had the math worked out and were looking for an ingenious way to
speed up the 95x57 matrix calculations involved in Perl, you're welcome
to show your code and we're likely to help.

As it stands, there's not much we can do.
 
S

seth brundle

Thanks for so clearly identifying your lack of respect
Thanks for speaking on their behalf.
In this case, Ted speaks for me, too.

Maybe you two can get together in a chat room or something and talk about
how this experience made you feel.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Staff online

Members online

Forum statistics

Threads
473,767
Messages
2,569,570
Members
45,045
Latest member
DRCM

Latest Threads

Top