# Solve a statistics problem

Discussion in 'Perl Misc' started by Seth Brundle, May 11, 2007.

1. ### Seth BrundleGuest

Say I have a hash with peoples names and a quality score, like their free
throw percentage.

Mathematically what is the most accurate way to divide the list into the two
most balanced teams?

Obviously you could do something simple like alternate in descending order,
but this isnt guaranteed to produce the most accurate result.

Ideally, I would like to be able to balance based on the mean, median, or
both.

Code isnt necessary but of course appreciated - but if you have a link to
the fundementals of solving such a problem thats fine to - I want to learn
to do it for myself.

Seth Brundle, May 11, 2007

2. ### Paul LalliGuest

On May 11, 10:47 am, "Seth Brundle" <> wrote:
> Say I have a hash with peoples names and a quality score, like their free
> throw percentage.
>
> Mathematically what is the most accurate way to divide the list into the two
> most balanced teams?

This question has nothing to do with Perl. It would be the same
question regardless of what language you're using to implment your
solution. You're asking a mathematics question. You're more likely
to get good answers in a group that discusses mathematics.

Paul Lalli

Paul Lalli, May 11, 2007

3. ### Seth BrundleGuest

thanks for the spamcop spam.

"Paul Lalli" <> wrote in message
news:...
> On May 11, 10:47 am, "Seth Brundle" <> wrote:
>> Say I have a hash with peoples names and a quality score, like their free
>> throw percentage.
>>
>> Mathematically what is the most accurate way to divide the list into the
>> two
>> most balanced teams?

>
> This question has nothing to do with Perl. It would be the same
> question regardless of what language you're using to implment your
> solution. You're asking a mathematics question. You're more likely
> to get good answers in a group that discusses mathematics.
>
> Paul Lalli
>

Seth Brundle, May 11, 2007
4. ### Guest

"Seth Brundle" <> wrote:
> Say I have a hash with peoples names and a quality score, like their free
> throw percentage.
>
> Mathematically what is the most accurate way to divide the list into the
> two most balanced teams?

Mathematically, enumerate all possible sets of teams, compute the
balancedness of each, and take the one with the most balancedness. There
are various combinatoric modules and sample-code for Perl, but which one
you want probably depends on what exactly constitutes a legal pair of
teams.1

> Obviously you could do something simple like alternate in descending
> order, but this isnt guaranteed to produce the most accurate result.

It is not guaranteed to produce the *optimal* result. Accuracy probably
doesn't enter into it.

>
> Ideally, I would like to be able to balance based on the mean, median, or
> both.

Do both teams have to be the same size? If so, then balancing on mean is
the same as balancing on sum, right? That sounds like a variant of the
well-known bin-packing or knap-sack problems. For the median, that depends
strongly on whether the teams have to be divided equally, and also on
whether the number on each team is going to be odd or even, and on how you
define median in the case of even-membered teams.

>
> Code isnt necessary but of course appreciated - but if you have a link to
> the fundementals of solving such a problem thats fine to - I want to
> learn to do it for myself.

knap-sack and bin-packing are very widely discussed fundamentals. Google
on those terms, you will find more than you care for on them.

Xho

--
Usenet Newsgroup Service \$9.95/Month 30GB

, May 11, 2007
5. ### Seth BrundleGuest

> Unless you have gross outliers, you will probably do well by sorting
> the list by score, then you would take (unshift) pairs of people with
> close scores (that is, take pair of persons 1 and 2 (two worst

thanks for the reply, but that solution is the one I referred to (basically
a phys-ed style draft pick) that I dont believe to be the most accurate.

Yes, the outcome I seek is that after the sort the two teams will have the
closest average (mean) mathematically possible.

However, it would be cool if I could do it by median as well, or both, just
for comparison.

Seth Brundle, May 11, 2007
6. ### Seth BrundleGuest

> knap-sack and bin-packing are very widely discussed fundamentals. Google
> on those terms, you will find more than you care for on them.

Excellent thanks!
I see there is actually a perl-Algorithm-Knapsack module.

for - thanks!

Seth Brundle, May 11, 2007
7. ### Guest

Ignoramus6365 <ignoramus6365@NOSPAM.6365.invalid> wrote:
> On 11 May 2007 15:16:03 GMT, <> wrote:
> > "Seth Brundle" <> wrote:
> >> Say I have a hash with peoples names and a quality score, like their
> >> free throw percentage.
> >>
> >> Mathematically what is the most accurate way to divide the list into
> >> the two most balanced teams?

> >
> > Mathematically, enumerate all possible sets of teams, compute the
> > balancedness of each, and take the one with the most balancedness.
> > There are various combinatoric modules and sample-code for Perl, but
> > which one you want probably depends on what exactly constitutes a legal
> > pair of teams.1

>
> Try enumerating all possibilities for two teams of 20 people each.
>
> That would be 2^40 possibilities.

Well, more like 2^39. But even less if the teams have to be balanced in
size as well as in <whatever> score.

> 1 trillion of them or so.

He said "mathematically", not "algorithmically". Since when are

Xho

--
Usenet Newsgroup Service \$9.95/Month 30GB

, May 11, 2007
8. ### Michele DondiGuest

On 11 May 2007 08:00:54 -0700, Paul Lalli <> wrote:

>solution. You're asking a mathematics question. You're more likely
>to get good answers in a group that discusses mathematics.

Along with some answers from crackpots who claim to have a simple FLT
proof, that is! ;-)

Michele
--
{\$_=pack'B8'x25,unpack'A8'x32,\$a^=sub{pop^pop}->(map substr
((\$a||=join'',map--\$|x\$_,(unpack'w',unpack'u','G^<R<Y]*YB='
..'KYU;*EVH[.FHF2W+#"\Z*5TI/ER<Z`S(G.DZZ9OX0Z')=~/./g)x2,\$_,
256),7,249);s/[^\w,]/ /g;\$ \=/^J/?\$/:"\r";print,redo}#JAPH,

Michele Dondi, May 11, 2007
9. ### Seth BrundleGuest

>> 1 trillion of them or so.
>
> He said "mathematically", not "algorithmically". Since when are
> mathematicians concerned about run time?

hehe I think you will find a lot of them are just as efficiency-obsessed any
programmer.

Seth Brundle, May 11, 2007
10. ### Seth BrundleGuest

> ``Obviously you could do something simple like alternate in descending
> order, but this isnt guaranteed to produce the most accurate result.''
>
> My suggestion does not merely alternate in descending order, it
> randomizes the pairs. Yours guarantees that one team will be always
> better than another. Whereas mine does not.

Sorry I misunderstood - interesting option!

Seth Brundle, May 11, 2007
11. ### Seth BrundleGuest

> knap-sack and bin-packing are very widely discussed fundamentals. Google
> on those terms, you will find more than you care for on them.

Interesting, the knapsack and binpack solutions (which as I understand it
from reviewing these modules), seem to both be the exact same problem
domain, but that domain seems significantly different from my problem.

Basically, they are solving the problems of how to fill x number of y-sized
bins so there is the least amount of space left in the containers.

However, I dont think it necessarily guarantees the bins are filled equally,
as the last bin may be just the leftovers from the last-efficiently-filled
bin.

But I will play with them.

Seth Brundle, May 11, 2007
12. ### Michele DondiGuest

On Fri, 11 May 2007 11:20:53 -0400, "Seth Brundle"
<> wrote:

>thanks for the reply, but that solution is the one I referred to (basically
>a phys-ed style draft pick) that I dont believe to be the most accurate.
>
>Yes, the outcome I seek is that after the sort the two teams will have the
>closest average (mean) mathematically possible.

Yours is still a mathematical, and admittedly interesting, question.
Like Paul Lalli rightly pointed out, you'd better ask in a
mathematics, statistics or algorithm newsgroup or forum. Personally,
since I have no idea how to specifically help you with this particular
problem, I can only keep the discussion moderately on topic by
pointing you to a suitable CPAN search:

http://search.cpan.org/search?mode=module&query=statistics

See if any module can help you. Other than this, I could only write
some brute force approach for your, which I suppose is not what you're
after.

Michele
--
{\$_=pack'B8'x25,unpack'A8'x32,\$a^=sub{pop^pop}->(map substr
((\$a||=join'',map--\$|x\$_,(unpack'w',unpack'u','G^<R<Y]*YB='
..'KYU;*EVH[.FHF2W+#"\Z*5TI/ER<Z`S(G.DZZ9OX0Z')=~/./g)x2,\$_,
256),7,249);s/[^\w,]/ /g;\$ \=/^J/?\$/:"\r";print,redo}#JAPH,

Michele Dondi, May 11, 2007
13. ### Guest

"Seth Brundle" <> wrote:
> > knap-sack and bin-packing are very widely discussed fundamentals.
> > Google on those terms, you will find more than you care for on them.

>
> Interesting, the knapsack and binpack solutions (which as I understand
> it from reviewing these modules), seem to both be the exact same problem
> domain, but that domain seems significantly different from my problem.

Depending on the exact nature of your problem, there may be ways to map it
into knapsack or bin-packing. There are several small-seeming
clarifications you haven't yet provided which have a very big impact on how
and whether this can be done.

> Basically, they are solving the problems of how to fill x number of
> y-sized bins so there is the least amount of space left in the
> containers.

If you pack your objects into a bin whose capacity is one half of the total
score of all the objects, then the arrangement what maximizes the score
of the bin without going over the capacity will be the same arrangement
that makes the total of two bins closest to each other.

Anyway, when you make off-topic posts and then insult the regulars who
point this out to you, I'm less inclined to go much more out of my way to
help out.

Xho

--
Usenet Newsgroup Service \$9.95/Month 30GB

, May 11, 2007
14. ### Michele DondiGuest

On 11 May 2007 15:44:17 GMT, wrote:

>> 1 trillion of them or so.

>
>He said "mathematically", not "algorithmically". Since when are

even before actual computers were really built.

Michele
--
{\$_=pack'B8'x25,unpack'A8'x32,\$a^=sub{pop^pop}->(map substr
((\$a||=join'',map--\$|x\$_,(unpack'w',unpack'u','G^<R<Y]*YB='
..'KYU;*EVH[.FHF2W+#"\Z*5TI/ER<Z`S(G.DZZ9OX0Z')=~/./g)x2,\$_,
256),7,249);s/[^\w,]/ /g;\$ \=/^J/?\$/:"\r";print,redo}#JAPH,

Michele Dondi, May 11, 2007

Seth Brundle <> wrote:
> "Paul Lalli" <> wrote in message
> news:...
>> On May 11, 10:47 am, "Seth Brundle" <> wrote:
>>> Say I have a hash with peoples names and a quality score, like their free
>>> throw percentage.
>>>
>>> Mathematically what is the most accurate way to divide the list into the
>>> two
>>> most balanced teams?

>>
>> This question has nothing to do with Perl. It would be the same
>> question regardless of what language you're using to implment your
>> solution. You're asking a mathematics question. You're more likely
>> to get good answers in a group that discusses mathematics.
>>

> thanks for the spamcop spam.

Thanks for so clearly identifying your lack of respect
for all of the people here.

--
Perl programming
Fort Worth, Texas

16. ### Michele DondiGuest

On Fri, 11 May 2007 18:03:38 +0200, Michele Dondi
<> wrote:

>See if any module can help you. Other than this, I could only write
>some brute force approach for your, which I suppose is not what you're
>after.

After all, you got me intrigued so I cooked up a quick'n'dirty brute
force attempt. Please note that this lacks even the obvious
optimization of only searching half of the actual search space. I was
about to add that, but then I realized how really ugly and messy the
thingie is: it was not false modesty, it would reallty rather need a
complete rewrite, probably in terms of OO code. Anyway, hereafter it
is. Use at your own risk!

#!/usr/bin/perl -l

use strict;
use warnings;
use List::Util qw/sum min max/;

my @names=qw/alice bob carl darla eddie fred
gina hank kate jay larry mandy nick
noah pete randy stu terry/;
my %score=map { \$_ => rand 1000_000} @names;
my \$len=@names;

{
my \$maxi=\$#names;
my \$verytot=sum values %score;

my (\$t1,\$t2);
sub preprocess {
my \$v=shift;
(\$t1,\$t2)=();
my \$num=0;
for (0..\$maxi) {
my \$vec=vec \$v, \$_, 1;
push @{ \$vec ? \$t1 : \$t2}, \$names[\$_];
\$num += \$vec;
}
\$num;
}

my (\$othertot,\$othernum);
sub avg {
my \$num=preprocess(shift);
my \$tot=sum @score{@\$t1};
(\$othernum,\$othertot)=(\$len-\$num,\$verytot-\$tot);
\$tot/\$num;
}

sub oavg () { \$othertot/\$othernum }

sub teams(\@\@) {
@{ \$_[0] }=@\$t1;
@{ \$_[1] }=@\$t2;
}
}

my \$diff=abs(max(values %score)-min(values %score));
for (1..2**\$len-2) {
my \$v=pack 'C*', do {
my @v;
while (\$_) {
push @v, \$_%256;
\$_ >>= 8;
}
@v;
};
my \$avg1=avg \$v;
my \$avg2=oavg;
my (@t1,@t2);
teams(@t1,@t2);
next unless (my \$tdiff=abs(\$avg1-\$avg2))<\$diff;
\$diff=\$tdiff;
print '-' x 72, "\n";
print <<"EOT";
Team #1: [@t1]
Team #2: [@t2]
Avg #1: \$avg1 - Avg #2: \$avg2 - Diff: \$diff
EOT
}

__END__

Michele
--
{\$_=pack'B8'x25,unpack'A8'x32,\$a^=sub{pop^pop}->(map substr
((\$a||=join'',map--\$|x\$_,(unpack'w',unpack'u','G^<R<Y]*YB='
..'KYU;*EVH[.FHF2W+#"\Z*5TI/ER<Z`S(G.DZZ9OX0Z')=~/./g)x2,\$_,
256),7,249);s/[^\w,]/ /g;\$ \=/^J/?\$/:"\r";print,redo}#JAPH,

Michele Dondi, May 12, 2007
17. ### Guest

> > thanks for the spamcop spam.
>
> Thanks for so clearly identifying your lack of respect
> for all of the people here.

Thanks for speaking on their behalf.

, May 25, 2007
18. ### Josef MoellersGuest

wrote:
>>>thanks for the spamcop spam.

>>
>>Thanks for so clearly identifying your lack of respect
>>for all of the people here.

>
>
> Thanks for speaking on their behalf.

In this case, Ted speaks for me, too.
You asked a question which was inappropriate in this newsgroup. You have
newsgroup. You have then been insulting.
Do you regularly go to the butcher to ask for bread or to the dentist to
have a wart taken care of? What to you say to them if they point out
that they won't be of much help.

If you had the math worked out and were looking for an ingenious way to
speed up the 95x57 matrix calculations involved in Perl, you're welcome
to show your code and we're likely to help.

As it stands, there's not much we can do.
--
These are my personal views and not those of Fujitsu Siemens Computers!
Josef MÃ¶llers (Pinguinpfleger bei FSC)
If failure had no penalty success would not be a prize (T. Pratchett)
Company Details: http://www.fujitsu-siemens.com/imprint.html

Josef Moellers, May 25, 2007
19. ### seth brundleGuest

>>Thanks for so clearly identifying your lack of respect
>>for all of the people here.

>
> Thanks for speaking on their behalf.

> In this case, Ted speaks for me, too.

Maybe you two can get together in a chat room or something and talk about
how this experience made you feel.

seth brundle, May 26, 2007
20. ### skywriter14Guest

On May 26, 7:07 am, "seth brundle" <> wrote:
> >>Thanks for so clearly identifying your lack of respect
> >>for all of the people here.

>
> > Thanks for speaking on their behalf.
> > In this case, Ted speaks for me, too.

>
> Maybe you two can get together in a chat room or something and talk about
> how this experience made you feel.

Idiot!

skywriter14, May 26, 2007