Solve a statistics problem

Discussion in 'Perl Misc' started by Seth Brundle, May 11, 2007.

  1. Seth Brundle

    Seth Brundle Guest

    Say I have a hash with peoples names and a quality score, like their free
    throw percentage.

    Mathematically what is the most accurate way to divide the list into the two
    most balanced teams?

    Obviously you could do something simple like alternate in descending order,
    but this isnt guaranteed to produce the most accurate result.

    Ideally, I would like to be able to balance based on the mean, median, or
    both.

    Code isnt necessary but of course appreciated - but if you have a link to
    the fundementals of solving such a problem thats fine to - I want to learn
    to do it for myself.
     
    Seth Brundle, May 11, 2007
    #1
    1. Advertising

  2. Seth Brundle

    Paul Lalli Guest

    On May 11, 10:47 am, "Seth Brundle" <> wrote:
    > Say I have a hash with peoples names and a quality score, like their free
    > throw percentage.
    >
    > Mathematically what is the most accurate way to divide the list into the two
    > most balanced teams?


    This question has nothing to do with Perl. It would be the same
    question regardless of what language you're using to implment your
    solution. You're asking a mathematics question. You're more likely
    to get good answers in a group that discusses mathematics.

    Paul Lalli
     
    Paul Lalli, May 11, 2007
    #2
    1. Advertising

  3. Seth Brundle

    Seth Brundle Guest

    thanks for the spamcop spam.
    heres your cookie.


    "Paul Lalli" <> wrote in message
    news:...
    > On May 11, 10:47 am, "Seth Brundle" <> wrote:
    >> Say I have a hash with peoples names and a quality score, like their free
    >> throw percentage.
    >>
    >> Mathematically what is the most accurate way to divide the list into the
    >> two
    >> most balanced teams?

    >
    > This question has nothing to do with Perl. It would be the same
    > question regardless of what language you're using to implment your
    > solution. You're asking a mathematics question. You're more likely
    > to get good answers in a group that discusses mathematics.
    >
    > Paul Lalli
    >
     
    Seth Brundle, May 11, 2007
    #3
  4. Seth Brundle

    Guest

    "Seth Brundle" <> wrote:
    > Say I have a hash with peoples names and a quality score, like their free
    > throw percentage.
    >
    > Mathematically what is the most accurate way to divide the list into the
    > two most balanced teams?


    Mathematically, enumerate all possible sets of teams, compute the
    balancedness of each, and take the one with the most balancedness. There
    are various combinatoric modules and sample-code for Perl, but which one
    you want probably depends on what exactly constitutes a legal pair of
    teams.1

    > Obviously you could do something simple like alternate in descending
    > order, but this isnt guaranteed to produce the most accurate result.


    It is not guaranteed to produce the *optimal* result. Accuracy probably
    doesn't enter into it.

    >
    > Ideally, I would like to be able to balance based on the mean, median, or
    > both.


    Do both teams have to be the same size? If so, then balancing on mean is
    the same as balancing on sum, right? That sounds like a variant of the
    well-known bin-packing or knap-sack problems. For the median, that depends
    strongly on whether the teams have to be divided equally, and also on
    whether the number on each team is going to be odd or even, and on how you
    define median in the case of even-membered teams.

    >
    > Code isnt necessary but of course appreciated - but if you have a link to
    > the fundementals of solving such a problem thats fine to - I want to
    > learn to do it for myself.


    knap-sack and bin-packing are very widely discussed fundamentals. Google
    on those terms, you will find more than you care for on them.

    Xho

    --
    -------------------- http://NewsReader.Com/ --------------------
    Usenet Newsgroup Service $9.95/Month 30GB
     
    , May 11, 2007
    #4
  5. Seth Brundle

    Seth Brundle Guest

    > Unless you have gross outliers, you will probably do well by sorting
    > the list by score, then you would take (unshift) pairs of people with
    > close scores (that is, take pair of persons 1 and 2 (two worst


    thanks for the reply, but that solution is the one I referred to (basically
    a phys-ed style draft pick) that I dont believe to be the most accurate.

    Yes, the outcome I seek is that after the sort the two teams will have the
    closest average (mean) mathematically possible.

    However, it would be cool if I could do it by median as well, or both, just
    for comparison.
     
    Seth Brundle, May 11, 2007
    #5
  6. Seth Brundle

    Seth Brundle Guest


    > knap-sack and bin-packing are very widely discussed fundamentals. Google
    > on those terms, you will find more than you care for on them.


    Excellent thanks!
    I see there is actually a perl-Algorithm-Knapsack module.

    Nothing better then a unique keyword to help you find what your looking
    for - thanks!
     
    Seth Brundle, May 11, 2007
    #6
  7. Seth Brundle

    Guest

    Ignoramus6365 <ignoramus6365@NOSPAM.6365.invalid> wrote:
    > On 11 May 2007 15:16:03 GMT, <> wrote:
    > > "Seth Brundle" <> wrote:
    > >> Say I have a hash with peoples names and a quality score, like their
    > >> free throw percentage.
    > >>
    > >> Mathematically what is the most accurate way to divide the list into
    > >> the two most balanced teams?

    > >
    > > Mathematically, enumerate all possible sets of teams, compute the
    > > balancedness of each, and take the one with the most balancedness.
    > > There are various combinatoric modules and sample-code for Perl, but
    > > which one you want probably depends on what exactly constitutes a legal
    > > pair of teams.1

    >
    > Try enumerating all possibilities for two teams of 20 people each.
    >
    > That would be 2^40 possibilities.


    Well, more like 2^39. But even less if the teams have to be balanced in
    size as well as in <whatever> score.

    > 1 trillion of them or so.


    He said "mathematically", not "algorithmically". Since when are
    mathematicians concerned about run time?

    Xho

    --
    -------------------- http://NewsReader.Com/ --------------------
    Usenet Newsgroup Service $9.95/Month 30GB
     
    , May 11, 2007
    #7
  8. On 11 May 2007 08:00:54 -0700, Paul Lalli <> wrote:

    >solution. You're asking a mathematics question. You're more likely
    >to get good answers in a group that discusses mathematics.


    Along with some answers from crackpots who claim to have a simple FLT
    proof, that is! ;-)


    Michele
    --
    {$_=pack'B8'x25,unpack'A8'x32,$a^=sub{pop^pop}->(map substr
    (($a||=join'',map--$|x$_,(unpack'w',unpack'u','G^<R<Y]*YB='
    ..'KYU;*EVH[.FHF2W+#"\Z*5TI/ER<Z`S(G.DZZ9OX0Z')=~/./g)x2,$_,
    256),7,249);s/[^\w,]/ /g;$ \=/^J/?$/:"\r";print,redo}#JAPH,
     
    Michele Dondi, May 11, 2007
    #8
  9. Seth Brundle

    Seth Brundle Guest

    >> 1 trillion of them or so.
    >
    > He said "mathematically", not "algorithmically". Since when are
    > mathematicians concerned about run time?


    hehe I think you will find a lot of them are just as efficiency-obsessed any
    programmer.
     
    Seth Brundle, May 11, 2007
    #9
  10. Seth Brundle

    Seth Brundle Guest

    > ``Obviously you could do something simple like alternate in descending
    > order, but this isnt guaranteed to produce the most accurate result.''
    >
    > My suggestion does not merely alternate in descending order, it
    > randomizes the pairs. Yours guarantees that one team will be always
    > better than another. Whereas mine does not.


    Sorry I misunderstood - interesting option!
     
    Seth Brundle, May 11, 2007
    #10
  11. Seth Brundle

    Seth Brundle Guest

    > knap-sack and bin-packing are very widely discussed fundamentals. Google
    > on those terms, you will find more than you care for on them.


    Interesting, the knapsack and binpack solutions (which as I understand it
    from reviewing these modules), seem to both be the exact same problem
    domain, but that domain seems significantly different from my problem.

    Basically, they are solving the problems of how to fill x number of y-sized
    bins so there is the least amount of space left in the containers.

    However, I dont think it necessarily guarantees the bins are filled equally,
    as the last bin may be just the leftovers from the last-efficiently-filled
    bin.

    But I will play with them.
     
    Seth Brundle, May 11, 2007
    #11
  12. On Fri, 11 May 2007 11:20:53 -0400, "Seth Brundle"
    <> wrote:

    >thanks for the reply, but that solution is the one I referred to (basically
    >a phys-ed style draft pick) that I dont believe to be the most accurate.
    >
    >Yes, the outcome I seek is that after the sort the two teams will have the
    >closest average (mean) mathematically possible.


    Yours is still a mathematical, and admittedly interesting, question.
    Like Paul Lalli rightly pointed out, you'd better ask in a
    mathematics, statistics or algorithm newsgroup or forum. Personally,
    since I have no idea how to specifically help you with this particular
    problem, I can only keep the discussion moderately on topic by
    pointing you to a suitable CPAN search:

    http://search.cpan.org/search?mode=module&query=statistics

    See if any module can help you. Other than this, I could only write
    some brute force approach for your, which I suppose is not what you're
    after.


    Michele
    --
    {$_=pack'B8'x25,unpack'A8'x32,$a^=sub{pop^pop}->(map substr
    (($a||=join'',map--$|x$_,(unpack'w',unpack'u','G^<R<Y]*YB='
    ..'KYU;*EVH[.FHF2W+#"\Z*5TI/ER<Z`S(G.DZZ9OX0Z')=~/./g)x2,$_,
    256),7,249);s/[^\w,]/ /g;$ \=/^J/?$/:"\r";print,redo}#JAPH,
     
    Michele Dondi, May 11, 2007
    #12
  13. Seth Brundle

    Guest

    "Seth Brundle" <> wrote:
    > > knap-sack and bin-packing are very widely discussed fundamentals.
    > > Google on those terms, you will find more than you care for on them.

    >
    > Interesting, the knapsack and binpack solutions (which as I understand
    > it from reviewing these modules), seem to both be the exact same problem
    > domain, but that domain seems significantly different from my problem.


    Depending on the exact nature of your problem, there may be ways to map it
    into knapsack or bin-packing. There are several small-seeming
    clarifications you haven't yet provided which have a very big impact on how
    and whether this can be done.

    > Basically, they are solving the problems of how to fill x number of
    > y-sized bins so there is the least amount of space left in the
    > containers.


    If you pack your objects into a bin whose capacity is one half of the total
    score of all the objects, then the arrangement what maximizes the score
    of the bin without going over the capacity will be the same arrangement
    that makes the total of two bins closest to each other.

    Anyway, when you make off-topic posts and then insult the regulars who
    point this out to you, I'm less inclined to go much more out of my way to
    help out.

    Xho

    --
    -------------------- http://NewsReader.Com/ --------------------
    Usenet Newsgroup Service $9.95/Month 30GB
     
    , May 11, 2007
    #13
  14. On 11 May 2007 15:44:17 GMT, wrote:

    >> 1 trillion of them or so.

    >
    >He said "mathematically", not "algorithmically". Since when are
    >mathematicians concerned about run time?


    Well, they are. In fact they began thinkering about this kinda stuff
    even before actual computers were really built.


    Michele
    --
    {$_=pack'B8'x25,unpack'A8'x32,$a^=sub{pop^pop}->(map substr
    (($a||=join'',map--$|x$_,(unpack'w',unpack'u','G^<R<Y]*YB='
    ..'KYU;*EVH[.FHF2W+#"\Z*5TI/ER<Z`S(G.DZZ9OX0Z')=~/./g)x2,$_,
    256),7,249);s/[^\w,]/ /g;$ \=/^J/?$/:"\r";print,redo}#JAPH,
     
    Michele Dondi, May 11, 2007
    #14
  15. Seth Brundle <> wrote:
    > "Paul Lalli" <> wrote in message
    > news:...
    >> On May 11, 10:47 am, "Seth Brundle" <> wrote:
    >>> Say I have a hash with peoples names and a quality score, like their free
    >>> throw percentage.
    >>>
    >>> Mathematically what is the most accurate way to divide the list into the
    >>> two
    >>> most balanced teams?

    >>
    >> This question has nothing to do with Perl. It would be the same
    >> question regardless of what language you're using to implment your
    >> solution. You're asking a mathematics question. You're more likely
    >> to get good answers in a group that discusses mathematics.
    >>

    > thanks for the spamcop spam.



    Thanks for so clearly identifying your lack of respect
    for all of the people here.


    > heres your cookie.



    Here's your *plonk*!


    --
    Tad McClellan SGML consulting
    Perl programming
    Fort Worth, Texas
     
    Tad McClellan, May 12, 2007
    #15
  16. On Fri, 11 May 2007 18:03:38 +0200, Michele Dondi
    <> wrote:

    >See if any module can help you. Other than this, I could only write
    >some brute force approach for your, which I suppose is not what you're
    >after.


    After all, you got me intrigued so I cooked up a quick'n'dirty brute
    force attempt. Please note that this lacks even the obvious
    optimization of only searching half of the actual search space. I was
    about to add that, but then I realized how really ugly and messy the
    thingie is: it was not false modesty, it would reallty rather need a
    complete rewrite, probably in terms of OO code. Anyway, hereafter it
    is. Use at your own risk!


    #!/usr/bin/perl -l

    use strict;
    use warnings;
    use List::Util qw/sum min max/;

    my @names=qw/alice bob carl darla eddie fred
    gina hank kate jay larry mandy nick
    noah pete randy stu terry/;
    my %score=map { $_ => rand 1000_000} @names;
    my $len=@names;

    {
    my $maxi=$#names;
    my $verytot=sum values %score;

    my ($t1,$t2);
    sub preprocess {
    my $v=shift;
    ($t1,$t2)=();
    my $num=0;
    for (0..$maxi) {
    my $vec=vec $v, $_, 1;
    push @{ $vec ? $t1 : $t2}, $names[$_];
    $num += $vec;
    }
    $num;
    }

    my ($othertot,$othernum);
    sub avg {
    my $num=preprocess(shift);
    my $tot=sum @score{@$t1};
    ($othernum,$othertot)=($len-$num,$verytot-$tot);
    $tot/$num;
    }

    sub oavg () { $othertot/$othernum }

    sub teams(\@\@) {
    @{ $_[0] }=@$t1;
    @{ $_[1] }=@$t2;
    }
    }

    my $diff=abs(max(values %score)-min(values %score));
    for (1..2**$len-2) {
    my $v=pack 'C*', do {
    my @v;
    while ($_) {
    push @v, $_%256;
    $_ >>= 8;
    }
    @v;
    };
    my $avg1=avg $v;
    my $avg2=oavg;
    my (@t1,@t2);
    teams(@t1,@t2);
    next unless (my $tdiff=abs($avg1-$avg2))<$diff;
    $diff=$tdiff;
    print '-' x 72, "\n";
    print <<"EOT";
    Team #1: [@t1]
    Team #2: [@t2]
    Avg #1: $avg1 - Avg #2: $avg2 - Diff: $diff
    EOT
    }

    __END__


    Michele
    --
    {$_=pack'B8'x25,unpack'A8'x32,$a^=sub{pop^pop}->(map substr
    (($a||=join'',map--$|x$_,(unpack'w',unpack'u','G^<R<Y]*YB='
    ..'KYU;*EVH[.FHF2W+#"\Z*5TI/ER<Z`S(G.DZZ9OX0Z')=~/./g)x2,$_,
    256),7,249);s/[^\w,]/ /g;$ \=/^J/?$/:"\r";print,redo}#JAPH,
     
    Michele Dondi, May 12, 2007
    #16
  17. Seth Brundle

    Guest

    > > thanks for the spamcop spam.
    >
    > Thanks for so clearly identifying your lack of respect
    > for all of the people here.


    Thanks for speaking on their behalf.
     
    , May 25, 2007
    #17
  18. wrote:
    >>>thanks for the spamcop spam.

    >>
    >>Thanks for so clearly identifying your lack of respect
    >>for all of the people here.

    >
    >
    > Thanks for speaking on their behalf.


    In this case, Ted speaks for me, too.
    You asked a question which was inappropriate in this newsgroup. You have
    been given advice that it may be more helpful to ask in another
    newsgroup. You have then been insulting.
    Do you regularly go to the butcher to ask for bread or to the dentist to
    have a wart taken care of? What to you say to them if they point out
    that they won't be of much help.

    If you had the math worked out and were looking for an ingenious way to
    speed up the 95x57 matrix calculations involved in Perl, you're welcome
    to show your code and we're likely to help.

    As it stands, there's not much we can do.
    --
    These are my personal views and not those of Fujitsu Siemens Computers!
    Josef Möllers (Pinguinpfleger bei FSC)
    If failure had no penalty success would not be a prize (T. Pratchett)
    Company Details: http://www.fujitsu-siemens.com/imprint.html
     
    Josef Moellers, May 25, 2007
    #18
  19. Seth Brundle

    seth brundle Guest

    >>Thanks for so clearly identifying your lack of respect
    >>for all of the people here.

    >
    > Thanks for speaking on their behalf.


    > In this case, Ted speaks for me, too.


    Maybe you two can get together in a chat room or something and talk about
    how this experience made you feel.
     
    seth brundle, May 26, 2007
    #19
  20. Seth Brundle

    skywriter14 Guest

    On May 26, 7:07 am, "seth brundle" <> wrote:
    > >>Thanks for so clearly identifying your lack of respect
    > >>for all of the people here.

    >
    > > Thanks for speaking on their behalf.
    > > In this case, Ted speaks for me, too.

    >
    > Maybe you two can get together in a chat room or something and talk about
    > how this experience made you feel.


    Idiot!
     
    skywriter14, May 26, 2007
    #20
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Abdelhalim

    VHDL features Usage statistics

    Abdelhalim, May 18, 2004, in forum: VHDL
    Replies:
    3
    Views:
    704
    Abdelhalim
    May 19, 2004
  2. Paps

    Site Statistics

    Paps, Nov 24, 2003, in forum: ASP .Net
    Replies:
    5
    Views:
    457
    Jacob Yang [MSFT]
    Nov 27, 2003
  3. Lucas Tam
    Replies:
    2
    Views:
    428
    John Rivers
    Aug 27, 2005
  4. Mr Newbie

    Site Statistics - Need Display Package

    Mr Newbie, Dec 26, 2005, in forum: ASP .Net
    Replies:
    1
    Views:
    308
    Christopher Reed
    Dec 26, 2005
  5. Mark

    Learning with statistics

    Mark, Jun 27, 2003, in forum: Java
    Replies:
    4
    Views:
    884
    Marc Rochkind
    Jun 30, 2003
Loading...

Share This Page