Ranking an array (with ties and tollerances)

Discussion in 'C Programming' started by Michael Angelo Ravera, Aug 17, 2011.

  1. I WILL be writing this myself (unless someone posts a solution that I particularly like), but does anyone have any tricks for efficient method of ranking the values in an unsorted array.
    The result that I would like to achieve is that we have an unsorted array "score" (for the first case we will assume that it is a 32-bit signed integer) of size "size" and two arrays of unsigned integers "rank" and "ties" large enough to hold a number at least as large as "size". Upon completion, I want each element of "rank" to be one larger than the number of items in score that are greater than the corresponding element of "score" and each element of "ties" to equal the number of other elements in "score" that are equal to the corresponding element in "score".

    For the second case, "fscore" is a floating point number of appropriate precision and "size" "rank" and "ties" are as before, but we have a multiplicative tollerance within which scores are to be considered as equal.

    What I have done in the past is to iniialize all of the elements of rank to1 and all of the elements of ties to 0 and for each element of score ade the comparision and incremented rank and ties as necessary while skipping the current element.

    Something like this for the integer case.

    #define size appropriately
    int this_el, comp_el;
    INT32 score [size];
    unsigned rank [size], ties [size];

    for (this_el = 0; this_el < size; this_el ++)
    {
    rank [this_el] = 1;
    ties [this_el] = 0;
    }

    for (this_el = 0; this_el < size; this_el ++)
    {
    for (comp_el = 0; comp_el < this_el; comp_el ++)
    {
    if (score [this_el] < score [comp_el])
    rank [this_el] ++;
    else if (score [this_el] == score [comp_el])
    ties [this_el] ++;
    }
    for (comp_el = this_el + 1; comp_el < size; comp_el ++)
    {
    if (score [this_el] < score [comp_el])
    rank [this_el] ++;
    else if (score [this_el] == score [comp_el])
    ties [this_el] ++;
    }
    }

    Yeah, you can just have one internal loop and test for and skip over the (this_el == comp_el) but I think that the way that I have shown here is the better first cut.

    For the floating point case it looks like this:
    #define size appropriately
    /* tol should be between 0 and 1 */
    #define tol 0.001
    int this_el, comp_el;
    float score [size];
    unsigned rank [size], ties [size];

    for (this_el = 0; this_el < size; this_el ++)
    {
    rank [this_el] = 1;
    ties [this_el] = 0;
    }

    for (this_el = 0; this_el < size; this_el ++)
    {
    for (comp_el = 0; comp_el < this_el; comp_el ++)
    {
    if (score [this_el] < (score [comp_el] / (1 + tol)))
    rank [this_el] ++;
    else if (score [this_el] < ((1 + tol) * score [comp_el]))
    ties [this_el] ++;
    }
    for (comp_el = this_el + 1; comp_el < size; comp_el ++)
    {
    if (score [this_el] < (score [comp_el] / (1 + tol)))
    rank [this_el] ++;
    else if (score [this_el] < ((1 + tol) * score [comp_el]))
    ties [this_el] ++;
    }
    }

    Is there some trick to doing a lot better than this given the assumptions? The reason that sorting is undesirable is that I need to be able to presentranks for several different scores for the same contestant. If someone wants to make a credible argument that I can sort, compute ranks and ties and present the results more efficiently than just computing each as above, I am willing to listen.
     
    Michael Angelo Ravera, Aug 17, 2011
    #1
    1. Advertising

  2. Michael Angelo Ravera

    Ben Pfaff Guest

    Michael Angelo Ravera <> writes:

    > Is there some trick to doing a lot better than this given the
    > assumptions? The reason that sorting is undesirable is that I
    > need to be able to present ranks for several different scores
    > for the same contestant. If someone wants to make a credible
    > argument that I can sort, compute ranks and ties and present
    > the results more efficiently than just computing each as above,
    > I am willing to listen.


    You are using nested loops to do ranking, with cost O(n**2).
    A competently implemented sort-based rank would cost O(n lg n).
    If n is small, your loops are probably just as fast, or at any
    rate fast enough that the difference doesn't signify. But as
    n grows, you should find that the sort-based solution passes
    nested loops in performance.
    --
    char a[]="\n .CJacehknorstu";int putchar(int);int main(void){unsigned long b[]
    ={0x67dffdff,0x9aa9aa6a,0xa77ffda9,0x7da6aa6a,0xa67f6aaa,0xaa9aa9f6,0x11f6},*p
    =b,i=24;for(;p+=!*p;*p/=4)switch(0[p]&3)case 0:{return 0;for(p--;i--;i--)case+
    2:{i++;if(i)break;else default:continue;if(0)case 1:putchar(a[i&15]);break;}}}
     
    Ben Pfaff, Aug 17, 2011
    #2
    1. Advertising

  3. Michael Angelo Ravera

    Gene Guest

    On Aug 17, 2:49 pm, Michael Angelo Ravera <>
    wrote:
    > I WILL be writing this myself (unless someone posts a solution that I particularly like), but does anyone have any tricks for efficient method of ranking the values in an unsorted array.
    > The result that I would like to achieve is that we have an unsorted array"score" (for the first case we will assume that it is a 32-bit signed integer) of size "size" and two arrays of unsigned integers "rank" and "ties" large enough to hold a number at least as large as "size". Upon completion, I want each element of "rank" to be one larger than the number of items in score that are greater than the corresponding element of "score" and each element of "ties" to equal the number of other elements in "score" that are equal to the corresponding element in "score".
    >
    > For the second case, "fscore" is a floating point number of appropriate precision and "size" "rank" and "ties" are as before, but we have a multiplicative tollerance within which scores are to be considered as equal.
    >
    > What I have done in the past is to iniialize all of the elements of rank to 1 and all of the elements of ties to 0 and for each element of score adethe comparision and incremented rank and ties as necessary while skipping the current element.
    >
    > Something like this for the integer case.
    >
    > #define size appropriately
    > int this_el, comp_el;
    > INT32 score [size];
    > unsigned rank [size], ties [size];
    >
    > for (this_el = 0; this_el < size; this_el ++)
    >   {
    >   rank [this_el] = 1;
    >   ties [this_el] = 0;
    >   }
    >
    > for (this_el = 0; this_el < size; this_el ++)
    >   {
    >   for (comp_el = 0; comp_el < this_el; comp_el ++)
    >     {
    >     if (score [this_el] < score [comp_el])
    >       rank [this_el] ++;
    >     else if (score [this_el] == score [comp_el])
    >       ties [this_el] ++;
    >     }
    >   for (comp_el = this_el + 1; comp_el < size; comp_el ++)
    >     {
    >     if (score [this_el] < score [comp_el])
    >       rank [this_el] ++;
    >     else if (score [this_el] == score [comp_el])
    >       ties [this_el] ++;
    >     }
    >   }
    >
    > Yeah, you can just have one internal loop and test for and skip over the (this_el == comp_el) but I think that the way that I have shown here isthe better first cut.
    >
    > For the floating point case it looks like this:
    > #define size appropriately
    > /* tol should be between 0 and 1 */
    > #define tol 0.001
    > int this_el, comp_el;
    > float score [size];
    > unsigned rank [size], ties [size];
    >
    > for (this_el = 0; this_el < size; this_el ++)
    >   {
    >   rank [this_el] = 1;
    >   ties [this_el] = 0;
    >   }
    >
    > for (this_el = 0; this_el < size; this_el ++)
    >   {
    >   for (comp_el = 0; comp_el < this_el; comp_el ++)
    >     {
    >     if (score [this_el] < (score [comp_el] / (1 + tol)))
    >       rank [this_el] ++;
    >     else if (score [this_el] < ((1 + tol) * score [comp_el]))
    >       ties [this_el] ++;
    >     }
    >   for (comp_el = this_el + 1; comp_el < size; comp_el ++)
    >     {
    >     if (score [this_el] < (score [comp_el] / (1 + tol)))
    >       rank [this_el] ++;
    >     else if (score [this_el] < ((1 + tol) * score [comp_el]))
    >       ties [this_el] ++;
    >     }
    >   }
    >
    > Is there some trick to doing a lot better than this given the assumptions? The reason that sorting is undesirable is that I need to be able to present ranks for several different scores for the same contestant. If someone wants to make a credible argument that I can sort, compute ranks and ties and present the results more efficiently than just computing each as above, Iam willing to listen.


    First, prepare to be told this isn't a C question. Comp.programming
    used to be great for this sort of thing, but lo it has fallen...

    How about an indirect sort in a temporary array of indices, then
    compute the ranks and ties in one pass and throw the array away? If
    size is over a couple of hundred or so, there ought to be a
    performance improvement. Of course this is relative. If you're doing
    the computation once an hour, it won't make much difference perhaps
    until roughly 10K to 100K.

    A more interesting question is how to maintain a data structure "on
    line" so that you can always get the rank and number of ties of any
    team on the fly as scores change. One way is an order statistics
    tree, which is at heart a specialized search tree (BST, B-Tree, etc.).
     
    Gene, Aug 17, 2011
    #3
  4. On Wednesday, August 17, 2011 12:38:59 PM UTC-7, Gene wrote:
    > On Aug 17, 2:49 pm, Michael Angelo Ravera <>
    > wrote:
    > > I WILL be writing this myself (unless someone posts a solution that I particularly like), but does anyone have any tricks for efficient method of ranking the values in an unsorted array.
    > > The result that I would like to achieve is that we have an unsorted array "score" (for the first case we will assume that it is a 32-bit signed integer) of size "size" and two arrays of unsigned integers "rank" and "ties"large enough to hold a number at least as large as "size". Upon completion, I want each element of "rank" to be one larger than the number of items in score that are greater than the corresponding element of "score" and eachelement of "ties" to equal the number of other elements in "score" that are equal to the corresponding element in "score".
    > >
    > > For the second case, "fscore" is a floating point number of appropriateprecision and "size" "rank" and "ties" are as before, but we have a multiplicative tollerance within which scores are to be considered as equal.
    > >
    > > What I have done in the past is to iniialize all of the elements of rank to 1 and all of the elements of ties to 0 and for each element of score ade the comparision and incremented rank and ties as necessary while skipping the current element.
    > >
    > > Something like this for the integer case.
    > >
    > > #define size appropriately
    > > int this_el, comp_el;
    > > INT32 score [size];
    > > unsigned rank [size], ties [size];
    > >
    > > for (this_el = 0; this_el < size; this_el ++)
    > >   {
    > >   rank [this_el] = 1;
    > >   ties [this_el] = 0;
    > >   }
    > >
    > > for (this_el = 0; this_el < size; this_el ++)
    > >   {
    > >   for (comp_el = 0; comp_el < this_el; comp_el ++)
    > >     {
    > >     if (score [this_el] < score [comp_el])
    > >       rank [this_el] ++;
    > >     else if (score [this_el] == score [comp_el])
    > >       ties [this_el] ++;
    > >     }
    > >   for (comp_el = this_el + 1; comp_el < size; comp_el ++)
    > >     {
    > >     if (score [this_el] < score [comp_el])
    > >       rank [this_el] ++;
    > >     else if (score [this_el] == score [comp_el])
    > >       ties [this_el] ++;
    > >     }
    > >   }
    > >
    > > Yeah, you can just have one internal loop and test for and skip over the (this_el == comp_el) but I think that the way that I have shown here is the better first cut.
    > >
    > > For the floating point case it looks like this:
    > > #define size appropriately
    > > /* tol should be between 0 and 1 */
    > > #define tol 0.001
    > > int this_el, comp_el;
    > > float score [size];
    > > unsigned rank [size], ties [size];
    > >
    > > for (this_el = 0; this_el < size; this_el ++)
    > >   {
    > >   rank [this_el] = 1;
    > >   ties [this_el] = 0;
    > >   }
    > >
    > > for (this_el = 0; this_el < size; this_el ++)
    > >   {
    > >   for (comp_el = 0; comp_el < this_el; comp_el ++)
    > >     {
    > >     if (score [this_el] < (score [comp_el] / (1 + tol)))
    > >       rank [this_el] ++;
    > >     else if (score [this_el] < ((1 + tol) * score [comp_el]))
    > >       ties [this_el] ++;
    > >     }
    > >   for (comp_el = this_el + 1; comp_el < size; comp_el ++)
    > >     {
    > >     if (score [this_el] < (score [comp_el] / (1 + tol)))
    > >       rank [this_el] ++;
    > >     else if (score [this_el] < ((1 + tol) * score [comp_el]))
    > >       ties [this_el] ++;
    > >     }
    > >   }
    > >
    > > Is there some trick to doing a lot better than this given the assumptions? The reason that sorting is undesirable is that I need to be able to present ranks for several different scores for the same contestant. If someonewants to make a credible argument that I can sort, compute ranks and ties and present the results more efficiently than just computing each as above,I am willing to listen.

    >
    > First, prepare to be told this isn't a C question. Comp.programming
    > used to be great for this sort of thing, but lo it has fallen...


    I agree. This is actually an algorithms question. However, I was looking for a C "trick". So, it is actually topical.

    > How about an indirect sort in a temporary array of indices, then
    > compute the ranks and ties in one pass and throw the array away? If
    > size is over a couple of hundred or so, there ought to be a
    > performance improvement. Of course this is relative. If you're doing
    > the computation once an hour, it won't make much difference perhaps
    > until roughly 10K to 100K.


    Once sorted, the common case *looks* linear, but I think that the *worst* case is still O(N**2). Maybe you can compare forward testing for equality (or tollerance) and never do any retracing.

    > A more interesting question is how to maintain a data structure "on
    > line" so that you can always get the rank and number of ties of any
    > team on the fly as scores change. One way is an order statistics
    > tree, which is at heart a specialized search tree (BST, B-Tree, etc.).


    I agree that the problem that you proposed is a more interesting case. In fact, in the online case, you might be interested in absolute tollerances aswell as multiplicative ones, as well as fuzzy ranking of contestants who scores can't be strictly compared at present because of competitions that are in progress while others are complete.
     
    Michael Angelo Ravera, Aug 17, 2011
    #4
  5. Michael Angelo Ravera <> writes:
    > I WILL be writing this myself (unless someone posts a solution that I particularly like), but does anyone have any tricks for efficient method of ranking the values in an unsorted array.

    [...]

    Michael, it would be helpful if you'd format your text in lines no
    longer than 72 columns (or 80 at the most). Thanks.

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Nokia
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
     
    Keith Thompson, Aug 17, 2011
    #5
  6. Michael Angelo Ravera

    Eric Sosman Guest

    On 8/17/2011 4:21 PM, Michael Angelo Ravera wrote:
    > On Wednesday, August 17, 2011 12:38:59 PM UTC-7, Gene wrote:
    >> On Aug 17, 2:49 pm, Michael Angelo Ravera<>
    >> wrote:
    >>> ]...]

    >> First, prepare to be told this isn't a C question. Comp.programming
    >> used to be great for this sort of thing, but lo it has fallen...

    >
    > I agree. This is actually an algorithms question. However, I was looking for a C "trick". So, it is actually topical.


    There are very few tricks to C. Lots of tricksters, but they're
    just playing a shell game. There is less to it than meets the eye.

    >> How about an indirect sort in a temporary array of indices, then
    >> compute the ranks and ties in one pass and throw the array away? If
    >> size is over a couple of hundred or so, there ought to be a
    >> performance improvement. Of course this is relative. If you're doing
    >> the computation once an hour, it won't make much difference perhaps
    >> until roughly 10K to 100K.

    >
    > Once sorted, the common case *looks* linear, but I think that the *worst* case is still O(N**2). Maybe you can compare forward testing for equality (or tollerance) and never do any retracing.


    Linear.

    for (int bgn = 0, end; bgn < size; bgn = end) {
    for (end = bgn; ++end < size; ) {
    ... suitable tests and `break' ...
    }
    }

    Visits each array element once and once only, hence linear.

    --
    Eric Sosman
    d
     
    Eric Sosman, Aug 18, 2011
    #6
  7. Until about a week ago, GoogleGroups did that for me. I guess the new version
    stopped.
     
    Michael Angelo Ravera, Aug 18, 2011
    #7
  8. In article <>,
    Keith Thompson <> wrote:
    >Michael Angelo Ravera <> writes:
    >> Until about a week ago, GoogleGroups did that for me. I guess the new version
    >> stopped.

    >
    >Google Groups also used to properly quote previous articles. I think
    >you can get the correct behavior back by going back to the old
    >interface.
    >
    >Or, better yet, get a real newsreader and sign up with a free NNTP
    >server (I use eternal-september.org). Google Groups is useful for
    >searching, and find for its own non-Usenet groups, but horrible for
    >reading and posting to Usenet.


    If it is (so) horrible, then why do so many people use it?

    I.e., isn't it fair to say that your complaint is about as effective as
    telling us that "Dancing With The Stars" is horrible? No one who likes DWTS
    (or GG) is going to listen to your complaint.

    --
    Is God willing to prevent evil, but not able? Then he is not omnipotent.
    Is he able, but not willing? Then he is malevolent.
    Is he both able and willing? Then whence cometh evil?
    Is he neither able nor willing? Then why call him God?
    ~ Epicurus
     
    Kenny McCormack, Aug 18, 2011
    #8
  9. Michael Angelo Ravera

    John Gordon Guest

    In <j2jrdq$ibr$> (Kenny McCormack) writes:

    > >server (I use eternal-september.org). Google Groups is useful for
    > >searching, and find for its own non-Usenet groups, but horrible for
    > >reading and posting to Usenet.


    > If it is (so) horrible, then why do so many people use it?


    Perhaps because many people don't want to go to the effort of getting
    and using a real newsreader, or are simply unaware that better options
    exist.

    Surely you do not mean to imply that popularity equals worth.

    --
    John Gordon A is for Amy, who fell down the stairs
    B is for Basil, assaulted by bears
    -- Edward Gorey, "The Gashlycrumb Tinies"
     
    John Gordon, Aug 18, 2011
    #9
  10. Michael Angelo Ravera

    Seebs Guest

    On 2011-08-18, John Gordon <> wrote:
    > In <j2jrdq$ibr$> (Kenny McCormack) writes:
    >> If it is (so) horrible, then why do so many people use it?


    > Perhaps because many people don't want to go to the effort of getting
    > and using a real newsreader, or are simply unaware that better options
    > exist.


    > Surely you do not mean to imply that popularity equals worth.


    I think he rather does, given that he's said so in the past in so many
    words. He does not believe there is value other than status; popularity
    is a kind of status.

    -s
    --
    Copyright 2011, all wrongs reversed. Peter Seebach /
    http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
    http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!
    I am not speaking for my employer, although they do rent some of my opinions.
     
    Seebs, Aug 19, 2011
    #10
  11. Michael Angelo Ravera

    BartC Guest

    "Eric Sosman" <> wrote in message
    news:j2hnf3$8pm$...
    > On 8/17/2011 4:21 PM, Michael Angelo Ravera wrote:


    >> Once sorted, the common case *looks* linear, but I think that the *worst*
    >> case is still O(N**2). Maybe you can compare forward testing for equality
    >> (or tollerance) and never do any retracing.

    >
    > Linear.
    >
    > for (int bgn = 0, end; bgn < size; bgn = end) {
    > for (end = bgn; ++end < size; ) {
    > ... suitable tests and `break' ...
    > }
    > }
    >
    > Visits each array element once and once only, hence linear.


    I suppose you could visit each element twice (or a hundred times) and it
    would still be linear?

    --
    Bartc
     
    BartC, Aug 19, 2011
    #11
  12. Michael Angelo Ravera

    Eric Sosman Guest

    On 8/19/2011 5:54 AM, BartC wrote:
    > "Eric Sosman" <> wrote in message
    > news:j2hnf3$8pm$...
    >> On 8/17/2011 4:21 PM, Michael Angelo Ravera wrote:

    >
    >>> Once sorted, the common case *looks* linear, but I think that the
    >>> *worst* case is still O(N**2). Maybe you can compare forward testing
    >>> for equality (or tollerance) and never do any retracing.

    >>
    >> Linear.
    >>
    >> for (int bgn = 0, end; bgn < size; bgn = end) {
    >> for (end = bgn; ++end < size; ) {
    >> ... suitable tests and `break' ...
    >> }
    >> }
    >>
    >> Visits each array element once and once only, hence linear.

    >
    > I suppose you could visit each element twice (or a hundred times) and it
    > would still be linear?


    Yes. The point is that the leading term in the number of
    visits is C*size for some constant C, not the C*size*size the
    O.P. feared.

    --
    Eric Sosman
    d
     
    Eric Sosman, Aug 19, 2011
    #12
  13. In article <j2jrq8$gnh$>,
    John Gordon <> wrote:
    >In <j2jrdq$ibr$> (Kenny
    >McCormack) writes:
    >
    >> >server (I use eternal-september.org). Google Groups is useful for
    >> >searching, and find for its own non-Usenet groups, but horrible for
    >> >reading and posting to Usenet.

    >
    >> If it is (so) horrible, then why do so many people use it?

    >
    >Perhaps because many people don't want to go to the effort of getting
    >and using a real newsreader, or are simply unaware that better options
    >exist.
    >
    >Surely you do not mean to imply that popularity equals worth.


    What I am saying is that it doesn't do any more good to tell the GG folks
    that what they are using is crap than it does to tell the DWTS folks that
    they are watching crap. There's a whole lot more of them than there are of
    us - and they aren't listening to anything we say.

    --
    "Remember when teachers, public employees, Planned Parenthood, NPR and PBS
    crashed the stock market, wiped out half of our 401Ks, took trillions in
    TARP money, spilled oil in the Gulf of Mexico, gave themselves billions in
    bonuses, and paid no taxes? Yeah, me neither."
     
    Kenny McCormack, Aug 19, 2011
    #13
  14. OK, so I did an index sort (I used a student's sort for the particular example, but I will do something more effcient later) and then went on to rank and process ties. It looks like I recompared the values for each contestantno more than twice. So, the ranking and reprocessing for ties is linear.

    for (temp_idx = 0; temp_idx < SiZe; temp_idx ++)
    {
    for (num_ties = 0;
    (((temp_idx + num_ties + 1) < SiZe) &&
    (team_totals [team_index_ranks [temp_idx]] ==
    team_totals [team_index_ranks [temp_idx + 1 + num_ties]]));
    num_ties ++)
    ;
    for (name_team = 0; name_team <= num_ties; name_team ++)
    {
    team_total_ranks [team_index_ranks [temp_idx + name_team]] = temp_idx+ 1;
    team_total_ties [team_index_ranks [temp_idx + name_team]] = num_ties;
    }
    temp_idx += num_ties;
    }
     
    Michael Angelo Ravera, Aug 21, 2011
    #14
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. -Rob

    google and page ranking

    -Rob, Oct 7, 2003, in forum: HTML
    Replies:
    16
    Views:
    659
    Mini Me
    Oct 17, 2003
  2. Andrew Thompson

    Code that ties up the compiler

    Andrew Thompson, Nov 26, 2008, in forum: Java
    Replies:
    1
    Views:
    295
    Roedy Green
    Nov 27, 2008
  3. Axel Etzold

    Finding ties in sorting?

    Axel Etzold, Sep 14, 2007, in forum: Ruby
    Replies:
    4
    Views:
    131
    Dan Zwell
    Sep 14, 2007
  4. Andrew Hamm

    Chains of ties

    Andrew Hamm, Jul 9, 2004, in forum: Perl Misc
    Replies:
    0
    Views:
    111
    Andrew Hamm
    Jul 9, 2004
  5. kj
    Replies:
    8
    Views:
    150
    phaylon
    Mar 14, 2005
Loading...

Share This Page