Re: A binary search algorithm that searches for an element similar to the key

Discussion in 'C Programming' started by Ben Bacarisse, Dec 21, 2011.

  1. pozz <> writes:

    > I want to write the function binsearch_low() with the same interface
    > of bsearch() standard function:
    >
    > void * binsearch_low (const void *key, const void *array, size_t
    > count, size_t size, comparison_fn_t compare);
    >
    > This function should return a pointer to the element of the array that
    > is the greatest among the element with a lower value than key. If no
    > element lower than key exists, the function should return NULL.
    >
    > For example, if I have the array { 10, 12, 17, 20, 25, 30 }, the
    > function should return NULL for key=8, 10 for key=11, 12 for key=12,
    > 20 for key=22, 30 for key=33.
    >
    > I started from a bsearch() implementation...


    Some of the problems in your come from using this as the starting
    point. It's not a particularly good model.

    > void *
    > binsearch_low (const void *key, const void *array, size_t count,
    > size_t size, comparison_fn_t compare)
    > {
    > int left = 0, right = count - 1;
    > while (left <= right) {
    > const int mid = (left + right) / 2;
    > void *element = (unsigned char *)array + mid * size;
    > if (compare(key, element) > 0) left = mid + 1;
    > else if (compare(key, element) < 0) right = mid - 1;
    > else return (unsigned char *)array + mid * size;
    > }
    > return NULL;
    > }
    >
    > ...and arrived to the following algorithm:
    >
    > void *
    > binsearch_low (const void *key, const void *array, size_t count,
    > size_t size, comparison_fn_t compare)
    > {
    > int left = 0, right = count - 1;


    int may be too small to hold the values you need. size_t will almost
    always have a much wider range than int. Worse, however, is the fact
    that when count is zero, the initialisation will go wrong.

    > while (left < right - 1) {


    If you change to using size_t, watch this - 1.

    > const int mid = (left + right) / 2;


    To avoid overflow, divide both and then add.

    > void *element = (unsigned char *)array + mid * size;
    > if (compare(key, element) > 0) left = mid;
    > else if (compare(key, element) < 0) right = mid - 1;
    > else return (unsigned char *)array + mid * size;


    This does not match your specification (and you could just have said
    "return element;").

    > }
    > if (compare(key, (unsigned char *)array + left * size) < 0) {
    > return NULL;
    > } else if (compare(key, (unsigned char *)array + right * size) >= 0) {
    > return (unsigned char *)array + right * size;
    > } else {
    > return (unsigned char *)array + left * size;
    > }
    > }


    It would be better if the casts that are not on return statements were
    to cont char * with the corresponding change to element's declaration.
    Since the function does not return a pointer to const void (mirroring
    bsearch) you have to break const correctness on the returns.

    > Do you have any suggestions and/or improvements?


    --
    Ben.
     
    Ben Bacarisse, Dec 21, 2011
    #1
    1. Advertising

  2. Ben Bacarisse

    pozz Guest

    Re: A binary search algorithm that searches for an element similar tothe key

    On 21 Dic, 01:14, Ben Bacarisse <> wrote:
    > pozz <> writes:
    > > I want to write the function binsearch_low() with the same interface
    > > of bsearch() standard function:

    >
    > > void * binsearch_low (const void *key, const void *array, size_t
    > > count, size_t size, comparison_fn_t compare);

    >
    > > This function should return a pointer to the element of the array that
    > > is the greatest among the element with a lower value than key.  If no
    > > element lower than key exists, the function should return NULL.

    >
    > > For example, if I have the array { 10, 12, 17, 20, 25, 30 }, the
    > > function should return NULL for key=8, 10 for key=11, 12 for key=12,
    > > 20 for key=22, 30 for key=33.

    >
    > > I started from a bsearch() implementation...

    >
    > Some of the problems in your come from using this as the starting
    > point.  It's not a particularly good model.


    Could you suggest some other implementation?


    > > void *
    > > binsearch_low (const void *key, const void *array, size_t count,
    > > size_t size, comparison_fn_t compare)
    > > {
    > >   int left = 0, right = count - 1;
    > >   while (left <= right) {
    > >     const int mid = (left + right) / 2;
    > >     void *element = (unsigned char *)array + mid * size;
    > >     if (compare(key, element) > 0) left = mid + 1;
    > >     else if (compare(key, element) < 0) right = mid - 1;
    > >     else return (unsigned char *)array + mid * size;
    > >   }
    > >   return NULL;
    > > }

    >
    > > ...and arrived to the following algorithm:

    >
    > > void *
    > > binsearch_low (const void *key, const void *array, size_t count,
    > > size_t size, comparison_fn_t compare)
    > > {
    > >   int left = 0, right = count - 1;

    >
    > int may be too small to hold the values you need.  size_t will almost
    > always have a much wider range than int.


    Good suggestion.


    > Worse, however, is the fact
    > that when count is zero, the initialisation will go wrong.


    count is unsigned and the operation "count - 1" is performed as
    unsigned.
    If count == 0, count - 1 is UINT_MAX that is not representable in a
    int
    variable. Is this the problem?


    > >   while (left < right - 1) {

    >
    > If you change to using size_t, watch this - 1.


    I changed to:
    while (right - left > 1) {

    > >     const int mid = (left + right) / 2;

    >
    > To avoid overflow, divide both and then add.


    Do you suggest to write

    const int mid = left / 2 + right / 2;

    ? I think it doesn't always work. If left is 5 and right is 7, mid
    should be 6, but 5/2 + 7/2 = 5.


    > >     void *element = (unsigned char *)array + mid * size;
    > >     if (compare(key, element) > 0) left = mid;
    > >     else if (compare(key, element) < 0) right = mid - 1;
    > >     else return (unsigned char *)array + mid * size;

    >
    > This does not match your specification (and you could just have said
    > "return element;").


    I was wrong writing the specifications. The correct specification is:

    "This function should return a pointer to the element of the array
    that
    is the greatest among the elements with a lower or the same value as
    the
    key. If no element lower than key exists, the function should return
    NULL."


    > >   }
    > >   if (compare(key, (unsigned char *)array + left * size) < 0) {
    > >     return NULL;
    > >   } else if (compare(key, (unsigned char *)array + right * size) >=0) {
    > >     return (unsigned char *)array + right * size;
    > >   } else {
    > >     return (unsigned char *)array + left * size;
    > >   }
    > > }

    >
    > It would be better if the casts that are not on return statements were
    > to cont char * with the corresponding change to element's declaration.
    > Since the function does not return a pointer to const void (mirroring
    > bsearch) you have to break const correctness on the returns.


    Why should I cast array to (const char *)?

    This is my last implementation:

    void *
    binsearch_low (const void *key, const void *array, size_t count,
    size_t size, comparison_fn_t compare)
    {
    if (count) {
    size_t left = 0;
    size_t right = count - 1;
    while (right - left > 1) {
    const int mid = (left + right) / 2;
    const void *element = (unsigned char *)array + mid * size;
    if (compare(key, element) > 0) left = mid;
    else if (compare(key, element) < 0) right = mid - 1;
    else return (void *)element;
    }
    if (compare(key, (const char *)array + left * size) < 0) {
    return NULL;
    } else if (compare(key, (const char *)array + right * size) >= 0)
    {
    return (unsigned char *)array + right * size;
    } else {
    return (unsigned char *)array + left * size;
    }
    }
    return NULL;
    }
     
    pozz, Dec 21, 2011
    #2
    1. Advertising

  3. pozz <> writes:

    > On 21 Dic, 01:14, Ben Bacarisse <> wrote:
    >> pozz <> writes:
    >> > I want to write the function binsearch_low() with the same interface
    >> > of bsearch() standard function:

    >>
    >> > void * binsearch_low (const void *key, const void *array, size_t
    >> > count, size_t size, comparison_fn_t compare);

    >>
    >> > This function should return a pointer to the element of the array that
    >> > is the greatest among the element with a lower value than key.  If no
    >> > element lower than key exists, the function should return NULL.

    >>
    >> > For example, if I have the array { 10, 12, 17, 20, 25, 30 }, the
    >> > function should return NULL for key=8, 10 for key=11, 12 for key=12,
    >> > 20 for key=22, 30 for key=33.

    >>
    >> > I started from a bsearch() implementation...

    >>
    >> Some of the problems in your come from using this as the starting
    >> point.  It's not a particularly good model.

    >
    > Could you suggest some other implementation?


    Sorry, I don't know anything off hand.

    <snip>
    >> > ...and arrived to the following algorithm:

    >>
    >> > void *
    >> > binsearch_low (const void *key, const void *array, size_t count,
    >> > size_t size, comparison_fn_t compare)
    >> > {
    >> >   int left = 0, right = count - 1;

    >>
    >> int may be too small to hold the values you need.  size_t will almost
    >> always have a much wider range than int.

    >
    > Good suggestion.
    >
    >> Worse, however, is the fact
    >> that when count is zero, the initialisation will go wrong.

    >
    > count is unsigned and the operation "count - 1" is performed as
    > unsigned.
    > If count == 0, count - 1 is UINT_MAX that is not representable in a
    > int variable. Is this the problem?


    That's half of it. If it's not representable you get whatever the
    implementation decides to do (which can include raising an exception)
    but if it *is* representable as int, you get a value that will probably
    cause all sorts of grief later on.

    <snip>
    >> >     const int mid = (left + right) / 2;

    >>
    >> To avoid overflow, divide both and then add.

    >
    > Do you suggest to write
    >
    > const int mid = left / 2 + right / 2;
    >
    > ? I think it doesn't always work. If left is 5 and right is 7, mid
    > should be 6, but 5/2 + 7/2 = 5.


    I think it can be made to work for binary search, but you are right that
    in your case it does not because you want to set low == mid when you
    advance low. There are probably lots of ways round this, but the
    crudest is to mirror your original expression. Maybe

    (left + right&1)/2 + right/2

    >> >     void *element = (unsigned char *)array + mid * size;
    >> >     if (compare(key, element) > 0) left = mid;
    >> >     else if (compare(key, element) < 0) right = mid - 1;
    >> >     else return (unsigned char *)array + mid * size;

    <snip>
    >> >   }
    >> >   if (compare(key, (unsigned char *)array + left * size) < 0) {
    >> >     return NULL;
    >> >   } else if (compare(key, (unsigned char *)array + right * size) >= 0) {
    >> >     return (unsigned char *)array + right * size;
    >> >   } else {
    >> >     return (unsigned char *)array + left * size;
    >> >   }
    >> > }

    >>
    >> It would be better if the casts that are not on return statements were
    >> to cont char * with the corresponding change to element's declaration.
    >> Since the function does not return a pointer to const void (mirroring
    >> bsearch) you have to break const correctness on the returns.

    >
    > Why should I cast array to (const char *)?


    I like to be told when I "cast away const" so preserving it when
    possible reduces the number of warnings. Also, it is some very simple
    sense the right thing to do!

    > This is my last implementation:
    >
    > void *
    > binsearch_low (const void *key, const void *array, size_t count,
    > size_t size, comparison_fn_t compare)
    > {
    > if (count) {
    > size_t left = 0;
    > size_t right = count - 1;
    > while (right - left > 1) {


    This still goes wrong when count == 0. I would keep 'right' one bigger
    than you do and adjust the rest of the code accordingly.

    > const int mid = (left + right) / 2;
    > const void *element = (unsigned char *)array + mid * size;
    > if (compare(key, element) > 0) left = mid;
    > else if (compare(key, element) < 0) right = mid - 1;
    > else return (void *)element;
    > }
    > if (compare(key, (const char *)array + left * size) < 0) {
    > return NULL;
    > } else if (compare(key, (const char *)array + right * size) >= 0)
    > {
    > return (unsigned char *)array + right * size;
    > } else {
    > return (unsigned char *)array + left * size;
    > }
    > }
    > return NULL;
    > }


    --
    Ben.
     
    Ben Bacarisse, Dec 21, 2011
    #3
  4. Ben Bacarisse

    pozz Guest

    Re: A binary search algorithm that searches for an element similar tothe key

    On 21 Dic, 22:04, Ben Bacarisse <> wrote:
    > pozz <> writes:

    [...]
    > > void *
    > > binsearch_low (const void *key, const void *array, size_t count,
    > >   size_t size, comparison_fn_t compare)
    > > {
    > >   if (count) {
    > >     size_t left = 0;
    > >     size_t right = count - 1;
    > >     while (right - left > 1) {

    >
    > This still goes wrong when count == 0.


    If count == 0, the execution skip after the if() statement and goes
    directly to

    return NULL;

    I think it works as I wrote.
     
    pozz, Dec 22, 2011
    #4
  5. Ben Bacarisse

    pozz Guest

    Re: A binary search algorithm that searches for an element similar tothe key

    On 21 Dic, 22:04, Ben Bacarisse <> wrote:
    [...]
    > >> >     const int mid = (left + right) / 2;

    >
    > >> To avoid overflow, divide both and then add.


    I tried with this...

    size_t left, right, mid;
    if (sizeof(size_t) == 4) {
    right = left = 0xFFFFFFFFU;
    mid = (left + right) / 2;
    if (mid == 0x7FFFFFFFU) printf("4: OK\n");
    } else if (sizeof(size_t) == 2) {
    right = left = 0xFFFFU;
    mid = (left + right) / 2;
    if (mid == 0x7FFFU) printf("2: OK\n");
    }

    ....and it prints "4: OK", so it seems there wasn't overflow. Isn't it
    guaranteed by the standard?
     
    pozz, Dec 22, 2011
    #5
  6. Ben Bacarisse

    Seebs Guest

    Re: A binary search algorithm that searches for an element similarto the key

    On 2011-12-22, pozz <> wrote:
    > I tried with this...


    > size_t left, right, mid;
    > if (sizeof(size_t) == 4) {


    Right here you are being silly. Don't do this.

    I don't know why you keep doing this. Rather than writing something
    which is correct on any system, you seem to go out of your way to find
    new ways to test for specific systems when they don't matter.

    > right = left = 0xFFFFFFFFU;
    > mid = (left + right) / 2;
    > if (mid == 0x7FFFFFFFU) printf("4: OK\n");
    > } else if (sizeof(size_t) == 2) {
    > right = left = 0xFFFFU;
    > mid = (left + right) / 2;
    > if (mid == 0x7FFFU) printf("2: OK\n");
    > }


    > ...and it prints "4: OK", so it seems there wasn't overflow. Isn't it
    > guaranteed by the standard?


    This test doesn't even come close to proving what you think it proves. Here's
    the thing. Let's ignore all the gratuitous portability problems and imagine
    that size_t is 32 bits and has a max of 0xFFFFFFFF. Now, if you do the math,
    you'll note:
    0xFFFFFFFF + 0xFFFFFFFF = 0xFFFFFFFE. Divide that by two, and
    you get 0x7FFFFFFF. Okay, but...

    Is that between them?

    The goal here is to get a value which is the average of left and right.
    Let's make it a bit easier to see.

    Let's use:
    left = 0xF0000001;
    right = 0xF0000003;

    The number we want, the average between them, is clearly 0xF0000002. But
    what do we get? Well, we probably get (0xE0000004 / 2), which comes out as
    0x70000002.

    You just carefully wrote a program which proves that there is in fact
    overflow, and you're not handling it.

    Consider, as an alternative:

    mid = ((right - left) / 2) + left;

    for size_t, we know that left and right are both always positive, and the
    algorithm guarantees that right is never smaller than left. This will
    always give you a value between left and right (or, if they're exactly
    one apart, equal to left), and will never overflow, and doesn't depend
    on guesswork as to the range of size_t. (And there are real systems with
    size_t being 8 bytes, which is yet another flaw.)

    General Rule: If you are writing logic which uses sizeof() to guess range,
    you are almost certainly doing it wrong.

    Also, stop and sanity-check your answers. How on earth do you conclude that
    7FFFFFFF is the average of two numbers larger than itself?

    -s
    --
    Copyright 2011, all wrongs reversed. Peter Seebach /
    http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
    http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!
    I am not speaking for my employer, although they do rent some of my opinions.
     
    Seebs, Dec 22, 2011
    #6
  7. pozz <> writes:

    > On 21 Dic, 22:04, Ben Bacarisse <> wrote:
    >> pozz <> writes:

    > [...]
    >> > void *
    >> > binsearch_low (const void *key, const void *array, size_t count,
    >> >   size_t size, comparison_fn_t compare)
    >> > {
    >> >   if (count) {
    >> >     size_t left = 0;
    >> >     size_t right = count - 1;
    >> >     while (right - left > 1) {

    >>
    >> This still goes wrong when count == 0.

    >
    > If count == 0, the execution skip after the if() statement and goes
    > directly to
    >
    > return NULL;
    >
    > I think it works as I wrote.


    If count is 0, right == SIZE_MAX so right - left is greater than 1 and
    into the loop we go. There we will try to access array[SIZE_MAX/2].

    --
    Ben.
     
    Ben Bacarisse, Dec 22, 2011
    #7
  8. Ben Bacarisse

    pozz Guest

    Re: A binary search algorithm that searches for an element similar tothe key

    On 22 Dic, 13:02, Ben Bacarisse <> wrote:
    > pozz <> writes:
    > > On 21 Dic, 22:04, Ben Bacarisse <> wrote:
    > >> pozz <> writes:

    > > [...]
    > >> > void *
    > >> > binsearch_low (const void *key, const void *array, size_t count,
    > >> >   size_t size, comparison_fn_t compare)
    > >> > {
    > >> >   if (count) {
    > >> >     size_t left = 0;
    > >> >     size_t right = count - 1;
    > >> >     while (right - left > 1) {

    >
    > >> This still goes wrong when count == 0.

    >
    > > If count == 0, the execution skip after the if() statement and goes
    > > directly to

    >
    > >   return NULL;

    >
    > > I think it works as I wrote.

    >
    > If count is 0, right == SIZE_MAX so right - left is greater than 1 and
    > into the loop we go.  There we will try to access array[SIZE_MAX/2].


    If count is 0 I go directly to return NULL, skipping if statements.
     
    pozz, Dec 22, 2011
    #8
  9. pozz <> writes:

    > On 22 Dic, 13:02, Ben Bacarisse <> wrote:
    >> pozz <> writes:
    >> > On 21 Dic, 22:04, Ben Bacarisse <> wrote:
    >> >> pozz <> writes:
    >> > [...]
    >> >> > void *
    >> >> > binsearch_low (const void *key, const void *array, size_t count,
    >> >> >   size_t size, comparison_fn_t compare)
    >> >> > {
    >> >> >   if (count) {
    >> >> >     size_t left = 0;
    >> >> >     size_t right = count - 1;
    >> >> >     while (right - left > 1) {

    >>
    >> >> This still goes wrong when count == 0.

    >>
    >> > If count == 0, the execution skip after the if() statement and goes
    >> > directly to

    >>
    >> >   return NULL;

    >>
    >> > I think it works as I wrote.

    >>
    >> If count is 0, right == SIZE_MAX so right - left is greater than 1 and
    >> into the loop we go.  There we will try to access array[SIZE_MAX/2].

    >
    > If count is 0 I go directly to return NULL, skipping if statements.


    Yes, of course. I missed the if. I missed it, in part because it seems
    like an odd thing to do, but it's certainly correct.

    --
    Ben.
     
    Ben Bacarisse, Dec 22, 2011
    #9
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Ahmed Moustafa
    Replies:
    0
    Views:
    779
    Ahmed Moustafa
    Nov 15, 2003
  2. Bogdan

    Binary tree search vs Binary search

    Bogdan, Oct 18, 2010, in forum: C Programming
    Replies:
    22
    Views:
    3,077
    Michael Angelo Ravera
    Oct 21, 2010
  3. Bogdan
    Replies:
    4
    Views:
    2,039
    Alf P. Steinbach /Usenet
    Oct 19, 2010
  4. M P
    Replies:
    1
    Views:
    475
  5. 88888 Dihedral
    Replies:
    35
    Views:
    1,127
    88888 Dihedral
    Jan 4, 2012
Loading...

Share This Page