Re: A binary search algorithm that searches for an element similar to the key

Discussion in 'C Programming' started by Ben Bacarisse, Dec 21, 2011.

1. Ben BacarisseGuest

pozz <> writes:

> I want to write the function binsearch_low() with the same interface
> of bsearch() standard function:
>
> void * binsearch_low (const void *key, const void *array, size_t
> count, size_t size, comparison_fn_t compare);
>
> This function should return a pointer to the element of the array that
> is the greatest among the element with a lower value than key. If no
> element lower than key exists, the function should return NULL.
>
> For example, if I have the array { 10, 12, 17, 20, 25, 30 }, the
> function should return NULL for key=8, 10 for key=11, 12 for key=12,
> 20 for key=22, 30 for key=33.
>
> I started from a bsearch() implementation...

Some of the problems in your come from using this as the starting
point. It's not a particularly good model.

> void *
> binsearch_low (const void *key, const void *array, size_t count,
> size_t size, comparison_fn_t compare)
> {
> int left = 0, right = count - 1;
> while (left <= right) {
> const int mid = (left + right) / 2;
> void *element = (unsigned char *)array + mid * size;
> if (compare(key, element) > 0) left = mid + 1;
> else if (compare(key, element) < 0) right = mid - 1;
> else return (unsigned char *)array + mid * size;
> }
> return NULL;
> }
>
> ...and arrived to the following algorithm:
>
> void *
> binsearch_low (const void *key, const void *array, size_t count,
> size_t size, comparison_fn_t compare)
> {
> int left = 0, right = count - 1;

int may be too small to hold the values you need. size_t will almost
always have a much wider range than int. Worse, however, is the fact
that when count is zero, the initialisation will go wrong.

> while (left < right - 1) {

If you change to using size_t, watch this - 1.

> const int mid = (left + right) / 2;

To avoid overflow, divide both and then add.

> void *element = (unsigned char *)array + mid * size;
> if (compare(key, element) > 0) left = mid;
> else if (compare(key, element) < 0) right = mid - 1;
> else return (unsigned char *)array + mid * size;

This does not match your specification (and you could just have said
"return element;").

> }
> if (compare(key, (unsigned char *)array + left * size) < 0) {
> return NULL;
> } else if (compare(key, (unsigned char *)array + right * size) >= 0) {
> return (unsigned char *)array + right * size;
> } else {
> return (unsigned char *)array + left * size;
> }
> }

It would be better if the casts that are not on return statements were
to cont char * with the corresponding change to element's declaration.
Since the function does not return a pointer to const void (mirroring
bsearch) you have to break const correctness on the returns.

> Do you have any suggestions and/or improvements?

--
Ben.

Ben Bacarisse, Dec 21, 2011

2. pozzGuest

Re: A binary search algorithm that searches for an element similar tothe key

On 21 Dic, 01:14, Ben Bacarisse <> wrote:
> pozz <> writes:
> > I want to write the function binsearch_low() with the same interface
> > of bsearch() standard function:

>
> > void * binsearch_low (const void *key, const void *array, size_t
> > count, size_t size, comparison_fn_t compare);

>
> > This function should return a pointer to the element of the array that
> > is the greatest among the element with a lower value than key.  If no
> > element lower than key exists, the function should return NULL.

>
> > For example, if I have the array { 10, 12, 17, 20, 25, 30 }, the
> > function should return NULL for key=8, 10 for key=11, 12 for key=12,
> > 20 for key=22, 30 for key=33.

>
> > I started from a bsearch() implementation...

>
> Some of the problems in your come from using this as the starting
> point.  It's not a particularly good model.

Could you suggest some other implementation?

> > void *
> > binsearch_low (const void *key, const void *array, size_t count,
> > size_t size, comparison_fn_t compare)
> > {
> >   int left = 0, right = count - 1;
> >   while (left <= right) {
> >     const int mid = (left + right) / 2;
> >     void *element = (unsigned char *)array + mid * size;
> >     if (compare(key, element) > 0) left = mid + 1;
> >     else if (compare(key, element) < 0) right = mid - 1;
> >     else return (unsigned char *)array + mid * size;
> >   }
> >   return NULL;
> > }

>
> > ...and arrived to the following algorithm:

>
> > void *
> > binsearch_low (const void *key, const void *array, size_t count,
> > size_t size, comparison_fn_t compare)
> > {
> >   int left = 0, right = count - 1;

>
> int may be too small to hold the values you need.  size_t will almost
> always have a much wider range than int.

Good suggestion.

> Worse, however, is the fact
> that when count is zero, the initialisation will go wrong.

count is unsigned and the operation "count - 1" is performed as
unsigned.
If count == 0, count - 1 is UINT_MAX that is not representable in a
int
variable. Is this the problem?

> >   while (left < right - 1) {

>
> If you change to using size_t, watch this - 1.

I changed to:
while (right - left > 1) {

> >     const int mid = (left + right) / 2;

>
> To avoid overflow, divide both and then add.

Do you suggest to write

const int mid = left / 2 + right / 2;

? I think it doesn't always work. If left is 5 and right is 7, mid
should be 6, but 5/2 + 7/2 = 5.

> >     void *element = (unsigned char *)array + mid * size;
> >     if (compare(key, element) > 0) left = mid;
> >     else if (compare(key, element) < 0) right = mid - 1;
> >     else return (unsigned char *)array + mid * size;

>
> This does not match your specification (and you could just have said
> "return element;").

I was wrong writing the specifications. The correct specification is:

"This function should return a pointer to the element of the array
that
is the greatest among the elements with a lower or the same value as
the
key. If no element lower than key exists, the function should return
NULL."

> >   }
> >   if (compare(key, (unsigned char *)array + left * size) < 0) {
> >     return NULL;
> >   } else if (compare(key, (unsigned char *)array + right * size) >=0) {
> >     return (unsigned char *)array + right * size;
> >   } else {
> >     return (unsigned char *)array + left * size;
> >   }
> > }

>
> It would be better if the casts that are not on return statements were
> to cont char * with the corresponding change to element's declaration.
> Since the function does not return a pointer to const void (mirroring
> bsearch) you have to break const correctness on the returns.

Why should I cast array to (const char *)?

This is my last implementation:

void *
binsearch_low (const void *key, const void *array, size_t count,
size_t size, comparison_fn_t compare)
{
if (count) {
size_t left = 0;
size_t right = count - 1;
while (right - left > 1) {
const int mid = (left + right) / 2;
const void *element = (unsigned char *)array + mid * size;
if (compare(key, element) > 0) left = mid;
else if (compare(key, element) < 0) right = mid - 1;
else return (void *)element;
}
if (compare(key, (const char *)array + left * size) < 0) {
return NULL;
} else if (compare(key, (const char *)array + right * size) >= 0)
{
return (unsigned char *)array + right * size;
} else {
return (unsigned char *)array + left * size;
}
}
return NULL;
}

pozz, Dec 21, 2011

3. Ben BacarisseGuest

pozz <> writes:

> On 21 Dic, 01:14, Ben Bacarisse <> wrote:
>> pozz <> writes:
>> > I want to write the function binsearch_low() with the same interface
>> > of bsearch() standard function:

>>
>> > void * binsearch_low (const void *key, const void *array, size_t
>> > count, size_t size, comparison_fn_t compare);

>>
>> > This function should return a pointer to the element of the array that
>> > is the greatest among the element with a lower value than key. Â If no
>> > element lower than key exists, the function should return NULL.

>>
>> > For example, if I have the array { 10, 12, 17, 20, 25, 30 }, the
>> > function should return NULL for key=8, 10 for key=11, 12 for key=12,
>> > 20 for key=22, 30 for key=33.

>>
>> > I started from a bsearch() implementation...

>>
>> Some of the problems in your come from using this as the starting
>> point. Â It's not a particularly good model.

>
> Could you suggest some other implementation?

Sorry, I don't know anything off hand.

<snip>
>> > ...and arrived to the following algorithm:

>>
>> > void *
>> > binsearch_low (const void *key, const void *array, size_t count,
>> > size_t size, comparison_fn_t compare)
>> > {
>> > Â  int left = 0, right = count - 1;

>>
>> int may be too small to hold the values you need. Â size_t will almost
>> always have a much wider range than int.

>
> Good suggestion.
>
>> Worse, however, is the fact
>> that when count is zero, the initialisation will go wrong.

>
> count is unsigned and the operation "count - 1" is performed as
> unsigned.
> If count == 0, count - 1 is UINT_MAX that is not representable in a
> int variable. Is this the problem?

That's half of it. If it's not representable you get whatever the
implementation decides to do (which can include raising an exception)
but if it *is* representable as int, you get a value that will probably
cause all sorts of grief later on.

<snip>
>> > Â  Â  const int mid = (left + right) / 2;

>>
>> To avoid overflow, divide both and then add.

>
> Do you suggest to write
>
> const int mid = left / 2 + right / 2;
>
> ? I think it doesn't always work. If left is 5 and right is 7, mid
> should be 6, but 5/2 + 7/2 = 5.

I think it can be made to work for binary search, but you are right that
in your case it does not because you want to set low == mid when you
advance low. There are probably lots of ways round this, but the
crudest is to mirror your original expression. Maybe

(left + right&1)/2 + right/2

>> > Â  Â  void *element = (unsigned char *)array + mid * size;
>> > Â  Â  if (compare(key, element) > 0) left = mid;
>> > Â  Â  else if (compare(key, element) < 0) right = mid - 1;
>> > Â  Â  else return (unsigned char *)array + mid * size;

<snip>
>> > Â  }
>> > Â  if (compare(key, (unsigned char *)array + left * size) < 0) {
>> > Â  Â  return NULL;
>> > Â  } else if (compare(key, (unsigned char *)array + right * size) >= 0) {
>> > Â  Â  return (unsigned char *)array + right * size;
>> > Â  } else {
>> > Â  Â  return (unsigned char *)array + left * size;
>> > Â  }
>> > }

>>
>> It would be better if the casts that are not on return statements were
>> to cont char * with the corresponding change to element's declaration.
>> Since the function does not return a pointer to const void (mirroring
>> bsearch) you have to break const correctness on the returns.

>
> Why should I cast array to (const char *)?

I like to be told when I "cast away const" so preserving it when
possible reduces the number of warnings. Also, it is some very simple
sense the right thing to do!

> This is my last implementation:
>
> void *
> binsearch_low (const void *key, const void *array, size_t count,
> size_t size, comparison_fn_t compare)
> {
> if (count) {
> size_t left = 0;
> size_t right = count - 1;
> while (right - left > 1) {

This still goes wrong when count == 0. I would keep 'right' one bigger
than you do and adjust the rest of the code accordingly.

> const int mid = (left + right) / 2;
> const void *element = (unsigned char *)array + mid * size;
> if (compare(key, element) > 0) left = mid;
> else if (compare(key, element) < 0) right = mid - 1;
> else return (void *)element;
> }
> if (compare(key, (const char *)array + left * size) < 0) {
> return NULL;
> } else if (compare(key, (const char *)array + right * size) >= 0)
> {
> return (unsigned char *)array + right * size;
> } else {
> return (unsigned char *)array + left * size;
> }
> }
> return NULL;
> }

--
Ben.

Ben Bacarisse, Dec 21, 2011
4. pozzGuest

Re: A binary search algorithm that searches for an element similar tothe key

On 21 Dic, 22:04, Ben Bacarisse <> wrote:
> pozz <> writes:

[...]
> > void *
> > binsearch_low (const void *key, const void *array, size_t count,
> >   size_t size, comparison_fn_t compare)
> > {
> >   if (count) {
> >     size_t left = 0;
> >     size_t right = count - 1;
> >     while (right - left > 1) {

>
> This still goes wrong when count == 0.

If count == 0, the execution skip after the if() statement and goes
directly to

return NULL;

I think it works as I wrote.

pozz, Dec 22, 2011
5. pozzGuest

Re: A binary search algorithm that searches for an element similar tothe key

On 21 Dic, 22:04, Ben Bacarisse <> wrote:
[...]
> >> >     const int mid = (left + right) / 2;

>
> >> To avoid overflow, divide both and then add.

I tried with this...

size_t left, right, mid;
if (sizeof(size_t) == 4) {
right = left = 0xFFFFFFFFU;
mid = (left + right) / 2;
if (mid == 0x7FFFFFFFU) printf("4: OK\n");
} else if (sizeof(size_t) == 2) {
right = left = 0xFFFFU;
mid = (left + right) / 2;
if (mid == 0x7FFFU) printf("2: OK\n");
}

....and it prints "4: OK", so it seems there wasn't overflow. Isn't it
guaranteed by the standard?

pozz, Dec 22, 2011
6. SeebsGuest

Re: A binary search algorithm that searches for an element similarto the key

On 2011-12-22, pozz <> wrote:
> I tried with this...

> size_t left, right, mid;
> if (sizeof(size_t) == 4) {

Right here you are being silly. Don't do this.

I don't know why you keep doing this. Rather than writing something
which is correct on any system, you seem to go out of your way to find
new ways to test for specific systems when they don't matter.

> right = left = 0xFFFFFFFFU;
> mid = (left + right) / 2;
> if (mid == 0x7FFFFFFFU) printf("4: OK\n");
> } else if (sizeof(size_t) == 2) {
> right = left = 0xFFFFU;
> mid = (left + right) / 2;
> if (mid == 0x7FFFU) printf("2: OK\n");
> }

> ...and it prints "4: OK", so it seems there wasn't overflow. Isn't it
> guaranteed by the standard?

This test doesn't even come close to proving what you think it proves. Here's
the thing. Let's ignore all the gratuitous portability problems and imagine
that size_t is 32 bits and has a max of 0xFFFFFFFF. Now, if you do the math,
you'll note:
0xFFFFFFFF + 0xFFFFFFFF = 0xFFFFFFFE. Divide that by two, and
you get 0x7FFFFFFF. Okay, but...

Is that between them?

The goal here is to get a value which is the average of left and right.
Let's make it a bit easier to see.

Let's use:
left = 0xF0000001;
right = 0xF0000003;

The number we want, the average between them, is clearly 0xF0000002. But
what do we get? Well, we probably get (0xE0000004 / 2), which comes out as
0x70000002.

You just carefully wrote a program which proves that there is in fact
overflow, and you're not handling it.

Consider, as an alternative:

mid = ((right - left) / 2) + left;

for size_t, we know that left and right are both always positive, and the
algorithm guarantees that right is never smaller than left. This will
always give you a value between left and right (or, if they're exactly
one apart, equal to left), and will never overflow, and doesn't depend
on guesswork as to the range of size_t. (And there are real systems with
size_t being 8 bytes, which is yet another flaw.)

General Rule: If you are writing logic which uses sizeof() to guess range,
you are almost certainly doing it wrong.

Also, stop and sanity-check your answers. How on earth do you conclude that
7FFFFFFF is the average of two numbers larger than itself?

-s
--
Copyright 2011, all wrongs reversed. Peter Seebach /
http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!
I am not speaking for my employer, although they do rent some of my opinions.

Seebs, Dec 22, 2011
7. Ben BacarisseGuest

pozz <> writes:

> On 21 Dic, 22:04, Ben Bacarisse <> wrote:
>> pozz <> writes:

> [...]
>> > void *
>> > binsearch_low (const void *key, const void *array, size_t count,
>> > Â  size_t size, comparison_fn_t compare)
>> > {
>> > Â  if (count) {
>> > Â  Â  size_t left = 0;
>> > Â  Â  size_t right = count - 1;
>> > Â  Â  while (right - left > 1) {

>>
>> This still goes wrong when count == 0.

>
> If count == 0, the execution skip after the if() statement and goes
> directly to
>
> return NULL;
>
> I think it works as I wrote.

If count is 0, right == SIZE_MAX so right - left is greater than 1 and
into the loop we go. There we will try to access array[SIZE_MAX/2].

--
Ben.

Ben Bacarisse, Dec 22, 2011
8. pozzGuest

Re: A binary search algorithm that searches for an element similar tothe key

On 22 Dic, 13:02, Ben Bacarisse <> wrote:
> pozz <> writes:
> > On 21 Dic, 22:04, Ben Bacarisse <> wrote:
> >> pozz <> writes:

> > [...]
> >> > void *
> >> > binsearch_low (const void *key, const void *array, size_t count,
> >> >   size_t size, comparison_fn_t compare)
> >> > {
> >> >   if (count) {
> >> >     size_t left = 0;
> >> >     size_t right = count - 1;
> >> >     while (right - left > 1) {

>
> >> This still goes wrong when count == 0.

>
> > If count == 0, the execution skip after the if() statement and goes
> > directly to

>
> >   return NULL;

>
> > I think it works as I wrote.

>
> If count is 0, right == SIZE_MAX so right - left is greater than 1 and
> into the loop we go.  There we will try to access array[SIZE_MAX/2].

If count is 0 I go directly to return NULL, skipping if statements.

pozz, Dec 22, 2011
9. Ben BacarisseGuest

pozz <> writes:

> On 22 Dic, 13:02, Ben Bacarisse <> wrote:
>> pozz <> writes:
>> > On 21 Dic, 22:04, Ben Bacarisse <> wrote:
>> >> pozz <> writes:
>> > [...]
>> >> > void *
>> >> > binsearch_low (const void *key, const void *array, size_t count,
>> >> > Â  size_t size, comparison_fn_t compare)
>> >> > {
>> >> > Â  if (count) {
>> >> > Â  Â  size_t left = 0;
>> >> > Â  Â  size_t right = count - 1;
>> >> > Â  Â  while (right - left > 1) {

>>
>> >> This still goes wrong when count == 0.

>>
>> > If count == 0, the execution skip after the if() statement and goes
>> > directly to

>>
>> > Â  return NULL;

>>
>> > I think it works as I wrote.

>>
>> If count is 0, right == SIZE_MAX so right - left is greater than 1 and
>> into the loop we go. Â There we will try to access array[SIZE_MAX/2].

>
> If count is 0 I go directly to return NULL, skipping if statements.

Yes, of course. I missed the if. I missed it, in part because it seems
like an odd thing to do, but it's certainly correct.

--
Ben.

Ben Bacarisse, Dec 22, 2011