Where's the bug?

J

J. Campbell

As a programming exercise, I was playing with the quicksort algorithm.
The commented line in the "partition" function speeds things up a
bit. However, when used on (my machine) arrays larger than about
5,000,000 elements, the sort appears to freeze. I don't understand
this behavior. With this line commented as shown, the sort works fine
up to available memory (circa 50,000,000 elements, (200MB) on my 256MB
machine). I'm guessing the error has to do with "val" taking a
minimum or maximum value, but I can't fugure it out. If this Q is
off-topic to c++ and more appropriate on an "algorithm-oriented"
newsgroup, I apologize in advance.

Thanks,

Joe

#include <iostream>
const int kSMALL_ENOUGH = 16; //when to call selection-sort

using namespace std;

void fillRND(unsigned int* a, int size);
void showit(unsigned int* a, int size);
void quicksort(unsigned int* a, int left, int right);
int partition(unsigned int* a, int left, int right);
void selectionsort(unsigned int* a, int left, int right);
int verify(unsigned int* a, int size);

int main(){
int clo;
cout << "Enter the number of elements to be sorted. ";
int sz;
cin >> sz;
unsigned int* a = new unsigned int[sz];
cout << "generating data...";
fillRND(a, sz);
cout << "done\nchecking data...";
verify(a, sz);
//showit(a, sz);

clo=clock();
cout << "sorting data...";
quicksort(a, 0, sz-1);
cout << "done in " << clock() - clo << " (ms)\nchecking data...";
verify(a, sz);
//showit(a, sz);

delete [] a;
system("pause");
return 0;
}

void quicksort(unsigned int* a, int left, int right){
if(left + kSMALL_ENOUGH < right) {
int split_pt = partition(a,left, right);
quicksort(a, left, split_pt);
quicksort(a, split_pt+1, right);
}else selectionsort(a, left, right);
}

int partition(unsigned int* a, int left, int right){
unsigned int val = a
///2 + a
/2
;//<---note end of line here!
int lm = left-1;
int rm = right+1;
for(;;){
while(a[--rm] > val);
while(a[++lm] < val);
if(lm < rm){
unsigned int temp = a[rm];
a[rm] = a[lm];
a[lm] = temp;
}else return rm;
}
}

void selectionsort(unsigned int* data, int left, int right){
for(int i = left; i < right; ++i) {
int min = i;
for(int j=i+1; j <= right; ++j)
if(data[j] < data[min]) min = j;
unsigned int temp = data[min];
data[min] = data;
data = temp;
}
}

int verify(unsigned int* a, int size){
for(int i = 0; i<size-1; ++i){
if(a > a[i+1]){
cout << "numbers not sorted\n";
return 1;
}
}
cout << "numbers are sorted\n";
return 0;
}

void fillRND(unsigned int* a, int size){
srand(time(0));
int bits = (CHAR_BIT * sizeof(int))/2;
for(int i = 0; i < size; ++i)
a = rand() + (rand() << bits); //to fill more than 15 bits...
}

void showit(unsigned int* a, int size){
for(int i = 0; i < size; ++i)
cout << i << ' ' << a << endl;
cout << endl;
}
 
K

Karl Heinz Buchegger

int partition(unsigned int* a, int left, int right){
unsigned int val = a
///2 + a
/2
;//<---note end of line here!
int lm = left-1;
int rm = right+1;
for(;;){
while(a[--rm] > val);
while(a[++lm] < val);
if(lm < rm){
unsigned int temp = a[rm];
a[rm] = a[lm];
a[lm] = temp;
}else return rm;
}
}​


consider your array to sort consists of just 2 numbers

5 5

what will be the value of val?

a
/ 2 -> 5 / 2 -> 2
a
/ 2 -> 5 / 2 -> 2

val therefore will get a value of 4

now look at the while loops
while( a[--rm] > val);
that loop will sure run out of bounds
while( a[++lm] < val);
again: out of bounds

Out of bounds may mean in this case: outside the
range left-right. The partition function then swaps
some array elements it shouldn't, which will influence
the whole algorithm.

The problem is, that with the calculation you do, you may end up
with a number for val which is not even in the boundaries of the
numbers you try to sort (Sorry: I am not a native english speaker
so the above may sound confusing, but I don't know how to express
it better).

One technique I learned for comming up with a value for that trashold
value is something like this:
take a
, a
, a[(left+right)/2] and discard the smallest
and highest value (sort them and take the second in sorting order)​
 
V

Victor Bazarov

J. Campbell said:
As a programming exercise, I was playing with the quicksort algorithm.
The commented line in the "partition" function speeds things up a
bit. However, when used on (my machine) arrays larger than about
5,000,000 elements, the sort appears to freeze. I don't understand
this behavior. With this line commented as shown, the sort works fine
up to available memory (circa 50,000,000 elements, (200MB) on my 256MB
machine). I'm guessing the error has to do with "val" taking a
minimum or maximum value, but I can't fugure it out. If this Q is
off-topic to c++ and more appropriate on an "algorithm-oriented"
newsgroup, I apologize in advance.

It's not related to language, nor it is related to algorithms, I am
afraid. It's completely compiler- and system-specific. The secret
is that your program is likely causing stack overflow or some such.

Quicksort is recursive. The depth of its recursion depends on data
(quantity and contents) and difficult to predict. Since it uses
[stack] memory for recursion, you can expect it to behave differently
if you allocate more for stack, but that's beyond the scope of the
language. See your compiler options.

Victor
 
J

Joe C

Karl Heinz Buchegger said:
int partition(unsigned int* a, int left, int right){
unsigned int val = a
///2 + a
/2
;//<---note end of line here!
int lm = left-1;
int rm = right+1;
for(;;){
while(a[--rm] > val);
while(a[++lm] < val);
if(lm < rm){
unsigned int temp = a[rm];
a[rm] = a[lm];
a[lm] = temp;
}else return rm;
}
}​


consider your array to sort consists of just 2 numbers​


Karl, Thanks for the reply. I need to consider your suggestions more
closely. However, please note that in my implementation, the array never
has fewer than 16 elements (else the selection sort is used). I need to
examine if it has to do with overrunning the array, as you suggest, or with
overrunning the memory, as suggested by victor. (Or another problem
altogether). Thanks again.

5 5

what will be the value of val?

a
/ 2 -> 5 / 2 -> 2
a
/ 2 -> 5 / 2 -> 2

val therefore will get a value of 4

now look at the while loops
while( a[--rm] > val);
that loop will sure run out of bounds
while( a[++lm] < val);
again: out of bounds

Out of bounds may mean in this case: outside the
range left-right. The partition function then swaps
some array elements it shouldn't, which will influence
the whole algorithm.

The problem is, that with the calculation you do, you may end up
with a number for val which is not even in the boundaries of the
numbers you try to sort (Sorry: I am not a native english speaker
so the above may sound confusing, but I don't know how to express
it better).

One technique I learned for comming up with a value for that trashold
value is something like this:
take a
, a
, a[(left+right)/2] and discard the smallest
and highest value (sort them and take the second in sorting order)
 
J

Joe C

Karl Heinz Buchegger said:
consider your array to sort consists of just 2 numbers

5 5

what will be the value of val?

a
/ 2 -> 5 / 2 -> 2
a
/ 2 -> 5 / 2 -> 2

val therefore will get a value of 4

now look at the while loops
while( a[--rm] > val);
that loop will sure run out of bounds
while( a[++lm] < val);
again: out of bounds

Out of bounds may mean in this case: outside the
range left-right. The partition function then swaps
some array elements it shouldn't, which will influence
the whole algorithm.

The problem is, that with the calculation you do, you may end up
with a number for val which is not even in the boundaries of the
numbers you try to sort (Sorry: I am not a native english speaker
so the above may sound confusing, but I don't know how to express
it better).

One technique I learned for comming up with a value for that trashold
value is something like this:
take a
, a
, a[(left+right)/2] and discard the smallest
and highest value (sort them and take the second in sorting order)


Karl...I understand your point, and I believe you are correct...thanks for
the help. Joe​
 
J

Joe C

Karl Heinz Buchegger said:
consider your array to sort consists of just 2 numbers

5 5

what will be the value of val?

a
/ 2 -> 5 / 2 -> 2
a
/ 2 -> 5 / 2 -> 2

val therefore will get a value of 4

now look at the while loops
while( a[--rm] > val);
that loop will sure run out of bounds
while( a[++lm] < val);
again: out of bounds

Out of bounds may mean in this case: outside the
range left-right. The partition function then swaps
some array elements it shouldn't, which will influence
the whole algorithm.

The problem is, that with the calculation you do, you may end up
with a number for val which is not even in the boundaries of the
numbers you try to sort (Sorry: I am not a native english speaker
so the above may sound confusing, but I don't know how to express
it better).

One technique I learned for comming up with a value for that trashold
value is something like this:
take a
, a
, a[(left+right)/2] and discard the smallest
and highest value (sort them and take the second in sorting order)


Thanks Karl...this modification fixes the bug....and maintains the speed
boost ;-) good job to us both!!!!

#include <iostream>
const int kSMALL_ENOUGH = 16; //when to call selection-sort

using namespace std;

void fillRND(unsigned int* a, int size);
void showit(unsigned int* a, int size);
void quicksort(unsigned int* a, int left, int right);
int partition(unsigned int* a, int left, int right);
void selectionsort(unsigned int* a, int left, int right);
int verify(unsigned int* a, int size);
//unsigned int triplet(unsigned int* a, int left, int right);

int main(){
int clo;
cout << "Enter the number of elements to be sorted. ";
int sz;
cin >> sz;
unsigned int* a = new unsigned int[sz];
cout << "generating data...";
fillRND(a, sz);
cout << "done\nchecking data...";
verify(a, sz);
//showit(a, sz);

clo=clock();
cout << "sorting data...";
quicksort(a, 0, sz-1);
cout << "done in " << clock() - clo << " (ms)\nchecking data...";
verify(a, sz);
//showit(a, sz);

delete [] a;
system("pause");
return 0;
}

void quicksort(unsigned int* a, int left, int right){
if(left + kSMALL_ENOUGH < right) {
int split_pt = partition(a,left, right);
quicksort(a, left, split_pt);
quicksort(a, split_pt+1, right);
}else selectionsort(a, left, right);
}

int partition(unsigned int* a, int left, int right){
unsigned int val = a
/2 + a
/2;// = triplet(unsigned int* a,
int left, int right);
unsigned int x = a
;
unsigned int z = a
;

if(x < val)
if(val < z); //val = y; //x < y < z
else if(x < z) val = z; //x < z < y
else val = x; //z < x < y
else if(z < val); //val = y; //z < y < x
else if(z < x) val = z; //y < z < x
else val = x; //y < x < z

int lm = left-1;
int rm = right+1;
for(;;){
while(a[--rm] > val);
while(a[++lm] < val);
if(lm < rm){
unsigned int temp = a[rm];
a[rm] = a[lm];
a[lm] = temp;
}else return rm;
}
}

void selectionsort(unsigned int* data, int left, int right){
for(int i = left; i < right; ++i) {
int min = i;
for(int j=i+1; j <= right; ++j)
if(data[j] < data[min]) min = j;
unsigned int temp = data[min];
data[min] = data;
data = temp;
}
}

int verify(unsigned int* a, int size){
for(int i = 0; i<size-1; ++i){
if(a > a[i+1]){
cout << "numbers not sorted\n";
return 1;
}
}
cout << "numbers are sorted\n";
return 0;
}

void fillRND(unsigned int* a, int size){
srand(time(0));
int bits = (CHAR_BIT * sizeof(int))/2;
for(int i = 0; i < size; ++i)
a = rand() + (rand() << bits); //to fill more than 15 bits...
}

void showit(unsigned int* a, int size){
for(int i = 0; i < size; ++i)
cout << i << ' ' << a << endl;
cout << endl;
}
 
K

Karl Heinz Buchegger

Joe said:
Karl, Thanks for the reply. I need to consider your suggestions more
closely. However, please note that in my implementation, the array never
has fewer than 16 elements (else the selection sort is used).

Well. With such a large array of ints, it's quite possible that you
have a sequence of 16 identical numbers which qualify for that problem.
I need to
examine if it has to do with overrunning the array, as you suggest, or with
overrunning the memory, as suggested by victor. (Or another problem
altogether). Thanks again.

You have an 'endless loop' in partition(). That's OK, since there
is a return in it, thus it isn't realy endless. But I would start
with setting a high bound in this for loop and see if this bound is
ever reached:

int lm = left - 1;
int rm = right + 1;

for( int tmp = 0; tmp < 500000; ++tmp ) {
while( a[--rm] > val );
while( a[++lm] < val );

...
}

/* should never get here. */
assert( 0 );

I would also insert some assertion into the while loop

while( a[--rm] > val )
assert( rm > left );

while( a[++rm] < val )
assert( lm < right );


If things are the way I think they are (which is hard to diagnose
with a dataset of 50,000,000 items) one of the assertions will fire.
You then have a foot in the door and your debugger will tell you more.
 
K

Karl Heinz Buchegger

Joe said:
Thanks Karl...this modification fixes the bug....and maintains the speed
boost ;-) good job to us both!!!!

Glad to hear that.
It was some shot in the dark. When seeing the partition function
and recognizing what val is used for I thought by myself: That's
strange, I have never seen it done that way (and I haven't seen
a quicksort since years).
int partition(unsigned int* a, int left, int right){
unsigned int val = a
/2 + a
/2;// = triplet(unsigned int* a,
int left, int right);
unsigned int x = a
;
unsigned int z = a
;

if(x < val)
if(val < z); //val = y; //x < y < z
else if(x < z) val = z; //x < z < y
else val = x; //z < x < y
else if(z < val); //val = y; //z < y < x
else if(z < x) val = z; //y < z < x
else val = x; //y < x < z​


Yep. That looks like the thing I can remember from
my algorithms class (nearly 20 years ago). The idea
is: Ideally you want the median of all the numbers. But
the median is expensive to compute. On the other hand you
definitly want to avoid to take the lowest or the highest
value, since no partitioning would take place then. So what
to do: take 3 samples of the numbers (it doesn't matter which
ones) and choose the middle. That's good enough in most cases.​
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,774
Messages
2,569,596
Members
45,132
Latest member
TeresaWcq1
Top