A

#### aminer

Look down the the following link , it's about parallel partition:

http://www.redgenes.com/Lecture-Sorting.pdf

I have tried to simulate this parallel partition method ,

but i don't think it will scale cause we have to do a merging,

which essentially is an array-copy operation but this array-copy

operations will be expensive compared to an integer compare

operation that you find inside the partition fuinction, and it will still

be expensive compared to a string compare operation that you find

inside the partition function. So since it's not scaling i have abondoned

the idea to implement this parallel partition method in my parallel

quicksort.

I have also just read the following paper about Parallel Merging:

http://www.economyinformatics.ase.ro/content/EN4/alecu.pdf

And i have implemented this algorithm just to see what is its performance.

and i have noticed that the serial algorithm is 8 times slower than the

merge

function that you find in the serial mergesort algorithm.So 8 times slower,

it's too slow.

So the only way to do a somewhat better parallel sorting algorithm,

it's to use the following algorithm;

http://www.drdobbs.com/parallel/parallel-merge/229204454?queryText=parallel+sort

The idea is simple:

Let's assume we want to merge sorted arrays X and Y. Select X[m]

median element in X. Elements in X[ .. m-1] are less than or equal to

X[m]. Using binary search find index k of the first element in Y greater

than X[m]. Thus Y[ .. k-1] are less than or equal to X[m] as well.

Elements in X[m+1..] are greater than or equal to X[m] and Y[k .. ]

are greater. So merge(X, Y) can be defined as

concat(merge(X[ .. m-1], Y[ .. k-1]), X[m], merge(X[m+1.. ], Y[k .. ]))

now we can recursively in parallel do merge(X[ .. m-1], Y[ .. k-1]) and

merge(X[m+1 .. ], Y[k .. ]) and then concat results.

There is another parallel method for parallel partition, here it's:

http://www.cs.sunysb.edu/~rezaul/Spring-2012/CSE613/CSE613-lecture-9.pdf

but as you will notice it's still too expensive, causeyou have to create

3 arrays and copy in them:

3. array B[ 0: n ? 1 ], lt[ 0: n ? 1 ], gt[ 0: n ? 1 ]

You can use SIMD instructions on the parallel-prefix-sum function

8. lt [ 0: n ? 1 ] ¬ Par-Prefix-Sum ( lt[ 0: n ? 1 ], + )

:

But the algorithm is still expensive i think on a quad or eight cores or

even

16 cores, you have to have much more than 16 cores to be able to benefit

from this method i think.

And read this:

"Parallel hybrid merge algorithm was developed that outperformed

sequential simple merge as well as STL merge by 0.9-5.8 times overall

and by over 5 times for larger arrays"

http://www.drdobbs.com/parallel/parallel-merge/229204454?pgno=3

This paper as you have noticed has fogot to tell that this method is

dependant on the distribution of the data

Read for exemple this:

"Select X[m] median element in X. Elements in X[ .. m-1] are less than

or equal to X[m]. Using binary search find index k of the first element in

Y greater than X[m]."

So if "median element:" of X is not near or equal to the "median

element" of Y so this method can have bad parallel performance

and it may not scale as you think.

Bucket sort is not a sorting algorithm itself, rather it is a

procedure for partitioning data to be sorted using some given

sorting algorithm-a "meta-algorithm" so to speak.

Bucket sort will be better, when elements are uniformly distributed

over an interval [a, b] and buckets have not significantly different

number of elements.

bucketsort sequential computational complexity using quicksort is

= p × (n/p) log(n/p) = n log(n/p)

bucket sort parallel computational complexity using quicksort

= (n/p) log(n/p)

Parallel BucketSort gave me also 3x scaling when sorting strings on a

quad cores, it scales better than my parallel quicksort and it can be

much faster than my parallel quicksort.

I have thought about my parallel quicksort , and i have found

that parallelquicksort is fine but the problem with it is that the

partition function is not parallelizable , so i have thought about this

and i have decided to write a parallel BucketSort,and this parallel

bucketsort can give you much better performance than parallel quicksort

when sorting 100000 strings or more, just test it yourself and see,

parallel bucketsort can sort just strings at the moment, and it can use

quicksort or mergesort, mergesort is faster.

I have updated parallel bucketsort to version 1.01 , i have

changed a little bit the interface, now you have to pass

to the bucketsort method three parameters: the array,

a TSortCompare function and a TSortCompare1 function.

Here is a small example in Object Pascal:

program test;

uses parallelbucketsort,sysutils,timer;

type

TStudent = Class

public

mystring:string;

end;

var tab:Ttabpointer;

myobj:TParallelSort;

student:TStudent;

i,J:integer;

a:integer;

function comp1(Item1ointer): string;

begin

result:=TStudent(Item1).mystring ;

end;

?

function comp(Item1, Item2: Pointer): integer;

begin

if TStudent(Item1).mystring > TStudent(Item2).mystring

then

begin

result:=1;

exit;

end;

if TStudent(Item1).mystring < TStudent(Item2).mystring

then

begin

result:=-1;

exit;

end;

?

if TStudent(Item1).mystring = TStudent(Item2).mystring

then

begin

result:=0;

exit;

end;

end;

?

begin

myobj:=TParallelSort.create(1,ctQuicksort); // set to the number of cores...

setlength(tab,1000000);

?

for i:=low(tab) to high(tab)

do

begin

student:=TStudent.create;

student.mystring:= inttostr(i);

tab[high(tab)-i]:= student;

end;

?

HPT.Timestart;

myobj.bucketsort(tab,comp,comp1);

//myobj.qsort(tab,comp);

writeln('Time in microseconds: ',hpt.TimePeriod);

?

writeln;

writeln('Please press a key to continu...');

readln;

for i := LOW(tab) to HIGH(Tab)-1

do

begin

if tstudent(tab

*).mystring > tstudent(tab[i+1]).mystring*

then

begin

writeln('sort has failed...');

halt;

end;

end;

for i := LOW(tab) to HIGH(Tab)

do

begin

writeln(TStudent(tab

then

begin

writeln('sort has failed...');

halt;

end;

end;

for i := LOW(tab) to HIGH(Tab)

do

begin

writeln(TStudent(tab

*).mystring,' ');*

end;

for i := 0 to HIGH(Tab) do freeandnil(TStudent(tabend;

for i := 0 to HIGH(Tab) do freeandnil(TStudent(tab

*));*

setlength(tab,0);

myobj.free;

end.

You can download parallel bucketsort from:

http://pages.videotron.com/aminer/

Amine Moulay Ramdane.setlength(tab,0);

myobj.free;

end.

You can download parallel bucketsort from:

http://pages.videotron.com/aminer/

Amine Moulay Ramdane.