Paralle sorting algorithms...

Discussion in 'C++' started by aminer, Sep 7, 2012.

  1. aminer

    aminer Guest

    Hello,

    Look down the the following link , it's about parallel partition:

    http://www.redgenes.com/Lecture-Sorting.pdf


    I have tried to simulate this parallel partition method ,
    but i don't think it will scale cause we have to do a merging,
    which essentially is an array-copy operation but this array-copy
    operations will be expensive compared to an integer compare
    operation that you find inside the partition fuinction, and it will still
    be expensive compared to a string compare operation that you find
    inside the partition function. So since it's not scaling i have abondoned
    the idea to implement this parallel partition method in my parallel
    quicksort.

    I have also just read the following paper about Parallel Merging:

    http://www.economyinformatics.ase.ro/content/EN4/alecu.pdf

    And i have implemented this algorithm just to see what is its performance.
    and i have noticed that the serial algorithm is 8 times slower than the
    merge
    function that you find in the serial mergesort algorithm.So 8 times slower,
    it's too slow.

    So the only way to do a somewhat better parallel sorting algorithm,
    it's to use the following algorithm;

    http://www.drdobbs.com/parallel/parallel-merge/229204454?queryText=parallel+sort


    The idea is simple:

    Let's assume we want to merge sorted arrays X and Y. Select X[m]
    median element in X. Elements in X[ .. m-1] are less than or equal to
    X[m]. Using binary search find index k of the first element in Y greater
    than X[m]. Thus Y[ .. k-1] are less than or equal to X[m] as well.
    Elements in X[m+1..] are greater than or equal to X[m] and Y[k .. ]
    are greater. So merge(X, Y) can be defined as
    concat(merge(X[ .. m-1], Y[ .. k-1]), X[m], merge(X[m+1.. ], Y[k .. ]))
    now we can recursively in parallel do merge(X[ .. m-1], Y[ .. k-1]) and
    merge(X[m+1 .. ], Y[k .. ]) and then concat results.


    There is another parallel method for parallel partition, here it's:

    http://www.cs.sunysb.edu/~rezaul/Spring-2012/CSE613/CSE613-lecture-9.pdf

    but as you will notice it's still too expensive, causeyou have to create
    3 arrays and copy in them:

    3. array B[ 0: n ? 1 ], lt[ 0: n ? 1 ], gt[ 0: n ? 1 ]

    You can use SIMD instructions on the parallel-prefix-sum function

    8. lt [ 0: n ? 1 ] ¬ Par-Prefix-Sum ( lt[ 0: n ? 1 ], + )
    :
    But the algorithm is still expensive i think on a quad or eight cores or
    even
    16 cores, you have to have much more than 16 cores to be able to benefit
    from this method i think.

    And read this:

    "Parallel hybrid merge algorithm was developed that outperformed
    sequential simple merge as well as STL merge by 0.9-5.8 times overall
    and by over 5 times for larger arrays"

    http://www.drdobbs.com/parallel/parallel-merge/229204454?pgno=3


    This paper as you have noticed has fogot to tell that this method is
    dependant on the distribution of the data

    Read for exemple this:

    "Select X[m] median element in X. Elements in X[ .. m-1] are less than
    or equal to X[m]. Using binary search find index k of the first element in
    Y greater than X[m]."

    So if "median element:" of X is not near or equal to the "median
    element" of Y so this method can have bad parallel performance
    and it may not scale as you think.

    Bucket sort is not a sorting algorithm itself, rather it is a
    procedure for partitioning data to be sorted using some given
    sorting algorithm-a "meta-algorithm" so to speak.

    Bucket sort will be better, when elements are uniformly distributed
    over an interval [a, b] and buckets have not significantly different
    number of elements.

    bucketsort sequential computational complexity using quicksort is
    = p × (n/p) log(n/p) = n log(n/p)

    bucket sort parallel computational complexity using quicksort
    = (n/p) log(n/p)

    Parallel BucketSort gave me also 3x scaling when sorting strings on a
    quad cores, it scales better than my parallel quicksort and it can be
    much faster than my parallel quicksort.

    I have thought about my parallel quicksort , and i have found
    that parallelquicksort is fine but the problem with it is that the
    partition function is not parallelizable , so i have thought about this
    and i have decided to write a parallel BucketSort,and this parallel
    bucketsort can give you much better performance than parallel quicksort
    when sorting 100000 strings or more, just test it yourself and see,
    parallel bucketsort can sort just strings at the moment, and it can use
    quicksort or mergesort, mergesort is faster.

    I have updated parallel bucketsort to version 1.01 , i have
    changed a little bit the interface, now you have to pass
    to the bucketsort method three parameters: the array,
    a TSortCompare function and a TSortCompare1 function.

    Here is a small example in Object Pascal:

    program test;

    uses parallelbucketsort,sysutils,timer;

    type

    TStudent = Class
    public
    mystring:string;
    end;

    var tab:Ttabpointer;
    myobj:TParallelSort;
    student:TStudent;
    i,J:integer;
    a:integer;


    function comp1(Item1:pointer): string;
    begin
    result:=TStudent(Item1).mystring ;
    end;
    ?
    function comp(Item1, Item2: Pointer): integer;
    begin
    if TStudent(Item1).mystring > TStudent(Item2).mystring
    then
    begin
    result:=1;
    exit;
    end;
    if TStudent(Item1).mystring < TStudent(Item2).mystring
    then
    begin
    result:=-1;
    exit;
    end;
    ?
    if TStudent(Item1).mystring = TStudent(Item2).mystring
    then
    begin
    result:=0;
    exit;
    end;
    end;
    ?
    begin

    myobj:=TParallelSort.create(1,ctQuicksort); // set to the number of cores...

    setlength(tab,1000000);
    ?
    for i:=low(tab) to high(tab)
    do
    begin
    student:=TStudent.create;
    student.mystring:= inttostr(i);
    tab[high(tab)-i]:= student;
    end;
    ?
    HPT.Timestart;

    myobj.bucketsort(tab,comp,comp1);
    //myobj.qsort(tab,comp);
    writeln('Time in microseconds: ',hpt.TimePeriod);
    ?
    writeln;
    writeln('Please press a key to continu...');
    readln;

    for i := LOW(tab) to HIGH(Tab)-1
    do
    begin
    if tstudent(tab).mystring > tstudent(tab[i+1]).mystring
    then
    begin
    writeln('sort has failed...');
    halt;
    end;
    end;

    for i := LOW(tab) to HIGH(Tab)
    do
    begin
    writeln(TStudent(tab).mystring,' ');
    end;

    for i := 0 to HIGH(Tab) do freeandnil(TStudent(tab));

    setlength(tab,0);
    myobj.free;

    end.


    You can download parallel bucketsort from:

    http://pages.videotron.com/aminer/




    Amine Moulay Ramdane.
     
    aminer, Sep 7, 2012
    #1
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. York
    Replies:
    5
    Views:
    1,069
    Micah Cowan
    Oct 25, 2003
  2. Ark

    STL sorting algorithms

    Ark, Feb 21, 2005, in forum: C++
    Replies:
    1
    Views:
    2,071
    Howard Hinnant
    Feb 21, 2005
  3. Replies:
    1
    Views:
    311
    ZZyZX
    Jul 1, 2006
  4. David
    Replies:
    3
    Views:
    1,182
    Patrice
    Dec 1, 2009
  5. aminer
    Replies:
    0
    Views:
    362
    aminer
    Sep 7, 2012
Loading...

Share This Page