Code review request: a better radix sort

N

none

I have written the following radixsort as part of an effort to learn
perl. I'd appreciate any comments, improvements, or suggestions on
making it more perlish.

Thanks,
-Mike

sub radix_sort {
my @nums = map { pack('d', $_); } @_;

for(my $mask = "C"; length($mask) <= 8; $mask = 'x' . $mask ) {
my @bins = ();
foreach(@nums) {
my $radix = unpack $mask, $_;
push @{$bins[$radix]}, $_;
}

@nums = map { @$_ } (grep { defined } @bins);
}


my @sorted;
foreach(@nums) {
my $num = unpack('d', $_);
if( $num < 0 ) {
unshift(@sorted, $num);
} else {
push(@sorted, $num);
}
}
return @sorted;
}
 
U

Uri Guttman

n> sub radix_sort {
n> my @nums = map { pack('d', $_); } @_;

n> for(my $mask = "C"; length($mask) <= 8; $mask = 'x' . $mask ) {

$mask .= 'x'

or

for my $mask_len ( 1 .. 8 ) :

my $mask = 'x' x $mask_len ;

n> my @bins = ();

no need to init that to (). my vars are empty to start
n> foreach(@nums) {

n> my $radix = unpack $mask, $_;
n> push @{$bins[$radix]}, $_;

explain the algorithm you are using. it just seems like you are going
through some complex hoops. also beware of endian ordering as
well. packed floats in native order are machine dependent.

n> }

n> @nums = map { @$_ } (grep { defined } @bins);

that could be done with just map:

@nums = map { defined ? @$_ : () } @bins ;

n> }

uri
 
N

none

Uri said:
for my $mask_len ( 1 .. 8 ) :

my $mask = 'x' x $mask_len ;

hrm...I think you meant:
$mask = 'x' x $mask_len . 'C'?
What's the advantage of doing it this way instead of the for loop?
n> my @bins = ();

no need to init that to (). my vars are empty to start

Gotcha...I didn't realize that Perl would re-init the value to empty
each time through the loop.

n> foreach(@nums) {

n> my $radix = unpack $mask, $_;
n> push @{$bins[$radix]}, $_;

explain the algorithm you are using. it just seems like you are going
through some complex hoops. also beware of endian ordering as
well. packed floats in native order are machine dependent.

I was sorting by bytes, starting from the leftmost/least significant
byte (unpack 'C') up to the rightmost/most significant byte
(unpack 'xxxxxxxC'). I have an array of 256 bins, and each number is
added to the bin corresponding to the value of the current radix.
Afterwards, the array of bins is unrolled into a linear array and I move
on to the next radix.

So, if I had two numbers, (02FF, 01FF), the first iteration of the loop
would be:
bins[255] = [02FF, 01FF]
@nums = (02FF, 01FF)

Then the second iteration would be:
bins[1] = [01FF]
bins[2] = [02FF]
@nums = (01FF, 02FF)

At the end, I have to reverse the negative numbers b/c they end up
sorted at the end in reverse order (since in IEEE FP notation, sign bit
= 1 is negative and sign bit = 0 is positive).

Does that makes sense? Is there a simpler way to do this?

that could be done with just map:

@nums = map { defined ? @$_ : () } @bins ;

Thanks! That is better! Although I think I need to use defined($_).
Otherwise it gives me an error saying "use of defined without
parenthesis is ambiguous."

So, now I have:
sub radix_sort2 {
my @nums = map { pack('d', $_); } @_;

for my $mask_len ( 0 .. 7 ) {
my $mask = 'x' x $mask_len . 'C';
my @bins;

foreach(@nums) {
my $radix = unpack($mask, $_);
push(@{$bins[$radix]}, $_);
}

@nums = map { defined($_) ? @$_ : () } @bins;
}


my @sorted;
foreach(@nums) {
my $num = unpack('d', $_);
if( $num < 0 ) {
unshift(@sorted, $num);
} else {
push(@sorted, $num);
}
}
return @sorted;
}
 
U

Uri Guttman

n> hrm...I think you meant:
n> $mask = 'x' x $mask_len . 'C'?
n> What's the advantage of doing it this way instead of the for loop?

yeah, i missed the 'C'. i just don't like c style loops and i avoid them
when possible. there are probably other ways to do that as well. as i
said before, if i understood the algorithm i could help with the code
more. munging floats at the low level is tricky. i had a fun time with
them in Sort::Maker (especially with -0!).

n> my @bins = ();
n> Gotcha...I didn't realize that Perl would re-init the value to empty
n> each time through the loop.

my has a compile time effect (declaration) and a run time effect (even
when not assigned something).

n> foreach(@nums) {
n> my $radix = unpack $mask, $_;
n> push @{$bins[$radix]}, $_;
n> I was sorting by bytes, starting from the leftmost/least significant
n> byte (unpack 'C') up to the rightmost/most significant byte
n> (unpack 'xxxxxxxC'). I have an array of 256 bins, and each number is
n> added to the bin corresponding to the value of the current
n> radix. Afterwards, the array of bins is unrolled into a linear array
n> and I move on to the next radix.

as i said before, endianess should be accounted for if you want a
portable solution. again, if it is just a learning thing, that can be
skipped.

n> So, if I had two numbers, (02FF, 01FF), the first iteration of the
n> loop would be:
n> bins[255] = [02FF, 01FF]
n> @nums = (02FF, 01FF)

n> Then the second iteration would be:
n> bins[1] = [01FF]
n> bins[2] = [02FF]
n> @nums = (01FF, 02FF)

n> At the end, I have to reverse the negative numbers b/c they end up
n> sorted at the end in reverse order (since in IEEE FP notation, sign
n> bit = 1 is negative and sign bit = 0 is positive).

as i said, i have recent and intimate experience with this area. :) you
could use separate bins for negative numbers and merge them later. but
why do you want to do a radix sort on floats? unless this is just an
exercise for you.

n> Does that makes sense? Is there a simpler way to do this?

i would have to study the algorithm a bit more (which i may not do). i
know radix sorts but there could be other ways to implement it.

n> Thanks! That is better! Although I think I need to use
n> defined($_). Otherwise it gives me an error saying "use of defined
n> without parenthesis is ambiguous."

defined() should be fine then. it uses $_ as its default argument.

n> So, now I have:
n> sub radix_sort2 {
n> my @nums = map { pack('d', $_); } @_;

i would rename @nums to something else as they aren't really numbers
anymore but the byte string that represents the (ieee) float format. it
gets confusing below when you convert back to a real num. maybe a better
name would be $float_bytes?

n> for my $mask_len ( 0 .. 7 ) {
n> my $mask = 'x' x $mask_len . 'C';
n> my @bins;

n> foreach(@nums) {
n> my $radix = unpack($mask, $_);
n> push(@{$bins[$radix]}, $_);
n> }

n> @nums = map { defined($_) ? @$_ : () } @bins;
n> }

ok, i see what the algorithm is. using unpack with a rotated mask is
more work than is needed. since you are just grabbing bytes, you can use
substr and ord to get each byte. something like this (untested):

for my $byte_num ( reverse 0 .. 7 ) {

# check if reverse is needed. it could be used to handle endian issues.

foreach my $num ( @nums ) {
my $radix = substr( $num, $byte_num, 1);
push( @{$bins[$radix]}, $num );
}

a bit shorter and more direct.

n> my @sorted;
n> foreach(@nums) {
n> my $num = unpack('d', $_);
n> if( $num < 0 ) {
n> unshift(@sorted, $num);
n> } else {
n> push(@sorted, $num);
n> }
n> }

gotta be a better way to handle negative numbers. maybe this:

my @real_nums = map unpack('d', $_), @nums ;
my @sorted = grep $_ < 0, @real_nums ;
push @sorted = grep $_ >= 0, @real_nums ;

return @sorted;

but maybe doing separate bins would be the best answer. also i think
choosing the bin might be different for negative numbers so check that
out with some test data.

uri
 
N

none

Uri said:
but
why do you want to do a radix sort on floats? unless this is just an
exercise for you.

It's mainly just for the hell of it. Also, I finished a bunch of other
sorts, and this is the only one I haven't done yet.

defined() should be fine then. it uses $_ as its default argument.

That worked! Thanks!

i would rename @nums to something else as they aren't really numbers
anymore but the byte string that represents the (ieee) float format.

Goot point.

ok, i see what the algorithm is. using unpack with a rotated mask is
more work than is needed. since you are just grabbing bytes, you can use
substr and ord to get each byte. something like this (untested):
my $radix = substr( $num, $byte_num, 1);

ord substr( $num, $byte_num, 1) worked, but I don't understand why. How
is it possible to get the numeric value directly from "pack," which is
the number binary encoded in some obscure format?

but maybe doing separate bins would be the best answer.

Maybe that would make more sense.

Thanks,
-Mike
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,744
Messages
2,569,484
Members
44,903
Latest member
orderPeak8CBDGummies

Latest Threads

Top