Sorting based on existence of keys

R

Rasmus Villemoes

Hi experts

Say I have an array @ks containing some strings which may or may not
exist as keys in a hash %h. How do I sort @ks such that the
non-existent keys come last (in any order), while the existing keys
are sorted using, say, length($h{$a}) <=> length($h{$a}) (that is not
really the property of the values I'll sort by, but that's not
important).

my @ks = qw(two three one four);
my %h = ( one => 1,
three => 333 );

for (sort { exists $h{$b} cmp exists $h{$a} ||
length $h{$a} <=> length $h{$b}} @ks) { print "$_\n" }


This produces the intended result, but under "use warnings;" one also
gets the expected warnings when $h{two} and $h{four} are compared in
the second line. Is there some smarter way to do this? (Also, I don't
know if cmp is the right tool for comparing boolean values).
 
P

Peter Makholm

Rasmus Villemoes said:
Say I have an array @ks containing some strings which may or may not
exist as keys in a hash %h. How do I sort @ks such that the
non-existent keys come last (in any order), while the existing keys
are sorted using, say, length($h{$a}) <=> length($h{$a}) (that is not
really the property of the values I'll sort by, but that's not
important).

My first idea would be to split the array in the two parts, sort each
of them and then concatenate them:

for $key ( @ks ) {
if (exists %h{$key} ) {
push @existing, $key;
} else {
push @nonexisting, $key;
}
@existing = sort { ... } @existing;

for ( @existing, @nonexisting ) {
...
}

It might be a bit more effective to write an insert function to
replace the first push and by that eliminate the sorting.

It might enhance you job security to replace the loop with

push @{ exists $h{$_} ? \@exists : \@nonexists }, $_
for @ks;

//Makholm
 
J

Jürgen Exner

Rasmus Villemoes said:
Say I have an array @ks containing some strings which may or may not
exist as keys in a hash %h. How do I sort @ks such that the
non-existent keys come last (in any order), while the existing keys
are sorted using, say, length($h{$a}) <=> length($h{$a}) (that is not
really the property of the values I'll sort by, but that's not
important).

There are four different cases you have to consider in your custom
compare function:

$h{$a} and $h{$b} exist ===> length($h{$a}) <=> length($h{$a})
$h{$a} exists but $h{$b} doesn't ===> -1
$h{$a} does't exist but $h{$b} does ===> 1
Neither $h{$a} nor $h{$b} exists ===> 0

jue
 
X

xhoster

Rasmus Villemoes said:
Hi experts

Say I have an array @ks containing some strings which may or may not
exist as keys in a hash %h. How do I sort @ks such that the
non-existent keys come last (in any order), while the existing keys
are sorted using, say, length($h{$a}) <=> length($h{$a}) (that is not
really the property of the values I'll sort by, but that's not
important).

my @ks = qw(two three one four);
my %h = ( one => 1,
three => 333 );

for (sort { exists $h{$b} cmp exists $h{$a} ||
length $h{$a} <=> length $h{$b}} @ks) { print "$_\n" }

This produces the intended result, but under "use warnings;" one also
gets the expected warnings when $h{two} and $h{four} are compared in
the second line.

You could turn off the uninitialized warning for that block.
Is there some smarter way to do this?

If you don't want to turn off the warning, you could detect when both are
nonexistent and short circuit on that. And since we already know their
existence status is equal (otherwise we would have short-circuited on the
previous op), then we really only need to detect if one of them is
nonexistent.

for (sort { exists $h{$b} cmp exists $h{$a} ||
exists $h{$b} &&
(Also, I don't
know if cmp is the right tool for comparing boolean values).

It seems to do what ought to be done, so in my book that makes it the right
tool for the job. I really doubt that future versions of Perl5 is going to
change the representations of true and false in a way that the outcome of
cmp is going to be different than it currently is. You could also use <=>
to get the same outcome, because the special true value is greater than the
special false value both stringly ('' lt '1') and numerically (0 < 1)

Xho

--
-------------------- http://NewsReader.Com/ --------------------
The costs of publication of this article were defrayed in part by the
payment of page charges. This article must therefore be hereby marked
advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate
this fact.
 
U

Uri Guttman

JE> There are four different cases you have to consider in your custom
JE> compare function:

JE> $h{$a} and $h{$b} exist ===> length($h{$a}) <=> length($h{$a})
JE> $h{$a} exists but $h{$b} doesn't ===> -1
JE> $h{$a} does't exist but $h{$b} does ===> 1
JE> Neither $h{$a} nor $h{$b} exists ===> 0

this is why doing a prefilter on the sort keys makes life much
simpler. then you only have 1 real case to worry about - whether a key
exists or not. you can replace a non-existant key with a fake one that
always sorts higher than normal keys. this can be done in an ST or in
the key extraction code in Sort::Maker. i leave how to do that as an
exercise but it is much simpler than all the double sided checking you
propose.

uri
 
J

Jim Gibson

Rasmus Villemoes said:
Hi experts

Say I have an array @ks containing some strings which may or may not
exist as keys in a hash %h. How do I sort @ks such that the
non-existent keys come last (in any order), while the existing keys
are sorted using, say, length($h{$a}) <=> length($h{$a}) (that is not
really the property of the values I'll sort by, but that's not
important).

my @ks = qw(two three one four);
my %h = ( one => 1,
three => 333 );

for (sort { exists $h{$b} cmp exists $h{$a} ||
length $h{$a} <=> length $h{$b}} @ks) { print "$_\n" }


This produces the intended result, but under "use warnings;" one also
gets the expected warnings when $h{two} and $h{four} are compared in
the second line. Is there some smarter way to do this? (Also, I don't
know if cmp is the right tool for comparing boolean values).

I would use grep:

#!/usr/bin/perl
use strict;
use warnings;

my @ks = qw(two three one four);
my %h = ( one => 1, three => 333 );

my @sorted = (
(sort {length $h{$a} <=> length $h{$b}} grep {exists $h{$_}} @ks),
(grep {!exists $h{$_}} @ks)
);

print "Keys: @sorted\n";

__OUTPUT__

Keys: one three two four
 
U

Uri Guttman

JG> my @ks = qw(two three one four);
JG> my %h = ( one => 1, three => 333 );

JG> my @sorted = (
JG> (sort {length $h{$a} <=> length $h{$b}} grep {exists $h{$_}} @ks),
JG> (grep {!exists $h{$_}} @ks)
JG> );

all this confusing redundant code is annoying me. i left an exercise on
doing this all one time but it seems i will have to answer it myself. i
will assume length of a key will always be less than some large number
which is how missing keys will sort late. this is an untested ST but
Sort::Maker can generate this for you too.

@sorted = map $_->[0],
sort { $a->[1] <=> $b->[1] }
map [ $_, exists($h{$_}) ? length $h{$_} : 9999999 ],
@unsorted ;

there is only one call to exists and length and it is much clearer what
you mean to do. and Sort::Maker cleans up all the cruft and you only
need the exists expression.

the point is that you want to do key extraction/processing only one time
both for speed and for clarity. perl6's sort supports this style and
drops the $a/$b redundant stuff in general.

uri
 
P

Peter J. Holzer

i will assume length of a key will always be less than some large
number which is how missing keys will sort late. this is an untested
ST but Sort::Maker can generate this for you too.

@sorted = map $_->[0],
sort { $a->[1] <=> $b->[1] }
map [ $_, exists($h{$_}) ? length $h{$_} : 9999999 ],
@unsorted ;

'Inf' is always larger than any real number, so you don't have to choose
an arbitrary large number.

hp
 
U

Uri Guttman

PJH> On 2009-02-19 19:11 said:
i will assume length of a key will always be less than some large
number which is how missing keys will sort late. this is an untested
ST but Sort::Maker can generate this for you too.

@sorted = map $_->[0],
sort { $a->[1] <=> $b->[1] }
map [ $_, exists($h{$_}) ? length $h{$_} : 9999999 ],
@unsorted ;

PJH> 'Inf' is always larger than any real number, so you don't have to choose
PJH> an arbitrary large number.

i wanted to keep it an integer. in Sort::Maker that makes a real
difference - comparing 4 (or 8) byte ints vs 8 byte floats. maybe with
64 bit ints it doesn't matter, but you do need to specify integer v
float if you use the GRT style of sorting. the other styles use <=> so
it doesn't matter and Inf would work. and i don't think i have ever seen
a proper need for Inf before this. :)

uri
 
S

sln

Hi experts

Say I have an array @ks containing some strings which may or may not
exist as keys in a hash %h. How do I sort @ks such that the
non-existent keys come last (in any order), while the existing keys
are sorted using, say, length($h{$a}) <=> length($h{$a}) (that is not
really the property of the values I'll sort by, but that's not
important).

my @ks = qw(two three one four);
my %h = ( one => 1,
three => 333 );

for (sort { exists $h{$b} cmp exists $h{$a} ||
length $h{$a} <=> length $h{$b}} @ks) { print "$_\n" }


This produces the intended result, but under "use warnings;" one also
gets the expected warnings when $h{two} and $h{four} are compared in
the second line. Is there some smarter way to do this? (Also, I don't
know if cmp is the right tool for comparing boolean values).

I think cmp AND <=> were exactly the right tools.
This should work for you.

Thanks for posting this!
-sln

-----------------------------------------
## sort_junk.pl
## -sln
##
use strict;
use warnings;

my @ks = qw(two three one four);
my %h = ( one => 1,
three => 333 );

## Could probably take advantage of an apparent bug/feature that exists() returns stringified '1' for true and '' for false.
## As in $vvv = ''; print "not '$vvv'\n" if (!$vvv);

print "\n1: Your equivalent -----\n";

for (sort {
my $Ea = exists $h{$a};
my $Eb = exists $h{$b};
my $comp = $Eb cmp $Ea;
print "a=$a , b=$b , Ea='$Ea'(".length($Ea)."), Eb='$Eb'(".length($Eb)."), cmp=$comp\n";
$Ea != $Eb ?
$Eb cmp $Ea :
length $h{$a} <=> length $h{$b}
} @ks)
{ print "$_\n" }

print "\n2: Your new -------\n";

for (sort {
my ($xb, $xa) = (exists $h{$b}, exists $h{$a});
!($xb & $xa) ? $xb cmp $xa : length $h{$a} <=> length $h{$b}
} @ks)
{ print "$_\n" }

__END__

output:

1: Your equivalent -----
a=two , b=three , Ea=''(0), Eb='1'(1), cmp=1
a=one , b=four , Ea='1'(1), Eb=''(0), cmp=-1
a=three , b=one , Ea='1'(1), Eb='1'(1), cmp=0
a=three , b=four , Ea='1'(1), Eb=''(0), cmp=-1
a=two , b=four , Ea=''(0), Eb=''(0), cmp=0
Use of uninitialized value in length at sort_junk.pl line 21.
Use of uninitialized value in length at sort_junk.pl line 21.
one
three
two
four

2: Your new -------
one
three
two
four
 
E

Eric Pozharski

JE> There are four different cases you have to consider in your custom
JE> compare function:

JE> $h{$a} and $h{$b} exist ===> length($h{$a}) <=> length($h{$a})
JE> $h{$a} exists but $h{$b} doesn't ===> -1
JE> $h{$a} does't exist but $h{$b} does ===> 1
JE> Neither $h{$a} nor $h{$b} exists ===> 0

this is why doing a prefilter on the sort keys makes life much
simpler. then you only have 1 real case to worry about - whether a key
exists or not. you can replace a non-existant key with a fake one that
always sorts higher than normal keys. this can be done in an ST or in
the key extraction code in Sort::Maker. i leave how to do that as an
exercise but it is much simpler than all the double sided checking you
propose.

I beg to differ. Jurgen is right that OP must get realtional and
equality operators and boolean algebra straight first (as if mine is).

Fish (untested, subject to precednce issues)

sort {
(exists $h{$a} xor exists $h{$b}) ?
exists $h{$b} - exists $h{$a} :
0
} @ks
 
U

Uri Guttman

JE> $h{$a} and $h{$b} exist ===> length($h{$a}) <=> length($h{$a})
JE> $h{$a} exists but $h{$b} doesn't ===> -1
JE> $h{$a} does't exist but $h{$b} does ===> 1
JE> Neither $h{$a} nor $h{$b} exists ===> 0
EP> I beg to differ. Jurgen is right that OP must get realtional and
EP> equality operators and boolean algebra straight first (as if mine is).

EP> sort {
EP> (exists $h{$a} xor exists $h{$b}) ?
EP> exists $h{$b} - exists $h{$a} :
EP> 0
EP> } @ks

and where does that sort the lengths? even though it isn't a real
requirement we all seem to have been using it. when you add that back
you get much more complicated comparisons. i haven't even brought up
speed for which the presort key extraction is needed. see my other post
for an example which should work if i typed it cleanly and is simpler,
clearer and faster.

uri
 
R

Rasmus Villemoes

Uri Guttman said:
@sorted = map $_->[0],
sort { $a->[1] <=> $b->[1] }
map [ $_, exists($h{$_}) ? length $h{$_} : 9999999 ],
@unsorted ;

Let me see if I understand what's going on: The last two lines builds
an array of refs. Each ref is a reference to an anonymous array with
exactly two entries: a maybe-key and the length of $h{key} (or some
large number). Then this array of refs is sorted according to the
contents of the [1] entry of the ref'ed anonymous array, and finally
one extracts the [0] entry from each anonymous array. Smart.
the point is that you want to do key extraction/processing only one time
both for speed and for clarity.

Yes, that it is what I was trying to achieve. But I would expect that
the allocation of memory for all these small anonymous arrays would be
rather time-consuming, also.


Thanks to everybody for your answers. They confirm perl's slogan, and
it's nice to see the different approaches. For the time being I will
use &&-expressions and apply a decreasing function to length() so that
the overall comparison of existing keys ends up in ascending order. If
no elements are longer than 9 this actually works:

my @ks = qw(two three one four);
my %h = ( one => 1,
three => 333 );

for (sort { (exists $h{$b} && 10-length $h{$b}) <=>
(exists $h{$a} && 10-length $h{$a}) } @ks) { print "$_\n" }
 
U

Uri Guttman

RV> Uri Guttman said:
@sorted = map $_->[0],
sort { $a->[1] <=> $b->[1] }
map [ $_, exists($h{$_}) ? length $h{$_} : 9999999 ],
@unsorted ;

RV> Let me see if I understand what's going on: The last two lines builds
RV> an array of refs. Each ref is a reference to an anonymous array with
RV> exactly two entries: a maybe-key and the length of $h{key} (or some
RV> large number). Then this array of refs is sorted according to the
RV> contents of the [1] entry of the ref'ed anonymous array, and finally
RV> one extracts the [0] entry from each anonymous array. Smart.

the concept is very old and this particular style is attributed to
randal schwarz and called the schwarzian transform. my module
Sort::Maker can generate this or the even faster GRT style.

RV> Yes, that it is what I was trying to achieve. But I would expect that
RV> the allocation of memory for all these small anonymous arrays would be
RV> rather time-consuming, also.

nope. you need to learn some algorithm theory before you make that
claim. the malloc and key extraction stuff is O(N) which increases
linearly. the actual sort comparisons are O(N log N) which increase much
faster. so for short data sets, your will be faster but as the set size
increases the ST will get faster until it blows away normal
sorting. again, Sort::Maker has an article about this and its docs cover
it too.

besides the speed gain, the removal of redundant and confusing code
makes this style much better for complex sorts. the map/sort/map is an
idiom which is not hard to get but you have only one set of key
extraction logic vs 2.

RV> Thanks to everybody for your answers. They confirm perl's slogan, and
RV> it's nice to see the different approaches. For the time being I will
RV> use &&-expressions and apply a decreasing function to length() so that
RV> the overall comparison of existing keys ends up in ascending order. If
RV> no elements are longer than 9 this actually works:

RV> my @ks = qw(two three one four);
RV> my %h = ( one => 1,
RV> three => 333 );

RV> for (sort { (exists $h{$b} && 10-length $h{$b}) <=>
RV> (exists $h{$a} && 10-length $h{$a}) } @ks) { print "$_\n" }

and you consider that clearer logic and code than the ST version? it
will break on longer strings as you say which is a flaw. data changes
and you have a bug waiting to happen. that is not good coding.

uri
 
S

sln

Uri Guttman said:
@sorted = map $_->[0],
sort { $a->[1] <=> $b->[1] }
map [ $_, exists($h{$_}) ? length $h{$_} : 9999999 ],
@unsorted ;

Let me see if I understand what's going on: The last two lines builds
an array of refs. Each ref is a reference to an anonymous array with
[snip]
Thanks to everybody for your answers. They confirm perl's slogan, and
it's nice to see the different approaches. For the time being I will
use &&-expressions and apply a decreasing function to length() so that
the overall comparison of existing keys ends up in ascending order. If
no elements are longer than 9 this actually works:

my @ks = qw(two three one four);
my %h = ( one => 1,
three => 333 );

for (sort { (exists $h{$b} && 10-length $h{$b}) <=>
(exists $h{$a} && 10-length $h{$a}) } @ks) { print "$_\n" }

This is not a very clever idea.
First, using numeric context in the hash.
Given the first, second treating it as a string.
Given the first and second, third is to treat it as a 10 digit string.

Example:
two => 2e10 in the hash
becomes
two => 20000000000 , length = 11
when it hits the length() function

So as a general rule, sorting by length in numeric context is not a good
idea.

The fourth is that its not portable.
There is two parts to this, one is forcing non-existing elements to the
bottom, the other to sort the existing ones and bring to the top.
These two parts have to be kept separate for portablity.

for (sort { (exists $h{$b} && $h{$b}) <=>
(exists $h{$a} && $h{$a}) } @ks) { print "$_\n" }
^^^^
This wont work if you decide to sort by value instead of length.

The most portable approach is a scheme like this:

length: !($xb & $xa) ? $xb cmp $xa : length $h{$a} <=> length $h{$b}
!($xb & $xa) ? $xb cmp $xa : length $h{$b} <=> length $h{$a}
value: !($xb & $xa) ? $xb cmp $xa : $h{$a} <=> $h{$b}
!($xb & $xa) ? $xb cmp $xa : $h{$b} <=> $h{$a}
or more complex: !($xb & $xa) ? $xb cmp $xa : func($h{$a},$h{$b})

Any scheme will work as long as there is true separation between parts.


-sln
 
E

Eric Pozharski

*SKIP* [ since uri skipped attribution anyway ]
EP> I beg to differ. Jurgen is right that OP must get realtional and
EP> equality operators and boolean algebra straight first (as if mine is).

EP> sort {
EP> (exists $h{$a} xor exists $h{$b}) ?
EP> exists $h{$b} - exists $h{$a} :
EP> 0
EP> } @ks

and where does that sort the lengths? even though it isn't a real
requirement we all seem to have been using it. when you add that back
you get much more complicated comparisons.

Comparision on lengthes has nothing to do with Jurgen's point. That
comparision is a patch and nothing more.
i haven't even brought up
speed for which the presort key extraction is needed.

Speed? Great, now we are about speed. OP has no clue, talking about
speed with him leads to conclusion you teach him that rude thing of 23
letters (space included).
see my other post
for an example which should work if i typed it cleanly and is simpler,
clearer and faster.

That's your B<Sort::Maker>, your golf, and be it good for you. I'm
afraid that OP didn't get your comments anyway. But clpm is a
discussion group.
 
S

sln

Uri Guttman said:
@sorted = map $_->[0],
sort { $a->[1] <=> $b->[1] }
map [ $_, exists($h{$_}) ? length $h{$_} : 9999999 ],
@unsorted ;

Let me see if I understand what's going on: The last two lines builds
an array of refs. Each ref is a reference to an anonymous array with
[snip]
Thanks to everybody for your answers. They confirm perl's slogan, and
it's nice to see the different approaches. For the time being I will
use &&-expressions and apply a decreasing function to length() so that
the overall comparison of existing keys ends up in ascending order. If
no elements are longer than 9 this actually works:

my @ks = qw(two three one four);
my %h = ( one => 1,
three => 333 );

for (sort { (exists $h{$b} && 10-length $h{$b}) <=>
(exists $h{$a} && 10-length $h{$a}) } @ks) { print "$_\n" }

This is not a very clever idea.
First, using numeric context in the hash.
Given the first, second treating it as a string.
Given the first and second, third is to treat it as a 10 digit string.
Actually, this is not why it's not clever, since you can pass length '$h{$a}' to
force string context
Example:
two => 2e10 in the hash
becomes
two => 20000000000 , length = 11
when it hits the length() function

So as a general rule, sorting by length in numeric context is not a good
idea.

But, this is true. Too many examples to show why. Floating point, exponential,
etc. It loses it's context in what length() is for... really strings.
The fourth is that its not portable.
There is two parts to this, one is forcing non-existing elements to the
bottom, the other to sort the existing ones and bring to the top.
These two parts have to be kept separate for portablity.

for (sort { (exists $h{$b} && $h{$b}) <=>
(exists $h{$a} && $h{$a}) } @ks) { print "$_\n" }

The above won't work in value context, numeric by nature with <=>.
On the other hand, 'exists $h{$b}' is NOT meant to be interpreted as
numeric, but as a string, since 0 can equal 0. But '' is less than '1'
in a compare, but equal length numerically. The two don't mix.
^^^^
This wont work if you decide to sort by value instead of length.

The most portable approach is a scheme like this:
$xa = exists $h{$a};
string context, returns '1' for true, '' for false
length: !($xb & $xa) ? $xb cmp $xa : length $h{$a} <=> length $h{$b}
!($xb & $xa) ? $xb cmp $xa : length $h{$b} <=> length $h{$a}
value: !($xb & $xa) ? $xb cmp $xa : $h{$a} <=> $h{$b}
!($xb & $xa) ? $xb cmp $xa : $h{$b} <=> $h{$a}
^^^^^^^^^^^^^^^
Haven't tried this one, %75 of my theory works, this is the %25 haven't tried.
My confidence is %100 the non-existent will decend to the bottom.
or more complex: !($xb & $xa) ? $xb cmp $xa : func($h{$a},$h{$b})

Any scheme will work as long as there is true separation between parts.


-sln

Hope I cleared some stuff up. I know people will find this knowledge in the
archives someday.

-sln
 
R

Rasmus Villemoes

Uri Guttman said:
RV> Yes, that it is what I was trying to achieve. But I would expect that
RV> the allocation of memory for all these small anonymous arrays would be
RV> rather time-consuming, also.

nope. you need to learn some algorithm theory before you make that
claim. the malloc and key extraction stuff is O(N) which increases
linearly. the actual sort comparisons are O(N log N) which increase much
faster.

Actually, I _do_ know some algorithm theory, and I also know that in
concrete situations with some upper bound on N an O(N log N) algorithm
may be faster than an O(N) algorithm, because of the unmentioned
constants hidden in the O-notation.
so for short data sets, your will be faster

Yes, that was what I thought. I didn't write this, but I know that my
data sets will never have more than around 40 elements (most probably
even shorter). But I will need to do this lots of times over, so I'm
interested in using the method which is fastest for short data
sets. Once I get some real data to work with, I will test the
different methods you and others have suggested.
RV> my @ks = qw(two three one four);
RV> my %h = ( one => 1,
RV> three => 333 );

RV> for (sort { (exists $h{$b} && 10-length $h{$b}) <=>
RV> (exists $h{$a} && 10-length $h{$a}) } @ks) { print "$_\n" }

and you consider that clearer logic and code than the ST version?

No, I didn't say it was clearer. The problem with string lengths is
non-existent in the real case I'm considering, because, as mentioned,
I am really looking at some other properties of $h{$a}.
 
J

Jürgen Exner

Uri Guttman said:
JE> $h{$a} and $h{$b} exist ===> length($h{$a}) <=> length($h{$a})
JE> $h{$a} exists but $h{$b} doesn't ===> -1
JE> $h{$a} does't exist but $h{$b} does ===> 1
JE> Neither $h{$a} nor $h{$b} exists ===> 0 [...]
when you add that back
you get much more complicated comparisons. i haven't even brought up
speed for which the presort key extraction is needed. see my other post
for an example which should work if i typed it cleanly and is simpler,
clearer and faster.

Different approaches do the same problem.

You are favouring reducing/adjusting the data domain such that you can
use standard Perl operators while I favour adding a new comparison
operator to my data algebra, i.e. I have a given data domain and create
the proper comparison operator for that given domain.
To me my approach is much cleaner and simpler because I don't have to
tweak the data set just to make the comparison work. Also, I am not
convinced that your speed argument is correct, but it's really not
important enough to write a big benchmark test.
To everyone his own, I guess.

jue
 
U

Uri Guttman

JE> $h{$a} and $h{$b} exist ===> length($h{$a}) <=> length($h{$a})
JE> $h{$a} exists but $h{$b} doesn't ===> -1
JE> $h{$a} does't exist but $h{$b} does ===> 1
JE> Neither $h{$a} nor $h{$b} exists ===> 0
this is why doing a prefilter on the sort keys makes life much
simpler.
JE> [...]
when you add that back
you get much more complicated comparisons. i haven't even brought up
speed for which the presort key extraction is needed. see my other post
for an example which should work if i typed it cleanly and is simpler,
clearer and faster.

JE> Different approaches do the same problem.

JE> You are favouring reducing/adjusting the data domain such that you can
JE> use standard Perl operators while I favour adding a new comparison
JE> operator to my data algebra, i.e. I have a given data domain and create
JE> the proper comparison operator for that given domain.
JE> To me my approach is much cleaner and simpler because I don't have to
JE> tweak the data set just to make the comparison work. Also, I am not
JE> convinced that your speed argument is correct, but it's really not
JE> important enough to write a big benchmark test.
JE> To everyone his own, I guess.

the prefilter design simplifies the logic no matter how you slice
it. one common bug in multi-key sorts is getting the extraction right
for each key and also keeping the proper order of
comparisons. prefiltering reduces bugs because the extraction is coded
one time and not twice with $a and $b. and it keeps the actual
comparison code shorter as well so it is easier to manage the key order
issues (sort up/down, etc.). as for speed, sort::maker comes with a
benchmark script and the ability to generate a typical sort block like
you have been doing as well as faster versions. it is easy to find where
the breakeven point is for speed. the prefilter design is well known to
be much faster for larger data sets and especially so for multi-key and
complex sorts. nuff said here, i don't need to defend my point as it has
been proven many times.

uri
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,744
Messages
2,569,482
Members
44,901
Latest member
Noble71S45

Latest Threads

Top