Sorting based on existence of keys


sln

JE> $h{$a} and $h{$b} exist ===> length($h{$a}) <=> length($h{$b})
JE> $h{$a} exists but $h{$b} doesn't ===> -1
JE> $h{$a} doesn't exist but $h{$b} does ===> 1
JE> Neither $h{$a} nor $h{$b} exists ===> 0
this is why doing a prefilter on the sort keys makes life much
simpler.

JE> [...]

when you add that back
you get much more complicated comparisons. i haven't even brought up
speed for which the presort key extraction is needed. see my other post
for an example which should work if i typed it cleanly and is simpler,
clearer and faster.
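A minimal sketch of what a prefiltered sort for this case could look like. This is not the example from the other post (which is not reproduced in this thread); the hash contents, the key list and the two-field key layout are assumptions made up for illustration. Each element's sort key is extracted exactly once, so the sort block itself only compares plain numbers:

```perl
use strict;
use warnings;

# Hypothetical sample data: only some of the keys being sorted exist in %h.
my %h    = ( apple => 'xxx', fig => 'x', kiwi => 'xx' );
my @keys = qw(kiwi missing1 apple fig missing2);

my @sorted =
    map  { $_->[2] }                                        # 3. unwrap the original key
    sort { $a->[0] <=> $b->[0] || $a->[1] <=> $b->[1] }     # 2. compare extracted numbers only
    map  {                                                  # 1. extract the sort key once
        exists $h{$_}
            ? [ 0, length $h{$_}, $_ ]    # existing keys: group 0, ordered by value length
            : [ 1, 0,             $_ ]    # missing keys: group 1, all equal to each other
    } @keys;

print "@sorted\n";    # existing keys by value length, then the missing keys
```

The existence test appears once, in the extraction step, instead of once for $a and once for $b inside the comparison.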

JE> Different approaches to the same problem.

JE> You are favouring reducing/adjusting the data domain such that you can
JE> use standard Perl operators while I favour adding a new comparison
JE> operator to my data algebra, i.e. I have a given data domain and create
JE> the proper comparison operator for that given domain.
JE> To me my approach is much cleaner and simpler because I don't have to
JE> tweak the data set just to make the comparison work. Also, I am not
JE> convinced that your speed argument is correct, but it's really not
JE> important enough to write a big benchmark test.
JE> To everyone his own, I guess.

the prefilter design simplifies the logic no matter how you slice
it. one common bug in multi-key sorts is getting the extraction right
for each key and also keeping the proper order of
comparisons. prefiltering reduces bugs because the extraction is coded
one time and not twice with $a and $b. and it keeps the actual
comparison code shorter as well so it is easier to manage the key order
issues (sort up/down, etc.). as for speed, Sort::Maker comes with a
benchmark script and the ability to generate a typical sort block like
you have been doing as well as faster versions. it is easy to find where
the breakeven point is for speed. the prefilter design is well known to
be much faster for larger data sets and especially so for multi-key and
complex sorts. nuff said here, i don't need to defend my point as it has
been proven many times.
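One way to look at the extraction cost without a full benchmark is to count how often the key extraction runs under each approach. The sketch below does only that; it does not measure wall-clock time, and the presort version has its own overhead (building and unwrapping the temporary arrays) that a count does not show. The data set, the sort_key helper and the 1e9 sentinel for missing keys are all assumptions made up for this illustration:

```perl
use strict;
use warnings;

# Hypothetical data: 200 keys, roughly a third of them absent from %h.
my @keys = map( "k$_", 1 .. 200 );
my %h;
for my $i ( grep { $_ % 3 } 1 .. 200 ) {
    $h{"k$i"} = 'x' x ( $i % 7 );
}

my ( $inline_calls, $presort_calls ) = ( 0, 0 );

# Hypothetical helper: the key extraction both approaches share.
# Missing keys get a sentinel larger than any length in this data set.
sub sort_key {
    my ( $k, $counter ) = @_;
    $$counter++;
    return exists $h{$k} ? length $h{$k} : 1e9;
}

# Extraction inside the sort block: runs twice per comparison.
my @inline =
    sort { sort_key( $a, \$inline_calls ) <=> sort_key( $b, \$inline_calls ) } @keys;

# Extraction before the sort: runs once per element.
my @presort =
    map  { $_->[1] }
    sort { $a->[0] <=> $b->[0] }
    map  { [ sort_key( $_, \$presort_calls ), $_ ] }
    @keys;

print "extractions inside the sort block: $inline_calls\n";
print "extractions before the sort:       $presort_calls\n";
```

The presort count always equals the number of elements, while the in-block count grows with the number of comparisons the sort performs.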

uri

You keep saying 'Pre-Filter' as if it actually means anything in relation to sort.
This is a particular case of sort implementation that has specific logic. There are no
multiple sort keys/fields or whatever in this case.

The logic still has to be adhered to in as far as this boolean:
$h{$a} and $h{$b} exist ===> length($h{$a}) <=> length($h{$b})
$h{$a} exists but $h{$b} doesn't ===> -1
$h{$a} doesn't exist but $h{$b} does ===> 1
Neither $h{$a} nor $h{$b} exists ===> 0
(I didn't check it, but it looks right.)
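Written out as a comparison sub, the four cases above could look something like this. It is only a sketch: the hash contents, the key list and the sub name are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample data: only some of the keys being sorted exist in %h.
my %h    = ( apple => 'xxx', fig => 'x', kiwi => 'xx' );
my @keys = qw(kiwi missing1 apple fig missing2);

# Four-case comparison: existing keys sort first, ordered by value length;
# keys absent from %h compare equal to each other and sort last.
sub by_existence_then_length {
    my ( $ea, $eb ) = ( exists $h{$a}, exists $h{$b} );
    return length( $h{$a} ) <=> length( $h{$b} ) if $ea && $eb;    # both exist
    return -1 if $ea;                                              # only $a exists
    return 1  if $eb;                                              # only $b exists
    return 0;                                                      # neither exists
}

my @sorted = sort by_existence_then_length @keys;
print "@sorted\n";    # fig kiwi apple, then the two missing keys
```

The relative order of the two missing keys is left open here, since they compare equal.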

It doesn't matter if you do it as a pre-filter or all at once. It may look cleaner
as a pre-filter, but that don't count for squat as far as speed. The same number of
operations have to be performed no matter where the logic is.

As far as multi-key/fields sorting, doing it in one pass, regardless of the sort method,
always is faster. Fewer runs through the function or block.

-sln
 
