Sorting based on existence of keys

sln · Feb 23, 2009

JE> $h{$a} and $h{$b} exist ===> length($h{$a}) <=> length($h{$b})
JE> $h{$a} exists but $h{$b} doesn't ===> -1
JE> $h{$a} doesn't exist but $h{$b} does ===> 1
JE> Neither $h{$a} nor $h{$b} exists ===> 0

this is why doing a prefilter on the sort keys makes life much
simpler.
JE> [...]
when you add that back
you get much more complicated comparisons. i haven't even brought up
speed for which the presort key extraction is needed. see my other post
for an example which should work if i typed it cleanly and is simpler,
clearer and faster.

JE> Different approaches to the same problem.

JE> You are favouring reducing/adjusting the data domain such that you can
JE> use standard Perl operators while I favour adding a new comparison
JE> operator to my data algebra, i.e. I have a given data domain and create
JE> the proper comparison operator for that given domain.
JE> To me my approach is much cleaner and simpler because I don't have to
JE> tweak the data set just to make the comparison work. Also, I am not
JE> convinced that your speed argument is correct, but it's really not
JE> important enough to write a big benchmark test.
JE> To everyone his own, I guess.

the prefilter design simplifies the logic no matter how you slice
it. one common bug in multi-key sorts is getting the extraction right
for each key and also keeping the proper order of
comparisons. prefiltering reduces bugs because the extraction is coded
one time and not twice with $a and $b. and it keeps the actual
comparison code shorter as well so it is easier to manage the key order
issues (sort up/down, etc.). as for speed, Sort::Maker comes with a
benchmark script and the ability to generate a typical sort block like
you have been doing as well as faster versions. it is easy to find where
the breakeven point is for speed. the prefilter design is well known to
be much faster for larger data sets and especially so for multi-key and
complex sorts. nuff said here, i don't need to defend my point as it has
been proven many times.
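
For readers who have not seen the prefilter (presort key extraction) idea, here is a minimal sketch of what it might look like for this problem. It is an illustration only, not the code from the other post; the sample data in %h and @keys is made up, and it assumes keys missing from the hash should sort after the ones that exist.

```perl
use strict;
use warnings;

# Hypothetical sample data: %h holds values for only some of the keys.
my %h    = ( apple => 'xx', kiwi => 'x', plum => 'xxxx' );
my @keys = qw(plum fig apple date kiwi);

# Extract the sort key once per element: [ name, exists-flag, length ].
# Keys missing from %h get flag 0 so they sort after the existing ones.
my @sorted =
    map  { $_->[0] }
    sort { $b->[1] <=> $a->[1] or $a->[2] <=> $b->[2] }
    map  { [ $_, exists $h{$_} ? ( 1, length $h{$_} ) : ( 0, 0 ) ] }
    @keys;

print "@sorted\n";
```

The extraction logic appears once, in the last map, and the comparison itself only deals with plain numbers.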

uri

You keep saying 'pre-filter' as if it actually means anything in relation to sort.
This is a particular case of sort implementation that has specific logic. There are no
multiple sort keys or fields in this case.

The logic still has to be adhered to, namely these cases:
$h{$a} and $h{$b} exist ===> length($h{$a}) <=> length($h{$b})
$h{$a} exists but $h{$b} doesn't ===> -1
$h{$a} doesn't exist but $h{$b} does ===> 1
Neither $h{$a} nor $h{$b} exists ===> 0
(I didn't check it, but it looks right.)
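
Written out as a sort subroutine, those four cases might look like the sketch below. The name by_existence and the sample data are made up for illustration, and the both-exist case compares length($h{$a}) against length($h{$b}).

```perl
use strict;
use warnings;

# Hypothetical sample data: %h holds values for only some of the keys.
my %h    = ( apple => 'xx', kiwi => 'x', plum => 'xxxx' );
my @keys = qw(plum fig apple date kiwi);

# by_existence is a made-up name for the four-case comparison.
sub by_existence {
    my ( $ea, $eb ) = ( exists $h{$a}, exists $h{$b} );
    return length( $h{$a} ) <=> length( $h{$b} ) if $ea && $eb;
    return -1 if $ea;    # only $h{$a} exists
    return  1 if $eb;    # only $h{$b} exists
    return  0;           # neither exists
}

my @sorted = sort by_existence @keys;
print "@sorted\n";
```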

It doesn't matter if you do it as a pre-filter or all at once. It may look cleaner
as a pre-filter, but that doesn't count for squat as far as speed. The same number of
operations has to be performed no matter where the logic is.
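
Anyone who wants to check the operation count on their own data could instrument both versions along these lines. This is only a sketch: key_of and cmp_keys are hypothetical helpers, the data is made up, and it counts key extractions only, not wall-clock time.

```perl
use strict;
use warnings;

# Made-up data: only odd keys exist in %h; the value length equals the key.
my %h    = map { ( $_ => 'x' x $_ ) } grep { $_ % 2 } 1 .. 100;
my @keys = reverse 1 .. 100;

my ( $inline_calls, $presort_calls ) = ( 0, 0 );

# Hypothetical helper: returns [ exists-flag, length ] and bumps a counter.
sub key_of {
    my ( $k, $counter ) = @_;
    $$counter++;
    return exists $h{$k} ? [ 1, length $h{$k} ] : [ 0, 0 ];
}

# Hypothetical helper: existing keys first, then shorter values first.
sub cmp_keys {
    my ( $x, $y ) = @_;
    return $y->[0] <=> $x->[0] || $x->[1] <=> $y->[1];
}

# All at once: keys are extracted inside every comparison.
my @inline =
    sort { cmp_keys( key_of( $a, \$inline_calls ), key_of( $b, \$inline_calls ) ) }
    @keys;

# Pre-filter: keys are extracted once per element before sorting.
my @presort =
    map  { $_->[0] }
    sort { cmp_keys( $a->[1], $b->[1] ) }
    map  { [ $_, key_of( $_, \$presort_calls ) ] }
    @keys;

print "extractions inline: $inline_calls, pre-filter: $presort_calls\n";
```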

As far as multi-key/field sorting goes, doing it in one pass, regardless of the sort method,
is always faster. Fewer runs through the function or block.

-sln
