domain name ordering

Ivan Shmakov · Feb 28, 2013

One more "domain name" problem: write an "inequality" predicate
to compare domain names "right to left". IOW:

my @ordered
= qw (qux.example.net bar.example.org foo.bar.example.org foo.example.org);

A trivial solution is to split /\./ each of the names, reverse
the resulting lists, and then compare them elementwise, until
either one of the lists ends (which is then the lesser one), or
an inequality is found. Hence:

sub list_cmp (&$$) {
## BTW, how do I imitate sort's own signature here?
## (i. e., make $cmp truly optional.)
my ($cmp, $a, $b) = @_;
$cmp
//= sub { $_[0] cmp $_[1]; };
for (my $i = 0; $i <= $#$a; $i++) {
## .
return 1
if ($i > $#$b);
my $v
= &$cmp ($a->[$i], $b->[$i]);
## .
return $v
if ($v != 0);
}

## .
return ($#$a < $#$b ? -1 : 0);
}

sub dns_name_cmp ($$) {
my ($a, $b) = @_;

## .
list_cmp (undef,
[reverse (split (/\./, $a, -1))],
[reverse (split (/\./, $b, -1))]);
}

## print in reverse
print (join ("\n", sort { - dns_name_cmp ($a, $b); } (@ordered)), "\n");
# foo.example.org
# foo.bar.example.org
# bar.example.org
# qux.example.net

However, doing a split on every sort iteration doesn't seem all
that sensible (or does it?) Let's try with rindex () instead:

sub dns_name_cmp ($$) {
my ($a, $b) = @_;

my ($ta, $tb)
= (-1 + length ($a), -1 + length ($b));
while ($ta >= 0 && $tb >= 0) {
## NB: $[ is deprecated, thus rindex () >= -1
my ($i, $j)
= (rindex ($a, ".", $ta),
rindex ($b, ".", $tb));
## .
my $v
= (substr ($a, 1 + $i, $ta - $i)
cmp substr ($b, 1 + $j, $tb - $j));
return $v
if ($v != 0);
($ta, $tb)
= (-1 + $i, -1 + $j);
}

## .
return ($ta > 0 ? +1
: $tb > 0 ? -1
: 0);
}

## print in reverse
print (join ("\n", sort { - dns_name_cmp ($a, $b); } (@ordered)), "\n");
# foo.example.org
# foo.bar.example.org
# bar.example.org
# qux.example.net

Hopefully I didn't miss any corner case with these.

Now, it makes me wonder if there's an easier way to write this
function...

Dr.Ruud · Feb 28, 2013

One more "domain name" problem: write an "inequality" predicate
to compare domain names "right to left". IOW:

my @ordered
= qw (qux.example.net bar.example.org foo.bar.example.org foo.example.org);

You can build a hash from that, for example:

my %dname = (
net => {
example => { qux => { '.' => -1 } },
},
org => {
example => {
bar => {
foo => { '.' => -1 },
'.' => -1,
},
foo => { '.' => -1 },
},
},
);

Or do
my @dnames= sort { length($b) <=> length($a) or $a cmp $b } @ordered;
to make it:
qw(foo.bar.x.org bar.x.org foo.x.org qux.x.net);
and use a 'for my $dname (@dnames) { ... } loop
that stops at the first match on
/(?:^|\.)\Q$dname$/

Bo Lindbergh · Feb 28, 2013

Transform each FQDN into something that will sort correctly
using the default string comparison; sort; reverse the transformation.

@ordered = map {
join ".", reverse split / /;
} sort map {
join " ", reverse split /\./;
} @unordered;

/Bo Lindbergh

Rainer Weikusat · Feb 28, 2013

Ivan Shmakov said:
One more "domain name" problem: write an "inequality" predicate
to compare domain names "right to left". IOW:

my @ordered
= qw (qux.example.net bar.example.org foo.bar.example.org foo.example.org);

A trivial solution is to split /\./ each of the names, reverse
the resulting lists, and then compare them elementwise, until
either one of the lists ends (which is then the lesser one), or
an inequality is found. Hence:

sub list_cmp (&$$) {
[...]

However, doing a split on every sort iteration doesn't seem all
that sensible (or does it?) Let's try with rindex () instead:

sub dns_name_cmp ($$) {
my ($a, $b) = @_;

my ($ta, $tb)
= (-1 + length ($a), -1 + length ($b));

-1 + length($a) == length($a) - 1

while ($ta >= 0 && $tb >= 0) {
## NB: $[ is deprecated, thus rindex () >= -1
my ($i, $j)
= (rindex ($a, ".", $ta),
rindex ($b, ".", $tb));
## .
my $v
= (substr ($a, 1 + $i, $ta - $i)
cmp substr ($b, 1 + $j, $tb - $j));

Have you benchmarked that? You're still doing the structure analysis
and component extraction for ever comparison and out of my head, I
wouldn't be absolutely sure that doing it this way instead of with
split is actually faster.

The obvious other idea would be to split the domain names once and
leave them this way until the list is sorted. If you could afford to
lose some speed at the expense of code clarity, this could be put in a
class overloading cmp which either constructs the 'non-splitted,
non-reversed' representation 'on demand' (=> overloaded "") or keeps
both around or constructs the 'exploded' one on demand.

Instead of reversing the splitted lists, one could also compare from
the end, possibly using negative indices ($a[$#a] is the same as $a[-1]).

Rainer Weikusat · Feb 28, 2013

Bo Lindbergh said:
Transform each FQDN into something that will sort correctly
using the default string comparison; sort; reverse the transformation.

@ordered = map {
join ".", reverse split / /;
} sort map {
join " ", reverse split /\./;
} @unordered;

This is a neat idea. It could be made slightly more general by using
\000 instead of the space as alternate label separator.

Rainer Weikusat · Feb 28, 2013

Rainer Weikusat said:
This is a neat idea. It could be made slightly more general by using
\000 instead of the space as alternate label separator.

A 'better' ('faster') idea which needs even less Perl code would be to
build a hash of sort keys and sort the domain name array based on
comparing these instead of destroying and recreating the dataset which
is supposed to be sorted:

------------------
use Benchmark;

my @in = reverse(qw(qux.example.net bar.example.org foo.bar.example.org foo.example.org));

timethese(-5,
{
sort_x => sub {
return map {
join '.', reverse split /\000/
} sort map {
join "\000", reverse split /\./
} @in;
},

sort_y => sub {
my %keys;

$keys{$_} = join("\000", reverse(split(/\./, $_))) for @in;
return sort { $keys{$a} cmp $keys{$b} } @in;
}});

Ivan Shmakov · Feb 28, 2013

[...]

A 'better' ('faster') idea which needs even less Perl code would be
to build a hash of sort keys and sort the domain name array based on
comparing these instead of destroying and recreating the dataset
which is supposed to be sorted:

[...]

Unless I be mistaken, sort { } () with the second variant of
dns_name_cmp I've originally posted is still faster.

$ cat < 3zytr64ehn1aqckj5mk93da6zg.perl
use common::sense;

require Benchmark;

## Courtesy of /usr/share/dict/words
my @n1 = qw {
igloo.burgs.tuft leads.polyp.lakes doily.rude.corks rains.cuds.bilks
aces.mumps.colts pink.bebop.prune coeds.troop.sees tour.hulks.rungs
vent.dial.whelk logos.loped.nor mated.bran.cited pies.loaf.rune
sperm.yelp.harms how.scaly.ante dogma.lawns.plods dell.yaw.piano
caged.daub.gores weird.sofas.arias rave.gouty.tamp sics.avows.today
crass.execs.smuts dale.real.stack boots.dew.pool demur.fancy.stoop
gruff.aspen.memo gaze.homey.sorts new.slot.queue wiry.share.gulf
phase.canny.vogue grace.eddy.beaux juts.wands.doc helm.was.ladle
};
my @n2 = qw {
bums.example.com fools.example.com sic.example.org feel.example.com
wound.example.net stop.example.com diced.example.org
purse.example.com sip.example.net poll.example.com whole.example.net
kinky.example.net homer.example.org ulnas.example.com
won.example.org chase.example.com prong.example.net
hacks.example.net link.example.com monk.example.com sexy.example.net
bogs.example.net crest.example.org rays.example.com
solar.example.com click.example.net mixer.example.com
rib.example.org pinup.example.com gees.example.net fizzy.example.com
aisle.example.org
};

## sub dns_name_cmp { ... as per ... }

sub rw_sort (@) {
my %keys;

$keys{$_}
= join ("\000", reverse (split (/\./, $_)))
for @_;
## .
return
sort { $keys{$a} cmp $keys{$b} } @_;
}

Benchmark::timethese (1024 * 1024, {
"is-1" => sub { sort dns_name_cmp (@n1); },
"is-2" => sub { sort dns_name_cmp (@n2); },
"rw-1" => sub { rw_sort (@n1); },
"rw-2" => sub { rw_sort (@n1); }
});
$ perl 3zytr64ehn1aqckj5mk93da6zg.perl
Benchmark: timing 1048576 iterations of is-1, is-2, rw-1, rw-2...
is-1: -1 wallclock secs ( 0.03 usr + 0.00 sys = 0.03 CPU) @ 34952533.33/s (n=1048576)
(warning: too few iterations for a reliable count)
is-2: -1 wallclock secs ( 0.16 usr + 0.00 sys = 0.16 CPU) @ 6553600.00/s (n=1048576)
(warning: too few iterations for a reliable count)
rw-1: 69 wallclock secs (69.14 usr + 0.03 sys = 69.17 CPU) @ 15159.40/s (n=1048576)
rw-2: 69 wallclock secs (68.61 usr + 0.02 sys = 68.63 CPU) @ 15278.68/s (n=1048576)
$

Rainer Weikusat · Feb 28, 2013

Ben Morrow said:
Quoth Rainer Weikusat said:

Rainer Weikusat said:

Transform each FQDN into something that will sort correctly
using the default string comparison; sort; reverse the transformation.

@ordered = map {
join ".", reverse split / /;
} sort map {
join " ", reverse split /\./;
} @unordered;

This is a neat idea. It could be made slightly more general by using
\000 instead of the space as alternate label separator.

Click to expand...

A 'better' ('faster') idea which needs even less Perl code would be to
build a hash of sort keys and sort the domain name array based on
comparing these instead of destroying and recreating the dataset which
is supposed to be sorted: [...]
my %keys;

$keys{$_} = join("\000", reverse(split(/\./, $_))) for @in;
return sort { $keys{$a} cmp $keys{$b} } @in;

Click to expand...

Or just use a standard ST

map $$_[0],
sort { $$a[1] cmp $$b[1] }
map [ $_, join "\0", reverse split /\./ ],
@in;

I tried that. It is/ was slower for the data I tested with than the
algorithm originally posted by Bo Lindbergh.

Rainer Weikusat · Feb 28, 2013

Ivan Shmakov said:
Rainer Weikusat <[email protected]> writes:

Click to expand...

[...]

A 'better' ('faster') idea which needs even less Perl code would be
to build a hash of sort keys and sort the domain name array based on
comparing these instead of destroying and recreating the dataset
which is supposed to be sorted:

Click to expand...

[...]

Unless I be mistaken, sort { } () with the second variant of
dns_name_cmp I've originally posted is still faster.
[...]

sub rw_sort (@) {
my %keys;

$keys{$_}
= join ("\000", reverse (split (/\./, $_)))
for @_;
## .
return
sort { $keys{$a} cmp $keys{$b} } @_;
}

Benchmark::timethese (1024 * 1024, {
"is-1" => sub { sort dns_name_cmp (@n1); },
"is-2" => sub { sort dns_name_cmp (@n2); },
"rw-1" => sub { rw_sort (@n1); },
"rw-2" => sub { rw_sort (@n1); }
});

The obvious issue with this would be that you're adding a gratuitious
subroutine call per iteration for the 3rd and 4th test. But that
doesn't matter that much: Comparing the labels via substring and
stopping at the first mismatch is several orders of magnitude faster
then processing the complete data once per sort and sorting based on
the 'preprocessed' data.

Jürgen Exner · Feb 28, 2013

Ivan Shmakov said:
However, doing a split on every sort iteration doesn't seem all
that sensible (or does it?) [...]
Now, it makes me wonder if there's an easier way to write this
function...

Standard answer to this whole class of problem is to use a Schwartzian
Transformation: http://en.wikipedia.org/wiki/Schwartzian_transform

jue

Rainer Weikusat · Mar 1, 2013

[...]

Actually, now that I think about it, there's another way of doing this I
haven't seen anywhere else: use dualvars.

use Scalar::Util "dualvar";

sub ksort (&@) {
my $cb = shift;

map $_[$_],
sort
map dualvar($_, do {
local $_ = $_[$_];
$cb->();
}),
0..$#_;
}

my @out = ksort { join "\0", reverse split /\./ } @in;

This is a neat idea as well. I thought about attaching the indices of
the actual data items to the corresponding sort keys and generating
the sorted sequence by pulling data items from the input array based
on the index permutation in the sorted, intermediate array 'somehow'
but abandoned the idea in favor of using anonymous arrays to tack sort
key and data item together (I knew that something called 'a
Schwartzian transform' existed but had not real idea what it actually
was) because I wasn't aware of a sensible way to accomplish the other.

John W. Krahn · Mar 3, 2013

Ivan said:
One more "domain name" problem: write an "inequality" predicate
to compare domain names "right to left". IOW:

my @ordered
= qw (qux.example.net bar.example.org foo.bar.example.org foo.example.org);

A trivial solution is to split /\./ each of the names, reverse
the resulting lists, and then compare them elementwise, until
either one of the lists ends (which is then the lesser one), or
an inequality is found. Hence:

sub list_cmp (&$$) {
## BTW, how do I imitate sort's own signature here?
## (i. e., make $cmp truly optional.)

You mean prototype, not signature.

$ perl -le'print prototype "CORE::sort"'

But sort doesn't have one so you can't imitate something that is not there.

John

Calculate rang and derang of ordering of subsets	1	Feb 6, 2024
Calculate rang and derang of ordering of subsets	0	Feb 6, 2024
How to put loop result in csv file	1	Jan 2, 2023
Could you fix my code please? (i get no output after inputing the number)	1	Oct 16, 2023
The following palindrome number	2	Jan 10, 2023
Blue J Ciphertext Program	2	Nov 22, 2023
How to multiply two matrices of size in using inline assembly in C++	3	Mar 3, 2024
Problems in creating libraries	0	Nov 3, 2024

domain name ordering

Ivan Shmakov

Dr.Ruud

Bo Lindbergh

Rainer Weikusat

Rainer Weikusat

Rainer Weikusat

Ivan Shmakov

Rainer Weikusat

Rainer Weikusat

Jürgen Exner

Rainer Weikusat

John W. Krahn

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads