a new Tree::Interval-like module on top of Tree::RB?


I

Ivan Shmakov

So, I've crafted the base for a Tree::Interval-like module,
implemented on top of Tree::RB, and I'm open to suggestions of
what'd be a suitable name for it?

The obvious candidates are Tree::RB::Interval and
Tree::Interval::RB, but I'm uncertain if it'd be nice to
infiltrate someone's else CPAN namespace in this case.

FWIW, Tree::Interval::RB seems more appropriate, as the code as
it is requires only the compare, delete, insert and lookup
methods from the "backend," and thus could probably be not that
hard to generalize. (Although the lookup method is expected to
allow for "imprecise" matching, and then it demands a way to
iterate onwards from the node found...)

A nice feature of this code is that it relies on the tree's own
"compare" function, which could be rather arbitrary. Thus, it's
possible to, say:

my $map
= XXX->new (...);
$map->set ("from", "to", "value");
my $a
= $map->lookup ("fuzzy");
## => "value"

(provided that the tree is built around the "cmp" function.)

TIA.
 
Ad

Advertisements

I

Ivan Shmakov

Ivan Shmakov said:
So, I've crafted the base for a Tree::Interval-like module,
implemented on top of Tree::RB, and I'm open to suggestions of what'd
be a suitable name for it?
[...]

FWIW, Tree::Interval::RB seems more appropriate, as the code as it is
requires only the compare, delete, insert and lookup methods from the
"backend," and thus could probably be not that hard to generalize.

So, I've ended up with Tree::Range::RB, and Tree::Range::base
for the ->get_range () and ->range_set () methods. Both are now
released as Tree::Range 0.1. The CPAN page for the version is:

http://search.cpan.org/~onegray/Tree-Range-0.1/

[...]

PS. The namespace registration request is filed and pending approval.
 
I

Ivan Shmakov

The namespace registration process is pretty-much defunct at this
point. Both Tree::Range::RB and Tree::Range::base are already as
'properly registered' as they need to be; that is, they are listed in
02packages.details.txt.gz on CPAN, so clients like CPAN.pm and cpanm
will find your distributions for those package names.

Yes.

Still, it's recommended by [1] (though feel free to file a bug
report if you think it shouldn't be):

[...] A registration is not a prerequisite for uploading. It is
just recommended for better searchability of the CPAN and to avoid
namespace clashes. [...]

Besides, it'll add some useful bits of information to [2].

[1] https://pause.perl.org/pause/authenquery?ACTION=apply_mod
[2] http://search.cpan.org/~onegray/
However, it goes against convention to have a distribution called
Tree-Range that does not contain a module Tree::Range. You should
consider either renaming Tree::Range::base

Won't this go against the example set by Digest::base? (which I
tend to think as a good enough to stick to.)
or at least providing a documentation-only Tree::Range explaining why
there isn't a module there.

I guess I'd consider the latter for 0.2. (Which I hope'll be
the release fixing only the documentation and the metadata.)

The other option would be to have Tree::Range->new () behave
like Digest->new ().
 
I

Ivan Shmakov

Ben Morrow said:
[...]
Still, it's recommended by [1] (though feel free to file a bug
report if you think it shouldn't be):
[...] A registration is not a prerequisite for uploading. It is
just recommended for better searchability of the CPAN and to avoid
namespace clashes. [...]
See <https://github.com/Perl-Toolchain-Gang/toolchain-site/blob/
master/lancaster-consensus.md#module-registration> (and indeed the
rest of that document). While this is a 'new' agreement, in the Perl
world agreements and specs usually come after implementation, so it
is describing the way things currently work in practice.

So, I guess [1] is going to be updated within a few months from
now. Thanks.
[...]
Won't this go against the example set by Digest::base? (which I tend
to think as a good enough to stick to.)
It's not a convention I've seen before, but there's nothing wrong
with it. (It would be more usual to be ::Base, at least; the analogy
with base.pm is not really relevant any more.)

AIUI, the "capital-case" names under Digest:: are reserved for
the actual message digest algorithms. For instance,
Digest->new ("SHA-1") ends up in a call to Digest::SHA->new (1).
or at least providing a documentation-only Tree::Range explaining
why there isn't a module there.
I guess I'd consider [it] for 0.2. (Which I hope'll be the release
fixing only the documentation and the metadata.)
The other option would be to have Tree::Range->new () behave like
Digest->new ().
Yeah, perhaps. I haven't really got a sense of what this module's
useful for, so I can't comment.

One of the possible uses of Tree::Range::RB specifically is to
operate on .newsrc data. Consider, e. g.:

### gxmj7tsetytsnbnp85yhxyjror.perl -*- Perl -*-

### Ivan Shmakov, 2013

### Code:

use common::sense;

require Tree::Range::RB;

sub ranges_parse {
my ($in, @ranges) = @_;
foreach my $range (@ranges) {
my ($a, $b)
= ($range =~ /^([0-9]+)(?:-([0-9]+))?$/)
or die ($range, ": is not a vaild range");
$in->range_set ($a, 1 + ($b // $a), 1);
}
## .
$in;
}

my @comp_lang_perl_misc_s = qw {
1-27551 27567 27575 27602-27606 27627-27629 27632 27634-27636
27638-27996
27998-28002 28004 28006-28008 28011-28013 28015-28027 28029 28032
28036-28093
};

my $comp_lang_perl_misc
= Tree::Range::RB->new ({ "cmp" => sub { $_[0] <=> $_[1]; },
"equal-p" => sub { $_[0] eq $_[1]; } });
ranges_parse ($comp_lang_perl_misc, @comp_lang_perl_misc_s);

{
local ($,, $\)
= (", ", "\n");

## check if 28007 was read? (i. e., what range 28007 belongs to?)
print ($comp_lang_perl_misc->get_range (28007));
## => 1, 28006, 28009

## now the user has marked some more articles as read
ranges_parse ($comp_lang_perl_misc, qw (28005 28009 28010));

## check if 28011 was read? (i. e., what range 28011 belongs to?)
print ($comp_lang_perl_misc->get_range (28011));
## => 1, 28004, 28014

## note how the range was extended as the gaps were filled
}

### gxmj7tsetytsnbnp85yhxyjror.perl ends here

Similarly, by utilizing the Tree::RB's own iterator, a simple
and efficient serializer for this kind of data can be built.
 
I

Ivan Shmakov

Ben Morrow said:
One of the possible uses of Tree::Range::RB specifically is to
operate on .newsrc data. Consider, e. g.:
[...]
my $comp_lang_perl_misc
= Tree::Range::RB->new ({ "cmp" => sub { $_[0] <=> $_[1]; },
"equal-p" => sub { $_[0] eq $_[1]; } });
[...]
print ($comp_lang_perl_misc->get_range (28007));
## => 1, 28006, 28009
[...]

Is this importantly different from Set::IntSpan/::Fast/::XS?

Yes, although not in the case of .newsrc.

First of all, each range (span, interval, etc.) may have an
associated value. It's "1" (or "undef", for the "complementary"
ranges) in the example above, but it needs not to be.

Moreover, unlike Set::IntSpan, Tree::Range::RB is applicable to
an arbitrary ordered set. (Note the "cmp" option above.)

Also, having Set::IntSpan (and Set::IntSpan::Fast, as per the
documentation) implemented on top of an ordered list seems to
imply the insertion and deletion times of O (N), while for a
red-black tree they're both O (log N). Though ::XS may still
outperform a pure-Perl tree-based implementation on reasonable
use cases.
 
Ad

Advertisements

I

Ivan Shmakov

[...]
Also, having Set::IntSpan (and Set::IntSpan::Fast, as per the
documentation) implemented on top of an ordered list seems to imply
the insertion and deletion times of O (N), while for a red-black tree
they're both O (log N). Though ::XS may still outperform a pure-Perl
tree-based implementation on reasonable use cases.

So, I've ran a simplistic benchmark over the three, which is,
roughly (the exact code is as shown below):

# my ($size, $iter) = (4194304, 5);
my @vec
= (map { int (rand ($size)); } (1 .. $size));

my $set
= Set::IntSpan::Fast->new ();
$set->add ($_)
foreach (@vec);

It took awhile to complete, but the results seem to suggest that
Tree::Range::RB is indeed faster than both Set::IntSpan::Fast
and Set::IntSpan, should the problem size be sufficiently large.

Size: 4194304 Iterations: 5 (negative is time in seconds)
Tree::Range::RB: 8003 wallclock secs (7944.34 usr 5.88 sys + 0.00 cusr 0.00 csys = 7950.22 CPU) @ 0.00/s (n=5)
Set::IntSpan: 17759 wallclock secs (17556.78 usr 10.12 sys + 0.00 cusr 0.00 csys = 17566.90 CPU) @ 0.00/s (n=5)
Set::IntSpan::Fast: 19318 wallclock secs (19060.33 usr 9.25 sys + 0.00 cusr 0.00 csys = 19069.58 CPU) @ 0.00/s (n=5)
s/iter Set::IntSpan::Fast Set::IntSpan Tree::Range::RB
Set::IntSpan::Fast 3814 -- -8% -58%
Set::IntSpan 3513 9% -- -55%
Tree::Range::RB 1590 140% 121% --
44565.74user 26.12system 12:31:28elapsed 98%CPU (0avgtext+0avgdata 972212maxresident)k
224inputs+0outputs (1major+316600minor)pagefaults 0swaps

Or did I make some silly mistake with that?

TIA.

### Code:

use common::sense;

use Benchmark qw (cmpthese timethis);
use Scalar::Util qw (looks_like_number);

require Set::IntSpan;
require Set::IntSpan::Fast;
require Tree::Range::RB;

my ($size, $iter)
= (map { (/^0/ ? oct ($_) : $_); } (@ARGV));
$size
//= 0x100000;
$iter
//= -521;
looks_like_number ($size)
or die ($size, ": Size given is not a number\n");
looks_like_number ($iter)
or die ($iter, ": Iterations count given is not a number\n");
warn ("Size: ", 0 + $size, " Iterations: ", 0 + $iter,
" (negative is time in seconds)\n");
my @vec
= (map { int (rand ($size)); } (1 .. $size));

sub xcmp {
## .
$_[0] <=> $_[1];
}

sub xeq {
## .
$_[0] eq $_[1];
}

my $trr
= timethis ($iter,
sub {
my $rat_opt = {
"cmp" => \&xcmp,
"equal-p" => \&xeq
};
my $value
= [ "*value*" ];
my $rat
= Tree::Range::RB->new ($rat_opt);
$rat->range_set ($_, 1 + $_, $value)
foreach (@vec);
},
"Tree::Range::RB", "all");
my $sis
= timethis ($iter,
sub {
my $set
= Set::IntSpan->new ();
$set->insert ($_)
foreach (@vec);
},
"Set::IntSpan", "all");
my $sisf
= timethis ($iter,
sub {
my $set
= Set::IntSpan::Fast->new ();
$set->add ($_)
foreach (@vec);
},
"Set::IntSpan::Fast", "all");

cmpthese ({
"Set::IntSpan" => $sis,
"Set::IntSpan::Fast" => $sisf,
"Tree::Range::RB" => $trr
},
"all");
 
Ad

Advertisements


Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top