a new Tree::Interval-like module on top of Tree::RB?

Discussion in 'Perl Misc' started by Ivan Shmakov, Jun 13, 2013.

  1. Ivan Shmakov

    Ivan Shmakov Guest

    So, I've crafted the base for a Tree::Interval-like module,
    implemented on top of Tree::RB, and I'm open to suggestions of
    what'd be a suitable name for it?

    The obvious candidates are Tree::RB::Interval and
    Tree::Interval::RB, but I'm uncertain if it'd be nice to
    infiltrate someone's else CPAN namespace in this case.

    FWIW, Tree::Interval::RB seems more appropriate, as the code as
    it is requires only the compare, delete, insert and lookup
    methods from the "backend," and thus could probably be not that
    hard to generalize. (Although the lookup method is expected to
    allow for "imprecise" matching, and then it demands a way to
    iterate onwards from the node found...)

    A nice feature of this code is that it relies on the tree's own
    "compare" function, which could be rather arbitrary. Thus, it's
    possible to, say:

    my $map
    = XXX->new (...);
    $map->set ("from", "to", "value");
    my $a
    = $map->lookup ("fuzzy");
    ## => "value"

    (provided that the tree is built around the "cmp" function.)

    TIA.

    --
    FSF associate member #7257 np. faceday.mod
    Ivan Shmakov, Jun 13, 2013
    #1
    1. Advertising

  2. Ivan Shmakov

    Ivan Shmakov Guest

    Tree::Range 0.1 released

    >>>>> Ivan Shmakov <> writes:

    > So, I've crafted the base for a Tree::Interval-like module,
    > implemented on top of Tree::RB, and I'm open to suggestions of what'd
    > be a suitable name for it?


    [...]

    > FWIW, Tree::Interval::RB seems more appropriate, as the code as it is
    > requires only the compare, delete, insert and lookup methods from the
    > "backend," and thus could probably be not that hard to generalize.


    So, I've ended up with Tree::Range::RB, and Tree::Range::base
    for the ->get_range () and ->range_set () methods. Both are now
    released as Tree::Range 0.1. The CPAN page for the version is:

    http://search.cpan.org/~onegray/Tree-Range-0.1/

    [...]

    PS. The namespace registration request is filed and pending approval.

    --
    FSF associate member #7257
    Ivan Shmakov, Jun 27, 2013
    #2
    1. Advertising

  3. Ivan Shmakov

    Ivan Shmakov Guest

    Re: Tree::Range 0.1 released

    >>>>> Ben Morrow <> writes:
    >>>>> Quoth Ivan Shmakov <>:


    >> So, I've ended up with Tree::Range::RB, and Tree::Range::base for
    >> the ->get_range () and ->range_set () methods. Both are now
    >> released as Tree::Range 0.1. The CPAN page for the version is:


    >> http://search.cpan.org/~onegray/Tree-Range-0.1/


    >> PS. The namespace registration request is filed and pending
    >> approval.


    > The namespace registration process is pretty-much defunct at this
    > point. Both Tree::Range::RB and Tree::Range::base are already as
    > 'properly registered' as they need to be; that is, they are listed in
    > 02packages.details.txt.gz on CPAN, so clients like CPAN.pm and cpanm
    > will find your distributions for those package names.


    Yes.

    Still, it's recommended by [1] (though feel free to file a bug
    report if you think it shouldn't be):

    [...] A registration is not a prerequisite for uploading. It is
    just recommended for better searchability of the CPAN and to avoid
    namespace clashes. [...]

    Besides, it'll add some useful bits of information to [2].

    [1] https://pause.perl.org/pause/authenquery?ACTION=apply_mod
    [2] http://search.cpan.org/~onegray/

    > However, it goes against convention to have a distribution called
    > Tree-Range that does not contain a module Tree::Range. You should
    > consider either renaming Tree::Range::base


    Won't this go against the example set by Digest::base? (which I
    tend to think as a good enough to stick to.)

    > or at least providing a documentation-only Tree::Range explaining why
    > there isn't a module there.


    I guess I'd consider the latter for 0.2. (Which I hope'll be
    the release fixing only the documentation and the metadata.)

    The other option would be to have Tree::Range->new () behave
    like Digest->new ().

    --
    FSF associate member #7257
    Ivan Shmakov, Jun 27, 2013
    #3
  4. Ivan Shmakov

    Ivan Shmakov Guest

    Re: Tree::Range 0.1 released

    >>>>> Ben Morrow <> writes:
    >>>>> Quoth Ivan Shmakov <>:
    >>>>> Ben Morrow <> writes:


    [...]

    >> Still, it's recommended by [1] (though feel free to file a bug
    >> report if you think it shouldn't be):


    >> [...] A registration is not a prerequisite for uploading. It is
    >> just recommended for better searchability of the CPAN and to avoid
    >> namespace clashes. [...]


    > See <https://github.com/Perl-Toolchain-Gang/toolchain-site/blob/
    > master/lancaster-consensus.md#module-registration> (and indeed the
    > rest of that document). While this is a 'new' agreement, in the Perl
    > world agreements and specs usually come after implementation, so it
    > is describing the way things currently work in practice.


    So, I guess [1] is going to be updated within a few months from
    now. Thanks.

    >> [1] https://pause.perl.org/pause/authenquery?ACTION=apply_mod


    [...]

    >>> However, it goes against convention to have a distribution called
    >>> Tree-Range that does not contain a module Tree::Range. You should
    >>> consider either renaming Tree::Range::base


    >> Won't this go against the example set by Digest::base? (which I tend
    >> to think as a good enough to stick to.)


    > It's not a convention I've seen before, but there's nothing wrong
    > with it. (It would be more usual to be ::Base, at least; the analogy
    > with base.pm is not really relevant any more.)


    AIUI, the "capital-case" names under Digest:: are reserved for
    the actual message digest algorithms. For instance,
    Digest->new ("SHA-1") ends up in a call to Digest::SHA->new (1).

    >>> or at least providing a documentation-only Tree::Range explaining
    >>> why there isn't a module there.


    >> I guess I'd consider [it] for 0.2. (Which I hope'll be the release
    >> fixing only the documentation and the metadata.)


    >> The other option would be to have Tree::Range->new () behave like
    >> Digest->new ().


    > Yeah, perhaps. I haven't really got a sense of what this module's
    > useful for, so I can't comment.


    One of the possible uses of Tree::Range::RB specifically is to
    operate on .newsrc data. Consider, e. g.:

    ### gxmj7tsetytsnbnp85yhxyjror.perl -*- Perl -*-

    ### Ivan Shmakov, 2013

    ### Code:

    use common::sense;

    require Tree::Range::RB;

    sub ranges_parse {
    my ($in, @ranges) = @_;
    foreach my $range (@ranges) {
    my ($a, $b)
    = ($range =~ /^([0-9]+)(?:-([0-9]+))?$/)
    or die ($range, ": is not a vaild range");
    $in->range_set ($a, 1 + ($b // $a), 1);
    }
    ## .
    $in;
    }

    my @comp_lang_perl_misc_s = qw {
    1-27551 27567 27575 27602-27606 27627-27629 27632 27634-27636
    27638-27996
    27998-28002 28004 28006-28008 28011-28013 28015-28027 28029 28032
    28036-28093
    };

    my $comp_lang_perl_misc
    = Tree::Range::RB->new ({ "cmp" => sub { $_[0] <=> $_[1]; },
    "equal-p" => sub { $_[0] eq $_[1]; } });
    ranges_parse ($comp_lang_perl_misc, @comp_lang_perl_misc_s);

    {
    local ($,, $\)
    = (", ", "\n");

    ## check if 28007 was read? (i. e., what range 28007 belongs to?)
    print ($comp_lang_perl_misc->get_range (28007));
    ## => 1, 28006, 28009

    ## now the user has marked some more articles as read
    ranges_parse ($comp_lang_perl_misc, qw (28005 28009 28010));

    ## check if 28011 was read? (i. e., what range 28011 belongs to?)
    print ($comp_lang_perl_misc->get_range (28011));
    ## => 1, 28004, 28014

    ## note how the range was extended as the gaps were filled
    }

    ### gxmj7tsetytsnbnp85yhxyjror.perl ends here

    Similarly, by utilizing the Tree::RB's own iterator, a simple
    and efficient serializer for this kind of data can be built.

    --
    FSF associate member #7257
    Ivan Shmakov, Jun 27, 2013
    #4
  5. Ivan Shmakov

    Ivan Shmakov Guest

    Re: Tree::Range 0.1 released

    >>>>> Ben Morrow <> writes:
    >>>>> Quoth Ivan Shmakov <>:


    >> One of the possible uses of Tree::Range::RB specifically is to
    >> operate on .newsrc data. Consider, e. g.:


    [...]

    >> my $comp_lang_perl_misc
    >> = Tree::Range::RB->new ({ "cmp" => sub { $_[0] <=> $_[1]; },
    >> "equal-p" => sub { $_[0] eq $_[1]; } });


    [...]

    >> print ($comp_lang_perl_misc->get_range (28007));
    >> ## => 1, 28006, 28009


    [...]

    > Is this importantly different from Set::IntSpan/::Fast/::XS?


    Yes, although not in the case of .newsrc.

    First of all, each range (span, interval, etc.) may have an
    associated value. It's "1" (or "undef", for the "complementary"
    ranges) in the example above, but it needs not to be.

    Moreover, unlike Set::IntSpan, Tree::Range::RB is applicable to
    an arbitrary ordered set. (Note the "cmp" option above.)

    Also, having Set::IntSpan (and Set::IntSpan::Fast, as per the
    documentation) implemented on top of an ordered list seems to
    imply the insertion and deletion times of O (N), while for a
    red-black tree they're both O (log N). Though ::XS may still
    outperform a pure-Perl tree-based implementation on reasonable
    use cases.

    --
    FSF associate member #7257
    Ivan Shmakov, Jun 27, 2013
    #5
  6. Ivan Shmakov

    Ivan Shmakov Guest

    Benchmark-ing Tree::Range::RB?

    >>>>> Ivan Shmakov <> writes:

    [...]

    > Also, having Set::IntSpan (and Set::IntSpan::Fast, as per the
    > documentation) implemented on top of an ordered list seems to imply
    > the insertion and deletion times of O (N), while for a red-black tree
    > they're both O (log N). Though ::XS may still outperform a pure-Perl
    > tree-based implementation on reasonable use cases.


    So, I've ran a simplistic benchmark over the three, which is,
    roughly (the exact code is as shown below):

    # my ($size, $iter) = (4194304, 5);
    my @vec
    = (map { int (rand ($size)); } (1 .. $size));

    my $set
    = Set::IntSpan::Fast->new ();
    $set->add ($_)
    foreach (@vec);

    It took awhile to complete, but the results seem to suggest that
    Tree::Range::RB is indeed faster than both Set::IntSpan::Fast
    and Set::IntSpan, should the problem size be sufficiently large.

    Size: 4194304 Iterations: 5 (negative is time in seconds)
    Tree::Range::RB: 8003 wallclock secs (7944.34 usr 5.88 sys + 0.00 cusr 0.00 csys = 7950.22 CPU) @ 0.00/s (n=5)
    Set::IntSpan: 17759 wallclock secs (17556.78 usr 10.12 sys + 0.00 cusr 0.00 csys = 17566.90 CPU) @ 0.00/s (n=5)
    Set::IntSpan::Fast: 19318 wallclock secs (19060.33 usr 9.25 sys + 0.00 cusr 0.00 csys = 19069.58 CPU) @ 0.00/s (n=5)
    s/iter Set::IntSpan::Fast Set::IntSpan Tree::Range::RB
    Set::IntSpan::Fast 3814 -- -8% -58%
    Set::IntSpan 3513 9% -- -55%
    Tree::Range::RB 1590 140% 121% --
    44565.74user 26.12system 12:31:28elapsed 98%CPU (0avgtext+0avgdata 972212maxresident)k
    224inputs+0outputs (1major+316600minor)pagefaults 0swaps

    Or did I make some silly mistake with that?

    TIA.

    ### Code:

    use common::sense;

    use Benchmark qw (cmpthese timethis);
    use Scalar::Util qw (looks_like_number);

    require Set::IntSpan;
    require Set::IntSpan::Fast;
    require Tree::Range::RB;

    my ($size, $iter)
    = (map { (/^0/ ? oct ($_) : $_); } (@ARGV));
    $size
    //= 0x100000;
    $iter
    //= -521;
    looks_like_number ($size)
    or die ($size, ": Size given is not a number\n");
    looks_like_number ($iter)
    or die ($iter, ": Iterations count given is not a number\n");
    warn ("Size: ", 0 + $size, " Iterations: ", 0 + $iter,
    " (negative is time in seconds)\n");
    my @vec
    = (map { int (rand ($size)); } (1 .. $size));

    sub xcmp {
    ## .
    $_[0] <=> $_[1];
    }

    sub xeq {
    ## .
    $_[0] eq $_[1];
    }

    my $trr
    = timethis ($iter,
    sub {
    my $rat_opt = {
    "cmp" => \&xcmp,
    "equal-p" => \&xeq
    };
    my $value
    = [ "*value*" ];
    my $rat
    = Tree::Range::RB->new ($rat_opt);
    $rat->range_set ($_, 1 + $_, $value)
    foreach (@vec);
    },
    "Tree::Range::RB", "all");
    my $sis
    = timethis ($iter,
    sub {
    my $set
    = Set::IntSpan->new ();
    $set->insert ($_)
    foreach (@vec);
    },
    "Set::IntSpan", "all");
    my $sisf
    = timethis ($iter,
    sub {
    my $set
    = Set::IntSpan::Fast->new ();
    $set->add ($_)
    foreach (@vec);
    },
    "Set::IntSpan::Fast", "all");

    cmpthese ({
    "Set::IntSpan" => $sis,
    "Set::IntSpan::Fast" => $sisf,
    "Tree::Range::RB" => $trr
    },
    "all");

    --
    FSF associate member #7257 np. ybjsaf.it
    Ivan Shmakov, Jul 2, 2013
    #6
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Saikrishna
    Replies:
    0
    Views:
    623
    Saikrishna
    Apr 12, 2004
  2. Stub

    B tree, B+ tree and B* tree

    Stub, Nov 12, 2003, in forum: C Programming
    Replies:
    3
    Views:
    10,087
  3. Jacob Page

    ANN: interval module 0.2.0 (alpha)

    Jacob Page, Jul 10, 2005, in forum: Python
    Replies:
    0
    Views:
    285
    Jacob Page
    Jul 10, 2005
  4. Replies:
    1
    Views:
    308
    Diez B. Roggisch
    Oct 29, 2006
  5. Vít Ondruch

    Subclassing in module from top module?

    Vít Ondruch, Oct 12, 2009, in forum: Ruby
    Replies:
    3
    Views:
    108
    Vít Ondruch
    Oct 12, 2009
Loading...

Share This Page