building generators in Perl

Discussion in 'Perl Misc' started by Rainer Weikusat, Feb 27, 2013.

  1. NB: This text is partially motivated by the fairly usesless (in this
    particular respect) coroutine, closure and generator articles in
    Wikipedia ("Perl supports this *if* you download the following 10E24
    modules ...").

    In case someone's not familiar with the concept, a 'generator' is
    something which, when invoked, returns the 'next value' from some
    ordered set of values. An extremly simple-minded example would be
    (using the 'state' feature)

    sub powers_of_2
    {
    state $next = 1;
    my $cur;

    $cur = $next;
    $next *= 2;

    return $cur;
    }

    which will return all members of the set of 'powers of 2' from first
    to last provided it is called an infinite number of times (or any
    finite subset of this set for a finite number of calls). This is
    obviously similar to a lazily evaluated sequence (a term I've read
    here and there).

    This is, of course, not very useful, as contrived examples are wont to
    be (and it is also less efficient than calculating the values in a
    loop).

    The most interesting feature of generators is that they can be
    used to arrange independent subroutines in a 'processing pipeline' in
    order to perform a multi-step transformation of some set of 'input
    data items' to a set of 'output data items' while isolating the code
    for each step in its own subroutine. A real-word example I had to deal
    with yesterday was in some code supposed to preprocess a set of 'port
    numbers' (as in 'TCP' or 'UDP') including port ranges such that the
    output set is composed of a set of distinct ports or port ranges which
    is equivalent (contains the same ports) to the input set. Further,
    while the input set (produced by some other, intermediate processing
    step) represented individual ports as 'ranges' with an identical start
    and end value, the output set should contain bare numbers for these
    cases and 'ranges' (represented as anonymous arrays with two elements)
    only when the corresponding input element was actually a range
    encompassing more than one port number.

    The first implementation of that looked like this:

    sub dedupe_ports
    {
    my ($cur, @in, @out);

    @in = sort { $a->[0] <=> $b->[0] } @{$_[0]};
    $cur = shift(@in);
    for (@in) {
    if ($_->[0] <= $cur->[1] + 1) {
    $cur->[1] = $_->[1] if $_->[1] > $cur->[1];
    next;
    }

    push(@out, $cur->[1] > $cur->[0] ? $cur : $cur->[1]);

    $cur = $_;
    }

    push(@out, $cur->[1] > $cur->[0] ? $cur : $cur->[1]);
    return \@out;
    }

    But I really didn't like the way the two separate processing steps of
    'merging overlapping ranges' and 'collapsing single-port ranges' where
    intermingled here, especially as I originally forgot the collapsing step
    for the duplicated final push.

    One way to deal with that is to use a generator to create a sequence
    of merged ranges and apply the 'collapsing' step to the set of output
    values produced by that. The subroutine included below creates a
    closure which is such a generator:

    sub range_merger
    {
    my (@in, $cur);

    @in = sort { $a->[0] <=> $b->[0] } @{$_[0]};
    $cur = shift(@in);
    return sub {
    my ($next, $res);

    $next->[1] > $cur->[1] and $cur->[1] = $next->[1]
    while ($next = shift(@in)) && $next->[0] <= $cur->[1] + 1;

    $res = $cur;
    $cur = $next;

    return $res;
    };
    }

    Whenever the returned closure is invoked, it will return the next
    isolated, distinct range (my English feels somewhat lacking here)
    aggregated from the elements of the anonymous array passed as input to
    the ranger_merger subroutine, until all elements of that have been
    processed. Afterwards, it will return an undefined value. This
    generator can now be used by the collapsing subroutine,

    sub dedupe_ports
    {
    my (@out, $merge_ranges, $range);

    $merge_ranges = &range_merger;

    push(@out, $range->[1] > $range->[0] ? $range : $range->[0])
    while $range = $merge_ranges->();
    return \@out;
    }

    in order to calculate the desired set of output values without code
    duplication/ special-casing the final element case.
     
    Rainer Weikusat, Feb 27, 2013
    #1
    1. Advertisements

  2. This is a slightly different way to represent a lazily evaluated
    infinite sequence (which was not the point of my text) and actually, I
    have 'Higher Order Perl' (I assume you'll understand the reference)
    sitting on a bookshelf behind me. But that's another book full of
    contrived examples and usually, badly written ones as well. Also, at
    least judgeing from the index, it doesn't refer to generators at all.
    Lastly, I strongly suspect that most people neither own the book nor
    know what happened to be published in the Perl Journal sixteen years
    ago (taking Wikipedia as measure of common knowledge, this seems to be
    a certainty).

    Apart from that, what's the point of your posting? This stuff was by
    no means new by the time Mr Dominus got paid for writing articles
    about it.
     
    Rainer Weikusat, Feb 27, 2013
    #2
    1. Advertisements

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments (here). After that, you can post your question and our members will help you out.