V
Veli-Pekka Tätilä
Hi,
I've been asking a number of pretty generic but newbie questions on
achieving various things in Perl. Here's yet another one. I checked the faq
in perldoc but haven't Googled the archives yet:
What would be a good way to do an iterator for a sequence of values to be
returned, also called a generator, in Perl? That is for a sub-routine
returning a list of values in list context, have it return each element of
the list in scalar context followed by undef. For an analogy, you could
think of each in scalar context Vs. the list returned by keys. Iterators
would save a lot of memory and might be faster, too, if you'll need to
allocate huge lists. Not to mention infinite sequences of which you'd like
to iterate to some finite length determined at runtime. I think some Perl
primitives, such as a list of values generated by the range operator for a
foreach loop, are already optimized for memory efficiency. Python has the
Xrange construct, as well.
One choice that came to mind very first would be to have a block around a
sub-routine, and declare the state you need to maintain between sub calls
for generating the next value, in this block. By refering to the outerblock
in the inner, the variables in the outer are kept alive as long as the
program itself, kind of like static variables in C. The trouble I have with
this approach is that the contents remain even after n subroutine cals
eating memory. Could one simply undef the array or hash in question once the
sub hits the end of the list it's generating? I suppose so but havent'
tested it yet.
Another way of which I'm aware would be to use closures. Return a reference
to a hash of references to anonymous sub-routines. If these routines refer
to constructs in the outer sub-routine block, Perl will keep a set of the
values alive for each hash of subs returned. I feel like I don't really
understand how this works, that is is derived from Perl's scoping rules, but
I just know that this is what happens and have used it myself. A slight
niggle I have with the closure approach is that the syntax is a bit querky
and neither looks like a basic method or subroutine calle, because you'll
have to index to a hash and derefrence the value as a sub. By the hway,
could someone explain closures in detail or point me to good docs on the
Web? A tutorial, down-to-Earth approach would be best, I suppose.
The third choice would be to do the same thing as in the second case but as
an object. YOu could create the data in the constructor, have some kind of
iterator method and the object would be destroyed once there are no
references to it as usual. This is not a simpel sub-routine any more,
however.
Now that I started thinking of this, another problem with returning one
value at a time is that you cannot use this kind of iterator transparently
as a list. Even if a sub handling a list would need to look at only one or
two subsequent list items at once, you'll have to feed it the whole list in
the beginning of the sub-routine call, though you could pass a reference,
however. As to example functions, map needs only one list item at a time
while reduce needs two. Again, are there any good work-arounds for getting
around this limitation of iterators?
I recall reading somewhere that Perl 6 is going to have lazily expanding
lists which might be the answer. But howabout Perl 5, then? Of course you
could feed the sub-routine an iterator in stead but many subs, especially
the built-ins, like to deal with long lists of values.
Any help appreciated as usual. Feel free to point me to earlier threads as
well, though please give as direct links as you can.
I've been asking a number of pretty generic but newbie questions on
achieving various things in Perl. Here's yet another one. I checked the faq
in perldoc but haven't Googled the archives yet:
What would be a good way to do an iterator for a sequence of values to be
returned, also called a generator, in Perl? That is for a sub-routine
returning a list of values in list context, have it return each element of
the list in scalar context followed by undef. For an analogy, you could
think of each in scalar context Vs. the list returned by keys. Iterators
would save a lot of memory and might be faster, too, if you'll need to
allocate huge lists. Not to mention infinite sequences of which you'd like
to iterate to some finite length determined at runtime. I think some Perl
primitives, such as a list of values generated by the range operator for a
foreach loop, are already optimized for memory efficiency. Python has the
Xrange construct, as well.
One choice that came to mind very first would be to have a block around a
sub-routine, and declare the state you need to maintain between sub calls
for generating the next value, in this block. By refering to the outerblock
in the inner, the variables in the outer are kept alive as long as the
program itself, kind of like static variables in C. The trouble I have with
this approach is that the contents remain even after n subroutine cals
eating memory. Could one simply undef the array or hash in question once the
sub hits the end of the list it's generating? I suppose so but havent'
tested it yet.
Another way of which I'm aware would be to use closures. Return a reference
to a hash of references to anonymous sub-routines. If these routines refer
to constructs in the outer sub-routine block, Perl will keep a set of the
values alive for each hash of subs returned. I feel like I don't really
understand how this works, that is is derived from Perl's scoping rules, but
I just know that this is what happens and have used it myself. A slight
niggle I have with the closure approach is that the syntax is a bit querky
and neither looks like a basic method or subroutine calle, because you'll
have to index to a hash and derefrence the value as a sub. By the hway,
could someone explain closures in detail or point me to good docs on the
Web? A tutorial, down-to-Earth approach would be best, I suppose.
The third choice would be to do the same thing as in the second case but as
an object. YOu could create the data in the constructor, have some kind of
iterator method and the object would be destroyed once there are no
references to it as usual. This is not a simpel sub-routine any more,
however.
Now that I started thinking of this, another problem with returning one
value at a time is that you cannot use this kind of iterator transparently
as a list. Even if a sub handling a list would need to look at only one or
two subsequent list items at once, you'll have to feed it the whole list in
the beginning of the sub-routine call, though you could pass a reference,
however. As to example functions, map needs only one list item at a time
while reduce needs two. Again, are there any good work-arounds for getting
around this limitation of iterators?
I recall reading somewhere that Perl 6 is going to have lazily expanding
lists which might be the answer. But howabout Perl 5, then? Of course you
could feed the sub-routine an iterator in stead but many subs, especially
the built-ins, like to deal with long lists of values.
Any help appreciated as usual. Feel free to point me to earlier threads as
well, though please give as direct links as you can.