python generators to ruby closures, help

Z

zuzu

can anyone help me decode the following python example using
generators into The Ruby Way using closures?

thanks,
-z

http://c2.com/cgi/wiki?GeneratorsInPython

You can emulate the unix-style shell piping with generators.

from __future__ import generators

class GenPipe?:
def __init__(self, generator):
self.gen = generator #sanity check omitted
def __or__(self,nextPipe):
self.gen=nextPipe._run(self.gen)
return self
def __iter__(self):
return self.gen



class PipeFilter?:
def __init__(self):
self._followingPipes=[]
def __or__(self,nextPipe):
self._followingPipes+=(nextPipe,)
return self
def _run(self,gen):
gen=self.run(gen)
while self._followingPipes:
gen=self._followingPipes[0].run(gen)
self._followingPipes.pop(0)
return gen
def run(self,gen):
raise NotImplementedError?



class Test(PipeFilter?):
def __init__(self,test):
PipeFilter?.__init__(self)
self.test=test
def run(self,gen):
for x in gen:
if self.test(x):
yield x



class Quit(PipeFilter?):
def __init__(self,test):
PipeFilter?.__init__(self)
self.test=test
def run(self,gen):
for x in gen:
if self.test(x):
raise StopIteration?
else:
yield x



class Count(PipeFilter?):
def __init__(self,n):
PipeFilter?.__init__(self)
self.n=n
def run(self,gen):
for x in gen:
yield x
self.n -= 1
if self.n == 0 :
break



def Ints():
n = 0
while 1:
yield n
n += 1



def Files( start ):
import os
for file in os.listdir( start ):
file = os.path.join( start, file )
if os.path.isfile( file ): yield file
elif os.path.isdir(file):
for more in Files( file ):
yield more




isPyFile=Test(lambda s:s.split('.')[-1].lower()=='py')
tenPyFiles=GenPipe?(Files('/'))|isPyFile|Count(10)
for each in tenPyFiles:
print each

-- JuneKim

http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/21900

Subject: [ruby-talk:21900] Re: Python generators
From: Will Conant <WillC webmiles.com>
Date: Tue, 2 Oct 2001 08:18:10 +0900

Generators are a nifty feature for Python 2.2, but they are a solution to
Python problems... problems that Ruby doesn't have.

In Ruby, it is trivial to pass a closure that interacts with the scope that
created it to another function. Thus, callbacks in Ruby are sufficiently
useful to solve the problems that Python generators were designed to solve.

The example that I read somewhere on the Python site was the tokenizer
problem. The original approach to the tokenizer problem was to have a
function that took two parameters as input: the text source and a function
pointer to a callback that processed the tokens. The problem with this
approach was that the callback function, designed by the user, didn't have a
very good way of keeping state between calls. It could use globals, but that
would suck.

The other approach was to write a tokenizer object that kept its state and
had a "nextToken" function that looked for the next token in the stream and
returned it to the caller. This solution hoisted the state problem off on
the tokenizer instead of the user. Still a problem especially considering
the difficulty of unrolling very complex loops of logic.

For Python, where closures aren't that easy to use, the generator is the
ideal solution. This way, you write your tokenizer function as if were to
use a callback so you don't have to unroll your loop. However, your function
gets wrapped up in an iterator object, so the user uses it as if it were
implemented as a Tokenizer object with a "nextToken" function.

In Ruby, you could probably emulate this behavior directly, but you really
wouldn't need to. Instead, you would just write your tokenizer to use a
callback, and your callback method would be a local closure with local state
persistant between calls. Problem solved.

Anyway, that's the story. Generators are a whitespace friendly solution to a
few of the problems that Ruby's super-duper closure mechanism solves just
fine.

-Will Conant
 
M

Michael Neumann

zuzu said:
can anyone help me decode the following python example using
generators into The Ruby Way using closures?

I'm sure the Ruby version must be human-readable... maybe you can
describe in short words, what the Python example tries to achive?

You might have a look at generator.rb which ships with Ruby and
implements generators or "streams" with continuations.

Regards,

Michael
 
Z

zuzu

I'm sure the Ruby version must be human-readable... maybe you can
describe in short words, what the Python example tries to achive?

actually i'm not much of a python programmer; stopped learning it once
i sunk my teeth into ruby. so other than the generic "to implement
unix-style pipes and filters", i'm not really sure *how* that was
working (and hoping someone here was python-ruby bilingual).
You might have a look at generator.rb which ships with Ruby and
implements generators or "streams" with continuations.

however, armed with this newly acquired knowledge, perhaps that's all
i needed to know. thanks!
Regards,

Michael

peace,
-z
 
M

Michael Neumann

zuzu said:
can anyone help me decode the following python example using
generators into The Ruby Way using closures?

Okay, here is it ;-)

One problem is, that we can't detect the usage of "break" inside the
proc, as we're using proc's and not blocks. My workaround is, that a
return value of nil of a proc.call breaks the generator at this position
(so make sure to use false and not nil, e.g. "val.nil?" instead just
"val"!).

Regards,

Michael


#######
require 'generator'
require 'find'

alias filter lambda

class Generator
def |(filter)
self.class.new do |g|
self.each { |elem|
val = filter.call(elem)
break if val.nil?
g.yield(elem) if val
}
end
end

def inspect
to_a.inspect
end
end

class Proc
def |(otherProc)
proc {|*a|
x, y = self.call(*a), otherProc.call(*a)
break nil if x.nil? or y.nil?
x & y
}
end
end

def count(n)
filter {|i|
break if n == 0
n -= 1
i
}
end

def quit(&block)
filter {|*a|
break if block.call(*a)
true
}
end

def gen(&block)
Generator.new(&block)
end

def ints
gen {|g|
n = 0
loop do
g.yield n
n += 1
end
}
end


##### Test

odd = filter {|x| x % 2 == 1}
notDivBy3 = filter {|x| x % 3 != 0}
oddNotDivBy3 = odd|notDivBy3
firstTen = count(10)
result = ints | firstTen | oddNotDivBy3
p result

oddLowerThanTen = odd | quit {|y| y >= 10}
p3 = ints | oddLowerThanTen | notDivBy3
p p3

isRubyFile = filter {|file| File.extname(file).downcase == '.rb'}
files = gen {|g| Find.find('.') {|file| g.yield file} }
tenRubyFiles = files | isRubyFile | count(10)
p tenRubyFiles
 
W

William Morgan

Excerpts (reformatted) from zuzu's mail of 26 Aug 2004 (EDT):
http://www.ruby-doc.org/stdlib/libdoc/generator/rdoc/classes/Generator.html

am i correct in assuming that "makes an internal iterator external" is
just another way of saying "turns a subroutine into a continuation",
"using yield rather than return"? or aka, a function to continuation
translator?

I don't think that's right.T he goal of Generator is something much more
specific: to turn an internal iterator, i.e. a method that takes a block
argument, into an external iterator, i.e. an object with explicit #next?
and #next methods.

Think about trying to interlace two Enumerables, e.g. turn ["a","b","c"]
and [1,2,3] into ["a",1,"b",2,"c",3]. You can't use Enumerable#each (an
internal iterator) to do this in a natural way. Instead, you'd use
Generator to create an object for each Enumerable and explicitly have a
loop that called #next alternately on both objects.

Ruby's internal iterators are far, far friendlier to use than, say,
Java's external iterators (writing for loops really is a pain) but there
are some problems they just don't yield (hah hah) natural solutions to.

Sorry if that isn't what you were asking. I don't know whether this is
related to Python's "generator" objects at all.
 
Z

zuzu

Excerpts (reformatted) from zuzu's mail of 26 Aug 2004 (EDT):
http://www.ruby-doc.org/stdlib/libdoc/generator/rdoc/classes/Generator.html

am i correct in assuming that "makes an internal iterator external" is
just another way of saying "turns a subroutine into a continuation",
"using yield rather than return"? or aka, a function to continuation
translator?

I don't think that's right.T he goal of Generator is something much more
specific: to turn an internal iterator, i.e. a method that takes a block
argument, into an external iterator, i.e. an object with explicit #next?
and #next methods.

Think about trying to interlace two Enumerables, e.g. turn ["a","b","c"]
and [1,2,3] into ["a",1,"b",2,"c",3]. You can't use Enumerable#each (an
internal iterator) to do this in a natural way. Instead, you'd use
Generator to create an object for each Enumerable and explicitly have a
loop that called #next alternately on both objects.

Ruby's internal iterators are far, far friendlier to use than, say,
Java's external iterators (writing for loops really is a pain) but there
are some problems they just don't yield (hah hah) natural solutions to.

Sorry if that isn't what you were asking. I don't know whether this is
related to Python's "generator" objects at all.

really my big question is how to use ruby for concurrent execution of
unix-style pipes and filters chaining objects together as in the
actors model (flow-based programming, aka dataflow). by definition of
the actors model, this requires the use of continuations and bounded
buffers (partial ordering) rather than threads; the idea being that
you never have to manually manage (dead)locks much as a GC means never
manually managing memory. (even the best programmers tend to suck at
handcrafting memory management and thread locks; 99% of the time it's
best left to the computer.)

i've got data being generated all over the place. logfiles, web
pages, file transfers, encryption/decryption of data, compression of
data, etc. etc. some objects will generate data, with their outputs
named by their methods. other objects will input data, filter it, and
output the results, also named (as "ports) with methods. in the end
some objects will consume that data, such as a framebuffer to display
a gui or perhaps an IP:port on the internet to transfer files to. i
would like ultimately to simply pipe these objects together and have
them concurrently process their dataflow. this is basically what the
actors model is all about. (it's also what unix streams / pipes &
filters are all about.)

hence my interest in generators and streams in ruby.

does that present a clearer contextual picture?

peace,
-z
 
Z

zuzu


this looks quite straight-forward, thank you. as with Generator, i've
long suspected this would involve some inherit tweaking of Enumerable,
and all the implementations of this genre seem to do just that.

in the quote below, this is basically "lazy evaluation", right?
(which i think is "by definition" with asynchronous continuations
rather than synchronous returns.)



Using #pipe has potential performance advantages. The iteration

e.collect { |x| x.m }.each { |y| ... }


can be rewritten as

e.pipe:)m).each { |y| ... }


which doesn't generate an intermediate array, and uses a send instead
of a proc call. Of course, it could also be written as

e.each { |x| y = x.m ... }


but that may be undesirable for readability or because the block is to
be taken from a proc contained in a variable:

pr = proc { ... }
e.pipe:)m).each &pr
 
W

William Morgan

Excerpts (reformatted) from zuzu's mail of 26 Aug 2004 (EDT):
hence my interest in generators and streams in ruby.

does that present a clearer contextual picture?

Yes. I don't think you want the Generator class at all then. As the
email you quoted said, Python generators are a solution to a problem
that Ruby doesn't have. Generators have a place in Ruby but this ain't
it.

From what I can tell, all the functionality they build up in the Python
example is just so that they can do thing that in Ruby can be done
neatly with iterators, like this:

odd = proc { |x| (x % 3) == 0 }
div_by_3 = proc { |x| (x % 3) == 0 }
result = (1 .. 10).reject(odd).reject(div_by_3).each { |x| print x }

which has a "pipey" feel to it already, and with no preexisting work.

If you really want all the syntactic sugar that they have, I would say
something like this:

class Pipe
def initialize(&proc)
@proc = proc
end

attr_reader :proc

def +(other)
Pipe.new { |x| self.proc[x] || other.proc[x] }
end
end

module Enumerable
def drain(pipe)
self.reject { |x| pipe.proc[x] }
end
end

Then you can do cool stuff like this:

odd = Pipe.new { |x| (x % 2) == 0 }
not_div_3 = Pipe.new { |x| (x % 3) == 0 }
gt_5 = Pipe.new { |x| x <= 5 }

p = odd + not_div_3 + gt_5
(1 .. 20).drain p # => [7, 11, 13, 17, 19]

is_rb_file = Pipe.new { |s| s !~ /\.rb$/ }
first_ten = proc do # here we have to use a closure
i = 0
Pipe.new { (i += 1) > 10 }
end.call

Dir.entries(".").drain is_rb_file + first_ten # => the right thing

... which replicates the Python functionality, but without being quite
so ugly. I personally find "positive pipes" a little easier than
"negative pipes", but I've kept with their schemes for now.

HTH.
 
J

Joel VanderWerf

zuzu said:
this looks quite straight-forward, thank you. as with Generator, i've
long suspected this would involve some inherit tweaking of Enumerable,
and all the implementations of this genre seem to do just that.

in the quote below, this is basically "lazy evaluation", right?
(which i think is "by definition" with asynchronous continuations
rather than synchronous returns.)

I guess it's lazy evaluation, in the sense that no intermediate array is
generated. The upstream iteration step happens only when the downstream
iteration step asks for the next object.

Note that this #pipe method has the advantage of not having the
(constant time) overhead of creating a continuation, but it has the
disadvantage of being an internal (did I get that right?) iterator, just
like #each or any other of the usual Enumerable methods.

Oddly, I wrote this code several years ago, but have never used it for
some reason. I have to thank Martin for even remembering that it existed.
 
J

Jim Weirich

--------------080802090605060207000206
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit

William said:
Yes. I don't think you want the Generator class at all then. As the
email you quoted said, Python generators are a solution to a problem
that Ruby doesn't have. Generators have a place in Ruby but this ain't
it.

From what I can tell, all the functionality they build up in the Python
example is just so that they can do thing that in Ruby can be done
neatly with iterators, like this:

As William pointed out, enumerator pipes can be implemented in Ruby
without resorting to continuations, and that the current enumerator
syntax is very pipe-like in many ways.

However, one thing the Python version handles that enumerator chaining
does not is an infinite enumerator. In the python examples, we have
the following construct:

Ints() | firstTen | oddNotDivBy3

Ints() is a generator that will generate all the integers (given
enough time and space). The next filter "firstTen" only takes the
first 10 of those numbers. Since the generator chain only takes a
number from a generator when it is needed, the Ints() generator is
never asked for any more than 10 number.

We can do the same with Ruby enumerators. Imagine a OddNumberFilter
object that implements a each method like this ...

class OddNumberFilter
include Enumerable
attr_accessor :source
def each
@source.each { |n| yield(n) if (n % 2) != 0 }
end
end

It uses a source enumerable to generate numbers and only yields when
it finds an odd number. We can use it like this ...

odds_only = OddNumberFilter.new
odds_only.source = (1..10)
odds_only.to_a # => [1, 3, 5, 7, 9]

Since a filter can take _any_ enumerable as a source, it can even take
other filters. Let's generalize our OddNumberFilter class to a
CondFilter class that takes a condition and only passes items that
pass the condition.

class CondFilter
include Enumerable
attr_accessor :source
def initialize(&block)
@block = block
end
def each
@source.each { |it| yield(it) if @block.call(it) }
end
end

Now we create a chain of filters ...

odds_only = CondFilter.new { |n| (n % 2) != 0 }
odds_only.source = (1..10)

divisible_by_three = CondFilter.new { |n| (n % 3) == 0 }
divisible_by_three.source = odds_only

divisible_by_three.to_a #=> [3, 9]

It works, but it is ugly. So lets create a little syntactic sugar to
get this to work ...

module Enumerable
def |(filter)
filter.source = self
filter
end
end

odds_only = CondFilter.new { |n| (n%2) != 0 }
divisible_by_three = CondFilter.new { |n| (n%3) == 0 }

pipe = (1..10) | odds_only | divisible_by_three
pipe.to_a # => [3, 9]

With this foundation, handling infinite enumerations are easy ...

class Ints
include Enumerable
def initialize
@n = 0
end
def each
loop do
yield @n
@n += 1
end
end
end

And a class to take only so many items ...

class Take
include Enumerable
attr_accessor :source

def initialize(limit)
@limit = limit
end
def each
n = 0
@source.each do |it|
n += 1
break if n > @limit
yield it
end
end
end

( Ints.new | Take.new(5) ).to_a # => [0, 1, 2, 3, 4]

From this point the File generator and the other stuff in the Python
example should be obvious.

Here's a fun puzzle. Using filters, create a Sieve of Erasothanes
filter that, given an input of Ints.new, will generate a series of
prime numbers. In other words, the code ...

( Ints.new | Primes.new | Take.new(100) ).to_a

will create an array of the first 100 prime numbers.

I've attached full examples and test suites for all the the above
(although factored a bit differently), so don't peek if you want to work
out the Primes puzzle ...

--
-- Jim Weirich (e-mail address removed) http://onestepback.org
-----------------------------------------------------------------
"Beware of bugs in the above code; I have only proved it correct,
not tried it." -- Donald Knuth (in a memo to Peter van Emde Boas)

--------------080802090605060207000206
Content-Type: text/plain;
name="enumpipe.rb"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
filename="enumpipe.rb"

#!/usr/bin/env ruby

require 'find'

module Enumerable
def |(filter)
filter.source = self
filter
end
end

class EmptyEnumClass
include Enumerable
def each() end
end

EmptyEnum = EmptyEnumClass.new

class EnumFilter
include Enumerable
attr_reader :source
def initialize(source=nil)
@source = source || EmptyEnum
end
def source=(new_source)
case @source
when EnumFilter
@source.source = new_source
else
@source = new_source
end
end
def each(&block)
@source.each(block)
end
end

class CondFilter < EnumFilter
def initialize(&block)
super()
@block = block
end
def each
@source.each do |it| yield it if @block[it] end
end
end

class Ints
include Enumerable
def initialize
@n = 0
end
def each
loop do
yield @n
@n += 1
end
end
end

class Quit < EnumFilter
def initialize(&block)
@block = block
end
def each
@source.each do |it|
break if @block[it]
yield it
end
end
end

class Start < EnumFilter
def initialize(start)
@start = start
end
def each
@source.each do |it|
yield it if it >= @start
end
end
end

class Take < EnumFilter
def initialize(limit)
@limit = limit
end
def each
n = 0
@source.each do |it|
break if n >= @limit
yield it
n += 1
end
end
end

class EnumFiles
include Enumerable
def initialize(fn)
@file_name = fn
end
def each
Find.find(@file_name) do |fn| yield fn end
end
end

class Primes < EnumFilter
def initialize
super
@source = Start.new(2)
end
def each
loop do
n = (@source | Take.new(1)).to_a[0]
yield n
@source = @source | new_filter(n)
end
end
def new_filter(n)
CondFilter.new { |it| (it%n) != 0 }
end
end

--------------080802090605060207000206
Content-Type: text/plain;
name="testenumpipe.rb"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
filename="testenumpipe.rb"

#!/usr/bin/env ruby

require 'test/unit'
require 'enumpipe'

class TestEnumPipe < Test::Unit::TestCase

def test_create
ep = CondFilter.new
assert ep
end

def test_odd
odd_filter = CondFilter.new { |it| (it % 2) != 0 }
odd_filter.source = (1..10)
assert_equal [1,3,5,7,9], odd_filter.to_a
end

def test_no_source
odd_filter = CondFilter.new { |it| (it % 2) != 0 }
assert_equal [], odd_filter.to_a
end

def test_two_filters
odd_filter = CondFilter.new { |it| (it % 2) != 0 }
filter_thirds = CondFilter.new { |it| (it % 3) == 0 }
odd_filter.source = (1..10)
filter_thirds.source = odd_filter
assert_equal [3, 9], filter_thirds.to_a
end

def test_left_pipeing
odd_filter = CondFilter.new { |it| (it % 2) != 0 }
filter_thirds = CondFilter.new { |it| (it % 3) == 0 }
filter_thirds.source = odd_filter
filter_thirds.source = (1..10)
assert_equal [3, 9], filter_thirds.to_a
end

def test_pipe_syntax
pipe = (1..10) | CondFilter.new { |it| (it%2) != 0 }
assert_equal [1,3,5,7,9], pipe.to_a
end

def test_pipe_three
src = (1..10)
odd = CondFilter.new { |it| (it%2) != 0 }
thirds = CondFilter.new { |it| (it%3) == 0 }
pipe = src | odd | thirds
assert_equal [3, 9], pipe.to_a
end

def test_pipe_right_associative
src = (1..10)
odd = CondFilter.new { |it| (it%2) != 0 }
thirds = CondFilter.new { |it| (it%3) == 0 }
pipe = src | (odd | thirds)
assert_equal [3, 9], pipe.to_a
end

def test_infinite_takes
pipe = Ints.new | Take.new(10)
assert_equal (0...10).to_a, pipe.to_a
end

def test_multiple_takes_on_ints
pipe = Ints.new | Take.new(2)
assert_equal [0,1], pipe.to_a
assert_equal [2,3], pipe.to_a
assert_equal [4,5], pipe.to_a
end

def test_quit_filter
pipe = Ints.new | Quit.new { |it| it > 5 }
assert_equal [0,1,2,3,4,5], pipe.to_a
end

def test_start
pipe = Ints.new | Start.new(5) | Take.new(3)
assert_equal [5,6,7], pipe.to_a
end

def test_files
files = EnumFiles.new(".")
filelist = files.to_a
assert filelist.member?("./enumpipe.rb")
assert filelist.member?("./testenumpipe.rb")
end

def test_file_filter
testfiles = EnumFiles.new('.') | CondFilter.new { |fn| File.basename(fn) =~ /^test.*\.rb$/ }
filelist = testfiles.to_a
assert filelist.member?("./testenumpipe.rb")
end

def test_primes
primes = Ints.new | Primes.new | Take.new(10)
assert_equal [2,3,5,7,11,13,17,19,23,29], primes.to_a
end

end

--------------080802090605060207000206--
 
F

Florian Gross

William said:
Think about trying to interlace two Enumerables, e.g. turn ["a","b","c"]
and [1,2,3] into ["a",1,"b",2,"c",3]. You can't use Enumerable#each (an
internal iterator) to do this in a natural way.

But you can use Enumerable#zip -- but you're still right in that
internal iterators aren't the most natural solution for everything.

Here's the #zip solution which shows that you can use it for iterating
over multiple Enumerables in parallel:

irb(main):002:0> ["a", "b", "c"].zip([1, 2, 3]) do |(string, number)|
irb(main):003:1* puts "#{string} => #{number}"
irb(main):004:1> end
a => 1
b => 2
c => 3

What this won't let you do however is skipping elements from only one of
the Enumerables.

Regards,
Florian Gross
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,754
Messages
2,569,528
Members
45,001
Latest member
Kendra00E1

Latest Threads

Top