[SUMMARY] SerializableProc (#38)

Ruby Quiz

The solutions this time show some interesting differences in approach, so I want
to walk through a handful of them below. The very first solution was from Robin
Stocker and that's a fine place to start. Here's the class:

class SerializableProc

  def initialize( block )
    @block = block
    # Test if block is valid.
    to_proc
  end

  def to_proc
    # Raises exception if block isn't valid, e.g. SyntaxError.
    eval "Proc.new{ #{@block} }"
  end

  def method_missing( *args )
    to_proc.send( *args )
  end

end

It can't get much simpler than that. The main idea here, and in all the
solutions, is that we need to capture the source of the Proc. The source is
just a String so we can serialize that with ease and we can always create a new
Proc if we have the source. In other words, Robin's main idea is to go
(syntactically) from this:

Proc.new {
  puts "Hello world!"
}

To this:

SerializableProc.new %q{
  puts "Hello world!"
}

In the first, pure Ruby version, we're building a Proc with the block of code to
define the body. In the second, SerializableProc version, we're just passing a
String to the constructor that can later be used to build a block. Christian
Neukirchen had something very interesting to say about the change:

Obvious problems of this approach are the lack of closures and editor
support (depending on the inverse quality of your editor :p)...

We'll get back to the lack of closures issue later, but I found the "inverse
quality of your editor" claim interesting. The meaning is that a poor editor
may not consider %q{...} equivalent to '...'. If it doesn't realize a String is
being entered, it may continue to syntax highlight the code inside. Of course,
you could always remove the %q whenever you want to see the code highlighting,
but that's tedious.

Getting back to Robin's class, initialize() just stores the String and creates a
Proc from it, so an Exception will be raised at construction time if fed invalid
code. The method to_proc() is what builds the Proc object by wrapping the
String in "Proc.new { ... }" and calling eval(). Finally, method_missing() makes
SerializableProc behave close to a Proc. Anytime it sees a method call it has no
definition for (anything besides to_proc() and the methods every Object
inherits), it creates a Proc object and forwards the message.
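Forwarding through method_missing() leaves two loose ends in modern Ruby: blocks are not passed along, and respond_to? still answers false for the forwarded methods. The sketch below ties both up; the class name ForwardingProc is hypothetical, and respond_to_missing? assumes Ruby 1.9.2 or later. Like Robin's version it does not cache the Proc, so the object keeps only a String and stays serializable. Because the forwarding target comes from a method named to_proc, the wrapper also works with the & operator.

```ruby
# ForwardingProc is a hypothetical name for this sketch.
class ForwardingProc
  def initialize(code)
    @code = code
    to_proc                    # fail fast on bad source, as Robin's does
  end

  # Rebuilt on every call, as in Robin's version, so only @code is stored.
  def to_proc
    eval("Proc.new { #{@code} }")
  end

  # Forward any message the Proc understands, including an attached block.
  def method_missing(name, *args, &block)
    target = to_proc
    if target.respond_to?(name)
      target.public_send(name, *args, &block)
    else
      super
    end
  end

  # Keep respond_to? consistent with what method_missing will accept.
  def respond_to_missing?(name, include_private = false)
    to_proc.respond_to?(name, include_private) || super
  end
end

square = ForwardingProc.new("|n| n * n")
puts square.respond_to?(:arity)   # prints true
puts square.arity                 # prints 1
p [1, 2, 3].map(&square)          # prints [1, 4, 9]
```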

We don't see anything specific to serialization in Robin's code, because both
Marshal (PStore uses Marshal) and YAML can handle a custom class with String
instance data. Like magic, it all just works.
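Here is a minimal round trip showing that claim in action, sketched for a modern Ruby (Psych-based YAML, Ruby 2.6 or later for the permitted_classes: keyword). It repeats Robin's class so the snippet runs on its own; only the serialization calls at the bottom are new. YAML.safe_load only revives classes it is explicitly given, which is why SerializableProc is listed.

```ruby
require "yaml"

# Robin's class, unchanged apart from formatting.
class SerializableProc
  def initialize( block )
    @block = block
    # Test if block is valid.
    to_proc
  end

  def to_proc
    # Raises exception if block isn't valid, e.g. SyntaxError.
    eval "Proc.new{ #{@block} }"
  end

  def method_missing( *args )
    to_proc.send( *args )
  end
end

adder = SerializableProc.new("|a, b| a + b")

# Marshal writes out the single String instance variable, @block.
from_marshal = Marshal.load(Marshal.dump(adder))

# YAML does the same; safe_load must be told which class it may rebuild.
from_yaml = YAML.safe_load(YAML.dump(adder), permitted_classes: [SerializableProc])

puts from_marshal.call(1, 2)   # prints 3
puts from_yaml.call(40, 2)     # prints 42
```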

Robin had a complaint though:

I imagine my solution is not very fast, as each time a method on the
SerializableProc is called, a new Proc object is created.

The object could be saved in an instance variable @proc so that speed is
only low on the first execution. But that would require the definition of
custom dump methods for each Dumper so that it would not attempt to dump
@proc.

My own solution (like several others) does cache the Proc and defines some custom
dump methods. Let's have a look at how something like that comes out:

class SerializableProc
  def self._load( proc_string )
    new(proc_string)
  end

  def initialize( proc_string )
    @code = proc_string
    @proc = nil
  end

  def _dump( depth )
    @code
  end

  def method_missing( method, *args )
    if to_proc.respond_to? method
      @proc.send(method, *args)
    else
      super
    end
  end

  def to_proc( )
    return @proc unless @proc.nil?

    # The /m flag lets .* span line breaks, so multi-line sources match too.
    if @code =~ /\A\s*(?:lambda|proc)(?:\s*\{|\s+do).*(?:\}|end)\s*\Z/m
      @proc = eval @code
    elsif @code =~ /\A\s*(?:\{|do).*(?:\}|end)\s*\Z/m
      @proc = eval "lambda #{@code}"
    else
      @proc = eval "lambda { #{@code} }"
    end
  end

  def to_yaml( )
    @proc = nil
    super
  end
end

My initialize() is the same, save that I create a variable to hold the Proc
object and I wasn't clever enough to trigger the early Exception when the code
is bad. My to_proc() looks scary but I just try to accept a wider range of
Strings, wrapping them in only what they need. The end result is the same.
Note that any Proc created is cached. My method_missing() is also very similar.
If the Proc object responds to the method, it is forwarded. The first line of
method_missing() calls to_proc() to ensure we've created one. After that, it
can safely use the @proc variable.
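The three branches of that to_proc() are easier to follow when pulled out into a standalone function and fed one input of each shape. The sketch below does that; wrap_source is a hypothetical name. It uses the /m flag so that . also matches newlines; without that flag a multi-line source such as the first input would fall through to the last branch and be wrapped a second time, yielding a lambda that returns a lambda.

```ruby
# Hypothetical standalone version of the normalization in to_proc().
def wrap_source(code)
  case code
  when /\A\s*(?:lambda|proc)(?:\s*\{|\s+do).*(?:\}|end)\s*\z/m
    code                      # already a complete lambda/proc expression
  when /\A\s*(?:\{|do).*(?:\}|end)\s*\z/m
    "lambda #{code}"          # a bare block: just prefix lambda
  else
    "lambda { #{code} }"      # only a block body: wrap it completely
  end
end

inputs = [
  "lambda do |x|\n  x + 1\nend",  # complete and multi-line: needs /m
  "{ |x| x + 1 }",                # bare block
  "|x| x + 1"                     # body only
]
p inputs.map { |src| eval(wrap_source(src)).call(1) }   # prints [2, 2, 2]
```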

The _load() class method and _dump() instance method are what it takes to support
Marshal. First, _dump() is expected to return a String that could be used to
rebuild the instance. Then, _load() is passed that String on reload and is
expected to return the recreated instance. The String choice is simple in this
case, since we're using the source.
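Robin's concern and the _dump() answer can both be seen in a few lines. In the sketch below, both class names are hypothetical. The first class caches a Proc in an instance variable with no custom hooks, so Marshal refuses to write it; the second writes out only the source through _dump() and rebuilds through _load(), so the cache never reaches the stream.

```ruby
# Caches the Proc with no custom dump hooks; Marshal cannot write a Proc.
class NaiveCachedProc
  def initialize(code)
    @code = code
    @proc = eval("lambda { #{code} }")
  end
end

# Same cache, but _dump/_load keep it out of the serialized data.
class CachedProc
  def self._load(code)   # receives whatever String _dump returned
    new(code)
  end

  def initialize(code)
    @code = code
    @proc = nil
  end

  def _dump(_depth)      # the source String is all that gets written
    @code
  end

  def call(*args)
    (@proc ||= eval("lambda { #{@code} }")).call(*args)
  end
end

begin
  Marshal.dump(NaiveCachedProc.new("42"))
rescue TypeError => e
  puts "naive version: #{e.message}"   # Marshal rejects the cached Proc
end

adder = CachedProc.new("|a, b| a + b")
adder.call(1, 2)                        # fills the cache
copy = Marshal.load(Marshal.dump(adder))
puts copy.call(40, 2)                   # prints 42
```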

There are multiple ways to support YAML serialization, but I opted for the super
simple cheat. YAML can't serialize a Proc, but it's just a cache that can
always be restored. I just override to_yaml() and clear the cache before
handing serialization back to the default method. My code is unaffected by the
Proc's absence and it will recreate it when needed.
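Overriding to_yaml() suited the YAML library of the day (Syck). In Psych, the YAML engine in Ruby 1.9.3 and later, an object can instead state exactly what gets dumped through encode_with and rebuild itself through init_with; Psych calls init_with on an allocated object without running initialize. The sketch below applies that to the same cached-Proc design. The class name YamlProc is hypothetical, and it assumes Ruby 2.6 or later for the permitted_classes: keyword.

```ruby
require "yaml"

# YamlProc is a hypothetical name; encode_with/init_with are Psych hooks.
class YamlProc
  def initialize(code)
    @code = code
    @proc = nil
  end

  # Psych calls this instead of dumping every instance variable.
  def encode_with(coder)
    coder["code"] = @code      # the cached @proc is deliberately left out
  end

  # Psych calls this on an allocated (not initialized) object when loading.
  def init_with(coder)
    @code = coder["code"]
    @proc = nil
  end

  def call(*args)
    (@proc ||= eval("lambda { #{@code} }")).call(*args)
  end
end

doubler = YamlProc.new("|x| x * 2")
doubler.call(1)                    # fills the cache
yaml = YAML.dump(doubler)          # only the "code" key is written
copy = YAML.safe_load(yaml, permitted_classes: [YamlProc])
puts copy.call(21)                 # prints 42
```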

Taking one more step, Dominik Bathon builds the Proc in the constructor and
never has to recreate it:

require "delegate"
require "yaml"

class SProc < DelegateClass(Proc)

  attr_reader :proc_src

  def initialize(proc_src)
    super(eval("Proc.new { #{proc_src} }"))
    @proc_src = proc_src
  end

  def ==(other)
    @proc_src == other.proc_src rescue false
  end

  def inspect
    "#<SProc: #{@proc_src.inspect}>"
  end
  alias :to_s :inspect

  def marshal_dump
    @proc_src
  end

  def marshal_load(proc_src)
    initialize(proc_src)
  end

  def to_yaml(opts = {})
    YAML::quick_emit(self.object_id, opts) { |out|
      out.map("!rubyquiz.com,2005/SProc" ) { |map|
        map.add("proc_src", @proc_src)
      }
    }
  end

end

YAML.add_domain_type("rubyquiz.com,2005", "SProc") { |type, val|
  SProc.new(val["proc_src"])
}

Dominik uses the delegate library instead of the method_missing() trick.
That's a two-step process. You can see the first step when SProc is defined to
inherit from DelegateClass(Proc), which defines forwarding methods for Proc's
public instance methods. The second step is the first line of the
constructor, which hands the freshly built Proc to DelegateClass's initializer
through super(). That's the instance that will receive forwarded messages.
Dominik also defined a custom ==(), "because that doesn't really work with
method_missing/delegate."

Dominik's code uses a different interface to support Marshal (marshal_dump()
and marshal_load()), but does the same thing I did, as you can see. The YAML
support is different. SProc#to_yaml() spits out a new YAML type that basically
just emits the source. The code outside of the class adds the YAML support to
read this type back in whenever it is encountered. Here's what an SProc looks
like when it's resting in a YAML file:

!rubyquiz.com,2005/SProc
proc_src: |2-
  |*args|
  puts "Hello world"
  print "Args: "
  p args

The advantage here is that the YAML export procedure never touches the Proc so
it doesn't need to be hidden or removed and rebuilt.

Florian's solution is also worth a mention, though it takes a completely different
road to solving the problem. Time and space don't allow me to reproduce and
annotate the code here, but Florian described the premise well in the submission
message:

I wrote this a while ago and it works by extracting a proc's origin file
name and line number from its .inspect string and using the source code
(which usually does not have to be read from disc) -- it works with
procs generated in IRB, eval() calls and regular files. It does not work
from ruby -e and stuff like "foo".instance_eval "lambda {}".source
probably doesn't work either.

Usage:

code = lambda { puts "Hello World" }
puts code.source
Marshal.load(Marshal.dump(code)).call
YAML.load(code.to_yaml).call

The code itself is a fascinating read. It uses the relatively unknown
SCRIPT_LINES__ Hash, has great tricks like overriding eval() to capture that
source, and even implements a partial Ruby parser with standard libraries. I'm
telling you, that code reads like a good mystery novel for programmers. Don't
miss it!
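Florian's starting point, finding where a block was written, no longer requires parsing .inspect: in Ruby 1.9 and later, Proc#source_location reports the file name and line number directly. The sketch below covers only that first step (the helper name source_line_of is hypothetical); working out exactly where the block begins and ends on that line is the hard part his parser tackles. As he notes for ruby -e, there may be no file to read, in which case the helper returns nil.

```ruby
# Hypothetical helper: read back the line a block was defined on.
def source_line_of(a_proc)
  file, line = a_proc.source_location
  # Procs from ruby -e, IRB, or eval() may not point at a readable file.
  return nil unless file && File.file?(file)
  File.readlines(file)[line - 1]
end

greeter = lambda { puts "Hello World" }
p greeter.source_location   # the file name and line number of the lambda
p source_line_of(greeter)   # the full text of that line, or nil
```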

One last point. I said in the quiz that all this is just a hack, no matter how
useful it is. Dave Burt sent a message to Ruby Talk along these lines:

Proc's documentation tells us that "Proc objects are blocks of code that
have been bound to a set of local variables." (That is, they are "closures"
with "bindings".) Do any of the proposed solutions so far store local
variables?

# That is, can the following Proc be serialized?
local_var = 42
code = proc { local_var += 1 } # <= what should that look like in YAML?
code.call #=> 43

An excellent point. These toys we're creating have serious limitations to be
sure. I assume this is the very reason Ruby's Procs cannot be serialized.
Using binding() might make it possible to work around this problem in some
instances, but there are clearly some Procs that cannot be cleanly serialized.
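The binding() workaround can be sketched, under assumptions, for Dave's example: alongside the source, save the values of the local variables the Proc's binding can see, then recreate them in a fresh binding before rebuilding the Proc. This assumes Ruby 2.1 or later for Binding#local_variable_get and #local_variable_set, and every helper name below is hypothetical. The source still has to be supplied by hand, the trick only works when each captured value can itself be marshaled, and the copy gets its own snapshot of the variables rather than sharing them with the original.

```ruby
# Hypothetical stand-in for Dave's example, inside a method so that the
# Proc's binding sees only local_var.
def make_counter
  local_var = 42
  proc { local_var += 1 }
end

# Snapshot the values of every local the Proc's binding can see.
def dump_closure(source, a_proc)
  env = a_proc.binding
  vars = env.local_variables.map { |name| [name, env.local_variable_get(name)] }.to_h
  Marshal.dump([source, vars])
end

# A binding with no local variables of its own to start from.
def blank_binding
  binding
end

# Recreate those locals in a fresh binding, then rebuild the Proc there.
def load_closure(data)
  source, vars = Marshal.load(data)
  env = blank_binding
  vars.each { |name, value| env.local_variable_set(name, value) }
  env.eval("proc { #{source} }")
end

counter = make_counter
counter.call                       # local_var is now 43
copy = load_closure(dump_closure("local_var += 1", counter))
p copy.call      # prints 44: it starts from the saved value, 43
p counter.call   # prints 44: the original keeps its own variable
p copy.call      # prints 45: the two no longer share state
```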

My thanks to all who committed such wonderful code and discussion to this week's
quiz. I know I learned multiple new things and I hope others did too.

Tomorrow we have a quiz to sample some algorithmic fun...
 
why the lucky stiff

Ruby said:
My thanks to all who committed such wonderful code and discussion to this week's
quiz. I know I learned multiple new things and I hope others did too.
Good stuff, JEGII, Robin, Chris2, Dave.

I can also really sympathize with Chris' disgust over the
YAML.add_ruby_type methods... It is undergoing deprecation in favor of:

class SerializableProc
  yaml_type "tag:rubyquiz.org,2005:SerializableProc"
end

_why
 
why the lucky stiff

Christian said:
And then #yaml_dump and #yaml_load? That would rule.

Class.yaml_new or Object.yaml_initialize. And Object.to_yaml.

If folks prefer the Marshal setup, though, I'll change it. It's only
been like this for a handful of minor releases.

_why
 
Jeffrey Moss

Has anybody thought about serialized closures? I was thinking of a way to
use closures across multiple Apache requests, and came to the conclusion
that it was too much trouble. In this case I just use a standard proc object
that gets re-initialized on each request and don't serialize it, but I
always thought it would be nice to maintain some sort of persistent state
across requests.

Wouldn't it be possible to write a C extension for serializable closures?

-Jeff

----- Original Message -----
From: "Ruby Quiz" <[email protected]>
To: "ruby-talk ML" <[email protected]>
Sent: Thursday, July 14, 2005 6:51 AM
Subject: [SUMMARY] SerializableProc (#38)

 
