Marshal.load does not create new instances?

Ian Trudel

Marshal does not seem to instantiate the given class(es) on load. Moreover,
it works as long as the class is defined, even if it
has no attributes. The following code snippet creates an array
filled with objects:

class Data
end

data = File.open("data.bin", "rb") { |f| Marshal.load(f) }

This works fine even if the Data class used to dump (e.g.
from another program) has many instance variables. They just won't be
accessible directly (but can still be accessed using reflection).
Inspect would show something like #<Data:0x2b24088 @name="Account"
@expanded=true ..>. Furthermore, it doesn't seem to use the defined
class if an IO is fed to Marshal.load, so overriding Data._load just
won't work. Is that the expected behaviour?

This is really annoying, considering that I would like to load data into
one (among many) versions of the Data class. The version of the class to be
used is determined and set at run-time according to the file loaded;
this is my ultimate goal. There is obviously no cast feature in Ruby.
Even using a proxy won't cut it, if only for the fact that Marshal
doesn't instantiate loaded data.

Any suggestions?

Ian
 
7stud --

Ian said:
Marshal does not seem to instantiate given class(es) on load. Moreover,
it will absolutely work as long as the class is defined

Seems pretty standard across programming languages.
This is really annoying considering that I would like to load data onto
one (among many) version of Data class. The version of the class to be
used is determined and set at run-time according to the file loaded;
this is my ultimate goal.

Any suggestions?

How about something like this:

-----------
#a program that dumps an object:

class MyData
  def greet
    puts "hello"
  end
end

d = MyData.new
# Marshal output is binary, so open the file in binary mode.
File.open("def1.txt", "wb") do |f|
  Marshal.dump(d, f)
end
------------

#program that loads the objects:

def1 = <<-def1
  class MyData
    def greet
      puts "hello"
    end
  end
def1

def2 = <<-def2
  class MyData
    def greet
      puts "goodbye"
    end
    def shout
      puts "HEY"
    end
  end
def2

def3 = <<-def3
  class MyData
    def greet
      puts "last class"
    end
    def cry
      "Wahhhh wahhh"
    end
  end
def3

data_classes = {
  "def1" => def1,
  "def2" => def2,
  "def3" => def3
}

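print "Enter file name: "
fname = gets.chomp
defname = fname.split(".")[0]
# Define the MyData version that matches the file before loading it.
eval(data_classes[defname])

begin
  # The block form of File.open closes the file automatically.
  File.open(fname, "rb") do |f|
    d = Marshal.load(f)
    d.greet
    d.shout
    d.cry
  end
rescue NoMethodError
  # this version of MyData does not define the method; do nothing
end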
 
Ian Trudel

7stud said:
How about something like this:
#program that loads the objects:

def1 = <<-def1
class MyData
def greet
puts "hello"
end
end
def1

Your solution seems great (and it works). However, my problem with it is
the need to load and eval a class from a heredoc. That would be fine
for small classes, but it won't cut it for lengthy and
numerous classes. I am afraid that it will make the development and testing
cycle harder, if only because I won't have the support
of my favourite IDE, since the code is treated as text.

I was hoping for an object-oriented solution where, for example, I
could have a proxy, a forwarder/delegator, or even subclass delegation.
These actually did work as long as I didn't Marshal.load. Your neat trick
with heredoc and eval would be better suited to smaller needs, I think.

Any more suggestions?
 
Pit Capitain

2009/2/28 Ian Trudel said:
(...)
Any more suggestion?

Ian, I'm not sure I understand what you want. AFAIK Marshal only works
if you have the same class definitions on both sides. Why is this a
problem for you?

Regards,
Pit
 
Robert Klemme

2009/2/28 Pit Capitain said:
Ian, I'm not sure I understand what you want. AFAIK Marshal only works
if you have the same class definitions on both sides. Why is this a
problem for you?

I believe he wants to evolve the class and be able to load data
written with an older (or just different) version of the class. And
now Ian hits the usual problems of schema migration.

Ian, you should be aware of one thing: class definitions are not
serialized - no programming language that I know does this. And there
are probably good reasons (security, efficiency probably).

You can use tricks as 7stud suggested although I feel wary about this.
I would probably choose a different solution based on the
requirements (which are not fully clear to me). If you just need
changing sets of attributes then these options might work:

1. use OpenStruct
2. use Hash
3. change your class Data to store attributes in a single Hash only

There might be others, and if you provide more of your requirements we
might come up with other solutions.
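
A minimal sketch of option 3, assuming a hypothetical Record class (not taken from the original programs) that keeps every attribute in a single Hash. Because the Hash travels with the object, dumps written with different attribute sets load through the same class, and a missing key can fall back to a default:

```ruby
# Hypothetical class: all state lives in one Hash, so the set of
# attributes can vary between versions of the data.
class Record
  def initialize(attributes = {})
    @attributes = attributes
  end

  # Return the stored value, or the default when the dump lacks the key.
  def fetch(key, default = nil)
    @attributes.fetch(key, default)
  end

  def keys
    @attributes.keys
  end
end

# Two dumps with different attribute sets (an older and a newer layout).
old_dump = Marshal.dump(Record.new({ :name => "Account" }))
new_dump = Marshal.dump(Record.new({ :name => "Account", :expanded => true }))

old_record = Marshal.load(old_dump)
new_record = Marshal.load(new_dump)

p old_record.fetch(:expanded, false)  # prints false
p new_record.fetch(:expanded, false)  # prints true
```

This only covers varying attributes; it does not address differing method implementations.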

Kind regards

robert
 
Ian Trudel

Ian, I'm not sure I understand what you want. AFAIK Marshal only works
if you have the same class definitions on both sides. Why is this a
problem for you?

This is actually how I use Marshal. It works fine if I have only one
version in a given Ruby program. The problem lies in loading
different files which may contain one version or another of the given
class definition. There are sometimes more (or fewer) instance
variables and methods, different implementations of certain methods, etc.,
depending on the version of the class. Mmm. collision problems,
Capitain!

My initial hope was to define the main class in such a way that it delegates
to other classes (named and implemented according to their version). In my
twisted mind, I had imagined something where I could set the delegator to
a certain class before loading the data, just like any other proxy, and
then use it; or at least before using methods or accessors.
Ian, you should be aware of one thing: class definitions are not
serialized - no programming language that I know does this. And there
are probably good reasons (security, efficiency probably).

Understandably. :)
1. use OpenStruct
2. use Hash
3. change your class Data to store attributes in a single Hash only

Once again a good idea! Unfortunately, it is not just about data but
also about class and instance methods and their specific implementations.
Would it mean that I could mix my specific version of a class (as a module)
into the OpenStruct instance at that point?

You can use tricks as 7stud suggested although I feel wary about this.
I would probably choose a different solution based on the
requirements (which are not fully clear to me). If you just need
changing sets of attributes then these options might work:

I have data files generated by different programs. These files are
generated according to a given class, but the implementations (accessors,
methods, etc.) differ slightly from program to program. They
share the same name, basic functionality and data, though they
differ according to their version. I would like to be able to load
and use any or many of these generated files within my Ruby program
at the same time, without collisions. The constraint is that I do not
have access to the original source of the programs, so I have to
reimplement and test each version all by myself.

We should perhaps look at the problem in its extreme form: let's imagine
that we have multiple programs, each with a class Data that is
completely different (no similar instance variables or methods, nothing
in common at all). We have no access to those programs and yet have to load all
the files within a single Ruby program. What would one do?

Thanks for your help, guys!

Regards,
Ian
 
Sean O'Halpin

We should perhaps see the problem as if it was extreme: let's imagine
that we have multiple programs which have each a class Data but is
completely different (no similar instance variables nor methods, nothing
in common at all). No access to those programs and yet have to load all
the files within a single Ruby program. What one would do?

Well, you could dynamically extend the loaded instances with modules
that add the specific required behaviour.
Something like this:

First file represents whatever created the data in the first place:

# file1
class MyData
  attr_accessor :kind
  attr_accessor :name
  def initialize(kind, name)
    @kind = kind
    @name = name
  end
end

instance = MyData.new("Greeting", "World")
data = Marshal.dump(instance)
File.open("data.dat", "wb") do |file|
  file.write(data)
end
# end of file1

Second file shows how you could load this data and dynamically decide
how it should behave as an instance:

# file2
# these modules will be used to extend the loaded instance depending
# on its @kind
module Hello
  def run
    puts "Hello #{ @name }"
  end
end

module Goodbye
  def run
    puts "Goodbye #{ @name }"
  end
end

# You need to define this if you're unmarshalling data that has been
# saved as MyData - no way round it as Marshal embeds the class name
# in the data
class MyData
end

# unmarshal data (read in binary mode) and extend depending on the @kind
data = File.open("data.dat", "rb") { |file| file.read }
instance = Marshal.load(data)
# this is shorthand for determining the nature of the data
if instance.instance_variable_defined?("@kind")
  kind = instance.instance_variable_get("@kind")
  if Object.const_defined?(kind)
    extension = Object.const_get(kind)
    instance.extend(extension)
    instance.run
  else
    puts "@kind not known: #{instance.inspect}"
  end
else
  puts "@kind not defined for: #{instance.inspect}"
end
# end of file2

I'm using @kind as shorthand to stand for something that distinguishes
between instances of your data. (BTW, you shouldn't use Data as a class
name in Ruby - it is already a built-in class, reserved for use with C
extensions).

HTH,
Regards,
Sean
 
Gary Wright

We should perhaps see the problem as if it was extreme: let's imagine
that we have multiple programs which have each a class Data but is
completely different (no similar instance variables nor methods, nothing
in common at all). No access to those programs and yet have to load all
the files within a single Ruby program. What one would do?

You are establishing ground rules that can't be followed.

If you have two programs that want to exchange data then they've got
to have some pre-existing *shared* understanding of the structure
of the data. You can't migrate the state of an object from one
arbitrary class to another arbitrary class without constraining the
form of that state in some way.

Ruby's marshal has a built-in assumption that the class that loads
the object state is the *same* (for some reasonable definition of
"same") as the class that dumps the object state.

It sounds to me like you need to abstract out the state into its
own class, use Marshal to serialize/deserialize that, and then
devise import/export methods for the various 'versions' of your Data
class. Use an intermediate class to act as the adapter between
all the versions of your Data class.
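
One way to read this suggestion, as a rough sketch only: the names RecordState, RecordV1, RecordV2, export_state, import_state and load_record are all hypothetical, and the sketch assumes the programs writing the files can be changed to dump the neutral state object rather than their own class.

```ruby
# Neutral state container: the only class Marshal ever sees.
RecordState = Struct.new(:version, :fields)

# Two hypothetical versions of the "same" class.
class RecordV1
  attr_reader :name

  def initialize(name)
    @name = name
  end

  def export_state
    RecordState.new(1, { "name" => @name })
  end

  def self.import_state(state)
    new(state.fields.fetch("name"))
  end
end

class RecordV2
  attr_reader :name, :expanded

  def initialize(name, expanded)
    @name = name
    @expanded = expanded
  end

  def export_state
    RecordState.new(2, { "name" => @name, "expanded" => @expanded })
  end

  def self.import_state(state)
    new(state.fields.fetch("name"), state.fields.fetch("expanded", false))
  end
end

# Map the version tag stored in the state to the class that understands it.
VERSIONS = { 1 => RecordV1, 2 => RecordV2 }.freeze

def load_record(bytes)
  state = Marshal.load(bytes)
  VERSIONS.fetch(state.version).import_state(state)
end

first  = load_record(Marshal.dump(RecordV1.new("Account").export_state))
second = load_record(Marshal.dump(RecordV2.new("Ledger", true).export_state))
p first.class   # prints RecordV1
p second.class  # prints RecordV2
```

Both versions can then coexist in one process, because the class name embedded in the dump is always RecordState.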

Gary Wright
 
Sean O'Halpin

Oops. That should be:

instance = MyData.new("Hello", "World")

in the first file.
 
Brian Candler

Ian said:
This is actually how I use Marshal. It works fine if I have only one
version in one given Ruby program. The problem resides in loading
different files which may contain one version or another of the given
class definition. There are sometimes additional (or less) instance
variables

Instance variables are not part of the class definition at all - even
when you're only talking about a single version of the class. Instance
variables are dynamically set within each object instance. For example:

class Foo
  def bar
    @xyz = 123
  end
end

f = Foo.new # no instance variables set at all

g = Foo.new
g.instance_variable_set(:@baz, 999) # only @baz is set

Given this: it makes sense that serializing or deserializing an instance
of Foo only takes into account what instance variables are set in that
particular object, making no reference to the class definition.
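
A small sketch of that point, using a variant of Foo with a call counter added purely for illustration (the counter is not part of the example above). It shows that a round trip through Marshal yields a new object carrying only the instance variables that were set, and that initialize is not run again on load:

```ruby
class Foo
  # Class-level counter, added only to observe calls to initialize.
  @initialize_calls = 0
  class << self
    attr_accessor :initialize_calls
  end

  def initialize
    self.class.initialize_calls += 1
  end

  def bar
    @xyz = 123
  end
end

g = Foo.new                          # initialize runs once here
g.instance_variable_set(:@baz, 999)  # only @baz is set on g

copy = Marshal.load(Marshal.dump(g))

p copy.equal?(g)            # prints false: a distinct object
p copy.instance_variables   # prints [:@baz]
p Foo.initialize_calls      # prints 1: load did not call initialize
```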
 
Michael Fellinger

Marshal does not seem to instantiate the given class(es) on load. Moreover,
it works as long as the class is defined, even if it
has no attributes. The following code snippet creates an array
filled with objects:

class Data
end

data = File.open("data.bin", "rb") { |f| Marshal.load(f) }

This absolutely works fine even if the Data class used to dump (e.g.
from another program) has many instance variables. They just won't be
accessible directly (but still can be accessible using reflection).
Inspect would show something like <Data:0x2b24088 @name="Account"
@expanded=true ..>. Furthermore, it doesn't seem to use the defined
class if an IO is fed in Marshal.load, thus overriding Data._load just
won't work. Is that the expected behaviour?

This is really annoying considering that I would like to load data onto
one (among many) version of Data class. The version of the class to be
used is determined and set at run-time according to the file loaded;
this is my ultimate goal. There is obviously no cast feature in Ruby.
Even using a proxy won't cut it, if only for the fact that Marshal
doesn't instantiate loaded data.

http://eigenclass.org/R2/writings/extprot-vs-ruby-marshal

^ manveru
 
Mike Gold

Robert said:
I believe he wants to evolve the class and be able to load data
written with an older (or just different) version of the class. And
now Ian hits the usual problems of schema migration.

Ian, you should be aware of one thing: class definitions are not
serialized - no programming language that I know does this.

... except languages in which code and data are equivalent!

Sorry, I had to bite. This is a great example of the power of code-data
equivalence. If you store the definitions, things will just work. I
see no immediate reason not to do it, other than the language not
letting you (short of awkward contrivances like heredoc-ing all your
code).

If you expect the definition to change, you can write adapters which
examine the definition (since it's data!) to detect new or incompatible
changes then adjust accordingly.
And there are probably good reasons (security, efficiency probably).

There are a variety of reasons for both doing it and not doing it. One
reason for not doing it is that the language you chose does not allow
you to do it. That may or may not be a good reason.

In case there was any confusion from a previous thread, I do use ruby,
as is obvious from my previous posts. I would only suggest that working
around the limitations of a language is not necessarily the best
approach, even though it is typically the default course of action. In
some cases it might be better to use a language without those
limitations.
 
Robert Klemme

.. except languages in which code and data are equivalent!

Sorry, I had to bite.

Ouch! ;-)
This is a great example of the power of code-data
equivalence. If you store the definitions, things will just work.

Well, *certain* things will just work. But you'll trade this for
different issues. For example, all of a sudden you can have different
implementations of the same class coexist. I wouldn't say that one or
the other solution is necessarily easier. They both do not change the
complexity of the underlying problem (evolution of code with data
artifacts belonging to different versions). Both approaches (i.e.
storing code and not storing code) make certain things easy and other
things hard.
If you expect the definition to change, you can write adapters which
examine the definition (since it's data!) to detect new or incompatible
changes then adjust accordingly.

I'd rather say you _must_ write adapters - otherwise chances are that
something will break uncontrollably.
In case there was any confusion from a previous thread, I do use ruby,
as is obvious from my previous posts. I would only suggest that working
around the limitations of a language is not necessarily the best
approach, even though it is typically the default course of action. In
some cases it might be better to use a language without those
limitations.

As I understand the particular situation a set of programs written in
Ruby was given and their output (marshaled data) needs to be worked
with. In this case, choosing a different language does not look like a
feasible option. But I generally agree that you should pick the right
tool for the job.

Kind regards

robert
 
Rick DeNatale

Ouch! ;-)

This is a great example of the power of code-data equivalence. If you

Well, *certain* things will just work. But you'll trade this for different
issues. For example, all of a sudden you can have different implementations
of the same class coexist. I wouldn't say that one or the other solution is
necessarily easier. They both do not change the complexity of the
underlying problem (evolution of code with data artifacts belonging to
different versions). Both approaches (i.e. storing code and not storing
code) make certain things easy and other things hard.


Ruby has some subtleties in this area when compared to other OO languages.

Robert Klemme introduced the idea that this was a problem in dealing with schema
migration. To me this implies dealing with layout changes to the object.
This is a problem in most languages like Java, C++ and Smalltalk where
classes, along with whatever other language specific roles they play, act as
a template for understanding which instance variable goes where in a reified
instance.

This means that if you marshal an object then match it up to a class with
a different template, you run into the danger of misinterpreting the state of
the object. In systems written in these languages, you might be able to get
away with two different versions of a Class which have the same instance
layout template but vary in method implementations, or even have slightly
different method repertoires.

Ruby falls into the class of languages where classes DON'T act as templates,
instead instance variables are dynamically bound to each instance with a
run-time lookup used to map instance variable names to location.

So the OP's case shows that you can marshal Ruby objects and the 'schema' is
carried with each object. It's just that accessor methods don't go along.
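
Since the instance variable names travel with the object, readers can be rebuilt from them after loading. The following is only a sketch: the Account class and the add_readers helper are hypothetical names introduced for illustration.

```ruby
# Empty stand-in class; only its name has to match the dump.
class Account
end

# Define a reader on this one object for each instance variable it carries.
def add_readers(object)
  object.instance_variables.each do |ivar|
    reader = ivar.to_s.sub(/\A@/, "")
    object.define_singleton_method(reader) { instance_variable_get(ivar) }
  end
  object
end

# Emulate a dump from a program whose Account class set these variables.
original = Account.new
original.instance_variable_set(:@name, "Account")
original.instance_variable_set(:@expanded, true)
bytes = Marshal.dump(original)

loaded = add_readers(Marshal.load(bytes))
p loaded.name      # prints "Account"
p loaded.expanded  # prints true
```

An object with singleton methods can no longer be passed to Marshal.dump (Ruby raises a TypeError), so this suits data that is only read.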

As it turns out, the MagLev project is trying to figure out how to deal with
a similar problem right now. In Gemstone Smalltalk, which is the code base
on which MagLev is being built, classes and instances are all held in a
shared persistent store. When a process changes a class, and commits a
transaction, other processes see the change when the results of the
transaction become visible to them (i.e. when they start up, or commit or
abort a transaction of their own).

Now, this was apparently the same model they were planning to follow for
MagLev. However, we had some discussions in the beta-testers forum about how
this might or might not work with many Ruby programs because the Ruby
execution model builds up classes at run time from a known initial state,
and classes change as the code executes, either through 'normal' class
and method definitions (both of which are execution time events in Ruby) or
through various levels of metaprogramming sophistication.

Because Ruby classes get built incrementally at run-time, the order of
execution can be important, so starting with a persisted initial set of
class definitions can be problematic at times.

So currently MagLev allows independent control over whether or not a
transaction commit persists changes to class definitions. A process needs to
explicitly indicate that it wants to put class definition changes into the
state to be committed before committing.

We'll see how this evolves.

--
Rick DeNatale

Blog: http://talklikeaduck.denhaven2.com/
Twitter: http://twitter.com/RickDeNatale
WWR: http://www.workingwithrails.com/person/9021-rick-denatale
LinkedIn: http://www.linkedin.com/in/rickdenatale
 
Robert Klemme

Ouch! ;-)

This is a great example of the power of code-data equivalence. If you
Well, *certain* things will just work. But you'll trade this for different
issues. For example, all of a sudden you can have different implementations
of the same class coexist. I wouldn't say that one or the other solution is
necessarily easier. They both do not change the complexity of the
underlying problem (evolution of code with data artifacts belonging to
different versions). Both approaches (i.e. storing code and not storing
code) make certain things easy and other things hard.

Ruby has some subtleties in this area when compared to other OO languages.

Don't they all have? ;-)
Mike Gold introduced the idea that this was a problem in dealing with schema
migration. To me this implies dealing with layout changes to the object.
This is a problem in most languages like Java, C++ and Smalltalk where
classes, along with whatever other language specific roles they play, act as
a template for understanding which instance variable goes where in a reified
instance.

This means that if you marshal an object then match it up to a class with
the same template, you run into the danger of misinterpreting the state of
the object. In systems written in these languages, you might be able to get
away with two different versions of a Class which have the same instance
layout template but vary in method implementations, or even have slightly
different method repertoires.

Ruby falls into the class of languages where classes DON'T act as templates,
instead instance variables are dynamically bound to each instance with a
run-time lookup used to map instance variable names to location.

So the OP's case shows that you can marshal Ruby objects and the 'schema' is
carried with each object. It's just that accessor methods don't go along.

Yes, this is true, and it allows one to cope with at least some migrations,
which might be enough for many practical purposes. But strictly
speaking this situation is not really better than that of other
languages: while this property of Ruby allows for successful
deserialization, you can break a class's invariant (as manifested in the
implementation of methods) this way, rendering deserialized instances
completely unusable.
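
A sketch of how that can go wrong, using a hypothetical Basket class whose methods assume @items is always an Array. The older dump is emulated with allocate, so initialize never runs for it:

```ruby
class Basket
  def initialize
    @items = []   # invariant: @items is always an Array
  end

  def count
    @items.size
  end
end

# Emulate a dump written by an older version that never set @items.
legacy = Basket.allocate
legacy.instance_variable_set(:@name, "legacy")
bytes = Marshal.dump(legacy)

loaded = Marshal.load(bytes)  # deserialization itself succeeds

begin
  loaded.count
rescue NoMethodError => e
  # @items is nil here, so the invariant assumed by count is broken.
  puts "broken invariant: #{e.class}"
end
```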
As it turns out, the MagLev project is trying to figure out how to deal with
a similar problem right now. In Gemstone Smalltalk, which is the code base
on which MagLev is being built, classes and instances are all held in a
shared persistent store. When a process changes a class, and commits a
transaction, other processes see the change when the results of the
transaction become visible to them (i.e. when they start up, or commit or
abort a transaction of their own).

Now, this was apparently the same model they were planning to follow for
MagLev. However, we had some discussions in the beta-testers forum about how
this might or might not work with many Ruby programs because the Ruby
execution model builds up classes at run time from a known initial state,
and classes change as the code executes, either through 'normal' class
and method definitions (both of which are execution time events in Ruby) or
through various levels of metaprogramming sophistication.

Because Ruby classes get built incrementally at run-time, the order of
execution can be important, so starting with a persisted initial set of
class definitions can be problematic at times.

So currently MagLev allows independent control over whether or not a
transaction commit persists changes to class definitions. A process needs to
explicitly indicate that it wants to put class definition changes into the
state to be committed before committing.

We'll see how this evolves.

Thank you for the abstract, Rick. This sounds interesting. Your
explanation is a nice demonstration of the complexity of the problem I
was talking about. :)

Kind regards

robert
 
Eric Hodel

This is actually how I use Marshal. It works fine if I have only one
version in one given Ruby program. The problem resides in loading
different files which may contain one version or another of the given
class definition. There are sometimes additional (or less) instance
variables and methods, different implementation of certain methods, etc.
depending on the version of the class. Mmm. collision problems,
Capitain!

My initial hope was on defining the main class in such way to delegate
to other classes (named and implemented according to its version). In my
twisted mind, I had imagined something that I could set the delegator to
a certain class before loading the data, just like any other proxy, and
then use it; or at least before using methods or accessors.

Easy, dump and load an Array:

class MyObject
  # ...
  def marshal_dump
    [@ivar1, @ivar2, ...]
  end

  def marshal_load(data)
    @ivar1 = data.shift
    @ivar2 = data.shift
    # ...
  end
end

All versions of MyObject should store compatible ivars in compatible
positions in the Array. For a fancier implementation of this idea,
see Gem::Specification in the rubygems source.
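
A runnable variant of the same idea, as a sketch only: the attribute names name and expanded are made up for illustration, and the fallback for a missing array slot is an assumption about how an older, shorter layout might be tolerated.

```ruby
class MyObject
  attr_reader :name, :expanded

  def initialize(name, expanded = false)
    @name = name
    @expanded = expanded
  end

  # Marshal calls this and stores the returned Array instead of the
  # object's instance variables.
  def marshal_dump
    [@name, @expanded]
  end

  # Marshal calls this on a freshly allocated object when loading.
  def marshal_load(data)
    @name = data[0]
    @expanded = data.fetch(1, false)  # an older layout may lack this slot
  end
end

copy = Marshal.load(Marshal.dump(MyObject.new("Account", true)))
p copy.name      # prints "Account"
p copy.expanded  # prints true

# Emulate an older, shorter layout by calling marshal_load directly.
legacy = MyObject.allocate
legacy.marshal_load(["Legacy"])
p legacy.expanded  # prints false
```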

PS: Data is a built-in class:

$ ruby -e 'p Data'
Data
$
 
