memoize to a file

Brian Buckley

Hello all,

Using Memoize gem 1.2.0, memoizing TO a file appears to be working for me
but subsequently reading that file (say, by rerunning the same script)
appears NOT to be working (the fib(n) calls are being run again).
Inspecting the Memoize module I changed the line

cache = Hash.new.update(Marshal.load(File.read(file))) rescue { }
to
cache = Hash.new.update(Marshal.load(File.read(file)))

and instead of silently failing I now see the error message: "in `load':
marshal data too short (ArgumentError)"

My questions:
1 What is causing this error? (possibly Windows related?)
2 What is the purpose of the rescue{} suppressing the error info in the
first place?
3 Instead of using Marshal, would using YAML be a reasonable alternative?
(I am thinking of readability of the cache file and also capability to
pre-populate it)

Thanks.

-- Brian Buckley

------------------------------------------
require 'memoize'
include Memoize
def fib(n)
  puts "running... n is #{n}"
  return n if n < 2
  fib(n-1) + fib(n-2)
end
h = memoize(:fib, "fib.cache")
puts fib(10)

 
Daniel Berger

Brian said:
Hello all,

Using Memoize gem 1.2.0, memoizing TO a file appears to be working for me
but subsequently reading that file (say, by rerunning the same script)
appears NOT to be working (the fib(n) calls are being run again).
Inspecting the Memoize module I changed the line

cache = Hash.new.update(Marshal.load(File.read(file))) rescue { }
to
cache = Hash.new.update(Marshal.load(File.read(file)))

and instead of silently failing I now see the error message: "in `load':
marshal data too short (ArgumentError)"

My questions:
1 What is causing this error? (possibly Windows related?)

That is odd. I've run it on Windows with no trouble in the past. Is
it possible you ran this program using 1.8.2, downloaded 1.8.4, then
re-ran the same code using the same cache? It would fail with that
error if such is the case, since Marshal is not compatible between
versions of Ruby - not even minor versions.

2 What is the purpose of the rescue{} suppressing the error info in the
first place?

The assumption (whoops!) was that if Hash.new.update failed it was
because there was no cache (i.e. first run), so just return an empty
hash.

3 Instead of using Marshal, would using YAML be a reasonable alternative?
(I am thinking of readability of the cache file and also capability to
pre-populate it)

It will be slower, but it would work.

Regards,

Dan
 
Logan Capaldo

2 What is the purpose of the rescue{} suppressing the error info in the
first place?

Basically it's using exceptions as flow control:

begin
  cache = Hash.new.update(Marshal.load(File.read(file)))
rescue
  cache = {} # empty hash
end

So for whatever reason, if loading the file fails (e.g., this is the
first time the program has been run) it just starts with an empty
cache. I don't know why it's failing to read the file.
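
A narrower variant of the same idea, as a rough sketch: rescue only the "file does not exist" case, so a first run still starts with an empty cache while a corrupt or truncated file raises instead of being silently ignored. `load_cache` is a hypothetical helper name, not something the Memoize gem provides.

```ruby
# Hypothetical helper (not part of the Memoize gem): load a Marshal cache,
# treating only a missing file as "no cache yet".
def load_cache(path)
  File.open(path, "rb") { |io| Marshal.load(io) }
rescue Errno::ENOENT
  {} # first run: no cache file yet, start with an empty hash
end
```

With this shape, the "marshal data too short" problem would have surfaced on the second run rather than quietly recomputing everything.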
 
Timothy Goddard

Just a thought, but you might like to load this file using the binary
option on Windows. Marshal uses a binary format and Windows does weird
things to binary files loaded without the binary option.
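
A small sketch of what the binary option looks like in practice, assuming a hypothetical file name `fib.cache`. As far as I know, text mode on Windows translates line endings and treats the byte 0x1A (Ctrl-Z) as end of file, and Marshal writes the small integer 21, which is fib(8), as exactly that byte; that would fit the "marshal data too short" error. The "b" flag is harmless on other platforms.

```ruby
require "tmpdir"

# Write and read a Marshal cache in binary mode ("wb" / "rb").
def write_cache(path, cache)
  File.open(path, "wb") { |f| f.write(Marshal.dump(cache)) }
end

def read_cache(path)
  File.open(path, "rb") { |f| Marshal.load(f.read) }
end

# Does the Marshal dump of this object contain the byte 0x1A (Ctrl-Z)?
def contains_ctrl_z?(object)
  Marshal.dump(object).bytes.include?(0x1A)
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "fib.cache") # hypothetical file name
  cache = { [8] => 21, [10] => 55 }
  write_cache(path, cache)
  puts read_cache(path) == cache     # round trip in binary mode
  puts contains_ctrl_z?(21)          # whether fib(8) dumps to a Ctrl-Z byte
end
```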
 
Mauricio Fernandez

My questions:
1 What is causing this error? (possibly Windows related?)

IIRC File.read(file) doesn't open the file in binary mode; try
File.open(file, "rb"){|f| f.read}

2 What is the purpose of the rescue{} suppressing the error info in the
first place?

setting cache to {} if Marshal.load fails for some reason (e.g. a major
change in the Marshal format across Ruby versions).

3 Instead of using Marshal, would using YAML be a reasonable alternative?
(I am thinking of readability of the cache file and also capability to
pre-populate it)

I wouldn't do that:
* Marshal is faster than Syck (especially when dumping data)
* YAML takes more space than Marshal'ed data
* there are still more bugs in Syck than in Marshal (the nastiest memory
issues are believed to be fixed, but there is still occasional data
corruption)
* Marshal is more stable across Ruby releases

As for editing the cache, you can always do
File.open("cache.yaml", "w") do |out|
  YAML.dump(Marshal.load(File.open("cache", "rb"){|f| f.read}), out)
end
 
Robert Klemme

Daniel said:
That is odd. I've run it on Windows with no trouble in the past. Is
it possible you ran this program using 1.8.2, downloaded 1.8.4, then
re-ran the same code using the same cache? It would fail with that
error if such is the case, since Marshal is not compatible between
versions of Ruby - not even minor versions.


The assumption (whoops!) was that if Hash.new.update failed it was
because there was no cache (i.e. first run), so just return an empty
hash.


It will be slower, but it would work.

As you and others have pointed out, this is likely a problem caused by not
opening the file in binary mode. IMHO lib code that uses Marshal should
make sure to open files in binary mode (regardless of platform). The
advantages are twofold: we won't see these kinds of errors (i.e. it's
cross-platform), and it serves as documentation (you know from reading the
code that the file is expected to contain binary data).

Also, the line looks a bit strange to me. Creating a new hash and
updating it with a hash read from disk seems superfluous. I'd rather do
something like this:

cache = File.open(file, "rb") {|io| Marshal.load(io)} rescue {}

Marshal.load and Marshal.dump can actually read from and write to an IO
object. This seems most efficient because the file contents do not have
to be read into memory before demarshalling, and it's fail-safe in the
same way as the old implementation.
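
Both directions of the IO-based approach, as a minimal sketch. `Marshal.dump(obj, io)` and `Marshal.load(io)` are standard; the helper names are only illustrative, and the broad `rescue` is kept to mirror the one-liner above.

```ruby
# Illustrative helpers: stream the cache to and from an IO object
# instead of building the whole dump as a String first.
def save_cache(path, cache)
  File.open(path, "wb") { |io| Marshal.dump(cache, io) }
end

def load_cache_or_empty(path)
  File.open(path, "rb") { |io| Marshal.load(io) }
rescue StandardError
  {} # same fail-safe behaviour as the one-liner: any failure means empty cache
end
```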

Kind regards

robert
 
Brian Buckley

IIRC File.read(file) doesn't open the file in binary mode; try
File.open(file, "rb"){|f| f.read}


Perfect. Changing

cache = Hash.new.update(Marshal.load(File.read(file))) rescue { }
to
cache = Hash.new.update(Marshal.load(File.open(file, "rb"){|f| f.read})) rescue { }

made it work. Should this edit go into the gem (Daniel, if you're
listening)?

2 What is the purpose of the rescue{} suppressing the error info in the
first place?

setting cache to {} if Marshal.load fails for some reason (e.g. a major
change in the Marshal format across Ruby versions).

Got it. The error suppression here is just about always the correct way to
handle the situation.

As for editing the cache, you can always do
File.open("cache.yaml", "w") do |out|
  YAML.dump(Marshal.load(File.open("cache", "rb"){|f| f.read}), out)
end


Ahhh. Populate that Marshal formatted file using YAML. Good thought.
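
The reverse direction completes the round trip for pre-populating the cache: dump the Marshal file to YAML, edit it by hand, then write it back as Marshal. A sketch with hypothetical helper and file names; it assumes the cache only holds plain values such as integers and arrays, which `YAML.load` accepts.

```ruby
require "yaml"

# Hypothetical helper: export a Marshal cache file to editable YAML.
def marshal_to_yaml(marshal_path, yaml_path)
  data = File.open(marshal_path, "rb") { |f| Marshal.load(f) }
  File.open(yaml_path, "w") { |out| YAML.dump(data, out) }
end

# Hypothetical helper: turn the (possibly hand-edited) YAML back into
# a Marshal cache file, written in binary mode.
def yaml_to_marshal(yaml_path, marshal_path)
  data = YAML.load(File.read(yaml_path))
  File.open(marshal_path, "wb") { |f| Marshal.dump(data, f) }
end

# Usage (file names are examples only):
#   marshal_to_yaml("fib.cache", "cache.yaml")
#   ... edit cache.yaml ...
#   yaml_to_marshal("cache.yaml", "fib.cache")
```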

 
ara.t.howard

IIRC File.read(file) doesn't open the file in binary mode; try
File.open(file, "rb"){|f| f.read}


setting cache to {} if Marshal.load fails for some reason (e.g. a major
change in the Marshal format across Ruby versions).


I wouldn't do that:
* Marshal is faster than Syck (especially when dumping data)
* YAML takes more space than Marshal'ed data
* there are still more bugs in Syck than in Marshal (the nastiest memory
issues are believed to be fixed, but there is still occasional data
corruption)
* Marshal is more stable across Ruby releases

As for editing the cache, you can always do
File.open("cache.yaml", "w") do |out|
  YAML.dump(Marshal.load(File.open("cache", "rb"){|f| f.read}), out)
end

why not pstore - it's done all that already and is built-in?

-a
 
James Edward Gray II

why not pstore - it's done all that already and is built-in?

PStore is just a wrapper on top of Marshal for transactional file
storage. If you need transactions, it's great. Otherwise, you might
as well just use Marshal.

James Edward Gray II
 
ara.t.howard

PStore is just a wrapper on top of Marshal for transactional file storage.
If you need transactions, it's great. Otherwise, you might as well just use
Marshal.

it's not quite only that. it also

- does some simple checks when creating the file (readability, etc)
- allows db usage to be multi-processed
- supports deletion
- rolls back writes on exceptions / commits using ensure to avoid corrupt
  data file
- handles read vs write actions using shared/excl locks to boost concurrency
- uses md5 check to avoid un-needed writes
- opens in correct modes for all platforms
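
A short sketch of the rollback behaviour from the list above, using only the standard `pstore` library. The file name and keys are made up for illustration.

```ruby
require "pstore"
require "tmpdir"

# Returns the stored values for two keys after one committed and one
# failed transaction.
def pstore_rollback_demo(path)
  store = PStore.new(path)
  store.transaction { store[[10]] = 55 } # commits on normal exit

  begin
    store.transaction do
      store[[11]] = 89
      raise "simulated failure"          # exception leaves the transaction early
    end
  rescue RuntimeError
    # the write of [11] above was never saved to disk
  end

  store.transaction(true) { [store[[10]], store[[11]]] } # read-only transaction
end

Dir.mktmpdir do |dir|
  p pstore_rollback_demo(File.join(dir, "fib_cache.pstore"))
end
```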

with no offense meant towards the memoize authors - at least a few of the bugs
posted regarding that package would have been addressed by using a built-in
lib rather than rolling one's own. and, of course, that's the big thing - why
not use something already written and tested from the core instead of
re-inventing the wheel?

in any case, i think the pstore lib, simple as it is, is a very underrated
library since it provides simple transactional and concurrent persistence to
ruby apps in such an incredibly simple way. now if we could just get joel's
fsdb in the core! ;-)

kind regards.

-a
 
James Edward Gray II

it's not quite only that. it also

- does some simple checks when creating the file (readability, etc)
- allows db usage to be multi-processed
- supports deletion
- rolls back writes on exceptions / commits using ensure to avoid corrupt
  data file
- handles read vs write actions using shared/excl locks to boost concurrency
- uses md5 check to avoid un-needed writes
- opens in correct modes for all platforms

These are all great points. Thanks for the lesson. ;)

James Edward Gray II
 
Brian Buckley

That is odd. I've run it on Windows with no trouble in the past. Is
it possible you ran this program using 1.8.2, downloaded 1.8.4, then
re-ran the same code using the same cache? It would fail with that
error if such is the case, since Marshal is not compatible between
versions of Ruby - not even minor versions.


I have been on 1.8.2 on Windows straight through. Mauricio's suggestion of
File.open instead of File.read made it work for me (see other posts).

Brian

 
Mauricio Fernandez

That is odd. I've run it on Windows with no trouble in the past.

(FTR: file not opened in binary mode, [177651])
Is it possible you ran this program using 1.8.2, downloaded 1.8.4, then
re-ran the same code using the same cache? It would fail with that
error if such is the case, since Marshal is not compatible between
versions of Ruby - not even minor versions.

The Marshal format hasn't changed for a while:

batsman@tux-chan:~/Anime$ ruby182 -v -e 'p [Marshal::MAJOR_VERSION, Marshal::MINOR_VERSION]'
ruby 1.8.2 (2004-12-25) [i686-linux]
[4, 8]
batsman@tux-chan:~/Anime$ ruby -v -e 'p [Marshal::MAJOR_VERSION, Marshal::MINOR_VERSION]'
ruby 1.8.4 (2005-12-24) [i686-linux]
[4, 8]

Also note that ruby can read Marshal data in older formats if the
MAJOR_VERSION hasn't changed (i.e. if only the MINOR_VERSION was increased):


if (major != MARSHAL_MAJOR || minor > MARSHAL_MINOR) {
    rb_raise(rb_eTypeError, "incompatible marshal file format (can't be read)\n\
\tformat version %d.%d required; %d.%d given",
             MARSHAL_MAJOR, MARSHAL_MINOR, major, minor);
}
if (RTEST(ruby_verbose) && minor != MARSHAL_MINOR) {
    rb_warn("incompatible marshal file format (can be read)\n\
\tformat version %d.%d required; %d.%d given",
            MARSHAL_MAJOR, MARSHAL_MINOR, major, minor);
}


(after some searching...)

Back in Apr. 2001, matz said that "Marshal should not change too much
(unless in upper compatible way)" [14063]. The last minor change
happened after 1.6.8 (6 -> 8), and MARSHAL_MAJOR was already 4 in v1_0_1,
7 years, 2 months ago (at which point I got tired of CVSweb).

Marshal's format is more stable than we think.
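
The version pair can also be read straight off a dump: the first two bytes of Marshal output are the major and minor format version. A tiny sketch (the helper name is made up):

```ruby
# Hypothetical helper: extract [major, minor] from Marshal-dumped bytes.
def marshal_format_version(dumped)
  dumped.unpack("CC")
end

dumped = Marshal.dump({ [10] => 55 })
p marshal_format_version(dumped)                      # version stored in the data
p [Marshal::MAJOR_VERSION, Marshal::MINOR_VERSION]    # version this Ruby writes
```

Running the first line against an old cache file's contents is a quick way to check whether a version mismatch is even a candidate explanation.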
 
James Edward Gray II

it's not quite only that. it also

- does some simple checks when creating the file (readability, etc)
- allows db usage to be multi-processed
- supports deletion
- rolls back writes on exceptions / commits using ensure to avoid corrupt
  data file
- handles read vs write actions using shared/excl locks to boost concurrency
- uses md5 check to avoid un-needed writes
- opens in correct modes for all platforms

I've made a file caching example using PStore for my toy Memoizable
library. I just thought I would post it here, in case it helps or
inspires others.

#!/usr/local/bin/ruby -w

# pstore_caching.rb
#
# Created by James Edward Gray II on 2006-02-03.
# Copyright 2006 Gray Productions. All rights reserved.

require "memoizable"
require "pstore"

#
# A trivial implementation of a custom cache.  This cache uses PStore to provide
# a multi-processing safe disk cache.  The downside is that the entire cache
# must be loaded for a key check.  This can require significant memory for a
# large cache.
#
class PStoreCache
  def initialize( path )
    @cache = PStore.new(path)
  end

  def []( key )
    @cache.transaction(true) { @cache[key] }
  end

  def []=( key, value )
    @cache.transaction { @cache[key] = value }
  end
end

class Fibonacci
  extend Memoizable

  def fib( num )
    return num if num < 2
    fib(num - 1) + fib(num - 2)
  end
  memoize :fib, PStoreCache.new("fib_cache.pstore")
end

puts "This method is memoized using a file-based cache..."
start = Time.now
puts "fib(100): #{Fibonacci.new.fib(100)}"
puts "Run time: #{Time.now - start} seconds"

puts
puts "Run again to see the file cache at work."

__END__

James Edward Gray II



#!/usr/local/bin/ruby -w

# memoizable.rb
#
# Created by James Edward Gray II on 2006-01-21.
# Copyright 2006 Gray Productions. All rights reserved.

#
# Have your class or module <tt>extend Memoizable</tt> to gain access to the
# #memoize method.
#
module Memoizable
  #
  # This method is used to replace a computationally expensive method with an
  # equivalent method that will answer repeat calls for identical arguments
  # from a _cache_.  To use, make sure the current class extends Memoizable,
  # then call by passing the _name_ of the method you wish to cache results for.
  #
  # The _cache_ object can be any object supporting both #[] and #[]=.  The keys
  # used for the _cache_ are an Array of the arguments the method was called
  # with and the values are just the returned results of the original method
  # call.  The default _cache_ is a simple Hash, providing in-memory storage.
  #
  def memoize( name, cache = Hash.new )
    original = "__unmemoized_#{name}__"

    #
    # <tt>self.class</tt> is used for the top level, to modify Object, otherwise
    # we just modify the Class or Module directly
    #
    ([Class, Module].include?(self.class) ? self : self.class).class_eval do
      alias_method original, name
      private original
      define_method(name) { |*args| cache[args] ||= send(original, *args) }
    end
  end
end
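
One thing to keep in mind with `cache[args] ||= ...`: a method that returns nil or false is recomputed on every call, because `||=` treats those cached values as missing. A possible variation, as a sketch only: it assumes the cache also responds to `key?` (a plain Hash does; the PStoreCache example does not), and the module and class names are invented for the example.

```ruby
# Hypothetical variation on the module above: checks for the key itself,
# so cached nil/false results are returned instead of recomputed.
module NilSafeMemoizable
  def memoize(name, cache = {})
    original = "__unmemoized_#{name}__"
    class_eval do
      alias_method original, name
      private original
      define_method(name) do |*args|
        next cache[args] if cache.key?(args)  # hit, even when the value is nil
        cache[args] = send(original, *args)   # miss: compute and store
      end
    end
  end
end

# Invented example class: a lookup that finds nothing and counts its calls.
class Lookup
  extend NilSafeMemoizable
  attr_reader :calls

  def initialize
    @calls = 0
  end

  def find(_key)
    @calls += 1
    nil
  end
  memoize :find
end

lookup = Lookup.new
2.times { lookup.find(:missing) }
puts lookup.calls
```

Like the original, the cache here is shared by every instance of the class, since it lives in the closure created by `memoize`.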

 
