Problem with ensuring consistency .. Finalization??

Charles Hixson

I want to have a class which occasionally updates a file, but I want to
ensure that it always flushes its data to the file before the program
quits.

The class is called Words

What's the best approach to take here? I could just continually run a
flush cycle, but that seems an awful waste of resources; I'd rather
batch the updates and only flush occasionally.


class Words

  def initialize
    @@words = Hash.new
    @@words.default = nil
    @@wtable = WordTable.instance #<<<=================
    @@maxWord = @@wtable.maxWord
  end # initialize
  ....
end # Words

WordTable is a singleton class, and it's the one that does the actual
writing, but it needs to get the data to do the updates from Words.

I'm considering:

  def initialize
    ...
    ObjectSpace.define_finalizer(@@words, proc { flush })

But I don't know how to evaluate whether it's a good idea or not...or
even how to tell afterwards (presuming it doesn't throw a "compile-time"
error).
 
Kaspar Schiess

Charles Hixson wrote:

| I want to have a class which occasionally updates a file, but I want to
| ensure that it always flushes its data to the file before the program
| quits.

Not quite sure if I really understand your problem, but I might offer
some things to consider here:

1. stream.sync = true to enable autoflush. Is this not possible? This
seems to be the simplest solution.

2. Use Ruby's block syntax to 'ensure' flush after update.

3. When a Ruby program exits, all open files should be flushed anyhow;
correct me on this if I am mistaken.
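
A minimal sketch of points 1 and 2, assuming a plain file rather than the database discussed later in the thread. The file name and its location in the temporary directory are placeholders chosen for illustration:

```ruby
require "tmpdir"

# Hypothetical path, used only for illustration.
path = File.join(Dir.tmpdir, "words_sync_demo.log")

# Point 1: with sync = true each write bypasses Ruby's internal buffer
# and goes straight to the operating system.
io = File.open(path, "w")
io.sync = true
io.print "hello"
unbuffered = File.read(path)   # the data is visible before close
io.close

# Point 2: the block form closes (and therefore flushes) the file when
# the block ends, even if the block raises.
File.open(path, "a") { |f| f.print " world" }

contents = File.read(path)
File.delete(path)
```

Neither variant helps when the pending data lives in a Hash rather than in an IO buffer, which turns out to be the case here.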

best regards,
kaspar

semantics & semiotics
code manufacture

www.tua.ch/ruby
 
Robert Klemme

Kaspar Schiess said:

Charles Hixson wrote:

| I want to have a class which occasionally updates a file, but I want to
| ensure that it always flushes its data to the file before the program
| quits.

Not quite sure if I really understand your problem, but I might offer
some things to consider here:

1. stream.sync = true to enable autoflush. Is this not possible? This
seems to be the simplest solution.

2. Use Ruby's block syntax to 'ensure' flush after update.

Yeah, but it really depends on the usage pattern: I'm not sure a) whether
the file is overwritten on every access and b) how often these updates
take place. Charles, can you clarify?

3. At exit of ruby program, all Files should be flushed anyhow, correct
me on this if I am mistaken.

Hm, I would have guessed otherwise but apparently you are right:

12:00:34 [source]: ruby -e 'io=File.open("x", "w");io.sync = false;io.print "hello"'
12:00:53 [source]: cat x
hello12:00:55 [source]:

Kind regards

robert
 
Austin Ziegler

Perhaps an END {} block?

-austin
 
Charles Hixson

Austin said:
Perhaps an END {} block?

-austin

That looks like a good approach (well, the best suggested). I'd rather
have it tied into the class, so that the class could be moved from
application to application, but if I can't, I can't. Having it as part
of the containing file is certainly a viable "next best" approach.

(I didn't even remember that END blocks existed, though I know I've read
that page before!)

Thanks loads.
 
Charles Hixson

What's being done here is updating a database (an SQLite database,
actually). So what I want to do is accumulate a bunch of changes, and
then periodically add them either when things would be idle or when the
number of changes starts to use too much RAM. But I don't want to lose
them when the program terminates, and the class doesn't terminate itself
(notice that the table is referred to, indirectly, via a class variable
@@wtable). Now the data accumulation happens in a class separate from
the class that manipulates the database table, etc.

The stream.sync approach doesn't seem to apply here at all. (Note that
I want to be flushing data in a Hash Table to the file...so I can't use
any automatic file flushing.)

The suggestion of the END block of the file is a plausible approach,
which I had forgotten existed. What I really want is a class finalizer,
but lacking that I should be able to make the END block work, with a bit
of redesign. It will drastically decrease the portability of the class,
but as each file can have its own END block, it shouldn't decrease the
portability of the file.

Thanks for the help,
Charles
 
Joel VanderWerf

Charles said:
Austin Ziegler wrote: ...
That looks like a good approach (well, the best suggested). I'd rather
have it tied into the class, so that the class could be moved from
application to application, but if I can't, I can't. Having it a part
of the containing file is certainly a "next best" viable approach.

(I didn't even remember that END blocks existed, though I know I've read
that page before!)

IIRC, Kernel#at_exit has the same functionality, but you can call it
from a method in your class.

class C
  def foo
    at_exit do
      puts "Done!"
    end
  end
end
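
Applied to the Words class from the original post, this might look roughly like the sketch below. The `flush` body, the `FLUSHED` array standing in for the real database, and the `@@hook_registered` guard are all assumptions made for illustration; the actual WordTable interface is not shown in the thread.

```ruby
class Words
  FLUSHED = []                 # stand-in for the real database (assumption)
  @@words = {}
  @@hook_registered = false

  def initialize
    # Register the exit hook only once, however many instances are created.
    unless @@hook_registered
      at_exit { Words.flush }
      @@hook_registered = true
    end
  end

  def add(word, count)
    @@words[word] = count
  end

  # Hypothetical flush: hand the pending entries to the store, then clear them.
  def self.flush
    FLUSHED.concat(@@words.to_a)
    @@words.clear
  end

  def self.pending
    @@words.size
  end
end
```

Because the hook travels with the class, nothing has to be added to the file that uses it, which is the portability the END block could not offer.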
 
Florian Gross

Charles said:
I'm considering:
def initialize ()
...
ObjectSpace.define_finalizer(@@words, proc { flush })

But I don't know how to evaluate whether it's a good idea or not...or
even how to tell afterwards (presuming it doesn't throw a "compile-time"
error).

What about using at_exit?

Regards,
Florian Gross
 
Charles Hixson

Joel said:
...
...
IIRC, Kernel#at_exit has the same functionality, but you can call it
from a method in your class.

class C
  def foo
    at_exit do
      puts "Done!"
    end
  end
end

Thanks! That's JUST what I was looking for.
 
Robert Klemme

What's being done here is updating a database (an Sqlite database
actually). So what I want to do is accumulate a bunch of changes, and
then periodically add them either when things would be idle or when the
number of changes starts to use too much ram. But I don't want to lose
them when the program terminates, and the class doesn't terminate itself
(notice that the table is referred to, indirectly, via a class variable
@@wtable). Now the data accumulation happens in a class separate from
the class that manipulates the database table, etc.

The stream.sync approach doesn't seem to apply here at all. (Note that
I want to be flushing data in a Hash Table to the file...so I can't use
any automatic file flushing.)

I don't know how big your hash will grow, but did you try just marshalling
the hash like this after every change you want to preserve? Marshal is
quite fast, so it might be worth a try:

File.open("storage", "wb") { |io| Marshal.dump(hash, io) }
hash = File.open("storage", "rb") { |io| Marshal.load(io) }

The suggestion of the END block of the file is a plausible approach,
which I had forgotten existed. What I really want is a class finalizer,
but lacking that I should be able to make the END block work, with a bit
of redesign. It will drastically decrease the portability of the class,
but as each file can have its own END block, it shouldn't decrease the
portability of the file.

As mentioned, Ruby seems to flush all open handles on exit.

If you want to save for safety reasons (i.e. to avoid data loss on a crash
of the Ruby interpreter) you must flush after every write anyway. So
there would be no need for an END block or any other such means.
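
A sketch of what flushing after every write could look like for a plain file. The journal file name is a placeholder. `IO#flush` only empties Ruby's own buffer; `IO#fsync` additionally asks the operating system to write its buffers to the device, which matters if the concern is a machine crash rather than an interpreter crash.

```ruby
require "tmpdir"

# Hypothetical journal file, used only for illustration.
path = File.join(Dir.tmpdir, "words_journal_demo.txt")

File.open(path, "w") do |io|
  %w[alpha beta].each do |word|
    io.puts word
    io.flush   # empty Ruby's buffer into the operating system
    io.fsync   # ask the OS to write its own buffers to the device
  end
end

lines = File.readlines(path, chomp: true)
File.delete(path)
```

The cost is one system call (or two) per write, which is exactly the overhead the original post hoped to avoid by batching.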

Kind regards

robert
 
Charles Hixson

Marshal is the wrong answer. The hash will be limited to around 1500
items by flushing. The database, of which it is a partial updated mirror,
will likely grow to around 5,000,000 items. The flushing process
detects all dirty items in the hash and passes them to another routine
which either updates an existing item or adds a new one.
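
The scheme described above could be sketched roughly as follows. Everything in it is an assumption for illustration: the class name `DirtyCache`, the `store` object (anything responding to `[]=`, standing in for the SQLite-backed WordTable), and the default limit of 1500 taken from the figure just mentioned.

```ruby
class DirtyCache
  def initialize(store, limit = 1500)
    @store = store   # stand-in for the database layer (assumption)
    @limit = limit
    @items = {}
    @dirty = {}      # keys changed since the last flush
  end

  def []=(key, value)
    @items[key] = value
    @dirty[key] = true
    flush if @items.size >= @limit
  end

  def [](key)
    @items[key]
  end

  # Pass only the dirty entries on; the store decides whether each one
  # is an update or an insert. Then drop the in-memory copies.
  def flush
    @dirty.each_key { |key| @store[key] = @items[key] }
    @dirty.clear
    @items.clear
  end

  def size
    @items.size
  end
end
```

With a plain Hash as the store, `cache = DirtyCache.new({}, 3)` keeps the first two assignments in memory and writes all three entries out when the third arrives.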

Actually, there will likely be several (?) hash tables simultaneously in
the eventual implementation, and each one will need to implement a
different version of this procedure. Fortunately, the record
structures are both orderly and consistent, so I won't need the kind of
flexibility that Marshal implies. I may even eventually translate this
into a compilable language after I get everything working, for the
increase in speed that's available. (My plan is 1) first get it
working, 2) then speed it up.) If this is adopted I'll probably use
D (DMD), as that seems the best of the current compilable languages.
(I wonder if Ruby-inline could handle D code? Well, no rush. That's a
long way off yet.)
 
Robert Klemme

Marshall is the wrong answer. The hash will be limited to around 1500
items by flushing. The database of which it is a partial updated mirror
will likely grow to around 5,000,000 items. The flushing process
detects all dirty items in the hash and passes them to another routine
which either updates an existing item or adds a new one.

So if I understand you correctly, it's like this: you have a hash data
structure in memory that holds, among other things, data that is not yet
present in the DB. You want to store all dirty data in a file to make
sure that in case of a crash you don't lose anything. From time to time
you write the dirty stuff into the database and, if that succeeds, you clear
the temp disk storage. I'll attach something that shows how I imagine this
could work.

Actually, there will likely be several (?) hash tables simultaneously in
the eventual implementation, and each one will need to implement a
different version of this procedure. Fortunately, the records
structures are both orderly and consistent, so I won't need the kind of
flexibility that marshall implies. I may even eventually translate this
into a compileable language after I get everything working, for the
increase in speed that's available. (My plan is 1) first get it
working, 2) second, speed it up.)

That's how it should be ("premature optimization..."). :)

If this is adopted I'll probably use
D (DMD), as that seems the best of the current compileable languages.
(I wonder if Ruby-inline could handle D code? Well, no rush. That's a
long ways off yet.)

Just a wild idea:

functor = D::compile <<EOF
D code here
EOF

functor.call( "foo", "bar" )

Of course you'd have to compile the code into a shared lib and dynamically
load it. But it sounds feasible IMHO. Maybe it's a good idea to provide
a framework for this, so integration of other languages becomes easier.

Kind regards

robert
 
Charles Hixson

Robert said:
So if I understand you correctly it's like this: you have a hash data
structure in mem that keeps some data among that data that is not yet
present in the DB. You want to store all dirty data in a file to make
sure that in case of a crash you don't lose anything. From time to time
you write the dirty stuff into the database and if that succeeds you clear
the temp disk storage. I'll attach something that shows how I imagine this
could work.

Sort of. The disk storage is permanent; it's the hash that's
temporary. I certainly wouldn't want to eat up my ram by holding the
entire database in ram, when at any one time it didn't need most of it.
OTOH, a persistent hash would be a reasonable answer...well, I haven't
looked at your code yet, so I shouldn't comment. It sounds like it
would be a reasonable answer. I haven't been planning to persist the
hash itself, but if I can, without excessive cycle use, then that would
just by itself solve the current problem. OTOH, it also looks like

    ...
    at_exit { flush } # this is run after the files are opened
                      # and the hash is initialized
  end # initializer
  ...
end # class

will solve the problem. I've read the description of what it does
three times, and I still can't be *sure* that the file and class
variables will still be extant when I run it, but it looks like that's
the intent.
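
One way to check this empirically is to run a small child program and look at what it leaves behind. The sketch below does that; the `Demo` class, its contents, and the output file name are invented for the test and are not Charles's code. The child holds a file handle and a hash in class variables and writes the hash out from an at_exit hook.

```ruby
require "rbconfig"
require "tmpdir"

# Hypothetical output file, used only for illustration.
out_path = File.join(Dir.tmpdir, "at_exit_demo.txt")

# A tiny child program: class-level file handle and hash, written out
# from an at_exit hook after the main program has finished.
child = <<~'RUBY'
  class Demo
    @@io = File.open(ARGV[0], "w")
    @@pending = { "alpha" => 1 }
    at_exit do
      @@pending.each { |word, n| @@io.puts "#{word}=#{n}" }
      @@io.close
    end
  end
RUBY

# Run the child with the same Ruby interpreter that runs this script.
system(RbConfig.ruby, "-e", child, out_path)
written = File.read(out_path)
File.delete(out_path)
```

If the hook can still see the class variables and the open file, `written` contains the pending entry.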

Thanks for your assistance.
 
Robert Klemme

Charles Hixson said:
Sort of. The disk storage is permanent, it's the hash that's temporary.
I certainly wouldn't want to eat up my ram by holding the entire database
in ram, when at any one time it didn't need most of it.

Ah, ok, that sounds as if you need a cache. An LRU cache is relatively
straightforward. There might be one on RAA. I remember I did an
experimental implementation once; if you're interested I can check whether I
still have it.
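
For readers who want a starting point, here is a minimal LRU sketch. It is not Robert's implementation (which is not shown in the thread), and it relies on Hash preserving insertion order, which holds for Ruby 1.9 and later. The class and method names are chosen for illustration.

```ruby
class LRUCache
  def initialize(capacity)
    @capacity = capacity
    @data = {}   # insertion-ordered: first key is the least recently used
  end

  def [](key)
    return nil unless @data.key?(key)
    value = @data.delete(key)   # re-insert to mark as most recently used
    @data[key] = value
  end

  def []=(key, value)
    @data.delete(key)
    @data[key] = value
    @data.shift if @data.size > @capacity   # evict the least recently used
    value
  end

  def keys
    @data.keys
  end
end
```

Reading a key moves it to the back of the queue, so with a capacity of two, storing `:a` and `:b`, reading `:a`, and then storing `:c` evicts `:b`.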
OTOH, a persistent hash would be a reasonable answer...well, I haven't
looked at your code yet, so I shouldn't comment. It sounds like it would
be a reasonable answer. I haven't been planning to persist the hash
itself, but if I can, without excessive cycle use, then that would just by
itself solve the current problem. (OTOH, it also looks like
  ..
    at_exit { flush } # this is run after the files are opened
                      # and the hash is initialized
  end # initializer
  ..
end # class

will solve the problem. I've read the description of what it does three
times, and I still can't be *sure* that the file and class variables will
still be extant when I run it, but it looks like that's the intent.

The only thing that at_exit can't help you with is a crash. If for some
reason (e.g. a buggy extension) Ruby crashes, at_exit won't help you. The
best (i.e. safest) approach is probably to store new data directly in the DB
and keep a cache of recently used data in memory.
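
That write-through arrangement might be sketched like this. The `store` argument is any object responding to `[]` and `[]=` and stands in for the database; the class name and the simple oldest-first eviction are assumptions made to keep the example short.

```ruby
class WriteThroughCache
  def initialize(store, capacity)
    @store = store        # stand-in for the database (assumption)
    @capacity = capacity
    @recent = {}          # insertion-ordered in Ruby 1.9 and later
  end

  # Persist first, then remember the value in memory.
  def []=(key, value)
    @store[key] = value
    remember(key, value)
  end

  # Serve from memory when possible, otherwise fall back to the store.
  def [](key)
    return @recent[key] if @recent.key?(key)
    value = @store[key]
    remember(key, value) unless value.nil?
    value
  end

  def cached?(key)
    @recent.key?(key)
  end

  private

  def remember(key, value)
    @recent.delete(key)
    @recent[key] = value
    @recent.shift if @recent.size > @capacity   # drop the oldest entry
  end
end
```

Since every write reaches the store immediately, nothing is pending at exit and no exit hook is needed; the trade-off is one store write per update.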
Thanks for your assistance.

You're welcome!

Kind regards

robert
 
Charles Hixson

Robert said:
Ah, ok, that sounds as if you needed a cache. An LRU cache is
relatively straightforward. There might be one on RAA. I remember I
did an experimental implementation once, if you're interested I can
check whether I still have it.

The cache seems to be working now... my current problem has to do with
the db, which appears to be confusing variable name with variable
value. I may need to rename my db variables, which seems an altogether
silly requirement, but that's what it looks like may be needed.
..

The only thing that at_exit can't help you with is a crash. If for
some reason (e.g. buggy extension) Ruby crashes at_exit won't help you
here. The best (i.e safest) is probably to directly store new data in
the DB and keep a cache of recent used data in mem.

If it's that corrupt, perhaps I'm better off not saving the cache. (It
should only cost me a few minutes' work, and better stale data than corrupt.)
Thanks for your assistance.

You're welcome!

Kind regards

robert

Thanks again,
Charles
 
