Problem with ensuring consistency .. Finalization??

Discussion in 'Ruby' started by Charles Hixson, Sep 9, 2004.

  1. I want to have a class which occasionally updates a file, but I want to
    ensure that it always flushes its data to the file before the program
    quits.

    The class is called Words

    What's the best approach to take here? I could just continually run a
    flush cycle, but that seems an awful waste of resources; I'd rather
    batch the updates, and only flush occasionally.


    class Words

      def initialize
        @@words = Hash.new
        @@words.default = nil
        @@wtable = WordTable.instance #<<<=================
        @@maxWord = @@wtable.maxWord
      end # initialize
      ....
    end # Words

    WordTable is a singleton class, and it's the one that does the actual
    writing, but it needs to get the data to do the updates from Words.

    I'm considering:
    def initialize ()
    ...
    ObjectSpace.define_finalizer(@@words, proc { flush })

    But I don't know how to evaluate whether it's a good idea or not...or
    even how to tell afterwards (presuming it doesn't throw a "compile-time"
    error).
     
    Charles Hixson, Sep 9, 2004
    #1

  2. Charles Hixson wrote:

    | I want to have a class which occasionally updates a file, but I want to
    | ensure that it always flushes it's data to the file before the program
    | quits.

    Not quite sure if I really understand your problem, but I might offer
    some things to consider here:

    1. stream.sync = true for enabling autoflush. Is this not possible? This
    seems to be the simplest solution.

    2. Use Ruby's block syntax to 'ensure' flush after update.

    3. At exit of a Ruby program, all files should be flushed anyhow; correct
    me on this if I am mistaken.
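
    The first two suggestions can be sketched as follows. This is only a
    minimal illustration for plain files; the method names and the scratch
    file path are made up for the example and do not come from the thread.

```ruby
require "tmpdir"

# Option 1: with sync = true Ruby does no buffering of its own, so every
# write is handed to the operating system immediately.
def write_with_sync(path, text)
  io = File.open(path, "w")
  io.sync = true
  io.print text
  File.read(path)   # the text is already visible before close
ensure
  io.close if io
end

# Option 2: the block form of File.open closes (and therefore flushes)
# the file when the block exits, even if the block raises.
def append_with_block(path, text)
  File.open(path, "a") { |f| f.print text }
  File.read(path)
end

# Hypothetical scratch file used only for this demonstration.
demo_path = File.join(Dir.tmpdir, "sync_demo_#{Process.pid}.txt")
puts write_with_sync(demo_path, "hello")
puts append_with_block(demo_path, " world")
File.delete(demo_path)
```

    Both options trade batching for safety: every write reaches the file
    at once, which is exactly the cost the original question wanted to
    avoid.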

    best regards,
    kaspar

    semantics & semiotics
    code manufacture

    www.tua.ch/ruby
     
    Kaspar Schiess, Sep 9, 2004
    #2

  3. "Kaspar Schiess" <> wrote in message
    news:chp75o$4gp$...
    > Charles Hixson wrote:
    >
    > | I want to have a class which occasionally updates a file, but I want to
    > | ensure that it always flushes its data to the file before the program
    > | quits.
    >
    > Not quite sure if I really understand your problem, but I might offer
    > some things to consider here:
    >
    > 1. stream.sync= true for enabling autoflush. Is this not possible ? This
    > seems to be the simplest solution.
    >
    > 2. Use Ruby's block syntax to 'ensure' flush after update.


    Yeah, but it really depends on the usage pattern: I'm not sure whether a)
    the file is overwritten on every access and b) how often these updates
    take place. Charles, can you clarify?

    > 3. At exit of ruby program, all Files should be flushed anyhow, correct
    > me on this if I am mistaken.


    Hm, I would have guessed otherwise but apparently you are right:

    12:00:34 [source]: ruby -e 'io=File.open("x", "w");io.sync = false;io.print "hello"'
    12:00:53 [source]: cat x
    hello12:00:55 [source]:

    Kind regards

    robert
     
    Robert Klemme, Sep 9, 2004
    #3
  4. On Thu, 9 Sep 2004 16:38:07 +0900, Charles Hixson
    <> wrote:
    > I want to have a class which occasionally updates a file, but I want to
    > ensure that it always flushes it's data to the file before the program
    > quits.
    >
    > The class is called Words
    >
    > What's the best approach to take here? I could just continually run a
    > flush cycle, but that seems an awful waste of resources, I'd rather
    > batch the updates, and only flush occasionally.
    >
    > class Words
    >
    > def initialize ()
    > @@words = Hash.new
    > @@words.default = nil
    > @@wtable = WordTable.instance #<<<=================
    > @@maxWord = @@wtable.maxWord
    > end # initialize
    > ....
    > end # Words
    >
    > WordTable is a singleton class, and it's the one that does the actual
    > writing, but it needs to get the data to do the updates from Words.
    >
    > I'm considering:
    > def initialize ()
    > ...
    > ObjectSpace.define_finalizer(@@words, proc { flush })
    >
    > But I don't know how to evaluate whether it's a good idea or not...or
    > even how to tell afterwards (presuming it doesn't throw a "compile-time"
    > error).


    Perhaps an END {} block?
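
    A minimal sketch of what an END block does, assuming a script with some
    batched data to write out. The script is held in a string and run in a
    child interpreter so that its exit can be observed; PENDING and the
    printed messages are hypothetical stand-ins, not code from the thread.

```ruby
require "rbconfig"

# A tiny script, kept in a string so it can be run as its own process.
# PENDING is a hypothetical stand-in for the batched updates.
END_DEMO_SCRIPT = <<~'RUBY'
  PENDING = []
  END { puts "flushing #{PENDING.size} pending updates" }
  PENDING << "apple" << "pear"
  puts "main program done"
RUBY

# Run a script in a child interpreter and capture everything it prints.
def run_script(source)
  IO.popen([RbConfig.ruby, "-e", source], &:read)
end

puts run_script(END_DEMO_SCRIPT)
```

    The END block is registered when the script reaches it but runs only
    as the interpreter shuts down, so it sees the data added afterwards.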

    -austin
    --
    Austin Ziegler
     
    Austin Ziegler, Sep 9, 2004
    #4
  5. Austin Ziegler wrote:

    >On Thu, 9 Sep 2004 16:38:07 +0900, Charles Hixson
    ><> wrote:
    >
    >
    >>I want to have a class which occasionally updates a file, but I want to
    >>ensure that it always flushes it's data to the file before the program
    >>quits.
    >>
    >>The class is called Words
    >>
    >>What's the best approach to take here? I could just continually run a
    >>flush cycle, but that seems an awful waste of resources, I'd rather
    >>batch the updates, and only flush occasionally.
    >>
    >>class Words
    >>
    >> def initialize ()
    >> @@words = Hash.new
    >> @@words.default = nil
    >> @@wtable = WordTable.instance #<<<=================
    >> @@maxWord = @@wtable.maxWord
    >> end # initialize
    >>....
    >>end # Words
    >>
    >>WordTable is a singleton class, and it's the one that does the actual
    >>writing, but it needs to get the data to do the updates from Words.
    >>
    >>I'm considering:
    >> def initialize ()
    >> ...
    >> ObjectSpace.define_finalizer(@@words, proc { flush })
    >>
    >>But I don't know how to evaluate whether it's a good idea or not...or
    >>even how to tell afterwards (presuming it doesn't throw a "compile-time"
    >>error).
    >>
    >>

    >
    >Perhaps an END {} block?
    >
    >-austin
    >
    >

    That looks like a good approach (well, the best suggested). I'd rather
    have it tied into the class, so that the class could be moved from
    application to application, but if I can't, I can't. Having it as part
    of the containing file is certainly a "next best" viable approach.

    (I didn't even remember that END blocks existed, though I know I've read
    that page before!)

    Thanks loads.
     
    Charles Hixson, Sep 9, 2004
    #5
  6. Robert Klemme wrote:

    >"Kaspar Schiess" <> wrote in message
    >news:chp75o$4gp$...
    >
    >
    >>Charles Hixson wrote:
    >>
    >>| I want to have a class which occasionally updates a file, but I want to
    >>| ensure that it always flushes its data to the file before the program
    >>| quits.
    >>
    >>Not quite sure if I really understand your problem, but I might offer
    >>some things to consider here:
    >>
    >>1. stream.sync= true for enabling autoflush. Is this not possible ? This
    >>seems to be the simplest solution.
    >>
    >>2. Use Ruby's block syntax to 'ensure' flush after update.
    >>
    >>

    >
    >Yeah, but it really depends on the usage pattern: I'm not sure whether a)
    >the file is overwritten on every access and b) how often these updates
    >take place. Charles, can you clarify?
    >
    >
    >
    >>3. At exit of ruby program, all Files should be flushed anyhow, correct
    >>me on this if I am mistaken.
    >>
    >>

    >
    >Hm, I would have guessed otherwise but apparently you are right:
    >
    >12:00:34 [source]: ruby -e 'io=File.open("x", "w");io.sync = false;io.print "hello"'
    >12:00:53 [source]: cat x
    >hello12:00:55 [source]:
    >
    >Kind regards
    >
    > robert
    >

    What's being done here is updating a database (an SQLite database,
    actually). So what I want to do is accumulate a bunch of changes, and
    then periodically add them either when things would be idle or when the
    number of changes starts to use too much RAM. But I don't want to lose
    them when the program terminates, and the class doesn't terminate itself
    (notice that the table is referred to, indirectly, via a class variable
    @@wtable). Now the data accumulation happens in a class separate from
    the class that manipulates the database table, etc.

    The stream.sync approach doesn't seem to apply here at all. (Note that
    I want to be flushing data in a Hash Table to the file...so I can't use
    any automatic file flushing.)

    The suggestion of the END block of the file is a plausible approach,
    which I had forgotten existed. What I really want is a class finalizer,
    but lacking that I should be able to make the END block work, with a bit
    of redesign. It will drastically decrease the portability of the class,
    but as each file can have its own END block, it shouldn't decrease the
    portability of the file.

    Thanks for the help,
    Charles
     
    Charles Hixson, Sep 9, 2004
    #6
  7. Charles Hixson wrote:
    > Austin Ziegler wrote:

    ...
    >> Perhaps an END {} block?
    >>
    >> -austin
    >>
    >>

    > That looks like a good approach (well, the best suggested). I'd rather
    > have it tied into the class, so that the class could be moved from
    > application to application, but if I can't, I can't. Having it a part
    > of the containing file is certainly a "next best" viable approach.
    >
    > (I didn't even remember that END blocks existed, though I know I've read
    > that page before!)


    IIRC, Kernel#at_exit has the same functionality, but you can call it
    from a method in your class.

    class C
      def foo
        at_exit do
          puts "Done!"
        end
      end
    end
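
    Applied to the original question, the same idea might look like the
    sketch below. BufferedWords and its sink argument are hypothetical
    stand-ins for the Words and WordTable classes, whose real interfaces
    are not shown in the thread.

```ruby
# Hypothetical stand-in for the Words class: it batches updates in a hash
# and hands them to a sink (any object responding to <<) when flushed.
class BufferedWords
  def initialize(sink)
    @sink = sink
    @pending = {}
    # Registered once per instance; runs when the interpreter exits normally.
    at_exit { flush }
  end

  def add(word, count)
    @pending[word] = count
  end

  # Write out everything batched so far, then clear the batch so that a
  # second flush (for example the one from at_exit) writes nothing twice.
  def flush
    @pending.each { |word, count| @sink << "#{word}=#{count}\n" }
    @pending.clear
  end
end

words = BufferedWords.new($stdout)
words.add("apple", 3)
words.add("pear", 1)
# No explicit flush here: the at_exit handler writes both entries on exit.
```

    Because the handler is registered inside initialize, the cleanup
    travels with the class rather than with the file that uses it, which
    is the portability asked for earlier in the thread.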
     
    Joel VanderWerf, Sep 9, 2004
    #7
  8. Charles Hixson wrote:

    > I'm considering:
    > def initialize ()
    > ...
    > ObjectSpace.define_finalizer(@@words, proc { flush })
    >
    > But I don't know how to evaluate whether it's a good idea or not...or
    > even how to tell afterwards (presuming it doesn't throw a "compile-time"
    > error).


    What about using at_exit?

    Regards,
    Florian Gross
     
    Florian Gross, Sep 9, 2004
    #8
  9. Joel VanderWerf wrote:

    > Charles Hixson wrote:
    >
    >> Austin Ziegler wrote:

    >
    > ...
    > ...
    > IIRC, Kernel#at_exit has the same functionality, but you can call it
    > from a method in your class.
    >
    > class C
    > def foo
    > at_exit do
    > puts "Done!"
    > end
    > end
    > end
    >
    >

    Thanks! That's JUST what I was looking for.
     
    Charles Hixson, Sep 10, 2004
    #9
  10. "Charles Hixson" <> wrote in message
    news:...

    > What's being done here is updating a database (an Sqlite database
    > actually). So what I want to do is accumulate a bunch of changes, and
    > then periodically add them either when things would be idle or when the
    > number of changes starts to use too much ram. But I don't want to loose
    > them when the program terminates, and the class doesn't terminate itself
    > (notice that the table is referred to, indirectly, via a class variable
    > @@wtable). Now the data accumulation happens in a class separate from
    > the class that manipulates the database table, etc.
    >
    > The stream.sync approach doesn't seem to apply here at all. (Note that
    > I want to be flushing data in a Hash Table to the file...so I can't use
    > any automatic file flushing.)


    I don't know how big your hash will grow, but did you try to just marshal
    the hash like this after every change you want to preserve? Marshal is
    quite fast, so it might be worth a try:

    File.open("storage", "wb"){|io| Marshal.dump( hash, io )}
    hash = File.open("storage", "rb"){|io| Marshal.load(io)}
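
    As a self-contained round trip, the two lines above could be wrapped
    like this. The helper names and the scratch file path are assumptions
    made for the example.

```ruby
require "tmpdir"

# Serialize a hash to disk. "wb" opens the file for binary writing.
def save_hash(path, hash)
  File.open(path, "wb") { |io| Marshal.dump(hash, io) }
end

# Read it back. "rb" opens the file for binary reading.
def load_hash(path)
  File.open(path, "rb") { |io| Marshal.load(io) }
end

# Hypothetical scratch file used only for this demonstration.
storage = File.join(Dir.tmpdir, "storage_#{Process.pid}.bin")
save_hash(storage, { "apple" => 3, "pear" => 1 })
p load_hash(storage)
File.delete(storage)
```

    Note that a Hash created with a default block cannot be dumped this
    way; Marshal raises a TypeError for it.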

    > The suggestion of the END block of the file is a plausible approach,
    > which I had forgotten existed. What I really want is a class finalizer,


    As mentioned, Ruby seems to flush all open handles on exit.

    > but lacking that I should be able to make the END block work, with a bit
    > of redesign. It will drastically decrease the portability of the class,
    > but as each file can have it's own END block, it shouldn't decrease the
    > portability of the file.


    If you want to save for safety reasons (i.e. to avoid data loss on a crash
    of the Ruby interpreter) you must flush after every write anyway. So
    there would be no need for END block or whatever other means.

    Kind regards

    robert
     
    Robert Klemme, Sep 10, 2004
    #10
  11. Robert Klemme wrote:

    >"Charles Hixson" <> wrote in message
    >news:...
    >
    >
    >
    >>What's being done here is updating a database (an Sqlite database
    >>actually). So what I want to do is accumulate a bunch of changes, and
    >>then periodically add them either when things would be idle or when the
    >>number of changes starts to use too much ram. But I don't want to loose
    >>them when the program terminates, and the class doesn't terminate itself
    >>(notice that the table is referred to, indirectly, via a class variable
    >>@@wtable). Now the data accumulation happens in a class separate from
    >>the class that manipulates the database table, etc.
    >>
    >>The stream.sync approach doesn't seem to apply here at all. (Note that
    >>I want to be flushing data in a Hash Table to the file...so I can't use
    >>any automatic file flushing.)
    >>
    >>

    >
    >I don't know how big your hash will grow, but did you try to just marshal
    >the hash like this after every change you want to preserve. Marshal is
    >quite fast, so it might be worth a try:
    >
    >File.open("storage", "wb"){|io| Marshal.dump( hash, io )}
    >hash = File.open("storage", "b"){|io| Marshal.load(io)}
    >
    >
    >
    >>The suggestion of the END block of the file is a plausible approach,
    >>which I had forgotten existed. What I really want is a class finalizer,
    >>
    >>

    >
    >As mentioned ruby seems to flush all open handles on exit.
    >
    >
    >
    >>but lacking that I should be able to make the END block work, with a bit
    >>of redesign. It will drastically decrease the portability of the class,
    >>but as each file can have it's own END block, it shouldn't decrease the
    >>portability of the file.
    >>
    >>

    >
    >If you want to save for safety reasons (i.e. to avoid data loss on a crash
    >of the Ruby interpreter) you must flush after every write anyway. So
    >there would be no need for END block or whatever other means.
    >
    >Kind regards
    >
    > robert
    >


    Marshal is the wrong answer. The hash will be limited to around 1500
    items by flushing. The database of which it is a partial updated mirror
    will likely grow to around 5,000,000 items. The flushing process
    detects all dirty items in the hash and passes them to another routine
    which either updates an existing item or adds a new one.
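
    The flushing scheme described here could be sketched roughly as below.
    DirtyCache, its limit parameter and the writer block are hypothetical;
    the writer stands in for the routine that updates or inserts a database
    row, and emptying the cache after a flush is one possible policy, not
    necessarily the one actually used.

```ruby
# A sketch of a bounded write-back cache. The writer is any block that
# accepts (key, value); here it stands in for the routine that updates an
# existing database row or inserts a new one.
class DirtyCache
  def initialize(limit, &writer)
    @limit = limit
    @writer = writer
    @items = {}
    @dirty = {}   # keys changed since the last flush
  end

  def [](key)
    @items[key]
  end

  def []=(key, value)
    @items[key] = value
    @dirty[key] = true
    flush if @items.size > @limit
  end

  # Pass every dirty item to the writer, then empty the cache.
  def flush
    @dirty.each_key { |key| @writer.call(key, @items[key]) }
    @dirty.clear
    @items.clear
  end

  def size
    @items.size
  end
end

written = []
cache = DirtyCache.new(2) { |key, value| written << [key, value] }
cache["a"] = 1
cache["b"] = 2
cache["c"] = 3   # exceeds the limit of 2, so all three items are flushed
p written
```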

    Actually, there will likely be several (?) hash tables simultaneously in
    the eventual implementation, and each one will need to implement a
    different version of this procedure. Fortunately, the record
    structures are both orderly and consistent, so I won't need the kind of
    flexibility that Marshal implies. I may even eventually translate this
    into a compilable language after I get everything working, for the
    increase in speed that's available. (My plan is 1) first get it
    working, 2) second, speed it up.) If this is adopted I'll probably use
    D (DMD), as that seems the best of the current compilable languages.
    (I wonder if Ruby-inline could handle D code? Well, no rush. That's a
    long ways off yet.)
     
    Charles Hixson, Sep 10, 2004
    #11
  12. "Charles Hixson" <> wrote in message
    news:...

    > Marshall is the wrong answer. The hash will be limited to around 1500
    > items by flushing. The database of which it is a partial updated mirror
    > will likely grow to around 5,000,000 items. The flushing process
    > detects all dirty items in the hash and passes them to another routine
    > which either updates an existing item or adds a new one.


    So if I understand you correctly it's like this: you have a hash data
    structure in memory that holds, among other things, data that is not yet
    present in the DB. You want to store all dirty data in a file to make
    sure that in case of a crash you don't lose anything. From time to time
    you write the dirty stuff into the database and if that succeeds you clear
    the temp disk storage. I'll attach something that shows how I imagine this
    could work.

    > Actually, there will likely be several (?) hash tables simultaneously in
    > the eventual implementation, and each one will need to implement a
    > different version of this procedure. Fortunately, the records
    > structures are both orderly and consistent, so I won't need the kind of
    > flexibility that marshall implies. I may even eventually translate this
    > into a compileable language after I get everything working, for the
    > increase in speed that's available. (My plan is 1) first get it
    > working, 2) second, speed it up.)


    That's how it should be ("premature optimization..."). :)

    > If this is adopted I'll probably use
    > D (DMD), as that seems the best of the current compileable languages.
    > (I wonder if Ruby-inline could handle D code? Well, no rush. That's a
    > long ways off yet.)


    Just a wild idea:

    functor = D::compile <<EOF
    D code here
    EOF

    functor.call( "foo", "bar" )

    Of course you'd have to compile the code into a shared lib and dynamically
    load it. But it sounds feasible IMHO. Maybe it's a good idea to provide
    a framework for this, so integration of other languages becomes easier.

    Kind regards

    robert
     
    Robert Klemme, Sep 10, 2004
    #12
  13. Robert Klemme wrote:

    >"Charles Hixson" <> wrote in message
    >news:...
    >
    >
    >
    >>Marshall is the wrong answer. The hash will be limited to around 1500
    >>items by flushing. The database of which it is a partial updated mirror
    >>will likely grow to around 5,000,000 items. The flushing process
    >>detects all dirty items in the hash and passes them to another routine
    >>which either updates an existing item or adds a new one.
    >>
    >>

    >
    >So if I understand you correctly it's like this: you have a hash data
    >structure in mem that keeps some data among that data that is not yet
    >present in the DB. You want to store all dirty data in a file to make
    >sure that in case of a crash you don't loose anything. From time to time
    >you write the dirty stuff into the database and if that succeeds you clear
    >the temp disk storage. I'll attach something that shows how I image this
    >could work.
    >
    >

    Sort of. The disk storage is permanent, it's the hash that's
    temporary. I certainly wouldn't want to eat up my RAM by holding the
    entire database in RAM, when at any one time it didn't need most of it.
    OTOH, a persistent hash would be a reasonable answer...well, I haven't
    looked at your code yet, so I shouldn't comment. It sounds like it
    would be a reasonable answer. I haven't been planning to persist the
    hash itself, but if I can, without excessive cycle use, then that would
    just by itself solve the current problem. (OTOH, it also looks like

        ...
        at_exit { flush } # this is run after the files are
                          # opened and the hash is initialized
      end # initializer
      ...
    end # class

    will solve the problem. I've read the description of what it does
    three times, and I still can't be *sure* that the file and class
    variables will still be extant when I run it, but it looks like that's
    the intent.)


    >>.... (My plan is 1) first get it
    >>working, 2) second, speed it up.)
    >>
    >>

    >
    >That's the way how it should be ("premature optimization..."). :)
    >
    >
    >
    >> If this is adopted I'll probably use
    >>D (DMD), as that seems the best of the current compileable languages.
    >>(I wonder if Ruby-inline could handle D code? Well, no rush. That's a
    >>long ways off yet.)
    >>
    >>

    >
    >Just a wild idea:
    >
    >functor = D::compile <<EOF
    > D code here
    >EOF
    >
    >functor.call( "foo", "bar" )
    >
    >Of course you'd have to compile the code into a shared lib and dynamically
    >load it. But it sounds feasible IMHO. Maybe it's a good idea to provide
    >a framework for this, so integration of other languages becomes easier.
    >
    >Kind regards
    >
    > robert
    >
    >

    Thanks for your assistance.
     
    Charles Hixson, Sep 11, 2004
    #13
  14. "Charles Hixson" <> wrote in message
    news:...
    > Sort of. The disk storage is permanent, it's the hash that's temporary.
    > I certainly wouldn't want to eat up my ram by holding the entire database
    > in ram, when at any one time it didn't need most of it.


    Ah, ok, that sounds as if you need a cache. An LRU cache is relatively
    straightforward. There might be one on RAA. I remember I did an
    experimental implementation once; if you're interested I can check whether I
    still have it.
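
    For illustration only, a small LRU cache can be built on the insertion
    order that Ruby hashes keep (guaranteed since Ruby 1.9, so not available
    to the interpreters of 2004). This is not the implementation mentioned
    above; the class and method names are made up.

```ruby
# A small least-recently-used cache built on Hash insertion order.
class LRUCache
  def initialize(capacity)
    @capacity = capacity
    @store = {}
  end

  # Returns the cached value or nil; a hit moves the key to the
  # most-recently-used end of the hash.
  def get(key)
    return nil unless @store.key?(key)
    value = @store.delete(key)
    @store[key] = value
  end

  def put(key, value)
    @store.delete(key)
    @store[key] = value
    # Evict the oldest entry once the capacity is exceeded.
    @store.shift if @store.size > @capacity
    value
  end

  def keys
    @store.keys
  end
end

cache = LRUCache.new(2)
cache.put(:a, 1)
cache.put(:b, 2)
cache.get(:a)      # :a is now the most recently used key
cache.put(:c, 3)   # evicts :b, the least recently used key
p cache.keys
```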

    > OTOH, a persistent hash would be a reasonable answer...well, I haven't
    > looked at your code yet, so I shouldn't comment. It sounds like it would
    > be a reasonable answer. I haven't been planning to persist the hash
    > itself, but if I can, without excessive cycle use, then that would just by
    > itself solve the current problem. (OTOH, it also looks like
    > ..
    > at_exit { flush } # this is run after the files are opened
    > and the hash is initialized
    > end # initializer
    > ..
    > end # class
    >
    > will solve the problem. I've read the description of what it does three
    > times, and I still can't be *sure* that the file and class variables will
    > still be extant when I run it, but it looks like that's the intent.


    The only thing that at_exit can't help you with is a crash. If for some
    reason (e.g. a buggy extension) Ruby crashes, at_exit won't help you. The
    best (i.e. safest) approach is probably to store new data directly in the DB
    and keep a cache of recently used data in memory.
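
    A rough sketch of that write-through arrangement, assuming the backing
    store can be treated as an object with [] and []=. A plain Hash stands
    in for the database here; WriteThroughCache is a hypothetical name.

```ruby
# Write-through sketch: every update goes to the backing store at once,
# and the in-memory hash only speeds up later reads.
class WriteThroughCache
  def initialize(store)
    @store = store
    @cache = {}
  end

  def []=(key, value)
    @store[key] = value   # hand the value to the store first
    @cache[key] = value
  end

  # Serve from memory when possible, otherwise read from the store and
  # remember the result.
  def [](key)
    @cache.fetch(key) { @cache[key] = @store[key] }
  end
end

database = {}   # hypothetical stand-in for the real table
words = WriteThroughCache.new(database)
words["apple"] = 3
p database
```

    With this layout nothing is pending at exit, so no END block or
    at_exit handler is needed; the price is one store write per update.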

    > Thanks for your assistance.


    You're welcome!

    Kind regards

    robert
     
    Robert Klemme, Sep 12, 2004
    #14
  15. Robert Klemme wrote:

    >
    > "Charles Hixson" <> wrote in message
    > news:...
    >
    >> Sort of. The disk storage is permanent, it's the hash that's
    >> temporary. I certainly wouldn't want to eat up my ram by holding the
    >> entire database in ram, when at any one time it didn't need most of it.

    >
    >
    > Ah, ok, that sounds as if you needed a cache. An LRU cache is
    > relatively straightforward. There might be one on RAA. I remember I
    > did an experimental implementation once, if you're interested I can
    > check whether I still have it.


    The cache seems to be working now... my current problem has to do with
    the db, which appears to be confusing variable name with variable
    value. I may need to rename my db variables, which seems an altogether
    silly requirement, but that's what it looks like may be needed.

    >> at_exit { flush } # this is run after the files are
    >> opened and the hash is initialized
    >> end # initializer
    >> ..
    >> end # class
    >>
    >> will solve the problem. I've read the description of what it does
    >> three times, and I still can't be *sure* that the file and class
    >> variables will still be extant when I run it, but it looks like
    >> that's the intent.

    >
    > ..
    >
    > The only thing that at_exit can't help you with is a crash. If for
    > some reason (e.g. buggy extension) Ruby crashes at_exit won't help you
    > here. The best (i.e safest) is probably to directly store new data in
    > the DB and keep a cache of recent used data in mem.


    If it's that corrupt, perhaps I'm better off not saving the cache. (It
    should only cost me a few minutes' work, and better stale data than corrupt.)

    > Thanks for your assistance.
    >
    > You're welcome!
    >
    > Kind regards
    >
    > robert


    Thanks again,
    Charles
     
    Charles Hixson, Sep 13, 2004
    #15