Multiple processes and tie'd files

Discussion in 'Perl Misc' started by Tuc, Aug 31, 2008.

  1. Tuc

    Tuc Guest

    Hi,

    I'm running into an issue when using a file I've tied, and there
    are multiple long term running processes. I first ran into it with
    Squid as a redirection program (Never resolved it), and now with
    MimeDefang.

    When I tie to a DB_File, if one of the processes or even an
    external process updates the file, the persistent processes aren't
    seeing the update. I have to stop them and restart them for that to
    happen. Sorta defeats the whole reason for using a tie'd file, I could
    just put it into a hash.

    I've tried using the "sync" method on the handle for the tie,
    before and after every read, still with no luck.

    Short of going to mysql (Which is like trying to swat a fly with
    the supercollider) is there another option?

    Thanks, Tuc
    Tuc, Aug 31, 2008
    #1

  2. Tuc

    Guest

    Tuc <> wrote:
    > Hi,
    >
    > I'm running into an issue when using a file I've tied, and there
    > are multiple long term running processes. I first ran into it with
    > Squid as a redirection program (Never resolved it), and now with
    > MimeDefang.
    >
    > When I tie to a DB_File, if one of the processes or even an
    > external process updates the file, the persistent processes aren't
    > seeing the update. I have to stop them and restart them for that to
    > happen.


    Have you read the documentation for DB_File?

    > Sorta defeats the whole reason for using a tie'd file, I could
    > just put it into a hash.


    If that is the "whole" reason you are using DB_File, then you shouldn't
    be using DB_File in the first place.

    >
    > I've tried using the "sync" method on the handle for the tie,
    > before and after every read, still with no luck.


    sync syncs up memory changes to the disk. I don't think it is supposed to
    sync disk changes back to memory.

    >
    > Short of going to mysql (Which is like trying to swat a fly with
    > the supercollider) is there another option?


    Mysql is not a super-collider, it is a very light-weight fly swatter. What
    you are trying to do with DB_File is like trying to swat a fly with a
    pencil sharpener.

    Xho

    --
    -------------------- http://NewsReader.Com/ --------------------
    The costs of publication of this article were defrayed in part by the
    payment of page charges. This article must therefore be hereby marked
    advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate
    this fact.
    , Aug 31, 2008
    #2

  3. Tuc

    Tuc Guest

    On Aug 30, 10:20 pm, wrote:
    > Tuc <> wrote:
    > > When I tie to a DB_File, if one of the processes or even an
    > > external process updates the file, the persistent processes aren't
    > > seeing the update. I have to stop them and restart them for that to
    > > happen.

    >
    > Have you read the documentation for DB_File?
    >

    I did, way back. Then 10 minutes after I posted I read it again
    and found the section that said "Hey, Tuc, you can't do that with
    DB_File".
    >
    > > Sorta defeats the whole reason for using a tie'd file, I could
    > > just put it into a hash.

    >
    > If that is the "whole" reason you are using DB_File, then you shouldn't
    > be using DB_File in the first place.
    >

    What should I be using then? I need something that I can ask it
    by a key, and get data back. It needs to be accessible from multiple
    programs, and easily updated without modifying the program. I need it
    to be fast/lightweight/not require any additional processes running.
    >
    >
    > > I've tried using the "sync" method on the handle for the tie,
    > > before and after every read, still with no luck.

    >
    > sync syncs up memory changes to the disk. I don't think it is supposed to
    > sync disk changes back to memory.
    >

    Had hoped.
    >
    >
    > > Short of going to mysql (Which is like trying to swat a fly with
    > > the supercollider) is there another option?

    >
    > Mysql is not a super-collider, it is a very light-weight fly swatter. What
    > you are trying to do with DB_File is like trying to swat a fly with a
    > pencil sharpener.
    >

    Doesn't make sense to start an instance of Mysql for a table
    that will probably be 75-100 entries.

    So what do you suggest to be able to do this? Just "open, while,
    close" a text file?

    I was also trying to keep with DB_File since another program
    actually was generating it, DB_File the only available format. I might
    be able to (and it looks like might have to, unless I want to keep 2
    copies) remove the usage of the file from the other program.

    Tuc
    Tuc, Aug 31, 2008
    #3
  4. Tuc

    Guest

    Tuc <> wrote:
    > On Aug 30, 10:20 pm, wrote:
    > > Tuc <> wrote:
    > > > When I tie to a DB_File, if one of the processes or even an
    > > > external process updates the file, the persistent processes aren't
    > > > seeing the update. I have to stop them and restart them for that to
    > > > happen.

    > >
    > > Have you read the documentation for DB_File?
    > >

    > I did, way back. Then 10 minutes after I posted I read it again
    > and found the section that said "Hey, Tuc, you can't do that with
    > DB_File".


    You might be able to use DB_File; you would just need to untie and retie
    each time you want to sync. But if you have multiple concurrent accesses
    (which you do, otherwise the problem wouldn't exist), then you need to do
    locking as well or your database file will be corrupted.

    From the DB_File docs, it sounds like Tie::DB_LockFile might be just
    what you need, except that no module by that name actually seems to exist
    on CPAN or anywhere else I can find.
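
A minimal sketch of the "untie and retie, with locking" idea described above. It assumes every cooperating program takes a lock on a separate lock file before touching the database; the `.lock` file name and the `locked_lookup` helper are hypothetical, and a writer would need to take `LOCK_EX` on the same file for this to mean anything.

```perl
use strict;
use warnings;
use Fcntl qw(:flock O_RDONLY);
use DB_File;

# Look up one key under a shared lock, tying and untying on every call.
# Assumption: writers hold LOCK_EX on the same lock file while they update.
sub locked_lookup {
    my ($db_path, $key) = @_;
    my $lock_path = "$db_path.lock";    # hypothetical lock file name

    # Append mode creates the lock file if it does not exist yet.
    open(my $lock, '>>', $lock_path) or die "Cannot open $lock_path: $!";
    flock($lock, LOCK_SH)            or die "Cannot lock $lock_path: $!";

    my %map;
    tie(%map, 'DB_File', $db_path, O_RDONLY, 0644, $DB_HASH)
        or die "Cannot tie $db_path: $!";
    my $value = $map{$key};             # undef if the key is absent
    untie %map;

    close($lock);                       # closing the handle releases the lock
    return $value;
}
```

Tying on every lookup is the expensive but simple variant; it trades speed for always seeing the current file.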


    > >
    > > > Sorta defeats the whole reason for using a tie'd file, I could
    > > > just put it into a hash.

    > >
    > > If that is the "whole" reason you are using DB_File, then you shouldn't
    > > be using DB_File in the first place.
    > >

    > What should I be using then? I need something that I can ask it
    > by a key, and get data back. It needs to be accessible from multiple
    > programs, and easily updated without modifying the program. I need it
    > to be fast/lightweight/not require any additional processes running.


    You will probably have to compromise somewhere along that list. But
    without knowing your usage patterns, it is hard to say where.


    ....
    > >
    > > > Short of going to mysql (Which is like trying to swat a fly with
    > > > the supercollider) is there another option?

    > >
    > > Mysql is not a super-collider, it is a very light-weight fly swatter.
    > > What you are trying to do with DB_File is like trying to swat a fly
    > > with a pencil sharpener.
    > >

    > Doesn't make sense to start an instance of Mysql for a table
    > that will probably be 75-100 entries.


    Database servers aren't just about size. Allowing multiple connections to
    access data quickly and concurrently without causing corruption or needless
    slowness is the very reason that database servers exist. Saying "I don't
    need a database because it is only 100 rows" is like saying "I don't need
    to put engine oil in my engine because I'm only going to drive 30 mph".

    > So what do you suggest to be able to do this? Just "open, while,
    > close" a text file?


    I don't see how this would get the job done. There would have to be a
    "print" in there someplace, or else the whole premise of your question
    would be void. And then there would have to be locking, or corruption
    would happen.


    >
    > I was also trying to keep with DB_File since another program
    > actually was generating it, DB_File the only available format. I might
    > be able to (and it looks like might have to, unless I want to keep 2
    > copies) remove the usage of the file from the other program.


    If this other program doesn't do locking and can't be made to do it
    in a way compatible with your program, then you are already playing with
    fire by having them touch the same DB_File file.

    Xho

    , Aug 31, 2008
    #4
  5. On Sat, 30 Aug 2008 20:22:27 -0700, Tuc wrote:
    > What should I be using then? I need something that I can ask it
    > by a key, and get data back. It needs to be accessible from multiple
    > programs, and easily updated without modifying the program. I need it to
    > be fast/lightweight/not require any additional processes running.


    > Doesn't make sense to start an instance of Mysql for a table
    > that will probably be 75-100 entries.
    >
    > So what do you suggest to be able to do this? Just "open, while,
    > close" a text file?
    >


    My advice would be to either use the BerkeleyDB module or SQLite,
    depending on your exact needs.

    Regards,

    Leon Timmermans
    Leon Timmermans, Aug 31, 2008
    #5
  6. Tuc

    Tuc Guest

    On Aug 31, 12:15 am, wrote:
    >
    > You might be able to use DB_File, you would just need to untie and retie
    > each time you want to sync. But, if you have multiple concurrent accesses,
    > which you do otherwise the problem wouldn't exist, then you need to do
    > locking as well or your database file will be corrupted.
    >
    > From the DB_File docs, it sounds like Tie::DB_LockFile might be just
    > what you need, except that no module by that name actually seems to exist
    > on CPAN or anywhere else I can find.
    >

    I was hoping not to have to incur the expense of untie/tie every
    time. But it seems like for a quick/easy solution, that'll be it.

    The long running processes are read only. A program external to them
    will be the only one with write/update capability. (Actually, when the
    file gets rebuilt it gets REBUILT. It basically looks like it
    re-writes the whole file from scratch: no "delete" of records, just
    "open, insert*X, close".)
    >
    > > What should I be using then? I need something that I can ask it
    > > by a key, and get data back. It needs to be accessible from multiple
    > > programs, and easily updated without modifying the program. I need it
    > > to be fast/lightweight/not require any additional processes running.

    >
    > You will probably have to compromise somewhere along that list. But
    > without knowing your usage patterns, it is hard to say where.
    >

    The upshot is that this is part of a sendmail milter. Every mail in
    or out gets run through the milter. On outbound ones, it checks to
    see if the recipient is the key to a record. If so, the sender of the
    email is changed to the value for that key and then sent along its
    way. If there isn't a match, it checks the sender against another
    file and if there is a key match, the sender is changed to the value
    for that key and sent along its way. If neither match, it's
    untouched. The files are created with sendmail's
    "makemap hash DBNAME < TEXTFILE".
    >
    > Database servers aren't just about size. Allowing multiple connections to
    > access data quickly and concurrently without causing corruption or needless
    > slowness is the very reason that database servers exist. Saying "I don't
    > need a database because it is only 100 rows" is like saying "I don't need
    > to put engine oil in my engine because I'm only going to drive 30 mph".
    >
    > > So what do you suggest to be able to do this? Just "open, while,
    > > close" a text file?

    >
    > I don't see how this would get the job done. There would have to be a
    > "print" in there someplace, or else the whole premise of your question
    > would be void. And then there would have to be locking, or corruption
    > would happen.
    >


    Be reasonable, you know there was more to it than what was said. It
    was just a way to convey the idea of always opening a file, having a
    while loop to go line by line through the file, and then being able
    to find the key and use the data. If you need the real code:


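    # previous programming above here, including shebang to perl interpreter
    my $value;

    open(my $mailid, '<', '/etc/mail/mailid')
        or die "Cannot open /etc/mail/mailid: $!";
    while (my $line = <$mailid>) {
        chomp $line;                          # drop the newline from the value
        my ($key, $val) = split /\t/, $line, 2;
        next unless defined $key;             # skip blank lines
        if ($key eq $lookingfor) {            # exact match, no regex surprises
            $value = $val;                    # only set on a real match
            last;
        }
    }
    close($mailid);

    if ($value)
    {
        # rest of processing here
    }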

    >
    > If this other program doesn't do locking and can't be made to do it
    > in a way compatible with your program, then you are already playing with
    > fire by having them touch the same DB_File file.
    >

    sendmail only uses the file read only too. I do know it opens the
    file for every email that comes through, though.

    Tuc
    Tuc, Aug 31, 2008
    #6
  7. Tuc

    Tuc Guest

    On Aug 31, 8:58 am, Alexander Clouter <> wrote:
    >
    > The documentation for DB_File has *nothing* to say on this that is useful,
    > this is just general unix'y know how that you never really get to pick up
    > easily.
    >

    It does tell you to look at other options, one of which doesn't
    seem to exist. :)
    >
    > When I wrote a squid based url filtering blacklisting mcwhatsit I used
    > DB_File. The important thing is to have *one* writer and many readers,
    > this means you can forget about locking altogether.
    >

    Exactly the first place I ever ran into this. :) And yes, all my
    processes are readers in this case. (It wasn't in the squid case. The
    first time it saw a user from a new IP it redirected them to a
    "Welcome" page, then updated a file so the next request wasn't
    redirected.)
    >
    > UNIX has this rather nice feature where when a file is open that FD's 'view'
    > of the file does not change even if you delete the file or edit it. To see
    > the changes you have to close the FD and reopen the DB_File. On long
    > running processes this is easy, I would recommend you just '(stat($file))[9]'
    > and see if the modification timestamp has changed at regular intervals.
    > If they have then untie and retie the file and you will see your updates.
    >
    > For the regular interval I would use alarm() and have a function that does
    > this, which should keep things clean without messing up the logic of your
    > core code.
    >


    Interesting idea, thanks. It's probably less expensive to do that
    than to constantly untie/tie.
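
A rough sketch of the check-the-timestamp-then-retie approach from the quoted text above. The path, `refresh_map`, and `lookup` are hypothetical names, and the check runs on each lookup rather than from an `alarm()` handler. Note that mtime usually has one-second granularity, so a rebuild landing in the same second as the previous one could be missed.

```perl
use strict;
use warnings;
use Fcntl qw(O_RDONLY);
use DB_File;

# Hypothetical path; the maps in this thread are built with "makemap hash".
my $db_path = '/etc/mail/mailid.db';

my (%map, $last_mtime);

# Re-tie %map only when the file's modification time has changed.
# Returns 1 if a (re)tie happened, 0 otherwise.
sub refresh_map {
    my $mtime = (stat $db_path)[9];
    return 0 unless defined $mtime;     # file missing: keep the old view
    return 0 if defined $last_mtime && $mtime == $last_mtime;

    untie %map if tied %map;
    tie(%map, 'DB_File', $db_path, O_RDONLY, 0644, $DB_HASH)
        or die "Cannot tie $db_path: $!";
    $last_mtime = $mtime;
    return 1;
}

# One stat() per lookup instead of one untie/tie per lookup.
sub lookup {
    my ($key) = @_;
    refresh_map();
    return $map{$key};
}
```

A long-running reader would call `lookup($recipient)` for each message; the stat is cheap, and the retie only happens after the file has actually been rebuilt.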
    >
    > No no no, MySQL is horrible! Putting any network based database into the
    > critical loop of a realtime interactive is a bad bad idea. You might get
    > away with using sqlite but probably would still feel dirty from the
    > experience, DB_File's are great for this kind of task.
    >

    Never used sqlite, but seems like more+more people are using it.
    Might be worth looking at just as a reference point.

    Thanks, Tuc
    Tuc, Aug 31, 2008
    #7
  8. Tuc

    Tuc Guest

    On Aug 31, 9:26 am, Leon Timmermans <> wrote:
    > On Sat, 30 Aug 2008 20:22:27 -0700, Tuc wrote:
    >
    > My advice would be to either use the BerkeleyDB module or SQLite,
    > depending on your exact needs.
    >


    I downloaded/installed BerkeleyDB shortly after reading (again) the
    DB_File page. Now I have to figure out exactly how they do what I'm
    looking for.

    Thanks, Tuc
    Tuc, Aug 31, 2008
    #8
  9. Tuc

    Guest

    Alexander Clouter <> wrote:
    > wrote:
    > > Tuc <> wrote:
    > >> Hi,
    > >>
    > >> I'm running into an issue when using a file I've tied, and there
    > >> are multiple long term running processes. I first ran into it with
    > >> Squid as a redirection program (Never resolved it), and now with
    > >> MimeDefang.
    > >>
    > >> When I tie to a DB_File, if one of the processes or even an
    > >> external process updates the file, the persistent processes aren't
    > >> seeing the update. I have to stop them and restart them for that to
    > >> happen.

    > >
    > > Have you read the documentation for DB_File?
    > >

    > The documentation for DB_File has *nothing* to say on this that is
    > useful,


    I found its discussion of locking useful.

    ....
    >
    > UNIX has this rather nice feature where when a file is open that FD's
    > 'view' of the file does not change even if you delete the file or edit
    > it.


    This is absolutely not true. The FD view does not change on file deletion,
    but it absolutely does change on file content edits.
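
The distinction above can be checked with a small experiment on a unix-ish filesystem. This is only an illustrative sketch on a plain text file (the `fd_view_demo` name is made up); it says nothing about what DB_File caches in memory on top of its file descriptor.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Returns three strings: the first line read, the line read after an
# in-place append, and the full content seen after the name was replaced.
sub fd_view_demo {
    my $dir  = tempdir(CLEANUP => 1);
    my $path = "$dir/data.txt";

    open(my $out, '>', $path) or die "open: $!";
    print {$out} "old\n";
    close($out);

    open(my $reader, '<', $path) or die "open: $!";
    my $first = <$reader>;

    # In-place edit through a second handle: append to the same inode.
    open(my $app, '>>', $path) or die "open: $!";
    print {$app} "appended\n";
    close($app);
    my $second = <$reader>;             # the already-open reader sees the edit

    # Replace the name with a new inode: unlink, then recreate.
    unlink($path) or die "unlink: $!";
    open(my $new, '>', $path) or die "open: $!";
    print {$new} "replaced\n";
    close($new);

    seek($reader, 0, 0) or die "seek: $!";
    my $after = do { local $/; <$reader> };   # still the old inode's content

    close($reader);
    return ($first, $second, $after);
}
```

The open handle picks up the append but never sees "replaced", because that text lives in a different inode that merely shares the old name.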


    > >
    > > Mysql is not a super-collider, it is a very light-weight fly swatter.
    > > What you are trying to do with DB_File is like trying to swat a fly
    > > with a pencil sharpener.
    > >

    > No no no, MySQL is horrible! Putting any network based database into the
    > critical loop of a realtime interactive is a bad bad idea.


    MySQL doesn't have to be "network based". You can run it on the same
    server if you choose to. And if you are willing to compromise by using
    stale data for a minimum amount of time, you can implement that "feature"
    in mysql just like you can in DB_File.


    Xho

    , Aug 31, 2008
    #9
  10. Tuc

    Guest

    Tuc <> wrote:
    > On Aug 31, 12:15 am, wrote:
    > >
    > > You might be able to use DB_File, you would just need to untie and
    > > retie each time you want to sync. But, if you have multiple concurrent
    > > accesses, which you do otherwise the problem wouldn't exist, then you
    > > need to do locking as well or your database file will be corrupted.
    > >
    > > From the DB_File docs, it sounds like Tie::DB_LockFile might be just
    > > what you need, except that no module by that name actually seems to
    > > exist on CPAN or anywhere else I can find.
    > >

    > I was hoping not to have to incur the expense of untie/tie every
    > time. But it seems like for a quick/easy solution, that'll be it.


    If you are willing to use stale data for up to, say, 15 seconds, then
    you could delay for at least that long between untie/tie operations.
    But if you do, you need to be careful not to get corrupted input. One way
    is to copy and use the copy for those >=15 seconds (like Tie::DB_Lock,
    which actually does seem to exist, does). The other is to make sure
    the write process doesn't overwrite the DB file, but rather replaces it
    by moving a different inode to that same name.


    > The long running processes are read only. A program external to them
    > will be the only one with write/update capability. (Actually, when the
    > file gets rebuilt it gets REBUILT. It basically looks like it
    > re-writes the whole file from scratch: no "delete" of records, just
    > "open, insert*X, close".)


    If this is done "in place", then locking is probably still necessary. If
    one of the read-only scripts reads the file while it is in the process of
    being re-built, it could get very confused. Perhaps this is rare
    enough/harmless enough that you are willing to take the risk.

    If it is done by creating a new DB_File, then mv-ing the new file to
    replace the old one, then it should probably be safe on unix-ish
    filesystems.
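
A sketch of the build-then-mv approach, assuming the writer is a Perl program (the thread's actual writer is makemap, where the equivalent would be building under a different name and mv-ing the result). The `replace_db` helper and the `.new.$$` temporary name are hypothetical; the temporary file has to live on the same filesystem as the target for `rename` to be atomic.

```perl
use strict;
use warnings;
use Fcntl qw(O_CREAT O_RDWR O_TRUNC);
use DB_File;

# Build a complete new DB_File under a temporary name, then rename() it
# over the live name so readers never open a half-written file.
sub replace_db {
    my ($db_path, $records) = @_;       # $records: hashref of key => value
    my $tmp_path = "$db_path.new.$$";   # hypothetical naming scheme

    my %new;
    tie(%new, 'DB_File', $tmp_path, O_CREAT | O_RDWR | O_TRUNC, 0644, $DB_HASH)
        or die "Cannot create $tmp_path: $!";
    %new = %$records;                   # copy every record into the new file
    untie %new;                         # flushes and closes the file

    rename($tmp_path, $db_path)
        or die "Cannot rename $tmp_path to $db_path: $!";
    return;
}
```

Readers that already have the old file tied keep reading the old inode until they untie and retie, so this pairs naturally with a periodic retie on the reader side.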


    > >
    > > > So what do you suggest to be able to do this? Just "open,
    > > > while, close" a text file?

    > >
    > > I don't see how this would get the job done. There would have to be a
    > > "print" in there someplace, or else the whole premise of your question
    > > would be void. And then there would have to be locking, or corruption
    > > would happen.
    > >

    >
    > Be reasonable, you know there was more to it than what was said. It
    > was just a way to convey the idea of always opening a file,


    Optimization and concurrency are both fiddly businesses. Trying to do them
    requires an "unreasonable" level of precision.


    >
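    > # previous programming above here, including shebang to perl interpreter
    > my $value;
    >
    > open(my $mailid, '<', '/etc/mail/mailid')
    >     or die "Cannot open /etc/mail/mailid: $!";
    > while (my $line = <$mailid>) {
    >     chomp $line;                          # drop the newline from the value
    >     my ($key, $val) = split /\t/, $line, 2;
    >     next unless defined $key;             # skip blank lines
    >     if ($key eq $lookingfor) {            # exact match, no regex surprises
    >         $value = $val;                    # only set on a real match
    >         last;
    >     }
    > }
    > close($mailid);
    >
    > if ($value)
    > {
    >     # rest of processing here
    > }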


    If the file is small (<100 entries), doing this each time might be faster than
    untie and retie each time. But other than that, this is morally equivalent
    to using DB_File. The same issues of locking, isolation, concurrency,
    corruption, etc. still apply.


    > > If this other program doesn't do locking and can't be made to do it
    > > in a way compatible with your program, then you are already playing
    > > with fire by having them touch the same DB_File file.
    > >

    > sendmail only uses the file read only too. I do know it opens the
    > file for every email that comes through, though.


    Since it is read-only, it won't cause corruption in the disk file. But
    it can still get corrupted itself if it reads the file while the other
    process is writing it. This is probably an unlikely event, but if you
    process a lot of email it will happen eventually. Whether the occasional
    weirdness is tolerable to you or not I don't know.

    Xho

    , Aug 31, 2008
    #10
