Multiple processes and tied files

Tuc

Hi,

I'm running into an issue using a tied file with multiple
long-running processes. I first hit it with Squid as a redirection
program (never resolved it), and now with MIMEDefang.

When I tie to a DB_File, if one of the processes, or even an
external process, updates the file, the persistent processes don't
see the update. I have to stop and restart them for that to
happen. Sorta defeats the whole reason for using a tied file; I could
just put it in a hash.

I've tried calling the "sync" method on the tied handle
before and after every read, still with no luck.

Short of going to MySQL (which is like trying to swat a fly with
a supercollider), is there another option?

Thanks, Tuc

xhoster

Tuc said:
Hi,

I'm running into an issue using a tied file with multiple
long-running processes. I first hit it with Squid as a redirection
program (never resolved it), and now with MIMEDefang.

When I tie to a DB_File, if one of the processes, or even an
external process, updates the file, the persistent processes don't
see the update. I have to stop and restart them for that to
happen.

Have you read the documentation for DB_File?
Sorta defeats the whole reason for using a tied file; I could
just put it in a hash.

If that is the "whole" reason you are using DB_File, then you shouldn't
be using DB_File in the first place.
I've tried calling the "sync" method on the tied handle
before and after every read, still with no luck.

sync syncs up memory changes to the disk. I don't think it is supposed to
sync disk changes back to memory.
Short of going to MySQL (which is like trying to swat a fly with
a supercollider), is there another option?

MySQL is not a supercollider; it is a very lightweight fly swatter. What
you are trying to do with DB_File is like trying to swat a fly with a
pencil sharpener.

Xho

--
-------------------- http://NewsReader.Com/ --------------------
The costs of publication of this article were defrayed in part by the
payment of page charges. This article must therefore be hereby marked
advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate
this fact.

Tuc

Have you read the documentation for DB_File?
I did, way back. Then 10 minutes after I posted I read it again
and found the section that said "Hey, Tuc, you can't do that with
DB_File".
If that is the "whole" reason you are using DB_File, then you shouldn't
be using DB_File in the first place.
What should I be using, then? I need something I can query by
key and get data back. It needs to be accessible from multiple
programs and easily updated without modifying the program, and it
needs to be fast and lightweight and not require any additional
processes running.
sync syncs up memory changes to the disk. I don't think it is supposed to
sync disk changes back to memory.
Had hoped.



MySQL is not a supercollider; it is a very lightweight fly swatter. What
you are trying to do with DB_File is like trying to swat a fly with a
pencil sharpener.
Doesn't make sense to start an instance of MySQL for a table
that will probably hold 75-100 entries.

So what do you suggest to be able to do this? Just "open, while,
close" a text file?

I was also trying to stick with DB_File since another program
was actually generating the file, and DB_File was the only format it
offered. I might be able to (and it looks like I might have to, unless
I want to keep two copies) remove the use of the file from the other
program.

Tuc

xhoster

Tuc said:
I did, way back. Then 10 minutes after I posted I read it again
and found the section that said "Hey, Tuc, you can't do that with
DB_File".

You might be able to use DB_File; you would just need to untie and retie
each time you want to sync. But if you have multiple concurrent accesses,
which you do (otherwise the problem wouldn't exist), then you need to do
locking as well or your database file will be corrupted.
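A minimal sketch of that untie/retie pattern with locking (the paths are hypothetical, and the separate-lockfile convention is just one way to do it):

```perl
use strict;
use warnings;
use DB_File;
use Fcntl qw(:flock O_RDONLY);

# Hypothetical helper: look a key up with a fresh view of the file.
sub fresh_lookup {
    my ($dbfile, $lockfile, $key) = @_;

    # Shared lock: readers may overlap each other, but a writer
    # holding an exclusive lock on the same lockfile keeps us from
    # reading a half-rewritten database.
    open my $lock, '>', $lockfile or die "open $lockfile: $!";
    flock $lock, LOCK_SH or die "flock $lockfile: $!";

    # Tie freshly for every lookup so the current on-disk contents
    # are seen, then untie immediately.
    tie my %map, 'DB_File', $dbfile, O_RDONLY, 0644, $DB_HASH
        or die "tie $dbfile: $!";
    my $value = $map{$key};
    untie %map;

    close $lock;    # releases the shared lock
    return $value;
}
```

A writer would take LOCK_EX on the same lockfile around its rewrite.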

From the DB_File docs, it sounds like Tie::DB_LockFile might be just
what you need, except that no module by that name actually seems to exist
on CPAN or anywhere else I can find.

What should I be using, then? I need something I can query by
key and get data back. It needs to be accessible from multiple
programs and easily updated without modifying the program, and it
needs to be fast and lightweight and not require any additional
processes running.

You will probably have to compromise somewhere along that list. But
without knowing your usage patterns, it is hard to say where.


....
Doesn't make sense to start an instance of MySQL for a table
that will probably hold 75-100 entries.

Database servers aren't just about size. Allowing multiple connections to
access data quickly and concurrently without causing corruption or needless
slowness is the very reason that database servers exist. Saying "I don't
need a database because it is only 100 rows" is like saying "I don't need
to put engine oil in my engine because I'm only going to drive 30 mph".
So what do you suggest to be able to do this? Just "open, while,
close" a text file?

I don't see how this would get the job done. There would have to be a
"print" in there someplace, or else the whole premise of your question
would be void. And then there would have to be locking, or corruption
would happen.

I was also trying to stick with DB_File since another program
was actually generating the file, and DB_File was the only format it
offered. I might be able to (and it looks like I might have to, unless
I want to keep two copies) remove the use of the file from the other
program.

If this other program doesn't do locking and can't be made to do it
in a way compatible with your program, then you are already playing with
fire by having them touch the same DB_File file.

Xho

Leon Timmermans

On Sat, 30 Aug 2008 20:22:27 -0700, Tuc wrote:
What should I be using, then? I need something I can query by
key and get data back. It needs to be accessible from multiple
programs and easily updated without modifying the program, and it
needs to be fast and lightweight and not require any additional
processes running.
Doesn't make sense to start an instance of MySQL for a table
that will probably hold 75-100 entries.

So what do you suggest to be able to do this? Just "open, while,
close" a text file?

My advice would be to either use the BerkeleyDB module or SQLite,
depending on your exact needs.
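For the SQLite route, a minimal sketch using DBI with DBD::SQLite (the table and column names here are hypothetical):

```perl
use strict;
use warnings;
use DBI;

# Open (or create) the database and make sure the table exists.
sub open_map {
    my ($path) = @_;
    my $dbh = DBI->connect("dbi:SQLite:dbname=$path", '', '',
                           { RaiseError => 1, AutoCommit => 1 });
    $dbh->do('CREATE TABLE IF NOT EXISTS mailid (k TEXT PRIMARY KEY, v TEXT)');
    return $dbh;
}

# Key lookup; returns undef when the key is absent.
sub map_lookup {
    my ($dbh, $key) = @_;
    my ($v) = $dbh->selectrow_array(
        'SELECT v FROM mailid WHERE k = ?', undef, $key);
    return $v;
}
```

SQLite does its own multi-reader/single-writer locking inside the library, so there is no retie dance and no separate server process; the trade-off is depending on DBI and DBD::SQLite.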

Regards,

Leon Timmermans

Tuc

You might be able to use DB_File; you would just need to untie and retie
each time you want to sync. But if you have multiple concurrent accesses,
which you do (otherwise the problem wouldn't exist), then you need to do
locking as well or your database file will be corrupted.

From the DB_File docs, it sounds like Tie::DB_LockFile might be just
what you need, except that no module by that name actually seems to exist
on CPAN or anywhere else I can find.
I was hoping not to have to incur the expense of untie/tie every
time, but it seems like for a quick/easy solution, that'll be it.

The long-running processes are read-only. An external program
will be the only one with write/update capability. (Actually, when the
file gets rebuilt it gets REBUILT; it basically re-writes the whole
file from scratch. No "delete" of records, just "open, insert*X,
close".)
You will probably have to compromise somewhere along that list. But
without knowing your usage patterns, it is hard to say where.
The upshot is that this is part of a sendmail milter. Every mail
in or out gets run through the milter. On outbound mail, it checks
whether the recipient is the key of a record. If so, the sender of the
email is changed to the value for that key and the mail is sent along
its way. If there isn't a match, it checks the sender against another
file, and if there is a key match, the sender is changed to the value
for that key and sent along its way. If neither matches, the mail is
untouched.

The files are created with sendmail's "makemap hash DBNAME <
TEXTFILE".
Database servers aren't just about size. Allowing multiple connections to
access data quickly and concurrently without causing corruption or needless
slowness is the very reason that database servers exist. Saying "I don't
need a database because it is only 100 rows" is like saying "I don't need
to put engine oil in my engine because I'm only going to drive 30 mph".


I don't see how this would get the job done. There would have to be a
"print" in there someplace, or else the whole premise of your question
would be void. And then there would have to be locking, or corruption
would happen.

Be reasonable; you know there was more to it than what was said. It
was just a way to convey the idea of always opening the file, having a
while loop to go line by line through it, and then being able to find
the key and use the data. If you need the real code:


# previous code above here, including the shebang line for the perl interpreter
my $value;

open(MAILID, '<', '/etc/mail/mailid') or die "can't open mailid: $!";
while (<MAILID>) {
    chomp;
    my ($key, $v) = split /\t/, $_, 2;
    if ($key eq $lookingfor) {   # eq, not a regex match: keys may contain metacharacters
        $value = $v;
        last;
    }
}
close(MAILID);

if (defined $value) {
    # rest of processing here
}
If this other program doesn't do locking and can't be made to do it
in a way compatible with your program, then you are already playing with
fire by having them touch the same DB_File file.
Sendmail only uses the file read-only too, though I do know it opens
the file for every email that comes through.

Tuc

Tuc

The documentation for DB_File has *nothing* to say on this that is useful;
this is just general unix'y know-how that you never really get to pick up
easily.
It does tell you to look at other options, one of which doesn't
seem to exist. :)
When I wrote a squid based url filtering blacklisting mcwhatsit I used
DB_File. The important thing is to have *one* writer and many readers,
this means you can forget about locking altogether.
Exactly the first place I ever ran into this. :) And yes, all my
processes are readers in this case. (It wasn't so in the Squid case...
The first time it saw a user from a new IP it redirected them to a
"Welcome" page, then updated a file so the next request wasn't
redirected.)
UNIX has this rather nice feature where, when a file is open, that FD's
'view' of the file does not change even if you delete the file or edit it.
To see the changes you have to close the FD and reopen the DB_File. On
long-running processes this is easy; I would recommend you just check
'(stat($file))[9]' and see if the modification timestamp has changed at
regular intervals. If it has, then untie and retie the file and you will
see your updates.

For the regular interval I would use alarm() and have a function that does
this; it should keep things clean without messing up the logic of your
core code.

Interesting idea, thanks. It's probably less expensive to do that
than constantly untie/tie.
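That stat-and-retie idea might look something like this (the path and helper names are hypothetical; check_mtime() could equally be driven from an alarm() handler, as suggested):

```perl
use strict;
use warnings;
use DB_File;
use Fcntl qw(O_RDONLY);

my %map;              # the tied, possibly stale, view
my $last_mtime = -1;

# Retie only when the file's modification time has changed.
# Note: mtime has one-second resolution on many filesystems, so a
# rebuild within the same second as the previous one can be missed.
sub check_mtime {
    my ($dbfile) = @_;
    my $mtime = (stat $dbfile)[9];
    return if defined $mtime && $mtime == $last_mtime;
    untie %map if tied %map;
    tie %map, 'DB_File', $dbfile, O_RDONLY, 0644, $DB_HASH
        or die "tie $dbfile: $!";
    $last_mtime = $mtime;
}

sub cached_lookup {
    my ($dbfile, $key) = @_;
    check_mtime($dbfile);
    return $map{$key};
}
```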
No no no, MySQL is horrible! Putting any network-based database into the
critical loop of a realtime interactive application is a bad, bad idea. You
might get away with using SQLite but probably would still feel dirty from
the experience; DB_File is great for this kind of task.
Never used SQLite, but it seems like more and more people are using
it. Might be worth looking at just as a reference point.

Thanks, Tuc

Tuc

On Sat, 30 Aug 2008 20:22:27 -0700, Tuc wrote:

My advice would be to either use the BerkeleyDB module or SQLite,
depending on your exact needs.

I downloaded and installed BerkeleyDB shortly after reading (again)
the DB_File page. Now I have to figure out exactly how they do what I'm
looking for.

Thanks, Tuc

xhoster

Alexander Clouter said:
The documentation for DB_File has *nothing* to say on this that is
useful,

I found its discussion of locking useful.

....
UNIX has this rather nice feature where, when a file is open, that FD's
'view' of the file does not change even if you delete the file or edit
it.

This is absolutely not true. The FD view does not change on file deletion,
but it absolutely does change on file content edits.

No no no, MySQL is horrible! Putting any network-based database into the
critical loop of a realtime interactive application is a bad, bad idea.

MySQL doesn't have to be "network based". You can run it on the same
server if you choose to. And if you are willing to compromise by using
stale data for a minimum amount of time, you can implement that "feature"
in mysql just like you can in DB_File.


Xho

xhoster

Tuc said:
I was hoping not to have to incur the expense of untie/tie every
time, but it seems like for a quick/easy solution, that'll be it.

If you are willing to use stale data for up to, say, 15 seconds, then
you could delay for at least that long between untie/tie operations.
But if you do, you need to be careful not to get corrupted input. One way
is to copy and use the copy for those >=15 seconds (like Tie::DB_Lock,
which actually does seem to exist, does). The other is to make sure
the write process doesn't overwrite the DB file, but rather replaces it
by moving a different inode to that same name.
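The writer side of that inode swap might look like this (filenames and the helper name are hypothetical): build the new table under a temporary name, then rename() it over the old one. rename() is atomic on POSIX filesystems within one volume, so readers either keep the complete old inode or open the complete new file.

```perl
use strict;
use warnings;
use DB_File;
use Fcntl qw(O_RDWR O_CREAT);

# Rebuild the map from scratch, then atomically replace the old file.
sub rebuild_db {
    my ($dbfile, %entries) = @_;
    my $tmp = "$dbfile.new.$$";

    tie my %new, 'DB_File', $tmp, O_RDWR|O_CREAT, 0644, $DB_HASH
        or die "tie $tmp: $!";
    %new = %entries;   # write everything, as the makemap-style rebuild does
    untie %new;        # flush to disk before the swap

    rename $tmp, $dbfile or die "rename $tmp -> $dbfile: $!";
}
```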

The long-running processes are read-only. An external program
will be the only one with write/update capability. (Actually, when the
file gets rebuilt it gets REBUILT; it basically re-writes the whole
file from scratch. No "delete" of records, just "open, insert*X,
close".)

If this is done "in place", then locking is probably still necessary. If
one of the read-only scripts reads the file while it is in the process of
being re-built, it could get very confused. Perhaps this is rare
enough/harmless enough that you are willing to take the risk.

If it is done by creating a new DB_File, then mv-ing the new file to
replace the old one, then it should probably be safe on unix-ish
filesystems.

Be reasonable; you know there was more to it than what was said. It
was just a way to convey the idea of always opening the file,

Optimization and concurrency are both fiddly businesses. Trying to do them
requires an "unreasonable" level of precision.

# previous code above here, including the shebang line for the perl interpreter
my $value;

open(MAILID, '<', '/etc/mail/mailid') or die "can't open mailid: $!";
while (<MAILID>) {
    chomp;
    my ($key, $v) = split /\t/, $_, 2;
    if ($key eq $lookingfor) {   # eq, not a regex match: keys may contain metacharacters
        $value = $v;
        last;
    }
}
close(MAILID);

if (defined $value) {
    # rest of processing here
}

If the file is small (<100 entries), doing this each time might be faster
than untie and retie each time. But other than that, it is morally
equivalent to using DB_File. The same issues of locking, isolation,
concurrency, corruption, etc. still apply.

Sendmail only uses the file read-only too, though I do know it opens
the file for every email that comes through.

Since it is read-only, it won't cause corruption in the disk file. But
it can still get corrupted itself if it reads the file while the other
process is writing it. This is probably an unlikely event, but if you
process a lot of email it will happen eventually. Whether the occasional
weirdness is tolerable to you or not I don't know.

Xho
