question about forked processes writing to the same file


it_says_BALLS_on_your forehead

is this dangerous? for instance, is there ever a danger of race
conditions/locking/etc if i have:

use strict; use warnings;
use Parallel::ForkManager;

my $pm = Parallel::ForkManager->new(10);

# assume @files contains 100 files that will be processed,
# and processing time could range from subseconds to hours

my $out = 'results.txt';
for my $file (@files) {
    $pm->start and next;

    # some code to process file
    # blah blah blah

    open( my $fh_out, '>', $out ) or die "can't open $out: $!\n";
    print $fh_out "$file\n";
    close $fh_out;

    $pm->finish;
}
$pm->wait_all_children;
 

it_says_BALLS_on_your forehead

it_says_BALLS_on_your forehead said:
is this dangerous? for instance, is there ever a danger of race
conditions/locking/etc if i have:

use strict; use warnings;
use Parallel::ForkManager;

my $pm = Parallel::ForkManager->new(10);

# assume @files contains 100 files that will be processed,
# and processing time could range from subseconds to hours

my $out = 'results.txt';
for my $file (@files) {
    $pm->start and next;

    # some code to process file
    # blah blah blah

    open( my $fh_out, '>', $out ) or die "can't open $out: $!\n";

Apologies... ^^^ should be '>>'
 

Gunnar Hjalmarsson

it_says_BALLS_on_your forehead said:
is this dangerous? for instance, is there ever a danger of race
conditions/locking/etc if i have:

use strict; use warnings;
use Parallel::ForkManager;

my $pm = Parallel::ForkManager->new(10);

# assume @files contains 100 files that will be processed,
# and processing time could range from subseconds to hours

my $out = 'results.txt';
for my $file (@files) {
    $pm->start and next;

    # some code to process file
    # blah blah blah

    open( my $fh_out, '>>', $out ) or die "can't open $out: $!\n";
    print $fh_out "$file\n";
    close $fh_out;

    $pm->finish;
}
$pm->wait_all_children;

As long as you don't care about the order in which the output from
respective file is appended to $out, I can't see what the problem would be.
 

it_says_BALLS_on_your forehead

Gunnar said:
As long as you don't care about the order in which the output from
respective file is appended to $out, I can't see what the problem would be.

order is unimportant at this juncture. i was concerned that one process
would interrupt another that was writing, so that either both failed,
or a single entry became a garbled hybrid of two entries...something
along those lines. these, or other cases that may interfere with
writing one entry per line to $out, are what cause me apprehension.
 

Gunnar Hjalmarsson

it_says_BALLS_on_your forehead said:
order is unimportant at this juncture. i was concerned that one process
would interrupt another that was writing, so that either both failed,
or a single entry became a garbled hybrid of two entries...something
along those lines. these, or other cases that may interfere with
writing one entry per line to $out, are what cause me apprehension.

Probably somebody more knowledgeable than me should comment on your concerns.

Awaiting that, have you read

perldoc -q append.+text

I can add that I'm using the above method in a widely used Perl program
(run on various platforms), without any problems having been reported.
However, I do use flock() to set an exclusive lock before printing to
the file. Can't tell if locking is necessary.
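A minimal sketch of the flock() approach Gunnar describes, with the Fcntl constants imported by name (the file name and the printed line are placeholders):

```perl
use strict;
use warnings;
use Fcntl qw(:flock);    # imports LOCK_SH, LOCK_EX, LOCK_UN, LOCK_NB

my $out = 'flock_demo.txt';    # placeholder output file
unlink $out;                   # start fresh for the demo

open( my $fh_out, '>>', $out ) or die "can't open $out: $!\n";
flock( $fh_out, LOCK_EX ) or die "can't lock $out: $!\n";    # blocks until exclusive
print $fh_out "one whole line\n";
close $fh_out;    # closing the handle also releases the lock
```

Because the lock lives on the handle, closing the handle (or letting the lexical go out of scope) releases it; no explicit LOCK_UN is needed here.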
 

xhoster

it_says_BALLS_on_your forehead said:
Apologies... ^^^ should be '>>'

On Linux this is safe, provided the string $file is not more than a few
hundred bytes. On Windows, it is not safe (although probably safe
enough, provided this is just for progress monitoring)

Xho
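For what it's worth, the safety Xho describes comes from '>>' opening the file with O_APPEND, so the kernel positions every write() at the current end of file; with autoflush enabled, each short print maps to a single write(). A sketch (file name is a placeholder):

```perl
use strict;
use warnings;

# '>>' opens with O_APPEND: the kernel appends every write() at the
# current end of file, even if other processes are appending too.
unlink 'append_demo.txt';    # start fresh for the demo
open( my $fh, '>>', 'append_demo.txt' ) or die "can't open: $!\n";
{
    my $old = select $fh;    # autoflush this handle so each short
    $| = 1;                  # print becomes a single write()
    select $old;
}
print $fh "short line\n";
close $fh;
```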
 

xhoster

On Linux this is safe, provided the string $file is not more than a few
hundred bytes. On Windows, it is not safe (although probably safe
enough, provided this is just for progress monitoring)

I meant to say "On Linux this is fairly safe." I have never run into a
problem with it, but that is no guarantee of absolute safety.
 

it_says_BALLS_on_your forehead

I meant to say "On Linux this is fairly safe." I have never run into a
problem with it, but that is no guarantee of absolute safety.
yeah, i tried it and ran into no problems, but i didn't know if anyone
was aware of existing issues with this--it's difficult to test. thanks
for relating your experience :).
 

Joe Smith

it_says_BALLS_on_your forehead said:
is this dangerous?

It would be better to have the file opened for writing in
the parent process, and let all the forked processes inherit
that file handle.
-Joe
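A sketch of what Joe suggests, using core fork() rather than Parallel::ForkManager so it stands alone; the autoflush step is an addition, since buffered output from different children can otherwise interleave mid-line (file names and the @files list are placeholders):

```perl
use strict;
use warnings;

my $out = 'inherit_demo.txt';    # placeholder file name
unlink $out;                     # start fresh for the demo

# open once in the parent; every forked child inherits this handle
open( my $fh_out, '>>', $out ) or die "can't open $out: $!\n";
{
    my $old = select $fh_out;    # autoflush the shared handle so each
    $| = 1;                      # print goes out as a single write()
    select $old;
}

my @files = ( 'a.log', 'b.log', 'c.log' );    # placeholder list
my @pids;
for my $file (@files) {
    my $pid = fork();
    die "fork failed: $!\n" unless defined $pid;
    if ( $pid == 0 ) {    # child: process $file, record it, exit
        print $fh_out "$file\n";
        exit 0;
    }
    push @pids, $pid;
}
waitpid( $_, 0 ) for @pids;    # parent waits for all children
close $fh_out;
```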
 

it_says_BALLS_on_your forehead

Joe said:
It would be better to have the file opened for writing in
the parent process, and let all the forked processes inherit
that file handle.
-Joe

i thought about that, but i tried something similar with an FTP
connection, and discovered it was not fork safe, so i was hesitant to
share anything among the forked processes.
 

simon.chao

On Linux this is safe, provided the string $file is not more than a few
hundred bytes. On Windows, it is not safe (although probably safe
enough, provided this is just for progress monitoring)

i'm running into some issues, which may be related to this. unsure
right now. i'm also using a similar process to actually print out
apache web log records (unlike the above example, which is just a short
file name), some of which may be large due to cookies, query
strings, and additional authentication information we put in there. i
know of at least one case where the user agent string appears once fully,
then later in the processed record--BUT the second time only a jagged
latter half of it gets inserted somewhere before the end of the record.

from your response here, it seems like you may be aware of some problem
if the string is large?

should i use flock()? i don't really know much about it...

this never occurred when i wrote to the same file using many system()
calls. again, i am unsure if this is actually a forking issue. i am
currently investigating. just wondered if this raised any alarms with
anyone who may know what is going on.
 

simon.chao

Gunnar said:
Probably somebody more knowledgeable than me should comment on your concerns.

Awaiting that, have you read

perldoc -q append.+text

I can add that I'm using the above method in a widely used Perl program
(run on various platforms), without any problems having been reported.
However, I do use flock() to set an exclusive lock before printing to
the file. Can't tell if locking is necessary.
hello Gunnar, could you please post a snippet of code illustrating how
you would use flock() in my above example? it would be much
appreciated. if it doesn't fix my problem, at least it is one less
thing to worry about.
 

A. Sinan Unur

(e-mail address removed) wrote in
Gunnar Hjalmarsson wrote:
....

hello Gunnar, could you please post a snippet of code illustrating how
you would use flock() in my above example?

So far, I do not see any code posted by you.

Maybe you should read the documentation and come up with a short
script illustrating your problem, and we can comment on it:

perldoc -q lock

perldoc -f flock

Sinan
 

it_says_BALLS_on_your forehead

A. Sinan Unur said:
(e-mail address removed) wrote in


So far,I do not see any code posted by you.

Maybe, you should read the documentation, and come up with a short
script illustrating your problem, and we can comment on it:

perldoc -q lock

perldoc -f flock

apologies, i was referring to the following example:
use strict; use warnings;
use Parallel::ForkManager;

my $pm = Parallel::ForkManager->new(10);

# assume @files contains 100 files that will be processed,
# and processing time could range from subseconds to hours

my $out = 'results.txt';
for my $file (@files) {
    $pm->start and next;

    # some code to process file
    # blah blah blah

    open( my $fh_out, '>>', $out ) or die "can't open $out: $!\n";
    print $fh_out "$file\n";
    close $fh_out;

    $pm->finish;
}
$pm->wait_all_children;
 

A. Sinan Unur

apologies, i was referring to the following example:

I see now ... You like to keep your readers guessing by not sticking
with a single posting address. Not very polite. Just configure whichever
client you are using with the same name and email address, please, so I
can figure out with whom I am corresponding.
use strict; use warnings;
use Parallel::ForkManager;

my $pm = Parallel::ForkManager->new(10);

# assume @files contains 100 files that will be processed,
# and processing time could range from subseconds to hours

my $out = 'results.txt';
for my $file (@files) {
    $pm->start and next;

    # some code to process file
    # blah blah blah

This is a good place to put an exclusive flock on a sentinel file (not
the output file). Your code will block until it gets the exclusive lock.

    open( my $fh_out, '>>', $out ) or die "can't open $out: $!\n";
    print $fh_out "$file\n";
    close $fh_out;

And, this would be the place to release that lock.

So long as the sentinel file is not on a network mounted volume, this
will give you the assurance that at most one process will write to the
file at any given time.

    $pm->finish;
}
$pm->wait_all_children;

Sinan
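Sinan's sentinel-file scheme could be sketched like this (the .lock file name and output file are placeholders; each process blocks on the sentinel, appends, then releases the lock by closing it):

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

my $out = 'sentinel_demo.txt';    # placeholder output file
unlink $out;                      # start fresh for the demo

# take an exclusive lock on a separate sentinel file, not on $out itself
open( my $lock, '>>', "$out.lock" ) or die "can't open sentinel: $!\n";
flock( $lock, LOCK_EX ) or die "can't lock sentinel: $!\n";    # blocks here

open( my $fh_out, '>>', $out ) or die "can't open $out: $!\n";
print $fh_out "one whole record\n";
close $fh_out;

close $lock;    # releases the lock; the next waiting process proceeds
```

Keeping the lock on a separate file means the output file itself can be opened, rotated, or replaced without disturbing the locking protocol.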
 

it_says_BALLS_on_your forehead

I see now ... You like to keep your readers guessing by not sticking
with a single posting address. Not very polite. Just configure whichever
client you are using with the same name and email address, please, so I
can figure out with whom I am corresponding.



sorry about that. i am unable to access non-work email at work. i will
try to set up my home account to use the same name and address.

This is a good place to put an exclusive flock on a sentinel file (not
the output file). Your code will block until it gets the exclusive lock.

i don't understand. in the Cookbook 2nd ed., pg 279 Locking a File they
write:

open(FH, "+<", $path);
flock(FH, LOCK_EX);
# update file, then...
close(FH);

i'm not importing the Fcntl module, so i must use the numeric values. 2
corresponds to LOCK_EX.

i don't understand what you mean by placing a LOCK_EX on a sentinel
file. why wouldn't i lock my output file? i thought i would:

open the fh
lock it
write to the file
close the fh (which unlocks it).

is this not correct? the file may get up to 3 GB, so i don't know if
locking a REGION of the file would be more apt, but i'm also unsure
whether that only applies to file updates--how can you lock a region
of a new file when the newest part doesn't exist yet?
 

Gunnar Hjalmarsson

A. Sinan Unur said:
This is a good place to put an exclusive flock on a sentinel file (not
the output file). Your code will block until it gets the exclusive lock.

Why not the output file?

Why not just:

flock $fh_out, 2 or die $!;
And, this would be the place to release that lock.

Unless you use a lexical ref to the filehandle as above, in which case
you don't need to release it explicitly.

My related comment was posted at
http://groups.google.com/group/comp.lang.perl.misc/msg/2ca8d6e6894030b3
 

A. Sinan Unur

sorry about that. i am unable to access non-work email at work. i will
try to set up my home account to use the same name and address.



i don't understand. in the Cookbook 2nd ed., pg 279 Locking a File
they write:

open(FH, "+<", $path);
flock(FH, LOCK_EX);
# update file, then...
close(FH);

i'm not importing the Fcntl module, so i must use the numeric values.
2 corresponds to LOCK_EX.

You should do

use Fcntl qw(:flock);

because there is no guarantee that LOCK_EX will always be 2. Using
LOCK_EX makes sure the correct value for the system is used.
i don't understand what you mean by placing a LOCK_EX on a sentinel
file. why wouldn't i lock my output file? i thought i would:

open the fh
lock it
write to the file
close the fh (which unlocks it).

is this not correct?

It is not wrong, although I can see how my post can be interpreted as
stating otherwise.

I have gotten into the habit of locking a sentinel file as a semaphore,
because I frequently find myself needing to serialize access to
multiple resources, and occasionally to files on a network-mounted
drive.

Sinan
 

Gunnar Hjalmarsson

A. Sinan Unur said:
Gunnar, I had seen your response, but I thought Simon Chao and 'balls'
were different people.

Yeah, I know, and you also answered my question above in your reply to
"it_says_BALLS_on_your forehead".

Thanks.
 
