subprocesses lifecycle

Matthieu Imbert

hi.

I have a perl script that forks several subprocesses at various times.

I use the open "process_name |" syntax, and then use select to read
multiple process outputs, and have a timeout on all these subprocesses.

If the timeout is reached, I want to immediately exit my script with an
error message.

Currently, when I detect the timeout, I call die "error message". The
message is displayed, but the script does not return until subprocesses
finish (this may take several minutes, depending on what the
subprocesses do).

Is there a way to force the end of all subprocesses when calling die?

best regards,

Matthieu
 
C.DeRykus

Each successful pipe open returns the child's process ID. So,
assuming a Unix OS, you could save the child IDs and then send the
signal 'TERM' to each child in turn when there's a timeout, e.g.,

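foreach my $child (@pids) {
    # kill() returns the number of processes it managed to signal;
    # KILL is only tried if TERM could not be delivered at all.
    kill 'TERM', $child
        or kill 'KILL', $child
        or warn "can't signal $child: $!\n";
}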
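Here is a minimal sketch of the setup from the original question using the saved-PID approach. It assumes Perl 5.10 or later (for `//`) and the core IO::Select and Time::HiRes modules. The helper names start_children, read_with_timeout and kill_children are hypothetical. The list form of open is used so that the returned PID belongs to the command itself, not to an intermediate shell.

```perl
use strict;
use warnings;
use IO::Select;
use Time::HiRes qw(time);

# Start each command (an array ref) with a list-form pipe open. The list
# form execs the command without a shell, so the PID that open() returns
# is the command's own PID.
sub start_children {
    my %child;                                  # PID => read handle
    for my $cmd (@_) {
        my $pid = open(my $fh, '-|', @$cmd)
            // die "can't start '@$cmd': $!\n";
        $child{$pid} = $fh;
    }
    return \%child;
}

# Read from every child until all reach EOF or $timeout seconds pass,
# calling $on_data->($pid, $chunk) for each chunk read.
# Returns 1 if every child finished in time, 0 on timeout.
sub read_with_timeout {
    my ($child, $timeout, $on_data) = @_;
    my %pid_of;
    $pid_of{ fileno($child->{$_}) } = $_ for keys %$child;
    my $sel      = IO::Select->new(values %$child);
    my $deadline = time() + $timeout;
    while ($sel->count) {
        my $left = $deadline - time();
        return 0 if $left <= 0;
        for my $fh ($sel->can_read($left)) {
            my $n = sysread($fh, my $buf, 4096);
            if ($n) { $on_data->($pid_of{ fileno($fh) }, $buf) }
            else    { $sel->remove($fh) }       # EOF or read error
        }
    }
    return 1;
}

# Signal every saved PID, then close the handles. close() on a piped open
# waits for its child, which is now quick because the child was signalled
# (a child that ignores TERM would still make close() block).
sub kill_children {
    my ($child) = @_;
    for my $pid (keys %$child) {
        kill('TERM', $pid) or warn "can't signal $pid: $!\n";
    }
    close($_) for values %$child;
}

my $t0       = time();
my $child    = start_children(['echo', 'hello'], ['sleep', '30']);
my $output   = '';
my $finished = read_with_timeout($child, 2, sub { $output .= $_[1] });
kill_children($child) unless $finished;
my $elapsed  = time() - $t0;
printf "%s after %.1fs\n", ($finished ? 'finished' : 'timed out'), $elapsed;
# A real script would now do: die "timeout\n" unless $finished;
```

On a timeout, the children are signalled before the script dies. As a result, the implicit close of each pipe handle no longer has to wait for them to finish on their own.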

Alternatively, 'perldoc perlipc' demonstrates an idiom that uses a negative
process ID to signal an entire Unix process group, e.g.,

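{
    # Ignore TERM in this process only while signalling, so the script
    # doesn't kill itself. -$$ names the process group whose ID is our
    # PID, so this reaches the children only if we lead that group.
    local $SIG{TERM} = 'IGNORE';
    kill TERM => -$$;
}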
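Here is a sketch of the process-group variant, assuming a Unix system and Perl 5.10 or later. The setpgrp call is an addition and is not part of the perlipc idiom. It makes sure -$$ really names a group containing only this script and the children it starts. An interactive shell with job control normally arranges that already.

```perl
use strict;
use warnings;

# Become the leader of a new process group unless we already are, so that
# -$$ names a group holding only this script and the children it starts.
setpgrp(0, 0) unless getpgrp() == $$;
die "not a process group leader\n" unless getpgrp() == $$;

my @handles;
for (1 .. 3) {
    open(my $fh, '-|', 'sleep', '30') // die "can't fork: $!\n";
    # Keep the handle: if $fh were freed at the end of this loop body,
    # its implicit close would wait for 'sleep 30' to finish.
    push @handles, $fh;
}

{
    local $SIG{TERM} = 'IGNORE';    # don't kill ourselves
    kill TERM => -$$;               # negative PID: the whole process group
}

# The children are gone, so each close returns at once; the low 7 bits
# of $? hold the signal that ended the child.
my @signals;
for my $fh (@handles) {
    close($fh);
    push @signals, $? & 127;
}
print "children ended by signals: @signals\n";
```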
 
Eric Pozharski

> Currently, when I detect the timeout, I call die "error message". The
> message is displayed, but the script does not return until subprocesses
> finish (this may take several minutes, depending on what the
> subprocesses do).

perl -mIO::Pipe -wle '
    $pipe = IO::Pipe->new;
    if (fork) {
        $pipe->reader;
        sleep 1;
        die;
    }
    $pipe->writer;
    open $h, q|</etc/passwd|;
    while ($l = <$h>) {
        print $pipe $l;
        sleep 1;
    }
' ; ps -O ppid t ; sleep 40 ; ps -O ppid t
Died at -e line 6.
PID PPID S TTY TIME COMMAND
12782 1 S pts/1 00:00:00 perl -mIO::pipe -wle ?$pipe = IO::pipe->new;
12783 29996 R pts/1 00:00:00 ps -O ppid t
29996 29991 S pts/1 00:00:11 bash
PID PPID S TTY TIME COMMAND
12785 29996 R pts/1 00:00:00 ps -O ppid t
29996 29991 S pts/1 00:00:11 bash

See that? There's no problem with dying (I won't comment on why the
system needs more than half a minute to get rid of the child (kernel?
shell? init? panic...); YMMV).

However, you say that you have a problem. I suppose you have to
investigate why your script attempts to collect zombies. It should
not, unless told to.

> Is there a way to force the end of all subprocesses when calling die?

Second, no one can kill a process that hangs in a syscall until the
process gets back out into userspace. So you'd be better off finding out
why you collect.
 
Matthieu Imbert

Hi Eric, here is how I understand things:

In your example code, the child process stays alive after the end of
the parent process. As there are probably 30 to 40 lines in /etc/passwd
and it sleeps 1 second for each line, it's not surprising that it
takes about half a minute to end and die.

This confirms that perl does not kill subprocesses when calling
die.

But it does not explain why in your example the parent script returns
immediately when calling die, while in my case the parent script waits
for children to end before returning. I thought that this could be
related to the way you create child processes (with fork), whereas I
create them with open. But this little test script returns immediately:

perl -e '
open (CHILD,"sleep 30 |");
die "byebye";
'

So the problem must come from something else. I have to understand why
it behaves differently in my first script (I'll try to isolate the
simplest reproducible demonstration code of the problem).

Currently, as a workaround, I added code that finds all subprocesses of
my script, sends them TERM, waits 10s, then sends KILL to all of them.
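A TERM, wait, KILL workaround like this might look roughly as follows. The sketch assumes the PIDs are direct children of the script, because waitpid cannot reap anything else. It also uses a 1-second grace period instead of 10 seconds so that the demo is quick. terminate_children is a hypothetical name.

```perl
use strict;
use warnings;
use POSIX qw(WNOHANG);
use Time::HiRes qw(sleep time);

# Send TERM to the given direct children, wait up to $grace seconds for
# them to exit, then KILL whatever is left. Returns { PID => wait status }.
sub terminate_children {
    my ($grace, @pids) = @_;
    my %alive = map { ($_ => 1) } @pids;
    my %status;
    kill TERM => @pids;
    my $deadline = time() + $grace;
    while (%alive && time() < $deadline) {
        for my $pid (keys %alive) {
            my $r = waitpid($pid, WNOHANG);
            if    ($r == $pid) { $status{$pid} = $?; delete $alive{$pid} }
            elsif ($r == -1)   { delete $alive{$pid} }  # not (or no longer) our child
        }
        sleep 0.05 if %alive;
    }
    for my $pid (keys %alive) {       # still running after the grace period
        kill KILL => $pid;
        waitpid($pid, 0);
        $status{$pid} = $?;
    }
    return \%status;
}

# Demo: one ordinary child and one that ignores TERM.
my @pids;
for my $stubborn (0, 1) {
    defined(my $pid = fork) or die "can't fork: $!\n";
    if ($pid == 0) {
        $SIG{TERM} = 'IGNORE' if $stubborn;
        sleep 30;
        POSIX::_exit(0);
    }
    push @pids, $pid;
}
sleep 0.2;                            # let the children set up their handlers

my $t0      = time();
my $status  = terminate_children(1, @pids);
my $elapsed = time() - $t0;
printf "pid %d ended by signal %d\n", $_, $status->{$_} & 127 for @pids;
printf "all done after %.1fs\n", $elapsed;
```

Subprocesses that are grandchildren, such as those behind a shell, would need a different liveness check (for example kill 0) instead of waitpid.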


Matthieu
 
Hans Mulder

Matthieu said:
> But it does not explain why in your example the parent script returns
> immediately when calling die, while in my case the parent script waits
> for children to end before returning. I thought that this could be
> related to the way you create child processes (with fork), whereas I
> create them with open. But this little test script returns immediately:
>
> perl -e '
> open (CHILD,"sleep 30 |");
> die "byebye";
> '

By contrast, if I do this:

perl -e '
open my $child, "sleep 30 |";
die "byebye";
'

then I have to wait 30 seconds.

It looks like when my $child goes out of scope, perl closes the handle
and this implies waiting for the child to finish and then setting $?.
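Here is a small sketch of the behaviour described, assuming a Unix system with sh on the PATH and Perl 5.10 or later. close on a piped handle waits for the child and stores its wait status in $?. The same wait happens implicitly when a lexical handle goes out of scope.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# close() on a handle from a piped open waits for the child to exit
# and leaves the child's wait status in $?.
my $t0 = time();
open(my $fh, '-|', 'sh', '-c', 'sleep 1; exit 3') // die "can't fork: $!\n";
my $closed_ok  = close($fh);         # blocks until the child exits
my $close_wait = time() - $t0;
my $exit_code  = $? >> 8;
printf "close returned %s after %.1fs, child exit code %d\n",
    ($closed_ok ? 'true' : 'false'), $close_wait, $exit_code;

# The same wait happens implicitly when a lexical handle goes out of scope.
$t0 = time();
{
    open(my $fh2, '-|', 'sleep', '1') // die "can't fork: $!\n";
}                                    # $fh2 is freed here: implicit close
my $scope_wait = time() - $t0;
printf "leaving the block took %.1fs\n", $scope_wait;
```

close returns false here because the child exited with a non-zero status. The status itself is what ends up in $?.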

I would have thought your example should behave the same, but it doesn't
(not on my machine anyway).

Perhaps you need a double fork. That is, your child could fork and then
the original child exits immediately, letting the grandchild do the real
work. That way, your script won't have to wait when it decides to close
the $child handle.
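Here is a sketch of the double-fork idea, with open_detached as a hypothetical helper name. It relies on the implicit-fork form open($fh, '-|') and on POSIX::_exit, so the intermediate child skips END blocks and destructors. The trade-off is that the script no longer knows the grandchild's PID. It can't signal the grandchild later without some other way to find it.

```perl
use strict;
use warnings;
use POSIX ();
use Time::HiRes qw(time);

# Hypothetical helper: like open(my $fh, '-|', @cmd), but the command runs
# in a grandchild. The child that open() forked exits at once, so a later
# close() (explicit or implicit) only waits for that short-lived process.
sub open_detached {
    my @cmd = @_;
    my $pid = open(my $fh, '-|') // die "can't fork: $!\n";
    return $fh if $pid;                 # parent: read the grandchild's output

    # Child: its STDOUT is already the write end of the pipe.
    defined(my $grandchild = fork) or POSIX::_exit(1);
    if ($grandchild == 0) {
        exec { $cmd[0] } @cmd or POSIX::_exit(127);
    }
    POSIX::_exit(0);                    # no END blocks or destructors
}

# Output still flows through the pipe from the grandchild.
my $fh   = open_detached('echo', 'hello');
my $line = <$fh>;
close($fh);

# Leaving the scope closes the handle without waiting for 'sleep 2'.
my $t0 = time();
{
    my $fh2 = open_detached('sleep', '2');
}
my $elapsed = time() - $t0;
printf "got %s", $line;
printf "scope left after %.2fs\n", $elapsed;
```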

What you'd really want is a way to tell C<open> that you don't want
C<close> to wait for this child. As far as I know, there is currently
no simple way to achieve that.

Hope this helps,

-- HansM
 
Eric Pozharski

Matthieu Imbert said:
> Eric Pozharski wrote:
*SKIP*
> In your example code, the child process stays alive after the end of
> the parent process. As there are probably 30 to 40 lines in /etc/passwd
> and it sleeps 1 second for each line, it's not surprising that it
> takes about half a minute to end and die.

Positive. My fault. I've moved B<sleep> in child before B<while> and
the child exits immediately (with regard to B<sleep> of course). What I
don't understand is why the child successfully writes in pipe. The pipe
isn't closed if a reader exits? I don't grok pipes obviously.

*SKIP*
> So the problem must come from something else. I have to understand why
> it behaves differently in my first script (I'll try to isolate the
> simplest reproducible demonstration code of the problem).

Consider reviewing the list of modules loaded. There is a B<waitpid> or
B<wait> somewhere. Consider reviewing the bug reports of B<perl> for your
distribution (there's such a thing as distribution-specific quirks, you
know).

Anyway, I wish you good luck (hunting for such things is a big
challenge). Either way, your understanding of Perl will improve a lot.

*CUT*
 
C.DeRykus

[...]
> What you'd really want is a way to tell C<open> that you don't want
> C<close> to wait for this child. As far as I know, there is currently
> no simple way to achieve that.

Wouldn't backgrounding the task accomplish that?

open my $fd, "/some/task & |"
    or die...

However, child subprocesses would then need to be foregrounded with
SIGCONT if the parent wants to kill them before exiting.
 
Eric Pozharski

> open my $fd, "/some/task & |"
>     or die...
> However, child subprocesses would then need to be foregrounded with
> SIGCONT if the parent wants to kill them before exiting.

Backgrounding doesn't work. I mean, it doesn't matter.

time perl -wle '
open my $h, q{(sleep 1 ; /bin/echo -en xyz ) & |} or die $!;
print `ps --cols 60 -O ppid t`;
print <$h>;
print `ps --cols 60 -O ppid t`;'
PID PPID S TTY TIME COMMAND
2198 2193 S pts/1 00:00:02 bash
7528 2198 S pts/1 00:00:00 perl -wle ?open my $h, q{(sl
7529 7528 Z pts/1 00:00:00 [sh] <defunct>
7530 1 S pts/1 00:00:00 sh -c (sleep 1 ; /bin/echo -
7531 7530 R pts/1 00:00:00 sh -c (sleep 1 ; /bin/echo -
7532 7528 R pts/1 00:00:00 ps --cols 60 -O ppid t

xyz
PID PPID S TTY TIME COMMAND
2198 2193 S pts/1 00:00:02 bash
7528 2198 S pts/1 00:00:00 perl -wle ?open my $h, q{(sl
7529 7528 Z pts/1 00:00:00 [sh] <defunct>
7534 7528 R pts/1 00:00:00 ps --cols 60 -O ppid t


real 0m1.277s
user 0m0.084s
sys 0m0.080s
 
Peter J. Holzer

> Second, no one can kill a process that hangs in a syscall until the
> process gets back out into userspace. So you'd be better off finding out
> why you collect.

This is not generally true. Only when the process is in an
uninterruptible sleep (also known as "disk wait", although a disk is not
necessarily involved) are no signals (not even SIGKILL) accepted.
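On Linux, the state in question can be checked from /proc. This sketch is Linux-only, and proc_state is a hypothetical helper. It reads the same state letter that ps prints in its S column, where D is uninterruptible sleep and S is an ordinary sleep that signals can interrupt.

```perl
use strict;
use warnings;
use Time::HiRes qw(sleep);

# Linux-only: return the state letter from /proc/PID/stat (the value ps
# shows in its S column), or undef if the process does not exist.
sub proc_state {
    my ($pid) = @_;
    open(my $fh, '<', "/proc/$pid/stat") or return;
    my $stat = <$fh>;
    close($fh);
    # Field 2 is the command name in parentheses and may itself contain
    # spaces or ')', so take the field after the last ') '.
    return $stat =~ /.*\)\s+(\S)/s ? $1 : undef;
}

my $pid = open(my $fh, '-|', 'sleep', '30') // die "can't fork: $!\n";
sleep 0.2;                            # let the child settle into its sleep
my $child_state = proc_state($pid);   # typically 'S'
my $own_state   = proc_state($$);     # we are running, so normally 'R'
print "child $pid: $child_state, self: $own_state\n";
kill KILL => $pid;                    # not in state D, so this takes effect at once
close($fh);
```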

hp
 
Peter J. Holzer

> Positive. My fault. I've moved B<sleep> in child before B<while> and
> the child exits immediately (with regard to B<sleep> of course). What I
> don't understand is why the child successfully writes in pipe. The pipe
> isn't closed if a reader exits? I don't grok pipes obviously.

When the reader exits (or, more exactly, when the last reader closes the
pipe), an attempt to write into the pipe will yield a SIGPIPE. Since
your script doesn't catch SIGPIPE, this will cause your child process to
terminate. But since you didn't call $pipe->autoflush, the child won't
actually try to write to the pipe until the buffer (4kB on Linux, 8kB on
most other unixes) is full; that will be after about 75 or 150 lines,
respectively.
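This sketch shows both cases, assuming a Unix system. SIGPIPE is ignored here so that the failed write shows up as an EPIPE error instead of killing the demo. With buffering, print appears to succeed and the failure only surfaces when the buffer is flushed. With autoflush, the very first print fails.

```perl
use strict;
use warnings;
use IO::Handle;    # for autoflush()

# Without a handler, writing to a pipe that has no reader raises SIGPIPE
# and kills the writer. Ignore it here so the failure shows up as EPIPE.
$SIG{PIPE} = 'IGNORE';

# Buffered writer: print only fills Perl's buffer, so it "succeeds";
# the error surfaces when the buffer is flushed, here at close().
pipe(my $r1, my $w1) or die "pipe: $!\n";
close($r1);                                   # the reader goes away
my $buffered_print = print {$w1} "line\n";    # true: nothing written yet
my $buffered_close = close($w1);              # false: the flush fails

# Autoflushed writer: each print goes straight to the pipe.
pipe(my $r2, my $w2) or die "pipe: $!\n";
$w2->autoflush(1);
close($r2);
my $flushed_print = print {$w2} "line\n";     # false, with $! set to EPIPE
my $got_epipe = $!{EPIPE} ? 1 : 0;
close($w2);

print "buffered print: ",  ($buffered_print ? "ok" : "failed"), "\n";
print "buffered close: ",  ($buffered_close ? "ok" : "failed"), "\n";
print "autoflush print: ", ($flushed_print  ? "ok" : "failed"),
      ($got_epipe ? " (EPIPE)" : ""), "\n";
```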

hp
 
Peter J. Holzer

> Wouldn't backgrounding the task accomplish that?
>
> open my $fd, "/some/task & |"
>     or die...

Man, that's ugly!
But yes, I think that should work (although I haven't actually tried
it).

> However, child subprocesses would then need to be foregrounded with
> SIGCONT if the parent wants to kill them before exiting.

No. SIGCONT doesn't "foreground" a process running in the background.
It continues a process which has been stopped. A running process can be
sent signals whether it is in the foreground or the background.

hp
 
C.DeRykus

> Man, that's ugly!
> But yes, I think that should work (although I haven't actually tried
> it).

I'm not sure why a lone "&" tips the ugly balance :)

> No. SIGCONT doesn't "foreground" a process running in the background.
> It continues a process which has been stopped. A running process can be
> sent signals whether it is in the foreground or the background.

Yes, I misspoke, but a SIGCONT actually is sent to the process group
when a backgrounded job is moved to the foreground via "fg", to enable
a terminal read, for example. At least that's what I glean from
Stevens. Tricky stuff though...

I was also wrong about needing to foreground the bg jobs so they could
be killed. They can be killed easily if you know the PIDs. But there
doesn't appear to be an elegant way to pick them up without some
grubbing around.
 
Peter J. Holzer

> I'm not sure why a lone "&" tips the ugly balance :)

* Because it invokes a shell (which wouldn't otherwise be necessary)
* Because you have to think about it for a minute to figure out why it
works (if you figure it out at all - see Eric's post).

> Yes, I misspoke, but a SIGCONT actually is sent to the process group
> when a backgrounded job is moved to the foreground via "fg", to enable
> a terminal read, for example.

"fg" moves a job to the foreground. But that job isn't necessarily in
the background, it can also be stopped. In the latter case of course a
SIGCONT is necessary.

hp
 
Peter J. Holzer

> Backgrounding doesn't work. I mean, it doesn't matter.
>
> time perl -wle '
> open my $h, q{(sleep 1 ; /bin/echo -en xyz ) & |} or die $!;
> print `ps --cols 60 -O ppid t`;
> print <$h>;

You are waiting for input here; of course you can read "xyz\n" only
when the child writes it. So here is your 1 second delay. That has
nothing to do with close.
> print `ps --cols 60 -O ppid t`;'
[...]
> real 0m1.277s
> user 0m0.084s
> sys 0m0.080s

If you remove print <$h>; the parent will exit immediately, but the
child will continue to run.

% time perl -wle '
open my $h, q{(sleep 1 ; /bin/echo -en xyz )& |} or die $!;
print `ps --cols 60 -O ppid t`;'
ps --cols 60 -O ppid t
PID PPID S TTY TIME COMMAND
2287 27943 R pts/3 00:00:00 perl -wle ?open my $h, q{(sl
2288 2287 Z pts/3 00:00:00 [sh] <defunct>
2289 2287 R pts/3 00:00:00 ps --cols 60 -O ppid t
2290 1 S pts/3 00:00:00 sh -c (sleep 1 ; /bin/echo -
2291 2290 S pts/3 00:00:00 sleep 1
27943 27942 S pts/3 00:00:00 zsh

perl -wle 0.00s user 0.01s system 71% cpu 0.011 total
PID PPID S TTY TIME COMMAND
2290 1 S pts/3 00:00:00 sh -c (sleep 1 ; /bin/echo -
2291 2290 S pts/3 00:00:00 sleep 1
2292 27943 R pts/3 00:00:00 ps --cols 60 -O ppid t
27943 27942 S pts/3 00:00:00 zsh

hp
 
Eric Pozharski

Peter J. Holzer said:
> If you remove print <$h>; the parent will exit immediately, but the
> child will continue to run.
[...]

Yes, I know that that will exit immediately. I've just attempted to
show that backgrounding doesn't work.
 
C.DeRykus

> * Because it invokes a shell (which wouldn't otherwise be necessary)
> * Because you have to think about it for a minute to figure out why it
>   works (if you figure it out at all - see Eric's post).

True, the "&" forces a shell and complexity increases, but an "ugly"
solution which requires only a single keystroke suddenly looks better,
even in a beauty contest.

However, maybe Hans' suggested double fork is a more palatable
solution.

> "fg" moves a job to the foreground. But that job isn't necessarily in
> the background, it can also be stopped. In the latter case of course a
> SIGCONT is necessary.

<off topic>
Yes, a SIGCONT can come into play in both cases:
* when a foreground process is stopped (SIGTSTP)
* when a background process attempts to read from the terminal, gets a
  SIGTTIN, and is then put into the foreground with "fg" to enable a
  terminal read.
</off topic>
 
Eric Pozharski

Eric Pozharski said:
*SKIP*
> However, you say that you have a problem. I suppose you have to
> investigate why your script attempts to collect zombies. It should
> not, unless told to.

I've thought (and read) a lot about this. I now believe that my guess
was wrong.

There's no problem with zombies (nor, accordingly, with waiting for
children). As C.DeRykus clearly showed, a double fork doesn't help.

Now I think that B<perl> waits until the pipe closes. That happens when
the writer (I intentionally say 'writer', not 'child', because it can be
a child of B<init> after a double fork) explicitly closes the pipe or
just terminates.

I was wrong. Again. Sorry for the inconvenience.

And what surprises me most is that, as Hans Mulder discovered, lexical
filehandles are waited for, globals are not. Would someone be willing to
dig through the source and explain why it is that way? I've checked:
both of them B<isa> B<FileHandle>. And still they differ a lot. Errmm...
Can I guess again?

*CUT*
 
xhoster

Eric Pozharski said:
> And what surprises me most is that, as Hans Mulder discovered, lexical
> filehandles are waited for, globals are not. Would someone be willing to
> dig through the source and explain why it is that way?

I am guessing it is because lexicals are destroyed when they go out of
scope, while globals are only destroyed during "global destruction", during
which time the automatic waiting behavior may not be working.

If one uses circular refs to prevent a lexical filehandle from going out of
scope until global destruction, they don't wait. For example,
the below exits immediately:

perl -le ' my @y; open $y[0], "sleep 5 |" or die; push @y,\@y'

Sometimes it waits anyway. Global destruction is hard to predict.
This waits:

perl -le ' my @y; open $y[0], "sleep 5 |" or die; push @y,\@y; $z=bless {}'


Xho

 
Eric Pozharski

Peter J. Holzer said:
> terminate. But since you didn't call $pipe->autoflush, the child won't
> actually try to write to the pipe until the buffer (4kB on Linux, 8kB on
> most other unixes) is full; that will be after about 75 or 150 lines,
> respectively.

That's what I'd messed up: line-oriented on the surface, but buffered
underneath. Thanks, now I feel much better.
 
Peter J. Holzer

Peter J. Holzer said:
>> If you remove print <$h>; the parent will exit immediately, but the
>> child will continue to run.
> [...]
> Yes, I know that that will exit immediately. I've just attempted to
> show that backgrounding doesn't work.

But it does work! I.e. it does what Hans wanted. If you want it to do
something else you need to explain what you want it to do first.

hp
 
