page faults when spawning subprocesses

D

Dave Kirby

I am working on a network management program written in python that has
multiple threads (typically 20+) spawning subprocesses which are used
to communicate with other systems on the network. This runs fine for a
while, but eventually slows down to a crawl. Running sar shows that
when it is running slowly there is an exceptionally large number of
minor page faults - there are continuously 14000 faults/sec, with a
variation of about +/-100. There are no pages swapped to disk, these
are purely in-memory faults.

I have a hypothesis about what is happening, but have not been able to
prove or disprove it:
the theory is that when a subprocess is spawned, there is a small
window between the call to fork and the call to exec where the parent's
memory is shared between the two processes. Linux marks the memory as
copy-on-write, so if the parent process then accesses memory during
that window a minor page fault is generated and the page is copied.
Normally this is not a problem, but with a large number of threads all
spawning subprocesses there is a chance of a another process being
spawned during that window and the whole of memory is copied. This
slows everything else down so the probability of another collision
increases, and the whole thing snowballs. This could also happen if
something else tries to write to large areas of memory (maybe the
python garbage collector?).

This is running on a Sun V40 64 bit SMP with Fedora Core 3. The same
code has been run on intel systems and the problem has not been seen -
this could be because the problem is specific to that hardware or
because the intel systems are not fast enough for a collision to occur.

My questions are:

1) is the theory plausible/likely?

2) what could I do to prove/disprove it?

3) has anyone else seen this problem?

4) are there any other situations that could be causing a continuous
stream of minor page faults?

5) WTF can I do about it?

Dave Kirby
(dave.x.kirby at
gmail dot
com)
 
K

Kasper Dupont

Dave said:
5) WTF can I do about it?

Maybe using vfork rather than fork would help. But
I'm not sure that will work as intended when there
are multiple threads, in fact I'm not sure fork
will work either. You could have fork racing against
another thread being in a critical region thus
duplicating the memory map at some point where some
data structures are in an inconsistent state and
apparently locked by some thread existing in the
parent.

A possible solution would be to use fork to create
two processes before creating any threads. Have
the communicate over pipes or sockets when new
processes are to be created. Then one process can
create all the threads you need, and the other can
fork off children.

Even in that case vfork may come in handy. If you
dislike the semantics of vfork, but still want the
parent to block until the child has called execve,
then you can do so manually using a pipe. Create
the pipe before calling fork, in parent process
you close write end and try to read from the pipe,
in child process you close read end and mark write
end close on exec. When exec succeeds, the pipe is
closed and parent gets EOF.

(I have tried some of this in C, but I must admit,
I don't know if it can be done in Python as well.)
 
D

Daniel Kabs

Dave said:
I am working on a network management program written in python that has
multiple threads (typically 20+) spawning subprocesses which are used
to communicate with other systems on the network. ...

Let me check if I got you right: You are using fork() inside a thread in
a multi-threaded environment. That sounds complicated. :)

Have a look at

http://www.opengroup.org/onlinepubs/009695399/functions/fork.html

It mentions your use of fork, i.e. to create a new process running a
different program (in this case the call to fork() is soon followed by a
call to exec()).

If you fork in your multi-threaded environment, what happens with all
your threads? The document resorts to "the effects of calling functions
that require certain resources between the call to fork() and the call
to an exec function are undefined.". Maybe you are just experiencing
this :)

The document above recommends: "to avoid errors, the child process may
only execute async-signal-safe operations until such time as one of the
exec functions is called."

Maybe this discussion is also of some help:

http://groups.google.com/group/comp...4c329/217660515af867ea?tvc=2#217660515af867ea

Cheers
Daniel
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,769
Messages
2,569,580
Members
45,054
Latest member
TrimKetoBoost

Latest Threads

Top