Using fork() with a telnet-esque socket connection

Exide Arabellan · Feb 18, 2004

Greetings,

Before I get too far into the guts of writing a custom telnet daemon, I
thought I'd test the waters and get some input. I'm building an open
source text-based RPG (brings back memories, eh?). I'm looking to support
anywhere from 50 to 250 simultaneous connections (depending on the
host's bandwidth), with MySQL as the backend. Most of my Perl
knowledge is self-taught, so my terminology is unusual (which makes it hard to
ask specific questions). This is my first venture into Perl with socket
connections, and I have read about using fork() to create a multithreaded
daemon. What I gathered was that each end-user connection would spawn a child
process of the parent daemon. However, after inspecting other common
applications running on my box, I was unsure whether this is the right way to
go about it. If it's not, could someone suggest an alternative?

Exide Arabellan
www.arabellan.com

Ben Morrow · Feb 18, 2004

There are four ways to handle multiple simultaneous connections:

1. Single-process. You have one process, which handles creating new
connections and all existing connections, using select(). This is
usually the most efficient solution for small numbers of connections,
but scales badly.

2. Forking. The parent process accepts each new connection and forks a
child to handle it. An example of this is sshd. This can make the
programming simpler (writing a select loop is not very easy), and also
will scale better than the first. There is still an appreciable delay
between a client making a connection and that client being serviced,
however, due to the fork.

3. Pre-forking. The parent initially forks a pool of children, and when
a new connection comes in it is passed to one of the existing children
to handle. When the number of free children gets too low, some more are
forked; contrariwise, when a whole lot have finished their processing
and gone back to waiting some are killed. An example of this is
(traditional) Apache. This is probably the most flexible and responsive
solution, although it still uses a whole process for each connection
which can cause problems if the number is huge.

4. Threaded. Basically, as forking; but instead of creating a new
process a new thread within the current process is created. This is
(usually) the least resource-intensive of these last three, but writing
(traditional) threaded programs adds a whole level of complexity to the
task. Examples of this are Apache/win32 and IIS. With Perl threads in
their current state, I would strongly advise against attempting this on
any decent Unix OS: Perl thread creation as currently implemented is
simply an emulation of fork() within the process, and the OS can do it
more efficiently than perl can.
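To make approach 2 concrete, here is a minimal forking server. It is written in Python purely for illustration (in Perl the same shape would use IO::Socket::INET, fork() and a $SIG{CHLD} setting), and it assumes a hypothetical line-based protocol in which each command is simply echoed back upper-cased in place of real game logic. The `max_connections` limit and the `demo()` helper exist only so the sketch terminates; a real daemon would loop forever.

```python
import os
import signal
import socket
from typing import List, Optional


def handle_client(conn: socket.socket) -> None:
    """Hypothetical per-player handler: echo each line back upper-cased."""
    with conn, conn.makefile("rb") as reader:
        for line in reader:
            conn.sendall(line.upper())


def serve_forking(listener: socket.socket, max_connections: Optional[int] = None) -> None:
    """Approach 2: accept in the parent, fork one child per connection."""
    # Ask the kernel to reap exited children automatically (no zombies).
    signal.signal(signal.SIGCHLD, signal.SIG_IGN)
    served = 0
    while max_connections is None or served < max_connections:
        conn, _addr = listener.accept()
        if os.fork() == 0:      # child: serve this one client, then exit
            listener.close()
            try:
                handle_client(conn)
            finally:
                os._exit(0)
        conn.close()            # parent: the child now owns the connection
        served += 1


def demo() -> List[bytes]:
    """Run the server in a forked process and talk to it as two clients."""
    listener = socket.create_server(("127.0.0.1", 0))
    port = listener.getsockname()[1]
    server_pid = os.fork()
    if server_pid == 0:
        try:
            serve_forking(listener, max_connections=2)
        finally:
            os._exit(0)
    listener.close()
    replies = []
    for command in (b"look\n", b"north\n"):
        with socket.create_connection(("127.0.0.1", port)) as c, c.makefile("rb") as r:
            c.sendall(command)
            replies.append(r.readline())
    os.waitpid(server_pid, 0)
    return replies
```

Calling `demo()` returns `[b'LOOK\n', b'NORTH\n']`. Each client costs one fork() at connect time, which is the delay mentioned above, and one whole process for as long as the player stays connected.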

Ben

Uri Guttman · Feb 18, 2004

BM> There are four ways to handle multiple simultaneous connections:

BM> 1. Single-process. You have one process, which handles creating new
BM> connections and all existing connections, using select(). This is
BM> usually the most efficient solution for small numbers of connections,
BM> but scales badly.

where did you get the scales badly from? event loops (which is what
using select is) are generally the most efficient way to handle multiple
connections. the only weakness is that you have to deal with any
potential blocking operations. given a pure cpu-bound server, an event
loop has the least overhead of all the methods. there is no forking,
no thread spawning, no extra context switching, etc. and event loop
design and coding is usually simpler than the others once you get the
hang of it. unfortunately we have a world of kiddies spoonfed that
threads are the only solution to any multiplexing and they can't break
out of that mold. threads have so many problems (in any language) that
must be dealt with, including synchronization, locking and sharing, and
there is no simple way to scale beyond a single box.
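A sketch of the single-process event loop being defended here, again in Python for illustration only (Perl's IO::Select, or a framework such as POE, plays the same role). It assumes the same hypothetical upper-casing echo protocol in place of game logic, and the `max_connections` and `demo()` helpers exist only so it terminates. One process serves every client, with no fork per connection.

```python
import os
import selectors
import socket
from typing import List, Optional


def serve_event_loop(listener: socket.socket, max_connections: Optional[int] = None) -> None:
    """Approach 1: one process multiplexes every client with an event loop.

    Echoing bytes back upper-cased is a hypothetical stand-in for game logic.
    """
    sel = selectors.DefaultSelector()       # epoll on Linux, kqueue on BSD/macOS
    listener.setblocking(False)
    sel.register(listener, selectors.EVENT_READ)
    finished = 0
    while max_connections is None or finished < max_connections:
        for key, _events in sel.select():
            sock = key.fileobj
            if sock is listener:            # readable listener: a new client
                conn, _addr = listener.accept()
                conn.setblocking(False)
                sel.register(conn, selectors.EVENT_READ)
                continue
            chunk = sock.recv(4096)
            if chunk:
                # Replies are tiny here. A real server would queue output and
                # wait for EVENT_WRITE, since sendall() on a non-blocking
                # socket can fail if the send buffer fills up.
                sock.sendall(chunk.upper())
            else:                           # empty read: client hung up
                sel.unregister(sock)
                sock.close()
                finished += 1
    sel.close()


def demo() -> List[bytes]:
    """Two clients connected at once, both served by one server process."""
    listener = socket.create_server(("127.0.0.1", 0))
    port = listener.getsockname()[1]
    server_pid = os.fork()
    if server_pid == 0:
        try:
            serve_event_loop(listener, max_connections=2)
        finally:
            os._exit(0)
    listener.close()
    a = socket.create_connection(("127.0.0.1", port))
    b = socket.create_connection(("127.0.0.1", port))
    with a, b, a.makefile("rb") as reader_a, b.makefile("rb") as reader_b:
        b.sendall(b"say hi\n")              # the later client speaks first
        a.sendall(b"look\n")
        replies = [reader_b.readline(), reader_a.readline()]
    os.waitpid(server_pid, 0)
    return replies
```

`demo()` returns `[b'SAY HI\n', b'LOOK\n']`: both players are connected at the same time and served by one process. The comment about `sendall()` marks exactly the kind of potentially blocking operation an event loop has to handle with care.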

uri

Exide Arabellan · Feb 18, 2004

Thank you for that insight. That is exactly the type of information I
was after. I grep'd the newsgroup for fork() and read a bunch of posts
on 'Fork vs Thread', and have eliminated the Threaded possibility. I do
hope to port this to Win32 in the future, but my main concern is use on
Linux. I'm leaning towards Pre-forking, though I'm concerned about the
following statement.

This is probably the most flexible and responsive solution, although
it still uses a whole process for each connection which can cause
problems if the number is huge.

Can you give me a ballpark on 'huge'? Flexibility, scalability, and
response time are key factors (and the main reason I asked the original
question). I'm aiming for medium-level consumer servers (512 MB to 1 GB RAM,
1.0 GHz+ CPU).
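For the pre-forking model under consideration, here is a simplified Python sketch (illustrative only, not Apache's actual code). A fixed pool of children is forked up front and each child blocks in accept() on the shared listening socket, so the kernel hands every new connection to one idle child. The dynamic pool resizing from Ben's description is omitted, the echo protocol is the same hypothetical stand-in as before, and `requests_per_child` is a hypothetical knob that just lets the demo finish.

```python
import os
import socket
from typing import List


def child_loop(listener: socket.socket, requests_per_child: int) -> None:
    """Each pool member accepts on the shared socket; the kernel picks one."""
    for _ in range(requests_per_child):
        conn, _addr = listener.accept()
        with conn, conn.makefile("rb") as reader:
            for line in reader:              # hypothetical per-player handler
                conn.sendall(line.upper())


def prefork(listener: socket.socket, pool_size: int, requests_per_child: int) -> List[int]:
    """Approach 3 (simplified): fork a fixed pool before any client arrives."""
    pids = []
    for _ in range(pool_size):
        pid = os.fork()
        if pid == 0:                         # child: serve, then exit
            try:
                child_loop(listener, requests_per_child)
            finally:
                os._exit(0)
        pids.append(pid)
    return pids


def demo() -> List[bytes]:
    listener = socket.create_server(("127.0.0.1", 0))
    port = listener.getsockname()[1]
    pids = prefork(listener, pool_size=2, requests_per_child=2)
    listener.close()                         # only the children accept now
    replies = []
    for command in (b"look\n", b"north\n", b"get lamp\n", b"quit\n"):
        with socket.create_connection(("127.0.0.1", port)) as c, c.makefile("rb") as r:
            c.sendall(command)
            replies.append(r.readline())
    for pid in pids:
        os.waitpid(pid, 0)
    return replies
```

`demo()` returns `[b'LOOK\n', b'NORTH\n', b'GET LAMP\n', b'QUIT\n']`. Each child still serves only one player at a time, so 250 simultaneous players would need a pool of roughly 250 processes; that per-connection process cost is what the 'huge' caveat refers to.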

Exide Arabellan
www.arabellan.com

Ben Morrow · Feb 18, 2004

Uri Guttman said:
BM> There are four ways to handle multiple simultaneous connections:

BM> 1. Single-process. You have one process, which handles creating new
BM> connections and all existing connections, using select(). This is
BM> usually the most efficient solution for small numbers of connections,
BM> but scales badly.

where did you get the scales badly from?

Err... I was under the impression it was conventional wisdom. I've read
it in lots of places (CS textbooks). Thinking about it anew, though, you
are (of course) correct; even in the multi-processor case, the advantage
of forking can only be at best of the order of the number of processors.
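That concession can be written as a rough bound. Assume (an assumption for illustration, not a measurement from this thread) a purely CPU-bound server in which one event-loop process sustains a throughput of r requests per second on one processor. With N forked workers on P processors, at most min(N, P) of them can run at any instant, so:

```latex
X_{\text{fork}}(N) \;\le\; \min(N, P)\, r \;\le\; P\, r
\qquad\Longrightarrow\qquad
\frac{X_{\text{fork}}(N)}{X_{\text{loop}}} \;\le\; \frac{P\, r}{r} \;=\; P,
\qquad X_{\text{loop}} = r .
```

Fork and context-switch overhead only push the forked throughput further below this bound; on a single-CPU box (P = 1), forking buys no CPU throughput at all, and its benefit lies in working around blocking operations.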

unfortunately we have a world of kiddies spoonfed that
threads are the only solution to any multiplexing

...and Unix users (such as myself) spoonfed that 'fork is an easy way to
improve performance'.

So, to the OP: try POE (probably what I should have said in the first
place).

Ben

Uri Guttman · Feb 18, 2004

BM> Err... I was under the impression it was conventional wisdom. I've read
BM> it in lots of places (CS textbooks). Thinking about it anew, though, you
BM> are (of course) correct; even in the multi-processor case, the advantage
BM> of forking can only be at best of the order of the number of processors.

BM> ...and Unix users (such as myself) spoonfed that 'fork is an easy
BM> way to improve performance'.

well, it was and still is a good way to work around blocking issues. in
old unix days it was the only way to do that. threads are more recent
and have taken over as the current spoonfed pablum.

BM> So, to the OP: try POE (probably what I should have said in the first
BM> place).

or stem which you should check out at stemsystems.com or on cpan.

uri

Exide Arabellan · Feb 18, 2004

Uri said:
BM> So, to the OP: try POE (probably what I should have said in the first
BM> place).

or stem which you should check out at stemsystems.com or on cpan.

uri

Did some preliminary research on Stem, and it looks like something I may
use. If nothing else, I'm going to play around with it and get familiar with it. I
see it solving a lot of 'reinventing the wheel' scenarios, as advertised.
Thanks to both of you for this enlightening discussion.
