Using fork() with a telnet-esque socket connection

Exide Arabellan

Greetings,

Before I get too far into the guts of writing a custom telnet daemon, I
thought I'd test the waters and get some input. I'm building an open
source text-based RPG (brings back memories, eh? :). I'm looking to support
anywhere from 50 to 250 simultaneous connections (it just depends on the
host's bandwidth), with MySQL as the backend. The majority of my Perl
knowledge is self-taught, so my terminology is unique (and thus it's hard to
ask specific questions). This is my first venture into Perl using socket
connections, and I have read about using fork() to create a multithreaded
daemon. What I gathered was that each end-user connection would spawn a child
process of the parent daemon. Now, upon inspecting other common
applications running on my box, I was unsure if this is the right way to
go about it. If it's not, could someone suggest an alternative?

Exide Arabellan
www.arabellan.com
 
Ben Morrow

There are four ways to handle multiple simultaneous connections:

1. Single-process. You have one process, which handles creating new
connections and all existing connections, using select(). This is
usually the most efficient solution for small numbers of connections,
but scales badly.

2. Forking. The parent process accepts each new connection and forks a
child to handle it. An example of this is sshd. This can make the
programming simpler (writing a select loop is not very easy), and also
will scale better than the first. There is still an appreciable delay
between a client making a connection and that client being serviced,
however, due to the fork.

3. Pre-forking. The parent initially forks a pool of children, and when
a new connection comes in it is passed to one of the existing children
to handle. When the number of free children gets too low, some more are
forked; contrariwise, when a whole lot have finished their processing
and gone back to waiting some are killed. An example of this is
(traditional) Apache. This is probably the most flexible and responsive
solution, although it still uses a whole process for each connection
which can cause problems if the number is huge.

4. Threaded. Basically, as forking; but instead of creating a new
process a new thread within the current process is created. This is
(usually) the least resource-intensive of these last three, but writing
(traditional) threaded programs adds a whole level of complexity to the
task. Examples of this are Apache/win32 and IIS. With Perl threads in
their current state, I would strongly advise against attempting this on
any decent Unix OS: Perl thread creation as currently implemented is
simply an emulation of fork() within the process, and the OS can do it
more efficiently than perl can.
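
To make option 2 concrete, here is a minimal sketch of a fork-per-connection server using only core modules (IO::Socket::INET and POSIX). The sub names, the echo reply, and port 4000 are placeholders invented for illustration, not part of any existing daemon.

```perl
use strict;
use warnings;
use IO::Socket::INET;
use POSIX qw(WNOHANG);

# Talk to one client. Runs in the child, so a blocking read here only
# stalls this one connection. The echo reply stands in for game logic.
sub handle_client {
    my ($client) = @_;
    print {$client} "Welcome\r\n";
    while (defined(my $line = <$client>)) {
        print {$client} "You said: $line";
    }
    return;
}

# Accept one connection and fork a child to serve it.
# Returns the child's PID, or nothing if accept() failed or was interrupted.
sub accept_and_fork {
    my ($listener) = @_;
    my $client = $listener->accept or return;
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        close $listener;          # child: does not accept new connections
        handle_client($client);
        close $client;
        POSIX::_exit(0);          # leave without running the parent's END blocks
    }
    close $client;                # parent: the child owns this connection now
    return $pid;
}

# Hypothetical entry point: listen on $port and serve until killed.
sub run_forking_server {
    my ($port) = @_;
    # Reap finished children so they do not pile up as zombies.
    local $SIG{CHLD} = sub { 1 while waitpid(-1, WNOHANG) > 0 };
    my $listener = IO::Socket::INET->new(
        LocalPort => $port,
        Listen    => 128,
        ReuseAddr => 1,
    ) or die "cannot listen on port $port: $!";
    accept_and_fork($listener) while 1;
}

# run_forking_server(4000);    # 4000 is an arbitrary example port
```

The parent closes its copy of the client socket right after the fork; otherwise the connection would stay open after the child finishes with it.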

Ben
 
Uri Guttman

BM> There are four ways to handle multiple simultaneous connections:

BM> 1. Single-process. You have one process, which handles creating new
BM> connections and all existing connections, using select(). This is
BM> usually the most efficient solution for small numbers of connections,
BM> but scales badly.

where did you get the scales badly from? event loops (which is what
using select is) are generally the most efficient way to handle multiple
connections. the only weakness is that you have to deal with any
potential blocking operations. given a pure cpu bound server, an event
loop has the least overhead of any of the methods. there is no forking,
no thread spawning, no extra context switching, etc. and event loop
design and coding is usually simpler than the others once you get the
hang of it. unfortunately we have a world of kiddies spoonfed that
threads are the only solution to any multiplexing and they can't break
out of that mold. threads have so many problems (in any language) that
must be dealt with, including synchronization, locking and sharing, and no
simple way to scale beyond a single box.
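
for comparison, a bare-bones select loop might look like the sketch below, using the core IO::Select module. the sub names, the echo reply and port 4000 are made up for illustration, and it echoes raw chunks rather than buffering whole lines, which a real telnet-style server would need to do.

```perl
use strict;
use warnings;
use IO::Socket::INET;
use IO::Select;

# One pass of the event loop: wait up to $timeout seconds (undef = forever)
# for readable handles, accept new clients, and echo input from existing ones.
# Returns the number of handles that were ready.
sub poll_once {
    my ($listener, $select, $timeout) = @_;
    my @ready = $select->can_read($timeout);
    for my $fh (@ready) {
        if ($fh == $listener) {
            # New connection: start watching it alongside the others.
            my $client = $listener->accept or next;
            $select->add($client);
            $client->syswrite("Welcome\r\n");
        }
        else {
            my $n = sysread($fh, my $buf, 4096);
            if (!$n) {                    # undef (error) or 0 (EOF)
                $select->remove($fh);
                close $fh;
                next;
            }
            # Placeholder for game logic. Anything slow here (a long
            # database query, say) stalls every connected client.
            syswrite($fh, "You said: $buf");
        }
    }
    return scalar @ready;
}

# Hypothetical entry point: a single process serves every connection.
sub run_select_server {
    my ($port) = @_;
    my $listener = IO::Socket::INET->new(
        LocalPort => $port,
        Listen    => 128,
        ReuseAddr => 1,
    ) or die "cannot listen on port $port: $!";
    my $select = IO::Select->new($listener);
    poll_once($listener, $select, undef) while 1;
}

# run_select_server(4000);    # 4000 is an arbitrary example port
```

all client state lives in one process, so there is nothing to lock or share; the price is that no handler may block.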

uri
 
Exide Arabellan

Thank you for that insight, Ben. That is exactly the type of information I
was after. I grep'd the newsgroup for fork() and read a bunch of posts
on 'Fork vs Thread', and have eliminated the Threaded possibility. I do
hope to port this to Win32 in the future, but my main concern is use on
Linux. I'm leaning towards Pre-forking, though I'm concerned about the
following statement.

Ben said:
This is probably the most flexible and responsive solution, although
it still uses a whole process for each connection which can cause
problems if the number is huge.

Can you give me a ballpark on 'huge'? Flexibility, scalability, and
response time are key factors (and the main reason I asked the original
question). I'm aiming for medium-level consumer servers (512 MB to 1 GB RAM,
1.0 GHz+ CPU).
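
For reference, the pre-forking layout I have in mind would look roughly like the sketch below. It is a simplification of what Ben describes: the pool size is fixed rather than grown and shrunk on demand, and instead of the parent handing connections over, each child calls accept() on the shared listening socket. The sub names, pool size, and port are made up for illustration.

```perl
use strict;
use warnings;
use IO::Socket::INET;
use POSIX ();

# Worker loop: every child blocks in accept() on the same listening
# socket, and the kernel gives each new connection to one of them.
sub worker_loop {
    my ($listener) = @_;
    while (my $client = $listener->accept) {
        # Placeholder for a real session; $$ is this worker's PID.
        print {$client} "Served by worker ", $$, "\r\n";
        close $client;
    }
    return;
}

# Fork $size workers up front and return their PIDs to the parent.
sub start_pool {
    my ($listener, $size) = @_;
    my @pids;
    for (1 .. $size) {
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {
            worker_loop($listener);
            POSIX::_exit(0);      # child never returns into the parent's code
        }
        push @pids, $pid;
    }
    return @pids;
}

# Example wiring (port 4000 and a pool of 10 are arbitrary):
# my $listener = IO::Socket::INET->new(
#     LocalPort => 4000, Listen => 128, ReuseAddr => 1,
# ) or die "cannot listen: $!";
# my @pids = start_pool($listener, 10);
# waitpid($_, 0) for @pids;
```

With this layout a worker is tied up for as long as its client stays connected, so the pool size caps the number of simultaneous players.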

Exide Arabellan
www.arabellan.com
 
Ben Morrow

Uri Guttman said:
BM> There are four ways to handle multiple simultaneous connections:

BM> 1. Single-process. You have one process, which handles creating new
BM> connections and all existing connections, using select(). This is
BM> usually the most efficient solution for small numbers of connections,
BM> but scales badly.

where did you get the scales badly from?

Err... I was under the impression it was conventional wisdom. I've read
it in lots of places (CS textbooks). Thinking about it anew, though, you
are (of course) correct; even in the multi-processor case, the advantage
of forking can only be at best of the order of the number of processors.

Uri Guttman said:
unfortunately we have a world of kiddies spoonfed that
threads are the only solution to any multiplexing

...and Unix users (such as myself) spoonfed that 'fork is an easy way to
improve performance' :).

So, to the OP: try POE (probably what I should have said in the first
place).

Ben
 
Uri Guttman

BM> Err... I was under the impression it was conventional wisdom. I've read
BM> it in lots of places (CS textbooks). Thinking about it anew, though, you
BM> are (of course) correct; even in the multi-processor case, the advantage
BM> of forking can only be at best of the order of the number of processors.

BM> ...and Unix users (such as myself) spoonfed that 'fork is an easy
BM> way to improve performance' :).

well, it was and still is a good way to work around blocking issues. in
old unix days it was the only way to do that. threads are more recent
and have taken over as the current spoonfed pablum.

BM> So, to the OP: try POE (probably what I should have said in the first
BM> place).

or stem which you should check out at stemsystems.com or on cpan.

uri
 
Exide Arabellan

Uri said:
BM> So, to the OP: try POE (probably what I should have said in the first
BM> place).

or stem which you should check out at stemsystems.com or on cpan.

uri

Did some preliminary research on Stem, and it looks like something I may
use. If nothing else, I'm going to play around with it and get familiar. I
see it solving a lot of 'reinventing the wheel' scenarios, as advertised.
Thanks to both of you for this enlightening discussion :)
 
