[ANN] Multiplexer - linear non-blocking I/O

M

Mikael Brockman

Blocking I/O is really easy to use. But when you use it to write
servers, you run into problems: you can't run two blocking syscalls
simultaneously. So if you're writing a huge file to some guy, every
other client is stalled, and no one new can connect. Unacceptable, for
many types of servers. They need non-blocking I/O.

Non-blocking I/O is a lot more annoying to use. Instead of going

| write "Hello. What's your name?"
| name = read_line
| write "How do you do, #{name}?"
| state = read_line
| write "It's nice to hear that you're #{state}, #{name}."

we have to make weird state machines.

The good we can use callcc to make the non-blocking nature
practically invisible. Multiplexer does that. Here's how you'd write a
hello server:

| class Test < Multiplexer::Handler
| def handle
| write_line "Hello. What's your name?"
| name = read_line
| write_line "How do you do, #{name}?"
| state = read_line
| write_line "It's nice to hear that you're #{state}, #{name}."
| disconnect
| end
| end
|
| def test
| multiplexer = Multiplexer::Multiplexer.new 0.5
| multiplexer.listen 31337, Test
| multiplexer.run
| end
|
| test

It'll run in one thread. Multiplexer handles the select(3) calls.

Get it:
http://www.phubuh.org/Projects/nntpu/_darcs/current/multiplexer.rb
 
R

Robert Klemme

Mikael Brockman said:
Blocking I/O is really easy to use. But when you use it to write
servers, you run into problems: you can't run two blocking syscalls
simultaneously. So if you're writing a huge file to some guy, every
other client is stalled, and no one new can connect. Unacceptable, for
many types of servers. They need non-blocking I/O.

Non-blocking I/O is a lot more annoying to use. Instead of going

It'll run in one thread. Multiplexer handles the select(3) calls.

What is the advantage over a solution with threads? IOW, why should I use
multiplexer over individual threads per connection?

Kind regards

robert
 
M

Mikael Brockman

Robert Klemme said:
What is the advantage over a solution with threads? IOW, why should I
use multiplexer over individual threads per connection?

Since Ruby's threads aren't native, you can't do I/O from several at a
time. So one IO#read call blocking for a long time will block the other
threads, too. You could loop a read with a time-out, I guess. But with
a single thread running select, the whole process can stall completely
while waiting for I/O. And it's more elegant. :)
 
D

Dave Thomas

Since Ruby's threads aren't native, you can't do I/O from several at a
time. ]

This is not true. Ruby goes non-blocking I/O from threads, so in
general you'll see overlapped execution.

Thread.new do
loop do
puts "You said #{gets}"
end
end

10.times do
sleep(1)
puts "Say something"
end



Cheers

Dave
 
R

Robert Klemme

Mikael Brockman said:
Since Ruby's threads aren't native, you can't do I/O from several at a
time.

That's not true.
So one IO#read call blocking for a long time will block the other
threads, too.

Also wrong: execute this script


tickers = [$stdout, $stderr].map do |io|
Thread.new do
100.times do |i|
io.puts "#{io.fileno}: #{Time.now}: Tick #{i}"
sleep 1
end
end
end

puts "PROMPT"
# blocks in next line:
input = gets
puts "ENTERED #{input}"

tickers.each {|th| th.join }

16:39:44 [ruby]: ruby ticker.rb
1: Fri Nov 26 17:40:05 GMT+2:00 2004: Tick 0
2: Fri Nov 26 17:40:05 GMT+2:00 2004: Tick 0
PROMPT
2: Fri Nov 26 17:40:06 GMT+2:00 2004: Tick 1
1: Fri Nov 26 17:40:06 GMT+2:00 2004: Tick 1
2: Fri Nov 26 17:40:07 GMT+2:00 2004: Tick 2
1: Fri Nov 26 17:40:07 GMT+2:00 2004: Tick 2
2: Fri Nov 26 17:40:08 GMT+2:00 2004: Tick 3
1: Fri Nov 26 17:40:08 GMT+2:00 2004: Tick 3
foo
ENTERED foo
2: Fri Nov 26 17:40:09 GMT+2:00 2004: Tick 4
1: Fri Nov 26 17:40:09 GMT+2:00 2004: Tick 4
2: Fri Nov 26 17:40:10 GMT+2:00 2004: Tick 5
1: Fri Nov 26 17:40:10 GMT+2:00 2004: Tick 5
2: Fri Nov 26 17:40:11 GMT+2:00 2004: Tick 6
1: Fri Nov 26 17:40:11 GMT+2:00 2004: Tick 6
2: Fri Nov 26 17:40:12 GMT+2:00 2004: Tick 7
1: Fri Nov 26 17:40:12 GMT+2:00 2004: Tick 7
You could loop a read with a time-out, I guess. But with
a single thread running select, the whole process can stall completely
while waiting for I/O. And it's more elegant. :)

IMHO threads are more elegant.

Regards

robert
 
M

Mikael Brockman

Robert Klemme said:
Mikael Brockman said:
Since Ruby's threads aren't native, you can't do I/O from several at
a time.

That's not true.
So one IO#read call blocking for a long time will block the other
threads, too.

Also wrong: execute this script
[snip]

Sorry: faulty generalization. The problem is demonstrated in this
script:

| require 'socket'
|
| server = TCPServer.new 12345
|
| t_a = Thread.start do
| a = server.accept
| data = "foo" * 10000000
| a << data
| a.close
| end
|
| b = server.accept
| b << "b"
| b.close
|
| t_a.join

The second client isn't accepted until the huge batch of data is sent.
I guess you could solve the problem by splitting it into a bunch of
smaller batches.

Another appeal could be that by keeping it single-threaded, you have
fewer concurrency issues to worry about.
 
R

Robert Klemme

Mikael Brockman said:
Robert Klemme said:
Mikael Brockman said:
Blocking I/O is really easy to use. But when you use it to
write servers, you run into problems: you can't run two blocking
syscalls simultaneously. So if you're writing a huge file to
some guy, every other client is stalled, and no one new can
connect. Unacceptable, for many types of servers. They need
non-blocking I/O.

Non-blocking I/O is a lot more annoying to use. Instead of
going

<snip/>

It'll run in one thread. Multiplexer handles the select(3) calls.

What is the advantage over a solution with threads? IOW, why
should I use multiplexer over individual threads per connection?

Since Ruby's threads aren't native, you can't do I/O from several at
a time.

That's not true.
So one IO#read call blocking for a long time will block the other
threads, too.

Also wrong: execute this script
[snip]

Sorry: faulty generalization. The problem is demonstrated in this
script:

| require 'socket'
|
| server = TCPServer.new 12345
|
| t_a = Thread.start do
| a = server.accept
| data = "foo" * 10000000
| a << data
| a.close
| end
|
| b = server.accept
| b << "b"
| b.close
|
| t_a.join

The second client isn't accepted until the huge batch of data is sent.
I guess you could solve the problem by splitting it into a bunch of
smaller batches.

Another appeal could be that by keeping it single-threaded, you have
fewer concurrency issues to worry about.

The typical pattern for TCPserver looks different: You create a thread per
accepted connection.

require 'socket'

server = TCPServer.new 12345

loop do
Thread.new(server.accept) do |a|
begin
data = "foo" * 10000000
a << data
ensure
a.close
end
end
end

Regards

robert
 
M

Mikael Brockman

Robert Klemme said:
The typical pattern for TCPserver looks different: You create a thread per
accepted connection.

require 'socket'

server = TCPServer.new 12345

loop do
Thread.new(server.accept) do |a|
begin
data = "foo" * 10000000
a << data
ensure
a.close
end
end
end

You're right, but I get the same results. Only one client at a time is
accepted.
 
L

Lloyd Zusman

Robert Klemme said:
[ ... ]

What is the advantage over a solution with threads? IOW, why should I
use multiplexer over individual threads per connection?

The issues raised in subsequent posts illustrate the classic arguments
between one-thread-per-connection proponents and select-loop proponents
that I have heard since the mid 1990's.

Given good thread and select/non-blocking-io implementations, both
methods can solve the same problems and can work just fine.

As for me, I believe that it's good to have both methods to choose from.

The java folks have recently offered a 'select' methodology in addition
to their traditional thread-based approach. Perl has not-so-recently
added thread support on top of its traditional preference for
'select'.

And I'm glad that I also have both options in ruby.
 
B

Bill Kelly

From: "Mikael Brockman said:
You're right, but I get the same results. Only one client at a time is
accepted.

Strange... I'm surprised there are too many differences between
your select multiplexer using continuations and a threads solution,
since: a) ruby uses select() to multiplex behind the scenes when
multiple threads are doing I/O; and b) ruby continuations are
implemented using threads.

That said I haven't studied your continuations solution so maybe
my surprise is misplaced...

But I wrote this simplistic threaded socket IO performance test
last month http://bwk.homeip.net/ftp/ruby/tcptest.rb and I've
watched it accept and handle 100+ clients without delay. Each
client is pushing bytes as fast as it can (at the specified
chunk size), and each server thread handling a client is in
return replying with bytes as fast as it can.

. . Just chiming in in case this is somehow helpful... if not,
please disregard. =D


Regards,

Bill
 
G

Gyoung-Yoon Noh

Until one thread terminates a system call like IO-related task, other
threads will be blocked. Kernel does not know ruby's (userland)
threads, so if your application needs concurrency in massive IO tasks,
maybe you should implement a kind of thread scheduler by yourself. OK,
Kernel#fork will be an another choice.

If 'thread' is only sufficient for all, why did Sun adopt NIO in Java2 1.4?

Run following server example, and execute 'telnet localhost 31337' on
two each shell, no client connection will be blocked.

<code>
require 'multiplexer'

class Test2 < Multiplexer::Handler
def initialize
super()
end

def handle
begin
write_line "ooo" * 10000000
rescue EOFError
puts "My client closed its socket."
end
end
end

def test
multiplexer = Multiplexer::Multiplexer.new 0.5
multiplexer.client_data = []
multiplexer.listen 31337, Test2
puts "Listening on port 31337."
multiplexer.run
end

if $0 == __FILE__
test
end
</code>


IMHO, multiplexinig is a good choice for concurrent programming in
ruby, insofar ruby VM does not support native threads.


Best regards,
 
P

Phil Tomson

Since Ruby's threads aren't native, you can't do I/O from several at a
time. So one IO#read call blocking for a long time will block the other
threads, too. You could loop a read with a time-out, I guess. But with
a single thread running select, the whole process can stall completely
while waiting for I/O. And it's more elegant. :)

But if I'm not mistaken, isn't callcc implemented with threads in Ruby?

Phil
 
T

Tanaka Akira

Bill Kelly said:
Strange... I'm surprised there are too many differences between
your select multiplexer using continuations and a threads solution,
since: a) ruby uses select() to multiplex behind the scenes when
multiple threads are doing I/O; and b) ruby continuations are
implemented using threads.

Because write(2) system call blocks until whole data is written unless
nonblocking flag is set, even after select notify the fd is writable.

The multiplexer works well because it sets nonblocking flag.
The threaded server also works well if it sets nonblocking flag.

Try:

require 'socket'
require 'fcntl'

server = TCPServer.new 12345

t_a = Thread.start do
a = server.accept
a.fcntl(Fcntl::F_SETFL, Fcntl::O_NONBLOCK)
data = "foo" * 10000000
a << data
a.close
end

b = server.accept
b.fcntl(Fcntl::F_SETFL, Fcntl::O_NONBLOCK)
b << "b"
b.close

t_a.join

However nonblocking flag with stdio buffering may cause data lost...
 
M

Mikael Brockman

But if I'm not mistaken, isn't callcc implemented with threads in Ruby?

It is, but I don't think that several ``continuation threads'' run
concurrently. In any case, there is definitely only one thread doing
the I/O.
 
B

Bill Kelly

Hi,

From: "Tanaka Akira said:
Because write(2) system call blocks until whole data is written unless
nonblocking flag is set, even after select notify the fd is writable.

I've always used send/recv for my socket I/O. I *think* I don't
have a problem with blocking, even without NONBLOCK set. (Excepting
rare circumstances as are possible with accept(), etc.)

Does that sound right? Or should I be seeing blocking problems
even using send/recv ?


Thanks,

Regards,

Bill
 
T

Tanaka Akira

Bill Kelly said:
I've always used send/recv for my socket I/O. I *think* I don't
have a problem with blocking, even without NONBLOCK set. (Excepting
rare circumstances as are possible with accept(), etc.)

Does that sound right? Or should I be seeing blocking problems
even using send/recv ?

You'll see blocking problem with send when you send huge data.

Linux man page of send(2) says:

| When the message does not fit into the send buffer of the socket, send
| normally blocks, unless the socket has been placed in non-blocking I/O
| mode. In non-blocking mode it would return EAGAIN in this case. The
| select(2) call may be used to determine when it is possible to send
| more data.

Try:

require 'socket'
require 'fcntl'

server = TCPServer.new 12345

t_a = Thread.start do
a = server.accept
data = "foo" * 10000000
a.send data, 0
a.close
end

b = server.accept
b.send "b", 0
b.close

t_a.join
 
J

James Edward Gray II

You'll see blocking problem with send when you send huge data.

This problem is pretty easy to work around in threaded servers.

Realistically, it's usually one thread passing the data to send into a
thread (or thread pool) that just does the writing, in most production
servers I work with at least. The server I'm building right now uses
thread safe queues to handle this communication. It's trivial to add a
data length check, bust up the data, and queue multiple prints. Just
remember to do that inside a synchronized block, to ensure that all
messages are queued back-to-back.

James Edward Gray II
 
B

Bill Kelly

Hi,

From: "Tanaka Akira said:
You'll see blocking problem with send when you send huge data.

Linux man page of send(2) says:

| When the message does not fit into the send buffer of the socket, send
| normally blocks, unless the socket has been placed in non-blocking I/O
| mode. In non-blocking mode it would return EAGAIN in this case. The
| select(2) call may be used to determine when it is possible to send
| more data.

Thank you for your patience. I've been wrong about that
for years.

Now I'm suddenly extremely interested in the non-blocking
extension for Win32 ... :)


Regards,

Bill
 
R

Robert Klemme

Gyoung-Yoon Noh said:
Until one thread terminates a system call like IO-related task, other
threads will be blocked. Kernel does not know ruby's (userland)
threads, so if your application needs concurrency in massive IO tasks,
maybe you should implement a kind of thread scheduler by yourself. OK,
Kernel#fork will be an another choice.

If 'thread' is only sufficient for all, why did Sun adopt NIO in Java2
1.4?

Because threads have a certain overhead and Sun wanted to provide a means to
write high performance (i.e. highly optimized) servers. Apart from that I
think they wanted to make the IO architecture more flexible and modular.

That said, a solution with threads and blocking IO is still more elegant
IMHO. It's just that for some circumstances this is not the right solution.

Kind regards

robert
 
M

Mikael Brockman

Robert Klemme said:
That said, a solution with threads and blocking IO is still more
elegant IMHO. It's just that for some circumstances this is not the
right solution.

I agree. What I find inelegant is threads with non-blocking I/O and
kludges like splitting the data into several system calls.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,769
Messages
2,569,579
Members
45,053
Latest member
BrodieSola

Latest Threads

Top