Thinking about Threaded IO

  • Thread starter James Edward Gray II

James Edward Gray II

I've not used Ruby's threads before, so I have what will probably be
some basic questions. I'm pretty familiar with using native thread
systems, but these "in processor" threads raise some questions for me.

I'm especially wondering about IO. I've read that Ruby's threads can
be dangerous when making possibly lengthy calls to the operating
system. How does that affect threaded servers in Ruby? If you call
gets() on a socket, does the program hang until that socket produces
input containing a \n character? If so, what's the best solution? Use
non-blocking IO techniques?

I guess that's a pretty specific example, and I am interested in the
effects of this type of threading on networking code, but let me ask
something more general. When should I stop and worry about whether an
action I'm threading will stall the whole program? Where is this
generally a problem?

Thanks.

James Edward Gray II
 
David G. Andersen

I've not used Ruby's threads before, so I have what will probably be
some basic questions. I'm pretty familiar with using native thread
systems, but these "in processor" threads raise some questions for me.

I'm especially wondering about IO. I've read that Ruby's threads can
be dangerous when making possibly lengthy calls to the operating
system. How does that affect threaded servers in Ruby? If you call
gets() on a socket, does the program hang until that socket produces
input containing a \n character? If so, what's the best solution? Use
non-blocking IO techniques?

I guess that's a pretty specific example, and I am interested in the
effects of this type of threading on networking code, but let me ask
something more general. When should I stop and worry about whether an
action I'm threading will stall the whole program? Where is this
generally a problem?

Ruby's threads seem "generally pretty good" about not blocking
on IO calls _if_ you do them right. An example from a program
I was working on:

mybuf.sbuf += s.sysread(65536)
vs
mybuf.sbuf += s.sysread(4096)

The former caused (all of) Ruby to block. The latter
was handled properly. I _assume_, but didn't verify,
that this is because Ruby was doing something like

select()
read(foo)

internally, and the read with the huge blocksize scrogged
things by blocking anyway. But I was being stupidly lazy
trying to sysread such a large blocksize anyway ...
is/was this a Ruby bug? Perhaps. But easily worked around.
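
A minimal sketch of the small-chunk workaround described above, assuming the goal is to collect whatever is currently readable without asking for one huge block. The helper name read_available, the CHUNK constant and the timeout value are hypothetical, and a pipe stands in for the socket:

```ruby
CHUNK = 4096  # modest block size, as in the sysread(4096) case above

# Hypothetical helper: collect data from io in CHUNK-sized reads,
# stopping at end of file or when nothing is ready within `timeout` seconds.
def read_available(io, timeout = 0.1)
  buf = String.new
  # IO.select returns nil on timeout, which ends the loop.
  while IO.select([io], nil, nil, timeout)
    begin
      buf << io.sysread(CHUNK)   # a single read of at most CHUNK bytes
    rescue EOFError
      break                      # the other end closed the stream
    end
  end
  buf
end

reader, writer = IO.pipe
writer.write("x" * 10_000)
writer.close
data = read_available(reader)
puts data.bytesize
```

Each sysread is issued only after select reports the descriptor as ready, so a single call never waits for more bytes than are already there.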

In _general_, you shouldn't have to use nonblocking IO,
but there are likely operations that Ruby can't make internally
nonblocking. DNS lookups are often a pain to perform
asynchronously unless you explicitly use an async
DNS library, for instance. File operations over
NFS can block and are next to impossible to
deal with without either multiple processes or
kernel-level multithreading (particularly metadata
operations like lookup and open).

-Dave
 
Brian Candler

Ruby's threads seem "generally pretty good" about not blocking
on IO calls _if_ you do them right. An example from a program
I was working on:

mybuf.sbuf += s.sysread(65536)
vs
mybuf.sbuf += s.sysread(4096)

The former caused (all of) Ruby to block. The latter
was handled properly. I _assume_, but didn't verify,
that this is because Ruby was doing something like

select()
read(foo)

internally

Essentially that's right. Was there any reason to use 'sysread' rather than
'read'? I think that

mybuf.sbuf += s.read(65536)

probably would have worked as you'd expected.

In _general_, you shouldn't have to use nonblocking IO,
but there are likely operations that Ruby can't make internally
nonblocking. DNS lookups are often a pain to perform
asynchronously unless you explicitly use an async
DNS library, for instance. File operations over
NFS can block and are next to impossible to
deal with without either multiple processes or
kernel-level multithreading (particularly metadata
operations like lookup and open).

All good points. I'd just add that things like external database libraries
(e.g. mysql) tend to block too. I've seen some which don't; the Oracle OCI8
binding for ruby has the ability to be put in a 'nonblocking' mode, but what
it actually does is poll for a result after 1ms, 2ms, 4ms, 8ms...etc !

For such applications, separate processes are often essential. For web
applications I've had a lot of success with fcgi, where a pool of persistent
processes is set up by Apache under mod_fastcgi, and each one only handles a
single request at a time. This means you don't have to worry about thread
safety as well as blocking.
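
As a rough illustration of the separate-process idea (this is not the fcgi setup itself), here is a sketch that pushes a blocking call into a child process and reads the result back over a pipe. The sleep is a stand-in for a blocking library call, and fork is assumed to be available, which is not the case on native Windows builds:

```ruby
reader, writer = IO.pipe

pid = fork do
  # Child process: do the blocking work here, away from the parent.
  reader.close
  sleep 0.05                 # stand-in for a blocking database call
  writer.puts "result"
  writer.close
end

# Parent process: close the unused write end, then wait for the answer.
writer.close
result = reader.gets
_, status = Process.wait2(pid)
puts "child said #{result.inspect}, exit ok: #{status.success?}"
```

Only the thread that calls gets waits for the child; anything that blocks inside the child cannot stall the parent's other threads.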

Regards,

Brian.
 
James Edward Gray II

Essentially that's right. Was there any reason to use 'sysread' rather than
'read'? I think that

mybuf.sbuf += s.read(65536)

probably would have worked as you'd expected.

So with a big read, you can still hang waiting for the bytes? Are you
suggesting read() would have handled this better than sysread()?
Where does that leave gets()?

Thanks.

James Edward Gray II
 
Brian Candler

So with a big read, you can still hang waiting for the bytes? Are you
suggesting read() would have handled this better than sysread()?
Where does that leave gets()?

IO#read and IO#gets work properly; in other words Ruby wraps the calls
appropriately to make sure they never block the interpreter engine.

If you use sysread then you're telling Ruby to bypass what it knows, and
just call the underlying O/S function directly. In that case, you should
know what you are doing before you ask for it!

Checking with the source, IO#sysread checks the FD is ready (essentially
using select()) and then does a single read() operation of the size
requested:

n = fileno(fptr->f);
rb_thread_wait_fd(fileno(fptr->f));
TRAP_BEG;
n = read(fileno(fptr->f), RSTRING(str)->ptr, RSTRING(str)->len);
TRAP_END;

whereas IO#read goes via rb_io_fread, which reads only as much data as is
available at a time, appending it to a string. IO#gets goes via appendline,
which also checks how much data is available before reading it.
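
A small sketch to check this behaviour for yourself, assuming a pipe is a fair stand-in for a socket. One thread sits in gets while a second thread keeps counting; the variable names are just for illustration:

```ruby
reader, writer = IO.pipe

# This thread waits in gets until a complete line arrives.
line_thread = Thread.new { reader.gets }

# Meanwhile a second thread should keep making progress.
ticks = 0
ticker = Thread.new do
  5.times do
    ticks += 1
    sleep 0.01
  end
end
ticker.join

# Only now supply the line the first thread is waiting for.
writer.puts "hello"
line = line_thread.value   # joins the thread and returns the block's result

puts "ticks=#{ticks} line=#{line.inspect}"
```

If gets stalled the whole interpreter, the ticker could never finish, because the line is only written after it has been joined.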

Regards,

Brian.
 
James Edward Gray II

IO#read and IO#gets work properly; in other words Ruby wraps the calls
appropriately to make sure they never block the interpreter engine.

Thank you for the excellent information. I must say that this makes me
fear Ruby's threads a lot less. They seem extremely well thought out.

James Edward Gray II
 
David G. Andersen

Essentially that's right. Was there any reason to use 'sysread' rather than
'read'? I think that

mybuf.sbuf += s.read(65536)

probably would have worked as you'd expected.

Think so too. I don't remember why I switched it to sysread -
I think I was having problems telling ruby to read as much
as possible from the file descriptor without blocking, and I
didn't want to have to make it explicitly nonblocking and
abandon the happiness of threads.

-dave
 
James Edward Gray II

Kevin McConnell said:
Not sure if any of you use Windows, but it's probably worth pointing
out that on Windows IO#gets *does* block all threads.

E.g. if I run the following:

thread = Thread.new { i=0; while(true); puts i+=1; sleep 1; end }
sleep 10
str = $stdin.gets
sleep 10

It should display the numbers 1 to 10, and then go suspiciously quiet
until you provide the input to gets. Once you do that, you'll get the
numbers 11 to 20.

Is that because you are calling gets() on STDIN here? Would it
behave the same if we were dealing with sockets instead?
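
One way to find out would be the same experiment over a local TCP connection. This is only a sketch: the loopback address, the OS-chosen port and the timings are assumptions, and the result may well differ between platforms and builds:

```ruby
require "socket"

server = TCPServer.new("127.0.0.1", 0)  # port 0: let the OS pick a free port
port   = server.addr[1]
client = TCPSocket.new("127.0.0.1", port)
conn   = server.accept

# Background thread that should keep counting while we wait in gets.
counter = 0
ticker = Thread.new { loop { counter += 1; sleep 0.01 } }

# Another thread sends a line after a short delay.
sender = Thread.new { sleep 0.1; client.puts "ping" }

before = counter
line   = conn.gets        # the main thread waits here for the newline
after  = counter

ticker.kill
sender.join
[client, conn, server].each(&:close)

puts "got #{line.inspect}; ticker advanced by #{after - before} while waiting"
```

If the ticker does not advance at all while gets is waiting, then the socket read is blocking every thread on that build.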

James Edward Gray II
 
Robert Klemme

Kevin McConnell said:
Not sure if any of you use Windows, but it's probably worth pointing out
that on Windows IO#gets *does* block all threads.

E.g. if I run the following:

thread = Thread.new { i=0; while(true); puts i+=1; sleep 1; end }
sleep 10
str = $stdin.gets
sleep 10

It should display the numbers 1 to 10, and then go suspiciously quiet
until you provide the input to gets. Once you do that, you'll get the
numbers 11 to 20.

That's not true for the cygwin build:

17:12:51 [ruby]: uname -a
CYGWIN_NT-5.0 bond 1.5.10(0.116/4/2) 2004-05-25 22:07 i686 unknown unknown Cygwin
17:12:54 [ruby]: ruby -v
ruby 1.8.1 (2003-12-25) [i386-cygwin]
17:12:58 [ruby]: cat ioblock.rb
thread = Thread.new { i=0; while(true); puts i+=1; sleep 1; end }
sleep 10
print "prompt> "
print "got: ", $stdin.gets, "\n"
sleep 10
17:13:01 [ruby]: ruby ioblock.rb
1
2
3
4
5
6
7
8
9
10
prompt> 11
12
13
foo
got: foo

14
15
16
ioblock.rb:5:in `sleep': Interrupt
        from ioblock.rb:5

Kind regards

robert
 
Ara.T.Howard

Sorry, I should have pointed that out. I know it works OK on a cygwin
build, just not on a visual studio build following the instructions in the
win32 directory.

is that to say that there is some other way (not the instructions) to build
that does NOT result in blocking?

regards.

-a
--
===============================================================================
| EMAIL :: Ara [dot] T [dot] Howard [at] noaa [dot] gov
| PHONE :: 303.497.6469
| A flower falls, even though we love it;
| and a weed grows, even though we do not love it.
| --Dogen
===============================================================================
 
