nio questions

Dave Rathnow

We have an application running on W2K that needs to handle hundreds of
incoming TCP connections at a time. These connections will arrive at a
rate of 200 or more at a time, which means they will all be hitting the
target machine at virtually the same time. This number could go much
higher in a production environment. The initial cut of this program
was not well tested and we are currently having scaling problems.

I've been playing with NIO's non-blocking channels to see how Java will
handle this load. I have a test program that is running on a separate
machine and fires off 200+ threads to attempt connections into the
machine at the same time. What I'm seeing are a lot of "connection
refused" errors yet my server application seems to be idling, that is
there is little CPU usage.

We have a customer who is seeing similar behavior from our existing
application, which runs on a dual 2.4 GHz processor machine with 4 GB
of memory. The machine is virtually idle, yet incoming connections
are getting "connection refused" errors.

So this makes me wonder where the bottleneck is. Is it with the JVM?
The OS? The network interface? Is this type of traffic practical for a
Java application to handle? Could anyone give me some pointers on how I
might find the source of the problem and come up with a solution?

Thanks
Dave.
 
Gordon Beaton

If all the connections arrive at the same time, it could be just a
question of how many unaccepted connections can be queued. That limit
is the backlog, the second argument in ServerSocket's constructors.

And if that's the case, then it may help to choose a larger backlog
when creating the ServerSocket (if it's important for the application
to be able to handle peaks such as these).
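
For an NIO server the same backlog can be passed when binding the
channel's underlying socket. What follows is only a minimal sketch: the
class name, the helper openListener and the value 1024 are made up for
illustration, and the OS is free to cap or adjust whatever backlog is
requested. Without an explicit value, ServerSocket defaults to a queue
length of 50.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.ServerSocketChannel;

public class BacklogExample {
    // Hypothetical helper: opens a non-blocking listening channel and
    // requests an explicit accept backlog instead of the default.
    static ServerSocketChannel openListener(int port, int backlog) throws IOException {
        ServerSocketChannel server = ServerSocketChannel.open();
        server.configureBlocking(false);
        // The second argument to bind() is the requested maximum number of
        // connections waiting to be accepted; the OS may limit it.
        server.socket().bind(new InetSocketAddress(port), backlog);
        return server;
    }

    public static void main(String[] args) throws IOException {
        // Port 0 asks the OS for any free port; 1024 is an illustrative value.
        ServerSocketChannel server = openListener(0, 1024);
        System.out.println("listening on port " + server.socket().getLocalPort());
        server.close();
    }
}
```

A larger backlog only helps with short bursts. The accept loop still
has to drain the queue faster than connections arrive on average.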

/gordon
 
Thomas Hawtin

Dave said:
I've been playing with NIO's non-blocking channels to see how Java will
handle this load. I have a test program that is running on a separate
machine and fires off 200+ threads to attempt connections into the
machine at the same time. What I'm seeing are a lot of "connection
refused" errors yet my server application seems to be idling, that is
there is little CPU usage.

If all the connections arrive at the same time, it could be just a
question of how many unaccepted connections can be queued. That limit
is the backlog, the second argument in ServerSocket's constructors.

Tom Hawtin
 
Dave Rathnow

Thanks for the tip. That seems to have done the trick, but I have
another problem that is likely caused by something I've missed.

When I accept an incoming connection, I create a SocketChannel,
configure it non-blocking and then register it for OP_READ. I then
enter a loop where I do a select(1000). I assumed that it would pause
for a second or until one or more channels had data waiting. However,
that doesn't seem to be the case. My select returns immediately with a
set of selection keys that says there is data waiting on my channel(s),
but when I issue a read, I get zero length returned. This is causing my
selector thread to chew up all the CPU time and starving other threads
that can't get any CPU time.

I've tried using just "selector.select()" and then calling
selector.wakeup() but my thread sitting in "selector.select()" isn't
waking up.
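
For reference, this is the behavior one would normally expect from
wakeup(): a thread blocked in select() returns once another thread
calls wakeup() on the same Selector instance. The sketch below is only
an illustration under that assumption; the class and method names are
invented and it does not reproduce the setup described above.

```java
import java.nio.channels.Selector;

public class WakeupExample {
    // Blocks in select() with no timeout and returns how many milliseconds
    // passed before a second thread woke the selector (or -1 if select()
    // unexpectedly reported ready keys, since none are registered).
    static long blockUntilWoken(final long delayMillis) throws Exception {
        final Selector selector = Selector.open();
        Thread waker = new Thread(new Runnable() {
            public void run() {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException e) {
                    // Fall through and wake the selector anyway.
                }
                // Must be called on the same Selector the other thread is blocked on.
                selector.wakeup();
            }
        });
        long start = System.currentTimeMillis();
        waker.start();
        int ready = selector.select(); // blocks until wakeup() is called
        long elapsed = System.currentTimeMillis() - start;
        waker.join();
        selector.close();
        return ready == 0 ? elapsed : -1;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("select() returned after about " + blockUntilWoken(200) + " ms");
    }
}
```

If wakeup() is called while no thread is blocked, the next select()
returns immediately, so one check worth making is that both threads
really share a single Selector object.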

Have I missed something?

Dave.
 
Gordon Beaton

Dave said:
When I accept an incoming connection, I create a SocketChannel,
configure it non-blocking and then register it for OP_READ. I then
enter a loop where I do a select(1000). I assumed that it would pause
for a second or until one or more channels had data waiting. However,
that doesn't seem to be the case. My select returns immediately with a
set of selection keys that says there is data waiting on my channel(s),
but when I issue a read, I get zero length returned. This is causing my
selector thread to chew up all the CPU time and starving other threads
that can't get any CPU time.

Offhand it sounds like you've failed to remove each key from the
selected-keys list as you iterate over it, or that you're failing to
test for isReadable() before actually attempting to read. It's hard to
say without seeing the code.
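
As a rough illustration of both points, one pass of a select loop
might look like the sketch below. All class and method names are
invented for the example, demo() uses a loopback connection purely so
the sketch can run on its own, and the generic Iterator would have to
be a raw type with a cast on a 1.4 JVM.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

public class SelectLoopExample {
    // One pass: wait up to timeoutMillis, then handle each ready key.
    // Returns the number of bytes read during this pass.
    static int selectOnce(Selector selector, ByteBuffer buffer, long timeoutMillis)
            throws IOException {
        if (selector.select(timeoutMillis) == 0) {
            return 0; // timed out or woken up with nothing ready
        }
        int total = 0;
        Iterator<SelectionKey> it = selector.selectedKeys().iterator();
        while (it.hasNext()) {
            SelectionKey key = it.next();
            it.remove(); // the selector never removes keys from this set itself
            if (!key.isValid() || !key.isReadable()) {
                continue; // only read when the key actually reports readable
            }
            SocketChannel channel = (SocketChannel) key.channel();
            buffer.clear();
            int n = channel.read(buffer);
            if (n < 0) {
                key.cancel(); // peer closed the connection
                channel.close();
            } else {
                total += n;
            }
        }
        return total;
    }

    // Loopback demonstration: sends "hello" and counts what the loop reads.
    static int demo() throws IOException {
        InetAddress loopback = InetAddress.getByName("127.0.0.1");
        ServerSocketChannel server = ServerSocketChannel.open();
        server.socket().bind(new InetSocketAddress(loopback, 0));
        SocketChannel client = SocketChannel.open(
                new InetSocketAddress(loopback, server.socket().getLocalPort()));
        SocketChannel accepted = server.accept();
        accepted.configureBlocking(false);
        Selector selector = Selector.open();
        accepted.register(selector, SelectionKey.OP_READ);
        client.write(ByteBuffer.wrap("hello".getBytes("US-ASCII")));
        ByteBuffer buffer = ByteBuffer.allocate(64);
        int total = 0;
        for (int i = 0; i < 10 && total < 5; i++) {
            total += selectOnce(selector, buffer, 1000);
        }
        selector.close();
        accepted.close();
        client.close();
        server.close();
        return total;
    }
}
```

Calling remove() on the iterator inside the loop and calling clear()
on the selected-key set after the loop should both leave the set empty
before the next select().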

/gordon
 
Dave Rathnow

Okay,

You're partially right. I remove all the keys after iterating over the list
of selected keys, but I am checking isReadable(). Here's what I'm doing:

// Needs java.io.IOException, java.nio.channels.SelectionKey, java.util.Iterator.

// Wait up to 500 ms, then dispatch every key the selector reports as ready.
private void waitForSelector() throws IOException {
    if (selector.select(500) > 0) {
        for (Iterator iter = selector.selectedKeys().iterator(); iter.hasNext();)
            performOperationForSelectionKey((SelectionKey) iter.next());
        // Empty the selected-key set once the pass is done.
        selector.selectedKeys().clear();
    }
}

// Dispatch on the key's ready operations; the attachment is the
// per-channel handler.
private void performOperationForSelectionKey(SelectionKey selectionKey) {
    SocketChannelManager manager = (SocketChannelManager) selectionKey.attachment();
    if (selectionKey.isConnectable())
        manager.finishConnecting();
    if (selectionKey.isReadable())
        manager.readIncomingData();
    if (selectionKey.isWritable())
        manager.writeOutgoingData();
}
 
Gordon Beaton

Dave said:
You're partially right. I remove all the keys after iterating over
the list of selected keys, but I am checking isReadable(). Here's
what I'm doing:

[...]

Does calling clear() have the same effect as calling remove() on each
of the elements in the selected-keys set? (I would hope so at any
rate).

Are you by any chance registering OP_WRITE as well? (if so, don't)

Or perhaps you have run into this:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4629307

(There are several other selector-related bugs, I just browsed through
them until I found one that seems to fit.)
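
On the OP_WRITE question: an idle connected socket with free send
buffer space is normally reported writable, so a key that always has
OP_WRITE in its interest set tends to make select() return at once. A
common approach is to switch OP_WRITE on only while there is pending
output. The sketch below assumes that approach; the helper
setWriteInterest and the loopback demo() are invented for illustration.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class WriteInterestExample {
    // Hypothetical helper: adds or removes OP_WRITE without touching the
    // key's other interest bits.
    static void setWriteInterest(SelectionKey key, boolean wanted) {
        int ops = key.interestOps();
        key.interestOps(wanted ? (ops | SelectionKey.OP_WRITE)
                               : (ops & ~SelectionKey.OP_WRITE));
    }

    // Returns the selectNow() counts with OP_READ only, with OP_WRITE
    // added, and after OP_WRITE is removed again.
    static int[] demo() throws IOException {
        InetAddress loopback = InetAddress.getByName("127.0.0.1");
        ServerSocketChannel server = ServerSocketChannel.open();
        server.socket().bind(new InetSocketAddress(loopback, 0));
        SocketChannel client = SocketChannel.open(
                new InetSocketAddress(loopback, server.socket().getLocalPort()));
        SocketChannel accepted = server.accept();
        client.configureBlocking(false);
        Selector selector = Selector.open();
        SelectionKey key = client.register(selector, SelectionKey.OP_READ);

        int readOnly = selector.selectNow();   // no data has been sent, so nothing is ready
        setWriteInterest(key, true);
        int withWrite = selector.selectNow();  // idle socket is expected to be writable
        selector.selectedKeys().clear();
        setWriteInterest(key, false);
        int afterRemoval = selector.selectNow();

        selector.close();
        client.close();
        accepted.close();
        server.close();
        return new int[] { readOnly, withWrite, afterRemoval };
    }
}
```

In a real handler the toggle would sit next to the output queue:
enable OP_WRITE when a write() could not flush everything, and disable
it again once the queue is empty.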

/gordon
 
Dave Rathnow

Yes, clear() does have the same effect, and this looks like the bug. The
behavior is exactly the same. The problem seems to happen only
after I blast the application with many connections. I can't reproduce
it with only a few connections.

Thanks for looking this up! Now all I need to do is find a workaround.
:(

Dave
 
Dave Rathnow

Hmmm...I just realized this was supposed to have been fixed in 1.4.1,
but that's the version I'm using :(
 
Thomas Hawtin

Dave said:
Hmmm...I just realized this was supposed to have been fixed in 1.4.1,
but that's the version I'm using :(

Yeah, NIO had a 'troubled' early history. When did 1.4.2 come out?? The
only sensible reason I can see to stick with 1.4 is to avoid newly
introduced bugs. But to use an early version of an old version is nuts.

Tom Hawtin
 
