$100 reward for socket disconnection help

DennyOR

I'm developing an application that uses java sockets for client-server
communication. Every now and then, open socket connections between the
client and server are lost. This is a client side problem, which may be
related to using a Linksys router. When my client notices that it's not
receiving communications from the server, it begins attempting to create a
new Socket() connection. Attempts to connect with a new Socket() fail
(IOException) for up to 30 minutes until finally a new connection is made,
at which point client-server communication continues normally until the
connection is lost again.

However, if while my client is trying to reconnect to the server, I start up
AOL instant messenger, the next attempt by my client to create a new
Socket() connection to the server always succeeds. (By the way, if instant
messenger is running when my application is running, they both disconnect
from their servers at the same time.)

I will give the $100 reward to anyone who can give an explanation or a
suggestion that would allow me to programmatically reconnect immediately to
the server after a disconnection occurs (without having to start up AOL
instant messenger to grease the way) or otherwise solve this problem.

Denny
 
Robert

What is the IOException saying and are you closing the open sockets on
the server side when they die? This is an interesting problem.
 
christopher

We always called this behavior "flapping", because pinging through a
router doing this looks like its "door" is opened, closed, and so on. I
would suspect hardware.
 
Matt Atterbury

DennyOR said:
I'm developing an application that uses java sockets for client-server
communication. Every now and then, open socket connections between the
client and server are lost. [...]

What role is the router playing? If it's a connection to an ISP (i.e. an
ADSL/cable router), it's possible that it has been configured with an
inactivity timer that makes it drop its internet connection, effectively
dropping all active sessions. For some reason AOL triggers it to
auto-connect whereas your app doesn't.

Suggestions:
1. Add a heartbeat to your client/server protocol and send one every few
   minutes from your client.
2. Investigate whether you can set the socket option SO_KEEPALIVE on your
   client. This might not work since IIRC it doesn't handle all the cases
   an application-level heartbeat can, but it should be easier to try.
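
A minimal sketch of the application-level heartbeat from suggestion 1, offered as an illustration rather than a drop-in fix. The class name `HeartbeatSender`, the one-byte marker, and the interval are all assumptions; the real protocol would need a heartbeat message that the server recognizes and ignores.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.Socket;

/** Periodically writes a one-byte heartbeat so the link never sits idle. */
final class HeartbeatSender implements Runnable {
    // Hypothetical marker byte; the server side must be taught to skip it.
    static final int HEARTBEAT_BYTE = 0x00;

    private final Socket socket;
    private final long intervalMillis;
    private volatile boolean failed = false;

    HeartbeatSender(Socket socket, long intervalMillis) {
        this.socket = socket;
        this.intervalMillis = intervalMillis;
    }

    /** True once a heartbeat write has failed, i.e. the connection looks lost. */
    boolean hasFailed() {
        return failed;
    }

    public void run() {
        try {
            // If other threads also write to this stream, writes must be synchronized.
            OutputStream out = socket.getOutputStream();
            while (!socket.isClosed()) {
                out.write(HEARTBEAT_BYTE);
                out.flush();
                Thread.sleep(intervalMillis);
            }
        } catch (IOException e) {
            failed = true; // a failed write is the signal to start reconnecting
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // stop quietly when interrupted
        }
    }
}
```

It would be started on a background thread, for example `new Thread(new HeartbeatSender(socket, 60000L)).start()`. An interval well below the suspected inactivity timeout is the point of the exercise.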

m.
 
Kevin McMurtrie

DennyOR said:
I'm developing an application that uses java sockets for client-server
communication. Every now and then, open socket connections between the
client and server are lost. [...]

Your router, ISP, or computer is terminating long-running connections.
You could try setting the socket option TCP Keep-alive to generate some
traffic but it's really a matter of your hardware sucking.
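
For reference, TCP keep-alive is enabled in Java with `Socket.setKeepAlive(true)`. The sketch below is illustrative only; the class and method names are hypothetical. Note that the probe timing is controlled by the operating system, not by Java, and on many systems the default idle time before the first probe is around two hours, so with disconnects roughly once an hour this option alone may not generate traffic often enough.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

final class KeepAliveClient {
    /** Opens a TCP connection with SO_KEEPALIVE switched on. */
    static Socket open(String host, int port, int connectTimeoutMillis) throws IOException {
        Socket socket = new Socket();
        boolean connected = false;
        try {
            socket.setKeepAlive(true); // ask the OS to probe an idle connection
            socket.connect(new InetSocketAddress(host, port), connectTimeoutMillis);
            connected = true;
            return socket;
        } finally {
            if (!connected) {
                socket.close(); // do not leak the socket when connect fails
            }
        }
    }
}
```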
 
DennyOR

Here's more details on my situation:

I'm running Java 1.4.1 as an applet on the client side, under Windows 98,
with a DSL connection through a Linksys Wireless-B Router. In test mode,
after my applet makes a socket connection to my server application, it
listens for intermittent notifications from the server to know that the
client-server connection still exists, and if the notifications stop, it
loops through attempts to make a new connection to the server, and when a
new connection is made it goes back to listening for the intermittent
notifications from the server.
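
One way to implement the "notifications have stopped" check described above is a read timeout on the socket. This is a sketch under assumptions, not the applet's actual code: the class and method names are hypothetical, and it reads a single byte where a real client would read a whole notification message.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.Socket;
import java.net.SocketTimeoutException;

final class SilenceDetector {
    /**
     * Waits for the next byte from the server.
     * Returns true if data arrived within maxSilenceMillis, false if the
     * read timed out, the stream ended, or an I/O error occurred.
     */
    static boolean serverStillTalking(Socket socket, int maxSilenceMillis) {
        try {
            socket.setSoTimeout(maxSilenceMillis); // bound how long read() may block
            InputStream in = socket.getInputStream();
            return in.read() != -1;                // -1 means the server closed the stream
        } catch (SocketTimeoutException e) {
            return false;                          // nothing arrived in time
        } catch (IOException e) {
            return false;                          // connection reset or similar
        }
    }
}
```

A `false` result is the cue to close the old socket and enter the reconnect loop.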

Fairly frequently, about once an hour, the client-server connection is lost.
This is probably due to some issue with the router, because if I remove the
router, client-server disconnects while I run my application are very rare.
(A search on "BEFW11S4", my router model number, and "disconnection" gets a
lot of hits.)

Running AOL Instant Messenger (IM) while running my test application has no
effect on the disconnection rate of my application, but if IM is running, IM
disconnects at the same time my application does ("AOL Instant Messenger
Error: Connection lost. Check your internet connection").

After disconnect my client starts trying to make a new connection with my
server. It always succeeds in reconnecting, sometimes within a minute but
often taking up to 30 minutes. Each time it fails to make a new connection
the IOException message is "java.net.ConnectException: Connection timed out:
connect".

Interestingly, if I re-start IM (click on "Sign On") while my application is
still trying to reconnect, IM is always able to immediately reconnect, and
right after IM reconnects my application always immediately reconnects.

The fact that my application and IM are tightly coupled in both disconnect
and reconnect behavior indicates that my problem has nothing to do with my
server. It's some sort of local client issue.

I don't know exactly what the problem is with my router, but since other
people may be using the same router or another with a similar problem, I'd
like my application to be able to recover gracefully from these disconnects.
A thirty minute wait for reconnection is not graceful, nor is asking my
application users to start up IM anytime my application disconnects.

IM is doing something that my applet isn't doing to reconnect to its server
that seems to break down some barrier in my local client, and once that
barrier is broken then my applet can also reconnect. Perhaps IM is accessing
some system resource that's not available in the java virtual machine, but
more likely I'm overlooking something.

The $100 reward will go to any suggestion that helps me understand and deal
with this problem, including the suggestions that have already been made
when I get a chance to do some more testing. Thanks.

Denny
 
Nigel Wade

DennyOR said:
[...]
Interestingly, if I re-start IM (click on "Sign On") while my application is
still trying to reconnect, IM is always able to immediately reconnect, and
right after IM reconnects my application always immediately reconnects.
[...]

At the time your application has lost connection and is attempting to
reconnect, what does the system say about the status of the network?

If the network is not up, and you start AOL IM does it bring up the network?

My guess is that AOL IM is capable of starting the network if it's down.
Your application does not do this while it's attempting to reconnect.
 
Michael Rauscher

DennyOR said:
I'm developing an application that uses java sockets for client-server
communication. Every now and then, open socket connections between the
client and server are lost. This is a client side problem, which may be
related to using a Linksys router. When my client notices that it's not
receiving communications from the server, it begins attempting to create a
new Socket() connection. Attempts to connect with a new Socket() fail
(IOException) for up to 30 minutes until finally a new connection is made,
at which point client-server communication continues normally until the
connection is lost again.

Add some extra delay (e.g. 5 seconds) between reconnects.
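
A reconnect loop with a pause between attempts might look like the following sketch. The class name, method name, and parameters are hypothetical; the explicit connect timeout is an added assumption so that each attempt fails in a bounded time instead of waiting for the operating system's default.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

final class Reconnector {
    /**
     * Tries to connect up to maxAttempts times, sleeping delayMillis between
     * failed attempts. Returns the connected socket, or null if every attempt failed.
     */
    static Socket connectWithRetry(String host, int port, int connectTimeoutMillis,
                                   long delayMillis, int maxAttempts)
            throws InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Socket socket = new Socket(); // fresh socket for every attempt
            try {
                socket.connect(new InetSocketAddress(host, port), connectTimeoutMillis);
                return socket;
            } catch (IOException e) {
                try {
                    socket.close();       // release the failed socket before retrying
                } catch (IOException ignored) {
                    // nothing useful to do if close itself fails
                }
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMillis); // e.g. 5000 for the suggested 5 seconds
                }
            }
        }
        return null;
    }
}
```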

Bye
Michael
 
Alex Buell

Fairly frequently, about once an hour, the client-server connection is
lost. This is probably due to some issue with the router, because if I
remove the router, client-server disconnects while I run my
application are very rare. (A search on "BEFW11S4", my router model
number, and "disconnection" gets a lot of hits.)

Upgrade the firmware on the LinkSys router.
 
Lee Ryman

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


Hi Denny,

Could I suggest that you run Ethereal ( http://www.ethereal.com/ ) to
determine whether the problem does indeed lie with a bug in the router's NAT
firmware or with a misunderstanding of what is occurring in the TCP/IP stack?
(Note that Ethereal needs the Pcap packet capture drivers installed;
they are available from the Ethereal site.)

Although the symptoms point the finger at the router, problems like
these can often be traced back to things like connections sitting in the
FIN_WAIT_x states on one of the machines, especially when you mention
things like it rectifying itself after a number of minutes. I'm not
saying it is, but when you have a tricky situation like this it's often
helpful to definitively isolate the problem. It could indeed be a bug in
the router's NAT firmware that is somehow being jolted back to life by a
UPnP request from the AOL app.

It would be worthwhile running Ethereal on both machines during a
connection attempt, and setting up an appropriate capture filter to only
catch those packets from/to the client and server, and possibly the AOL
connection attempts as well. That way you can get more information about
what is actually occurring on the wire to trigger the reconnection (or
lack thereof).

Points to consider (not sure how much you understand TCP/IP. Please
ignore me if any of this is blatantly obvious to you)...

*) Is your applet attempting to connect using the same outbound port
each time?

*) When detecting the connection timeout (via the exception), are you
still forcibly closing your socket in a "finally" block to ensure the
TCP stack is left in the right state?

*) I had something else in mind but forgot. Will follow-up when I
remember :)
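
The first two questions above can be checked with a small probe like this sketch (the class and method names are hypothetical). It makes one connection attempt, reports which local port was used, and always closes the socket in a finally block. A `Socket` that is not explicitly bound gets its local port chosen by the operating system, so logging the value across attempts shows whether the outbound port changes.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

final class ConnectProbe {
    /** Makes one connection attempt; returns the local port used, or -1 on failure. */
    static int probeLocalPort(String host, int port, int timeoutMillis) {
        Socket socket = new Socket();
        try {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return socket.getLocalPort(); // outbound port picked by the OS
        } catch (IOException e) {
            return -1;                    // timed out, refused, or unreachable
        } finally {
            try {
                socket.close();           // runs on success and on failure alike
            } catch (IOException ignored) {
                // closing a socket that never connected can be ignored
            }
        }
    }
}
```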



(BTW, I don't expect any reward for this suggestion; I guess it's what
you would call "common knowledge".)


Kind regards,

Lee

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (MingW32)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iEYEARECAAYFAkKEfqQACgkQhbcFpQga0LBYkwCgnXN0cOgdZXQd89wivI4TKsno
e5UAniovZEGY7VBiL/IHjBT3U4qui9Ga
=HCan
-----END PGP SIGNATURE-----
 
Chris Uppal

Lee said:
Could I suggest that you run Ethereal ( http://www.ethereal.com/ ) to
determine [...]

There's a fair chance that bringing up Ethereal will "fix" the problem in the
same way as running IM does.

There's a similar problem on this Win XP Pro laptop: when it comes out of
hibernation, it doesn't necessarily get its network properly reset and will
not -- say -- connect to my news-server (I've never worked out exactly what's
going wrong -- any pointers welcome). If I leave it for a while then it starts
working spontaneously, or if I bring up Ethereal then the problem goes away
instantly.

Windows is wonderful....

-- chris
 
