Why does socket:read() take so long to detect a broken connection?

Oliver Hittmeyer

Hello [NG],

Assume a client/server app in which client and server communicate by
sending message objects to each other. When we unplug the network
adapter on the server side, the server app detects this
automatically and handles it fine.

Not so the client app: the application (implemented in blocking mode,
using 2 different threads) is still listening on the socket.

Now here is the odd part: when we trigger the client app to send a
message object to the server, the socket

1. does not throw any IOException when writing the data

2. approx. 50 sec after the write(), the blocking read() throws
a SocketException: Connection reset by peer


Does anybody know about this? Why is there no IOException
when writing to a socket that has no connected peer?

And why does the read() take 50 sec to determine that the socket
is dead? Are there any tips/tricks/workarounds to make this
"broken network" detection faster?

Thanks in advance,
Oliver
 
Eric Sosman

The question really isn't about Java, but about TCP/IP
networking. Here's an over-simplified answer; for more
details consult a networking newsgroup or reference book.

TCP simulates a reliable data stream atop the lower-level
IP facility, which amounts to an unreliable transport for
individual "datagrams." It's a little bit like simulating a
telephone conversation by exchanging postcards, some of which
get lost. Naturally, there are conventions for establishing
and tearing down this virtual phone call: There's a prescribed
sequence of postcard exchanges to initiate the call ("I want to
talk to you." "Okay, I'm willing." "Great: let's begin."),
and another prescribed sequence by which both ends coordinate
a sign-off ("I'm finished." "Okay, so am I.") There's also
a system of acknowledgments to help recover when postcards
get lost or defaced in transit.

But if the other participant in your simulated phone call
gets struck by lightning, how do you find out he's no longer
there? The only symptom is that some of your postcards go
unacknowledged -- but that can happen in the ordinary course
of events, since delivery is not reliable: Maybe your postcard
never reached him so he hasn't acknowledged it, or maybe his
acknowledging postcard never made it back to you. All you know
is that some days have gone by and you haven't received mail.

So you send out duplicates of the unacknowledged postcards,
and wait a while longer. Only after several such attempts do
you finally begin to suspect that your correspondent is no
longer "on the phone," even though you and he haven't agreed
to terminate the call. Note that there has been no "hang up"
signal -- he can only signal you by sending a postcard, and if
he can no longer send you postcards he can't tell you he wants
to hang up. Your postcards just vanish without eliciting any
response, and eventually you figure out he's stopped writing.
The 50-second delay is how long it takes before TCP finally
gives up and declares your correspondent unresponsive (there
are standards governing such timeouts, but a lot of systems are
configured with non-standard values; "50 seconds" is by no
means universal).
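
If waiting for TCP's own give-up timer is too slow, one option is to put a deadline on the blocking read yourself. The following is only a minimal sketch, assuming plain java.net sockets; the class name, the method name, and the 200 ms value are made up for illustration. It connects to a deliberately silent loopback server and shows that Socket.setSoTimeout makes read() give up with a SocketTimeoutException after the chosen interval.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical demo class: not part of the original poster's application.
final class ReadDeadlineSketch {

    /**
     * Connects to a loopback server that never sends anything and waits at
     * most timeoutMillis for one byte. Returns true if the read timed out.
     */
    public static boolean readTimesOut(int timeoutMillis) {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket accepted = server.accept()) {
            // From now on a blocking read() waits no longer than this.
            client.setSoTimeout(timeoutMillis);
            try {
                client.getInputStream().read();
                return false; // data or end-of-stream arrived in time
            } catch (SocketTimeoutException e) {
                return true;  // nothing arrived before the deadline
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("read timed out: " + readTimesOut(200));
    }
}
```

Note that a timeout only says "nothing arrived in time". The socket is still usable afterwards, and a quiet but healthy peer looks exactly the same, so the application has to decide what a missed deadline means.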

One further thing: When you've initiated a virtual phone
call but there's a long interval when neither side has anything
to say, how many postcards do you suppose are exchanged in the
period of silence? That's right: none. So if lightning strikes
your correspondent during such a period, you have no indication
at all of the event -- if you're sending no postcards, you won't
notice the lack of acknowledgments. Until you try to tell him
something you won't learn that he's no longer listening.
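
For the idle case, TCP has an optional keep-alive feature: the operating system sends an occasional probe on an otherwise silent connection, so a dead peer is eventually noticed even if the application says nothing. Below is a minimal sketch of switching it on from Java, assuming plain java.net sockets; the class and method names are invented for the example. The probe interval is decided by the operating system rather than by this code, and the default idle time is often on the order of two hours, so this by itself does not make detection fast.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical demo class: names are illustrative only.
final class KeepAliveSketch {

    /** Asks the OS to probe the idle connection; returns the resulting setting. */
    public static boolean enableKeepAlive(Socket socket) {
        try {
            socket.setKeepAlive(true);
            return socket.getKeepAlive();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Opens a loopback connection and enables keep-alive on the client side. */
    public static boolean demo() {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort())) {
            return enableKeepAlive(client);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("keep-alive enabled: " + demo());
    }
}
```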

I hope this analogy helps you interpret what you've observed.
All I ask is that you not try to design an entire system around
an imperfect analogy! If you need more details, there are lots
of references you can look up for the real nuts and bolts.
 
Steve Horsley

Eric Sosman wrote:
<lots>

That was one of the best explanations I have ever read. Nice one!

Doesn't explain "Connection reset by peer" though. I wonder if he
plugged the cable back in before triggering the client to send.
In this case, the server may have already figured out that the
connection was broken, and may be responding with a RST that means
"I don't acknowledge this connection."

Steve
 
Tom Dyess

Very nice explanation.

--
Tom Dyess
OraclePower.com
 
Oliver Hittmeyer

Eric said:
The question really isn't about Java, but about TCP/IP
networking. Here's an over-simplified answer; for more
details consult a networking newsgroup or reference book.

:

I have to agree with the other posters - thanks for that nice introduction
to "how tcp works" :)
 
Eric Sosman

Some of those others have already pointed out a few flaws
in my analogy, so be wary of taking it too seriously. (It's
a good thing I called it "imperfect" to begin with!)

The main point is that having a "connection" doesn't mean
you have a "circuit" whose state can be monitored. AFAIK the
only way to tell whether a connection is alive is to attempt
to move some traffic (either "payload" or "control") across it;
in the absence of traffic there's no indication of the status.
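
One common way to "move some traffic" on purpose is an application-level heartbeat: send a small probe and expect an answer within a deadline. What follows is only a sketch under stated assumptions. The one-byte PING value, the class and method names, and the echo behaviour of the peer are all hypothetical, and a real protocol would have to define its own probe message so that it cannot be confused with payload.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical demo class: the PING byte and the echo protocol are assumptions.
final class HeartbeatSketch {

    static final int PING = 0x01; // made-up one-byte probe message

    /** Sends a probe and waits up to timeoutMillis for the peer to echo it. */
    public static boolean peerAnswers(Socket socket, int timeoutMillis) {
        try {
            socket.setSoTimeout(timeoutMillis);
            OutputStream out = socket.getOutputStream();
            out.write(PING);
            out.flush();
            return socket.getInputStream().read() == PING;
        } catch (IOException e) {
            // Timeout, reset, or any other I/O failure: treat the peer as gone.
            return false;
        }
    }

    /** Runs the probe against a loopback peer that either echoes one byte or stays silent. */
    public static boolean probeLoopback(boolean serverEchoes, int timeoutMillis) {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket accepted = server.accept()) {
            if (serverEchoes) {
                Thread echo = new Thread(() -> {
                    try {
                        int b = accepted.getInputStream().read();
                        if (b >= 0) {
                            accepted.getOutputStream().write(b);
                            accepted.getOutputStream().flush();
                        }
                    } catch (IOException ignored) {
                        // The demo connection was closed; nothing to echo.
                    }
                });
                echo.setDaemon(true);
                echo.start();
            }
            return peerAnswers(client, timeoutMillis);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("echoing peer answers: " + probeLoopback(true, 2000));
        System.out.println("silent peer answers:  " + probeLoopback(false, 200));
    }
}
```

The deadline here is chosen by the application, so detection can be as quick as the protocol tolerates, at the cost of extra traffic and the risk of declaring a slow peer dead.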
 
