TCP server does not detect the client network failure

Mariya

Hi,

There is a behavior of TCP sockets in Java that I don't understand.
Let's say we have a simple client/server application running on TWO different
machines.
The client sends bytes to the server.
The server receives the bytes and waits for about X seconds.
During this waiting time, we disconnect the server from the network (just by
unplugging the server's network cable).
After the waiting time the server becomes aware of the network failure (a
SocketException is thrown: "connection reset", because it is trying to send
the response).
But the client side is still stuck on a "rcv =
commandInput.readLine();" statement (see after for the complete code); it
never becomes aware of the network failure! Even after one hour the
client is still waiting to read something on a closed socket. Is this the
normal behavior?

In reality we are dealing with an application that uses FTP in server-to-server
mode (we control only the command sockets; the data transfer
is made between the servers). We have to transfer huge files, so we cannot set
a timeout. If the scenario above occurs, some of our transfers get stuck
and we have no way to detect the failure. The client waits
forever for the response.

Has anybody already dealt with this?
 
Roedy Green

Mariya said:
But on the client side it is still stuck on a "rcv =
commandInput.readLine();" statement (see after for the complete code), it
will never be aware of the network failure !!! Even after one hour the
client is still waiting to read something on a closed socket. Is this the
normal behavior ?

That is because the connection is only virtual: there is no circuit whose
loss either end could sense. The only hint the client has that something is
amiss is that no packets have arrived, which it considers normal.

I suggest you send a heartbeat packet every 60 seconds or so if
nothing else has been sent. Then each end, on not hearing anything for,
say, 120 seconds, knows there is a problem. I do this in two of my apps
that use a C server with a complex message protocol. When I detect a
missing heartbeat (a read timeout), I close the socket and reopen it.

This is not a UDP packet, but a message constructed in the TCP/IP stream,
fitting in with the other messages you send.
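
The heartbeat scheme described above might look roughly like the sketch below. It assumes a line-based protocol; the "PING" line and the names HeartbeatSketch, Sender, readMessage and readerWithTimeout are made up for illustration and are not taken from the apps mentioned in this post.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

class HeartbeatSketch {
    // Hypothetical heartbeat line; any message both ends agree to ignore will do.
    static final String HEARTBEAT = "PING";

    /** Writes protocol lines and remembers when the last one went out. */
    static final class Sender {
        private final OutputStream out;
        private long lastWrite = System.currentTimeMillis();

        Sender(OutputStream out) {
            this.out = out;
        }

        synchronized void send(String line) throws IOException {
            out.write((line + "\n").getBytes(StandardCharsets.UTF_8));
            out.flush();
            lastWrite = System.currentTimeMillis();
        }

        /** Call periodically (e.g. from a timer); sends a heartbeat only if idle. */
        synchronized boolean heartbeatIfIdle(long idleMillis) throws IOException {
            if (System.currentTimeMillis() - lastWrite < idleMillis) {
                return false; // something else was sent recently
            }
            send(HEARTBEAT);
            return true;
        }
    }

    /** Wraps the socket so that blocking reads give up after timeoutMillis. */
    static BufferedReader readerWithTimeout(Socket socket, int timeoutMillis) throws IOException {
        socket.setSoTimeout(timeoutMillis);
        return new BufferedReader(
                new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8));
    }

    /**
     * Returns the next real message, skipping heartbeats; null on orderly close.
     * A SocketTimeoutException means the peer was silent for longer than the
     * socket timeout, i.e. a heartbeat went missing.
     */
    static String readMessage(BufferedReader in) throws IOException {
        String line;
        while ((line = in.readLine()) != null) {
            if (!HEARTBEAT.equals(line)) {
                return line;
            }
        }
        return null;
    }
}
```

With a 60 second heartbeat, the reader would be created with a timeout of about 120000 ms. On a SocketTimeoutException the caller closes the socket and reconnects rather than reading on, because a timeout in the middle of a line can leave the reader with a partial line.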
 
Frank

Well, Socket.setSoTimeout(int) can be used on the client end to give up once
it has waited an unreasonable amount of time. But given your comment about
server <-> server transfers, this is perhaps not what you had in mind.

You can also use Socket.setKeepAlive(true), which will (or should) make a
blocked read fail with an exception if the server doesn't answer a keepalive
probe, but this doesn't give you much control over the keepalive interval.

So that raises the question: can you change what sort of information is
going over the client <-> server stream? If so, you can make your own
"keep alive" packets on top of the socket, like Telnet AYT bytes. Or set
up the server to have a "watch" process for long jobs that reports to the
client, e.g. "20 MB transferred, 35 MB remaining", every 15 seconds or
so, and let the client time out if it doesn't receive a report within
45 seconds.
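
As a rough illustration of the two socket options mentioned above, here is a minimal sketch. The class and method names (TimeoutSketch, awaitByte) are invented for the example; only setSoTimeout and setKeepAlive are real java.net.Socket calls.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.Socket;
import java.net.SocketTimeoutException;

class TimeoutSketch {
    /**
     * Waits for one byte from the peer.
     * Returns "data" if a byte arrived, "closed" if the peer closed the
     * connection in an orderly way, and "timeout" if nothing arrived in time.
     */
    static String awaitByte(Socket socket, int timeoutMillis) throws IOException {
        socket.setKeepAlive(true);          // OS-level probes; the OS picks the interval
        socket.setSoTimeout(timeoutMillis); // blocking reads give up after this long
        InputStream in = socket.getInputStream();
        try {
            return in.read() == -1 ? "closed" : "data";
        } catch (SocketTimeoutException e) {
            // The socket is still usable here; the caller decides whether to
            // keep waiting or to treat the silence as a failure.
            return "timeout";
        }
    }
}
```

A "timeout" result only says the peer was silent; combined with the 15 second progress reports suggested above, a 45000 ms timeout would turn that silence into a usable failure signal.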

HTH,

Frank
 
Tony Morris

Mariya said:
This is a behavior of the TCP socket in Java I dont' understand:.

More specifically, this is the behaviour of TCP that you don't understand,
i.e. it has nothing to do with Java.

Without going into details, I suggest you read up on TCP to learn why a
silently broken TCP connection is undetectable by the other end. There are
various solutions (workarounds, if you will) to this problem, the most
common being a ping/pong: a broken connection goes unnoticed until one end
tries to send something.
 
Rene

Mariya said:
The server recieved the bytes and wait for about X sec.
During this waiting time, we disconnect the server from the network (just
by unplugging the server network cable).
After the waiting time the server is aware of the network failure (a
SocketException is thrown : connection reset, because it is trying to
send the response)
But on the client side it is still stuck on a "rcv =
commandInput.readLine();" statement (see after for the complete code), it
will never be aware of the network failure !!! Even after one hour the
client is still waiting to read something on a closed socket. Is this the
normal behavior ?

<Warning: Black humor>
Yes, the Telepathy-API is not coming before Java 1.7
</Warning: Black humor>

Joking aside: the server could be anywhere, as could the client. Assume the
server is in the USA and the client in Europe. How should either of them
detect that a transatlantic sea cable is down and they can no longer
communicate?

There is no way to get that information unless you actually try to send
data, which will then fail, and you know that there is a problem. That holds
for TCP at least; UDP is more difficult in that even the failure notice may
never reach you, and some forms of data transfer over UDP never generate any
response even in the presence of failures.

Mariya said:
[..]
Does anybody have already deal with that

Yes, because this is a basic problem of networking, a lot of people have
dealt with it for a long time. Generally, if you want to ensure that the
connection is still there, send a NOP (NOOP in FTP). A no-op is a
"no-operation": it does nothing except confirm that the connection is
still alive. You can send one periodically, or whenever no other real
protocol traffic has taken place for some time. Don't send more than, say,
one every 30 seconds or so, otherwise the FTP server may see this as
unfriendly behaviour and close the connection.
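
A NOOP probe on an FTP control connection could be sketched as follows. This assumes the server answers NOOP with a single-line 2xx reply and that no other reply is pending on the control connection; some servers may not process commands while a transfer is running. The names FtpNoopSketch and noop are invented for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

class FtpNoopSketch {
    /**
     * Sends NOOP on the control connection and waits for the reply.
     * 'replies' must be the reader the client already uses for this socket,
     * so that no buffered reply data is lost.
     * Returns true only if a 2xx reply line arrives within replyTimeoutMillis.
     */
    static boolean noop(Socket control, BufferedReader replies, int replyTimeoutMillis) {
        try {
            control.setSoTimeout(replyTimeoutMillis);
            OutputStream out = control.getOutputStream();
            out.write("NOOP\r\n".getBytes(StandardCharsets.US_ASCII));
            out.flush();
            String reply = replies.readLine(); // e.g. "200 NOOP ok"
            return reply != null && reply.startsWith("2");
        } catch (IOException e) {
            // Timeout, reset, or failed write: treat the connection as dead.
            return false;
        }
    }
}
```

A false result means the caller should close the control socket and start recovery; a true result only shows that the path to the server worked at that moment.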

CU

René
 
Steve Horsley

Rene said:
Yes, the Telepathy-API is not coming before Java 1.7

Oh dammit! I could really make use of that. I've got some really
nifty ideas that really do need the extra features.
 
Liz

Several helpers suggest sending some sort of message that would
generate a response, a heartbeat type of thing. Can you not in fact
claim that the last message actually sent performs this function,
since the user expected a response and didn't get one (so an extra
heartbeat message would not help either)? The complaint is that the
connection hangs. If I run a packet sniffer on my laptop, I see a
bunch of messages going back and forth, some of which are generated
by a timeout. Something should be there that will take care of this
problem. I think it is entirely reasonable that the high-level
user (the Java programmer) should expect to get some message back
or to get an exception. It is similar to using a normal telephone.
You pick up the handset and expect a dial tone. If there is no dial
tone, you don't expect the user to understand that the voltage on
his line to the phone company is running at -48 volts and to
put a voltmeter on the line to make sure. Here is another comparison.
The user is doing I/O. Compare it to the user reading a file from disk.
You don't recommend that the user periodically read a directory
to make sure the path to the disk is working for the file read.

(P.S. Somehow, I am now expecting a bunch of static over this.)
 
Rene

Liz said:
Several helpers suggest sending some sort of message that would
generate a response, a heartbeat type of thing. Can you not in fact

Yes, the only way to know whether the connection is still there is to try
to send something over it; if the connection is down, you will eventually
get an error from the network layer telling you so.

Liz said:
claim that the last message actually sent performs this function
since the user expected a response and didn't get one (so an extra
heartbeat message would not help either.) The complaint is that the

No, that has nothing to do with my (hopefully everyone got that) joking
comment. If the only path for communicating with your peer is torn down,
then the peer cannot notify you that it lost the connection, except by
telepathy. That was my point. Also, you can have commands that do
not cause any immediate response, like "DO COMPLEX CALCULATION". Some
protocols always acknowledge everything, some don't. If you know you should
receive an answer immediately but you don't, then you know that something is
wrong. If not, you know nothing at all.

Liz said:
connection hangs. If I run a packet sniffer on my laptop, I see a
bunch of messages going back and forth some of which are generated
by a timeout. Something should be there that will take care of this

Well, that depends on how the connection was set up. If you use the
TCP keepalive option or do the same thing yourself, then you will see
packets going back and forth. However, if you do not, which is the normal
case, then you can have an established TCP connection that stays connected
for 10 hours without sending a single byte during that time and then
continues later to send some data. A normal TCP connection that is
established and idle does not send anything at all.

In fact, you can establish a TCP session, let it sit idle, disconnect the
cable, wait an hour, plug it back in and resume as if nothing had happened.
For TCP, nothing exceptional actually *did* happen, as long as neither side
tried to send data during that time, which would have caused a
connection teardown. Note that some OSes tear down all connections once you
pull the cable, so in that case pull the cable at your modem or router
upstream instead.

There are servers that disconnect after inactivity, or send a
noop/keepalive after some time to see if you're still there, but that is
entirely protocol-level behaviour, which you can implement if you want.

From the network point of view, however, the problem is
undetectable without sending data, which is why I wrote the part about
telepathy in the first place.

Liz said:
problem. I think it is entirely reasonable that the high level
user (the java programmer) should expect to get some message back
or to have an exception. It is similar to using a normal telephone.

You get one if and only if you try to send data after the connectivity is
gone. However, if you just listen, how do you expect anything to come
through when the cable has been pulled somewhere and data can no longer
reach you? That is the basic problem, and the one solution left is
telepathy :)

Liz said:
You pick up the handset and expect dial tone. If there is no dial
tone, you don't expect the user to understand that the voltage on
his line to the phone company is running at -48 volts and for him
to put a voltmeter on the line to make sure. Here is another comparison.

Well, that analogy is wrong: the dial tone is part of the connection setup
procedure, and at that stage you also get plenty of exceptions in TCP if
there is a problem. Look at it this way: you call me or I call you on the
phone. We talk a little. Then after a while we have nothing more to say
for the moment, so we both just lay the phone down (we're not hanging up,
just putting it on the desk). Now I walk away, get some coffee, watch a TV
show, whatever. Three hours later I come back to the desk. How should I know
whether you are still on the other side without picking up the phone and
asking you something to which you must reply? How should I have been
notified (by what?), while I was drinking coffee or watching TV, that
the connection just died? With telepathy this problem is solvable;
without it, it isn't.

And there is a huge difference between a circuit-switched network like
the telephone system and a packet-based stream protocol like TCP,
anyway.

Liz said:
The user is doing IO. Compare to the user reading a file from disk.
You don't recommend for the user to periodically read a directory
to make sure the path to the disk is working for the file read.

On Windows, a file that is open normally cannot be deleted; on Unix,
however, this is entirely possible. The file handle then remains open and
valid, but the directory entry for the file disappears, so any access from
another process will fail, because the file is simply no longer around,
while the process that still has it open can *still* read from it and
doesn't know that the file has been deleted. The file is only really removed
after the last process closes its file handle, but it cannot be accessed or
seen by other processes from the moment it was deleted (actually unlinked
from the directory structure).

But this is another thing altogether. TCP is not a file. Compare it
to the postal service, where you send (real) parcels to your
destination. If your recipient dies or moves away and cannot or does not
inform you beforehand, you will only learn of that after you try to
send a parcel and get it back with a notice that the recipient is deceased
or the address has become invalid. But if you just wait to receive a
parcel, you can wait forever; none will ever come from there again. This
is exactly what happened to the OP, and it cannot be solved by just
waiting for data (packets) to appear. You must send
packets, either at the application level or at the network level using
TCP keepalive, or have some sort of timeout mechanism that closes the
connection when nothing more arrives after X seconds.

Liz said:
(ps. Some how, I am now expecting a bunch of static over this.)

Well, the question certainly is not "dumb". It may sound strange if you
don't know the answer and haven't thought it through. I've been working with
networking for many years now, and it is a basic problem in this sort
of communication that is simply not solvable without actually testing the
connection by sending data. Whether the network infrastructure does that for
you or you need to do it yourself is more or less irrelevant. I knew that
and had a little fun with my telepathy sentence (but I also gave the
solution to the problem).

Hopefully this now quite long post makes it clear why this is so. But if
not, just ask.

CU

René
 
Steve Horsley

It's layers on layers. Forget the application behaviour for a moment
and think what is required of TCP (none of the layers below TCP do
anything to check that sent data actually arrives).

TCP has the job of getting data from A to B reliably. To achieve that,
all data sent has a sequence number allocated, and TCP requires that
all data sent must be acknowledged by the receiver. To cope with lost
messages, it implements timers and will eventually retransmit any
data that goes unacknowledged. After numerous retransmissions without
an ack, it assumes the connection is broken.

But take the position of a TCP connection end that has nothing to send:
it just sits and waits for a message to arrive. If nothing arrives,
so what? Wait some more. There's no timeout here that says "nothing
has arrived, the connection must be broken", there's just infinite
patience.

So the ONLY way you can know a connection is broken (at the TCP layer)
is when sent data goes unacknowledged.

If your application wants to be sure that the other end is still there,
it needs to send a message once in a while. You can either implement
a timer in the application waiting for a response, or you can send a
dummy message that gets ignored by the far end application and let the
TCP layer retransmissions figure out that the connection's broken. An
application that is prepared to sit and wait for a message that never
comes cannot blame TCP for not delivering a message that was never
sent, any more than you can blame the postman for not telling you that
Great Aunt Maude has died and therefore won't be writing to you any
more.
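
The first option above, a timer in the application waiting for a response, could be sketched like this. It relies on the documented behaviour that closing a java.net.Socket from another thread makes a blocked read on it fail with a SocketException. The names WatchdogSketch and readLineWithDeadline are invented for the example, and creating a timer thread per call is only for brevity.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.net.Socket;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

class WatchdogSketch {
    /**
     * Reads one line, or returns null if no complete line arrives before the
     * deadline (or the peer closes the connection). On expiry the watchdog
     * closes the socket, which unblocks the read.
     */
    static String readLineWithDeadline(Socket socket, BufferedReader in, long deadlineMillis)
            throws IOException {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        AtomicBoolean expired = new AtomicBoolean(false);
        ScheduledFuture<?> watchdog = timer.schedule(() -> {
            expired.set(true);
            try {
                socket.close(); // a read blocked on this socket now throws
            } catch (IOException ignored) {
                // nothing useful to do if close itself fails
            }
        }, deadlineMillis, TimeUnit.MILLISECONDS);
        try {
            return in.readLine();
        } catch (IOException e) {
            if (expired.get()) {
                return null; // our own deadline fired, not a network error
            }
            throw e;
        } finally {
            watchdog.cancel(false);
            timer.shutdownNow();
        }
    }
}
```

Unlike a per-read socket timeout, this bounds the total wait for the whole line, so a peer that trickles a byte now and then cannot keep the caller waiting indefinitely.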

P.S. The keepalive socket option tells TCP to send a byte periodically
in a way invisible to the application.

Steve
 
Liz

Hey, very informative, thanks.

 
Liz

If the link is disconnected then level 2 fails. But I guess there is no
obligation to notify higher levels.
 
Steve Horsley

Liz said:
If the link is disconnected then level 2 fails.

Not necessarily. F'rinstance, Ethernet doesn't normally operate a
layer 2 link protocol - it doesn't maintain a link session between
hosts on the same LAN. Pulling the plug will only cause a layer-1
failure, and even that depends on the type of Ethernet - there is no
link test signal on coaxial Ethernet.

Liz said:
But I guess there is no
obligation to notify higher levels.

Not until a higher layer tries to use the lower layer service.

Steve
 
Keith Wansbrough

Steve Horsley said:
P.S. The keepalive socket option tells TCP to send a byte periodically
in a way invisible to the application.

Note that on many (most?) OSes, the default keepalive time is 2 hours
(a probe is sent when the connection has been idle for 2 hours). So this is
probably not much use to you. You can change this, but you shouldn't:
first, because on many systems it is a per-host parameter rather than a
per-socket one, so it affects all applications on the machine; and second,
because the problem is much better solved at the application layer.
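
For completeness: newer Java runtimes (JDK 11 and later) expose per-socket keepalive timing on platforms that support it, through jdk.net.ExtendedSocketOptions. The sketch below is only an assumption-laden illustration; whether the options are available depends on the OS and JDK, so it checks supportedOptions() first. The names KeepAliveTuningSketch and enableFastKeepAlive are invented for the example.

```java
import java.io.IOException;
import java.net.Socket;
import java.net.SocketOption;
import java.net.StandardSocketOptions;
import java.util.Set;
import jdk.net.ExtendedSocketOptions;

class KeepAliveTuningSketch {
    /**
     * Turns on SO_KEEPALIVE and, where the platform allows it, sets the
     * per-socket probe timing. Returns true if the timing was applied,
     * false if only the OS-default keepalive could be enabled.
     */
    static boolean enableFastKeepAlive(Socket socket, int idleSeconds,
                                       int intervalSeconds, int probes) throws IOException {
        socket.setOption(StandardSocketOptions.SO_KEEPALIVE, true);
        Set<SocketOption<?>> supported = socket.supportedOptions();
        if (!supported.contains(ExtendedSocketOptions.TCP_KEEPIDLE)
                || !supported.contains(ExtendedSocketOptions.TCP_KEEPINTERVAL)
                || !supported.contains(ExtendedSocketOptions.TCP_KEEPCOUNT)) {
            return false; // fall back to the host-wide defaults
        }
        socket.setOption(ExtendedSocketOptions.TCP_KEEPIDLE, idleSeconds);         // idle time before first probe
        socket.setOption(ExtendedSocketOptions.TCP_KEEPINTERVAL, intervalSeconds); // gap between probes
        socket.setOption(ExtendedSocketOptions.TCP_KEEPCOUNT, probes);             // unanswered probes before giving up
        return true;
    }
}
```

Even where this works, the second point above still stands: an application-level heartbeat or NOOP tells you the far-end application is responding, which a TCP-level probe cannot.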

--KW
 
Steve Horsley

Good grief! The boxes I used to work on used a keepalive of 60 seconds.
They were a custom IP stack though.

Two hours! Still, that's enough to kill off all the dead user
sessions eventually, like when they all switch their PCs off at night.
That should stop the box running out of resources, which is what
it is really there for.

Steve
 
