Reliability of Java, sockets and TCP transmissions


Qu0ll

I am writing client and server components of an application that communicate
using Socket, ServerSocket and TCP. I would like to know just how reliable
this connection/protocol combination is in terms of transmission errors. So
far I have only been able to run the application where the client and server
are on the same local machine or separated by an intranet/LAN so I have no
results of an internet deployment to report but I have not encountered any
IO errors to this point.

So just how reliable are TCP and Java sockets over the actual internet? I
mean do I need to implement some kind of "advanced" protocol whereby check
sums are transmitted along with packets and the packet retransmitted if the
check sum is invalid or is all this handled by either the Java sockets or
the TCP protocol already?

--
And loving it,

-Q
_________________________________________________
(e-mail address removed)
(Replace the "SixFour" with numbers to email me)
 

Gordon Beaton

So just how reliable are TCP and Java sockets over the actual
internet? I mean do I need to implement some kind of "advanced"
protocol whereby check sums are transmitted along with packets and
the packet retransmitted if the check sum is invalid or is all this
handled by either the Java sockets or the TCP protocol already?

"Java sockets" is a misnomer, since sockets are not specific to Java
or any other programming language. Java provides an interface to the
socket mechanism (sometimes called "Berkeley sockets") already
provided by your operating system.

TCP is a reliable protocol. It manages retransmission when data is out
of order, missing or corrupt, and it will break the connection when it
is unable to provide an error-free data stream, but not before making
a number of attempts to correct the situation.

Your application can safely assume that as long as your connection is
up, your data stream contains exactly the data sent by the remote.

There are however a number of things that your application can do
wrong. The commonest ones to watch out for are thinking that TCP will
respect your message boundaries (it won't), or expecting that you'll
always read as much data as you requested (you won't).

/gordon

--
 

Nigel Wade

Qu0ll said:
I am writing client and server components of an application that communicate
using Socket, ServerSocket and TCP. I would like to know just how reliable
this connection/protocol combination is in terms of transmission errors. So
far I have only been able to run the application where the client and server
are on the same local machine or separated by an intranet/LAN so I have no
results of an internet deployment to report but I have not encountered any
IO errors to this point.

So just how reliable are TCP and Java sockets over the actual internet? I
mean do I need to implement some kind of "advanced" protocol whereby check
sums are transmitted along with packets and the packet retransmitted if the
check sum is invalid or is all this handled by either the Java sockets or
the TCP protocol already?

That is handled by TCP/IP. Each IP datagram contains multiple checksums. There
are at least checksums for the IP header and for the TCP segment (header+data).
Other protocols layered on top of TCP/IP may add their own.

Sockets are very reliable. The Internet is built on them.
 

Qu0ll

Your application can safely assume that as long as your connection is
up, your data stream contains exactly the data sent by the remote.

This is good to know, except then you say:
There are however a number of things that your application can do
wrong. The commonest ones to watch out for are thinking that TCP will
respect your message boundaries (it won't), or expecting that you'll
always read as much data as you requested (you won't).

Can you please explain what you mean by not respecting message boundaries?
I am sending data in 512 byte "packets", are you saying that this won't be
respected by TCP?

And what do you mean by not reading as much data as requested? When would
this happen and why?

--
And loving it,

-Q
_________________________________________________
(e-mail address removed)
(Replace the "SixFour" with numbers to email me)
 

Qu0ll

That is handled by TCP/IP. Each IP datagram contains multiple checksums. There
are at least checksums for the IP header and for the TCP segment (header+data).
Other protocols layered on top of TCP/IP may add their own.

Sockets are very reliable. The Internet is built on them.

Good to hear - thanks Nigel.

--
And loving it,

-Q
_________________________________________________
(e-mail address removed)
(Replace the "SixFour" with numbers to email me)
 

Thomas Schodt

Qu0ll said:
Can you please explain what you mean by not respecting message
boundaries? I am sending data in 512 byte "packets", are you saying that
this won't be respected by TCP?

And what do you mean by not reading as much data as requested? When
would this happen and why?

This is possibly not entirely correct but illustrates the issue nicely.


You write a bunch of 512 byte "packets".

TCP/IP sticks them all together and transmits when it deems fit - it may
send two-and-a-half of your "packets" in one message.

On the receiving side, when you read the first two times you get the 512
bytes you ask for. When you read the third time you might get only
half of the 512 bytes you expect - you have to issue another read to
wait for the rest.
 

Gordon Beaton

Can you please explain what you mean by not respecting message
boundaries? I am sending data in 512 byte "packets", are you saying
that this won't be respected by TCP?

TCP gives you a data *stream*. So if you send 512 bytes and then
another 512 bytes, the only guarantee is that the 1024 bytes will
arrive in the correct order. There is nothing in the stream to
separate the two messages from each other, that's the responsibility
of your application, for example by terminating each message with a
unique character the application can recognize, or by prefixing each
message with its length.
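
A minimal sketch of that length-prefix approach might look like this (the
Framing class and its method names are just made up for illustration; the
streams come from Socket.getInputStream() and Socket.getOutputStream()):

    import java.io.DataInputStream;
    import java.io.DataOutputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;

    // Hypothetical helpers for length-prefixed messages over a TCP stream.
    public final class Framing {

        // Sender: write a 4-byte length, then the payload itself.
        public static void writeMessage(OutputStream rawOut, byte[] payload) throws IOException {
            DataOutputStream out = new DataOutputStream(rawOut);
            out.writeInt(payload.length);
            out.write(payload);
            out.flush();
        }

        // Receiver: read the 4-byte length, then exactly that many bytes.
        public static byte[] readMessage(InputStream rawIn) throws IOException {
            DataInputStream in = new DataInputStream(rawIn);
            int length = in.readInt();     // blocks until all 4 length bytes arrive
            byte[] payload = new byte[length];
            in.readFully(payload);         // blocks until the whole payload arrives
            return payload;
        }
    }
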
And what do you mean by not reading as much data as requested? When
would this happen and why?

If you read the API documentation carefully, you'll see that all of
the methods for reading from a stream will only read "up to" the
requested amount.

Even though TCP provides a data stream, lower layers in the network
are packet based and your data will be sent in chunks that bear little
relation to the actual messages you sent. Sometimes two messages might
get sent in one IP packet, or a single message might span more than
one IP packet.

So an application that always requests 512 bytes might get 512 bytes
the first two times but only 436 bytes the third time, because the
remainder of the third message might still be in transit, or might have
needed to be resent. The application needs to check how many bytes were
actually read, and read again until the desired number of bytes has been
received.
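
In Java that boils down to a loop something like this sketch (or to
DataInputStream.readFully(), which does the same thing for you):

    import java.io.EOFException;
    import java.io.IOException;
    import java.io.InputStream;

    public final class ReadExactly {
        // Keep reading until the buffer is full, or fail if the peer
        // closes the connection before enough bytes have arrived.
        public static void readExactly(InputStream in, byte[] buffer) throws IOException {
            int offset = 0;
            while (offset < buffer.length) {
                int n = in.read(buffer, offset, buffer.length - offset);
                if (n < 0) {
                    throw new EOFException("connection closed after " + offset + " bytes");
                }
                offset += n;
            }
        }
    }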

/gordon

--
 

Qu0ll

TCP gives you a data *stream*. So if you send 512 bytes and then
another 512 bytes, the only guarantee is that the 1024 bytes will
arrive in the correct order. There is nothing in the stream to
separate the two messages from each other, that's the responsibility
of your application, for example by terminating each message with a
unique character the application can recognize, or by prefixing each
message with its length.


If you read the API documentation carefully, you'll see that all of
the methods for reading from a stream will only read "up to" the
requested amount.

Even though TCP provides a data stream, lower layers in the network
are packet based and your data will be sent in chunks that bear little
relation to the actual messages you sent. Sometimes two messages might
get sent in one IP packet, or a single message might span more than
one IP packet.

So an application that always requests 512 bytes might get 512 bytes
the first two times but only 436 bytes the third time, because the
remainder of the third message might still be in transit, or might have
needed to be resent. The application needs to check how many bytes were
actually read, and read again until the desired number of bytes has been
received.

OK, thanks to yourself and Thomas for the explanation. This explains why
sometimes not all the data was being received by the applet. This had led me
to conclude that TCP/Java/Sockets were unreliable, but it appears to be fixed
now by re-reading until the full packet is received.

--
And loving it,

-Q
_________________________________________________
(e-mail address removed)
(Replace the "SixFour" with numbers to email me)
 

Roedy Green

So just how reliable are TCP and Java sockets over the actual internet?

If you look at the format of a TCP/IP packet you can get an idea:
see http://mindprod.com/jgloss/tcpip.html

TCP/IP packets have a 16-bit checksum. Any single-bit error would give
you a different checksum. A multi-bit error has a 1 in 2^16 chance of
coming up with a valid checksum.

This is in addition to any packet-level checksums or hardware error
correction transparent to TCP/IP (e.g. error correcting modems).

The backbones now are fibre optic, which very rarely get errors. The
problems come from the rather wretched quality of the copper near
your end.

Perhaps someone knows of a tool to get the stats on the percentage of
packets getting through. The worse that number is, the worse your
throughput and the greater your odds of an error sneaking through.

In my personal case they must be very rare. Nearly all my high-volume
traffic is in ZIP files, which have an additional checksum. I don't
see problems.

"But my tax return has to be correct. I submitted it with an
error-correcting modem."
 

Roedy Green

TCP gives you a data *stream*. So if you send 512 bytes and then
another 512 bytes, the only guarantee is that the 1024 bytes will
arrive in the correct order. There is nothing in the stream to
separate the two messages from each other, that's the responsibility
of your application, for example by terminating each message with a
unique character the application can recognize, or by prefixing each
message with its length.

If for some reason a message is garbled (by software or user error),
when you start reading, you won't necessarily pick up at the beginning
of the next message. You may pick up in the middle of the garbled
message, or part way through the next good one. If you want to be
able to recover, you need to reserve some magic start-of-message
pattern to scan for that won't occur incidentally in the data.

Socket communications generally require you to know the precise length
of things before you put them on the wire.
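
A sketch of that resynchronisation idea (the two marker bytes 0xCA 0xFE are
an arbitrary choice here, and a real protocol would also have to escape or
avoid them inside message bodies):

    import java.io.DataInputStream;
    import java.io.IOException;

    public final class Resync {
        // Skip forward in the stream until the start-of-message marker
        // 0xCA 0xFE is seen; on return, the stream is positioned just
        // after the marker.
        public static void skipToMarker(DataInputStream in) throws IOException {
            int previous = in.readUnsignedByte();
            while (true) {
                int current = in.readUnsignedByte();
                if (previous == 0xCA && current == 0xFE) {
                    return;
                }
                previous = current;
            }
        }
    }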
 

Karl Uppiano

Qu0ll said:
I am writing client and server components of an application that
communicate using Socket, ServerSocket and TCP. I would like to know just
how reliable this connection/protocol combination is in terms of
transmission errors. So far I have only been able to run the application
where the client and server are on the same local machine or separated by
an intranet/LAN so I have no results of an internet deployment to report
but I have not encountered any IO errors to this point.

So just how reliable are TCP and Java sockets over the actual internet? I
mean do I need to implement some kind of "advanced" protocol whereby check
sums are transmitted along with packets and the packet retransmitted if
the check sum is invalid or is all this handled by either the Java sockets
or the TCP protocol already?

Most local area networks these days are TCP/IP using Berkeley sockets. It is
extremely reliable. Java sockets simply wrap the platform-specific
implementation of Berkeley sockets.

TCP (Transmission Control Protocol) is responsible for reliability, error
correction and in-order delivery of the data.

IP (Internet Protocol) is responsible for hardware abstraction, data
transfer, routing, etc., but does not guarantee data integrity (except for
the IP headers, without which TCP reliability would be impossible).

For more information, see http://en.wikipedia.org/wiki/Tcp/ip

It is certainly possible to design an application that is unreliable by
using Berkeley sockets inappropriately in Java (or any language or platform,
for that matter). It is also possible to design extremely reliable
applications using Berkeley sockets, but it requires some understanding of
detecting and recovering from network failures (such as unplugged cables,
switching and router failures, etc.). Fortunately, Java sockets throw
IOExceptions when things like this occur.

One situation where most sockets will not tell you there is a problem is when
(for example) someone disconnects a cable on the *far side* of an Ethernet
switch that you are connecting through. There is no "heartbeat" in the
TCP/IP protocol, so your socket could listen for hours if no one reconnects
the cable. You can program sockets to time out on a read, but it is quite
common for the distant terminal to remain quiet for hours in some cases. The
TELNET protocol provides ways to periodically poll a device for connection
presence. It is more common than not for applications to layer other
protocols on top of TCP/IP to implement application-specific signaling
requirements.
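
In Java you can at least arm yourself with a read timeout and TCP keep-alive,
roughly like this (the host, port and 30-second value are only placeholders):

    import java.io.IOException;
    import java.net.Socket;

    public class TimeoutExample {
        public static void main(String[] args) throws IOException {
            Socket socket = new Socket("example.com", 12345);

            // A blocked read() throws SocketTimeoutException after 30 s of silence.
            socket.setSoTimeout(30000);

            // Ask the OS to probe idle connections; probe timing is OS-dependent
            // and often measured in hours by default.
            socket.setKeepAlive(true);
        }
    }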
 
