Reliability of writeUTF / readUTF

korcs · Nov 27, 2007

Hi,

I was wondering about the safest way to use the writeUTF/readUTF
methods of DataOutputStream/DataInputStream.

If I communicate via sockets and the server sends a message in the form
of a string, how can I make sure that when I read the message, I read
the whole message and not only a fragment of it?

So if the message is "Santa Claus has a present for you.", how can I
make sure that as a client I read the whole message and not only
"Santa Claus has a present"?

The server writes the message like:

DataOutputStream os;  // wraps the socket's output stream
String message = "Santa Claus has a present for you";
os.writeUTF(message);
os.flush();

The client reads the message like:

DataInputStream is;
String message = is.readUTF();

Is this always correct, or should I use some method to make 100% sure
that I have read the whole message?

(A method could be:

Server side: first send the byte length of the message as a "short",
then the message itself;

Client side: first read the length with the method "readShort", then
read exactly as many bytes as the length says...)

Does somebody know the official solution for the problem?

Best, korcs

Gordon Beaton · Nov 27, 2007

korcs said:
Is it all the time correct, or should I use a method to make 100% sure
that I have read the whole message?

(A method could be:

Server side: send first the byte length of the message as a "short"
then the message itself;

Client side: read first the length with the method "readShort" and
then to read exactly as many bytes, as the message length is...)

Does somebody know the official solution for the problem?

There are two common solutions, and some variations on those themes.
You already described one, the other is to delimit each message with a
special character (or sequence) that cannot occur within the message
unless escaped. For text, a newline might be a suitable candidate.
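
A minimal sketch of the newline-delimited variant, assuming the messages are plain text that never contains a line break. The class name NewlineFraming and its send/receive helpers are made up for illustration, and byte-array streams stand in for the socket streams so the example runs on its own.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: one message per line, '\n' as the delimiter.
final class NewlineFraming {
    // Writes the message followed by a newline. Line breaks inside the
    // message are rejected because this sketch has no escaping scheme.
    static void send(BufferedWriter out, String message) throws IOException {
        if (message.indexOf('\n') >= 0 || message.indexOf('\r') >= 0) {
            throw new IllegalArgumentException("message must not contain a line break");
        }
        out.write(message);
        out.write('\n');
        out.flush();
    }

    // Blocks until a complete line is available; returns null at end of stream.
    static String receive(BufferedReader in) throws IOException {
        return in.readLine();
    }

    public static void main(String[] args) throws IOException {
        // In real use these would wrap socket.getOutputStream()/getInputStream().
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        BufferedWriter out = new BufferedWriter(
                new OutputStreamWriter(wire, StandardCharsets.UTF_8));
        send(out, "Santa Claus has a present for you.");
        send(out, "Second message");

        BufferedReader in = new BufferedReader(new InputStreamReader(
                new ByteArrayInputStream(wire.toByteArray()), StandardCharsets.UTF_8));
        System.out.println(receive(in));
        System.out.println(receive(in));
    }
}
```

Both sides have to agree on the character encoding; UTF-8 is just the choice made here.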

/gordon

--

Andreas Leitgeb · Nov 27, 2007

korcs said:
I was wondering about the right "safest" usage of the DataInput/
OutputStream functions write/readUTF.

If I communicate via Sockets and the Server sends a message in form of
a string, how can I make sure that at the time of reading the message,
I read the whole message and not only a stub of it.

One way to make it dance:
send a bytecount in advance

When the reader will measure
the number of bytes for pleasure
it shall never fall short,
unless on some network-abort.

Damn, I should have read the whole
posting, since obviously you knew that all...

Server side: send first the byte length of the message as a "short"
then the message itself;

I wouldn't send it as a "short",
an "int" might prevent inadvertent abort,
if the message was long and the short wrapped around,
you might cause surprise on reader's ground.
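
In prose, the int-prefixed variant might look like the sketch below. It assumes UTF-8 for the payload; the class name LengthPrefixedFraming, its send/receive helpers and the MAX_MESSAGE_BYTES cap are invented for illustration, and byte-array streams stand in for the socket.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper: 4-byte length prefix followed by the UTF-8 payload.
final class LengthPrefixedFraming {
    // Arbitrary sanity cap (an assumption, not part of any standard).
    static final int MAX_MESSAGE_BYTES = 16 * 1024 * 1024;

    static void send(DataOutputStream out, String message) throws IOException {
        byte[] payload = message.getBytes(StandardCharsets.UTF_8);
        out.writeInt(payload.length);   // byte count, not character count
        out.write(payload);
        out.flush();
    }

    static String receive(DataInputStream in) throws IOException {
        int length = in.readInt();
        if (length < 0 || length > MAX_MESSAGE_BYTES) {
            throw new IOException("implausible message length: " + length);
        }
        byte[] payload = new byte[length];
        in.readFully(payload);          // blocks until all bytes arrive, or throws EOFException
        return new String(payload, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(wire);

        // A message longer than 65,535 bytes, which a 16-bit length could not describe.
        char[] filler = new char[70000];
        Arrays.fill(filler, 'x');
        send(out, new String(filler));
        send(out, "Santa Claus has a present for you.");

        DataInputStream in = new DataInputStream(new ByteArrayInputStream(wire.toByteArray()));
        System.out.println(receive(in).length());
        System.out.println(receive(in));
    }
}
```

The important call is readFully: a plain read(byte[]) may return fewer bytes than requested, which is exactly the "only a stub" problem from the original question.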

Does somebody know the official solution for the problem?

you can just as well append a \n(ewline),
and have the reader read up to it, fine!

Matt Humphrey · Nov 27, 2007

Gordon Beaton said:
There are two common solutions, and some variations on those themes.
You already described one, the other is to delimit each message with a
special character (or sequence) that cannot occur within the message
unless escaped. For text, a newline might be a suitable candidate.

I'm curious because the OP is using writeUTF / readUTF, which I have not
used. The Javadocs say that the encoding includes a 2-byte length field, and
readUTF says that it will read that many bytes or throw an EOFException if it
encounters end of stream first. This suggests that it will block until it has
read the full string and that it won't read any additional bytes. I would
think that writeUTF / readUTF would properly delimit and reassemble the bytes
into the original string without needing an extra length field, markers and
so forth. Is that so?

Matt Humphrey http://www.iviz.com/

Gordon Beaton · Nov 27, 2007

Matt Humphrey said:
I'm curious because the OP is using writeUTF / readUTF which I have not
used.

Me neither...

The Javadocs say that the encoding includes a 2-byte length field
and readUTF says that it will read that many bytes or throw EOF
exception if it encounters EOF. This suggests that it will block
until it can read fully and that it won't read additional bytes. I
would think that read/write UTF would properly delimit and
reassemble bytes into the original string without needing an extra
length field, markers or so forth. Is that so?

Hmm, could be.

/gordon

--

Joshua Cranmer · Nov 27, 2007

korcs said:
If I communicate via Sockets and the Server sends a message in form of
a string, how can I make sure that at the time of reading the message,
I read the whole message and not only a stub of it.

Well, modulo network failures, a socket gives you a reliable byte
stream. The data may arrive in several pieces, but a blocking read such
as readUTF waits for all of them and pieces the message together for you.

Is it all the time correct, or should I use a method to make 100% sure
that I have read the whole message?

(A method could be:

Server side: send first the byte length of the message as a "short"
then the message itself;

Client side: read first the length with the method "readShort" and
then to read exactly as many bytes, as the message length is...)

Does somebody know the official solution for the problem?

From the Javadocs for DataOutputStream's writeUTF:
First, two bytes are written to the output stream as if by the
writeShort method giving the number of bytes to follow. This value is
the number of bytes actually written out, not the length of the string.
[ ... ]

Java already does the message length processing for you (see the
corresponding documentation in DataInput's readUTF if you don't believe me).
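
For anyone who wants to see it rather than read it, here is a small sketch that dumps what writeUTF puts on the wire and reads two messages back to back. The class WriteUtfWireFormat and its helper methods are names invented for this example, and a byte array stands in for the socket.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical demo class for inspecting the writeUTF encoding.
final class WriteUtfWireFormat {
    // Returns the exact bytes writeUTF produces for one string.
    static byte[] encode(String s) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        out.writeUTF(s);
        out.flush();
        return buffer.toByteArray();
    }

    // Decodes the leading two bytes as an unsigned 16-bit big-endian length.
    static int declaredLength(byte[] wire) {
        return ((wire[0] & 0xFF) << 8) | (wire[1] & 0xFF);
    }

    // Writes all messages into one stream, then reads them back in order.
    static String[] roundTrip(String... messages) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        for (String message : messages) {
            out.writeUTF(message);
        }
        out.flush();

        DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(buffer.toByteArray()));
        String[] result = new String[messages.length];
        for (int i = 0; i < result.length; i++) {
            result[i] = in.readUTF();   // consumes exactly one length-prefixed string
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        byte[] wire = encode("Santa Claus has a present for you.");
        System.out.println("declared payload bytes: " + declaredLength(wire));
        System.out.println("total bytes on the wire: " + wire.length);

        for (String s : roundTrip("Santa Claus has a present", " for you.")) {
            System.out.println("[" + s + "]");
        }
    }
}
```

The two strings in the round trip come back as two separate strings, so readUTF neither stops short nor eats into the next message.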

Roedy Green · Nov 29, 2007

korcs said:
String message = is.readUTF();

I suggest you look at the source code for readUTF in src.zip. I would
be very surprised if it did not block until it had all the characters
promised in the lead 2-byte count field.

Note that creates a rather severe 10,922 limit on the length of the
field.

See http://65.110.21.43/jgloss/utf.html#WRITEUTF
for details.

Esmond Pitt · Nov 29, 2007

Roedy said:
Note that creates a rather severe 10,922 limit on the length of the
field.

See http://65.110.21.43/jgloss/utf.html#WRITEUTF

... which is not correct. The length word isn't specified as 'signed' in
the Javadoc; you seem to have just made that up. It is unsigned. It is
read with readUnsignedShort() in DataInputStream, and the Javadoc
clearly specifies a maximum length of 65,535 bytes.

Taking the 3-byte encoding into account, that makes 65535 / 3 = 21845
characters in the worst case. But 3-byte encoding only applies to
characters above the 0x07FF codepoint; characters from 0x0080 to 0x07FF
are encoded as 2 bytes, as are nulls, and the rest, from 0x0001 to
0x007F, as 1 byte.

So a string of 65,535 (64K-1) characters composed from
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789" encodes
to 65537 bytes including the length word.

So the maximum is 65,535 characters. Depending on the actual characters
being encoded it may be less, but never less than 65535 / 3 = 21,845.

And yes it blocks until it has read everything it is looking for or
encountered an exception, including EOFException. It does this with
DataInputStream.readFully(), as you would expect.
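
A quick way to check these numbers yourself is sketched below. The class WriteUtfLimits and its helpers are invented for this example; it relies on writeUTF throwing UTFDataFormatException for an over-long string and on readUTF throwing EOFException when the stream ends early, both of which are documented behaviour.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UTFDataFormatException;
import java.util.Arrays;

// Hypothetical demo class probing the limits of writeUTF / readUTF.
final class WriteUtfLimits {
    // Builds a string of 'count' copies of the same character.
    static String repeat(char c, int count) {
        char[] chars = new char[count];
        Arrays.fill(chars, c);
        return new String(chars);
    }

    // True if writeUTF accepts the string, false if its encoded form
    // exceeds 65,535 bytes and writeUTF refuses it.
    static boolean fits(String s) throws IOException {
        DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream());
        try {
            out.writeUTF(s);
            return true;
        } catch (UTFDataFormatException tooLong) {
            return false;
        }
    }

    // Simulates a connection that closes mid-message: the last 8 bytes are cut off.
    static boolean truncatedReadThrowsEof() throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        new DataOutputStream(buffer).writeUTF("Santa Claus has a present for you.");
        byte[] full = buffer.toByteArray();
        byte[] cut = Arrays.copyOf(full, full.length - 8);
        try {
            new DataInputStream(new ByteArrayInputStream(cut)).readUTF();
            return false;   // a partial string was returned
        } catch (EOFException expected) {
            return true;    // readUTF refused to return a partial string
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("65535 x 'A' fits:      " + fits(repeat('A', 65535)));
        System.out.println("65536 x 'A' fits:      " + fits(repeat('A', 65536)));
        System.out.println("21845 x U+20AC fits:   " + fits(repeat('\u20AC', 21845)));
        System.out.println("21846 x U+20AC fits:   " + fits(repeat('\u20AC', 21846)));
        System.out.println("truncated read is EOF: " + truncatedReadThrowsEof());
    }
}
```

U+20AC (the euro sign) is used here only because it is a convenient 3-byte character; any character above 0x07FF would do.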
