Reliability of writeUTF / readUTF


korcs

Hi,

I was wondering about the "safest" way to use the DataInputStream/
DataOutputStream methods writeUTF/readUTF.

If I communicate via sockets and the server sends a message in the form of
a string, how can I make sure that when reading the message
I read the whole message and not only part of it?

So if the message is "Santa Claus has a present for you.", how can I
make sure that, as a client, I read the whole message and not only
"Santa Claus has a present"?

The server writes the message like:

// socket is the connected java.net.Socket
DataOutputStream os = new DataOutputStream(socket.getOutputStream());
String message = "Santa Claus has a present for you";
os.writeUTF(message);
os.flush();


The client reads the message like:

// socket is the connected java.net.Socket
DataInputStream is = new DataInputStream(socket.getInputStream());
String message = is.readUTF();

Is this always correct, or should I use a method to make 100% sure
that I have read the whole message?

(A method could be:

Server side: first send the byte length of the message as a "short",
then the message itself;

Client side: first read the length with readShort() and
then read exactly as many bytes as the message length.)

Does somebody know the official solution for the problem?

Best, korcs
 

Gordon Beaton

> Is this always correct, or should I use a method to make 100% sure
> that I have read the whole message?
>
> (A method could be:
>
> Server side: first send the byte length of the message as a "short",
> then the message itself;
>
> Client side: first read the length with readShort() and
> then read exactly as many bytes as the message length.)
>
> Does somebody know the official solution for the problem?

There are two common solutions, and some variations on those themes.
You already described one, the other is to delimit each message with a
special character (or sequence) that cannot occur within the message
unless escaped. For text, a newline might be a suitable candidate.
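A minimal sketch of the newline-delimited approach, assuming the messages are text that never contains a raw line terminator and that both ends agree on UTF-8. The class and method names are hypothetical, and in-memory byte streams stand in for the socket streams so the example is self-contained.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: one message per line, '\n' as the delimiter.
class LineFraming {

    // Writes each message followed by '\n'. readLine() also treats '\r' as a
    // terminator, so both characters are rejected instead of being escaped.
    static void writeMessages(OutputStream out, List<String> messages) throws IOException {
        // Not closed on purpose: closing would also close the underlying (socket) stream.
        Writer w = new BufferedWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8));
        for (String m : messages) {
            if (m.indexOf('\n') >= 0 || m.indexOf('\r') >= 0) {
                throw new IllegalArgumentException("message contains a line terminator");
            }
            w.write(m);
            w.write('\n');
        }
        w.flush(); // push the buffered characters to the underlying stream
    }

    // readLine() blocks until a complete line (or end of stream) is available.
    static List<String> readMessages(InputStream in) throws IOException {
        BufferedReader r = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        List<String> result = new ArrayList<>();
        String line;
        while ((line = r.readLine()) != null) {
            result.add(line);
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        writeMessages(wire, List.of("Santa Claus has a present for you.", "Merry Christmas!"));
        System.out.println(readMessages(new ByteArrayInputStream(wire.toByteArray())));
    }
}
```

On a live socket you would more likely call `readLine()` once per expected message instead of looping to end of stream, and keep a single `BufferedReader` for the lifetime of the connection, since it may already have buffered bytes that belong to the next message.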

/gordon

 

Andreas Leitgeb

korcs said:
> I was wondering about the "safest" way to use the DataInputStream/
> DataOutputStream methods writeUTF/readUTF.
>
> If I communicate via sockets and the server sends a message in the form of
> a string, how can I make sure that when reading the message
> I read the whole message and not only part of it?

One way to make it dance:
send a bytecount in advance

When the reader will measure
the number of bytes for pleasure
it shall never fall short,
unless on some network-abort.

Damn, I should have read the whole
posting, since obviously you knew that all...
> Server side: first send the byte length of the message as a "short",
> then the message itself;

I wouldn't send it as a "short",
an "int" might prevent inadvertent abort,
if the message was long and the short wrapped around,
you might cause surprise on reader's ground.
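A sketch of the length-prefix scheme with an `int` count, as suggested in the verse above. It assumes both sides agree on UTF-8 for the payload; `IntFraming`, `writeMessage`, `readMessage` and the `maxBytes` sanity limit are hypothetical. `readFully` blocks until exactly that many bytes have arrived, or throws `EOFException` if the stream ends first.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: 4-byte big-endian length, then that many UTF-8 bytes.
class IntFraming {

    static void writeMessage(DataOutputStream out, String message) throws IOException {
        byte[] payload = message.getBytes(StandardCharsets.UTF_8);
        out.writeInt(payload.length); // byte count, not character count
        out.write(payload);
        out.flush();
    }

    static String readMessage(DataInputStream in, int maxBytes) throws IOException {
        int length = in.readInt();
        // Assumed policy: reject corrupt or hostile lengths before allocating.
        if (length < 0 || length > maxBytes) {
            throw new IOException("bad message length: " + length);
        }
        byte[] payload = new byte[length];
        in.readFully(payload); // blocks until all bytes arrive, or throws EOFException
        return new String(payload, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        writeMessage(new DataOutputStream(wire), "Santa Claus has a present for you.");
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(wire.toByteArray()));
        System.out.println(readMessage(in, 1 << 20));
    }
}
```

Unlike `writeUTF`, this is not limited to 65,535 bytes, at the cost of 2 extra header bytes per message.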
> Does somebody know the official solution for the problem?
you can just as well append a \n(ewline),
and have the reader read up to it, fine!
 

Matt Humphrey

Gordon Beaton said:
> There are two common solutions, and some variations on those themes.
> You already described one, the other is to delimit each message with a
> special character (or sequence) that cannot occur within the message
> unless escaped. For text, a newline might be a suitable candidate.

I'm curious because the OP is using writeUTF / readUTF, which I have not
used. The Javadocs say that the encoding includes a 2-byte length field, and
readUTF says that it will read that many bytes or throw an EOFException if it
encounters EOF. This suggests that it will block until it can read fully
and that it won't read additional bytes. I would think that writeUTF/readUTF
would properly delimit and reassemble the bytes into the original string without
needing an extra length field, markers or so forth. Is that so?

Matt Humphrey http://www.iviz.com/
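Matt's reading of the Javadoc can be checked with a small experiment: write two strings with `writeUTF` followed by an `int`, then read them back in the same order. If `readUTF` consumed too many or too few bytes, the second string or the trailing `int` would come out wrong. In-memory streams stand in for the socket, and the class name is hypothetical.

```java
import java.io.*;

// Hypothetical demo class: checks that readUTF consumes exactly one string.
class ReadUtfBoundaries {

    static String demo() throws IOException {
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(wire);
        out.writeUTF("Santa Claus has a present for you.");
        out.writeUTF("Merry Christmas!");
        out.writeInt(42); // unrelated data following the two strings
        out.flush();

        DataInputStream in = new DataInputStream(new ByteArrayInputStream(wire.toByteArray()));
        String first = in.readUTF();       // reads its 2-byte length, then exactly that many bytes
        String second = in.readUTF();
        int trailer = in.readInt();        // intact, so nothing was over-read
        boolean drained = in.read() == -1; // and no bytes are left over either
        return first + " | " + second + " | " + trailer + " | " + drained;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(demo());
    }
}
```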
 

Gordon Beaton

> I'm curious because the OP is using writeUTF / readUTF, which I have not
> used.

Me neither...
> The Javadocs say that the encoding includes a 2-byte length field,
> and readUTF says that it will read that many bytes or throw an
> EOFException if it encounters EOF. This suggests that it will block
> until it can read fully and that it won't read additional bytes. I
> would think that writeUTF/readUTF would properly delimit and
> reassemble the bytes into the original string without needing an extra
> length field, markers or so forth. Is that so?

Hmm, could be.

/gordon

 

Joshua Cranmer

korcs said:
> If I communicate via sockets and the server sends a message in the form of
> a string, how can I make sure that when reading the message
> I read the whole message and not only part of it?

Well, everything is going to be modulo network considerations, but Java
pieces everything together for you in the end through sockets.
> Is this always correct, or should I use a method to make 100% sure
> that I have read the whole message?
>
> (A method could be:
>
> Server side: first send the byte length of the message as a "short",
> then the message itself;
>
> Client side: first read the length with readShort() and
> then read exactly as many bytes as the message length.)
>
> Does somebody know the official solution for the problem?

From the Javadocs for DataOutputStream's writeUTF:
> First, two bytes are written to the output stream as if by the
> writeShort method giving the number of bytes to follow. This value is
> the number of bytes actually written out, not the length of the string.
> [ ... ]

Java already does the message length processing for you (see the
corresponding documentation in DataInput's readUTF if you don't believe me).
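To see the quoted "number of bytes, not the length of the string" distinction on the wire, this sketch encodes two strings and decodes the 2-byte header the same way `readUnsignedShort` would. The class name is hypothetical; the second string contains U+00EB, which takes 2 bytes, so its header exceeds its character count.

```java
import java.io.*;

// Hypothetical demo class: inspects the length word that writeUTF emits.
class WriteUtfHeader {

    // Returns the bytes exactly as writeUTF puts them on the wire.
    static byte[] encode(String s) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeUTF(s);
        out.flush();
        return buf.toByteArray();
    }

    // Reads the first two bytes as an unsigned big-endian 16-bit value.
    static int header(byte[] encoded) {
        return ((encoded[0] & 0xFF) << 8) | (encoded[1] & 0xFF);
    }

    public static void main(String[] args) throws IOException {
        for (String s : new String[] {"Santa Claus has a present for you.", "No\u00ebl"}) {
            byte[] e = encode(s);
            System.out.println(s.length() + " chars -> header " + header(e)
                    + ", total " + e.length + " bytes");
        }
    }
}
```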
 

Esmond Pitt

Roedy said:
> Note that creates a rather severe 10,922 limit on the length of the
> field.

... which is not correct. The length word isn't specified as 'signed' in
the Javadoc; you seem to have just made that up. It is unsigned: it is
read with readUnsignedShort() in DataInputStream, and the Javadoc
clearly specifies a maximum length of 65,535 bytes.

Taking the 3-byte encoding into account, that makes 65535 / 3 = 21845
characters. But the 3-byte encoding only applies to characters above the
0x07FF code point; characters from 0x0080 to 0x07FF are encoded as 2
bytes, as is the null character 0x0000, and characters from 0x0001 to
0x007F take 1 byte.

So a string of 64K-1 (65,535) characters drawn from
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789" encodes
to 65,537 bytes including the length word.

So the maximum is 65,535 characters. Depending on the actual characters being
encoded it may be fewer, but never fewer than 65535 / 3 = 21845.
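Both ends of that range can be checked by asking `writeUTF` to encode strings just at and just over the 65,535-byte limit; it throws `UTFDataFormatException` (a subclass of `IOException`) when the encoded form is too long. The class name is hypothetical, and U+20AC (the euro sign) is used as an example of a 3-byte character.

```java
import java.io.*;

// Hypothetical demo class: probes writeUTF's 65,535-byte limit.
class WriteUtfLimit {

    // True if writeUTF accepts s, false if its encoded form exceeds 65,535 bytes.
    static boolean fits(String s) throws IOException {
        try {
            new DataOutputStream(new ByteArrayOutputStream()).writeUTF(s);
            return true;
        } catch (UTFDataFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        String euro = String.valueOf((char) 0x20AC);      // 3 bytes in modified UTF-8
        System.out.println(fits("A".repeat(65535)));      // 65,535 bytes
        System.out.println(fits("A".repeat(65536)));      // 65,536 bytes
        System.out.println(fits(euro.repeat(21845)));     // 65,535 bytes
        System.out.println(fits(euro.repeat(21846)));     // 65,538 bytes
    }
}
```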

And yes, it blocks until it has read everything it is looking for or
has encountered an exception, including EOFException. It does this with
DataInputStream.readFully(), as you would expect.
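The blocking behaviour is what answers the original question: even if the network hands the message over in small pieces, `readUTF` keeps reading until the full count has arrived. This sketch simulates that with a hypothetical `FilterInputStream` subclass that returns at most one byte per `read(byte[], int, int)` call; all class names here are illustrative, not part of any library.

```java
import java.io.*;

// Hypothetical demo: readUTF still returns the whole string when the
// underlying stream only ever delivers one byte per read call.
class FragmentedReadUtf {

    // Simulates a connection that hands out at most one byte per read(byte[], int, int).
    static class OneByteAtATime extends FilterInputStream {
        OneByteAtATime(InputStream in) {
            super(in);
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            return super.read(b, off, Math.min(len, 1));
        }
    }

    static String readBack(String message) throws IOException {
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(wire);
        out.writeUTF(message);
        out.flush();

        DataInputStream in = new DataInputStream(
                new OneByteAtATime(new ByteArrayInputStream(wire.toByteArray())));
        // readUTF reads the length word, then readFully keeps looping over
        // these 1-byte reads until the whole encoded string has arrived.
        return in.readUTF();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readBack("Santa Claus has a present for you."));
    }
}
```

If the stream ends before the announced byte count arrives (for example, the peer disconnects mid-message), `readUTF` throws `EOFException` rather than returning a partial string.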
 
