Reading null terminated strings in Java


markobrien85

G'day

I'm attempting to get Java to communicate with an existing application
using sockets. My first step was getting a simple Java client and echo
server set up as a sort of hello world and introduction to Java. Now
when I modify my Java server to simply print out messages received
from the client (no sending information back out at this stage) I run
into a problem when I use my non-Java client.

My client program sends out null-terminated messages using UTF-8
encoding. This cannot be modified. Java, however, treats strings as
classes and, from what I could gather, stores the length of the string
in the first two bytes.

Java blocks and does not return after my call to readUTF() on my
socket's input stream. My input stream is declared as: is = new
DataInputStream( new BufferedInputStream( sock.getInputStream() ));
This led me to believe it's the internal string representation in Java
that's causing me the trouble. I also tried reading in characters one
at a time using is.getChar(), but that returned some Asian characters,
i.e. the character encoding Java was using wasn't UTF-8.


A look through the Javadocs didn't reveal any obvious way to parse
in a null-terminated string. Any help very, very, very much
appreciated.
 
Sigfried

By "null" you must mean "NUL", which is a 0 byte in UTF-8. When you
find a 0 byte, the string is finished. For any other value, you may need
to read one or more further bytes to make a full Unicode character (see
the Character methods).
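
A small sketch (not from the thread) of why scanning for a 0 byte is safe in UTF-8: every byte of a multi-byte sequence has its high bit set, so a 0 byte can only be the NUL character itself. The class and method names are made up for illustration, and StandardCharsets assumes Java 7 or later.

```java
import java.nio.charset.StandardCharsets;

class Utf8ZeroByteCheck {
    // True if the standard UTF-8 encoding of s contains a zero byte.
    static boolean containsZeroByte(String s) {
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (b == 0) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // Characters that take 2, 3 and 4 bytes in UTF-8 never produce a zero byte.
        System.out.println(containsZeroByte("\u00e9 \u20ac \ud83d\ude00")); // false
        // Only an actual NUL character does.
        System.out.println(containsZeroByte("a\0b")); // true
    }
}
```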
 
Lew

Sigfried said:
By "null" you must mean "NUL", which is a 0 byte in UTF-8. When you
find a 0 byte, the string is finished. For any other value, you may need
to read one or more further bytes to make a full Unicode character (see
the Character methods).

And don't use a DataInputStream. From its Javadoc:
"A data input stream lets an application read
*primitive Java data types* [emph. added]
from an underlying input stream in a machine-independent way.
An application uses a data output stream to write data that
can later be read by a data input stream."
 
Tom Anderson

I'm attempting to get Java to communicate with an existing application
using sockets. My first step was getting a simple Java client and echo
server set up as a sort of hello world and introduction to Java. Now
when I modify my Java server to simply print out messages received from
the client (no sending information back out at this stage) I run into a
problem when I use my non-Java client.

My client program sends out null-terminated messages using UTF-8
encoding.

Okay, first question: what form does the null take? If you're talking
UTF-8, then you must have Unicode characters. There's a NUL character in
Unicode, but there are a couple of ways it gets encoded in practice: as a
single zero byte in standard UTF-8, or as the two bytes 0xC0 0x80 in the
"modified UTF-8" that DataOutputStream writes. Do you know which you're
using?

If the client is in C, my bet would be that it's not really a UTF-8 NUL,
it's actually just a zero byte. Which is a UTF-8 NUL, but that's not what
C means by it. :)

This cannot be modified. Java, however, treats strings as classes and,
from what I could gather, stores the length of the string in the first
two bytes.

That's not to do with Java treating strings as classes; it's to do with
the way strings are encoded by DataInputStream/DataOutputStream. That
encoding is useful for communicating with other Java programs, but not so
much with programs written in other languages.

Java blocks and does not return after my call to readUTF() on my socket's
input stream. My input stream is declared as: is = new DataInputStream(
new BufferedInputStream( sock.getInputStream() )); This led me to
believe it's the internal string representation in Java that's causing me
the trouble.

Sounds about right.

I also tried reading in characters one at a time using
is.getChar(), but that returned some Asian characters, i.e. the character
encoding Java was using wasn't UTF-8.

Ain't no getChar() on DataInputStream. You probably mean readChar();
readChar() doesn't do UTF-8, it just reads a whole 16-bit character (two
bytes) from the stream, so it's not what you want.
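
A sketch (not from the thread) of how readChar() could produce the "Asian characters" mentioned in the question: it glues two consecutive bytes into one 16-bit char, so two ASCII letters sent as UTF-8 come out as a single code point in the CJK range. The class name, method name and sample text "hi" are assumptions for illustration; the bytes the original client sent are not known.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

class ReadCharDemo {
    // Reads one char the way DataInputStream.readChar() does: two bytes, high byte first.
    static char firstCharViaReadChar(byte[] wire) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(wire));
        return in.readChar();
    }

    public static void main(String[] args) throws IOException {
        byte[] wire = "hi".getBytes(StandardCharsets.UTF_8); // bytes 0x68 0x69
        char c = firstCharViaReadChar(wire);
        System.out.printf("U+%04X%n", (int) c); // prints U+6869
        System.out.println(Character.UnicodeBlock.of(c)); // CJK_UNIFIED_IDEOGRAPHS
    }
}
```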

A look through the Javadocs didn't reveal any obvious way to parse in
a null-terminated string.

Yes, I'm not aware of one.

Okay, here's what you do. Firstly, if you're reading characters, what you
want is not a stream but a reader:

http://java.sun.com/javase/6/docs/api/java/io/Reader.html

Readers are like streams for characters. You can make one that pulls from
a socket like this:

Reader input = new InputStreamReader(sock.getInputStream(), "UTF-8");

Better yet, a buffered one:

Reader input = new BufferedReader(new InputStreamReader(sock.getInputStream(), "UTF-8"));

You'll then have to read the null-terminated strings from it yourself.
Which is not so hard:

StringBuilder sb = new StringBuilder();
while (true) {
    int ch = input.read();
    if (ch == -1) throw new EOFException(); // stream ended before a NUL
    if (ch == 0) break; // you read a NUL
    sb.append((char) ch); // append the character just read
}
String str = sb.toString();

Done!
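
For anyone who wants to try the loop without a socket, here is a self-contained sketch that feeds it from an in-memory byte array instead. The class name, the method name readNulTerminated and the sample messages are assumptions made for illustration; StandardCharsets needs Java 7 or later (on older versions the "UTF-8" string form works the same way).

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

class NulReaderDemo {
    // Reads chars up to (not including) the next NUL.
    // Throws EOFException if the stream ends before a NUL is seen.
    static String readNulTerminated(Reader input) throws IOException {
        StringBuilder sb = new StringBuilder();
        while (true) {
            int ch = input.read();
            if (ch == -1) throw new EOFException("stream ended before NUL");
            if (ch == 0) break;
            sb.append((char) ch);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Two NUL-terminated UTF-8 messages, standing in for the socket data.
        byte[] wire = "G'day\0caf\u00e9\0".getBytes(StandardCharsets.UTF_8);
        Reader input = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(wire), StandardCharsets.UTF_8));
        System.out.println(readNulTerminated(input));
        System.out.println(readNulTerminated(input));
    }
}
```

With a real connection, the ByteArrayInputStream would be replaced by sock.getInputStream(), and the same Reader should be reused for every message so that buffered characters are not lost.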

tom
 
Mike Schilling

Tom said:
If the client is in C, my bet would be that it's not really a UTF-8 NUL,
it's actually just a zero byte. Which is a UTF-8 NUL, but that's not what
C means by it. :)

Moreover, C sends an octal 0, while Java expects decimal.
 
Martin Gregorie

I've successfully handled ASCII message-oriented connections between C
and Java using bare InputStream and OutputStream by reading/writing byte
arrays:

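byte[] b = new byte[1];
byte[] bytebuff = new byte[MAXLENGTH];
boolean done = false;
byte sep = 0x00;
int lth = 0; // result of the last read(); -1 means end of stream
int n = 0;   // number of message bytes stored so far

for (int i = 0; i < MAXLENGTH && !done && lth >= 0; i++)
{
    lth = in.read(b);
    if (lth < 0)
        break; // stream ended before the separator arrived
    if (b[0] != sep)
    {
        bytebuff[i] = b[0];
        n++;
    }
    else
        done = true;
}

// Decode explicitly as ASCII rather than the platform default charset.
String s = new String(bytebuff, 0, n, "US-ASCII");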


where 'in' is a Socket's InputStream. I simplified the code to show the
principle, not the exception handling etc.

This approach works well for Java clients talking to C servers and vice
versa. Seeing that the OP is interested in sending messages, this
approach might suit him: the message is returned as a string, ready to be
parsed with standard String operations.
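
A variation on the same byte-level idea, sketched for the UTF-8 case in the original question rather than ASCII: collect bytes up to the zero byte, then decode the whole message at once so multi-byte characters are never split. The class name, the method name readMessage, the choice to return null at end of stream, and the sample bytes are all assumptions for illustration, not code from the thread.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class ByteMessageReader {
    // Returns the next zero-terminated message decoded as UTF-8,
    // or null if the stream ends before any byte of a new message arrives.
    static String readMessage(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1) {
            if (b == 0) {
                return new String(buf.toByteArray(), StandardCharsets.UTF_8);
            }
            buf.write(b);
        }
        // End of stream: hand back a trailing unterminated message, if any.
        return buf.size() == 0 ? null : new String(buf.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // "ok", zero byte, one two-byte UTF-8 character (0xC3 0xA9), zero byte.
        byte[] wire = {'o', 'k', 0, (byte) 0xC3, (byte) 0xA9, 0};
        InputStream in = new ByteArrayInputStream(wire);
        String msg;
        while ((msg = readMessage(in)) != null) {
            System.out.println(msg.length() + " chars: " + msg);
        }
    }
}
```

When reading from a socket this way, wrapping sock.getInputStream() in a BufferedInputStream avoids a system call per byte.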
 
Roedy Green

My client program sends out null terminated messages using UTF-8
encoding. This can not be modified.

I would do it like this:

Read in a whacking great buffer full.

In a loop, scan the bytes for a null.
Convert the subset of the byte array (start to the byte before the null) to
a String with UTF-8 decoding.
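
The steps above might look roughly like the following sketch. The class name, the method name splitMessages and the sample data are assumptions for illustration; the buffer is filled from a string here, where real code would fill it with InputStream.read(byte[]). StandardCharsets and the diamond operator assume Java 7 or later.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class BufferSplitter {
    // Splits the first `length` bytes of buf into the zero-terminated messages they contain.
    // Bytes after the last zero byte (an incomplete message) are ignored here.
    static List<String> splitMessages(byte[] buf, int length) {
        List<String> messages = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < length; i++) {
            if (buf[i] == 0) {
                // Decode from the start of this message up to the byte before the null.
                messages.add(new String(buf, start, i - start, StandardCharsets.UTF_8));
                start = i + 1;
            }
        }
        return messages;
    }

    public static void main(String[] args) {
        byte[] buf = "a\0bc\0d".getBytes(StandardCharsets.UTF_8);
        System.out.println(splitMessages(buf, buf.length)); // [a, bc]
    }
}
```

One caveat with this design: a single read from a socket can stop in the middle of a message, so the bytes after the last null (the "d" in the example) would have to be kept and prepended to the next buffer.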

Sun's UTF encoding (the format written by writeUTF) has a 16-bit leading
count field. If your data has that, read it with readUTF.
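
To make the difference concrete, here is a sketch that prints the bytes writeUTF() produces next to the bytes a null-terminated sender would produce for the same text. The class name, method names and the sample text "hi" are assumptions for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class WriteUtfLayout {
    // Bytes that DataOutputStream.writeUTF() puts on the wire for s.
    static byte[] writeUtfBytes(String s) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeUTF(s);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for in-memory streams
        }
    }

    // Bytes a C-style sender would produce: plain UTF-8 followed by one zero byte.
    static byte[] nulTerminatedBytes(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return Arrays.copyOf(utf8, utf8.length + 1); // the extra byte is 0
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(writeUtfBytes("hi")));      // [0, 2, 104, 105]
        System.out.println(Arrays.toString(nulTerminatedBytes("hi"))); // [104, 105, 0]
    }
}
```

If readUTF() is pointed at the second form, it takes the first two bytes (104 and 105, i.e. 0x6869) as the count and waits for 26729 more bytes, which would explain a call that blocks and never returns.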

See http://mindprod.com/jgloss/conversion.html#BYTETOSTRING
 
