Reading null terminated strings in Java

Discussion in 'Java' started by markobrien85@gmail.com, Feb 4, 2009.

  1. Guest

    G'day

    I'm attempting to get Java to communicate to an existing application
    using sockets. My first step was getting a simple java client and echo
    server setup as a sort of hello world and introduction into java. Now
    when I modify my java server to simply print out messages received
    from the client (no sending information back out at this stage) I run
    into a problem when I use my non-java client.

    My client program sends out null terminated messages using UTF-8
    encoding. This can not be modified. Java however treats strings as
    classes and from what I could gather stores the length of the string
    in the first two bytes.

    Java blocks and does not return after my call to readUTF() on my
    socket's input stream. My input stream is declared as: is = new
    DataInputStream( new BufferedInputStream( sock.getInputStream() ));
    This led me to believe it's the internal string representation in Java
    that's causing me the trouble. I also tried reading in characters one
    at a time using is.getChar(), but that returned some Asian characters,
    i.e. Java wasn't decoding the stream as UTF-8.


    A look through the Javadocs didn't reveal any obvious way to parse
    a null-terminated string. Any help very, very, very much
    appreciated.
     
    , Feb 4, 2009
    #1

  2. Sigfried Guest

    wrote:
    > G'day
    >
    > I'm attempting to get Java to communicate to an existing application
    > using sockets. My first step was getting a simple java client and echo
    > server setup as a sort of hello world and introduction into java. Now
    > when I modify my java server to simply print out messages received
    > from the client (no sending information back out at this stage) I run
    > into a problem when I use my non-java client.
    >
    > My client program sends out null terminated messages using UTF-8
    > encoding. This can not be modified. Java however treats strings as
    > classes and from what I could gather stores the length of the string
    > in the first two bytes.
    >
    > Java blocks and does not return after my call to readUTF() of my
    > sockets input steam. My input stream is declared as: is = new
    > DataInputStream( new BufferedInputStream( sock.getInputStream() ));
    > This lead me to believe its the internal string representation in java
    > thats causing me the troubles. I also tried reading in characters one
    > at time using is.getChar() but that returned some Asian characters,
    > ie the character encoding java was using wasn't using UTF-8.
    >
    >
    > A look through the java doc's didn't reveal any obvious ways to parse
    > in a null terminated string. Any help very, very, very much
    > appericated


    By "null" you must mean "NUL", which is a 0 byte in UTF-8. When you
    find a 0 byte, the string is finished. For any other value, you may need
    to read one or more further bytes to make a full Unicode character (see
    the Character methods).
     
    Sigfried, Feb 4, 2009
    #2

  3. Lew Guest

    Sigfried wrote:
    > wrote:
    >> G'day
    >>
    >> I'm attempting to get Java to communicate to an existing application
    >> using sockets. My first step was getting a simple java client and echo
    >> server setup as a sort of hello world and introduction into java. Now
    >> when I modify my java server to simply print out messages received
    >> from the client (no sending information back out at this stage) I run
    >> into a problem when I use my non-java client.
    >>
    >> My client program sends out null terminated messages using UTF-8
    >> encoding. This can not be modified. Java however treats strings as
    >> classes and from what I could gather stores the length of the string
    >> in the first two bytes.
    >>
    >> Java blocks and does not return after my call to readUTF() of my
    >> sockets input steam. My input stream is declared as: is = new
    >> DataInputStream( new BufferedInputStream( sock.getInputStream() ));
    >> This lead me to believe its the internal string representation in java
    >> thats causing me the troubles. I also tried reading in characters one
    >> at time using is.getChar() but that returned some Asian characters,
    >> ie the character encoding java was using wasn't using UTF-8.
    >>
    >>
    >> A look through the java doc's didn't reveal any obvious ways to parse
    >> in a null terminated string. Any help very, very, very much
    >> appericated

    >
    > by "null", you must mean "nul", which is a 0 byte in utf-8. When you
    > find a 0 byte, the strings is finished. For another value, you may need
    > to read another/some others bytes to make a full unicode char (see
    > Character methods).


    And don't use a DataInputStream.
    > A data input stream lets an application read
    > *primitive Java data types* [emph. added]
    > from an underlying input stream in a machine-independent way.
    > An application uses a data output stream to write data that
    > can later be read by a data input stream.


    --
    Lew
     
    Lew, Feb 4, 2009
    #3
  4. Tom Anderson Guest

    On Wed, 4 Feb 2009, wrote:

    > I'm attempting to get Java to communicate to an existing application
    > using sockets. My first step was getting a simple java client and echo
    > server setup as a sort of hello world and introduction into java. Now
    > when I modify my java server to simply print out messages received from
    > the client (no sending information back out at this stage) I run into a
    > problem when I use my non-java client.
    >
    > My client program sends out null terminated messages using UTF-8
    > encoding.


    Okay, first question: what form does the null take? If you're talking
    UTF-8, then you must have unicode characters. There's a NUL character in
    unicode, but there are a couple of ways to encode it in UTF-8. Do you know
    which you're using?

    If the client is in C, my bet would be that it's not really a UTF-8 NUL,
    it's actually just a zero byte. Which is a UTF-8 NUL, but that's not what
    C means by it. :)
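
    To make the "couple of ways" concrete, here is a small sketch comparing
    the two byte forms of NUL that Java itself can produce. Standard UTF-8
    encodes U+0000 as the single byte 0x00, while the "modified UTF-8" used
    by DataOutputStream.writeUTF encodes it as the two bytes 0xC0 0x80. The
    class and method names are made up for illustration, and
    StandardCharsets needs Java 7 or later (on older JVMs the charset name
    "UTF-8" would be passed instead).

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class NulEncodings {
    // Standard UTF-8 bytes for s.
    static byte[] standard(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Bytes written by DataOutputStream.writeUTF, minus its two-byte length prefix.
    static byte[] modified(String s) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        new DataOutputStream(bytes).writeUTF(s);
        byte[] all = bytes.toByteArray();
        return Arrays.copyOfRange(all, 2, all.length);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(Arrays.toString(standard("\0"))); // [0]
        System.out.println(Arrays.toString(modified("\0"))); // [-64, -128], i.e. 0xC0 0x80
    }
}
```

    A C program that terminates its strings the usual way will be sending
    the first form, a plain zero byte.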

    > This can not be modified. Java however treats strings as classes and
    > from what I could gather stores the length of the string in the first
    > two bytes.


    That's not to do with java treating strings as classes, it's to do with
    the way strings are encoded by DataIn/OutputStream. That encoding is
    useful for communicating with other java programs, but not so much
    programs written in other languages.
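
    As a rough illustration of that encoding, the sketch below dumps what
    writeUTF actually puts on the wire: a two-byte big-endian length,
    then the string bytes. The class name is hypothetical. It also
    suggests why readUTF() blocks on a C client's data: if the client
    sends "hello" followed by a zero byte, readUTF() would take the first
    two bytes, 'h' and 'e' (0x68 0x65), as a length of 26725 and wait for
    that many bytes to arrive.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class WriteUtfLayout {
    // Returns the exact bytes that DataOutputStream.writeUTF produces for s.
    static byte[] encode(String s) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeUTF(s);
        out.flush();
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // For "hi": two length bytes (00 02), then 'h' and 'i' (68 69).
        for (byte b : encode("hi")) {
            System.out.printf("%02x ", b);
        }
        System.out.println();
    }
}
```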

    > Java blocks and does not return after my call to readUTF() of my sockets
    > input steam. My input stream is declared as: is = new DataInputStream(
    > new BufferedInputStream( sock.getInputStream() )); This lead me to
    > believe its the internal string representation in java thats causing me
    > the troubles.


    Sounds about right.

    > I also tried reading in characters one at time using
    > is.getChar() but that returned some Asian characters, ie the character
    > encoding java was using wasn't using UTF-8.


    Ain't no getChar() on DataInputStream. You probably mean readChar();
    readChar() doesn't do UTF-8, it just reads a whole 16-bit character from
    the stream, so it's not what you want.
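
    A minimal sketch of what probably happened, assuming the client sent
    plain ASCII text: readChar() glues two incoming bytes into one 16-bit
    char, so the bytes for "AB" (0x41 0x42) come out as U+4142, which sits
    in a CJK ideograph block. The class name and the sample bytes are
    invented for the demonstration.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

public class ReadCharDemo {
    // Feeds raw bytes to DataInputStream.readChar and returns the resulting char.
    static char firstChar(byte[] wire) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(wire));
        return in.readChar();
    }

    public static void main(String[] args) throws IOException {
        byte[] wire = {'A', 'B', 0}; // what a C client might send for "AB"
        char c = firstChar(wire);
        System.out.printf("U+%04X%n", (int) c); // prints U+4142
    }
}
```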

    > A look through the java doc's didn't reveal any obvious ways to parse in
    > a null terminated string.


    Yes, i'm not aware of one.

    Okay, here's what you do. Firstly, if you're reading characters, what you
    want is not a stream but a reader:

    http://java.sun.com/javase/6/docs/api/java/io/Reader.html

    Readers are like streams for characters. You can make one that pulls from
    a socket like this:

    Reader input = new InputStreamReader(sock.getInputStream(), "UTF-8");

    Better yet, a buffered one:

    Reader input = new BufferedReader(new InputStreamReader(sock.getInputStream(), "UTF-8"));

    You'll then have to read the null-terminated strings from it yourself.
    Which is not so hard:

    StringBuilder sb = new StringBuilder();
    while (true) {
        int ch = input.read();
        if (ch == -1) throw new EOFException();
        if (ch == 0) break; // you read a NUL
        sb.append((char) ch); // append the character just read
    }
    String str = sb.toString();

    Done!
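
    For anyone who wants to try this without a socket, here is a
    self-contained sketch of the same loop wrapped in a method. The class
    name NulReader and the method name readMessage are hypothetical, a
    ByteArrayInputStream stands in for sock.getInputStream(), and
    StandardCharsets assumes Java 7 or later. One deliberate difference
    from the loop above: it returns null when the stream ends cleanly
    between messages and only throws if it ends mid-message.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class NulReader {
    // Reads chars up to (not including) the next NUL.
    // Returns null if the stream ends before a new message starts.
    static String readMessage(Reader input) throws IOException {
        StringBuilder sb = new StringBuilder();
        while (true) {
            int ch = input.read();
            if (ch == -1) {
                if (sb.length() == 0) return null; // clean end of stream
                throw new EOFException("stream ended mid-message");
            }
            if (ch == 0) return sb.toString();     // terminator reached
            sb.append((char) ch);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for sock.getInputStream(): two NUL-terminated UTF-8 messages.
        byte[] wire = "h\u00e9llo\0world\0".getBytes(StandardCharsets.UTF_8);
        InputStream in = new ByteArrayInputStream(wire);
        Reader input = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        String msg;
        while ((msg = readMessage(input)) != null) {
            System.out.println(msg);
        }
    }
}
```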

    tom

    --
    In other news, has anyone here read Blindness? Does it get better after
    the 30 page mark, is does the whole thing read like a sentimental fairy
    tale for particularly slow children? -- Abigail
     
    Tom Anderson, Feb 4, 2009
    #4
  5. Tom Anderson wrote:

    >
    > If the client is in C, my bet would be that it's not really a UTF-8
    > NUL, it's actually just a zero byte. Which is a UTF-8 NUL, but
    > that's not
    > what C means by it. :)


    Moreover, C sends an octal 0, while Java expects decimal.
     
    Mike Schilling, Feb 4, 2009
    #5
  6. Tom Anderson Guest

    On Wed, 4 Feb 2009, Mike Schilling wrote:

    > Tom Anderson wrote:
    >
    >> If the client is in C, my bet would be that it's not really a UTF-8
    >> NUL, it's actually just a zero byte. Which is a UTF-8 NUL, but that's
    >> not what C means by it. :)

    >
    > Moreover, C sends an octal 0, while Java expects decimal.


    True!

    tom

    --
    In other news, has anyone here read Blindness? Does it get better after
    the 30 page mark, is does the whole thing read like a sentimental fairy
    tale for particularly slow children? -- Abigail
     
    Tom Anderson, Feb 4, 2009
    #6
  7. On Wed, 04 Feb 2009 20:15:55 +0000, Tom Anderson wrote:

    > On Wed, 4 Feb 2009, wrote:
    >
    >> I'm attempting to get Java to communicate to an existing application
    >> using sockets. My first step was getting a simple java client and echo
    >> server setup as a sort of hello world and introduction into java. Now
    >> when I modify my java server to simply print out messages received from
    >> the client (no sending information back out at this stage) I run into a
    >> problem when I use my non-java client.
    >>
    >> My client program sends out null terminated messages using UTF-8
    >> encoding.

    >
    > Okay, first question: what form does the null take? If you're talking
    > UTF-8, then you must have unicode characters. There's a NUL character in
    > unicode, but there are a couple of ways to encode it in UTF-8. Do you
    > know which you're using?
    >
    > If the client is in C, my bet would be that it's not really a UTF-8 NUL,
    > it's actually just a zero byte. Which is a UTF-8 NUL, but that's not
    > what C means by it. :)
    >
    >> This can not be modified. Java however treats strings as classes and
    >> from what I could gather stores the length of the string in the first
    >> two bytes.

    >
    > That's not to do with java treating strings as classes, it's to do with
    > the way strings are encoded by DataIn/OutputStream. That encoding is
    > useful for communicating with other java programs, but not so much
    > programs written in other languages.
    >
    >> Java blocks and does not return after my call to readUTF() of my
    >> sockets input steam. My input stream is declared as: is = new
    >> DataInputStream( new BufferedInputStream( sock.getInputStream() ));
    >> This lead me to believe its the internal string representation in java
    >> thats causing me the troubles.

    >
    > Sounds about right.
    >
    >> I also tried reading in characters one at time using is.getChar() but
    >> that returned some Asian characters, ie the character encoding java was
    >> using wasn't using UTF-8.

    >
    > Ain't no getChar() on DataInputStream. You probably mean read
    > readChar(); readChar() doesn't do UTF-8, it just reads a whole 16-bit
    > character from the stream, so it's not what you want.
    >
    >> A look through the java doc's didn't reveal any obvious ways to parse
    >> in a null terminated string.

    >
    > Yes, i'm not aware of one.
    >
    > Okay, here's what you do. Firstly, if you're reading characters, what
    > you want is not a stream but a reader:
    >
    > http://java.sun.com/javase/6/docs/api/java/io/Reader.html
    >
    > Readers are like streams for characters. You can make one that pulls
    > from a socket like this:
    >
    > Reader input = new InputStreamReader(sock.getInputStream(), "UTF-8");
    >
    > Better yet, a buffered one:
    >
    > Reader input = new BufferedReader(new
    > InputStreamReader(sock.getInputStream(), "UTF-8"));
    >
    > You'll then have to read the null-terminated strings from it yourself.
    > Which is not so hard:
    >
    > StringBuilder sb = new StringBuilder();
    > while (true) {
    >     int ch = input.read();
    >     if (ch == -1) throw new EOFException();
    >     if (ch == 0) break; // you read a NUL
    >     sb.append((char) ch);
    > }
    > String str = sb.toString();
    >


    I've successfully handled ASCII message-oriented connections between C
    and Java using bare InputStream and OutputStream by reading/writing byte
    arrays:

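    byte[] b = new byte[1];
    byte[] bytebuff = new byte[MAXLENGTH];
    boolean done = false;
    byte sep = 0x00;
    int lth = 0; // result of the last read; -1 means end of stream
    int n = 0;   // number of message bytes stored so far

    for (int i = 0; i < MAXLENGTH && !done && lth >= 0; i++)
    {
        lth = in.read(b);
        if (lth > 0 && b[0] != sep)
        {
            bytebuff[n] = b[0]; // store the byte at the next free position
            n++;
        }
        else
            done = true; // separator found or stream ended
    }

    String s = new String(bytebuff, 0, n);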


    where 'in' is a Socket's InputStream. I simplified the code to show the
    principle, not the exception handling etc.

    This approach works well for Java clients talking to C servers and vice
    versa. Seeing that the OP is interested in sending messages, this
    approach might suit him: the message is returned as a string, ready to be
    parsed with standard String operations.
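
    A variation on the same byte-level idea, sketched for the OP's UTF-8
    case rather than ASCII: collect bytes in a growable buffer until the
    0x00 separator, then decode the whole message with an explicit
    charset. This is safe for UTF-8 because a zero byte never appears
    inside a multi-byte sequence. The names NulBytes and readMessage are
    hypothetical, StandardCharsets assumes Java 7 or later, and 'in' would
    ideally be a BufferedInputStream around the socket's stream so that
    single-byte reads stay cheap.

```java
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class NulBytes {
    // Collects bytes up to the next 0x00 and decodes them as UTF-8.
    // Returns null if the stream ends before any byte of a new message arrives.
    static String readMessage(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1) {
            if (b == 0) {
                return new String(buf.toByteArray(), StandardCharsets.UTF_8);
            }
            buf.write(b);
        }
        if (buf.size() == 0) return null;
        throw new EOFException("stream ended before the terminator");
    }
}
```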


    --
    martin@ | Martin Gregorie
    gregorie. | Essex, UK
    org |
     
    Martin Gregorie, Feb 4, 2009
    #7
  8. Lew Guest

    Tom Anderson wrote:
    > On Wed, 4 Feb 2009, Mike Schilling wrote:
    >
    >> Tom Anderson wrote:
    >>
    >>> If the client is in C, my bet would be that it's not really a UTF-8
    >>> NUL, it's actually just a zero byte. Which is a UTF-8 NUL, but that's
    >>> not what C means by it. :)

    >>
    >> Moreover, C sends an octal 0, while Java expects decimal.

    >
    > True!


    Very funny, guys.

    --
    Lew
     
    Lew, Feb 5, 2009
    #8
  9. Lew wrote:
    > Tom Anderson wrote:
    >> On Wed, 4 Feb 2009, Mike Schilling wrote:
    >>
    >>> Tom Anderson wrote:
    >>>
    >>>> If the client is in C, my bet would be that it's not really a UTF-8
    >>>> NUL, it's actually just a zero byte. Which is a UTF-8 NUL, but
    >>>> that's not what C means by it. :)
    >>>
    >>> Moreover, C sends an octal 0, while Java expects decimal.

    >>
    >> True!

    >
    > Very funny, guys.


    And completely wrong - Java expects binary 0.

    :)

    Arne
     
    Arne Vajhøj, Feb 5, 2009
    #9
  10. Roedy Green Guest

    On Wed, 4 Feb 2009 05:21:15 -0800 (PST), wrote,
    quoted or indirectly quoted someone who said :

    >
    >My client program sends out null terminated messages using UTF-8
    >encoding. This can not be modified.


    I would do it like this:

    read in a whacking great buffer full.

    in a loop scan the bytes for a null.
    convert the subset of the byte array (start to byte before null) to
    String with UTF-8 decoding.
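
    The scan-and-convert steps above could look roughly like this. It is a
    sketch only: the class and method names are invented, StandardCharsets
    assumes Java 7 or later, and 'length' is meant to be the count
    returned by the read() call that filled the buffer.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class BufferScan {
    // Splits the first 'length' bytes of buf at each 0x00 and decodes every piece as UTF-8.
    // Bytes after the last terminator belong to an incomplete message and are not returned.
    static List<String> split(byte[] buf, int length) {
        List<String> messages = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < length; i++) {
            if (buf[i] == 0) {
                messages.add(new String(buf, start, i - start, StandardCharsets.UTF_8));
                start = i + 1;
            }
        }
        return messages;
    }
}
```

    A real reader would also have to keep the trailing bytes after the
    last null and prepend them to the next buffer, since a message can be
    split across two reads.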

    Sun's UTF encoding has a 16-bit leading count field. If your data has
    that, read it with readUTF.

    See http://mindprod.com/jgloss/conversion.html#BYTETOSTRING
    --
    Roedy Green Canadian Mind Products
    http://mindprod.com

    "Here is a point of no return after which warming becomes unstoppable
    and we are probably going to sail right through it.
    It is the point at which anthropogenic (human-caused) warming triggers
    huge releases of carbon dioxide from warming oceans, or similar releases
    of both carbon dioxide and methane from melting permafrost, or both.
    Most climate scientists think that point lies not far beyond 2°C (4°F) C hotter."
    ~ Gwynne Dyer
     
    Roedy Green, Feb 5, 2009
    #10