Receiving data from a C++ server via DataInputStream (TCP)

James Vanns

OK. Platforms are the same. Hosts are the same (localhost for
testing).
Server is written in C++, client in Java.

I have come to the end of my tether trying to 'decode' the header of
this bespoke protocol we've (my team) written. We have the client
written in C++ too and everything works hunky-dory (because you can
use pointers and casts and stuff!).

Anyway, the server sends a structure (struct) as a series of bytes:

typedef struct __header_t__ {
    static const char name[5];
    static const char split;
    static const uint version;
    static const char delimiter;
    uint command;
    static const char terminator;
} __attribute__ ((packed)) header_t;

That's what the definition is, for what it's worth. So that's 8 bytes for
the chars and 8 bytes for the ints (each int being 4 bytes on this
platform).

Now the Java client receives the correct number of bytes but I'll be
buggered if it can interpret them in the same/correct way! I know that
in Java a char is 16 bits as it uses Unicode rather than ASCII, so I
have been using bytes (still 8 bits, right?) to try and print the info
out (as a debug statement).

The first 4 bytes (chars in the C/C++ case) are a printable string,
"HELO" for example. But when I try to print the same on the Java client
it's rubbish.

Any help please! In case you haven't noticed - I'm not a Java
programmer.

Regards

Jim Vanns
 
kjc

James said:
Anyway, the server sends a structure (struct) as a series of bytes:

typedef struct __header_t__ {
    static const char name[5];
    static const char split;
    static const uint version;
    static const char delimiter;
    uint command;
    static const char terminator;
} __attribute__ ((packed)) header_t;

Are you using the C++ htonl() functions, or are you attempting to write
out raw bytes? Java will handle the reordering of the bytes peculiar to
the architecture of the machine it's running on, but you have to write
them out in a network-neutral form (network byte order, i.e. big-endian).
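To make the byte-order point concrete, here is a small Java sketch (ByteOrderDemo and its methods are made-up names for illustration). It shows that DataInputStream.readInt() always reads big-endian, whatever the host, and what an int value of 1 turns into if the sender skipped htonl() on a little-endian machine such as x86.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ByteOrderDemo {

    // DataInputStream.readInt() treats the first byte as the most significant (big-endian),
    // independent of the CPU the JVM runs on.
    static int readBigEndian(byte[] wire) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(wire))) {
            return in.readInt();
        }
    }

    // If the sender did NOT call htonl() on a little-endian host, the least significant
    // byte arrives first; ByteBuffer can decode that order explicitly.
    static int readLittleEndian(byte[] wire) {
        return ByteBuffer.wrap(wire).order(ByteOrder.LITTLE_ENDIAN).getInt();
    }

    public static void main(String[] args) throws IOException {
        byte[] withHtonl = {0x00, 0x00, 0x00, 0x01};    // the value 1 after htonl()
        byte[] withoutHtonl = {0x01, 0x00, 0x00, 0x00}; // the value 1 sent raw from x86

        System.out.println(readBigEndian(withHtonl));       // 1
        System.out.println(readBigEndian(withoutHtonl));    // 16777216
        System.out.println(readLittleEndian(withoutHtonl)); // 1
    }
}
```

A value that comes out as a suspiciously large power of 256 is usually the sign of a missing htonl().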
 
Daniel Dyer

Now the Java client receives the correct number of bytes but I'll be
buggered if it can interpret them in the same/correct way! I know that
in Java a char is 16 bits as it uses Unicode rather than ASCII, so I
have been using bytes (still 8 bits, right?) to try and print the info
out (as a debug statement).

The first 4 bytes (chars in the C/C++ case) are a printable string,
"HELO" for example. But when I try to print the same on the Java client
it's rubbish.

You should probably post the Java code that you are using. Bytes are
8-bit like you say, but they are signed (so -128 to 127 instead of 0 to
255), but that shouldn't be a problem for the characters you are using.
Have you tried printing out the numeric value of each byte and comparing
that to the original? How are you converting the bytes into a Java string?

Dan.
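A minimal sketch of the "print the numeric value of each byte" check, assuming the bytes have already been read into an array together with a count (as read() returns). The HexDump class and hex() helper are hypothetical names; masking with & 0xFF shows the unsigned value the C++ side sent rather than Java's signed view of the same byte.

```java
public class HexDump {

    // Formats the first 'length' bytes as unsigned two-digit hex, space separated.
    static String hex(byte[] data, int length) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            // Java bytes are signed (-128..127); the mask yields the 0..255 value.
            sb.append(String.format("%02X", data[i] & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] sample = {'H', 'E', 'L', 'O', (byte) 0xFF};
        System.out.println(hex(sample, sample.length)); // 48 45 4C 4F FF
        System.out.println(sample[4]);                  // -1: the same byte viewed as a signed Java byte
    }
}
```

Comparing this output with a hex dump of what the server wrote (or a sniffer trace) shows immediately whether the bytes themselves differ or only their interpretation.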
 
Daniel Dyer

Are you using the C++ htonl() functions, or are you attempting to write
out raw bytes? Java will handle the reordering of the bytes peculiar to
the architecture of the machine it's running on, but you have to write
them out in a network-neutral form (network byte order, i.e. big-endian).

The same thought occurred to me, but this would only cause a problem for
the int fields in his struct; there is only one byte per char, so no
ordering problems there.

Dan.
 
James Vanns

Yes. Sorry, forgot to mention that! Yes, I am using htonl() on the ints.
As I understand it, Java works in 'network byte order', i.e. big-endian,
anyway.

Cheers,

Jim
 
James Vanns

Ironic that it seems to be the bytes/chars I'm having problems with,
then! OK, the bit of code I am using for data retrieval/decoding is:

private String readTCPBuffer (DataInputStream input) {
    String data = null;

    try {
        int i = 0;
        int size = input.available ();
        byte bucket[] = new byte[size];

        i = input.read (bucket, 0, size);
        data = new String (bucket, 0, i);
    } catch (IOException e) {
        System.err.println (e);
        data = null;
    }

    return data;
}

Then to test the header:

private class C4DPMessageHeader {
    private static final String name = "C4DP";
    private static final byte split = '/';
    private static final short version = 1;
    private static final byte delimiter = ' ';
    private short command;
    private static final byte terminator = '\n';

    public boolean convert (String s) {
        try {
            byte[] head = s.getBytes ("US-ASCII"); // Make sure we aren't using UTF-8

            for (int i = 0 ; i < name.length () ; i++)
                System.out.println (head[i]);

            System.out.println ("Received " + s.length () + " bytes.");
        } catch (UnsupportedEncodingException e) {
            System.err.println (e);
            return false;
        }

        return true;
    }
}

The number of bytes is correct; the byte values are wrong! I should also
mention that although I first said I'm using ints in the fields, I've
recently changed those to shorts all round, both client and server
side. However, as I said, it's the bytes I'm having problems with.

Cheers,

Jim
 
Rogan Dawes

James said:
Now the Java client receives the correct number of bytes but I'll be
buggered if it can interpret them in the same/correct way! I know that
in Java a char is 16 bits as it uses Unicode rather than ASCII, so I
have been using bytes (still 8 bits, right?) to try and print the info
out (as a debug statement).

The first 4 bytes (chars in the C/C++ case) are a printable string,
"HELO" for example. But when I try to print the same on the Java client
it's rubbish.

My first suggestion would be to NOT use a DataInputStream, but rather a
plain InputStream. I'm thinking that the DataInputStream may be doing
some marshalling of the data, which you don't really want it to.

Try something like:

InputStream is = socket.getInputStream();
byte[] buff = new byte[1024];
int got = is.read(buff);
for (int i=0; i<got; i++) {
    System.out.println("Byte["+i+"] is '" + buff[i] + "'");
}

and compare that with what you get using the DataInputStream.

Rogan
 
Daniel Dyer

Ironic that it seems to be the bytes/chars I'm having problems with,
then! OK, the bit of code I am using for data retrieval/decoding is:

private String readTCPBuffer (DataInputStream input) {
    String data = null;

    try {
        int i = 0;
        int size = input.available ();
        byte bucket[] = new byte[size];

        i = input.read (bucket, 0, size);
        data = new String (bucket, 0, i);
    } catch (IOException e) {
        System.err.println (e);
        data = null;
    }

    return data;
}

I'm a bit confused by this method. Are you trying to read the whole
header (chars and int values) into a single String? That doesn't seem
like a good idea, though I'm not sure why the initial chars wouldn't
display correctly (unless the ints are being converted into control
characters or something to mess things up when you print the string).

I would do something like this (based on the typedef you posted earlier):


try
{
    // Assumes socket is where you are getting your data.
    // Buffered wrapper is for performance.
    DataInputStream dataInput = new DataInputStream(new BufferedInputStream(socket.getInputStream()));

    // Read each part of the header into an appropriate variable.
    // ints may cause problems if the values are too big because the Java type
    // is signed. Maybe use longs instead.
    String header = new String(dataInput.readBytes(new byte[5]), 0, 5); // 5 being the size of your string.
    char split = (char) dataInput.readByte();
    int version = dataInput.readInt();
    char delimiter = (char) dataInput.readByte();
    int command = dataInput.readInt();
    char terminator = (char) dataInput.readByte();
}
catch (IOException ex)
{
    // An EOFException or similar here means your source data doesn't have
    // enough bytes.
}

You could maybe wrap all these fields up in a class to represent your
header (as close to a struct as you are going to get in Java). Casting
bytes to chars is making an assumption about the character encoding (so if
this doesn't work, that might be something to investigate).

Dan.
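A sketch of such a class, assuming the server really puts all 16 bytes on the wire in declaration order, packed, with both uint fields passed through htonl() and name holding four characters plus a NUL. MessageHeader and its members are hypothetical names. It uses readFully() so a partially arrived header is not mistaken for a whole one, readUnsignedByte() so bytes above 0x7F do not sign-extend into odd chars, and longs so the unsigned ints stay positive.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public final class MessageHeader {
    // Assumed wire size: name[5] + split + version + delimiter + command + terminator.
    static final int SIZE = 5 + 1 + 4 + 1 + 4 + 1; // 16

    final String name;      // NUL padding stripped
    final char split;
    final long version;     // uint32 held in a long
    final char delimiter;
    final long command;     // uint32 held in a long
    final char terminator;

    private MessageHeader(String name, char split, long version,
                          char delimiter, long command, char terminator) {
        this.name = name;
        this.split = split;
        this.version = version;
        this.delimiter = delimiter;
        this.command = command;
        this.terminator = terminator;
    }

    static MessageHeader read(DataInputStream in) throws IOException {
        byte[] raw = new byte[5];
        in.readFully(raw); // waits for all 5 bytes, or throws EOFException
        int len = 0;
        while (len < raw.length && raw[len] != 0) {
            len++;         // stop at the C string's terminating NUL
        }
        String name = new String(raw, 0, len, StandardCharsets.US_ASCII);
        char split = (char) in.readUnsignedByte();
        long version = in.readInt() & 0xFFFFFFFFL; // big-endian, as written by htonl()
        char delimiter = (char) in.readUnsignedByte();
        long command = in.readInt() & 0xFFFFFFFFL;
        char terminator = (char) in.readUnsignedByte();
        return new MessageHeader(name, split, version, delimiter, command, terminator);
    }

    public static void main(String[] args) throws IOException {
        // Hand-built sample of the 16 bytes the server is assumed to send.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.write(new byte[] {'H', 'E', 'L', 'O', 0});
        out.writeByte('/');
        out.writeInt(1);
        out.writeByte(' ');
        out.writeInt(42);
        out.writeByte('\n');

        MessageHeader h = read(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
        System.out.println(h.name + " v" + h.version + " command=" + h.command); // HELO v1 command=42
    }
}
```

Against a real socket it would be called as MessageHeader.read(new DataInputStream(new BufferedInputStream(socket.getInputStream()))).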
 
Daniel Dyer

char split = (char) dataInput.readByte();

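Sorry, not thinking straight. I was about to say you don't need an
explicit cast from byte to char, but you do: Java has no implicit
byte-to-char conversion, so

char split = dataInput.readByte();

will not compile. Keep the cast shown above.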

Dan.
 
Esmond Pitt

Rogan said:
James said:
Anyway, the server sends a structure (struct) as a series of bytes:

typedef struct __header_t__ {
    static const char name[5];
    static const char split;
    static const uint version;
    static const char delimiter;
    uint command;
    static const char terminator;
} __attribute__ ((packed)) header_t;

Never send 'C' types, and *especially* structs, over the wire. You are
up against Unicode issues for chars, byte ordering for shorts, ints, and
longs, and compiler-dependent (and often compiler-option-dependent)
padding rules for structs.

Use htonl() and friends:

A short written by htons() can be read by DataInputStream.readShort().
A 32-bit value (a 'long' in the old C API) written by htonl() can be read by DataInputStream.readInt().
A byte can be read by DataInputStream.readByte().
Don't try to read UTF (readUTF()) unless you are sending it, which you aren't.
My first suggestion would be to NOT use a DataInputStream, but rather a
plain InputStream. I'm thinking that the DataInputStream may be doing
some marshalling of the data, which you don't really want it to.

No, it doesn't do any marshalling, whatever that may mean; it just reads
bytes from the stream and assembles them into higher-order types.
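As a sketch of what "assembles them into higher-order types" amounts to, the helper below (a hypothetical name, not part of the JDK) combines four bytes most-significant first, which gives the same result as readInt() on those bytes. The & 0xFF masks matter: without them a negative byte would sign-extend over the higher bits.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

public class AssembleInt {

    // Big-endian assembly of four bytes into an int, as readInt() does.
    static int assemble(byte b0, byte b1, byte b2, byte b3) {
        return ((b0 & 0xFF) << 24) | ((b1 & 0xFF) << 16) | ((b2 & 0xFF) << 8) | (b3 & 0xFF);
    }

    public static void main(String[] args) throws IOException {
        byte[] wire = {0x12, 0x34, 0x56, (byte) 0x9A};
        int manual = assemble(wire[0], wire[1], wire[2], wire[3]);
        int viaStream = new DataInputStream(new ByteArrayInputStream(wire)).readInt();
        System.out.println(Integer.toHexString(manual) + " " + Integer.toHexString(viaStream)); // 1234569a 1234569a
    }
}
```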
 
Thomas Weidenfeller

James said:
Anyway, the server sends a structure (struct) as a series of bytes:

typedef struct __header_t__ {
    static const char name[5];
    static const char split;
    static const uint version;
    static const char delimiter;
    uint command;
    static const char terminator;
} __attribute__ ((packed)) header_t;

That's what the definition is, for what it's worth. So that's 8 bytes for
the chars and 8 bytes for the ints (each int being 4 bytes on this
platform).

Note: if your C++ compiler does word alignment (most do, for good reasons),
then you have a bunch of padding bytes on the wire between the entries
of the struct. You had also better send the data in network byte order,
because that's what DataInputStream reads.

As a general suggestion when having problems with a protocol, get a
network sniffer like Ethereal, snoop, etc., and check what is on the
wire, bit for bit, first, instead of guessing what is really going on.

Oh, and since you claim you do C++, you don't need the typedef, as you
would in C.
but I'll be
buggered if it can interpret them in the same/correct way! I know that
in Java a char is 16 bits as it uses Unicode rather than ASCII, so I
have been using bytes (still 8 bits, right?) to try and print the info
out (as a debug statement).

Show us code. It should contain something like

char c = (char)yourDataInputStream.readUnsignedByte();

for the characters. For the uint (unsigned int) you will have a general
problem in Java, since all integer data types in Java are signed (char
aside), and the DataInputStream class doesn't have an equivalent of
readUnsignedByte() for reading unsigned ints. You will either have to read
signed ints and carefully convert them to long, or read the bytes and
assemble a long yourself.

/Thomas
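A minimal sketch of the first option, assuming the uint arrives as four big-endian bytes. readUnsignedInt is a hypothetical helper name (DataInputStream has no such method). For 16-bit fields, DataInputStream.readUnsignedShort() does exist and returns the value as an int from 0 to 65535.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

public class UnsignedRead {

    // Reads a big-endian uint32 and returns it as a non-negative long.
    static long readUnsignedInt(DataInputStream in) throws IOException {
        // readInt() gives the signed view; masking with a long constant (note the L)
        // discards the sign extension. Integer.toUnsignedLong() does the same on Java 8+.
        return in.readInt() & 0xFFFFFFFFL;
    }

    public static void main(String[] args) throws IOException {
        byte[] wire = {(byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFE};
        int signed = new DataInputStream(new ByteArrayInputStream(wire)).readInt();
        long unsigned = readUnsignedInt(new DataInputStream(new ByteArrayInputStream(wire)));
        System.out.println(signed);   // -2
        System.out.println(unsigned); // 4294967294
    }
}
```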
 
James Vanns

In reply to Esmond's post: perhaps you didn't read my update confirming
that I use all those C functions (of course!).

With regard to struct padding, the GNU Compiler Collection (GCC) offers
this directive:

__attribute__ ((packed)) header_t;

This ensures structs are not padded, making sizeof(mystruct) just the
sum of all the bytes within the struct. I realise this is compiler
dependent, but that isn't a problem at the moment and AFAIK has
nothing to do with the problem I am posting about.

Regards

Jim
 
Chris Uppal

James said:
typedef struct __header_t__ {
    static const char name[5];
    static const char split;
    static const uint version;
    static const char delimiter;
    uint command;
    static const char terminator;
} __attribute__ ((packed)) header_t;

Unless I've forgotten more C++ than I thought, this struct is likely to be 4
bytes long. You can, and should, verify this by printing out
'sizeof(header_t)'.

'static' in a struct means exactly the same as it does in a class (a C++ class
is just the same thing as a C++ struct -- only the default privacy changes), so
the static elements are not members of each individual struct, but are 'shared'
between all of them. So if, as I assume, you are filling an instance of
header_t, and then dumping it to the socket, most of your code will be "filling
in" the shared static elements, and then only 'command' will actually be
written to the network.

However, just fixing the struct definition isn't the right way to fix this. As
others have already said, you should (in your C++ code) write out the data as
bytes, not by just dumping a struct to the network -- otherwise you are making
problems for yourself decoding a format that you don't actually understand.
You are correct to use the ((packed)) modifier, at least you are if that
non-standard bit of compiler-dependent syntax does what I guess it does. But
does it? Also, even if it does, what happens about rounding the /total/ size
of the struct? Does it get rounded up at all? Rounded up to a multiple of 2?
Rounded up to a multiple of 4? Rounded up to a multiple of 8? I don't
know, and I doubt if you do either (just from reading the code). So you are
unlikely to know what's being written to the network. Fix that /before/ you
try writing a decoder; surely that's obvious?

A related point is that if this struct is only being used as a "template" for
the format of data on-the-wire, then that's a bad idea (as I've said); but if
you are /also/ using it in your main program, then by declaring it "packed" you
are ensuring that most of the fields are (or would be, once you've fixed the
'static' problem) very badly aligned. That won't be helping your performance
any.

-- chris
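For illustration, here is what "write out the data as bytes" could look like, sketched in Java to match the client (a C++ server would do the same thing with explicit writes and htonl()). It assumes the 16-byte layout from the original post, with name as four ASCII characters plus NUL padding; HeaderEncoder.encode is a hypothetical name. Because every field is written explicitly, the size on the wire is exactly what the code says: no padding, and nothing silently dropped because it was declared static. Its output could also serve as test input for a Java decoder without needing the server.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class HeaderEncoder {

    static byte[] encode(String name, char split, int version,
                         char delimiter, int command, char terminator) throws IOException {
        // Fixed 5-byte field, NUL-padded like char name[5] (assumed: at most 4 chars + NUL).
        byte[] ascii = name.getBytes(StandardCharsets.US_ASCII);
        if (ascii.length > 4) {
            throw new IllegalArgumentException("name must fit in 4 bytes plus NUL");
        }
        byte[] nameField = new byte[5];
        System.arraycopy(ascii, 0, nameField, 0, ascii.length);

        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.write(nameField);
        out.writeByte(split);
        out.writeInt(version);   // DataOutputStream writes big-endian, like htonl()
        out.writeByte(delimiter);
        out.writeInt(command);
        out.writeByte(terminator);
        out.flush();
        return buf.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] wire = encode("HELO", '/', 1, ' ', 42, '\n');
        System.out.println(wire.length + " bytes"); // 16 bytes
    }
}
```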
 
Daniel Dyer

In reply to Daniel Dyer's post: DataInputStream, according to Sun's Java
API reference, has no method called readBytes, certainly not one that
returns a byte array.


Yes, I'm not sure where that came from, sorry, you should use:

read(byte[] b, int off, int len)

So replace my made up stuff with something like this:

byte[] bytes = new byte[5];
int noOfBytes = dataInput.read(bytes, 0, 5);
// Check number of bytes read here.

Should still work; my point was to make sure you don't read the whole
struct into a String.


Dan.
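A sketch of why the "check number of bytes read" step matters: on a socket, read(byte[], int, int) may return fewer bytes than requested, while DataInputStream.readFully() keeps reading until the array is full and throws EOFException if the stream ends first. readExactly is a hypothetical helper name.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;

public class ReadExactly {

    // Returns exactly n bytes, or throws EOFException if the stream ends before that.
    static byte[] readExactly(DataInputStream in, int n) throws IOException {
        byte[] bytes = new byte[n];
        in.readFully(bytes, 0, n);
        return bytes;
    }

    public static void main(String[] args) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(new byte[] {'H', 'E', 'L', 'O', 0}));
        System.out.println(readExactly(in, 5).length); // 5

        DataInputStream truncated = new DataInputStream(new ByteArrayInputStream(new byte[] {'H', 'E'}));
        try {
            readExactly(truncated, 5);
        } catch (EOFException e) {
            System.out.println("stream ended before 5 bytes arrived");
        }
    }
}
```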
 
James Vanns

In reply to Chris Uppal:

I think the phrase rhymes with "Clucking Bell"!! Damn it, damn it, damn
it! Well, they say you learn something new every day, and this was it!
My understanding of sizeof, byte-order conversions (for networking) and
struct padding was fine! However, you hit the nail smack on the head
with 'static'. A serious misunderstanding on my part of its use within
data structures! Changed that and all is good :) It wasn't the Java
after all! 'Twas my sucky C++ code :-( Well, at least I now know.
Thanks, Chris.

Regards

Jim Vanns (slightly embarrassed!)
 
Esmond Pitt

James said:
In reply to Esmond's post: perhaps you didn't read my update confirming
that I use all those C functions (of course!).

With regard to struct padding, the GNU Compiler Collection (GCC) offers
this directive:

__attribute__ ((packed)) header_t;

This ensures structs are not padded, making sizeof(mystruct) just the
sum of all the bytes within the struct. I realise this is compiler
dependent, but that isn't a problem at the moment and AFAIK has
nothing to do with the problem I am posting about.

OK, Jim, now what are all the 'static' attributes for? The memory for
these is allocated once and outside the boundaries of any instance of
the header. All you are transmitting is the non-static attributes. I
hope that's all you are expecting to receive.
 
