Java Serializing in C/C++?


James Kanze

On Feb 26, 9:30 pm, (e-mail address removed) wrote:
On Feb 25, 2:27 pm, k-e-n <[email protected]> wrote:
[...]
Obviously another thing you can add is the total message
length, again at a fixed location near the beginning of the
message.
Do you always use total message lengths? I'm of the
opinion that if communication is happening behind a single
firewall, the total message lengths can be omitted. That
assumes an organization can trust its employees. For the
most part I think that is a safe assumption, but there are
some exceptions.
Interestingly, the major use of total message lengths I've
seen is between processes communicating over a pipe. It's
mainly an optimization measure, but it can make a
significant difference, and it can make the code easier to
write as well.
I'm not sure why you describe it as an "optimization measure."
It seems to me that not calculating/sending/receiving a total
msg length is simpler if the context permits.

If you're communicating over a pipe, the channel doesn't break
the stream up into messages for you. If you have a fixed length
header with the length, you can read a message in two reads, one
for the header, and one for the rest of the message. Without
the length, you may have to read byte by byte, depending on how
you determine the end of the message. (And you must be able to
determine it, because if you attempt to read beyond the end,
you'll block until the next message.)
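
To make the two-read approach concrete, here is a minimal C++ sketch, assuming a Unix pipe and a hypothetical 4-byte big-endian length header; the framing details are illustrative, not something specified in the thread:

    #include <unistd.h>     // read()
    #include <cstddef>
    #include <vector>

    // Read exactly n bytes, looping because read() may return short
    // counts; this is what makes the "two reads" safe in practice.
    bool readFully(int fd, unsigned char* buf, std::size_t n)
    {
        std::size_t got = 0;
        while (got < n) {
            ssize_t r = read(fd, buf + got, n - got);
            if (r <= 0)
                return false;               // EOF or error
            got += std::size_t(r);
        }
        return true;
    }

    bool readMessage(int fd, std::vector<unsigned char>& body)
    {
        unsigned char header[4];
        if (!readFully(fd, header, sizeof header))  // read 1: header
            return false;
        std::size_t len = (std::size_t(header[0]) << 24)
                        | (std::size_t(header[1]) << 16)
                        | (std::size_t(header[2]) << 8)
                        |  std::size_t(header[3]);
        body.resize(len);
        return len == 0 || readFully(fd, &body[0], len);  // read 2: body
    }
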
What I got out of the thread on clc++m about denial of service
and serialization is that using a total msg length is
important from a security perspective.

I'm not sure why. I can see that someone sending overly long
messages can cause problems, but putting the message length in
the message doesn't prevent that.
You know from the msgid and version number that the message has one
high-level element - a variable length array.

One or more high-level elements. The message might contain
several variable length arrays (and other data as well). (Note
that strings are often transmitted as a variable length array of
char. So any message which contains several strings is likely
to fall into this category.)
And the length of the array is prepended to the array data as
part of the payload.

So if the interface doesn't know about the record structure of
the data (e.g. a pipe or a file), then you have to read through
the length of the first array, then read the array, then read up
through the length of the second array, etc.
I guess there is header-like info embedded in the payload the
way I think about it. It could be made part of a header but
I don't think there is anything to be gained from that.

It depends on the overall context. Even if the communications
protocol supports the definition of records, it might be useful
to allow layering; the lower level components don't know the
high level structure, and use the length field. (This is how
TCP works, for example.)
If some messages don't have variable length data there is a
little bit of unnecessary overhead in having an element count.

If all messages have the same fixed size, it's definitely
unnecessary.
 

coal

If you're communicating over a pipe, the channel doesn't break
the stream up into messages for you. If you have a fixed length
header with the length, you can read a message in two reads, one
for the header, and one for the rest of the message.
Agreed.

Without
the length, you may have to read byte by byte, depending on how
you determine the end of the message. (And you must be able to
determine it, because if you attempt to read beyond the end,
you'll block until the next message.)

I assume by "read byte by byte" you mean a call to the OS.
I agree that there shouldn't be an attempt to read more than
is in the current msg. I don't think there have to be more
read calls this way, but you have to be prepared for data from
a subsequent msg being available and some of it possibly getting
into your buffer.
I'm not sure why. I can see that someone sending overly long
messages can cause problems, but putting the message length in
the message doesn't prevent that.

A system can be built/configured with a maximum msg length that
it will accept. Then when receiving msgs, the total msg lens can be
checked against that max. I think using that technique helps to
determine if something is fishy, but it may not be perfect.
By having a limit the application can keep from going way
overboard due to a fake msg. The app could still be deceived, but
whoever is trying to cause problems would have to work harder to
accomplish his goal.
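
A minimal sketch of that check, with a hypothetical configured limit kMaxMsgLen; the point is simply to bound what a fake header can make the receiver do:

    #include <cstddef>

    const std::size_t kMaxMsgLen = 64 * 1024;   // configured maximum

    // Called after decoding the length field from a header, before
    // allocating a buffer or reading the body on trust.
    bool plausibleLength(std::size_t declared)
    {
        return declared <= kMaxMsgLen;          // anything larger is fishy
    }
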
One or more high-level elements. The message might contain
several variable length arrays (and other data as well). (Note
that strings are often transmitted as a variable length array of
char. So any message which contains several strings is likely
to fall into this category.)

The way I would do it, there would be unique msg IDs for msgs
that consist of one string, two strings, or a variable number
of strings:

MsgManager {
(string) @MSGID_1
(string, string) @MSGID_2
(vector<string>) @MSGID_3
}

Since the vector<string> msg could handle any of those msgs,
I might get rid of the first two. If the user chooses to use
total msg lengths, the header would consist of a msg ID and
the total msg len. There wouldn't be an element count as part
of the header because the msg ID determines that. MSGID_1 and MSGID_3
each have one high-level element. MSGID_2 has two.
So if the interface doesn't know about the record structure of
the data (e.g. a pipe or a file), then you have to read through
the length of the first array, then read the array, then read up
through the length of the second array, etc.

Yes, but it may be possible to "read ahead" and then just get
data from a buffer.

Brian Wood
 

Lew

MsgManager {
(string) @MSGID_1
(string, string) @MSGID_2
(vector<string>) @MSGID_3
}

Since the vector<string> msg could handle any of those msgs,
I might get rid of the first two.

Presumably you will not use the java.util.Vector class but something like
java.util.ArrayList or another modern java.util.List implementation in your
Java implementation, correct?
 

coal

Presumably you will not use the java.util.Vector class but something like
java.util.ArrayList or another modern java.util.List implementation in your
Java implementation, correct?

I don't know if we will ever have a Java implementation. I'm
interested in providing an alternative to C++ serialization
libraries. If we're successful with that, we might consider
Java support. From what little I know of Java, though, it would
be somewhat unfriendly to our approach. C++
permits a class definition to be placed in multiple files
but Java doesn't. What we generate is intended to be placed in
a file separate from hand-written code. I have no desire to
add computer-generated code to a file that contains hand-written
code. So Java is not a very high priority at this time.

Brian Wood
 

Lew

I don't know if we will ever have a Java implementation. I'm
interested in providing an alternative to C++ serialization
libraries. If we're successful with that, we might consider
Java support. From what little I know of Java, though, it would
be somewhat unfriendly to our approach. C++
permits a class definition to be placed in multiple files
but Java doesn't. What we generate is intended to be placed in
a file separate from hand-written code. I have no desire to
add computer-generated code to a file that contains hand-written
code. So Java is not a very high priority at this time.

Oh. Given the subject heading, the original post saying "with a Java
or Oracle listening for queries", and the cross-post to a Java group,
I had thought that it would be relevant, at least for the part where
you "send... it to the server and unpack the multiple results" on the
Java end.
 

Tim Smith

My client application is written in C/C++ and runs on Windows.

Then, on the server side I have Linux, with a Java or Oracle listening
for queries.

My problem is the large number of arguments going back and forth. What
I have been doing is to pass all the arguments in a colon-separated
string, or even in several lines (separated by <CR>). It has become
increasingly troublesome to keep the client and server programs in
sync.

I keep on hearing about serialization, but have never used it. This
seems to be the problem that serialization is supposed to solve.

I would like to serialize my data in C/C++ before sending it to the
server and unpack the multiple results.

Is there any package out there, written in C/C++ that will prepare the
serialized "packets" and unpack them?

I'm surprised that no one has suggested SOAP yet. SOAP provides a
client/server mechanism for a client to essentially call functions on
the server. The SOAP library handles serializing items from the
language you are writing in to a standard form for transmission over the
wire, and the SOAP library on the other end deserializes to items in the
language the server is written in. Same for things returned from the
server.

The main problem with SOAP is that SOAP in Java is, well, a complete
mess. There are many different implementations, each implementing
different subsets of the standard. They vary in their compatibility
with earlier versions (e.g., wsimport from the current version of Sun's
J2EE tools cannot generate clients from WSDL files from SOAP service
providers made with the prior version of Sun's J2EE tools. This is
annoying). The documentation of all of them is poor. (The various
Apache Java projects are particularly bad in this regard. There are
about a billion Java projects under the Apache umbrella, and the
documentation for each seems to be written with the assumption that you
are intimately familiar with all the others.)

However, if, as in your case, you don't have any earlier SOAP clients or
servers you have to be compatible with, then things aren't as bad. I
*think* that if you used whatever it is that comes with the current J2EE
download from Sun on the server end, and gsoap
(<http://www.cs.fsu.edu/~engelen/soap.html>) on the C/C++ side, you'd be
fine.
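
For a sense of the shape of that, here is a hedged sketch of the usual gsoap client workflow: an annotated interface file is fed to gsoap's soapcpp2 generator, and the client calls the generated proxy function. The ns__add operation and the endpoint URL are placeholders, not anything from the original post:

    // calc.h -- gsoap interface file, processed by soapcpp2 rather
    // than compiled directly. The ns__add operation is a placeholder.
    //gsoap ns service name: calc
    int ns__add(double a, double b, double *result);

    // client.cpp -- uses the proxy code soapcpp2 generates.
    #include <cstdio>
    #include "soapH.h"      // generated
    #include "calc.nsmap"   // generated namespace table

    int main()
    {
        struct soap soap;
        soap_init(&soap);
        double sum;
        const char* endpoint = "http://localhost:8080/calc";  // placeholder
        if (soap_call_ns__add(&soap, endpoint, NULL, 1.0, 2.0, &sum)
                == SOAP_OK)
            std::printf("sum = %g\n", sum);
        else
            soap_print_fault(&soap, stderr);
        soap_end(&soap);
        soap_done(&soap);
    }
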
 

EJP

James said:
This is news to all of those of us who do it regularly.
I was referring to Java Serialization, not to CORBA marshalling. The OP
should certainly consider using CORBA, if he has an ORB at the C++ end ...
 

Tim Smith

Do you always use total message lengths? I'm of the opinion that if
communication is happening behind a single firewall, the total message
lengths can be omitted. That assumes an organization can trust its
employees. For the most part I think that is a safe
assumption, but there are some exceptions.

Knowing the message length can help your low level code when messages
don't fit in single network packets. Knowing at the start of the
message that you need to read 267412 bytes to have a complete message
allows for a simple loop:

while I don't have 267412 bytes:
    read another packet and append to buffer
give a complete message to the high level code

Now the parser at the higher level, which actually splits the message
into its parts and processes them, doesn't have to be written to handle
partial messages.

Without the message length, you'll probably end up using some kind of
incremental parser, which is going to be more complicated.

Having a message length adds a few bytes, but I think it is worth it to
allow a simpler architecture.
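
Rendered as a hedged C++ sketch over a TCP socket (receiveExactly and the 4096-byte chunk size are illustrative choices, not anything from the post):

    #include <sys/types.h>
    #include <sys/socket.h>
    #include <cstddef>
    #include <vector>

    // Accumulate bytes until the announced message length has arrived,
    // so the higher-level parser only ever sees complete messages.
    bool receiveExactly(int sock, std::size_t want, std::vector<char>& buf)
    {
        buf.clear();
        buf.reserve(want);
        char chunk[4096];
        while (buf.size() < want) {
            std::size_t need = want - buf.size();
            ssize_t r = recv(sock, chunk,
                             need < sizeof chunk ? need : sizeof chunk, 0);
            if (r <= 0)
                return false;           // peer closed or error
            buf.insert(buf.end(), chunk, chunk + r);
        }
        return true;
    }
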
 

James Kanze

[...]
A system can be built/configured with a maximum msg length that
it will accept. Then when receiving msgs, the total msg lens can be
checked against that max. I think using that technique helps to
determine if something is fishy, but it may not be perfect.

It's a bit of additional redundant data, and so can be used to
verify coherence. But I don't quite see anything special about
it otherwise.
By having a limit the application can keep from going way
overboard due to a fake msg. The app could still be deceived,
but whoever is trying to cause problems would have to work
harder to accomplish his goal.

I don't quite see where it would make a difference. One way or
another, the application has to know the length of any given
message (supposing a message oriented protocol, of course).
Having provided that, if there is a maximum message length, then
it can (and should) reject any message over that length. I
don't see how reading the length from a header is different in
this regard from reading a message type from a header, and
knowing that that type of message has a length of n. Or even
reading the message byte by byte, and determining the end by
some sort of internal message structure. If you decide that
your code will handle messages of no more than n bytes, then any
time you get a message of more than n bytes, you reject it.
The way I would do it, there would be unique msg IDs for msgs
that consist of one string, two strings, or a variable number
of strings:
MsgManager {
(string) @MSGID_1
(string, string) @MSGID_2
(vector<string>) @MSGID_3
}
Since the vector<string> msg could handle any of those msgs, I
might get rid of the first two. If the user chooses to use
total msg lengths, the header would consist of a msg ID and
the total msg len. There wouldn't be an element count as part
of the header because the msg ID determines that. MSGID_1 and
MSGID_3 each have one high-level element. MSGID_2 has two.

But that doesn't change things, really. You now have a message
with a variable number of elements, each element having a
variable length.
Yes, but it may be possible to "read ahead" and then just get
data from a buffer.

Not from a pipe. If you ask to read 1000 bytes, the pipe will
block until there are 1000 bytes in it. If the actual message
was only 999 bytes, you won't return from the read() until the
following message is sent.
 

EJP

James said:
Not from a pipe. If you ask to read 1000 bytes, the pipe will
block until there are 1000 bytes in it. If the actual message
was only 999 bytes, you won't return from the read() until the
following message is sent.
If you're talking about java.io.Pipe or a socket this is untrue. It will
block until *something* is available and then read all the data that is
now present. Unless you're calling DataInputStream.readFully().
 

James Kanze

If you're talking about java.io.Pipe or a socket this is
untrue. It will block until *something* is available and then
read all the data that is now present. Unless you're calling
DataInputStream.readFully().

There's some confusion here, as this thread is cross posted. I
was talking about reading from a file device created with the
system call pipe(), under Unix. A lot lower level than anything
in Java.

Off hand, I don't know what java.io.Pipe is mapped to. It's
possible to determine the number of bytes available in a pipe
under Unix, and then read them, but it's still not true record
orientation---you may suddenly find two records available,
or---if the records are large enough---just part of a record.
(Unix only guarantees that writes to a pipe will be atomic up to
a certain size.) And of course, doing so requires extra
code---no problem if that code is already there, in some
library, but a bother if you have to write it yourself.
 

coal

It's a bit of additional redundant data, and so can be used to
verify coherence. But I don't quite see anything special about
it otherwise.


I don't quite see where it would make a difference. One way or
another, the application has to know the length of any given
message (supposing a message oriented protocol, of course).
Having provided that, if there is a maximum message length, then
it can (and should) reject any message over that length. I
don't see how reading the length from a header is different in
this regard from reading a message type from a header, and
knowing that that type of message has a length of n. Or even
reading the message byte by byte, and determining the end by
some sort of internal message structure. If you decide that
your code will handle messages of no more than n bytes, then any
time you get a message of more than n bytes, you reject it.

When you write "determining the end by some sort of internal
message structure," I think we are on the same page, but I
don't think it has to be read "byte by byte." (Assuming you
mean a system call when you say read.) Instead of having one
buffer used for both output and input, there could be a buffer
dedicated to sending data and another buffer used only to
receive data. If you happen to read more than is needed by
the current msg, the extra data should be used to build the
next msg. This approach requires more memory for the separate
buffers, but it doesn't have to calculate/send/recv the total
msg length for every msg.
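
A hedged sketch of that separate receive buffer; the newline framing in messageEnd() is a deliberately trivial stand-in for a real protocol's internal message structure:

    #include <sys/types.h>
    #include <sys/socket.h>
    #include <cstddef>
    #include <string>

    class RecvBuffer {
        std::string data_;   // bytes read but not yet consumed
    public:
        // Append whatever the socket currently has available.
        bool fill(int sock)
        {
            char chunk[4096];
            ssize_t r = recv(sock, chunk, sizeof chunk, 0);
            if (r <= 0)
                return false;
            data_.append(chunk, std::size_t(r));
            return true;
        }
        // Cut one complete message off the front; any extra bytes stay
        // buffered and become the start of the next message.
        bool nextMessage(std::string& msg)
        {
            std::size_t end = messageEnd(data_);
            if (end == std::string::npos)
                return false;            // current message incomplete
            msg.assign(data_, 0, end);
            data_.erase(0, end);
            return true;
        }
    private:
        // Illustrative framing only: a message ends at '\n'. A real
        // protocol would walk its internal structure instead.
        static std::size_t messageEnd(const std::string& d)
        {
            std::size_t p = d.find('\n');
            return p == std::string::npos ? p : p + 1;
        }
    };
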
But that doesn't change things, really. You now have a message
with a variable number of elements, each element having a
variable length.

Yes, but given the internal msg structure why do you need an element-
count in the msg hdr? It seems like the msg ID wouldn't have much
meaning if an element-count is also needed. Say the msg consists of
a vector with 70 strings in it, would the element-count be 1 or 70?
K-e-n didn't say anything about element-counts but just an
element-count. Are you saying that if the msg consisted of a vector<string>
and a list<int> there would be two element-counts in the header?

Brian Wood
Ebenezer Enterprises
www.webebenezer.net
 

James Kanze

[I've taken the comp.lang.java.programmer out of the
cross-posting, because my comments here are really only
relevant to the low level accesses which can only be done in
C++.]
When you write "determining the end by some sort of internal
message structure," I think we are on the same page, but I
don't think it has to be read "byte by byte." (Assuming you
mean a system call when you say read.)

I do, and yes, depending on the structure, you might be able to
read it in some number of blocks, with fewer reads than bytes.
But for something like a variable length array of variable
length strings, you're going to need at least twice as many
reads as there are strings---in fact, a few more.
Instead of having one buffer used for both output and input,
there could be a buffer dedicated to sending data and another
buffer used only to receive data. If you happen to read more
than is needed by the current msg, the extra data should be
used to build the next msg.

If you're reading from a Unix pipe, you can't attempt a read
unless you know that all of the bytes you attempt to read will
be part of the message. If you try to read 100 bytes from a
Unix pipe, your process will block until it has actually read
100 bytes (or the pipe was closed on the write side, or you get
a signal, or a couple of other things which don't concern us
here). If the message only contains 80 more bytes, then you
will block until another message has been sent.

It's possible to determine how many bytes are in the pipe (using
stat()), but that is, again, another system call, and an
additional complication.
This approach requires more memory for the separate buffers,
but it doesn't have to calculate/send/recv the total msg
length for every msg.
Yes, but given the internal msg structure why do you need an
element-count in the msg hdr? It seems like the msg ID
wouldn't have much meaning if an element-count is also needed.

If the element contains one or more variable length arrays, just
knowing the message id isn't sufficient. (But what I was
talking about putting in the header was the message length. In
bytes. So you read the header, decode the length, and read
that. And have all of the message, and nothing but the message.)
Say the msg consists of a vector with 70 strings in it, would
the element-count be 1 or 70? K-e-n didn't say anything about
element-counts but just an element-count. Are you saying
that if the msg consisted of a vector<string> and a list<int>
there would be two element-counts in the header?

I'm saying that usually, I would expect to see the message
length in the header. Along with the message id. The message
length permits me to read all of the rest of the message in one
system request. The message id defines the structure of the
message. Elements in the message which consist of a variable
number of sub-elements will likely be encoded with the number of
sub-elements, possibly recursively, although other schemes are
possible; e.g. an element which is an array will start with the
number of elements in the array, and a string will start with
the number of characters in the string. (XDR would be a simple
example of a binary encoding using this scheme.)
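
As a hedged illustration of that count-prefixed scheme (32-bit big-endian counts are an assumption here, and real XDR additionally pads data to 4-byte boundaries):

    #include <cstdint>
    #include <string>
    #include <vector>

    // Append a 32-bit count in big-endian order.
    static void putU32(std::string& out, std::uint32_t v)
    {
        out.push_back(char(v >> 24));
        out.push_back(char(v >> 16));
        out.push_back(char(v >> 8));
        out.push_back(char(v));
    }

    // An array starts with its element count; each string starts with
    // its character count.
    std::string encode(const std::vector<std::string>& strings)
    {
        std::string out;
        putU32(out, std::uint32_t(strings.size()));
        for (std::vector<std::string>::const_iterator it = strings.begin();
             it != strings.end(); ++it) {
            putU32(out, std::uint32_t(it->size()));
            out.append(*it);
        }
        return out;
    }
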
 

gpderetta

[...]
If you're reading from a Unix pipe, you can't attempt a read
unless you know that all of the bytes you attempt to read will
be part of the message. If you try to read 100 bytes from a
Unix pipe, your process will block until it has actually read
100 bytes (or the pipe was closed on the write side, or you get
a signal, or a couple of other things which don't concern us
here). If the message only contains 80 more bytes, then you
will block until another message has been sent.

Hum, the POSIX documentation about read says:

"[...]
The value returned may be less than nbyte if the number of bytes left
in the file is less than nbyte, if the read() request was interrupted
by a signal, or if the file is a pipe or FIFO or special file and has
fewer than nbyte bytes immediately available for reading. For example,
a read() from a file associated with a terminal may return one typed
line of data.
"

So it should block only if there were *no* message to read in the
first place.
Asking for more bytes than are really available won't block and simply
will return what is available.

The spec says 'may' and not 'must', so presumably a POSIX-conformant
system may legally block even if there are some bytes available,
but do real systems do that? I guess many applications would break...

I think that the trick described by Brian Wood, that is, keeping
unconsumed read data in a buffer just in case you over read, should
actually work in practice.

Unless I'm missing something of course :)
 

coal

    [I've taken the comp.lang.java.programmer out of the
    cross-posting, because my comments here are really only
    relevant to the low level accesses which can only be done in
    C++.]
When you write "determining the end by some sort of internal
message structure," I think we are on the same page, but I
don't think it has to be read "byte by byte."  (Assuming you
mean a system call when you say read.)

I do, and yes, depending on the structure, you might be able to
read it in some number of blocks, with fewer reads than bytes.
But for something like a variable length array of variable
length strings, you're going to need at least twice as many
reads as there are strings---in fact, a few more.

See below.
If you're reading from a Unix pipe, you can't attempt a read
unless you know that all of the bytes you attempt to read will
be part of the message.  If you try to read 100 bytes from a
Unix pipe, your process will block until it has actually read
100 bytes (or the pipe was closed on the write side, or you get
a signal, or a couple of other things which don't concern us
here).  If the message only contains 80 more bytes, then you
will block until another message has been sent.

It's possible to determine how many bytes are in the pipe (using
stat()), but that is, again, another system call, and an
additional complication.

Yes, I use ioctl with sockets. It's another system call, but the
complication isn't that great. The Give function in this file:
http://home.seventy7.com/Buffer.hh makes the ioctl call. Sorry
about the goto... I would guess that ioctl is a relatively fast
system call since there isn't data copying required as with read().
I also doubt it has to update much. If the network is running
smoothly, there will be plenty of bytes available and the
application can grab a chunk - maybe more than is needed by the
current msg.
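
For reference, a minimal sketch of the ioctl in question; this is a generic use of FIONREAD, not a copy of the Give function in Buffer.hh:

    #include <sys/ioctl.h>

    // Number of bytes immediately readable on fd, or -1 on error.
    // Sizing a single read()/recv() from this lets the application
    // grab everything available without blocking.
    long bytesAvailable(int fd)
    {
        int n = 0;
        if (ioctl(fd, FIONREAD, &n) < 0)
            return -1;
        return n;
    }
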
If the element contains one or more variable length arrays, just
knowing the message id isn't sufficient.  (But what I was
talking about putting in the header was the message length.  In
bytes.  So you read the header, decode the length, and read
that.  And have all of the message, and nothing but the message.)

I think that is important also. On this page:
http://home.seventy7.com/cgi-bin/samb.cgi

The user has the option of using total msg lengths or not.
(If you use the lynx browser, the default will be "Yes" to
use total msg lengths.) However, I think in some contexts,
which I mentioned earlier in the thread, it might make sense
to not calculate and use total msg lengths. I know of one
application that doesn't use total msg lengths and it works
fine without them. But some comparison of throughput of the
two approaches in other applications would be helpful.
I'm saying that usually, I would expect to see the message
length in the header.  Along with the message id.  The message
length permits me to read all of the rest of the message in one
system request.  The message id defines the structure of the
message.  

We don't have a formal msg header at this point, but the msg id is
the first item and, if used, the msg length will be the second item.
I agree that the msg id defines the structure of the msg. If you
send the Middle code I wrote a couple of posts ago:

MsgManager {
(string) @MSGID_1
(string, string) @MSGID_2
(vector<string>) @MSGID_3
}

the output will include some lines like this:

unsigned int const MSGID_1 = 4201;
unsigned int const MSGID_2 = 4202;
unsigned int const MSGID_3 = 4203;

I can't remember for sure if it starts at 4201 or another value.
Currently it isn't possible to override the start value used for
msg ids. Perhaps it should be possible for users to supply that
value?
Elements in the message which consist of a variable
number of sub-elements will likely be encoded with the number of
sub-elements, possibly recursively, although other schemes are
possible; e.g. an element which is an array will start with the
number of elements in the array, and a string will start with
the number of characters in the string.  (XDR would be a simple
example of a binary encoding using this scheme.)

I use the latter where the length of a string precedes the string
data.

Brian Wood
 

James Kanze

[...]
If you're reading from a Unix pipe, you can't attempt a read
unless you know that all of the bytes you attempt to read will
be part of the message. If you try to read 100 bytes from a
Unix pipe, your process will block until it has actually read
100 bytes (or the pipe was closed on the write side, or you get
a signal, or a couple of other things which don't concern us
here). If the message only contains 80 more bytes, then you
will block until another message has been sent.
Hum, the POSIX documentation about read says:
"[...]
The value returned may be less than nbyte if the number of bytes left
in the file is less than nbyte, if the read() request was interrupted
by a signal, or if the file is a pipe or FIFO or special file and has
fewer than nbyte bytes immediately available for reading. For example,
a read() from a file associated with a terminal may return one typed
line of data."
So it should block only if there were *no* message to read in the
first place.

It didn't work in classical Unix :-).

Pipes are fairly complicated under modern Unix, for historical
reasons; most modern Unix systems implement them using streams (that's
Unix streams, not C++ iostreams, of course), which means that an
application *can* write to them using send, and they *can* block
the stream into messages, in which case, they may (and in fact
will) always return at the end of a message. In practice, that
doesn't correspond to the usual use, however. A system might
also treat each individual "write" as a message (Linux seems
to), although this isn't required, and doesn't correspond to the
historical behavior.
Asking for more bytes than are really available won't block
and simply will return what is available.

Provided at least 1 is available, of course.

That's still not really ideal, since it may return more than one
message.
The spec say 'may' and not must, so probably a posix
conformant system may legally block even if there are some
bytes available, but do real systems do that? I guess many
applications would break...

Such as... In classical Unix, a pipe would block, and I imagine
most real systems that use pipes use them in the "classical"
way; otherwise, they'd probably use sockets.
I think that the trick described by Brian Wood, that is,
keeping unconsumed read data in a buffer just in case you over
read, should actually work in practice.

I don't know. I'll admit that I'm basing my statements on
somewhat dated experience -- in the 1980's and early 1990's,
pipes would block in such cases. Since then, the only times
I've used pipes is when one process is streaming data (no real
records) to another. Any time I've needed records, I've used
sockets or named pipes (which are a form of sockets). So maybe
I'm taking avoidance procedures for something that isn't a
problem any more.
 

coal

[...]
If you're reading from a Unix pipe, you can't attempt a read
unless you know that all of the bytes you attempt to read will
be part of the message.  If you try to read 100 bytes from a
Unix pipe, your process will block until it has actually read
100 bytes (or the pipe was closed on the write side, or you get
a signal, or a couple of other things which don't concern us
here).  If the message only contains 80 more bytes, then you
will block until another message has been sent.
Hum, the POSIX documentation about read says:
"[...]
 The value returned may be less than nbyte if the number of bytes left
in the file is less than nbyte, if the read() request was interrupted
by a signal, or if the file is a pipe or FIFO or special file and has
fewer than nbyte bytes immediately available for reading. For example,
a read() from a file associated with a terminal may return one typed
line of data."
So it should block only if there were *no* message to read in the
first place.

It didn't work in classical Unix :-).

Pipes are fairly complicated under modern Unix, for historical
reasons; most modern Unix systems implement them using streams (that's
Unix streams, not C++ iostreams, of course), which means that an
application *can* write to them using send, and they *can* block
the stream into messages, in which case, they may (and in fact
will) always return at the end of a message.  In practice, that
doesn't correspond to the usual use, however.  A system might
also treat each individual "write" as a message (Linux seems
to), although this isn't required, and doesn't correspond to the
historical behavior.
Asking for more bytes than are really available won't block
and simply will return what is available.

Provided at least 1 is available, of course.

That's still not really ideal, since it may return more than one
message.

Ideally the application won't crash and will eventually want the
data. You mentioned earlier how using the msg length can minimize
the number of reads. Reading more than the current msg length could
reduce the number of reads further in some cases. I guess that
someone has already compared the two approaches. I don't know for
sure which one is better, but my guess is the approach that doesn't
use msg lengths will, at least in some applications, have higher
throughput.

Brian Wood
 

James Kanze

On Mar 2, 5:52 am, James Kanze <[email protected]> wrote:

[...]
Ideally the application won't crash and will eventually want the
data. You mentioned earlier how using the msg length can minimize
the number of reads. Reading more than the current msg length could
reduce the number of reads further in some cases. I guess that
someone has already compared the two approaches. I don't know for
sure which one is better, but my guess is the approach that doesn't
use msg lengths will, at least in some applications, have higher
throughput.

In some cases, almost certainly. *IF* the Linux behavior is
typical for Unix pipes today, *AND* your messages are guaranteed
to be smaller than the maximum size write to a pipe that is
guaranteed to be atomic (no idea what that is today---it was 4K
the one time I looked at it, but that was in the 1980's), then
reading all that is available and processing all the messages in
it is probably the way to go. In other contexts, it probably
depends. I don't think that there is one absolute rule. (TCP
puts the message length in the header, but that's a very special
context.)
 

EJP

You seem to be agreeing with me. It wouldn't make sense for a read on a
pipe to block until all N requested bytes are available. Consider e.g.
what happens if the final piece of data is M < N. It has to return M.
 
