Portability and marshalling integral data

Discussion in 'C++' started by Brian, Nov 24, 2009.

  1. Brian

    Brian Guest

    In the past there's been discussion about how using memcpy
    to assist marshalling integral data should be avoided due to
    portability reasons. From some very limited testing, the more
    portable approach that uses bit shifting to output/marshall
    integral data is roughly 10 to 15% slower than using memcpy.

    I'm wondering if it is possible to set things up so that the
    more portable functions are used in a more limited way than
    what I've seen suggested here. For example if a server
    has both big endian and little endian (Intel) clients, would
    it work to have the server send a "big endian" stream to
    the big endian clients and a "little endian" stream to
    Intel clients -- then both of those types of clients could
    use memcpy to read/demarshall integral data types?

    I believe it is possible to use memcpy without having
    alignment problems. I know it is possible to show
    examples of alignment problems using memcpy, but I
    don't think those are relevant to marshalling of data
    structures that a compiler has set up to its liking.

    What am I missing? Something to do with compiler
    flags?
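
    The two approaches under discussion can be sketched as follows.
    This is a minimal illustration, not the poster's actual code: the
    bit-shifting helpers produce a fixed wire order regardless of the
    host, while the memcpy version just dumps the host representation.

    ```cpp
    #include <cstdint>
    #include <cstring>

    // Portable: emit a 32-bit value in big-endian order with shifts,
    // independent of the host's native byte order.
    void write_be32(unsigned char* out, std::uint32_t v) {
        out[0] = static_cast<unsigned char>(v >> 24);
        out[1] = static_cast<unsigned char>(v >> 16);
        out[2] = static_cast<unsigned char>(v >> 8);
        out[3] = static_cast<unsigned char>(v);
    }

    std::uint32_t read_be32(const unsigned char* in) {
        return (std::uint32_t(in[0]) << 24) | (std::uint32_t(in[1]) << 16)
             | (std::uint32_t(in[2]) << 8)  |  std::uint32_t(in[3]);
    }

    // Non-portable: copy the host representation as-is; only valid when
    // sender and receiver share byte order and integer size.
    void write_native32(unsigned char* out, std::uint32_t v) {
        std::memcpy(out, &v, sizeof v);
    }
    ```

    The memcpy variant is what the clients would use if the server
    always sent data already in each client's native order.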


    Brian Wood
    http://webEbenezer.net
    Brian, Nov 24, 2009
    #1

  2. Brian

    Brian Guest

    On Nov 24, 4:30 pm, Paavo Helde <> wrote:
    > Brian <> wrote in news:2ff925d9-9d6f-4285-b0d2-
    > :


    >
    > > I'm wondering if it is possible to set things up so that the
    > > more portable functions are used in a more limited way than
    > > what I've seen suggested here.  For example if a server
    > > has both big endian and little endian (Intel) clients, would
    > > it work to have the server send a "big endian" stream to
    > > the big endian clients and a "little endian" stream to
    > > Intel clients -- then both of those types of clients could
    > > use memcpy to read/demarshall integral data types?

    >
    > Are you seriously claiming that a data stream can be pumped over the
    > network faster than the client can do some bitshifting?
    >


    No, I'm saying it seems like there's a way to eliminate the
    need for bit shifting when reading data. It isn't clear to
    me what (portability) is lost by doing it that way.


    Brian Wood
    http://webEbenezer.net
    Brian, Nov 24, 2009
    #2

  3. Brian

    James Kanze Guest

    On Nov 24, 10:41 pm, Brian <> wrote:
    > On Nov 24, 4:30 pm, Paavo Helde <> wrote:


    > > Brian <> wrote in news:2ff925d9-9d6f-4285-b0d2-
    > > :


    > > > I'm wondering if it is possible to set things up so that
    > > > the more portable functions are used in a more limited way
    > > > than what I've seen suggested here. For example if a
    > > > server has both big endian and little endian (Intel)
    > > > clients, would it work to have the server send a "big
    > > > endian" stream to the big endian clients and a "little
    > > > endian" stream to Intel clients -- then both of those
    > > > types of clients could use memcpy to read/demarshall
    > > > integral data types?


    > > Are you seriously claiming that a data stream can be pumped
    > > over the network faster than the client can do some
    > > bitshifting?


    > No, I'm saying it seems like there's a way to eliminate the
    > need for bit shifting when reading data. It isn't clear to me
    > what (portability) is lost by doing it that way.


    It's possible for the server to systematically send and receive
    data in the internal format of the client, yes. Provided it
    knows this format (the client tells it). Does it buy you
    anything? I'm not really sure.

    The issue is particularly important with regards to floating
    point; if the protocol specifies a format with a fixed maximum
    precision (e.g. IEEE double), and the machines at both ends
    both support higher precision, then information is lost
    unnecessarily.

    In this regard, you might be interested in the BER encoding of
    RealType. In the case of BER, the goal is not to lose
    precision unless necessary, so the rule is always to use the
    sender's format, with the receiver interpreting it. In
    practice, I've never really seen it fully implemented, because
    handling multiple formats (even if only on the read side) is so
    complicated. In your case, you seem to be only addressing byte
    ordering---I've seen several protocols already which support two
    different byte orderings, and it's not that complicated. But
    IMHO, it still doesn't buy you that much.
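
    The read side of such a two-byte-order protocol might look roughly
    like this; a hypothetical sketch, with the order signalled by a
    flag the stream negotiates up front:

    ```cpp
    #include <cstdint>

    // Decode a 32-bit value from a stream whose byte order was agreed
    // at connection time; only the receiver needs to branch on it.
    std::uint32_t read32(const unsigned char* p, bool big_endian) {
        if (big_endian)
            return (std::uint32_t(p[0]) << 24) | (std::uint32_t(p[1]) << 16)
                 | (std::uint32_t(p[2]) << 8)  |  std::uint32_t(p[3]);
        return (std::uint32_t(p[3]) << 24) | (std::uint32_t(p[2]) << 16)
             | (std::uint32_t(p[1]) << 8)  |  std::uint32_t(p[0]);
    }
    ```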

    --
    James Kanze
    James Kanze, Nov 25, 2009
    #3
  4. Brian

    James Kanze Guest

    On Nov 24, 11:14 pm, Paavo Helde <> wrote:
    > Brian <> wrote in news:568c275d-6c21-4bc7-95e4-
    > :
    > > On Nov 24, 4:30 pm, Paavo Helde <> wrote:
    > >> Brian <> wrote in news:2ff925d9-9d6f-4285-b0d2-
    > >> :


    > >> > I'm wondering if it is possible to set things up so that
    > >> > the more portable functions are used in a more limited
    > >> > way than what I've seen suggested here. For example if a
    > >> > server has both big endian and little endian (Intel)
    > >> > clients, would it work to have the server send a "big
    > >> > endian" stream to the big endian clients and a "little
    > >> > endian" stream to Intel clients -- then both of those
    > >> > types of clients could use memcpy to read/demarshall
    > >> > integral data types?


    > >> Are you seriously claiming that a data stream can be pumped
    > >> over the network faster than the client can do some
    > >> bitshifting?


    > > No, I'm saying it seems like there's a way to eliminate the
    > > need for bit shifting when reading data. It isn't clear to
    > > me what (portability) is lost by doing it that way.


    > When adding a new client platform with 36 or 64 bit ints, or
    > 32-bit ints in shuffled byte order, for example, one has to
    > alter the server code for preparing the data for them. Also
    > the client has to communicate its exact data model to the
    > server, and the server has to understand it. This makes both
    > the client and server more complicated.


    > Of course, this can be done, but the goal has to be stated.
    > The performance increase probably is not measurable (does not
    > exceed the network speed fluctuations), so it does not qualify
    > as a valid goal. There might be no performance increase at
    > all if the server and client are using different data models
    > as the conversion has to be done somewhere anyway.


    There is one non-performance motivation which might come into
    play: if you expect a lot of different people, some very naive,
    writing client code. During the connection, you have them send
    a certain number of values in an "int", and you interpret this
    int to determine native format, and adapt (or reject the
    connection because you don't understand the format). In this
    case, by adapting to the client format, you can work even if the
    authors of the client code are relatively naive, and don't
    really understand the issues. (On the other hand, most such
    programmers are probably programming for a Windows platform, so
    just adopting Intel format might be sufficient.)
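
    The handshake described above, where the server infers the
    client's native layout from a known value, could be sketched like
    this (detect_order and the probe value are hypothetical names for
    illustration, not from any actual protocol):

    ```cpp
    #include <cstdint>

    enum class ClientOrder { Big, Little, Unknown };

    // The client sends the known value 0x01020304 in its native
    // layout; the server inspects the bytes to infer its byte order.
    ClientOrder detect_order(const unsigned char bytes[4]) {
        if (bytes[0] == 0x01 && bytes[3] == 0x04) return ClientOrder::Big;
        if (bytes[0] == 0x04 && bytes[3] == 0x01) return ClientOrder::Little;
        return ClientOrder::Unknown;  // e.g. a middle-endian layout
    }
    ```

    A server that gets Unknown back would reject the connection, as
    suggested above.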

    --
    James Kanze
    James Kanze, Nov 25, 2009
    #4
  5. Brian

    dragan Guest

    Brian wrote:
    > In the past there's been discussion about how using memcpy
    > to assist marshalling integral data should be avoided due to
    > portability reasons. From some very limited testing, the more
    > portable approach that uses bit shifting to output/marshall
    > integral data is roughly 10 to 15% slower than using memcpy.
    >
    > I'm wondering if it is possible to set things up so that the
    > more portable functions are used in a more limited way than
    > what I've seen suggested here. For example if a server
    > has both big endian and little endian (Intel) clients, would
    > it work to have the server send a "big endian" stream to
    > the big endian clients and a "little endian" stream to
    > Intel clients -- then both of those types of clients could
    > use memcpy to read/demarshall integral data types?
    >
    > I believe it is possible to use memcpy without having
    > alignment problems. I know it is possible to show
    > examples of alignment problems using memcpy, but I
    > don't think those are relevant to marshalling of data
    > structures that a compiler has set up to its liking.
    >
    > What am I missing? Something to do with compiler
    > flags?
    >
    >


    What you are doing is specifying a protocol. IP, for example, specifies (the
    wrong ;) ) byte order for packet information and such, hence all those
    htonl-like macros. If you say the spec is little endian, then whatever the
    other end of the wire is doesn't matter: THEY have to convert the incoming
    data to conform to your protocol specification. There is more than just
    endianness to consider, of course. You have to maintain the spec with what
    you send out by controlling it with whatever you have to, like compiler
    pragmas to control alignment and padding.
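
    The padding point is worth making concrete. On common ABIs a
    struct like the following occupies more bytes than its fields sum
    to, which is why memcpy'ing a whole struct onto the wire also
    sends unspecified padding bytes (a minimal sketch, not dragan's
    code):

    ```cpp
    #include <cstdint>
    #include <cstddef>

    // The compiler may insert padding after 'tag' so that 'value'
    // lands on its natural alignment; a raw memcpy of the whole
    // struct would then transmit those padding bytes too.
    struct Msg {
        std::uint8_t  tag;    // 1 byte
        std::uint32_t value;  // 4 bytes, typically 4-byte aligned
    };

    // On typical 32/64-bit ABIs, sizeof(Msg) is 8, not 5: three
    // padding bytes sit between 'tag' and 'value'.
    ```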
    dragan, Nov 25, 2009
    #5
  6. Brian

    James Kanze Guest

    On Nov 25, 11:45 am, "dragan" <> wrote:
    > Brian wrote:
    > > In the past there's been discussion about how using memcpy
    > > to assist marshalling integral data should be avoided due to
    > > portability reasons. From some very limited testing, the more
    > > portable approach that uses bit shifting to output/marshall
    > > integral data is roughly 10 to 15% slower than using memcpy.


    > > I'm wondering if it is possible to set things up so that the
    > > more portable functions are used in a more limited way than
    > > what I've seen suggested here. For example if a server has
    > > both big endian and little endian (Intel) clients, would it
    > > work to have the server send a "big endian" stream to the
    > > big endian clients and a "little endian" stream to Intel
    > > clients -- then both of those types of clients could use
    > > memcpy to read/demarshall integral data types?


    > > I believe it is possible to use memcpy without having
    > > alignment problems. I know it is possible to show examples
    > > of alignment problems using memcpy, but I don't think those
    > > are relevant to marshalling of data structures that a
    > > compiler has set up to its liking.


    > > What am I missing? Something to do with compiler flags?


    > What you are doing is specifying a protocol. IP, for example,
    > specifies (the wrong ;) ) byte order for packet information
    > and such, hence all those htonl-like macros.


    There is no right or wrong (although I too prefer little
    endian). IP is big endian because all of the backbones of the
    network (where such protocols are developed) and the machines on
    which large scale servers are run are all big endian. (Also,
    htonl was an early hack, before all of the issues were
    understood. I wouldn't expect to see it in modern code.)

    > If you say the spec is little endian, then whatever the other
    > end of the wire is doesn't matter: THEY have to convert the
    > incoming data to conform to your protocol specification.


    Well, usually you negotiate with your clients, and find a
    consensus:). But yes, there is (or should be) a protocol
    specification which specifies exactly how an integral value
    should be represented (byte order, but also size and
    representation). And everyone has to convert incoming data to
    conform to it, and generate it correctly in outgoing data.

    > There is more than just endianness to consider, of course. You
    > have to maintain the spec with what you send out by
    > controlling it with whatever you have to, like compiler
    > pragmas to control alignment and padding.


    You typically can't control it enough just by flipping a few
    compiler switches.

    --
    James Kanze
    James Kanze, Nov 25, 2009
    #6
  7. Brian

    Brian Guest

    On Nov 25, 4:10 am, James Kanze <> wrote:
    > On Nov 24, 10:41 pm, Brian <> wrote:
    >
    >
    > > No, I'm saying it seems like there's a way to eliminate the
    > > need for bit shifting when reading data.  It isn't clear to me
    > > what (portability) is lost by doing it that way.

    >
    > It's possible for the server to systematically send and receive
    > data in the internal format of the client, yes.  Provided it
    > knows this format (the client tells it).  Does it buy you
    > anything?  I'm not really sure.
    >


    Since the task is handled by the server, the clients have an
    easier time of it. Presumably the server developer wants
    to have clients, so it seems to me a small step to make the
    service more attractive. The expectation is for servers to
    be built with ample hardware so this just continues that.
    (I think the numbers I wrote in the original post are a
    little high. After checking into this more I would say that
    the bit shifting version is generally less than 11% slower
    than the memcpy version for writing/output. It would be a
    good idea to check what the difference is like for reading
    as well, but for now I'm guessing it is similar.)


    > The issue is particularly important with regards to floating
    > point; if the protocol specifies a format with a fixed maximum
    > precision (e.g. IEEE double), and the machines at both ends
    > both support higher precision, then information is lost
    > unnecessarily.


    That's interesting. I'm working on getting things working
    with integers first. After that I hope to work on floating
    point issues. It may not work to have the same sort of
    principle for floating point as for integers, but I don't
    think that will affect the integer approach.


    Brian Wood
    http://webEbenezer.net
    Brian, Nov 25, 2009
    #7
  8. Brian

    Brian Guest

    On Nov 25, 1:07 pm, Brian <> wrote:
    > On Nov 25, 4:10 am, James Kanze <> wrote:
    >
    > > On Nov 24, 10:41 pm, Brian <> wrote:

    >
    > > > No, I'm saying it seems like there's a way to eliminate the
    > > > need for bit shifting when reading data.  It isn't clear to me
    > > > what (portability) is lost by doing it that way.

    >
    > > It's possible for the server to systematically send and receive
    > > data in the internal format of the client, yes.  Provided it
    > > knows this format (the client tells it).  Does it buy you
    > > anything?  I'm not really sure.

    >
    > Since the task is handled by the server, the clients have an
    > easier time of it.  Presumably the server developer wants
    > to have clients, so it seems to me a small step to make the
    > service more attractive.  The expectation is for servers to
    > be built with ample hardware so this just continues that.


    I guess what I'm considering wouldn't do that as much as
    possible, since both clients and servers would be possibly
    formatting the data with bit shifts when sending. Some of
    it is just the simplicity of the implementation -- all
    reads/input are handled the same way. That helps me to
    think about it without going crazy.


    Brian Wood
    http://webEbenezer.net
    Brian, Nov 25, 2009
    #8
  9. Brian

    dragan Guest

    James Kanze wrote:
    > On Nov 25, 11:45 am, "dragan" <> wrote:
    >> Brian wrote:
    >>> In the past there's been discussion about how using memcpy
    >>> to assist marshalling integral data should be avoided due to
    >>> portability reasons. From some very limited testing, the more
    >>> portable approach that uses bit shifting to output/marshall
    >>> integral data is roughly 10 to 15% slower than using memcpy.

    >
    >>> I'm wondering if it is possible to set things up so that the
    >>> more portable functions are used in a more limited way than
    >>> what I've seen suggested here. For example if a server has
    >>> both big endian and little endian (Intel) clients, would it
    >>> work to have the server send a "big endian" stream to the
    >>> big endian clients and a "little endian" stream to Intel
    >>> clients -- then both of those types of clients could use
    >>> memcpy to read/demarshall integral data types?

    >
    >>> I believe it is possible to use memcpy without having
    >>> alignment problems. I know it is possible to show examples
    >>> of alignment problems using memcpy, but I don't think those
    >>> are relevant to marshalling of data structures that a
    >>> compiler has set up to its liking.

    >
    >>> What am I missing? Something to do with compiler flags?

    >
    >> What you are doing is specifying a protocol. IP, for example,
    >> specifies (the wrong ;) ) byte order for packet information
    >> and such, hence all those htonl-like macros.

    >
    > There is no right or wrong (although I too prefer little
    > endian).


    It seems that NOBODY here knows how to parse this-> ;)

    >> There is more than just endianness to consider, of course. You
    >> have to maintain the spec with what you send out by
    >> controlling it with whatever you have to, like compiler
    >> pragmas to control alignment and padding.

    >
    > You typically can't control it enough just by flipping a few
    > compiler switches.


    If you write the spec to assume the conventions of the platform the software
    was written on, then you can. Flip a few switches and whatever comes out is
    to spec. Call it "an implicit specification" then. Then, the high-level spec
    is: little endian, integer sizes, no padding, all integer data on natural
    boundaries. Anything else anyone wants to know will be specified later. :)
    Of course, nothing needs to be specified if developer B is using the same
    platform (compiler, os, hardware).
    dragan, Nov 25, 2009
    #9
  10. Brian

    James Kanze Guest

    On Nov 25, 11:09 pm, "dragan" <> wrote:
    > James Kanze wrote:
    > > On Nov 25, 11:45 am, "dragan" <> wrote:


    [...]
    > >> There is more than just endianness to consider, of course.
    > >> You have to maintain the spec with what you send out by
    > >> controlling it with whatever you have to, like compiler
    > >> pragmas to control alignment and padding.


    > > You typically can't control it enough just by flipping a few
    > > compiler switches.


    > If you write the spec to assume the conventions of the
    > platform the software was written on, then you can.


    If you write the spec to assume the conventions of the platform
    you're working on, you may not have to flip any switches. But
    of course, your code won't be portable; it won't implement the
    same spec on a different platform.

    Historically, this mistake was made a lot in the early days of
    networking. XDR, for example, follows exactly the conventions
    of the Motorola 68000, and until fairly recently, you could just
    memory dump data from a 32 bit Sparc, and it would conform
    to the protocol. (I think more modern Sparcs require 8 byte
    alignment of double, whereas the protocol only requires 4 byte
    alignment.) Of course, if you're on an Intel platform, you'll
    need some extra work. (This works against Brian's idea, since
    most servers are still big-endian, whereas Intel dominates
    the client side overwhelmingly.)

    > Flip a few switches and whatever comes out is to spec. Call it
    > "an implicit specification" then. Then, the high-level spec
    > is: little endian, integer sizes, no padding, all integer data
    > on natural boundaries. Anything else anyone wants to know will
    > be specified later. :)


    2's complement as well, of course:).

    Lucky for us today, none of the people designing protocols were
    on Unisys platforms:). (In fact, most of the early work on IP
    and TCP was done on DEC platforms. And the byte oriented DECs
    are little endian. But I think when the work was being done, it
    was still largely DEC 10's: word addressed, with 36 bit words,
    and by convention, 5 seven bit bytes in each word, with one left
    over bit. At least they didn't base the protocol on that.)

    > Of course, nothing needs to be specified if developer B is
    > using the same platform (compiler, os, hardware).


    Nothing needs to be specified if you can guarantee that for all
    time, all future developers will be using the same platform
    (compiler, including version, os, hardware). As soon as you
    admit that some future version might be compiled with a later
    version of the compiler, you're running a risk. (Note that the
    byte order in a long changed from one version to the next in
    Microsoft compilers for Intel. And the size of a long depends
    on a compiler switch with most Unix systems.)

    --
    James Kanze
    James Kanze, Nov 26, 2009
    #10
  11. Brian

    dragan Guest

    James Kanze wrote:
    > On Nov 25, 11:09 pm, "dragan" <> wrote:
    >> James Kanze wrote:
    >>> On Nov 25, 11:45 am, "dragan" <> wrote:

    >
    > [...]
    >>>> There is more than just endianness to consider, of course.
    >>>> You have to maintain the spec with what you send out by
    >>>> controlling it with whatever you have to, like compiler
    >>>> pragmas to control alignment and padding.

    >
    >>> You typically can't control it enough just by flipping a few
    >>> compiler switches.

    >
    >> If you write the spec to assume the conventions of the
    >> platform the software was written on, then you can.

    >
    > If you write the spec to assume the conventions of the platform
    > you're working on, you may not have to flip any switches.


    Well let's not be TOO lazy now. There is a line of practicality apart from
    the EXTREMES.

    > But
    > of course, your code won't be portable; it won't implement the
    > same spec on a different platform.


    The research needs to be done. "Blind faith" is never an option (unless it's
    a homework assignment and you have more important classes to worry about).

    >
    > Historically, this mistake was made a lot in the early days of
    > networking. XDR, for example, follows exactly the conventions
    > of the Motorola 68000, and until fairly recently, you could just
    > memory dump data from a 32 bit Sparc, and it would conform
    > to the protocol. (I think more modern Sparcs require 8 byte
    > alignment of double, whereas the protocol only requires 4 byte
    > alignment.) Of course, if you're on an Intel platform, you'll
    > need some extra work. (This works against Brian's idea, since
    > most servers are still big-endian, whereas Intel dominates
    > the client side overwhelmingly.)


    Knowing that, you then shouldn't have "suggested" to "make the platform the
    spec", something I never implied whatsoever but that you chose to take to
    the extreme.

    >
    >> Flip a few switches and whatever comes out is to spec. Call it
    >> "an implicit specification" then. Then, the high-level spec
    >> is: little endian, integer sizes, no padding, all integer data
    >> on natural boundaries. Anything else anyone wants to know will
    >> be specified later. :)

    >
    > 2's complement as well, of course:).


    I don't remember. I go back and read about that stuff time and again, and
    time and again it never sticks in my mind, not like the easy-to-remember
    endian/padding/alignment things. So yes, maybe, I'm not sure. Certainly
    something to decide before deployment of a protocol though. I'd settle for
    the "80%" solution way before I'd go to the OTHER EXTREME of trying to be
    platform-agnostic.

    >
    > Lucky for us today, none of the people designing protocols were
    > on Unisys platforms:). (In fact, most of the early work on IP
    > and TCP was done on DEC platforms. And the byte oriented DECs
    > are little endian.


    Really? Then why did they choose big-endian as the wire endianness?

    > But I think when the work was being done, it
    > was still largely DEC 10's: word addressed, with 36 bit words,
    > and by convention, 5 seven bit bytes in each word, with one left
    > over bit. At least they didn't base the protocol on that.)
    >
    >> Of course, nothing needs to be specified if developer B is
    >> using the same platform (compiler, os, hardware).

    >
    > Nothing needs to be specified if you can guarantee that for all
    > time, all future developers will be using the same platform
    > (compiler, including version, os, hardware). As soon as you
    > admit that some future version might be compiled with a later
    > version of the compiler, you're running a risk.


    That is understood. No risk, no gain. :)

    > (Note that the
    > byte order in a long changed from one version to the next in
    > Microsoft compilers for Intel. And the size of a long depends
    > on a compiler switch with most Unix systems.)


    What's "a long"?? ;) I only recognize well-defined integer types. Not some
    nebulous language specification vagary. :)
    dragan, Nov 27, 2009
    #11
  12. Brian

    Brian Guest

    On Nov 26, 4:02 am, James Kanze <> wrote:
    > On Nov 25, 11:09 pm, "dragan" <> wrote:
    >
    > > James Kanze wrote:
    > > > On Nov 25, 11:45 am, "dragan" <> wrote:

    >
    >     [...]
    >
    > > >> There is more than just endianness to consider, of course.
    > > >> You have to maintain the spec with what you send out by
    > > >> controlling it with whatever you have to, like compiler
    > > >> pragmas to control alignment and padding.
    > > > You typically can't control it enough just by flipping a few
    > > > compiler switches.

    > > If you write the spec to assume the conventions of the
    > > platform the software was written on, then you can.

    >
    > If you write the spec to assume the conventions of the platform
    > you're working on, you may not have to flip any switches.  But
    > of course, your code won't be portable; it won't implement the
    > same spec on a different platform.
    >
    > Historically, this mistake was made a lot in the early days of
    > networking.  XDR, for example, follows exactly the conventions
    > of the Motorola 68000, and until fairly recently, you could just
    > memory dump data from a 32 bit Sparc, and it would conform
    > to the protocol.  (I think more modern Sparcs require 8 byte
    > alignment of double, whereas the protocol only requires 4 byte
    > alignment.)  Of course, if you're on an Intel platform, you'll
    > need some extra work.  (This works against Brian's idea, since
    > most servers are still big-endian, whereas Intel dominates
    > the client side overwhelmingly.)


    I don't know what you mean. It seems to me that if modern
    Sparcs require 8 byte alignment for doubles, that will be handled
    by the compiler. Are you thinking of packing? With integers,
    a memcpy starting from an odd address is OK I think as long as
    the 'to' address is kosher and the bytes have been ordered
    correctly for that machine.


    I've implemented this idea now for integers and it is
    available online. The next step is to work on making
    the byte order a template parameter here
    http://webEbenezer.net/Buffer.hh rather than
    a run time decision -- see the functions named
    Receive. Alternatively, I could just leave it
    that way for now and introduce some classes
    as you suggested once in another thread:
    Writer, LEWriter and BEWriter. LE == little endian.
    I suppose there are quite a few clients that interact
    with only one server or that only interact with one
    type of server be it big or little endian. In those
    cases, a template parameter makes sense. I've also
    heard of chips that can toggle back and forth between
    big and little endian.
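
    Making the byte order a template parameter, as considered above,
    could look roughly like this (a hypothetical sketch; Writer and
    put32 are illustrative names, not the actual Buffer.hh interface):

    ```cpp
    #include <cstdint>

    enum class ByteOrder { Little, Big };

    // Fixing the wire byte order at compile time removes the
    // per-value run-time decision from the send/receive path; the
    // untaken branch is dead code the compiler can eliminate.
    template <ByteOrder Order>
    struct Writer {
        static void put32(unsigned char* out, std::uint32_t v) {
            if (Order == ByteOrder::Big) {
                out[0] = static_cast<unsigned char>(v >> 24);
                out[1] = static_cast<unsigned char>(v >> 16);
                out[2] = static_cast<unsigned char>(v >> 8);
                out[3] = static_cast<unsigned char>(v);
            } else {
                out[0] = static_cast<unsigned char>(v);
                out[1] = static_cast<unsigned char>(v >> 8);
                out[2] = static_cast<unsigned char>(v >> 16);
                out[3] = static_cast<unsigned char>(v >> 24);
            }
        }
    };
    ```

    A client that only ever talks to one kind of server would
    instantiate Writer<ByteOrder::Big> or Writer<ByteOrder::Little>
    once and never branch again.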


    >
    > > Flip a few switches and whatever comes out is to spec. Call it
    > > "an implicit specification" then. Then, the high-level spec
    > > is: little endian, integer sizes, no padding, all integer data
    > > on natural boundaries. Anything else anyone wants to know will
    > > be specified later. :)

    >
    > 2's complement as well, of course:).
    >
    > Lucky for us today, none of the people designing protocols were
    > on Unisys platforms:).  (In fact, most of the early work on IP
    > and TCP was done on DEC platforms.  And the byte oriented DECs
    > are little endian.  But I think when the work was being done, it
    > was still largely DEC 10's: word addressed, with 36 bit words,
    > and by convention, 5 seven bit bytes in each word, with one left
    > over bit.  At least they didn't base the protocol on that.)
    >
    > > Of course, nothing needs to be specified if developer B is
    > > using the same platform (compiler, os, hardware).

    >
    > Nothing needs to be specified if you can guarantee that for all
    > time, all future developers will be using the same platform
    > (compiler, including version, os, hardware).  As soon as you
    > admit that some future version might be compiled with a later
    > version of the compiler, you're running a risk.  (Note that the
    > byte order in a long changed from one version to the next in
    > Microsoft compilers for Intel.  And the size of a long depends
    > on a compiler switch with most Unix systems.)
    >


    Yes, that's obviously too rigid an approach.
    I saw a story earlier about a 98 year old woman that lives
    in Rochester, Minnesota. She volunteers at the Mayo Clinic
    every day and she measured that she walks 9 miles a day.
    I'd be curious to know how many characters you type in a
    day.


    Brian Wood
    http://webEbenezer.net
    Brian, Nov 27, 2009
    #12
  13. Brian

    James Kanze Guest

    On Nov 27, 4:17 am, "dragan" <> wrote:
    > James Kanze wrote:
    > > On Nov 25, 11:09 pm, "dragan" <> wrote:
    > >> James Kanze wrote:
    > >>> On Nov 25, 11:45 am, "dragan" <> wrote:


    > > [...]
    > > Historically, this mistake was made a lot in the early days
    > > of networking. XDR, for example, follows exactly the
    > > conventions of the Motorola 68000, and until fairly
    > > recently, you could just memory dump data from a 32 bit
    > > Sparc, and it would conform to the protocol. (I think
    > > more modern Sparcs require 8 byte alignment of double,
    > > whereas the protocol only requires 4 byte alignment.) Of
    > > course, if you're on an Intel platform, you'll need some
    > > extra work. (This works against Brian's idea, since most
    > > servers are still big-endian, whereas Intel dominates the
    > > client side overwhelmingly.)


    > Knowing that, you then shouldn't have "suggested" to "make the
    > platform the spec", something I never implied whatsoever but
    > that you chose to take to the extreme.


    I have never suggested such a thing. Just the opposite. (But I
    know: in these long threads, it's often difficult to keep track
    of who said what. And others definitely have proposed such a
    thing.)

    > >> Flip a few switches and whatever comes out is to spec. Call
    > >> it "an implicit specification" then. Then, the high-level
    > >> spec is: little endian, integer sizes, no padding, all
    > >> integer data on natural boundaries. Anything else anyone
    > >> wants to know will be specified later. :)


    > > 2's complement as well, of course:).


    > I don't remember.


    You've probably never worked on a machine which wasn't two's
    complement. (The first machine I worked on was decimal, not
    binary!)

    > I go back and read about that stuff time and again, and time
    > and again it never sticks in my mind, not like the
    > easy-to-remember endian/padding/alignment things.


    The name isn't so important. The important thing is to specify
    how you represent negative numbers. In most protocols, it's not
    just adding a bit for the sign.

    > So yes, maybe, I'm not sure. Certainly something to decide
    > before deployment of a protocol though. I'd settle for the
    > "80%" solution way before I'd go to the OTHER EXTREME of
    > trying to be platform-agnostic.


    It depends on your market. I've worked on applications where
    100% Windows was fine. I've also worked on some where we
    literally weren't told what machines would be used.

    > > Lucky for us today, none of the people designing protocols
    > > were on Unisys platforms:). (In fact, most of the early
    > > work on IP and TCP was done on DEC platforms. And the byte
    > > oriented DECs are little endian.


    > Really? Then why did they choose big-endian as the wire
    > endian-ness?


    I've never figured that one out myself. At the bit level, all
    of the protocols I know are little-endian---when sending a byte,
    they send the LSB first. Then they turn around and send the
    bytes in a word big-endian.

    I suspect that IBM may have had some influence here. They were
    the first to make byte addressable hardware, and for a long
    time, they were the leaders in data transmission and protocols.
    And they're resolutely big-endian, even to the point of labeling
    the bits from 1 to n (rather than from 0 to n-1) and starting at
    the MSB.

    > > (Note that the byte order in a long changed from one version
    > > to the next in Microsoft compilers for Intel. And the size
    > > of a long depends on a compiler switch with most Unix
    > > systems.)


    > What's "a long"?? ;) I only recognize well-defined integer
    > types. Not some nebulous language specification vaguery. :)


    So call it a uint32_t. The byte order still changed (from 2301
    to 0123, where each digit is a power of 256).

    --
    James Kanze
    James Kanze, Nov 27, 2009
    #13
  14. Brian

    James Kanze Guest

    On Nov 27, 9:51 am, Brian <> wrote:
    > On Nov 26, 4:02 am, James Kanze <> wrote:
    > > On Nov 25, 11:09 pm, "dragan" <> wrote:


    > > > James Kanze wrote:
    > > > > On Nov 25, 11:45 am, "dragan" <> wrote:


    > > [...]


    > > > >> There is more than just endianness to consider, of
    > > > >> course. You have to maintain the spec with what you
    > > > >> send out by controlling it with whatever you have to,
    > > > >> like compiler pragmas to control alignment and padding.
    > > > > You typically can't control it enough just by flipping a few
    > > > > compiler switches.
    > > > If you write the spec to assume the conventions of the
    > > > platform the software was written on, then you can.


    > > If you write the spec to assume the conventions of the
    > > platform you're working on, you may not have to flip any
    > > switches. But of course, your code won't be portable; it
    > > won't implement the same spec on a different platform.


    > > Historically, this mistake was made a lot in the early days
    > > of networking. XDR, for example, follows exactly the
    > > conventions of the Motorola 68000, and until fairly
    > > recently, you could just memory dump data from a 32 bit
    > > > Sparc, and it would conform to the protocol. (I think
    > > more modern Sparc's require 8 byte alignment of double,
    > > whereas the protocol only requires 4 byte alignment.) Of
    > > course, if you're on an Intel platform, you'll need some
    > > extra work. (This works against Brian's idea, since most
    > > servers are still big-endian, whereas Intel dominates the
    > > client side overwhelmingly.)


    > I don't know what you mean. It seems to me that if the modern
    > sparcs require 8 byte alignment for doubles that will be
    > handled by the compiler. Are you thinking of packing? With
    > integers, a memcpy starting from an odd address is OK I think
    > as long as the 'to' address is kosher and the bytes have been
    > ordered correctly for that machine.


    The problem is dumping the bytes of a struct directly to memory,
    or reading it back. In XDR, a struct:
    struct { int i; double d; };
    will take 12 bytes, and have no padding. Which is exactly what
    the compiler generated with the earlier machines. Today, you
    can't simply memcpy this line data into the struct on a Sparc,
    because the compiler will insert padding between the int and the
    double, and if it didn't you'd have an alignment error.
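    The padding James describes is easy to see directly with offsetof;
    a minimal sketch (the struct name is hypothetical, the layout
    numbers are typical of modern 64-bit platforms):

```cpp
#include <cstddef>  // offsetof

// XDR lays out { int; double } as 4 + 8 = 12 bytes with no padding.
// A compiler that requires 8-byte alignment for double inserts
// 4 bytes of padding after 'i', so the struct becomes 16 bytes and
// a raw memcpy of 12 wire bytes cannot match the in-memory layout.
struct Xdrish {
    int i;
    double d;
};

const std::size_t wireSize   = 12;              // per the protocol
const std::size_t structSize = sizeof(Xdrish);  // typically 16 today
```

    On most current 64-bit platforms offsetof(Xdrish, d) is 8, not the
    4 the wire format uses, which is exactly the mismatch above.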

    [...]
    > I'd be curious to know how many characters you type in a day.


    It depends on the day, but I do touch type, so I can output text
    almost as fast as I can think it.

    --
    James Kanze
    James Kanze, Nov 27, 2009
    #14
  15. Brian

    Brian Guest

    On Nov 27, 7:26 am, James Kanze <> wrote:
    > On Nov 27, 9:51 am, Brian <> wrote:
    >
    >
    >
    > > On Nov 26, 4:02 am, James Kanze <> wrote:
    > > > On Nov 25, 11:09 pm, "dragan" <> wrote:
    > > > > James Kanze wrote:
    > > > > > On Nov 25, 11:45 am, "dragan" <> wrote:
    > > >     [...]
    > > > > >> There is more than just endianness to consider, of
    > > > > >> course.  You have to maintain the spec with what you
    > > > > >> send out by controlling it with whatever you have to,
    > > > > >> like compiler pragmas to control alignment and padding.
    > > > > > You typically can't control it enough just by flipping a few
    > > > > > compiler switches.
    > > > > If you write the spec to assume the conventions of the
    > > > > platform the software was written on, then you can.
    > > > If you write the spec to assume the conventions of the
    > > > platform you're working on, you may not have to flip any
    > > > switches.  But of course, your code won't be portable; it
    > > > won't implement the same spec on a different platform.
    > > > Historically, this mistake was made a lot in the early days
    > > > of networking.  XDR, for example, follows exactly the
    > > > conventions of the Motorola 68000, and until fairly
    > > > recently, you could just memory dump data from a 32 bit
    > > > Sparc, and it would conform to the protocol.  (I think
    > > > more modern Sparc's require 8 byte alignment of double,
    > > > whereas the protocol only requires 4 byte alignment.)  Of
    > > > course, if you're on an Intel platform, you'll need some
    > > > extra work.  (This works against Brian's idea, since most
    > > > servers are still big-endian, whereas Intel dominates the
    > > > client side overwhelmingly.)

    > > I don't know what you mean.  It seems to me that if the modern
    > > sparcs require 8 byte alignment for doubles that will be
    > > handled by the compiler.  Are you thinking of packing?  With
    > > integers, a memcpy starting from an odd address is OK I think
    > > as long as the 'to' address is kosher and the bytes have been
    > > ordered correctly for that machine.

    >
    > The problem is dumping the bytes of a struct directly to memory,
    > or reading it back.  In XDR, a struct:
    >     struct { int i; double d; };
    > will take 12 bytes, and have no padding.  Which is exactly what
    > the compiler generated with the earlier machines.  Today, you
    > can't simply memcpy this line data into the struct on a Sparc,
    > because the compiler will insert padding between the int and the
    > double, and if it didn't you'd have an alignment error.


    Well, I don't advocate memcpy'ing whole structs like that.
    I treat each member separately. Given that, I don't
    perceive a problem with using memcpy recursively on a
    member by member basis to read/demarshall data that
    has been correctly formatted for the machine doing
    the reading. I believe this works fine for integral
    data, but have yet to figure out if it works for
    floating point.
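    The member-by-member approach Brian describes can be sketched as
    follows (the message layout and names are hypothetical; the wire
    bytes are assumed to already be in the reader's byte order, per
    the scheme discussed in this thread):

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical wire message: a 4-byte id followed by an 8-byte stamp.
struct Msg {
    std::int32_t id;
    std::int64_t stamp;
};

// Member-by-member memcpy: each copy targets a properly typed,
// properly aligned object, so the source buffer may sit at any
// offset without causing alignment faults on the receiving side.
Msg readMsg(const unsigned char* wire) {
    Msg m;
    std::memcpy(&m.id, wire, sizeof m.id);
    std::memcpy(&m.stamp, wire + sizeof m.id, sizeof m.stamp);
    return m;
}
```

    Note this copies 4 + 8 = 12 wire bytes regardless of any padding
    the compiler puts inside Msg itself.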


    Brian Wood
    http://webEbenezer.net
    Brian, Nov 27, 2009
    #15
  16. Brian

    dragan Guest

    "James Kanze" <> wrote in message
    news:...
    > On Nov 27, 4:17 am, "dragan" <> wrote:
    >> James Kanze wrote:
    >> > On Nov 25, 11:09 pm, "dragan" <> wrote:
    >> >> James Kanze wrote:
    >> >>> On Nov 25, 11:45 am, "dragan" <> wrote:

    >
    >> > [...]
    >> > Historically, this mistake was made a lot in the early days
    >> > of networking. XDR, for example, follows exactly the
    >> > conventions of the Motorola 68000, and until fairly
    >> > recently, you could just memory dump data from a 32 bit
    >> > Sparc, and it would conform to the protocol. (I think
    >> > more modern Sparc's require 8 byte alignment of double,
    >> > whereas the protocol only requires 4 byte alignment.) Of
    >> > course, if you're on an Intel platform, you'll need some
    >> > extra work. (This works against Brian's idea, since most
    >> > servers are still big-endian, whereas Intel dominates the
    >> > client side overwhelmingly.)

    >
    >> Knowing that, you then shouldn't have "suggested" to "make the
    >> platform the spec", something I never implied whatsoever but
    >> that you chose to take to the extreme.

    >
    > I have never suggested such a thing. Just the opposite. (But I
    > know: in these long threads, it's often difficult to keep track
    > of who said what. And others definitely have proposed such a
    > thing.)
    >

    "I dunno".. if "credibility" would equal your sorry lame ass, well it would
    explain the downfall of the american economy! Hello. Bitch.
    dragan, Nov 28, 2009
    #16
  17. Brian

    Brian Guest

    On Nov 25, 4:10 am, James Kanze <> wrote:
    > On Nov 24, 10:41 pm, Brian <> wrote:
    >
    >
    >
    >
    >
    > > On Nov 24, 4:30 pm, Paavo Helde <> wrote:
    > > > Brian <> wrote in news:2ff925d9-9d6f-4285-b0d2-
    > > > :
    > > > > I'm wondering if it is possible to set things up so that
    > > > > the more portable functions are used in a more limited way
    > > > > than what I've seen suggested here. For example if a
    > > > > server has both big endian and little endian (Intel)
    > > > > clients, would it work to have the server send a "big
    > > > > endian" stream to the big endian clients and a "little
    > > > > endian" stream to Intel clients -- then both of those
    > > > > types of clients could use memcpy to read/demarshall
    > > > > integral data types?
    > > > Are you seriously claiming that a data stream can be pumped
    > > > over the network faster than the client can do some
    > > > bitshifting?

    > > No, I'm saying it seems like there's a way to eliminate the
    > > need for bit shifting when reading data. It isn't clear to me
    > > what (portability) is lost by doing it that way.

    >
    > It's possible for the server to systematically send and receive
    > data in the internal format of the client, yes. Provided it
    > knows this format (the client tells it). Does it buy you
    > anything? I'm not really sure.
    >
    > The issue is particularly important with regards to floating
    > point; if the protocol specifies a format with a fixed maximum
    > precision (e.g. IEEE double), and the machines at both ends
    > both support higher precision, then information is lost
    > unnecessarily.
    >


    .... so now I'm starting to work on the floating point
    support and wondering how to check if "both support
    higher precision." With integral data I know you just
    have to be sure the format/byte order is the same and
    the type has the same size on both machines. When that
    is the case it is possible to use memcpy rather than
    the bit shifting when sending data.
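    The two integral preconditions mentioned here (same byte order,
    same size) can each be checked mechanically; one possible sketch,
    in the pre-C++11 style of the period:

```cpp
#include <cstdint>
#include <cstring>

// One way the client could report its byte order to the server:
// inspect the first byte of a known 32-bit value.
bool isLittleEndian() {
    std::uint32_t probe = 1;
    unsigned char first;
    std::memcpy(&first, &probe, sizeof first);
    return first == 1;
}

// The size check is a compile-time matter; this pre-C++11 array
// trick fails to compile if 'int' is not the 4-byte type the
// protocol expects (a hypothetical assumption for illustration).
typedef char assert_int_is_4_bytes[sizeof(int) == 4 ? 1 : -1];
```

    With the sender reporting its result of isLittleEndian() up front,
    the receiver knows whether a plain memcpy will produce correctly
    ordered integers.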

    Also from this file --
    http://webEbenezer.net/Buffer.hh

    I have taken the following two snippets:

    #if defined(_MSC_VER) || defined(WIN32) || defined(_WIN32) || \
        defined(__WIN32__) || defined(__CYGWIN__)
        char* buf_;
    #else
        unsigned char* buf_;
    #endif

    and

    #if defined(_MSC_VER) || defined(WIN32) || defined(_WIN32) || \
        defined(__WIN32__) || defined(__CYGWIN__)
        buf_ = new char[bufsize_];
    #else
        buf_ = new unsigned char[bufsize_];
    #endif

    There's currently one more place in that file where I
    use ifdefs for the same thing. On Windows, if buf_ is
    unsigned char*, the compiler gives an error that it
    can't convert from unsigned char* to char* on a call
    to recv. I could possibly use char* on UNIX or
    perhaps use unsigned char* on Windows and use a
    reinterpret_cast. The reinterpret_cast satisfies the
    compiler and the program seems to run fine still,
    but I'm not sure of all the details around this.
    Is it safe to use reinterpret_cast<char*> (buf_)
    in the call to recv where buf_ is unsigned char* ?
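    Yes, that cast is safe: char and unsigned char have identical size,
    alignment, and object representation, and recv only writes raw
    bytes through the pointer. A small sketch (fakeRecv is hypothetical
    and stands in for recv, whose char* signature is what forces the
    cast in the first place):

```cpp
#include <cstring>

// Stand-in for recv: like recv, it takes char* and fills it with
// raw bytes, returning the count written.
int fakeRecv(char* dst, int n) {
    std::memset(dst, 0x7f, n);
    return n;
}

// Casting unsigned char* to char* to receive raw bytes is
// well-defined; the bytes land in buf_ unchanged.
int readInto(unsigned char* buf_, int bufsize_) {
    return fakeRecv(reinterpret_cast<char*>(buf_), bufsize_);
}
```

    This also means the ifdefs above could be dropped in favor of one
    unsigned char* member plus the cast at the recv call site.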

    On c.l.c++.m, Bart van Ingen Schenau posted the
    following functions.


    #include <cmath>     // frexp, ldexp, pow
    #include <iostream>  // ostream, istream, cerr
    using namespace std;

    ostream& write_ieee(ostream& os, double val)
    {
        int power;
        double significand;
        unsigned char sign;
        unsigned long long mantissa;
        unsigned char bytes[8];

        if (val < 0)
        {
            sign = 1;
            val = -val;
        }
        else
        {
            sign = 0;
        }
        significand = frexp(val, &power);

        if (power < -1022 || power > 1023)
        {
            cerr << "ieee754: exponent out of range" << endl;
            os.setstate(ios::failbit);
        }
        else
        {
            power += 1022;
        }
        mantissa = (significand - 0.5) * pow(2, 53);

        bytes[0] = ((sign & 0x01) << 7) | ((power & 0x7ff) >> 4);
        bytes[1] = ((power & 0xf)) << 4 |
                   ((mantissa & 0xfffffffffffffLL) >> 48);
        bytes[2] = (mantissa >> 40) & 0xff;
        bytes[3] = (mantissa >> 32) & 0xff;
        bytes[4] = (mantissa >> 24) & 0xff;
        bytes[5] = (mantissa >> 16) & 0xff;
        bytes[6] = (mantissa >> 8) & 0xff;
        bytes[7] = mantissa & 0xff;
        return os.write(reinterpret_cast<const char*>(bytes), 8);
    }

    istream& read_ieee(istream& is, double& val)
    {
        unsigned char bytes[8];

        is.read(reinterpret_cast<char*>(bytes), 8);
        if (is)
        {
            int power;
            double significand;
            unsigned char sign;
            unsigned long long mantissa;

            mantissa = ( ((unsigned long long)bytes[7]) |
                         (((unsigned long long)bytes[6]) << 8) |
                         (((unsigned long long)bytes[5]) << 16) |
                         (((unsigned long long)bytes[4]) << 24) |
                         (((unsigned long long)bytes[3]) << 32) |
                         (((unsigned long long)bytes[2]) << 40) |
                         (((unsigned long long)bytes[1]) << 48) )
                       & 0xfffffffffffffLL;
            significand = (mantissa / pow(2, 53)) + 0.5;
            power = (((bytes[1] >> 4) |
                     (((unsigned int)bytes[0]) << 4)) & 0x7ff) - 1022;
            sign = bytes[0] >> 7;
            val = ldexp(significand, power);
            if (sign) val = -val;
        }
        return is;
    }

    ---------------------------------------

    I plan to use them as the basis of the floating
    point support I'm working on. In the write function
    he has:

    bytes[1] = ((power & 0xf)) << 4 |
    ((mantissa & 0xfffffffffffffLL) >> 48);

    Would it be equivalent to write it like this:
    bytes[1] = ((power & 0xf)) << 4 |
    ((mantissa >> 48) & 0xf);

    ?
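    Yes, the two forms are equivalent here: both isolate mantissa bits
    48..51. In the functions above the mantissa is
    (significand - 0.5) * 2^53 with significand in [0.5, 1), so it
    always fits in 52 bits, and under that condition the mask order
    doesn't matter. A quick check (helper names are hypothetical):

```cpp
// Bart's original form: mask to 52 bits, then shift.
inline unsigned topNibbleA(unsigned long long mantissa) {
    return (mantissa & 0xfffffffffffffULL) >> 48;
}

// The proposed rewrite: shift, then mask to 4 bits.
inline unsigned topNibbleB(unsigned long long mantissa) {
    return (mantissa >> 48) & 0xf;
}
// The two differ only if bits above bit 51 were set, which the
// frexp-derived mantissas in these functions never have.
```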

    Please let me know if anyone detects problems
    with the above functions.


    Brian Wood
    http://webEbenezer.net
    Brian, Dec 8, 2009
    #17
  18. Brian

    Brian Guest

    On Nov 25, 4:10 am, James Kanze <> wrote:
    > On Nov 24, 10:41 pm, Brian <> wrote:
    >
    >
    >
    >
    >
    > > On Nov 24, 4:30 pm, Paavo Helde <> wrote:
    > > > Brian <> wrote in news:2ff925d9-9d6f-4285-b0d2-
    > > > :
    > > > > I'm wondering if it is possible to set things up so that
    > > > > the more portable functions are used in a more limited way
    > > > > than what I've seen suggested here.  For example if a
    > > > > server has both big endian and little endian (Intel)
    > > > > clients, would it work to have the server send a "big
    > > > > endian" stream to the big endian clients and a "little
    > > > > endian" stream to Intel clients -- then both of those
    > > > > types of clients could use memcpy to read/demarshall
    > > > > integral data types?
    > > > Are you seriously claiming that a data stream can be pumped
    > > > over the network faster than the client can do some
    > > > bitshifting?

    > > No, I'm saying it seems like there's a way to eliminate the
    > > need for bit shifting when reading data.  It isn't clear to me
    > > what (portability) is lost by doing it that way.

    >
    > It's possible for the server to systematically send and receive
    > data in the internal format of the client, yes.  Provided it
    > knows this format (the client tells it).  Does it buy you
    > anything?  I'm not really sure.
    >
    > The issue is particularly important with regards to floating
    > point; if the protocol specifies a format with a fixed maximum
    > precision (e.g. IEEE double), and the machines at both ends
    > both support higher precision, then information is lost
    > unnecessarily.
    >
    > In this regard, you might be interested in the BER encoding of
    > RealType.  In the case of BER, the goal is to not lose
    > precision unless necessary, so the rule is always to use the
    > sender's format, with the receiver interpreting it.


    In retrospect, I wish I had done things this way. Instead I
    made senders format for byte order. I'm in the process
    now of changing it to "always use the sender's format."

    A week or so ago I made my Buffer class a class template
    parameterized on the byte order.
    http://webEbenezer.net/misc/Buffer.hh
    http://webEbenezer.net/misc/Formatting.hh

    I've also added a tar file now that has these files
    http://webEbenezer.net/misc/direct.tar.bz2

    Anyway, I thought everything was going fine with that.
    For example, http://webEbenezer.net/misc/File.hh, shows
    how using function templates handle things nicely for a
    stream constructor and a Send function. But then I
    remembered that the Send functions can be virtual.
    Since C++ doesn't currently support mixing function
    templates and virtual functions, I was forced to
    change my approach.

    Thankfully, there seems to be a way to avoid making all
    of the classes involved in marshalling class templates
    while still permitting the use of virtual functions.
    I'm thinking about splitting up the Buffer class into
    two classes: SendBuffer and ReceiveBuffer. (The names
    may change.) SendBuffer would be a plain class and
    ReceiveBuffer would be a class template parameterized on
    the byte order. The stream constructors would be
    function templates still, but the Send functions wouldn't
    be and they could still be virtual. I've liked having one
    Buffer class that was used for both sending and receiving,
    but I really dislike the idea of requiring all user
    classes (that are marshalled) to be class templates.
    This seems like a reasonable way to deal with the
    various factors. I'm curious whether others have one or
    two classes involved in buffering. So now I'm hoping to
    not do any formatting when sending data and leave that
    up to the receiving code.
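    The split described above could look roughly like this (all names
    and the readU32 interface are hypothetical; a real ReceiveBuffer
    would read its byte-order parameter from a stream header):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

enum ByteOrder { littleEndian, bigEndian };

// Plain class: the sender always writes in its native format, so no
// byte-order parameter is needed, and functions taking a SendBuffer
// (e.g. a user type's Send) can stay virtual.
class SendBuffer {
    std::vector<unsigned char> buf_;
public:
    void append(const void* p, std::size_t n) {
        const unsigned char* b = static_cast<const unsigned char*>(p);
        buf_.insert(buf_.end(), b, b + n);
    }
    const unsigned char* data() const { return &buf_[0]; }
};

// Class template: the receiver knows the sender's byte order and
// reorders bytes only when it differs from its own.
template <ByteOrder senderOrder>
class ReceiveBuffer {
    const unsigned char* cur_;
public:
    explicit ReceiveBuffer(const unsigned char* data) : cur_(data) {}
    std::uint32_t readU32() {
        std::uint32_t v;
        if (senderOrder == bigEndian)
            v = (std::uint32_t(cur_[0]) << 24) | (std::uint32_t(cur_[1]) << 16)
              | (std::uint32_t(cur_[2]) << 8)  |  std::uint32_t(cur_[3]);
        else
            v = (std::uint32_t(cur_[3]) << 24) | (std::uint32_t(cur_[2]) << 16)
              | (std::uint32_t(cur_[1]) << 8)  |  std::uint32_t(cur_[0]);
        cur_ += 4;
        return v;
    }
};
```

    Only the receiving side pays the reordering cost, and only when
    the sender's order differs, which matches the "always use the
    sender's format" rule.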

    I spent quite a bit of time working things out the
    other way, so it is taking me a while to rework
    things. I don't think of it as totally wasted time,
    since the two implementations are similar. Perhaps
    though I should have caught this earlier and avoided
    the need to rework things. So often I'm reminded of
    the saying, "Better late than never."


    Brian Wood
    http://webEbenezer.net
    Brian, Dec 25, 2009
    #18