Platform-independent serialization of a long


RA Scheltema

hi all,


A small question about serializing and deserializing a long in a
platform-independent manner. Can this be done with the following code?


char buf[4];
long val = 35456;

/* serialize ... on for example intel */
buf[0] = (unsigned char) ((val & 0xff000000) >> 24);
buf[1] = (unsigned char) ((val & 0x00ff0000) >> 16);
buf[2] = (unsigned char) ((val & 0x0000ff00) >> 8);
buf[3] = (unsigned char) ((val & 0x000000ff) >> 0);

/* deserialize ... on for example mac */
val = 0;
val = val | ((unsigned long) buf[0]) << 24;
val = val | ((unsigned long) buf[1]) << 16;
val = val | ((unsigned long) buf[2]) << 8;
val = val | ((unsigned long) buf[3]) << 0;


According to a colleague of mine, the & (in the first part of the code)
ensures that the least significant and most significant bytes are always
intact on whatever platform the buffer is deserialized on. I don't agree;
any suggestions?


kind regards,
richard
 

tom_usenet

hi all,


A small question about serializing and deserializing a long in a
platform-independent manner. Can this be done with the following code?

No. The code assumes that sizeof(long) == 4 (not true on some 64-bit
platforms), that CHAR_BIT == 8 (not true on some other platforms), that
all platforms store negative numbers in the same way (not true on 1s
complement platforms, etc.), and that all bits take part in the value
representation of long.
char buf[4];
long val = 35456;

/* serialize ... on for example intel */
buf[0] = (unsigned char) ((val & 0xff000000) >> 24);
buf[1] = (unsigned char) ((val & 0x00ff0000) >> 16);
buf[2] = (unsigned char) ((val & 0x0000ff00) >> 8);
buf[3] = (unsigned char) ((val & 0x000000ff) >> 0);

/* deserialize ... on for example mac */
val = 0;
val = val | ((unsigned long) buf[0]) << 24;
val = val | ((unsigned long) buf[1]) << 16;
val = val | ((unsigned long) buf[2]) << 8;
val = val | ((unsigned long) buf[3]) << 0;


According to a colleague of mine, the & (in the first part of the code)
ensures that the least significant and most significant bytes are always
intact on whatever platform the buffer is deserialized on. I don't agree;
any suggestions?

Your colleague is correct. Note that the code assumes that all
platforms use the same type of longs, barring byte order. This isn't
true - e.g. sign-magnitude, 1s-complement, 16-bit chars, 64-bit longs,
etc. It is true on most 32-bit desktop platforms, though: they have
8-bit chars, 32-bit longs and use 2s-complement for negative numbers.
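
For positive values that fit in 32 bits you can sidestep the sign issue
entirely by working with unsigned long and unsigned char. A minimal sketch
(still assuming CHAR_BIT == 8; the helper names pack_u32/unpack_u32 are made
up for illustration):

void pack_u32(unsigned char buf[4], unsigned long val)
{
    buf[0] = (unsigned char)((val >> 24) & 0xFFUL);
    buf[1] = (unsigned char)((val >> 16) & 0xFFUL);
    buf[2] = (unsigned char)((val >> 8) & 0xFFUL);
    buf[3] = (unsigned char)(val & 0xFFUL);
}

unsigned long unpack_u32(const unsigned char buf[4])
{
    return ((unsigned long)buf[0] << 24) |
           ((unsigned long)buf[1] << 16) |
           ((unsigned long)buf[2] << 8) |
            (unsigned long)buf[3];
}

The & 0xFFUL after each shift keeps only the low byte, so this still
behaves when long is wider than 32 bits.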

Tom

C++ FAQ: http://www.parashift.com/c++-faq-lite/
C FAQ: http://www.eskimo.com/~scs/C-faq/top.html
 

Tom St Denis

tom_usenet said:
char buf[4];
long val = 35456;

/* serialize ... on for example intel */
buf[0] = (unsigned char) ((val & 0xff000000) >> 24);
buf[1] = (unsigned char) ((val & 0x00ff0000) >> 16);
buf[2] = (unsigned char) ((val & 0x0000ff00) >> 8);
buf[3] = (unsigned char) ((val & 0x000000ff) >> 0);

/* deserialize ... on for example mac */
val = 0;
val = val | ((unsigned long) buf[0]) << 24;
val = val | ((unsigned long) buf[1]) << 16;
val = val | ((unsigned long) buf[2]) << 8;
val = val | ((unsigned long) buf[3]) << 0;


According to a colleague of mine, the & (in the first part of the code)
ensures that the least significant and most significant bytes are always
intact on whatever platform the buffer is deserialized on. I don't agree;
any suggestions?

Your colleague is correct. Note that the code assumes that all
platforms use the same type of longs, barring byte order. This isn't
true - e.g. sign-magnitude, 1s-complement, 16-bit chars, 64-bit longs,
etc. It is true on most 32-bit desktop platforms, though: they have
8-bit chars, 32-bit longs and use 2s-complement for negative numbers.

I don't see this as something that can fail [regardless of how the actual
data is stored]. If you have a type which is at least 32-bits then
val&0xFF000000UL is always "defined". All this means is that on platforms
where they store integer types using fluxums and kawalachums instead of bits
they will have to EMULATE!

It's just like platforms with no FPU or support for 32-bit types. They have
to emulate them with stuff they do have.

So yes, you can portably store/load any integer type in an array of unsigned
chars.

Tom
 

Martijn Lievaart

I don't see this as something that can fail [regardless of how the actual
data is stored]. If you have a type which is at least 32-bits then
val&0xFF000000UL is always "defined". All this means is that on platforms
where they store integer types using fluxums and kawalachums instead of bits
they will have to EMULATE!

No, you are assuming that all computers use the same layout for binary
numbers. That assumption is not true. Computers that use ones-complement
(do these exist in reality any more?) store numbers in a different way
than computers using two's complement. If you use this method of
transporting between ones- and two's-complement machines, it will only work
for positive numbers.

Also, transporting this way when there are more than 32 bits will lose
information. Again, this will not work for negative numbers, even in the
more common two's complement. And because the OP mentioned that this was
about transporting a long, there are machines out there that have 64-bit
longs.
It's just like platforms with no FPU or support for 32-bit types. They have
to emulate them with stuff they do have.

Not a real comparison. We're talking about systems that have the required
integer types, but happen to store them differently. A better comparison
is to portably store/load floating point types. As the underlying
representations differ from implementation to implementation, this cannot
be done.
So yes, you can portably store/load any integer type in an array of unsigned
chars.

No, you can at most portably store/load positive integers. This is
guaranteed by both C and C++ IIRC. The C++ standard has some vague wording
on the requirements on integer types that boils down to "unsigned integer
types must use normal binary encoding, positive integers stored in signed
integer types must have the same bit pattern as their unsigned
counterpart". I don't have the C standard, but I know it has a slightly
different wording that basically boils down to the same.

In practice, all computers nowadays use two's complement, so this will work
- between machines that use 32-bit longs.
- when your values are positive and have no more than 32 bits (provided
you zeroed out the extra bits beforehand).
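
As a small illustration of that last point (assuming the value really fits
in 32 bits; the mask simply discards anything above bit 31 when long is
wider):

unsigned long uval = (unsigned long)val & 0xFFFFFFFFUL; /* keep only the low 32 bits */
/* ...then serialize uval with the shift-and-mask code from the original post */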

HTH
M4
 

Martijn Lievaart

hi all,


A small question about serializing and deserializing a long in a
platform-independent manner. Can this be done with the following code?


char buf[4];
long val = 35456;

/* serialize ... on for example intel */
buf[0] = (unsigned char) ((val & 0xff000000) >> 24);
buf[1] = (unsigned char) ((val & 0x00ff0000) >> 16);
buf[2] = (unsigned char) ((val & 0x0000ff00) >> 8);
buf[3] = (unsigned char) ((val & 0x000000ff) >> 0);

/* deserialize ... on for example mac */
val = 0;
val = val | ((unsigned long) buf[0]) << 24;
val = val | ((unsigned long) buf[1]) << 16;
val = val | ((unsigned long) buf[2]) << 8;
val = val | ((unsigned long) buf[3]) << 0;


According to a colleague of mine, the & (in the first part of the code)
ensures that the least significant and most significant bytes are always
intact on whatever platform the buffer is deserialized on. I don't agree;
any suggestions?

See my other reply in this thread on why this is a bad idea. It only works
in some situations.

Three other solutions come to mind.

- If your platform has htonl/ntohl (most do), they are an easy way to achieve
the same thing much more portably.

- Use integer arithmetic instead of bitwise operations.

- My favorite: transport as text, not binary (a sketch follows below).
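
For example, a minimal sketch of the text approach, assuming decimal is an
acceptable wire format (error handling omitted):

#include <stdio.h>

int main(void)
{
    char text[32];              /* plenty of room for any long, sign and terminator */
    long val = 35456, back;

    sprintf(text, "%ld", val);  /* serialize: render the value as decimal text */
    sscanf(text, "%ld", &back); /* deserialize: parse it back on the receiving side */
    printf("%ld\n", back);      /* prints 35456 */
    return 0;
}

The text form costs a few more bytes on the wire, but it avoids every
byte-order and representation question at once.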

HTH,
M4
 

Tom St Denis

Martijn Lievaart said:
I don't see this as something that can fail [regardless of how the actual
data is stored]. If you have a type which is at least 32-bits then
val&0xFF000000UL is always "defined". All this means is that on platforms
where they store integer types using fluxums and kawalachums instead of bits
they will have to EMULATE!

No, you are assuming that all computers use the same layout for binary
numbers. That assumption is not true. Computers that use ones-complement
(do these exist in reality any more?) store numbers in a different way
than computers using two's complement. If you use this method of
transporting between ones- and two's-complement machines, it will only work
for positive numbers.

I don't see that as being valid. "unsigned long" must have at least 32-bits
of precision.

By your logic

unsigned long x, y;

y = 255UL*256UL*256UL*256UL;
x = some_func();
x &= y;
x >>= 24;

Is undefined because x/y may not be a 2s complement?

WRONG. The value of X will lie in 0..255 and will be the bits 24..31 of the
return of some_func(). In reality this "might use walazaums for bits"
comes into play if you memcpy or otherwise directly copy. So on a 1s
complement machine it would have to emulate as appropriate.

For example, ARMv4 processors don't have FPUs. By your logic

float x = 4.0;

is undefined?
Also, transporting this way when there are more than 32 bits will lose
information. Again, this will not work for negative numbers, even in the
more common two's complement. And because the OP mentioned that this was
about transporting a long, there are machines out there that have 64-bit
longs.

Yeah you have to specify precision. However, many algorithms use fixed
precision (re: block ciphers).

Tom
 

Martijn Lievaart

I don't see that as being valid. "unsigned long" must have at least 32-bits
of precision.
Yes.


By your logic

unsigned long x, y;

Hey, where did that unsigned creep in? Maybe you want to reread what I
said.
y = 255UL*256UL*256UL*256UL;
x = some_func();
x &= y;
x >>= 24;

Is undefined because x/y may not be a 2s complement?

I said no such thing.
WRONG. The value of X will lie in 0..255 and will be the bits 24..31 of

I'm not wrong; you are reading wrong. And please lose the caps, it's
annoying.
the return of some_func(). In reality this "might use walazaums for
bits" comes into play if you memcpy or otherwise directly copy. So on a
1s complement machine it would have to emulate as appropriate.

There is nothing to emulate on a ones-complement machine. It can just use
its native types, which happen to have different representations for
negative numbers than the more common two's complement. Completely valid
in both C and C++, no walazaums involved anywhere.

You might want to read up on what happens when converting negative signed
long values to unsigned long, because that is exactly what we are facing
here.
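
A tiny illustration of that conversion rule (conversion to an unsigned type
is defined to be modulo ULONG_MAX + 1, whatever the machine's internal
representation of negative numbers):

long v = -1;
unsigned long u = (unsigned long)v; /* u == ULONG_MAX on every conforming implementation */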
For example, ARMv4 processors don't have FPUs. By your logic

float x = 4.0;

is undefined?

What twist of logic are you trying to achieve here? I'm positively baffled
by your conclusion; I cannot follow you.
Yeah you have to specify precision. However, many algorithms use fixed
precision (re: block ciphers).

Obvious. When transporting between machines you'll always have to specify
the valid ranges.

M4
 

tom_usenet

Martijn Lievaart said:
I don't see this as something that can fail [regardless of how the actual
data is stored]. If you have a type which is at least 32-bits then
val&0xFF000000UL is always "defined". All this means is that on platforms
where they store integer types using fluxums and kawalachums instead of bits
they will have to EMULATE!

No, you are assuming that all computers use the same layout for binary
numbers. That assumption is not true. Computers that use ones-complement
(do these exist in reality any more?) store numbers in a different way
than computers using two's complement. If you use this method of
transporting between ones- and two's-complement machines, it will only work
for positive numbers.

I don't see that as being valid. "unsigned long" must have at least 32-bits
of precision.

He just said it is valid for positive numbers! What has "unsigned
long" got to do with negative numbers?
By your logic

unsigned long x, y;

Where did "unsigned long" come from? The OP was using "long".
y = 255UL*256UL*256UL*256UL;
x = some_func();
x &= y;
x >>= 24;

Is undefined because x/y may not be a 2s complement?

2s complement doesn't apply to unsigned types. It is a convenient way
of representing negative numbers in binary.

Tom

C++ FAQ: http://www.parashift.com/c++-faq-lite/
C FAQ: http://www.eskimo.com/~scs/C-faq/top.html
 

Dan Pop

RA Scheltema said:
A small question about serializing and deserializing a long in a
platform-independent manner. Can this be done with the following code?

It still assumes that longs are 32-bit entities (4 bytes x 8 bits) on
both platforms. There is no easy way of eliminating this assumption,
short of using a textual representation of the value, instead of a binary
one, i.e. serialise with sprintf and deserialise with sscanf and convert
the native strings to and from BCD (to also remove the assumption that
both platforms use the same character set).
char buf[4];

MUST be unsigned char.
long val = 35456;

MUST be either an unsigned long or contain a positive value. Otherwise,
see below.
/* serialize ... on for example intel */
buf[0] = (unsigned char) ((val & 0xff000000) >> 24);
buf[1] = (unsigned char) ((val & 0x00ff0000) >> 16);
buf[2] = (unsigned char) ((val & 0x0000ff00) >> 8);
buf[3] = (unsigned char) ((val & 0x000000ff) >> 0);

All the casts to unsigned char are superfluous.
/* deserialize ... on for example mac */
val = 0;
val = val | ((unsigned long) buf[0]) << 24;

If the original value was negative, additional assumptions are needed:
both platforms use the same representation for negative values and the
conversion of an unsigned long value that cannot be represented by a long
preserves the bit pattern. Both assumptions are reasonable, but neither
is guaranteed by the language.
val = val | ((unsigned long) buf[1]) << 16;
val = val | ((unsigned long) buf[2]) << 8;
val = val | ((unsigned long) buf[3]) << 0;

According to a colleague of mine, the & (in the first part of the code)
ensures that the least significant and most significant bytes are always
intact on whatever platform the buffer is deserialized on. I don't agree;
any suggestions?

He is perfectly right. Because you're operating on the full
representation of the value, you can be sure that buf[0] will contain
the most significant byte of the value, regardless of the byte order.
And because the value is reconstructed using arithmetic operations,
you can also be sure that the result is correct, again regardless of the
byte order. But getting the byte order right is not enough if you need
to deal with negative values, too.

The proper handling of negative values without the additional assumptions
mentioned above is easy if the implementation also supports long longs
or some other form of integer that provides more than 32 bits. The
first step requires assigning val to uval, an unsigned long variable.
The result is independent of the way negative values are represented.
Serialise and deserialise uval.

typedef long long big_t;

/* uval holds the deserialised 32-bit pattern; recover the signed value */
if ((uval & 0x80000000) != 0)
    val = (big_t)uval - (big_t)ULONG_MAX - 1; /* subtract 2^32 (ULONG_MAX + 1 for a 32-bit unsigned long) */
else
    val = uval;

As you can see, doing the job right, even in a not 100% platform-independent
way, is more complex than just taking care of the byte order.
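
A sketch of the whole round trip along those lines, purely for illustration:
it assumes the value fits in 32 bits, CHAR_BIT == 8 and an available
long long; the helper names are made up, and 2^32 is written out explicitly
so the subtraction also behaves when unsigned long is wider than 32 bits.

typedef long long big_t;

void serialize_long(unsigned char buf[4], long val)
{
    unsigned long uval = (unsigned long)val; /* well-defined: reduces modulo ULONG_MAX + 1 */
    buf[0] = (uval >> 24) & 0xFF;
    buf[1] = (uval >> 16) & 0xFF;
    buf[2] = (uval >> 8) & 0xFF;
    buf[3] = uval & 0xFF;
}

long deserialize_long(const unsigned char buf[4])
{
    unsigned long uval = ((unsigned long)buf[0] << 24) |
                         ((unsigned long)buf[1] << 16) |
                         ((unsigned long)buf[2] << 8) |
                          (unsigned long)buf[3];
    if (uval & 0x80000000UL) /* sign bit of the 32-bit pattern set? */
        return (long)((big_t)uval - 0x100000000LL); /* subtract 2^32 to get the negative value back */
    return (long)uval;
}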

Dan
 

Joona I Palaste

Sean Kelly <[email protected]> scribbled the following
You might also want to look at the socket calls htonl() and ntohl().

Which aren't part of either C or C++, but rather implementation-specific
extensions.

--
/-- Joona Palaste ([email protected]) ------------- Finland --------\
\-- http://www.helsinki.fi/~palaste --------------------- rules! --------/
"'So called' means: 'There is a long explanation for this, but I have no
time to explain it here.'"
- JIPsoft
 
