platform independent serialization of a long

Discussion in 'C Programming' started by RA Scheltema, Jan 23, 2004.

  1. RA Scheltema

    RA Scheltema Guest

    hi all,


    A small question about serializing and deserializing a long in a platform
    independent manner. Can this be done with the following code ?:


    char buf[4];
    long val = 35456;

    /* serialize ... on for example intel */
    buf[0] = (unsigned char) ((val & 0xff000000) >> 24);
    buf[1] = (unsigned char) ((val & 0x00ff0000) >> 16);
    buf[2] = (unsigned char) ((val & 0x0000ff00) >> 8);
    buf[3] = (unsigned char) ((val & 0x000000ff) >> 0);

    /* deserialize ... on for example mac */
    val = 0;
    val = val | ((unsigned long) buf[0]) << 24;
    val = val | ((unsigned long) buf[1]) << 16;
    val = val | ((unsigned long) buf[2]) << 8;
    val = val | ((unsigned long) buf[3]) << 0;


    According to a colleague of mine, the & (in the first part of the code)
    ensures that the least significant and most significant byte is always
    intact on whatever platform the buffer is deserialized. I don't agree, any
    suggestions ?


    kind regards,
    richard
    RA Scheltema, Jan 23, 2004
    #1

  2. RA Scheltema

    tom_usenet Guest

    On Fri, 23 Jan 2004 12:37:23 +0100, "RA Scheltema"
    <r.a.scheltema[viral][p]@[m]dacolian.nl> wrote:

    >hi all,
    >
    >
    >A small question about serializing and deserializing a long in a platform
    >independent manner. Can this be done with the following code ?:


    No, the code assumes that sizeof(long) == 4 (not true on some 64-bit
    platforms) and that CHAR_BIT == 8 (not true on some other platforms)
    and that all platforms store negative numbers in the same way (not
    true on 1s complement platforms, etc.), and use all bits in the value
    representation of long.

    >char buf[4];
    >long val = 35456;
    >
    >/* serialize ... on for example intel */
    >buf[0] = (unsigned char) ((val & 0xff000000) >> 24);
    >buf[1] = (unsigned char) ((val & 0x00ff0000) >> 16);
    >buf[2] = (unsigned char) ((val & 0x0000ff00) >> 8);
    >buf[3] = (unsigned char) ((val & 0x000000ff) >> 0);
    >
    >/* deserialize ... on for example mac */
    >val = 0;
    >val = val | ((unsigned long) buf[0]) << 24;
    >val = val | ((unsigned long) buf[1]) << 16;
    >val = val | ((unsigned long) buf[2]) << 8;
    >val = val | ((unsigned long) buf[3]) << 0;
    >
    >
    >According to a colleague of mine, the & (in the first part of the code)
    >ensures that the least significant and most significant byte is always
    >intact on whatever platform the buffer is deserialized. I don't agree, any
    >suggestions ?


    Your colleague is correct. Note that the code assumes that all
    platforms use the same type of longs, barring byte order. This isn't
    true - e.g. sign-magnitude, 1s-complement, 16-bit chars, 64-bit longs,
    etc. It is true on most 32-bit desktop platforms though: they have
    8-bit chars, 32-bit longs and use 2s-complement for negative numbers.
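
The assumptions listed above can be checked on a given platform. The following is a minimal sketch; the function names `count_value_bits` and `matches_32bit_assumptions` are made up for illustration and do not come from the thread.

```c
#include <limits.h>

/* Count how many bits are needed to hold max, where max has the
   form 2^n - 1 (for example ULONG_MAX). */
unsigned count_value_bits(unsigned long max)
{
    unsigned bits = 0;
    while (max != 0) {
        max >>= 1;
        bits++;
    }
    return bits;
}

/* Nonzero when this platform matches what the 4-byte code takes for
   granted: 8-bit bytes and an unsigned long with exactly 32 value bits. */
int matches_32bit_assumptions(void)
{
    return CHAR_BIT == 8 && count_value_bits(ULONG_MAX) == 32;
}
```

On a platform with a 64-bit long the second function returns 0, which is the case where a 4-byte buffer silently drops the upper half of the value.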

    Tom

    C++ FAQ: http://www.parashift.com/c++-faq-lite/
    C FAQ: http://www.eskimo.com/~scs/C-faq/top.html
    tom_usenet, Jan 23, 2004
    #2

  3. RA Scheltema

    Tom St Denis Guest

    "tom_usenet" <> wrote in message
    news:...
    > >char buf[4];
    > >long val = 35456;
    > >
    > >/* serialize ... on for example intel */
    > >buf[0] = (unsigned char) ((val & 0xff000000) >> 24);
    > >buf[1] = (unsigned char) ((val & 0x00ff0000) >> 16);
    > >buf[2] = (unsigned char) ((val & 0x0000ff00) >> 8);
    > >buf[3] = (unsigned char) ((val & 0x000000ff) >> 0);
    > >
    > >/* deserialize ... on for example mac */
    > >val = 0;
    > >val = val | ((unsigned long) buf[0]) << 24;
    > >val = val | ((unsigned long) buf[1]) << 16;
    > >val = val | ((unsigned long) buf[2]) << 8;
    > >val = val | ((unsigned long) buf[3]) << 0;
    > >
    > >
    > >According to a colleague of mine, the & (in the first part of the code)
    > >ensures that the least significant and most significant byte is always
    > >intact on whatever platform the buffer is deserialized. I don't agree, any
    > >suggestions ?

    >
    > Your colleague is correct. Note that the code assumes that all
    > platforms use the same type of longs, barring byte order. This isn't
    > true - e.g. sign-magnitude, 1s-complement, 16-bit chars, 64-bit longs,
    > etc. It is true on most 32-bit desktop platforms though, they have
    > 8-bit chars, 32-bit longs and use 2s-complement for negative numbers.


    I don't see this as something that can fail [regardless of how the actual
    data is stored]. If you have a type which is at least 32-bits then
    val&0xFF000000UL is always "defined". All this means is that on platforms
    where they store integer types using fluxums and kawalachums instead of bits
    they will have to EMULATE!

    It's just like platforms with no FPU or support for 32-bit types. They have
    to emulate them with stuff they do have.

    So yes, you can portably store/load any integer type in an array of unsigned
    chars.

    Tom
    Tom St Denis, Jan 23, 2004
    #3
  4. On Fri, 23 Jan 2004 13:39:04 +0000, Tom St Denis wrote:

    >
    > I don't see this as something that can fail [regardless of how the actual
    > data is stored]. If you have a type which is at least 32-bits then
    > val&0xFF000000UL is always "defined". All this means is that on platforms
    > where they store integer types using fluxums and kawalachums instead of bits
    > they will have to EMULATE!


    No, you are assuming that all computers use the same layout for binary
    numbers. That assumption is not true. Computers that use ones' complement
    (do these exist in reality any more?) store numbers in a different way
    than computers using two's complement. If you use this method of
    transporting between ones'- and two's-complement machines, it will only
    work for positive numbers.

    Also, transporting this way when there are more than 32 bits will lose
    information. Again, this will not work for negative numbers, even in the
    more common two's complement. And because the OP mentioned that this was
    about transporting a long, there are machines out there that have 64 bit
    long.

    > It's just like platforms with no FPU or support for 32-bit types. They have
    > to emulate them with stuff they do have.


    Not a real comparison. We're talking about systems that have the required
    integer types, but happen to store them differently. A better comparison
    is to portably store/load floating point types. As the underlying
    representations differ from implementation to implementation, this cannot
    be done.

    > So yes, you can portably store/load any integer type in an array of unsigned
    > chars.


    No, you can at most portably store/load positive integers. This is
    guaranteed by both C and C++ IIRC. The C++ standard has some vague wording
    on the requirements on integer types that boils down to "unsigned integer
    types must use normal binary encoding, positive integers stored in signed
    integer types must have the same bit pattern as their unsigned
    counterpart". I don't have the C standard, but I know it has a slightly
    different wording that basically boils down to the same.

    Now in practice, all computers nowadays use two's complement, so in
    practice this will work
    - between machines that use 32-bit longs.
    - when your values are positive and have no more than 32 bits (provided
    you zeroed out the extra bits beforehand).
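
The two conditions above amount to a range check before serializing. A minimal sketch, assuming the receiver may have only a 32-bit long (so the largest safe value is 2147483647); the name `fits_wire_format` is hypothetical:

```c
/* Nonzero when v is non-negative and small enough to be represented
   in a 32-bit long on the receiving side. */
int fits_wire_format(long v)
{
    return v >= 0 && v <= 2147483647L;
}
```

A sender could call this first and refuse (or fall back to another format) when it returns 0, instead of transmitting a value the other side cannot reconstruct.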

    HTH
    M4
    Martijn Lievaart, Jan 23, 2004
    #4
  5. On Fri, 23 Jan 2004 12:37:23 +0100, RA Scheltema wrote:

    > hi all,
    >
    >
    > A small question about serializing and deserializing a long in a platform
    > independent manner. Can this be done with the following code ?:
    >
    >
    > char buf[4];
    > long val = 35456;
    >
    > /* serialize ... on for example intel */
    > buf[0] = (unsigned char) ((val & 0xff000000) >> 24);
    > buf[1] = (unsigned char) ((val & 0x00ff0000) >> 16);
    > buf[2] = (unsigned char) ((val & 0x0000ff00) >> 8);
    > buf[3] = (unsigned char) ((val & 0x000000ff) >> 0);
    >
    > /* deserialize ... on for example mac */
    > val = 0;
    > val = val | ((unsigned long) buf[0]) << 24;
    > val = val | ((unsigned long) buf[1]) << 16;
    > val = val | ((unsigned long) buf[2]) << 8;
    > val = val | ((unsigned long) buf[3]) << 0;
    >
    >
    > According to a colleague of mine, the & (in the first part of the code)
    > ensures that the least significant and most significant byte is always
    > intact on whatever platform the buffer is deserialized. I don't agree, any
    > suggestions ?


    See my other reply in this thread on why this is a bad idea. It only works
    in some situations.

    Three other solutions come to mind.

    - If your platform has htonl/ntohl (most do), it is an easy way to achieve
    the same and much more portably.

    - Use integer arithmetic instead of bitwise operations.

    - My favorite: transport as text, not binary.
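
The second suggestion, integer arithmetic instead of bitwise operations, could look roughly like this. It is only a sketch for unsigned values below 2^32, and `put_u32`/`get_u32` are illustrative names rather than anything from the thread or a library:

```c
/* Store v as four base-256 digits, most significant digit first.
   Only division and remainder are used, so the result depends on the
   value of v and not on how the machine lays out its integers. */
void put_u32(unsigned char buf[4], unsigned long v)
{
    int i;
    for (i = 3; i >= 0; i--) {
        buf[i] = (unsigned char)(v % 256);
        v /= 256;
    }
}

/* Rebuild the value from its four base-256 digits. */
unsigned long get_u32(const unsigned char buf[4])
{
    unsigned long v = 0;
    int i;
    for (i = 0; i < 4; i++)
        v = v * 256 + buf[i];
    return v;
}
```

Any bits of v above the low 32 are dropped by `put_u32`, so the range still has to be agreed on between sender and receiver.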

    HTH,
    M4
    Martijn Lievaart, Jan 23, 2004
    #5
  6. RA Scheltema

    Tom St Denis Guest

    "Martijn Lievaart" <> wrote in message
    news:p...
    > On Fri, 23 Jan 2004 13:39:04 +0000, Tom St Denis wrote:
    >
    > >
    > > I don't see this as something that can fail [regardless of how the actual
    > > data is stored]. If you have a type which is at least 32-bits then
    > > val&0xFF000000UL is always "defined". All this means is that on platforms
    > > where they store integer types using fluxums and kawalachums instead of bits
    > > they will have to EMULATE!

    >
    > No, you are assuming that all computers use the same layout for binary
    > numbers. That assumption is not true. Computers that use ones' complement
    > (do these exist in reality any more?) store numbers in a different way
    > than computers using two's complement. If you use this method of
    > transporting between ones'- and two's-complement machines, it will only work
    > for positive numbers.


    I don't see that as being valid. "unsigned long" must have at least 32-bits
    of precision.

    By your logic

    unsigned long x, y;

    y = 255UL*256UL*256UL*256UL;
    x = some_func();
    x &= y;
    x >>= 24;

    Is undefined because x/y may not be a 2s complement?

    WRONG. The value of X will lie in 0..255 and will be the bits 24..31 of the
    return of some_func(). In reality this "might use walazaums for bits"
    comes into play if you memcpy or otherwise directly copy. So on a 1s
    complement machine it would have to emulate as appropriate.

    For example, ARMv4 processors don't have FPUs. By your logic

    float x = 4.0;

    is undefined?

    > Also, transporting this way when there are more than 32 bits will lose
    > information. Again, this will not work for negative numbers, even in the
    > more common two's complement. And because the OP mentioned that this was
    > about transporting a long, there are machines out there that have 64 bit
    > long.


    Yeah you have to specify precision. However, many algorithms use fixed
    precision (re: block ciphers).

    Tom
    Tom St Denis, Jan 23, 2004
    #6
  7. On Fri, 23 Jan 2004 15:01:06 +0000, Tom St Denis wrote:

    >> No, you are assuming that all computers use the same layout for binary
    >> numbers. That assumption is not true. Computers that use ones' complement
    >> (do these exist in reality any more?) store numbers in a different way
    >> than computers using two's complement. If you use this method of
    >> transporting between ones'- and two's-complement machines, it will only work
    >> for positive numbers.

    >
    > I don't see that as being valid. "unsigned long" must have at least 32-bits
    > of precision.


    Yes.

    >
    > By your logic
    >
    > unsigned long x, y;


    Hey, where did that unsigned creep in? Maybe you want to reread what I
    said.

    >
    > y = 255UL*256UL*256UL*256UL;
    > x = some_func();
    > x &= y;
    > x >>= 24;
    >
    > Is undefined because x/y may not be a 2s complement?


    I said no such thing.

    >
    > WRONG. The value of X will lie in 0..255 and will be the bits 24..31 of


    I'm not wrong, you are reading wrong. And please lose the caps, it's
    annoying.

    > the return of some_func(). In reality this "might use walazaums for
    > bits" comes into play if you memcpy or otherwise directly copy. So on a
    > 1s complement machine it would have to emulate as appropriate.


    There is nothing to emulate on a ones' complement machine. It can just use
    its native types, which happen to have different representations for
    negative numbers than the more common two's complement. Completely valid
    in both C and C++, no walazaums involved anywhere.

    You might want to read up on what happens when converting negative signed
    long values to unsigned long, because that is exactly what we are facing
    here.

    >
    > For example, ARMv4 processors don't have FPUs. By your logic
    >
    > float x = 4.0;
    >
    > is undefined?


    What twist of logic are you trying to achieve here? I'm positively baffled
    by your conclusion, I cannot follow you.

    >
    >> Also, transporting this way when there are more than 32 bits will lose
    >> information. Again, this will not work for negative numbers, even in
    >> the more common two's complement. And because the OP mentioned that
    >> this was about transporting a long, there are machines out there that
    >> have 64 bit long.

    >
    > Yeah you have to specify precision. However, many algorithms use fixed
    > precision (re: block ciphers).


    Obvious. When transporting between machines you'll always have to specify
    the valid ranges.

    M4
    Martijn Lievaart, Jan 23, 2004
    #7
  8. RA Scheltema

    tom_usenet Guest

    On Fri, 23 Jan 2004 15:01:06 GMT, "Tom St Denis" <>
    wrote:

    >
    >"Martijn Lievaart" <> wrote in message
    >news:p...
    >> On Fri, 23 Jan 2004 13:39:04 +0000, Tom St Denis wrote:
    >>
    >> >
    >> > I don't see this as something that can fail [regardless of how the actual
    >> > data is stored]. If you have a type which is at least 32-bits then
    >> > val&0xFF000000UL is always "defined". All this means is that on platforms
    >> > where they store integer types using fluxums and kawalachums instead of bits
    >> > they will have to EMULATE!

    >>
    >> No, you are assuming that all computers use the same layout for binary
    >> numbers. That assumption is not true. Computers that use ones' complement
    >> (do these exist in reality any more?) store numbers in a different way
    >> than computers using two's complement. If you use this method of
    >> transporting between ones'- and two's-complement machines, it will only work
    >> for positive numbers.

    >
    >I don't see that as being valid. "unsigned long" must have at least 32-bits
    >of precision.


    He just said it is valid for positive numbers! What has "unsigned
    long" got to do with negative numbers?

    >
    >By your logic
    >
    >unsigned long x, y;


    Where did "unsigned long" come from? The OP was using "long".

    >
    >y = 255UL*256UL*256UL*256UL;
    >x = some_func();
    >x &= y;
    >x >>= 24;
    >
    >Is undefined because x/y may not be a 2s complement?


    2s complement doesn't apply to unsigned types. It is a convenient way
    of representing negative numbers in binary.

    Tom

    C++ FAQ: http://www.parashift.com/c++-faq-lite/
    C FAQ: http://www.eskimo.com/~scs/C-faq/top.html
    tom_usenet, Jan 23, 2004
    #8
  9. RA Scheltema

    Dan Pop Guest

    In <40110774$0$329$4all.nl> "RA Scheltema" <r.a.scheltema[viral][p]@[m]dacolian.nl> writes:

    >A small question about serializing and deserializing a long in a platform
    >independent manner. Can this be done with the following code ?:


    It still assumes that longs are 32-bit entities (4 bytes x 8 bits) on
    both platforms. There is no easy way of eliminating this assumption,
    short of using a textual representation of the value, instead of a binary
    one, i.e. serialise with sprintf and deserialise with sscanf and convert
    the native strings to and from BCD (to also remove the assumption that
    both platforms use the same character set).
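
A rough sketch of the textual route follows. Two things here are assumptions rather than part of the suggestion above: it uses strtol instead of sscanf on the reading side, because strtol reports out-of-range input, and it leaves out the character-set conversion entirely. The names `long_to_text` and `text_to_long` are illustrative.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Write v as decimal text into out. Returns the number of characters
   written (excluding the terminator), or -1 if it did not fit. */
int long_to_text(char *out, size_t outsize, long v)
{
    int n = snprintf(out, outsize, "%ld", v);
    return (n >= 0 && (size_t)n < outsize) ? n : -1;
}

/* Parse decimal text back into *v. Returns 0 on success, -1 when the
   text is not a number, has trailing junk, or is out of range for long. */
int text_to_long(const char *in, long *v)
{
    char *end;
    long parsed;

    errno = 0;
    parsed = strtol(in, &end, 10);
    if (end == in || *end != '\0' || errno == ERANGE)
        return -1;
    *v = parsed;
    return 0;
}
```

Negative values need no special handling in this form, and a receiver with a narrower long gets an explicit error instead of a silently truncated value.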

    >char buf[4];


    MUST be unsigned char.

    >long val = 35456;


    MUST be either an unsigned long or contain a positive value. Otherwise,
    see below.

    >/* serialize ... on for example intel */
    >buf[0] = (unsigned char) ((val & 0xff000000) >> 24);
    >buf[1] = (unsigned char) ((val & 0x00ff0000) >> 16);
    >buf[2] = (unsigned char) ((val & 0x0000ff00) >> 8);
    >buf[3] = (unsigned char) ((val & 0x000000ff) >> 0);


    All the casts to unsigned char are superfluous.

    >/* deserialize ... on for example mac */
    >val = 0;
    >val = val | ((unsigned long) buf[0]) << 24;


    If the original value was negative, additional assumptions are needed:
    both platforms use the same representation for negative values and the
    conversion of an unsigned long value that cannot be represented by a long
    preserves the bit pattern. Both assumptions are reasonable, but neither
    is guaranteed by the language.

    >val = val | ((unsigned long) buf[1]) << 16;
    >val = val | ((unsigned long) buf[2]) << 8;
    >val = val | ((unsigned long) buf[3]) << 0;
    >
    >According to a colleague of mine, the & (in the first part of the code)
    >ensures that the least significant and most significant byte is always
    >intact on whatever platform the buffer is deserialized. I don't agree, any
    >suggestions ?


    He is perfectly right. Because you're operating on the full
    representation of the value, you can be sure that buf[0] will contain
    the most significant byte of the value, regardless of the byte order.
    And because the value is reconstructed using arithmetic operations,
    you can also be sure that the result is correct, again regardless of the
    byte order. But getting the byte order right is not enough if you need
    to deal with negative values, too.

    The proper handling of negative values without the additional assumptions
    mentioned above is easy if the implementation also supports long long's
    or some other form of integer that provides more than 32 bits. The
    first step requires assigning val to uval, an unsigned long variable.
    The result is independent of the way negative values are represented.
    Serialise and deserialise uval.

    typedef long long big_t;

    if ((uval & 0x80000000) != 0)
        val = (big_t)uval - (big_t)ULONG_MAX - 1;
    else
        val = uval;

    As you can see, doing the job right, even in a way that is not 100%
    platform independent, is more complex than just taking care of the byte order.
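
Putting the pieces together, a complete round trip might be sketched as below. This is a variation on the snippet above, not a copy of it: it subtracts the constant 2^32 instead of ULONG_MAX + 1 so that it also behaves where unsigned long is wider than 32 bits. It still assumes the value fits in 32 bits and that long long is available; `pack_long` and `unpack_long` are made-up names.

```c
/* Serialize the low 32 bits of val, most significant byte first. */
void pack_long(unsigned char buf[4], long val)
{
    unsigned long uval = (unsigned long)val; /* defined modulo ULONG_MAX + 1 */

    buf[0] = (unsigned char)((uval >> 24) & 0xFF);
    buf[1] = (unsigned char)((uval >> 16) & 0xFF);
    buf[2] = (unsigned char)((uval >> 8) & 0xFF);
    buf[3] = (unsigned char)(uval & 0xFF);
}

/* Rebuild the value. If bit 31 is set, the original was negative and
   equals uval - 2^32, which is computed in a wider signed type. */
long unpack_long(const unsigned char buf[4])
{
    unsigned long uval = ((unsigned long)buf[0] << 24)
                       | ((unsigned long)buf[1] << 16)
                       | ((unsigned long)buf[2] << 8)
                       |  (unsigned long)buf[3];

    if (uval & 0x80000000UL)
        return (long)((long long)uval - 4294967296LL);
    return (long)uval;
}
```

Note the buffer is unsigned char, as required above. The one value this cannot promise everywhere is -2147483648, which a 32-bit long on a ones' complement or sign-magnitude machine cannot hold.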

    Dan
    --
    Dan Pop
    DESY Zeuthen, RZ group
    Email:
    Dan Pop, Jan 23, 2004
    #9
  10. RA Scheltema

    Sean Kelly Guest

    You might also want to look at the socket calls htonl() and ntohl().


    Sean
    Sean Kelly, Jan 23, 2004
    #10
  11. Sean Kelly <> scribbled the following
    on comp.lang.c:
    > You might also want to look at the socket calls htonl() and ntohl().


    Which aren't part of either C or C++, but rather an implementation-
    specific extension.
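
Where POSIX is available, the htonl()/ntohl() route could be sketched like this. It relies on <arpa/inet.h>, which is POSIX rather than ISO C, and on uint32_t occupying exactly four 8-bit bytes; the wrapper names `put_net32` and `get_net32` are invented for this example.

```c
#include <arpa/inet.h> /* htonl, ntohl: POSIX, not ISO C */
#include <stdint.h>
#include <string.h>

/* Convert v to network (big-endian) byte order and copy its bytes out. */
void put_net32(unsigned char buf[4], uint32_t v)
{
    uint32_t be = htonl(v);
    memcpy(buf, &be, sizeof be);
}

/* Copy the bytes back in and convert from network to host byte order. */
uint32_t get_net32(const unsigned char buf[4])
{
    uint32_t be;
    memcpy(&be, buf, sizeof be);
    return ntohl(be);
}
```

These functions only deal with byte order of an unsigned 32-bit quantity, so the questions about negative values and 64-bit longs raised earlier in the thread remain.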

    --
    /-- Joona Palaste () ------------- Finland --------\
    \-- http://www.helsinki.fi/~palaste --------------------- rules! --------/
    "'So called' means: 'There is a long explanation for this, but I have no
    time to explain it here.'"
    - JIPsoft
    Joona I Palaste, Jan 24, 2004
    #11
