Re: Serializing binary data for use across different platforms

Discussion in 'C++' started by Victor Bazarov, May 5, 2010.

  1. On 5/5/2010 2:40 PM, Peter Olcott wrote:
    > If only integer base types are used, then it seems that
    > serializing data for cross platform use requires only two
    > things:
    > (1) Decomposing aggregate types into sequences of integral
    > types.
    > (2) Accounting for Endianness (Big, Little, Mixed).
    >
    > Endianness can be determined at run time by casting known
    > values (such as 0x12345678, and 0x1234) for unsigned int and
    > unsigned short into char*.
    >
    > Is there anything that I am missing here?


    Uh... Only the possible problems with different representations of
    negative values (one's complement vs two's complement vs signed magnitude).

    V
    --
    I do not respond to top-posted replies, please don't ask
     
    Victor Bazarov, May 5, 2010
    #1
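A minimal sketch of the run-time check Peter describes, assuming a C++11 compiler and that std::uint32_t exists (the type is optional in the standard). The names ByteOrder and detect_byte_order are made up for illustration. It copies the probe value into unsigned char instead of casting, and reports Other for any layout that is neither of the two common ones:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

enum class ByteOrder { Little, Big, Other };

// Classify the in-memory layout of a known 32-bit value.
// Assumption: std::uint32_t is available on the target.
ByteOrder detect_byte_order() {
    const std::uint32_t probe = 0x12345678u;
    unsigned char b[sizeof probe];
    std::memcpy(b, &probe, sizeof probe);  // read the object representation

    if (sizeof probe == 4) {
        if (b[0] == 0x78 && b[1] == 0x56 && b[2] == 0x34 && b[3] == 0x12)
            return ByteOrder::Little;
        if (b[0] == 0x12 && b[1] == 0x34 && b[2] == 0x56 && b[3] == 0x78)
            return ByteOrder::Big;
    }
    // Mixed orders, or a platform where the value does not span four chars.
    return ByteOrder::Other;
}
```

Note that this only inspects byte order; it says nothing about how negative values are represented.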

  2. Victor Bazarov

    James Kanze Guest

    On May 5, 7:51 pm, Victor Bazarov <> wrote:
    > On 5/5/2010 2:40 PM, Peter Olcott wrote:


    > > If only integer base types are used, then it seems that
    > > serializing data for cross platform use requires only two
    > > things:
    > > (1) Decomposing aggregate types into sequences of integral
    > > types.
    > > (2) Accounting for Endianness (Big, Little, Mixed).


    > > Endianness can be determined at run time by casting known
    > > values (such as 0x12345678, and 0x1234) for unsigned int and
    > > unsigned short into char*.


    > > Is there anything that I am missing here?


    > Uh... Only the possible problems with different
    > representations of negative values (one's complement vs two's
    > complement vs signed magnitude).


    And the fact that the number of bits in a char can vary, and
    that the size isn't always 2,4,8... And that there can be bits
    which don't contribute to the representation, but must have some
    particular value. And that even for a 4 byte 2's complement,
    there are quite a few different orderings possible: I've seen at
    least three on popular machines.

    --
    James Kanze
     
    James Kanze, May 5, 2010
    #2
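A rough way to see whether an unsigned integer type has bits that do not contribute to the value, assuming C++11. The helper name padding_bits is hypothetical. It compares the storage size in bits with the number of value bits reported by std::numeric_limits, and is only meant for unsigned types:

```cpp
#include <cassert>
#include <climits>
#include <limits>

// Storage bits minus value bits for an unsigned integer type T.
// A non-zero result means T has padding bits on this platform.
template <typename T>
constexpr int padding_bits() {
    return static_cast<int>(sizeof(T) * CHAR_BIT)
         - std::numeric_limits<T>::digits;
}
```

On mainstream desktop platforms padding_bits<unsigned int>() is expected to be 0; unsigned char has no padding bits by definition.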

  3. On 05.05.2010 21:32, * Geoff:
    > On Wed, 5 May 2010 14:13:07 -0500, "Peter Olcott"
    > <> wrote:
    >
    >>
    >> "Victor Bazarov"<> wrote in
    >> message news:hrsenu$hmq$-september.org...
    >>> On 5/5/2010 2:40 PM, Peter Olcott wrote:
    >>>> If only integer base types are used, then it seems that
    >>>> serializing data for cross platform use requires only two
    >>>> things:
    >>>> (1) Decomposing aggregate types into sequences of
    >>>> integral
    >>>> types.
    >>>> (2) Accounting for Endianness (Big, Little, Mixed).
    >>>>
    >>>> Endianness can be determined at run time by casting known
    >>>> values (such as 0x12345678, and 0x1234) for unsigned int
    >>>> and
    >>>> unsigned short into char*.
    >>>>
    >>>> Is there anything that I am missing here?
    >>>
    >>> Uh... Only the possible problems with different
    >>> representations of negative values (one's complement vs
    >>> two's complement vs signed magnitude).

    >>
    >> What do you mean by [signed magnitude] ???
    >>

    >
    > More properly called sign-magnitude. The representation is a sign bit
    > plus binary digits representing the absolute value of the number.
    >
    > sign-magnitude in 8 bits:
    >
    > 00000101 = 5
    > 10000101 = -5
    >
    > as opposed to two-compliment:


    More properly called two's complement form.


    > 00000101 = 5
    > 10000011 = -5


    <example>
    C:\test> py3 -c print('{:08b}'.format(256-5))
    11111011

    C:\test> _
    </example>


    Cheers & hth.,

    - Alf
     
    Alf P. Steinbach, May 5, 2010
    #3
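The three representations mentioned in this sub-thread can be compared side by side with a small sketch. The function names are invented for illustration, and each one takes a magnitude in the range 1 to 127 and returns the 8-bit pattern of its negation:

```cpp
#include <cassert>
#include <string>

// Render the low 8 bits of v as a binary string, most significant bit first.
std::string bits8(unsigned v) {
    std::string s;
    for (int i = 7; i >= 0; --i)
        s += ((v >> i) & 1u) ? '1' : '0';
    return s;
}

// Sign-magnitude: set the sign bit, keep the magnitude bits.
unsigned sign_magnitude_neg(unsigned m) { return 0x80u | m; }

// One's complement: invert every bit.
unsigned ones_complement_neg(unsigned m) { return ~m & 0xFFu; }

// Two's complement: invert every bit, then add one.
unsigned twos_complement_neg(unsigned m) { return (~m + 1u) & 0xFFu; }
```

For a magnitude of 5 this gives 10000101, 11111010 and 11111011 respectively, the last of which matches Alf's output above.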
  4. On 5/5/2010 3:32 PM, Geoff wrote:
    > On Wed, 5 May 2010 14:13:07 -0500, "Peter Olcott"
    > <> wrote:
    >
    >>
    >> "Victor Bazarov"<> wrote in
    >> message news:hrsenu$hmq$-september.org...
    >>> On 5/5/2010 2:40 PM, Peter Olcott wrote:
    >>>> If only integer base types are used, then it seems that
    >>>> serializing data for cross platform use requires only two
    >>>> things:
    >>>> (1) Decomposing aggregate types into sequences of
    >>>> integral
    >>>> types.
    >>>> (2) Accounting for Endianness (Big, Little, Mixed).
    >>>>
    >>>> Endianness can be determined at run time by casting known
    >>>> values (such as 0x12345678, and 0x1234) for unsigned int
    >>>> and
    >>>> unsigned short into char*.
    >>>>
    >>>> Is there anything that I am missing here?
    >>>
    >>> Uh... Only the possible problems with different
    >>> representations of negative values (one's complement vs
    >>> two's complement vs signed magnitude).

    >>
    >> What do you mean by [signed magnitude] ???
    >>

    >
    > More properly called sign-magnitude. [..]
    >


    Not sure of your sources of propriety of different terms, I just use the
    C++ Standard (see [basic.fundamental]/7, in brackets). They don't spell
    'two' or 'one', though.

    V
    --
    I do not respond to top-posted replies, please don't ask
     
    Victor Bazarov, May 5, 2010
    #4
  5. Victor Bazarov

    James Kanze Guest

    On 5 May, 20:37, "Peter Olcott" <> wrote:
    > "Geoff" <> wrote in message
    > >>> Uh... Only the possible problems with different


    [...]
    > >>> representations of negative values (one's complement vs
    > >>> two's complement vs signed magnitude).


    > >>What do you mean by [signed magnitude] ???


    > > More properly called sign-magnitude. The representation is
    > > a sign bit
    > > plus binary digits representing the absolute value of the
    > > number.


    > > sign-magnitude in 8 bits:


    > > 00000101 = 5
    > > 10000101 = -5


    > > as opposed to two-compliment:


    > > 00000101 = 5
    > > 10000011 = -5


    > Most modern machines are twos complement, right?


    Most, but not all. I know of at least two machines still being
    sold which are not two's complement.

    In the end, you have to decide what degree of portability you
    need. Assuming 2's complement and IEEE floating point will
    cover PC's and most (probably all) Unixes. Drop the IEEE
    floating point for IBM mainframes, and both assumptions (along
    with 8 bit char's) for general portability to mainframes. For
    embedded systems, you might also have to drop the assumption that
    char is 8 bits.

    --
    James Kanze
     
    James Kanze, May 7, 2010
    #5
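Once a degree of portability has been chosen, the assumptions can be written down so that a build fails on a platform that violates them. This is only a sketch, assuming C++11; the function name platform_matches_assumptions is hypothetical, and the particular set of checks is one possible choice (the "PCs and most Unixes" level described above):

```cpp
#include <cassert>
#include <climits>
#include <limits>

// True when the platform matches the assumptions this code relies on.
constexpr bool platform_matches_assumptions() {
    return CHAR_BIT == 8                              // 8-bit char
        && std::numeric_limits<double>::is_iec559     // IEEE 754 double
        && (-1 & 3) == 3;                             // two's complement int
}

static_assert(platform_matches_assumptions(),
              "unsupported platform: see platform_matches_assumptions()");
```

The last test works because -1 has its two low bits set only in two's complement; one's complement gives 2 and sign-magnitude gives 1.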
  6. Victor Bazarov

    James Kanze Guest

    On 5 May, 20:36, "Peter Olcott" <> wrote:
    > "James Kanze" <> wrote in message


    > news:...
    > > On May 5, 7:51 pm, Victor Bazarov
    > > <> wrote:
    > >> On 5/5/2010 2:40 PM, Peter Olcott wrote:


    > >> > If only integer base types are used, then it seems that
    > >> > serializing data for cross platform use requires only two
    > >> > things:
    > >> > (1) Decomposing aggregate types into sequences of
    > >> > integral types.
    > >> > (2) Accounting for Endianness (Big, Little, Mixed).


    > >> > Endianness can be determined at run time by casting known
    > >> > values (such as 0x12345678, and 0x1234) for unsigned int
    > >> > and unsigned short into char*.


    > >> > Is there anything that I am missing here?


    > >> Uh... Only the possible problems with different
    > >> representations of negative values (one's complement vs
    > >> two's complement vs signed magnitude).


    > > And the fact that the number of bits in a char can vary, and
    > > that the size isn't always 2,4,8... And that there can be
    > > bits


    > I think that this may be rare enough to ignore.


    It depends on your requirements. You specifically asked about
    cross platform stuff. For many applications, they are rare
    enough to ignore. For many applications, everything but PC's
    are "rare enough to ignore"; for many others, PC's and a few
    mainstream Unix (Linux on PC, Solaris on Sparc, etc.) will
    suffice. For others, IBM mainframes must be considered (with
    non IEEE floating point and EBCDIC characters). And at least
    one commercial mainframe today uses 48 bit sign-magnitude ints,
    and another 36 bit 1's complement.

    > > which don't contribute to the representation, but must have
    > > some particular value. And that even for a 4 byte 2's
    > > complement, there are quite a few different orderings
    > > possible: I've seen at least three on popular machines.


    > Are you talking about byte orderings, Endianness?


    Byte order. There are 24 different ways to order 4 bytes. In
    practice, I've only seen three. But treating such ints as
    values always works, regardless of the representation and byte
    order.

    --
    James Kanze
     
    James Kanze, May 7, 2010
    #6
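"Treating such ints as values" can be sketched as follows: the code never looks at the native memory layout, it only uses shifts, masks and arithmetic, so the host byte order and sign representation do not matter. The wire format chosen here (four octets, most significant first, two's complement for negative values) and all function names are assumptions made for the example:

```cpp
#include <cassert>

// Write the low 32 bits of value as four octets, most significant first.
void put_u32(unsigned char* out, unsigned long value) {
    out[0] = static_cast<unsigned char>((value >> 24) & 0xFFu);
    out[1] = static_cast<unsigned char>((value >> 16) & 0xFFu);
    out[2] = static_cast<unsigned char>((value >> 8) & 0xFFu);
    out[3] = static_cast<unsigned char>(value & 0xFFu);
}

// Rebuild the value from four octets, most significant first.
unsigned long get_u32(const unsigned char* in) {
    return ((in[0] & 0xFFul) << 24) | ((in[1] & 0xFFul) << 16)
         | ((in[2] & 0xFFul) << 8)  |  (in[3] & 0xFFul);
}

// Signed to unsigned conversion is defined modulo 2^N by the language,
// so this yields the two's complement pattern on any host.
unsigned long encode_s32(long v) {
    return static_cast<unsigned long>(v) & 0xFFFFFFFFul;
}

// Inverse mapping using arithmetic only. The most negative pattern
// (0x80000000) is not representable on a host whose long lacks -2^31.
long decode_s32(unsigned long u) {
    if (u <= 0x7FFFFFFFul)
        return static_cast<long>(u);
    return -static_cast<long>(0xFFFFFFFFul - u) - 1;
}
```

For example, put_u32(buf, encode_s32(-5)) stores the octets FF FF FF FB, and decode_s32(get_u32(buf)) returns -5 again.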
  7. Victor Bazarov

    Brian Guest

    On May 7, 11:47 am, James Kanze <> wrote:
    > On 5 May, 20:36, "Peter Olcott" <> wrote:
    >
    >
    >
    >
    >
    > > "James Kanze" <> wrote in message
    > >news:....
    > > > On May 5, 7:51 pm, Victor Bazarov
    > > > <> wrote:
    > > >> On 5/5/2010 2:40 PM, Peter Olcott wrote:
    > > >> > If only integer base types are used, then it seems that
    > > >> > serializing data for cross platform use requires only two
    > > >> > things:
    > > >> > (1) Decomposing aggregate types into sequences of
    > > >> > integral types.
    > > >> > (2) Accounting for Endianness (Big, Little, Mixed).
    > > >> > Endianness can be determined at run time by casting known
    > > >> > values (such as 0x12345678, and 0x1234) for unsigned int
    > > >> > and unsigned short into char*.
    > > >> > Is there anything that I am missing here?
    > > >> Uh... Only the possible problems with different
    > > >> representations of negative values (one's complement vs
    > > >> two's complement vs signed magnitude).
    > > > And the fact that the number of bits in a char can vary, and
    > > > that the size isn't always 2,4,8...  And that there can be
    > > > bits

    > > I think that this may be rare enough to ignore.

    >
    > It depends on your requirements.  You specifically asked about
    > cross platform stuff.  For many applications, they are rare
    > enough to ignore.  For many applications, everything but PC's
    > are "rare enough to ignore"; for many others, PC's and a few
    > mainstream Unix (Linux on PC, Solaris on Sparc, etc.) will
    > suffice.  For others, IBM mainframes must be considered (with
    > non IEEE floating point and EBCDIC characters).  And at least
    > one commercial mainframe today uses 48 bit sign-magnitude ints,
    > and another 36 bit 1's complement.
    >
    > > > which don't contribute to the representation, but must have
    > > > some particular value.  And that even for a 4 byte 2's
    > > > complement, there are quite a few different orderings
    > > > possible: I've seen at least three on popular machines.

    > > Are you talking about byte orderings, Endianness?

    >
    > Byte order.  There are 24 different ways to order 4 bytes.  In
    > practice, I've only seen three.  



    Are there any machines being newly introduced today that aren't
    big or little-endian?


    Brian Wood
    http://webEbenezer.net
    (651) 251-9384
     
    Brian, May 8, 2010
    #7
  8. Victor Bazarov

    James Kanze Guest

    On May 8, 5:57 pm, Brian <> wrote:
    > On May 7, 11:47 am, James Kanze <> wrote:


    [...]
    > > Byte order. There are 24 different ways to order 4 bytes. In
    > > practice, I've only seen three.


    > Are there any machines being newly introduced today that aren't
    > big or little-endian?


    Not sure what you mean by "machines"; there are certainly new
    models introduced for the machines without any byte order. And
    if you meant "architectures", the question would probably be
    "are there any architectures being newly introduced today",
    period.

    If the question concerned the probability of your encountering a
    machine which isn't strictly big-endian or little-endian.. What
    is the endian-ness of a system where sizeof(int) is 1?

    --
    James Kanze
     
    James Kanze, May 9, 2010
    #8
  9. Victor Bazarov

    Brian Guest

    On May 8, 6:31 pm, James Kanze <> wrote:
    > On May 8, 5:57 pm, Brian <> wrote:
    >
    > > On May 7, 11:47 am, James Kanze <> wrote:

    >
    >     [...]
    >
    > > > Byte order.  There are 24 different ways to order 4 bytes.  In
    > > > practice, I've only seen three.

    > > Are there any machines being newly introduced today that aren't
    > > big or little-endian?

    >
    > Not sure what you mean by "machines"; there are certainly new
    > models introduced for the machines without any byte order.  And
    > if you meant "architectures", the question would probably be
    > "are there any architectures being newly introduced today",
    > period.
    >
    > If the question concerned the probability of your encountering a
    > machine which isn't strictly big-endian or little-endian.. What
    > is the endian-ness of a system where sizeof(int) is 1?
    >


    I'd like to know whether hardware that is neither big nor
    little-endian is waxing or waning. I think it is waning,
    but am not positive.


    Brian Wood
     
    Brian, May 9, 2010
    #9
  10. Victor Bazarov

    James Kanze Guest

    On May 9, 6:10 pm, Brian <> wrote:
    > On May 8, 6:31 pm, James Kanze <> wrote:


    [...]
    > > If the question concerned the probability of your encountering a
    > > machine which isn't strictly big-endian or little-endian.. What
    > > is the endian-ness of a system where sizeof(int) is 1?


    > I'd like to know whether hardware that is neither big nor
    > little-endian is waxing or waning. I think it is waning,
    > but am not positive.


    It depends on what you consider the situation for systems where
    sizeof(int) is 1. I think those are waxing. Otherwise, I don't
    think you'll find a system today which isn't either big-endian
    or little-endian, but I can't be sure; if there is one, it's
    probably some small embedded system. They were extremely common
    in the past, however.

    What you *will* find (and I don't think their numbers are either
    waxing or waning) is machines with 9 bit bytes and with other
    than 2's complement.

    --
    James Kanze
     
    James Kanze, May 11, 2010
    #10
  11. Victor Bazarov

    Brian Guest

    On May 11, 11:57 am, James Kanze <> wrote:

    >
    > It depends on what you consider the situation for systems where
    > sizeof(int) is 1.  I think those are waxing.  Otherwise, I don't
    > think you'll find a system today which isn't either big-endian
    > or little-endian, but I can't be sure; if there is one, it's
    > probably some small embedded system.  They were extremely common
    > in the past, however.
    >


    I'm not sure what would be a good next step portability-wise
    as far as what I'm working on. I have this:

    #if CHAR_BIT != 8
    #error Only 8 bit char supported
    #endif

    in several files. IIRC sizeof(char) can be equal to sizeof(int),
    so what I have wouldn't strictly disqualify sizeof(int) being one,
    but I guess generally when sizeof(int) is one, there are more
    than 8 bits in a char.


    > What you *will* find (and I don't think their numbers are either
    > waxing or waning) is machines with 9 bit bytes and with other
    > than 2's complement.
    >


    I'm more optimistic about making CHAR_BIT == 32 work than
    dealing with it being 9.

    On the other hand I'm not sure if these matters should be
    higher priority than encryption support. I've kind of been
    avoiding the encryption stuff as it involves choosing a
    good library -- a marriage of sorts -- and it hasn't been
    an easy decision.


    Brian Wood
     
    Brian, May 11, 2010
    #11
