endian conversion - composite type

Discussion in 'C++' started by ma740988, Jan 10, 2007.

  1. ma740988

    ma740988 Guest

    Data stored on a storage device is byte swapped. The data is big
    endian and my PC is little. At issue: There's a composite type ( a
    header ) at the front of the files that I'm trying to read in. I'm
    trying to _simulate_ the endian conversion in code below but I'm just
    wondering if there's an ideal way to do this besides what's shown?
    Padding produces some interesting results: notice how the member d
    differs between the two printouts. Serializing the data is not an
    option at present.
    An aside: MATLAB is my primary analysis tool. With MATLAB I can pass
    a parameter to the fopen call and all is well. I'm trying to write
    code that does something similar. Thanks in advance.

    #include <algorithm> // std::swap before C++11
    #include <cstddef>   // std::size_t
    #include <cstdio>
    #include <iostream>
    #include <utility>   // std::swap since C++11

    typedef unsigned char uc_type ;

    #define c( x ) ByteSwap( (unsigned char *) &(x), sizeof( x ) )

    // Reverse the n bytes starting at b, in place.
    void ByteSwap( unsigned char * b, int n)
    {
        int i = 0;
        int j = n - 1;
        while ( i < j )
        {
            std::swap( b[ i ], b[ j ] );
            i++, j--;
        }
    }


    struct foo { // lets try a simple struct
        short a; // works
        short b; // works
        unsigned d ; // introduced padding
        //char test [ 5 ] ; // swap these
        //double dd ;
        //float ar ;
    };


    // Dump the object representation of *barp one byte at a time.
    void showBytes( foo *barp )
    {
        unsigned char *cp = (unsigned char *)barp;

        for (std::size_t i = 0 ; i < sizeof(*barp) ; ++i ) {
            std::printf("0x%02X ", (unsigned int)cp[i]);
        }
        std::cout << std::endl;
    }

    // Print the members as values.
    void showBytes( foo& barp )
    {
        std::cout << barp.a << std::endl;
        std::cout << barp.b << std::endl;
        std::cout << barp.d << std::endl;
    }

    int main()
    {
        foo bar = {0x0102, 0x0304, 0x2030 };

        showBytes( &bar );
        // Swap each member separately; swapping the whole struct at once
        // would also reorder the members.
        ByteSwap ( ( unsigned char*) &bar.a, sizeof ( bar.a ) ) ;
        ByteSwap ( ( unsigned char*) &bar.b, sizeof ( bar.b ) ) ;
        ByteSwap ( ( unsigned char*) &bar.d, sizeof ( bar.d ) ) ;

        //showBytes( bar ) ;
        showBytes( &bar );

        return 0;
    }
    /*
    0x02 0x01 0x04 0x03 0x30 0x20 0x00 0x00
    0x01 0x02 0x03 0x04 0x00 0x00 0x20 0x30
    Press any key to continue
    */
    ma740988, Jan 10, 2007
    #1

  2. ma740988 wrote:

    > Data stored on a storage device is byte swapped. The data is big
    > endian and my PC is little. At issue: There's a composite type ( a
    > header ) at the front of the files that I'm trying to read in. I'm
    > trying to _simulate_ the endian conversion in code below but I'm just
    > wondering if there's an ideal way to do this besides what's shown?


    The best way to read binary files is to use an unsigned char buffer and
    convert from this buffer to the structure you use in the program for that
    data. You make the conversion as complex as your portability goals require,
    considering endianness, the type of sign encoding used...

    A bit more code to write at first, but avoids the need to worry about
    padding and many other issues.
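
    A minimal sketch of the buffer approach described above, assuming the
    header from the original post is stored as two 16-bit fields followed by
    one 32-bit field, big endian, with no padding. The names Header,
    load_be16, load_be32 and decode_header are made up for illustration, and
    the fixed-width types are an assumption about the file format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical in-memory form of the header. The member names mirror
// struct foo from the original post; the fixed widths are an assumption.
struct Header {
    std::uint16_t a;
    std::uint16_t b;
    std::uint32_t d;
};

// Assemble a 16-bit value from two bytes stored most significant byte first.
inline std::uint16_t load_be16(const unsigned char* p)
{
    return static_cast<std::uint16_t>((p[0] << 8) | p[1]);
}

// Assemble a 32-bit value from four bytes stored most significant byte first.
inline std::uint32_t load_be32(const unsigned char* p)
{
    return (static_cast<std::uint32_t>(p[0]) << 24) |
           (static_cast<std::uint32_t>(p[1]) << 16) |
           (static_cast<std::uint32_t>(p[2]) << 8)  |
            static_cast<std::uint32_t>(p[3]);
}

// Assumed file layout: a at offset 0, b at offset 2, d at offset 4.
const std::size_t kHeaderSize = 8;

// Decode the header from raw file bytes. The result does not depend on the
// byte order or struct padding of the machine running this code.
inline Header decode_header(const unsigned char* buf, std::size_t len)
{
    assert(len >= kHeaderSize);
    (void)len; // silence the unused warning when assertions are disabled
    Header h;
    h.a = load_be16(buf + 0);
    h.b = load_be16(buf + 2);
    h.d = load_be32(buf + 4);
    return h;
}
```

    Because the values are built with shifts rather than by reinterpreting
    memory, the same code should work unchanged on a big-endian host.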

    --
    Salu2
    Julián Albo, Jan 10, 2007
    #2

  3. ma740988

    Robert Mabee Guest

    Julián Albo wrote:
    > The best way to read binary files is to use an unsigned char buffer and
    > convert from this buffer to the structure you use in the program for that
    > data. You make the conversion as complex as your portability goals require,
    > considering endianness, the type of sign encoding used...
    >
    > A bit more code to write at first, but avoids the need to worry about
    > padding and many other issues.


    To clarify, the converting code needs to worry about padding inserted in
    the byte stream because the source wrote entire structs.

    I suggest making it look like a stream filter reading chars from an
    underlying stream so you won't ever deal with the buffer and boundary
    conditions. Each function to read a particular type needs to a) skip
    padding bytes that the source would have inserted to align that type;
    b) read and assemble the bytes of the object; c) perhaps do something
    really hard for floating-point data using a different representation,
    or for bitfield data; d) pick up the value as the correct type and
    return it. Sometimes you'll find shortcuts, as when 32 bit data only
    needs 16 bit alignment so can be fetched by two calls to the 16 bit
    fetcher.

    I would add separate functions to mark the beginning and end of each
    struct as there is additional padding there not related to the type of
    the next member. This will require you to analyze the struct so you
    can pass in the alignment the source machine will have assumed for the
    struct as a whole. At least you won't have to make every single pad
    explicit.
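
    The stream-filter idea above could be sketched roughly as follows. This
    is only an illustration: the class name BigEndianReader and its member
    functions are hypothetical, it reads from a memory buffer rather than a
    real stream, and it assumes the source machine aligned 16-bit data to 2
    bytes and 32-bit data to 4 bytes, measured from the start of the buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical reader over bytes that a big-endian machine wrote as whole
// structs, padding included.
class BigEndianReader {
public:
    BigEndianReader(const unsigned char* data, std::size_t size)
        : data_(data), size_(size), pos_(0) {}

    // Skip padding bytes until the position is a multiple of alignment.
    void skip_to(std::size_t alignment)
    {
        assert(alignment != 0);
        const std::size_t rem = pos_ % alignment;
        if (rem != 0) pos_ += alignment - rem;
        assert(pos_ <= size_);
    }

    // Mark struct boundaries; the caller passes the alignment the source
    // machine would have used for the struct as a whole.
    void begin_struct(std::size_t struct_align) { skip_to(struct_align); }
    void end_struct(std::size_t struct_align)   { skip_to(struct_align); }

    std::uint8_t read_u8()
    {
        assert(pos_ < size_);
        return data_[pos_++];
    }

    std::uint16_t read_u16()
    {
        skip_to(2); // assumed alignment of 16-bit data on the source
        const unsigned hi = read_u8();
        const unsigned lo = read_u8();
        return static_cast<std::uint16_t>((hi << 8) | lo);
    }

    std::uint32_t read_u32()
    {
        skip_to(4); // assumed alignment of 32-bit data on the source
        // Fetch the value with two calls to the 16-bit reader.
        const std::uint32_t hi = read_u16();
        const std::uint32_t lo = read_u16();
        return (hi << 16) | lo;
    }

    std::size_t position() const { return pos_; }

private:
    const unsigned char* data_;
    std::size_t size_;
    std::size_t pos_;
};
```

    For a source struct { char; short; char; int; } the caller would invoke
    begin_struct(4), read_u8, read_u16, read_u8, read_u32 and end_struct(4),
    and the reader would step over the pad bytes without the caller listing
    them one by one.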

    Once, when faced with too much foreign data, I wrote functions to take
    a dense character string description of a struct like "ssslccl" and
    convert to and from the foreign form, knowing the padding requirements
    of both forms.
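
    A rough sketch of a description-string decoder in that spirit. The
    letter meanings ('c' = 1 byte, 's' = 2 bytes, 'l' = 4 bytes) and the
    rule that each field is aligned to its own width are assumptions made
    for this example, not details of the code mentioned above; field_width
    and decode_record are hypothetical names. It only decodes big-endian
    unsigned fields into a list of values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Width in bytes of one field code. The letter meanings are an assumption.
inline std::size_t field_width(char code)
{
    switch (code) {
    case 'c': return 1;
    case 's': return 2;
    case 'l': return 4;
    default:  return 0; // unknown code
    }
}

// Decode one record laid out as described by layout, e.g. "ssslccl".
// Fields are read most significant byte first (big endian).
inline std::vector<std::uint32_t> decode_record(const std::string& layout,
                                                const unsigned char* buf,
                                                std::size_t len)
{
    std::vector<std::uint32_t> values;
    std::size_t pos = 0;
    for (std::size_t i = 0; i < layout.size(); ++i) {
        const std::size_t width = field_width(layout[i]);
        assert(width != 0 && "unknown field code");
        if (width == 0) continue;

        // Skip padding: round pos up to a multiple of the field width
        // (assumed alignment rule of the source machine).
        pos = (pos + width - 1) / width * width;
        assert(pos + width <= len);
        (void)len; // silence the unused warning when assertions are disabled

        std::uint32_t v = 0;
        for (std::size_t k = 0; k < width; ++k) {
            v = (v << 8) | buf[pos + k];
        }
        pos += width;
        values.push_back(v);
    }
    return values;
}
```

    Converting back to the foreign form would walk the same description and
    write bytes instead of reading them.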

    I consider this a defect in the language. I should be able to declare
    the interface properties of the struct (padding, byte order, FP format)
    in a standard way and let the compiler choose to implement it or reject
    it or maybe half-implement it so special functions could be applied to
    the members that can't be accessed normally. We do it anyway for device
    drivers with memory-mapped I/O and for MMU structures, but fighting the
    compiler every step of the way.
    Robert Mabee, Jan 10, 2007
    #3
  4. ma740988

    bjeremy Guest

    ma740988 wrote:
    > Data stored on a storage device is byte swapped. The data is big
    > endian and my PC is little. At issue: There's a composite type ( a
    > header ) at the front of the files that I'm trying to read in. I'm
    > trying to _simulate_ the endian conversion in code below but I'm just
    > wondering if there's an ideal way to do this besides what's shown?
    > Padding produces some interesting results: notice how the member d
    > differs between the two printouts. Serializing the data is not an
    > option at present.
    > An aside: MATLAB is my primary analysis tool. With MATLAB I can pass
    > a parameter to the fopen call and all is well. I'm trying to write
    > code that does something similar. Thanks in advance.


    Why can't you just call ntohs and ntohl once you have read the data off
    your storage device? Since your PC is little endian, ntohl/ntohs
    will not be no-ops, and they will swap the bytes for you. The only
    problem you may encounter is if your composite header uses nibbles to
    store data: each nibble would need to be swapped manually
    before you recompose your header.
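
    A small sketch of the ntohs/ntohl suggestion, assuming a POSIX system
    (the functions are declared in <arpa/inet.h>) and assuming the header is
    two 16-bit fields followed by one 32-bit field with no padding.
    FileHeader and header_from_file_bytes are hypothetical names. Each field
    is copied out of the raw buffer with memcpy and converted individually.

```cpp
#include <arpa/inet.h> // ntohs, ntohl (POSIX)
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical in-memory header; the field widths are an assumption.
struct FileHeader {
    std::uint16_t a;
    std::uint16_t b;
    std::uint32_t d;
};

// Convert a big-endian ("network order") header to host byte order.
inline FileHeader header_from_file_bytes(const unsigned char* buf,
                                         std::size_t len)
{
    assert(len >= 8);
    (void)len; // silence the unused warning when assertions are disabled

    std::uint16_t a, b;
    std::uint32_t d;
    // memcpy avoids unaligned access and aliasing problems.
    std::memcpy(&a, buf + 0, sizeof a);
    std::memcpy(&b, buf + 2, sizeof b);
    std::memcpy(&d, buf + 4, sizeof d);

    FileHeader h;
    h.a = ntohs(a); // swaps on a little-endian host, no-op on big endian
    h.b = ntohs(b);
    h.d = ntohl(d);
    return h;
}
```

    These functions only cover 16-bit and 32-bit integers, so doubles and
    floats in the header would still need separate handling.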
    bjeremy, Jan 10, 2007
    #4
  5. Robert Mabee wrote:

    >> The best way to read binary files is to use an unsigned char buffer and
    >> convert from this buffer to the structure you use in the program for that
    >> data. You make the conversion as complex as your portability goals require,
    >> considering endianness, the type of sign encoding used...
    >>
    >> A bit more code to write at first, but avoids the need to worry about
    >> padding and many other issues.

    >
    > To clarify, the converting code needs to worry about padding inserted in
    > the byte stream because the source wrote entire structs.


    From the reader's point of view this is unimportant. The padding from the
    writer's compiler can be treated like a FILLER in COBOL: just a part of the
    organization of the file.

    > I suggest making it look like a stream filter reading chars from an
    > underlying stream so you won't ever deal with the buffer and boundary
    > conditions. Each function to read a particular type needs to a) skip
    > padding bytes that the source would have inserted to align that type;


    It is doable, but the padding conditions may be difficult to work out.

    > c) perhaps do something really hard for floating-point data using a
    > different representation, or for bitfield data;


    Yes, that is why I said that more or less effort will be needed
    depending on the portability goal.

    > Once, when faced with too much foreign data, I wrote functions to take
    > a dense character string description of a struct like "ssslccl" and
    > convert to and from the foreign form, knowing the padding requirements
    > of both forms.


    Some time ago I wrote a program that took a description of the record and
    displayed the content of a file according to it. The same can be done
    inside a program, or in a program that generates code to be used in the
    program that deals with the data.

    > I consider this a defect in the language. I should be able to declare
    > the interface properties of the struct (padding, byte order, FP format)
    > in a standard way and let the compiler choose to implement it or reject
    > it or maybe half-implement it so special functions could be applied to
    > the members that can't be accessed normally.


    There is no need to make part of the language a thing perfectly doable
    without direct language support. This is a general design principle of C++.

    --
    Salu2
    Julián Albo, Jan 10, 2007
    #5
  6. ma740988

    ma740988 Guest

    Julián Albo wrote:
    > ma740988 wrote:
    >
    > > Data stored on a storage device is byte swapped. The data is big
    > > endian and my PC is little. At issue: There's a composite type ( a
    > > header ) at the front of the files that I'm trying to read in. I'm
    > > trying to _simulate_ the endian conversion in code below but I'm just
    > > wondering if there's an ideal way to do this besides what's shown?

    >
    > The best way to read binary files is to use an unsigned char buffer and
    > convert from this buffer to the structure you use in the program for that
    > data. You make the conversion as complex as your portability goals require,
    > considering endianness, the type of sign encoding used...


    Do you know of/have an example of this anywhere I could peruse?
    ma740988, Jan 11, 2007
    #6
  7. ma740988 wrote:

    >> The best way to read binary files is to use an unsigned char buffer and
    >> convert from this buffer to the structure you use in the program for that
    >> data. You make the conversion as complex as your portability goals require,
    >> considering endianness, the type of sign encoding used...

    > Do you know of/have an example of this anywhere I could peruse?


    I posted some sample code in this group some time ago; you can try to find
    it in Google Groups.

    --
    Salu2
    Julián Albo, Jan 11, 2007
    #7
  8. ma740988

    Grizlyk Guest

    ma740988 wrote:

    > Data stored on a storage device is byte swapped. The data is big
    > endian and my PC is little.


    > foo bar = {0x0102, 0x0304, 0x2030 };
    >
    > 0x02 0x01 0x04 0x03 0x30 0x20 0x00 0x00


    Is it a memory dump? Are you sure "0x30 0x20 0x00 0x00" is little
    endian?

    0x2030 == 0x00002030 is not the same as 0x20300000

    "0x30 0x20" - the low 16-bit big-endian word was placed before "0x00 0x00" -
    the high 16-bit big-endian word.
    It looks like mixed endian (Google says: middle-endian, or PDP-endian). In
    that case you cannot swap bytes in the same manner as words.

    for 0x50607080

    big endian is:
    word: low byte , high byte
    dword: low word, high word

    " 0x80, 0x70, 0x60, 0x50 "

    little endian must have been:
    word: high byte, low byte
    dword: high word, low word

    " 0x00, 0x00, 0x20, 0x30 "

    Use:
    #include <netinet/in.h>
    htons(), htonl(), ntohs(), ntohl() - POSIX functions.
    Grizlyk, Jan 14, 2007
    #8
  9. ma740988

    Grizlyk Guest

    Grizlyk wrote:

    Ugh, sorry, I see: I mixed everything up in my poor head with the huge
    number of "endians" applied everywhere.

    I swapped your PC's endianness with your data's endianness, and at the
    same time swapped the names "little-endian" and "big-endian"
    for the byte orders.

    > ma740988 wrote:
    >
    > > Data stored on a storage device is byte swapped. The data is big
    > > endian and my PC is little.

    >
    > > foo bar = {0x0102, 0x0304, 0x2030 };
    > >
    > > 0x02 0x01 0x04 0x03 0x30 0x20 0x00 0x00

    >
    > Is it a memory dump? Are you sure "0x30 0x20 0x00 0x00" is little
    > endian?


    Yes, it is correct little-endian data on a little-endian PC.

    > "0x30 0x20" - the low 16-bit big-endian word was placed before "0x00 0x00" -
    > the high 16-bit big-endian word.


    No, "0x30 0x20" - the low 16-bit little-endian word was placed before "0x00
    0x00" - the high 16-bit little-endian word, which is the correct placement
    for a little-endian 32-bit dword.

    > It looks like mixed endian


    No, this is wrong

    > for 0x50607080
    >
    > big endian is:
    > word: low byte , high byte
    > dword: low word, high word
    >
    > " 0x80, 0x70, 0x60, 0x50 "


    No, this is little endian

    > little endian must have been:
    > word: high byte, low byte
    > dword: high word, low word
    >
    > " 0x00, 0x00, 0x20, 0x30 "
    >


    " 0x50, 0x60, 0x70, 0x80 "
    No, this is big endian

    It seems to me this assignment of the "endian" names is more correct. Or not?
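
    The two byte orders for 0x50607080 can be checked with a short sketch.
    The function names store_le32, store_be32 and check_orders are made up
    for this example; the code uses shifts, so the result does not depend on
    the machine it runs on.

```cpp
#include <cassert>
#include <cstdint>

// Write v into out[0..3], least significant byte first (little endian).
inline void store_le32(std::uint32_t v, unsigned char* out)
{
    out[0] = static_cast<unsigned char>(v & 0xFF);
    out[1] = static_cast<unsigned char>((v >> 8) & 0xFF);
    out[2] = static_cast<unsigned char>((v >> 16) & 0xFF);
    out[3] = static_cast<unsigned char>((v >> 24) & 0xFF);
}

// Write v into out[0..3], most significant byte first (big endian).
inline void store_be32(std::uint32_t v, unsigned char* out)
{
    out[0] = static_cast<unsigned char>((v >> 24) & 0xFF);
    out[1] = static_cast<unsigned char>((v >> 16) & 0xFF);
    out[2] = static_cast<unsigned char>((v >> 8) & 0xFF);
    out[3] = static_cast<unsigned char>(v & 0xFF);
}

// Self-check for the value discussed in the thread.
inline void check_orders()
{
    unsigned char le[4];
    unsigned char be[4];
    store_le32(0x50607080u, le);
    store_be32(0x50607080u, be);
    // Little endian: 0x80, 0x70, 0x60, 0x50
    assert(le[0] == 0x80 && le[1] == 0x70 && le[2] == 0x60 && le[3] == 0x50);
    // Big endian: 0x50, 0x60, 0x70, 0x80
    assert(be[0] == 0x50 && be[1] == 0x60 && be[2] == 0x70 && be[3] == 0x80);
}
```

    By the same rule, 0x00002030 is "0x30 0x20 0x00 0x00" in little endian
    and "0x00 0x00 0x20 0x30" in big endian, matching the two dumps in the
    original post.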
    Grizlyk, Jan 14, 2007
    #9