endian conversion - composite type

ma740988

Data stored on a storage device is byte swapped. The data is big
endian and my PC is little. At issue: There's a composite type ( a
header ) at the front of the files that I'm trying to read in. I'm
trying to _simulate_ the endian conversion in code below but I'm just
wondering if there's an ideal way to do this besides what's shown?
Padding produces some interesting results. Notice how the parameter d
is different in the printouts. Serializing the data is not an option
at present.
An aside: Matlab is my primary analysis tool. With Matlab I could pass
a parameter to the fopen call and all would be well. I'm trying to write
code to do something similar. Thanks in advance.

#include <cstddef>
#include <cstdio>
#include <iostream>
#include <utility>   // std::swap

typedef unsigned char uc_type ;

// Reverse the n bytes starting at b: converts one scalar value
// between big-endian and little-endian representation.
void ByteSwap( unsigned char * b, int n )
{
    int i = 0;
    int j = n - 1;
    while ( i < j )
    {
        std::swap( b[ i ], b[ j ] );
        i++, j--;
    }
}

// Shorthand to byte-swap a single object in place.
#define BYTE_SWAP( x ) ByteSwap( (unsigned char *) &( x ), sizeof( x ) )

struct foo { // let's try a simple struct
    short a; // works
    short b; // works
    unsigned d ; // introduced padding
    //char test [ 5 ] ; // swap these
    //double dd ;
    //float ar ;
};

// Dump the raw bytes of the struct in memory order.
void showBytes( foo *barp )
{
    unsigned char *cp = (unsigned char *)barp;

    for ( std::size_t i = 0 ; i < sizeof(*barp) ; ++i ) {
        std::printf("0x%02X ", (unsigned int)cp[i]);
    }
    std::cout << std::endl;
}

// Print the member values instead of the raw bytes.
void showBytes( foo& barp )
{
    std::cout << barp.a << std::endl;
    std::cout << barp.b << std::endl;
    std::cout << barp.d << std::endl;
}

int main()
{
    foo bar = {0x0102, 0x0304, 0x2030 };

    showBytes( &bar );
    ByteSwap ( ( unsigned char*) &bar.a, sizeof ( bar.a ) ) ;
    ByteSwap ( ( unsigned char*) &bar.b, sizeof ( bar.b ) ) ;
    ByteSwap ( ( unsigned char*) &bar.d, sizeof ( bar.d ) ) ;

    //showBytes( bar ) ;
    showBytes( &bar );

    return 0;
}
/*
Output on a little-endian PC:
0x02 0x01 0x04 0x03 0x30 0x20 0x00 0x00
0x01 0x02 0x03 0x04 0x00 0x00 0x20 0x30
*/
 
Julián Albo

ma740988 said:
Data stored on a storage device is byte swapped. The data is big
endian and my PC is little. At issue: There's a composite type ( a
header ) at the front of the files that I'm trying to read in. I'm
trying to _simulate_ the endian conversion in code below but I'm just
wondering if there's an ideal way to do this besides what's shown?

The best way to read binary files is to use an unsigned char buffer and
convert from this buffer to the structure you use in the program for that
data. You can make the conversion as complex as your portability goals
require, taking into account endianness, the sign encoding used, and so on.

It's a bit more code to write at first, but it avoids the need to worry about
padding and many other issues.
 
Robert Mabee

Julián Albo said:
The best way to read binary files is to use an unsigned char buffer and
convert from this buffer to the structure you use in the program for that
data. You can make the conversion as complex as your portability goals
require, taking into account endianness, the sign encoding used, and so on.

It's a bit more code to write at first, but it avoids the need to worry about
padding and many other issues.

To clarify, the converting code needs to worry about padding inserted in
the byte stream because the source wrote entire structs.

I suggest making it look like a stream filter reading chars from an
underlying stream so you won't ever deal with the buffer and boundary
conditions. Each function to read a particular type needs to a) skip
padding bytes that the source would have inserted to align that type;
b) read and assemble the bytes of the object; c) perhaps do something
really hard for floating-point data using a different representation,
or for bitfield data; d) pick up the value as the correct type and
return it. Sometimes you'll find shortcuts, as when 32-bit data only
needs 16-bit alignment and so can be fetched by two calls to the 16-bit
fetcher.

I would add separate functions to mark the beginning and end of each
struct as there is additional padding there not related to the type of
the next member. This will require you to analyze the struct so you
can pass in the alignment the source machine will have assumed for the
struct as a whole. At least you won't have to make every single pad
explicit.

Once, when faced with too much foreign data, I wrote functions to take
a dense character string description of a struct like "ssslccl" and
convert to and from the foreign form, knowing the padding requirements
of both forms.

I consider this a defect in the language. I should be able to declare
the interface properties of the struct (padding, byte order, FP format)
in a standard way and let the compiler choose to implement it or reject
it or maybe half-implement it so special functions could be applied to
the members that can't be accessed normally. We do it anyway for device
drivers with memory-mapped I/O and for MMU structures, but fighting the
compiler every step of the way.
 
bjeremy

ma740988 said:
Data stored on a storage device is byte swapped. The data is big
endian and my PC is little. At issue: There's a composite type ( a
header ) at the front of the files that I'm trying to read in. I'm
trying to _simulate_ the endian conversion in code below but I'm just
wondering if there's an ideal way to do this besides what's shown?

Why can't you just call ntohs/ntohl once you read data off your
storage device? Your PC is little endian, so ntohl/ntohs won't be
no-ops; they will swap the bytes for you. The only problem you may
encounter is if your composite header uses nibbles to store data...
each nibble would need to be manually swapped before you recompose
your header.
 
Julián Albo

Robert said:
To clarify, the converting code needs to worry about padding inserted in
the byte stream because the source wrote entire structs.

From the reader's point of view this is unimportant. The padding inserted
by the writer's compiler can be treated like a FILLER in COBOL: just part
of the organization of the file.
I suggest making it look like a stream filter reading chars from an
underlying stream so you won't ever deal with the buffer and boundary
conditions. Each function to read a particular type needs to a) skip
padding bytes that the source would have inserted to align that type;

It's doable, but evaluating the padding conditions may be difficult.
c) perhaps do something really hard for floating-point data using a
different representation, or for bitfield data;

Yes, that's why I said more or less effort will be needed depending
on the portability goal.
Once, when faced with too much foreign data, I wrote functions to take
a dense character string description of a struct like "ssslccl" and
convert to and from the foreign form, knowing the padding requirements
of both forms.

Some time ago I wrote a program that took a description of a record and
displayed the contents of a file according to it. The same can be done
inside a program, or in a program that generates code to be used in the
program that deals with the data.
I consider this a defect in the language. I should be able to declare
the interface properties of the struct (padding, byte order, FP format)
in a standard way and let the compiler choose to implement it or reject
it or maybe half-implement it so special functions could be applied to
the members that can't be accessed normally.

There is no need to make something part of the language when it is perfectly
doable without direct language support. This is a general design principle of C++.
 
ma740988

Julián Albo said:
The best way to read binary files is to use an unsigned char buffer and
convert from this buffer to the structure you use in the program for that
data. You can make the conversion as complex as your portability goals
require, taking into account endianness, the sign encoding used, and so on.

Do you know of/have an example of this anywhere I could peruse?
 
Julián Albo

ma740988 said:
Do you know of/have an example of this anywhere I could peruse?

I posted some sample code in this group a while ago; you can try to find
it on Google Groups.
 
Grizlyk

ma740988 said:
Data stored on a storage device is byte swapped. The data is big
endian and my PC is little.
foo bar = {0x0102, 0x0304, 0x2030 };

0x02 0x01 0x04 0x03 0x30 0x20 0x00 0x00

Is it a memory dump? Are you sure "0x30 0x20 0x00 0x00" is little
endian?

0x2030 == 0x00002030 is not the same as 0x20300000

"0x30 0x20" - the low 16-bit big-endian word - was placed before "0x00 0x00" -
the high 16-bit big-endian word.
It looks like mixed endian (Google says middle-endian, i.e. PDP-endian). In
that case you cannot swap bytes in the same manner as words.

for 0x50607080

big endian is:
word: low byte , high byte
dword: low word, high word

" 0x80, 0x70, 0x60, 0x50 "

little endian must have been:
word: high byte, low byte
dword: high word, low word

" 0x00, 0x00, 0x20, 0x30 "

Use:
#include <netinet/in.h>
htons(), htonl(), ntohs(), ntohl() - POSIX functions.
 
Grizlyk

Grizlyk wrote:

Phew, sorry, I see now: I mixed everything up in my poor head with the huge
number of "endians" applied everywhere.

I swapped your PC's endianness with your data's endianness, mixing up which
is which, and at the same time swapped the names "little-endian" and
"big-endian" for the byte orders.
Is it a memory dump? Are you sure "0x30 0x20 0x00 0x00" is little
endian?

Yes, it is correct little-endian data on a little-endian PC.
"0x30 0x20" - the low 16-bit big-endian word - was placed before "0x00 0x00" -
the high 16-bit big-endian word.

No, "0x30 0x20" - the low 16-bit little-endian word - was placed before "0x00
0x00" - the high 16-bit little-endian word. That is the correct placement for a
little-endian 32-bit dword.
It looks like mixed endian

No, this is wrong.
for 0x50607080

big endian is:
word: low byte , high byte
dword: low word, high word

" 0x80, 0x70, 0x60, 0x50 "

No, this is little endian.
little endian must have been:
word: high byte, low byte
dword: high word, low word

" 0x00, 0x00, 0x20, 0x30 "

No, this is big endian:
" 0x50, 0x60, 0x70, 0x80 "

It seems to me this assignment of "endians" is more correct. Or not?
 
