mathog said:
Now, what happens when this struct is embedded in a binary file stored
in a character array "buffer[]" such that its position "i" is not
guaranteed to be aligned on the same boundary that
Mystruct *instance2 = malloc(sizeof(Mystruct));
would have used, but will always be on a multiple of 2.
1. Will this always work?
(snip)
Eric Sosman wrote:
Not "always," no. 6.3.2.3p7: "A pointer to an object type
may be converted to a pointer to a different object type. If the
resulting pointer is not correctly aligned for the referenced
type, the behavior is undefined. [...]" There's no guarantee
that the alignment of `buffer' suffices for `Mystruct'.
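The usual way around 6.3.2.3p7 is to avoid the pointer conversion entirely and copy the bytes into a properly declared object, because memcpy() has no alignment requirement. A minimal sketch follows. The real Mystruct definition was snipped, so the two-field layout here is hypothetical, as are the helper names. It still assumes the bytes in the buffer were written with this same compiler's layout (sizes, padding, byte order).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: the original Mystruct definition was snipped. */
typedef struct {
    uint16_t a;
    uint16_t b;
} Mystruct;

/* Copy a Mystruct out of buf at offset i. Unlike (Mystruct *)&buf[i],
   this is well defined even when buf + i is not suitably aligned,
   because memcpy() works on bytes. */
Mystruct read_mystruct(const unsigned char *buf, size_t len, size_t i)
{
    Mystruct s;
    assert(i <= len && len - i >= sizeof s);  /* stay inside the buffer */
    memcpy(&s, buf + i, sizeof s);
    return s;
}

/* The reverse direction: store a Mystruct at any offset in buf. */
void write_mystruct(unsigned char *buf, size_t len, size_t i, const Mystruct *s)
{
    assert(i <= len && len - i >= sizeof *s);
    memcpy(buf + i, s, sizeof *s);
}
```

Modifications are then made to the local copy and written back with write_mystruct(), rather than through a possibly misaligned pointer.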
On further consideration, how does one know what "correctly aligned"
means for a given struct? Consider these two examples:
typedef struct {
uint16_t one;
uint16_t two;
uint16_t three;
uint16_t four;
} Mystruct4;
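For the narrower question of what "correctly aligned" means for a given struct, C11 added _Alignof (spelled alignof via <stdalign.h>), which reports the implementation's requirement for any complete type. A sketch, assuming a C11 compiler; the helper name is hypothetical, and testing an address through uintptr_t is implementation-defined, though it behaves as expected on common flat-address platforms.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

typedef struct {
    uint16_t one;
    uint16_t two;
    uint16_t three;
    uint16_t four;
} Mystruct4;

/* Nonzero if p is suitably aligned for a Mystruct4 on this implementation.
   The uintptr_t conversion is implementation-defined but works on
   typical platforms with a flat address space. */
int aligned_for_mystruct4(const void *p)
{
    return (uintptr_t)p % alignof(Mystruct4) == 0;
}
```

On typical platforms alignof(Mystruct4) is 2, but nothing in the standard requires that; a struct's alignment is whatever the implementation chooses, which is exactly why the answer has to come from the compiler.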
Taking into account the various answers in this thread to the preceding
question it would seem that there is still no safe (safe meaning here
"code that will work on any platform") method to use a C struct
to extract binary data directly from memory, since the compiler can
throw in padding and alignment requirements whenever it feels like it.
While I believe that is true, in practice it will most likely work
at any suitably aligned boundary.
Even if the data is reduced to byte representation, something like this
(I know it isn't the same as Mystruct4 above):
typedef struct {
uint8_t one[2]; /* actually uint16_t */
uint8_t two[4]; /* actually uint32_t */
int8_t three[2]; /* etc. */
int8_t four;
} Mystruct4b;
it seems not to be safe to pass a "Mystruct4b *" pointer to a function
which references this data at an arbitrary location in memory. Instead
the only safe method is to pass "char *" pointers and take the data
apart with memcpy() at a very low level, moving it from memory to the
structure, or vice versa.
Yes, that is always the most reliable way. Especially if you have
the possibility of different endianness. (Not to mention all the other
possible different representations.)
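To see why endianness matters even when alignment is handled, compare assembling a value from individual bytes with copying it via memcpy(). This sketch assumes the data is stored most significant byte first (big-endian); the function names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assemble a 32-bit unsigned value stored big-endian at p.
   Works for any alignment of p and gives the same result
   on big- and little-endian hosts. */
uint32_t get_be32(const unsigned char *p)
{
    return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16
         | (uint32_t)p[2] << 8  | (uint32_t)p[3];
}

/* The same bytes copied with memcpy(): alignment is no problem,
   but the result follows the host's byte order, so it matches
   get_be32() only on big-endian machines. */
uint32_t get_native32(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

The shift-and-or form costs a few more instructions, but it pins down the external representation independently of the machine, which is the property that matters for data passed between machines.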
What an odd situation. I would have thought with all of the data that
does show up in C programming as "a series of bytes in a buffer arranged
in some particular manner", that is, pretty much any data which is
passed from machine to machine, C would have developed a method for
simplifying this sort of code. Perhaps something along these lines:
Well, C pretty much optimizes for passing data around within a
single program.
For many years Fortran required, or at least was believed to require,
that COMMON blocks be packed (no padding). On some machines that was
just a little slow; on others every misaligned access caused a run-time
trap: copy the data, perform the operation, copy the data back again,
and then return. Much, much slower.
Most RISC processors require data to be aligned, though some have
special instructions for access to misaligned data. (Faster than a byte
copy and performing the operation on the copy.)
/* declare memory organization, no padding, no alignment requirement */
typedef memstruct {
uint16_t one;
uint32_t two;
int16_t three;
uint8_t four;
} Mystruct4c;
Doesn't help if the actual representation, such as endianness,
is different.
Specifically so that one could pass a "Mystruct4c *" pointer to a
function, like so:
void myfunction3a(Mystruct4c *ptr){
ptr->two = 5;
printf("value of three:%d\n",ptr->three);
}
and the compiler would do the "right thing", using memcpy or whatever,
to hide all of the cruft that is platform specific, up to and including
loading the magic N bytes of memory (as in the Alpha issue Glen
mentioned), shifting, and masking and so forth, to handle data types
smaller than is native for the CPU, without the programmer having to
ever care about the details.
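Without such a language feature, the closest thing in standard C is a set of accessor functions that operate directly on the bytes, hiding the offsets and byte order behind function calls. The sketch below assumes a packed little-endian layout for Mystruct4c (one at bytes 0-1, two at 2-5, three at 6-7, four at 8); those offsets, the byte order, and all helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical packed, little-endian layout of Mystruct4c. */
enum { OFF_ONE = 0, OFF_TWO = 2, OFF_THREE = 6, OFF_FOUR = 8, MYSTRUCT4C_SIZE = 9 };

static void put_le32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v & 0xFF);
    p[1] = (unsigned char)((v >> 8) & 0xFF);
    p[2] = (unsigned char)((v >> 16) & 0xFF);
    p[3] = (unsigned char)((v >> 24) & 0xFF);
}

static uint16_t get_le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | p[1] << 8);
}

void set_two(unsigned char *rec, uint32_t v) { put_le32(rec + OFF_TWO, v); }

/* Decode a two's complement int16_t without relying on the
   implementation-defined unsigned-to-signed conversion. */
int16_t get_three(const unsigned char *rec)
{
    uint16_t u = get_le16(rec + OFF_THREE);
    return u < 0x8000 ? (int16_t)u : (int16_t)((int32_t)u - 0x10000);
}

/* Accessor-based counterpart of myfunction3a(): rec may sit at any
   offset in a buffer, aligned or not. */
void myfunction3a_bytes(unsigned char *rec)
{
    set_two(rec, 5);
    printf("value of three:%d\n", get_three(rec));
}
```

This is essentially the code the proposed memstruct would have the compiler generate; here the programmer writes it once per field instead.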
When the compiler knows in advance that the data is misaligned, it isn't
so bad. But if you then pass a pointer to that misaligned data to a
function, it gets dereferenced in a place that doesn't expect it to be
misaligned. Otherwise, the compiler has to generate slow (maybe much
slower) code for all accesses through such pointers.
One might even dream that the compiler could be induced to do:
Mystruct4c native,ncopy;
Mystruct4b *foreign;
Mystruct4b acopy;
/* field names/sizes must match for the following statement */
alternate_representations {Mystruct4b, Mystruct4c}
Well, when C was new there were still plenty of 36 bit, 48 bit,
and 60 bit machines around, and probably others that I don't know
about. In addition, the standard still allows sign-magnitude and
ones' complement representations, and finally there is endianness.
But yes, it could be done.
char *buffer;
/* fill buffer from a file or network */
foreign = (Mystruct4b *)&buffer[123];
native = *foreign; /* field to field copy, NOT a memcpy */
/* change some data in native*/
*foreign = native; /* field to field copy */
ncopy = native; /* this is a memcpy */
acopy = *foreign; /* as is this */
*foreign = acopy; /* as is this */
ncopy = acopy; /* field to field copy */
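In today's C, the "field to field copy" between the foreign and native forms has to be written out by hand as a pair of unpack/pack functions. A sketch, assuming the same hypothetical packed little-endian 9-byte layout as before (2 + 4 + 2 + 1 bytes); Native4c, the helpers, and WIRE4C_SIZE are illustrative names, not anything from the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Ordinary C struct: the compiler may pad it however it likes. */
typedef struct {
    uint16_t one;
    uint32_t two;
    int16_t  three;
    uint8_t  four;
} Native4c;

#define WIRE4C_SIZE 9  /* assumed packed little-endian layout: 2 + 4 + 2 + 1 */

/* Read an n-byte little-endian unsigned value (n <= 4). */
static uint32_t get_le(const unsigned char *p, int n)
{
    uint32_t v = 0;
    for (int k = n - 1; k >= 0; k--)
        v = v << 8 | p[k];
    return v;
}

/* Write the low n bytes of v, least significant first. */
static void put_le(unsigned char *p, uint32_t v, int n)
{
    for (int k = 0; k < n; k++, v >>= 8)
        p[k] = (unsigned char)(v & 0xFF);
}

/* native = *foreign;  works at any offset in the buffer */
Native4c unpack4c(const unsigned char *w)
{
    Native4c s;
    uint32_t t = get_le(w + 6, 2);
    s.one   = (uint16_t)get_le(w, 2);
    s.two   = get_le(w + 2, 4);
    s.three = (int16_t)(t < 0x8000 ? (int32_t)t : (int32_t)t - 0x10000);
    s.four  = (uint8_t)w[8];
    return s;
}

/* *foreign = native; */
void pack4c(unsigned char *w, const Native4c *s)
{
    put_le(w, s->one, 2);
    put_le(w + 2, s->two, 4);
    put_le(w + 6, (uint16_t)s->three, 2);  /* modular conversion is well defined */
    w[8] = s->four;
}
```

Plain struct assignment between two Native4c objects then plays the role of the "memcpy" lines above.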
Well, there is XDR

http://www.ietf.org/rfc/rfc4506.txt

which will do all the work for any reasonable, and also some
not so reasonable, representation.
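As an illustration of the XDR rules themselves, RFC 4506 encodes every integer as four bytes, most significant byte first, with two's complement for signed values; there are no 8- or 16-bit types. The sketch below hand-encodes the Mystruct4c fields that way rather than calling any XDR library, so the struct and function names are hypothetical and the choice of XDR types per field is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* RFC 4506 integer: 4 bytes, big-endian. */
static void xdr_put_u32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}

/* XDR int is two's complement; converting to uint32_t is well defined. */
static void xdr_put_i32(unsigned char *p, int32_t v)
{
    xdr_put_u32(p, (uint32_t)v);
}

typedef struct { uint16_t one; uint32_t two; int16_t three; uint8_t four; } Rec4c;

enum { REC4C_XDR_SIZE = 16 };  /* four fields, 4 bytes each */

/* Encode as: unsigned int one; unsigned int two; int three; unsigned int four; */
void rec4c_to_xdr(unsigned char out[REC4C_XDR_SIZE], const Rec4c *r)
{
    xdr_put_u32(out + 0,  r->one);
    xdr_put_u32(out + 4,  r->two);
    xdr_put_i32(out + 8,  r->three);
    xdr_put_u32(out + 12, r->four);
}
```

The small fields widen to 4 bytes each, trading space for a representation every machine agrees on.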
That is, when a memstruct pointer is dereferenced it does not mean quite
the same thing as when a struct pointer is dereferenced. A few more
rules for the compiler, a lot less work for the programmer.
Well, as in another thread, consider C a portable assembler.
It helps you, but you still have to do some of the work.
-- glen