> Yes, this was the point of discussion. This is also something I
> couldn't get a grasp of. If I can store SomeStruct in p and UserStruct
> in x, then why can't I store UserStruct in q and SomeStruct in y?
> What happens if I go ahead and do it? Will the system crash?
Others have gone into detail, and at this point it is not clear
what p, q, x, and y might be anyway, but ...
The fundamental problem here is the thing called "alignment".
When you do pointer arithmetic, if you start with a well-aligned
pointer, avoid casts, and avoid cheating with "void *", you always
end up with a well-aligned pointer.
The malloc() function always either returns NULL (no more memory
available), or a "well-aligned" pointer, suitably aligned for use
as *any* data type, including user-defined types. So you start
with a "well-aligned" pointer in this case, and if you avoid casts
(and do not use "void *" intermediates as a "cheat" -- more on this
later), you will have a well-aligned pointer.
The C standard says that if you use a well-aligned pointer, it has
to work. So any compiler that can claim to implement C has to make
it work, no matter what that takes.
On the other hand, if you use casts to *change* pointer types while
doing arithmetic on them, the C standard does *not* say that the
result will be suitably aligned. The behavior becomes officially
"undefined".
In practice, the actual behavior depends on the implementation --
both the CPU and the compiler. So does what "suitably aligned"
means. On some hardware, *everything* is "suitably aligned", and
it always works no matter what. (For instance, the x86 is like
this. Sometimes the code runs a little slower, is all.) On
other hardware, different things happen.
But consider instead the PowerPC architectures (such as in a Mac
G4 or G5). Here, the machines are less forgiving. If you use a
"load byte" or "store byte" instruction, you can point to any byte
in memory at all. But if you use a "load halfword" or "store
halfword" instruction (to access a 16-bit item, e.g., a "short" in
most C compilers for these), the hardware refuses to do it if the
address is not even (zero mod 2, mathematically). If you use a
"load word" or "store word" instruction (to access a 32-bit item),
the hardware refuses to do it if the address is not a multiple of
4.
The form of the refusal -- its effect on your program when you run
it -- depends on various factors. Using OS X or Linux, you will get
a signal that will normally terminate your program (usually with a
"core dump"). If your program runs deep enough in the operating
system, you could indeed get a "system crash", though.
On yet another kind of hardware, the ARM family of CPUs, something
even weirder happens. If you create a "poorly aligned" pointer,
and then ask the CPU to use it, the CPU *does* use it -- but it
first "shaves off" the low-order bits that cause it to be poorly aligned,
resulting in a well-aligned pointer that it *can* use. The resulting
well-aligned pointer no longer points where the original, poorly-aligned
pointer used to point. As with the PowerPC, alignment requirements
are a function of data access size -- so the result is that:
    *(int *)p = 42;
is equivalent to (with 4-byte "int"s):
    *(int *)((intptr_t)p & ~3) = 42;
Because the C standard says that the effect of an unaligned pointer
is "undefined", *all* of these behaviors are allowed. All of these
systems can call themselves "C implementations", even though the
x86, PowerPC, and ARM all do something *different*.
The C standard says, in effect, that it is your job -- the C
programmer's job -- to make sure you never produce an unaligned
pointer value. To help you, it makes a few promises: malloc()
returns a well-aligned pointer (for any use), and "simple" (castless)
pointer arithmetic keeps a well-aligned pointer (for whatever use
it has now) well-aligned. So:
    T *p;                        /* for any type T */
    p = malloc(N * sizeof *p);   /* obtain room for N items of type T */
    if (p == NULL) ... handle error ...
    *(p + k) = value;            /* is OK: there are no casts */
If you find you need a cast, for any reason, this is a sign that
you *may* be heading your "C boat" into shark-infested, reef-laden
waters.

So:
    T *p;
    T2 *q;
    p = malloc(sizeof *p + N);   /* room for one T, plus N extra bytes */
    if (p == NULL) ... handle error ...
    *p = whatever;               /* OK */
    q = (T2 *)(p + 1);           /* danger */
While "p+1" is well-aligned for use as a type "T", it is not
*necessarily* well-aligned for use as a type "T2". It is not
necessarily badly-aligned either. The C standard simply does not
say, one way or the other (with one exception: bytes -- "char"s,
specifically unsigned char, although plain and signed char come
along for the ride -- are always well-aligned, no matter what else
is going on).
Now, you can avoid the casts by "cheating" with a "void *"
temporary variable:
    T *p;
    void *tmp;
    T2 *q;
    p = malloc(sizeof *p + N);   /* room for one T, plus N extra bytes */
    if (p == NULL) ... handle error ...
    *p = whatever;               /* OK */
    tmp = p + 1;                 /* OK */
    q = tmp;                     /* danger */
Even though the cast has now been removed, the danger remains.
The problem is -- still -- that "p + 1" is well-aligned for use as
a "T" (because p came from malloc, so it was well-aligned to start
with), but not necessarily for use as a "T2".
Again, the underlying fundamental problem -- which is hardware
and sometimes compiler dependent -- is "alignment".
It would perhaps be nice if C had a tool for checking whether
something was "well-aligned":
    tmp = p + 1;
    if (IS_WELL_ALIGNED_FOR_USE_AS_A_T2(tmp))
        ... proceed ...
    else
        ... something ...
except this leaves two problems, one obvious:
- what do we put in the "else"?
and one slightly less obvious:
- what would we need to do to *guarantee* that tmp is well-aligned
for use as a T2?
That second question is really the important one. Suppose that
you had a way to predict the alignment requirements for "T2"s.
We know (from C's basic design) that there must be some number
of bytes ("char"s) that we can use as padding, placed after a
T and before a T2, so that the T2 that comes after the (possibly
empty) padding is well-aligned:
    <object of type T><padding><object of type T2>
which is what you would get if you did this:
    struct combined {
        T first;
        /* compiler inserts padding here if needed */
        T2 second;
    };
Then you could do this:
    char *x;
    ...
    p = malloc(sizeof *p + sizeof(T2) + worst_case_padding);
    ... the usual check for NULL ...
    x = (char *)(p + 1);
    x += however_much_padding_is_required;
    q = (T2 *)x;
But if you know the types up-front like this, you can always just
*make* a "struct combined", and replace all of the above with:
    struct combined *comb;
    comb = malloc(sizeof *comb);
    if (comb == NULL) ...
    p = &comb->first;
    q = &comb->second;
So this is not quite as useful as it seems.
The real problem comes in when you do *not* know the types up-front.
Suppose, for instance, you are writing a replacement malloc() (which
you might name "nmalloc", perhaps). You will probably want to make
the same guarantee that C does about malloc(): that it returns a
pointer that is well-aligned for use as *any* type, no matter what
that type may be. But all malloc() -- and hence nmalloc() -- gets
is a size in bytes. How can it find the type? Or, equivalently,
how can it find the "maximum" alignment required by the compiler
and hardware?
The only real Standard C answer is "it can't". You cannot write
malloc() in Standard C. You need at least one piece of "NonStandard
C": you need to know the underlying alignment constraints of the
hardware.
So, if you want to write a general-purpose allocator, you have
two choices:
- forget it, or
- "don't use Standard C".
The first one is often not satisfactory. Note that in abandoning
the C standard while writing this allocator, you need not abandon
Standard C *everywhere*. You can do it just in this one little
place, in this one little way, and hope (with good justification:
you know your implementor had to do this too) that this does not
cause the whole program to fall apart somehow.
Note, though, that the first approach ("forget it") is actually
often a good one. For instance, *if* you can make a "struct
combined", as I did above, you can then obtain your two pointers
(p and q above) in one malloc() call, and know that both are
well-aligned. The big limitation here is that you must already
have the types of all the sub-objects (T and T2, in this case) set
in stone.