Brian Inglis said:
> This approach makes a lot of sense, as a translation unit can then be
> compiled with architectural alignment internally, ABI alignment for
> external interfaces, or with no alignment in packed structures, or
> compiled for a slight variant of the architecture on which it runs,
> suffering only a speed penalty when alignment restrictions are not
> strictly obeyed, rather than failure due to an exception.
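The difference between ABI alignment and a packed structure shows up directly in struct layout. Below is a minimal sketch in C. It assumes GCC or Clang, because `__attribute__((packed))` is a compiler extension and not standard C. The struct and function names are hypothetical. In the packed struct the 64-bit field lands at offset 1, which is exactly the kind of field a strict-alignment CPU can only read safely if the compiler emits an unaligned access sequence.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Natural (ABI) layout: the compiler inserts padding after `tag`
   so that `value` starts on a boundary suited to uint64_t. */
struct natural_rec {
    uint8_t  tag;
    uint64_t value;
};

/* Packed layout (GCC/Clang extension): no padding, so `value`
   sits at offset 1 and is generally misaligned. */
struct packed_rec {
    uint8_t  tag;
    uint64_t value;
} __attribute__((packed));

/* Fetch the packed field by copying its bytes. memcpy never forms a
   misaligned uint64_t pointer, so the compiler is free to emit
   whatever access sequence the target needs. */
static uint64_t packed_value(const struct packed_rec *r)
{
    uint64_t v;
    assert(r != NULL);
    memcpy(&v, (const unsigned char *)r + offsetof(struct packed_rec, value),
           sizeof v);
    return v;
}
```

On a strict-alignment target, the memcpy form is what lets a translation unit use packed structures and pay only in speed, not in exceptions.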
I prefer the Alpha way. Data accesses on the Alpha must be aligned;
an unaligned access triggers an exception (there are also opcodes for
unaligned access, which do not fail but are a bit slower, with
roughly the same speed penalty as a Pentium incurs on an unaligned
access). Traditionally, Unix-like OSes on the Alpha trap the exception and
perform the access in software, which implies a huge penalty (thousands
of clock cycles) but no process failure. The offending access is logged.
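The unaligned-access opcodes work by loading the aligned quadwords that straddle the target address and splicing them together with shifts. A trap handler that fixes up the access in software has to do similar work, on top of the cost of the trap itself. The sketch below models that splice in C under two assumptions. Memory is little-endian, as on the Alpha, and it is represented as an array of aligned 64-bit words. The function names are hypothetical, and the correspondence to the Alpha's LDQ_U / EXTQL / EXTQH sequence is only approximate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of a strict-alignment, little-endian memory: the only
   primitive is loading a whole aligned 64-bit word. Byte k of
   memory lives in bits 8*(k % 8) .. 8*(k % 8) + 7 of mem[k / 8]. */
static uint64_t load_aligned(const uint64_t *mem, size_t word_index)
{
    return mem[word_index];
}

/* Emulate an unaligned 64-bit load at byte address `addr` with at
   most two aligned loads plus shifts. The caller must ensure mem
   holds addr/8 + 2 words when addr is misaligned. */
static uint64_t load_unaligned(const uint64_t *mem, size_t addr)
{
    size_t   word  = addr / 8;
    unsigned shift = (unsigned)(addr % 8) * 8;     /* bit offset into the word */
    uint64_t lo    = load_aligned(mem, word);

    if (shift == 0)
        return lo;                                 /* aligned: one load suffices */

    uint64_t hi = load_aligned(mem, word + 1);
    assert(shift > 0 && shift < 64);               /* keeps both shifts well defined */
    return (lo >> shift) | (hi << (64 - shift));
}
```

An aligned access costs one load. An unaligned one costs two loads plus shifting. A trap-and-fixup path adds the exception entry and exit, which is where the thousands of cycles go.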
This way, one can use legacy code which performs unaligned accesses, and
the OS helps in tracking down such misbehaving programs. When I got an
Alpha as a desktop machine in 1998, I installed Linux on it, and at
that time many programs were faulty with regard to alignment (they
were designed for a 32-bit world, not a 64-bit world, and programmers
did quite evil things with their values); the most prominent villain
was the XFree server. I was quite thankful to the OS for the automatic
correction of misaligned accesses, because XFree was essential to my
daily work.
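A common source of the "32-bit world" problem is laying data out at 4-byte strides and then reading 8-byte values from it. The hypothetical check below flags which slots would trap on a strict-alignment CPU. The names are illustrative, and the 8-byte requirement is an assumption chosen to match the natural alignment of quadwords on the Alpha.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed requirement: an 8-byte load must be 8-byte aligned,
   as for quadwords on the Alpha. */
#define QUAD_ALIGN ((size_t)8)

/* True if p is a multiple of `align` (which must be a power of two). */
static bool is_aligned(const void *p, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Hypothetical "32-bit habit": slots laid out every `stride` bytes,
   each then read as a 64-bit value. Counts how many of the first n
   slots a strict-alignment CPU would trap on. */
static size_t count_misaligned_slots(const unsigned char *buf,
                                     size_t n, size_t stride)
{
    size_t bad = 0;
    for (size_t i = 0; i < n; i++)
        if (!is_aligned(buf + i * stride, QUAD_ALIGN))
            bad++;
    return bad;
}
```

Take an 8-byte-aligned buffer and a 4-byte stride. Every other slot is then misaligned for a 64-bit load. On x86 this goes unnoticed, but on the Alpha each such slot would be a logged trap.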
Of course, the x86 platform has a long history of allowing misaligned
access, and too much existing code relies on such accesses; hence, the CPU
vendors (Intel, AMD, ...) have no choice but to add the necessary logic
to handle such accesses with a minimal speed penalty. From their point
of view, they would gladly do without unaligned accesses, but legacy
code is stronger.
--Thomas Pornin