Chris Torek said:
A more usual motivation is to make use of special machine
instructions the compiler would not generate on its own.
[...]
I agree with all of this; however, in some (sometimes significant)
cases (e.g., the actual implementation for a mutex), you may want
to have an inline expansion of the underlying atomic operation,
typically via a macro. For instance, if you have a mutex construct
that -- at least in the uncontested case -- is just a (possibly
locked) compare-and-swap, you may want the x86-specific version
of:
[...]
The tricky part lies not only in arranging for the assembly equivalent
to be inserted inline, but in *also* informing the compiler that
it must not move certain memory operations across the "special"
instruction(s).
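The quoted x86 snippet was elided above. Purely as an illustration of the technique being discussed, and not a reconstruction of that snippet, here is a minimal sketch of an inline compare-and-swap. It assumes GCC/Clang extended inline asm. The names `cas_int`, `spin_mutex` and `spin_mutex_trylock` are hypothetical. The "memory" clobber is the part that tells the compiler it must not move memory operations across the instruction.

```c
#include <stdbool.h>

/*
 * Sketch only (GCC/Clang extensions assumed): an inline compare-and-swap.
 * On x86/x86-64 it emits LOCK CMPXCHG; the "memory" clobber tells the
 * compiler not to keep memory values cached in registers across the asm
 * and not to move loads/stores past it. Elsewhere it falls back to the
 * __sync builtin, which GCC documents as a full barrier.
 */
static inline bool cas_int(volatile int *dest, int expected, int desired)
{
#if defined(__i386__) || defined(__x86_64__)
    unsigned char ok;
    __asm__ __volatile__(
        "lock; cmpxchgl %3, %1\n\t"  /* if (*dest == eax) *dest = desired */
        "sete %0"                    /* ok = ZF (1 if the swap happened) */
        : "=q"(ok), "+m"(*dest), "+a"(expected)
        : "r"(desired)
        : "memory", "cc");
    return ok;
#else
    return __sync_bool_compare_and_swap(dest, expected, desired);
#endif
}

/* Hypothetical mutex whose uncontested path is a single CAS. */
typedef struct {
    volatile int state; /* 0 = unlocked, 1 = locked */
} spin_mutex;

static inline bool spin_mutex_trylock(spin_mutex *m)
{
    return cas_int(&m->state, 0, 1);
}
```

A failed CAS leaves the target untouched, so a second `spin_mutex_trylock` on an already locked mutex simply returns false.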
[...]
Indeed!
http://groups.google.com/group/comp.arch/msg/c6f096adecdd0369
(refer to the last couple of paragraphs...)
;^)
FWIW, in order to correctly implement this kind of stuff, you simply have to
define exactly how you are going to address two fundamental problems:
1: Compiler Reordering
2: Hardware Reordering
--1-- The compiler reordering issue can "usually" be resolved by strictly
adhering to a design policy which declares that all functions that contain
"critical-sequences" of instructions that have to be executed in precise
order must be externally assembled. This is necessary because the current C
Standard does not acknowledge that threads even exist. An assembler is a
different story IMO, simply because it gives you full access to the
architecture you're targeting and it will not reorder any of your assembly
statements; what you see is exactly what you get.
IMO, a typical C compiler is usually forced to treat any call into an
"unknown and external" function in a fairly "pessimistic" manner: since it
cannot see which memory the callee reads or writes, it has to assume that
any of it may be touched, so it cannot keep values cached in registers or
move loads/stores across the call. That basically reduces the call to
something analogous to a so-called "compiler barrier". However, please note
that some compilers are exploring link-time optimizations, which can, and
probably will, turn out to be an annoying and potentially hazardous scenario
for any function that simply expects every instruction it's made up of to be
executed exactly as-is. Period. Unfortunately, this includes basically all
externally assembled functions that a lock-free library may export by
default.
;^(...
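For comparison, GCC and Clang also let you state a compiler barrier explicitly inside C, without relying on an opaque external call. The sketch below assumes GCC-style inline asm. `COMPILER_BARRIER`, `payload`, `ready` and `publish` are hypothetical names. The barrier emits no instruction at all; it only constrains the compiler and does nothing about problem 2 (hardware reordering).

```c
/*
 * Compiler-only barrier (GCC/Clang extension): an empty asm with a
 * "memory" clobber. No machine instruction is emitted, but the compiler
 * may not move memory accesses across it. The CPU still can.
 */
#define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")

static int payload;
static volatile int ready;

/* Hypothetical producer: write the data, then raise the flag. */
void publish(int value)
{
    payload = value;
    COMPILER_BARRIER(); /* compiler may not sink the payload store below this */
    ready = 1;
}
```

Unlike an external call, this barrier is visible to the optimizer as such. So link-time optimization cannot "see through" it the way it can see through an ordinary function body.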
However, all is not lost, because it does seem like the compilers that do
support link-time optimization also provide some method for turning it
on/off. Usually, they allow you to decorate your "critical-function"
declarations with something that guarantees that they will never be
subjected to this particular type of optimization.
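As one concrete example of such a decoration (GCC-specific, and an assumption about your toolchain), GCC 8 and later provide `__attribute__((noipa))`. It disables inlining, cloning and other interprocedural optimizations between a function and its callers, "as if the body of the function is not available". The `CRITICAL_FN` macro and `critical_increment` function below are hypothetical. The fallback to plain `noinline` on other compilers is weaker and only a stand-in; check your own compiler's documentation.

```c
/*
 * Hypothetical CRITICAL_FN marker. Prefer GCC's noipa (GCC 8+), which
 * makes callers treat the body as unavailable; otherwise fall back to
 * noinline, which only prevents inlining and is a weaker guarantee.
 * The nested #if is only evaluated when __has_attribute exists.
 */
#ifdef __has_attribute
#  if __has_attribute(noipa)
#    define CRITICAL_FN __attribute__((noipa))
#  endif
#endif
#ifndef CRITICAL_FN
#  define CRITICAL_FN __attribute__((noinline))
#endif

/* Example "critical-function": an atomic increment (GCC __sync builtin). */
CRITICAL_FN int critical_increment(volatile int *counter)
{
    return __sync_add_and_fetch(counter, 1);
}
```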
--2-- Hardware reordering is easily solved by placing the proper memory
barrier instructions in the correct places throughout your externally
assembled lock-free library. Because the assembler won't reorder any
instructions, those barriers stay exactly where you put them; therefore,
this is the only real solution wrt actually implementing this kind of stuff.
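To show the kind of instruction meant here, the following is a minimal sketch. It uses GCC-style inline asm for convenience instead of a separate assembly file, so in the scheme described above only the `mfence` would live in the externally assembled library. `full_memory_barrier`, `flag0`, `flag1` and `thread0_try_enter` are hypothetical names. The example is the Dekker-style "store my flag, then load yours" pattern. Even x86 allows that later load to be satisfied before the earlier store becomes visible, so it needs a StoreLoad barrier.

```c
/*
 * Sketch: a full hardware memory barrier. On x86-64 this is MFENCE; the
 * "memory" clobber also makes it a compiler barrier. Elsewhere fall back
 * to GCC's __sync_synchronize(), documented as a full memory barrier.
 */
static inline void full_memory_barrier(void)
{
#if defined(__x86_64__)
    __asm__ __volatile__("mfence" ::: "memory");
#else
    __sync_synchronize();
#endif
}

static volatile int flag0, flag1;

/* Hypothetical thread-0 side of a Dekker-style entry protocol. */
int thread0_try_enter(void)
{
    flag0 = 1;             /* announce intent */
    full_memory_barrier(); /* StoreLoad: keep the load below from passing the store */
    return flag1 == 0;     /* enter only if thread 1 has not announced */
}
```

Without the barrier, both threads could store their own flag, read the other's flag as 0, and enter together. That is exactly the hardware reordering problem, and no compiler barrier alone fixes it.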
Therefore, it is my thesis that a safe method for ensuring that calls into
"critical-functions" will not be tampered with must include a combination of
solutions that directly resolve all of the reordering issues attributed to
both the hardware you're targeting and the C compiler you're using...
Any thoughts?
[...]
The compiler may think the second version is superior (because it
uses less CPU time overall, e.g., due to reduced register pressure
or because it schedules better), but in fact, it is not.
;^)