On 05/04/12 22:46, pembed2012 wrote:
> This horror is called "gnu inline assembler".
> I have always refused to learn it.
You're being pretty silly because it is powerful and useful, allowing inline
assembly to dovetail nicely with the code generated by the surrounding C.
GCC can do things like pick registers for you to use and prepare the operands
in the registers, and take care to reload cached operands.
You can advise the compiler that your operation clobbers certain registers or
memory, or other CPU state. That's very good, and we can see it going on in
the above example.
%0 and %1 are virtual registers. %0 is the output operand corresponding to rv,
and %1 is an input operand corresponding to sl. GCC will choose those
registers and generate the code to prepare their values before your assembly
sequence. Terrific! Without this ability you would have to hard-code the
registers. And since you don't know if they are being used by the compiler or
not, you would have to save and restore them, which is a waste of cycles
and cache traffic.
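To make the operand mechanism concrete, here is a minimal sketch. It is not the code under discussion: add_asm is a hypothetical helper, and the x86 and AArch64 templates are my own picks for illustration, with a plain C fallback on any other target. The point is only that the template names %0 and %1, and GCC decides which real registers those are.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: add b to a with a single inline instruction.
   GCC picks the registers for %0 and %1 and loads a and b into them. */
static inline uint32_t add_asm(uint32_t a, uint32_t b)
{
#if defined(__x86_64__) || defined(__i386__)
    /* %0 is a (read and written, "+r"); %1 is b (read only, "r").
       "cc" advises GCC that the condition flags are clobbered. */
    __asm__ ("addl %1, %0" : "+r" (a) : "r" (b) : "cc");
#elif defined(__aarch64__)
    /* %w0 / %w1 select the 32-bit views of the chosen registers. */
    __asm__ ("add %w0, %w0, %w1" : "+r" (a) : "r" (b));
#else
    a += b;   /* plain C fallback for targets not covered above */
#endif
    return a;
}

/* Tiny self-check showing how the helper is called. */
static inline void add_asm_selfcheck(void)
{
    assert(add_asm(2, 3) == 5);
}
```

No register is named anywhere in the template, so nothing has to be saved or restored around it.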
The code also advises GCC that the instruction sequence has an effect on
cr0 and on memory (the latter being needed in situations when the compiler
might have that memory operand cached in a register). The memory
advice is needed here even though the instruction sequence does *not* in fact
have any effect on some unmentioned memory operand: this is because it
is a locking primitive which semantically needs a memory barrier, and this
memory-clobber has the effect of "spooking" the compiler into providing the
necessary software memory barrier (register spill, reload) to go with the
hardware one (isync).
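The software half of that barrier can be shown on its own. This is a sketch using names I made up (compiler_barrier, shared_word, read_twice): an empty asm with a "memory" clobber emits no instruction, but GCC may not keep a memory value cached in a register across it. It does nothing at the hardware level, so on a real lock it still has to be paired with the CPU's own barrier instruction.

```c
#include <assert.h>

/* Compiler-only barrier: no instruction is emitted, but the "memory"
   clobber tells GCC that any memory may have changed. */
#define compiler_barrier() __asm__ __volatile__ ("" : : : "memory")

/* Hypothetical variable standing in for memory shared with other code. */
static int shared_word;

static inline int read_twice(void)
{
    int first = shared_word;
    compiler_barrier();        /* cached copy of shared_word is discarded */
    int second = shared_word;  /* so this is a fresh load from memory */
    return first + second;
}

/* Tiny self-check showing how the helper is called. */
static inline void read_twice_demo(void)
{
    shared_word = 21;
    assert(read_twice() == 42);
}
```

Without the barrier GCC would be free to load shared_word once and reuse the register for the second read.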
A few years ago I was working on MIPS and needed some atomic operations,
and I made profitable use of this GCC inline syntax. The code was very
similar to what we are dealing with here, but on another architecture.
These instruction sequences were to be used in proprietary code so I didn't
want to just take GNU-licensed anything. But I had a look.
Other people were defining rigid code templates for the operations, like
acquiring a spinlock, or compare-and-swap, etc. (Very much like the above:
the whole thing is "canned": the load, comparison, store, loop and barrier).
Instead, I wrapped at a lower level: I created the inline primitives to just do
the "load_linked" and "store_conditional" MIPS operations, with a C interface.
Then I wrote the algorithms for things like acquiring a lock or
compare-swap in C syntax instead of assembly, as easy-to-read little
inline functions:
  uint32_t x;

  /* loop while lock is busy, or we are not successfully able
     to flip it to busy with a store conditional */
  do {
    x = load_linked(&addr);
  } while (x || store_conditional(&addr, 1));

  read_barrier(); /* make sure we don't read stale cached data inside lock */
Guess what: GCC was able to optimize and shave some instructions out of stuff
like that, compared to the canned machine language sequences used in other
projects.
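If you want to try the lock loop above without MIPS hardware, here is a rough sketch with stand-ins. These are not my original primitives, which wrapped the MIPS ll and sc instructions in inline assembly. Here load_linked and store_conditional are emulated with GCC's __atomic builtins (a compare-and-swap against the last value loaded), which only approximates real LL/SC. I am assuming store_conditional returns nonzero on failure, as the loop condition implies. spin_lock, spin_trylock and spin_unlock are names made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Value seen by the most recent load_linked() on this thread.
   Emulation state only; real LL/SC tracks this in hardware. */
static __thread uint32_t linked_value;

/* Stand-in for the "ll" wrapper: load and remember what we saw. */
static inline uint32_t load_linked(volatile uint32_t *addr)
{
    linked_value = __atomic_load_n(addr, __ATOMIC_RELAXED);
    return linked_value;
}

/* Stand-in for the "sc" wrapper. Assumed convention: returns 0 if the
   store took effect, nonzero if it failed. */
static inline int store_conditional(volatile uint32_t *addr, uint32_t val)
{
    uint32_t expected = linked_value;
    return !__atomic_compare_exchange_n(addr, &expected, val, 0,
                                        __ATOMIC_RELAXED, __ATOMIC_RELAXED);
}

static inline void read_barrier(void)
{
    __atomic_thread_fence(__ATOMIC_ACQUIRE);
}

/* The algorithm itself is ordinary C built on the primitives. */
static inline void spin_lock(volatile uint32_t *lock)
{
    uint32_t x;
    do {
        x = load_linked(lock);
    } while (x || store_conditional(lock, 1));
    read_barrier();
}

/* Single attempt: returns 1 if the lock was taken, 0 if it was busy. */
static inline int spin_trylock(volatile uint32_t *lock)
{
    if (load_linked(lock) || store_conditional(lock, 1))
        return 0;
    read_barrier();
    return 1;
}

static inline void spin_unlock(volatile uint32_t *lock)
{
    assert(__atomic_load_n(lock, __ATOMIC_RELAXED) == 1); /* must be held */
    __atomic_store_n(lock, 0, __ATOMIC_RELEASE);
}
```

Because the primitives are small inline functions, the compiler sees the whole loop as C and can schedule and allocate registers around each primitive, instead of treating the entire lock as one opaque blob.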
This is what I mean by dovetailing: you can do inline assembly on a fine
granularity and let GCC take it into the mix and optimize with it.