How to inline assembly in a C program?

swept.along.by.events · Mar 3, 2013

Hi everyone,
I've been reading about this for a few days but didn't find anything relevant or clear enough.

I'm trying to learn how to write inline x86 assembly for gcc on Linux. My problem is not writing assembly, but how to make the assembly work in C. I'm starting with this tiny function that multiplies two 64-bit integers, putting the high 64 bits in *rh and the low 64 bits in *rl:

void Mul64c( uint64_t* rh, uint64_t* rl, uint64_t a, uint64_t b )
{
__uint128_t r = (__uint128_t)a * (__uint128_t)b;
*rh = (uint64_t)(r >> 64);
*rl = (uint64_t)(r);
}

After reading various manuals, I wrote this:

void Mul64asm( uint64_t* rh, uint64_t* rl, uint64_t a, uint64_t b )
{
__asm__( "mov %2, %%rax;"
"mul %3;"
"mov %%rdx,(%0);"
"mov %%rax,(%1);"
: "=D" (rh),
"=S" (rl)
: "d" (a),
"c" (b)
: "%rax"
);
}

From what I read, integer and pointer arguments are passed in registers %rdi, %rsi, %rdx, %rcx, so I put "=D", "=S", "d", "c" in the output/input constraints. But when I build the file with

gcc -O2 -c mul64asm.c

and analyze the result with objdump, I see this:

0000000000000000 <Mul64asm>:
0: f3 c3 repz retq

So basically it's thinking that my code is a NOP? Why is that?

Thanks.

Johannes Bauer · Mar 3, 2013

swept.along.by.events said:
void Mul64asm( uint64_t* rh, uint64_t* rl, uint64_t a, uint64_t b )
{
__asm__( "mov %2, %%rax;"
"mul %3;"
"mov %%rdx,(%0);"
"mov %%rax,(%1);"
: "=D" (rh),
"=S" (rl)
: "d" (a),
"c" (b)
: "%rax"
);
}

and analyze the result with objdump, I see this:

0000000000000000 <Mul64asm>:
0: f3 c3 repz retq

So basically it's thinking that my code is a NOP? Why is that?

You've passed two pointers to the assembly part, but didn't tell the
compiler that you've actually dereferenced them, so your code is
optimized out. You may want to clobber memory (you only clobber rax at
the moment) or use __asm__ __volatile__.
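
One way to read this suggestion, as a sketch rather than the poster's exact fix: pass the pointers as inputs, mark the asm volatile, and add "memory" to the clobber list so gcc knows the asm stores through them. The name Mul64asm_clobber, the "+&a" operand, the "rdx" and "cc" clobbers, the self-check and the portable fallback branch are all assumptions added for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical variant of Mul64asm: the pointers are inputs and the
   stores happen inside the asm, so gcc must be told about them. */
static void Mul64asm_clobber(uint64_t *rh, uint64_t *rl, uint64_t a, uint64_t b)
{
#if defined(__x86_64__)
    __asm__ __volatile__(
        "mulq %3\n\t"            /* rdx:rax = rax * b */
        "movq %%rdx, (%1)\n\t"   /* *rh = high 64 bits */
        "movq %%rax, (%2)"       /* *rl = low 64 bits */
        : "+&a"(a)               /* rax holds a on entry and is overwritten */
        : "r"(rh), "r"(rl), "r"(b)
        : "rdx", "cc", "memory");
#else
    /* Fallback for other targets, same arithmetic as Mul64c. */
    __uint128_t r = (__uint128_t)a * b;
    *rh = (uint64_t)(r >> 64);
    *rl = (uint64_t)r;
#endif
}

/* Small self-check: (2^64 - 1) * 2 = 2^65 - 2. */
static void Mul64asm_clobber_check(void)
{
    uint64_t hi = 0, lo = 0;
    Mul64asm_clobber(&hi, &lo, UINT64_MAX, 2);
    assert(hi == 1 && lo == UINT64_MAX - 1);
}
```

The "memory" clobber is a blunt tool: it makes gcc assume that any memory may have changed across the asm, which limits optimisation of the surrounding code.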

Best regards,
Johannes

--

At least not publicly!

Ah, the latest and so far most brilliant stunt by our great
cosmologist: the secret prediction.
- Karl Kaos on Rüdiger Thomas in dsa

Philip Lantz · Mar 5, 2013

Johannes said:
You've passed two pointers to the assembly part, but didn't tell the
compiler that you've actually dereferenced them, so your code is
optimized out. You may want to clobber memory (you only clobber rax at
the moment) or use __asm__ __volatile__.

I recommend letting gcc know that you are using a memory operand:

void Mul64asm( uint64_t* rh, uint64_t* rl, uint64_t a, uint64_t b )
{
__asm__( "mov %2, %%rax;"
"mul %3;"
"mov %%rdx,%0;"
"mov %%rax,%1;"
: "=m" (*rh),
"=m" (*rl)
: "d" (a),
"c" (b)
: "%rax"
);
}

It's also preferable to let the compiler choose the operand locations,
instead of specifying them, except where a specific register is
required, and let gcc generate the loads and stores.

void Mul64asm( uint64_t* rh, uint64_t* rl, uint64_t a, uint64_t b )
{
__asm__("mulq %3"
: "=d" (*rh),
"=a" (*rl)
: "a" (a),
"rm" (b)
);
}

Your original code (and also my first rewrite above) neglects to tell
the compiler that it clobbers rdx. The second version above fixes that.
The compiler assumes that the value it put in rdx (the parameter a) will
still be there. Since a isn't used again, it seems like it wouldn't
matter, but if this function is inlined, the compiler will know what is
in that register and may use it again. I just found a bug a couple days
ago with that exact problem. (Note, you can't just add rdx to the
clobber list in your version, since you specify it as an input operand.)
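
A minimal sketch of one common way around that restriction, assuming you really do want 'a' to arrive in rdx: declare rdx as a read-write operand with "+d" instead of listing it as a clobber. The function name Mul64asm_rdx, the local variables, the helper Mul64asm_rdx_matches_c and the fallback branch are hypothetical and only there for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical variant: rdx carries a in and the high half out. */
static inline void Mul64asm_rdx(uint64_t *rh, uint64_t *rl, uint64_t a, uint64_t b)
{
#if defined(__x86_64__)
    uint64_t hi = a;   /* "+d": read by the asm, then overwritten by mul */
    uint64_t lo;       /* "=&a": written before b is read, so early-clobber */
    __asm__("movq %0, %1\n\t"   /* rax = a */
            "mulq %2"           /* rdx:rax = rax * b */
            : "+d"(hi), "=&a"(lo)
            : "rm"(b)
            : "cc");
    *rh = hi;
    *rl = lo;
#else
    /* Fallback for other targets, same arithmetic as Mul64c. */
    __uint128_t r = (__uint128_t)a * b;
    *rh = (uint64_t)(r >> 64);
    *rl = (uint64_t)r;
#endif
}

/* Returns 1 if the asm result equals the plain C 128-bit product. */
static int Mul64asm_rdx_matches_c(uint64_t a, uint64_t b)
{
    uint64_t hi, lo;
    __uint128_t r = (__uint128_t)a * b;
    Mul64asm_rdx(&hi, &lo, a, b);
    return hi == (uint64_t)(r >> 64) && lo == (uint64_t)r;
}
```

Because gcc now sees rdx as an output, it no longer assumes that 'a' is still in that register after the asm.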

swept.along.by.events · Mar 5, 2013

I too recommend this sort of style. (I am not very familiar with inline
assembly on x86, but have used it with other targets.) Let gcc handle
the moves - that lets it optimise the code better. This is particularly
important if "Mul64asm" is made "static inline" so that it is mixed in
directly with other code. gcc will then be able to take advantage of
things like having "a" or "b" already in a register, or using the
results "*rl" or "*rh" without actually storing them out to memory. It
will also be able to overlap the "mov" instructions for one Mul64asm
with other code (assuming your cpu has enough registers) for better
pipelining, and it will mix and match the choice of registers used
(again, if your cpu has that choice). And of course, avoiding general
memory clobbers and "asm volatile" is a big help to optimisation.

Generally speaking, you let gcc do as much as possible, and keep the
assembly code to a minimum. It's not as important in a register-poor,
non-orthogonal architecture like the x86 where so much of the work goes
through the bottleneck of a single "rax" register, but it can make a
very big difference on more modern processor architectures with large
numbers of general-purpose registers, or half-way architectures like
x86-64 with its 16 registers.
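
To make the "static inline" point concrete, here is a small sketch. The caller sum_high_halves, its self-check and the fallback branch are hypothetical, and the asm is written as "mulq" with an explicit "cc" clobber, which is an assumption on top of the version posted in this thread. Once Mul64asm is inlined, gcc is free to keep hi and lo in registers rather than storing them to memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static inline void Mul64asm(uint64_t *rh, uint64_t *rl, uint64_t a, uint64_t b)
{
#if defined(__x86_64__)
    __asm__("mulq %3"                 /* rdx:rax = rax * b */
            : "=d"(*rh), "=a"(*rl)
            : "a"(a), "rm"(b)
            : "cc");
#else
    /* Fallback for other targets, same arithmetic as Mul64c. */
    __uint128_t r = (__uint128_t)a * b;
    *rh = (uint64_t)(r >> 64);
    *rl = (uint64_t)r;
#endif
}

/* Hypothetical caller: adds up the high 64 bits of xs[i] * k. */
static uint64_t sum_high_halves(const uint64_t *xs, size_t n, uint64_t k)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t hi, lo;
        Mul64asm(&hi, &lo, xs[i], k);
        (void)lo;        /* low half is not needed here */
        total += hi;
    }
    return total;
}

/* Self-check: 2^63 * 4 = 2^65 (high half 2), and 3 * 4 fits in 64 bits. */
static void sum_high_halves_check(void)
{
    const uint64_t xs[3] = { 1ULL << 63, 1ULL << 63, 3 };
    assert(sum_high_halves(xs, 3, 4) == 4);
}
```

Comparing the output of "gcc -O2 -S" for this caller with a version that calls a non-inline Mul64asm is a quick way to see what gcc does with the extra freedom.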

Thanks a lot to both, Philip's second version works like a charm both as a separate function and inlined. Could you tell me if I'm reading it correctly?

: "=d" (*rh), // it's saying that *rh comes from the %rdx register
"=a" (*rl) // same, must take *rl from %rax
: "a" (a), // I want parameter 'a' in %rax before the mul
"rm" (b) // can store 'b' anywhere you want (register or memory), and wherever it is, that's what you multiply %rax with

Thanks!

Philip Lantz · Mar 6, 2013

swept.along.by.events said:
Thanks a lot to both, Philip's second version works like a charm both
as a separate function and inlined. Could you tell me if I'm reading
it correctly?

: "=d" (*rh), // it's saying that *rh comes from the %rdx register
"=a" (*rl) // same, must take *rl from %rax
: "a" (a), // I want parameter 'a' in %rax before the mul
"rm" (b) // can store 'b' anywhere you want (register or memory), and wherever it is, that's what you multiply %rax with

Yes, I think you are understanding it correctly.

Another way of saying it: "=d" (*rh) means that the assembly code
generates a result in rdx, which should be stored into *rh; "rm" (b)
means that the assembly code uses b as an operand, and the operand can
be in either register or memory.
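
The same constraints can also be written with gcc's symbolic operand names, which some people find easier to read than %0 to %3. This is only a sketch: the function name Mul64asm_named, the labels hi, lo, a and b, the self-check and the fallback branch are made up for illustration, and the instruction is spelled "mulq" so the operand size stays explicit even if the compiler picks a memory operand for b.

```c
#include <assert.h>
#include <stdint.h>

static inline void Mul64asm_named(uint64_t *rh, uint64_t *rl, uint64_t a, uint64_t b)
{
#if defined(__x86_64__)
    __asm__("mulq %[b]"             /* rdx:rax = rax * b */
            : [hi] "=d"(*rh),       /* result in rdx is stored into *rh */
              [lo] "=a"(*rl)        /* result in rax is stored into *rl */
            : [a] "a"(a),           /* a must be in rax before the mul */
              [b] "rm"(b)           /* b may be in a register or in memory */
            : "cc");                /* mul changes the flags */
#else
    /* Fallback for other targets, same arithmetic as Mul64c. */
    __uint128_t r = (__uint128_t)a * b;
    *rh = (uint64_t)(r >> 64);
    *rl = (uint64_t)r;
#endif
}

/* Self-check: 2^32 * 2^32 = 2^64, so the high half is 1 and the low half is 0. */
static void Mul64asm_named_check(void)
{
    uint64_t hi = 0, lo = 1;
    Mul64asm_named(&hi, &lo, 1ULL << 32, 1ULL << 32);
    assert(hi == 1 && lo == 0);
}
```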
