How to inline assembly in a C program?

Discussion in 'C Programming' started by swept.along.by.events@gmail.com, Mar 3, 2013.

  1. Guest

    Hi everyone,
    I've been reading about this for a few days but didn't find anything relevant or clear enough.

    I'm trying to learn how to write inline x86 assembly for gcc on Linux. My problem is not writing assembly, but how to make the assembly work in C. I'm starting with this tiny function that multiplies two 64-bit integers, putting the high 64 bits in *rh and the low 64 bits in *rl:

    void Mul64c( uint64_t* rh, uint64_t* rl, uint64_t a, uint64_t b )
    {
    __uint128_t r = (__uint128_t)a * (__uint128_t)b;
    *rh = (uint64_t)(r >> 64);
    *rl = (uint64_t)(r);
    }

    After reading various manuals, I wrote this:

    void Mul64asm( uint64_t* rh, uint64_t* rl, uint64_t a, uint64_t b )
    {
    __asm__( "mov %2, %%rax;"
    "mul %3;"
    "mov %%rdx,(%0);"
    "mov %%rax,(%1);"
    : "=D" (rh),
    "=S" (rl)
    : "d" (a),
    "c" (b)
    : "%rax"
    );
    }

    From what I read, integers and pointers are passed in registers %rdi, %rsi, %rdx, %rcx, so I put "=D", "=S", "d", "c" in the output/input constraints. But when I build the file with

    gcc -O2 -c mul64asm.c

    and analyze the result with objdump, I see this:

    0000000000000000 <Mul64asm>:
    0: f3 c3 repz retq

    So basically it's thinking that my code is a NOP? Why is that?

    Thanks.
     
    , Mar 3, 2013
    #1

  2. On 03.03.2013 21:45, wrote:

    > void Mul64asm( uint64_t* rh, uint64_t* rl, uint64_t a, uint64_t b )
    > {
    > __asm__( "mov %2, %%rax;"
    > "mul %3;"
    > "mov %%rdx,(%0);"
    > "mov %%rax,(%1);"
    > : "=D" (rh),
    > "=S" (rl)
    > : "d" (a),
    > "c" (b)
    > : "%rax"
    > );
    > }
    >
    > and analyze the result with objdump, I see this:
    >
    > 0000000000000000 <Mul64asm>:
    > 0: f3 c3 repz retq
    >
    > So basically it's thinking that my code is a NOP? Why is that?


    You've passed two pointers to the assembly part, but didn't tell the
    compiler that you've actually dereferenced them, so your code is
    optimized out. You may want to clobber memory (you only clobber rax at
    the moment) or use __asm__ __volatile__.
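
As a rough sketch of that suggestion (not the original poster's code): the version below keeps the stores inside the asm, but passes the pointers as inputs, marks the statement `__volatile__`, and lists "memory" as a clobber. The names `mul64_store` and `mul64_store_check`, the `q` suffixes, the "+&a" constraint, the "rdx" and "cc" clobbers, and the non-x86 fallback are all assumptions added for illustration. It assumes gcc or clang targeting x86-64.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: stores the high and low halves of a * b through
 * rh and rl from inside the asm statement itself. */
static inline void mul64_store(uint64_t *rh, uint64_t *rl,
                               uint64_t a, uint64_t b)
{
#if defined(__x86_64__) && defined(__GNUC__)
    __asm__ __volatile__(
        "mulq %3\n\t"            /* rdx:rax = rax * b              */
        "movq %%rdx, (%1)\n\t"   /* *rh = high 64 bits             */
        "movq %%rax, (%2)"       /* *rl = low 64 bits              */
        : "+&a"(a)               /* %0: a enters in rax; mul overwrites it
                                    before the pointers are read, hence '&' */
        : "r"(rh), "r"(rl),      /* %1, %2: pointers are inputs    */
          "r"(b)                 /* %3: any free register          */
        : "rdx", "memory", "cc"  /* mul writes rdx and the flags; the
                                    stores touch memory the compiler
                                    was not told about             */
    );
#else
    /* Fallback so the sketch still builds on other 64-bit targets. */
    __uint128_t r = (__uint128_t)a * b;
    *rh = (uint64_t)(r >> 64);
    *rl = (uint64_t)r;
#endif
}

/* Small self-check: (2^64 - 1)^2 = (2^64 - 2) * 2^64 + 1. */
void mul64_store_check(void)
{
    uint64_t hi = 0, lo = 0;
    mul64_store(&hi, &lo, UINT64_MAX, UINT64_MAX);
    assert(hi == UINT64_MAX - 1);
    assert(lo == 1);
}
```

The `__volatile__` matters here because the only declared output (the modified `a`) is never used afterwards, so without it the compiler would still be free to drop the statement.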

    Best regards,
    Johannes

    --
    >> Where EXACTLY did you predict the quake again?

    > At least not publicly!

    Ah, the latest and so far most brilliant stunt of our great
    cosmologist: the secret prediction.
    - Karl Kaos on Rüdiger Thomas in dsa <hidbv3$om2$>
     
    Johannes Bauer, Mar 3, 2013
    #2

  3. Philip Lantz Guest

    Johannes Bauer wrote:
    > swept.along.by.events wrote:
    >
    > > void Mul64asm( uint64_t* rh, uint64_t* rl, uint64_t a, uint64_t b )
    > > {
    > > __asm__( "mov %2, %%rax;"
    > > "mul %3;"
    > > "mov %%rdx,(%0);"
    > > "mov %%rax,(%1);"
    > > : "=D" (rh),
    > > "=S" (rl)
    > > : "d" (a),
    > > "c" (b)
    > > : "%rax"
    > > );
    > > }
    > >
    > > and analyze the result with objdump, I see this:
    > >
    > > 0000000000000000 <Mul64asm>:
    > > 0: f3 c3 repz retq
    > >
    > > So basically it's thinking that my code is a NOP? Why is that?

    >
    > You've passed two pointers to the assembly part, but didn't tell the
    > compiler that you've actually dereferenced them, so your code is
    > optimized out. You may want to clobber memory (you only clobber rax at
    > the moment) or use __asm__ __volatile__.


    I recommend letting gcc know that you are using a memory operand:

    void Mul64asm( uint64_t* rh, uint64_t* rl, uint64_t a, uint64_t b )
    {
    __asm__( "mov %2, %%rax;"
    "mul %3;"
    "mov %%rdx,%0;"
    "mov %%rax,%1;"
    : "=m" (*rh),
    "=m" (*rl)
    : "d" (a),
    "c" (b)
    : "%rax"
    );
    }

    It's also preferable to let the compiler choose the operand locations,
    instead of specifying them, except where a specific register is
    required, and let gcc generate the loads and stores.

    void Mul64asm( uint64_t* rh, uint64_t* rl, uint64_t a, uint64_t b )
    {
    __asm__("mul %3"
    : "=d" (*rh),
    "=a" (*rl)
    : "a" (a),
    "rm" (b)
    );
    }

    Your original code (and also my first rewrite above) neglects to tell
    the compiler that it clobbers rdx. The second version above fixes that.
    The compiler assumes that the value it put in rdx (the parameter a) will
    still be there. Since a isn't used again, it seems like it wouldn't
    matter, but if this function is inlined, the compiler will know what is
    in that register and may use it again. I just found a bug a couple days
    ago with that exact problem. (Note, you can't just add rdx to the
    clobber list in your version, since you specify it as an input operand.)
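
For anyone who wants to try this, here is a minimal self-contained sketch of the second version, cross-checked against the `__uint128_t` function from the first post. The explicit `q` suffix (so the operand size is unambiguous if the compiler picks a memory operand for `b`), the "cc" clobber, the non-x86 fallback, and the helper name `Mul64_check` are additions assumed for illustration, not part of the code above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reference version from the first post, using the 128-bit type. */
void Mul64c(uint64_t *rh, uint64_t *rl, uint64_t a, uint64_t b)
{
    __uint128_t r = (__uint128_t)a * (__uint128_t)b;
    *rh = (uint64_t)(r >> 64);
    *rl = (uint64_t)r;
}

void Mul64asm(uint64_t *rh, uint64_t *rl, uint64_t a, uint64_t b)
{
#if defined(__x86_64__) && defined(__GNUC__)
    __asm__("mulq %3"
            : "=d"(*rh),   /* %0: result from rdx; as an output this also
                              tells the compiler that rdx is overwritten */
              "=a"(*rl)    /* %1: result from rax */
            : "a"(a),      /* %2: a must be in rax before the mul */
              "rm"(b)      /* %3: register or memory, compiler's choice */
            : "cc");       /* mul changes the flags */
#else
    Mul64c(rh, rl, a, b);  /* fallback so the sketch builds elsewhere */
#endif
}

/* Hypothetical check: both versions must agree on a few inputs. */
void Mul64_check(void)
{
    static const uint64_t v[] = { 0, 1, 3, 0xFFFFFFFFu,
                                  0x123456789ABCDEF0u, UINT64_MAX };
    const size_t n = sizeof v / sizeof v[0];
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            uint64_t h1, l1, h2, l2;
            Mul64c(&h1, &l1, v[i], v[j]);
            Mul64asm(&h2, &l2, v[i], v[j]);
            assert(h1 == h2 && l1 == l2);
        }
    }
}
```

Because rdx and rax are declared as outputs, no separate clobber entries are needed for them, and the compiler generates the stores to `*rh` and `*rl` itself.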
     
    Philip Lantz, Mar 5, 2013
    #3
  4. Guest

    On Tuesday, March 5, 2013 9:43:30 AM UTC+1, David Brown wrote:
    > On 05/03/13 07:36, Philip Lantz wrote:
    > > [...]
    > > It's also preferable to let the compiler choose the operand locations,
    > > instead of specifying them, except where a specific register is
    > > required, and let gcc generate the loads and stores.
    > >
    > > void Mul64asm( uint64_t* rh, uint64_t* rl, uint64_t a, uint64_t b )
    > > {
    > > __asm__("mul %3"
    > > : "=d" (*rh),
    > > "=a" (*rl)
    > > : "a" (a),
    > > "rm" (b)
    > > );
    > > }
    > > [...]
    >
    > I too recommend this sort of style. (I am not very familiar with inline
    > assembly on x86, but have used it with other targets.) Let gcc handle
    > the moves - that lets it optimise the code better. This is particularly
    > important if "Mul64asm" is made "static inline" so that it is mixed in
    > directly with other code. gcc will then be able to take advantage of
    > things like having "a" or "b" already in a register, or using the
    > results "*rl" or "*rh" without actually storing them out to memory. It
    > will also be able to overlap the "mov" instructions for one Mul64asm
    > with other code (assuming your cpu has enough registers) for better
    > pipelining, and it will mix and match the choice of registers used
    > (again, if your cpu has that choice). And of course, avoiding general
    > memory clobbers and "asm volatile" is a big help to optimisation.
    >
    > Generally speaking, you let gcc do as much as possible, and keep the
    > assembly code to a minimum. It's not as important in a register-poor,
    > non-orthogonal architecture like the x86 where so much of the work goes
    > through the bottleneck of a single "rax" register, but it can make a
    > very big difference on more modern processor architectures with large
    > numbers of general-purpose registers, or half-way architectures like
    > x86-64 with its 16 registers.



    Thanks a lot to both, Philip's second version works like a charm both as a separate function and inlined. Could you tell me if I'm reading it correctly?

    : "=d" (*rh), // it's saying that *rh comes from the %rdx register
    "=a" (*rl) // same, must take *rl from %rax
    : "a" (a), // I want parameter 'a' in %rax before the mul
    "rm" (b) // can store 'b' anywhere you want (register or memory), and wherever it is, that's what you multiply %rax with

    Thanks!
     
    , Mar 5, 2013
    #4
  5. Philip Lantz Guest

    swept.along.by.events wrote:
    >> Philip Lantz wrote:
    >>> void Mul64asm( uint64_t* rh, uint64_t* rl, uint64_t a, uint64_t b )
    >>> {
    >>> __asm__("mul %3"
    >>> : "=d" (*rh),
    >>> "=a" (*rl)
    >>> : "a" (a),
    >>> "rm" (b)
    >>> );
    >>> }

    >
    > Thanks a lot to both, Philip's second version works like a charm both
    > as a separate function and inlined. Could you tell me if I'm reading
    > it correctly?
    >
    > : "=d" (*rh), // it's saying that *rh comes from the %rdx register
    > "=a" (*rl) // same, must take *rl from %rax
    > : "a" (a), // I want parameter 'a' in %rax before the mul
    > "rm" (b) // can store 'b' anywhere you want (register or memory), and wherever it is, that's what you multiply %rax with


    Yes, I think you are understanding it correctly.

    Another way of saying it: "=d" (*rh) means that the assembly code
    generates a result in rdx, which should be stored into *rh; "rm" (b)
    means that the assembly code uses b as an operand, and the operand can
    be in either register or memory.
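
The same reading can be spelled out with gcc's symbolic operand names, which some people find easier to follow than `%0`..`%3`. This is only a sketch under assumptions: the names `struct u128_parts`, `mul64_wide`, and `mul64_wide_check`, the `q` suffix, the "cc" clobber, and the non-x86 fallback are made up for illustration, and it returns the two halves by value so that an inlined call can keep them in registers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: both halves of a 64x64 -> 128-bit product. */
struct u128_parts {
    uint64_t hi;
    uint64_t lo;
};

static inline struct u128_parts mul64_wide(uint64_t a, uint64_t b)
{
    struct u128_parts r;
#if defined(__x86_64__) && defined(__GNUC__)
    __asm__("mulq %[b]"
            : [hi] "=d"(r.hi)  /* asm leaves a result in rdx -> r.hi */
            , [lo] "=a"(r.lo)  /* asm leaves a result in rax -> r.lo */
            : [a]  "a"(a)      /* a must already be in rax           */
            , [b]  "rm"(b)     /* b: register or memory operand      */
            : "cc");           /* mul changes the flags              */
#else
    /* Fallback so the sketch still builds on other 64-bit targets. */
    __uint128_t p = (__uint128_t)a * b;
    r.hi = (uint64_t)(p >> 64);
    r.lo = (uint64_t)p;
#endif
    return r;
}

/* Small self-check: 2^63 * 2 = 2^64, so hi = 1 and lo = 0. */
void mul64_wide_check(void)
{
    struct u128_parts r = mul64_wide(1ULL << 63, 2);
    assert(r.hi == 1 && r.lo == 0);
}
```

A caller that only needs the high half can write `mul64_wide(x, y).hi`; no pointers are involved, so nothing has to be stored to memory.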
     
    Philip Lantz, Mar 6, 2013
    #5
