Jack said:
The fact that it works on the compilers where you have tested it, or
99% of all compilers, or even 101% of all compilers does not prove
anything.
Yes, but if none of the compilers I need to support need it, my code
doesn't need it either. In fact, most embedded C code is only designed
to support one compiler and one architecture, and has far more serious
portability issues than this - e.g. assumptions about endianness or
structure packing, or gobs of inline assembler and hardware access.
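To make the endianness and packing point concrete, here is a minimal sketch. The names host_is_little_endian, load_le32 and sample_record are made up for illustration; nothing here comes from the code under discussion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if the lowest-addressed byte of a 32-bit 1 is 1
 * (little-endian host), 0 otherwise. */
static int host_is_little_endian(void)
{
    const uint32_t probe = 1;
    unsigned char first_byte;

    memcpy(&first_byte, &probe, 1);
    return first_byte == 1;
}

/* Hypothetical record: the compiler may insert padding after tag,
 * so sizeof and offsetof can differ between compilers and targets. */
struct sample_record {
    uint8_t  tag;
    uint32_t value;
};

/* Decodes a 32-bit little-endian field byte by byte, which gives the
 * same result whatever the host byte order or struct layout. */
static uint32_t load_le32(const unsigned char *p)
{
    assert(p != NULL);
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

Code that overlays struct sample_record directly on a wire buffer bakes in one compiler's padding and one byte order; decoding with something like load_le32 avoids both assumptions.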
So this is a QOI issue, not a standard one.
??
I'd bet that the addresses passed to printf will be on the stack too.
Even on a 64 bit machine, where buf1[5] might fit in a register, the C
compiler would need to stack it before the call to printf.
What "stack"? On some platforms, the four arguments in that printf()
call would all be passed in registers. And the return address in a
dedicated register of its own. No "stack" need apply.
Every platform I've seen uses the stack for printf, because it's the
most efficient way to implement va_arg - va_start can create a hidden
pointer to the args on the stack, and va_arg can do *hidden_ptr++ and
some masking.
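For reference, this is what the caller-visible side looks like with the standard <stdarg.h> macros (va_start, va_arg, va_end). The function sum_ints is a made-up example; how the compiler implements the macros underneath is exactly what is being argued about here, and this sketch does not settle it.

```c
#include <assert.h>
#include <stdarg.h>

/* Sums the `count` int arguments that follow the fixed parameter. */
static int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;
    int i;

    assert(count >= 0);
    va_start(ap, count);            /* set ap up to reach the first unnamed argument */
    for (i = 0; i < count; i++)
        total += va_arg(ap, int);   /* fetch one int and advance ap */
    va_end(ap);
    return total;
}
```

A call such as sum_ints(3, 1, 2, 3) fetches its three unnamed arguments strictly in order, which is why a simple "pointer that gets bumped" is such a natural implementation.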
If the args were in registers, you'd need either a weird
addressing mode like this

    mov Rdest, Registers[Rcount] ; move register number Rcount into Rdest

I've never seen an architecture that provides this addressing mode. To
a HW guy it looks hard to implement. You could do two reads from the
register file, one to read Rcount and one to read Registers[Rcount], or
you could add _lots_ of read ports to the register file.
On a normal chip, the C compiler could use self modifying code to
synthesise it -

    mov   Rtmp, Rcount
    shift Rtmp, #xx       ; shift Rcount to line up with the register
                          ; field in the instruction
    or    [label], Rtmp   ; patch the register field of the instruction
                          ; at label
    label:
    mov   Rdest, R0       ; R0 is a constant field in the instruction
                          ; which the previous instruction overwrites

But self modifying code is slow because it flushes pipelines, and
causes problems if you have a read only code page. It's also worth
pointing out that both of these solutions will break if you try to pass
more arguments to printf than you have usable argument registers.
E.g. consider a printf on a RISC chip which allows 4 registers to pass
arguments. One call to printf with 5 arguments would require a second
copy of the function with the stack-based calling convention.
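A toy model of the 4-register, 5-argument case, purely as an illustration and not any real ABI: an array stands in for the argument registers and a second array stands in for the stack. The names toy_args, toy_next_arg and NUM_ARG_REGS are invented for this sketch.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_ARG_REGS 4  /* assumption: matches the 4-register example */

/* Hypothetical model of where one call's arguments live. */
typedef struct {
    long regs[NUM_ARG_REGS];  /* stands in for the argument registers */
    const long *stack;        /* stands in for stack-passed arguments */
    size_t nargs;             /* total number of arguments in the call */
    size_t next;              /* index of the next argument to fetch */
} toy_args;

/* Fetches the next argument: registers first, then the stack. */
static long toy_next_arg(toy_args *a)
{
    size_t i;

    assert(a->next < a->nargs);
    i = a->next++;
    if (i < NUM_ARG_REGS)
        return a->regs[i];              /* register selected by a runtime index */
    return a->stack[i - NUM_ARG_REGS];  /* plain pointer arithmetic */
}
```

In C the regs[i] read is trivial because regs is memory. On real hardware that one line is the Registers[Rcount] addressing mode, while the stack branch is just a load through a pointer.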
So even on RISC machines where the normal calling convention is to pass
args in registers, variadic functions still pass the args to printf on
the stack. It's probably fast too, because the top of the stack is
pretty much guaranteed to be in a cache or write buffer when the
variadic function calls va_arg. Then again, if you want speed, you'd
make the function non-variadic and declare it INLINE, with INLINE
#defined to be something appropriate for the compiler
(http://www.greenend.org.uk/rjk/2003/03/inline.html)
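One possible shape for that macro, as a sketch only. The compiler checks below are my own guess at a reasonable set, not taken from the page linked above, and sum3 is a made-up fixed-arity function.

```c
/* Pick an inline spelling the current compiler accepts. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
#  define INLINE static inline        /* C99 and later */
#elif defined(__GNUC__)
#  define INLINE static __inline__    /* GCC-compatible compilers in C89 mode */
#elif defined(_MSC_VER)
#  define INLINE static __inline      /* Microsoft C */
#else
#  define INLINE static               /* fallback: ordinary static function */
#endif

/* Fixed-arity replacement for a variadic sum: no va_list needed, and
 * the compiler is free to inline the call entirely. */
INLINE int sum3(int a, int b, int c)
{
    return a + b + c;
}
```

Using static in every branch sidesteps the differences between C99 and GNU89 rules for non-static inline functions, at the cost of one private copy per translation unit if the compiler declines to inline.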