Compile errors?

David Brown

Le 17/02/2014 11:27, David Brown a écrit :


Adding 32 bytes of stack space is not really anything complex!
That space is required to save the values of the 4 registers that hold
the arguments, in case they need to be saved. Nothing really
extraordinary!

Adding stack space is not particularly complex - it is a standard
operation used in many functions. But it is obviously vastly more
complex than not adding stack space at all. A lot of functions can get
away without doing any stack allocation, especially on x86-64 which has
quite a few registers (and on other cpu architectures with more
registers). Omitting stack allocation, frame pointers, and stack
deallocation saves code. Forcing functions to allocate 32 bytes,
whether they are needed or not, means extra code and wasted time. If
the callee function needs to save the parameter registers on the stack,
then there is a high chance that it needs to put more on the stack - so
the pre-allocation of 32 bytes is wasted. With the Linux ABI, there
only needs to be /one/ stack allocation, of exactly the size actually
needed (and sometimes that can be omitted too, if the function is a leaf
function).

So no, allocating 32 bytes on the stack is not difficult or complex.
But it will almost always be a waste of effort - it is inefficient.
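To make that concrete, here is a rough sketch of the same trivial call
under the two conventions. The assembly in the comments is only
illustrative of the shapes the two ABIs push you towards - real
compiler output will vary with compiler and optimisation level:

extern int add2(int a, int b);

int caller(void) {
    return add2(1, 2);
}

/* System V AMD64 (Linux): arguments go in edi/esi, no stack
 * allocation is required at all, and this can even become a tail call:
 *     mov edi, 1
 *     mov esi, 2
 *     jmp add2
 *
 * Win64: arguments go in ecx/edx, and the caller must also reserve
 * the 32 bytes of shadow space (40 here, keeping rsp 16-byte
 * aligned across the call):
 *     sub  rsp, 40
 *     mov  ecx, 1
 *     mov  edx, 2
 *     call add2
 *     add  rsp, 40
 *     ret
 */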
Excuse me, but Intel did not say anything about calling conventions;
that is a myth.

You are probably right on that one, as Intel played catch-up to AMD
here. So let me change that and say the calling convention was designed
by AMD to be the most efficient for their instruction set. Technically,
I believe the ABI made by AMD here was targeting *nix systems - but at
the time it was defined, *nix was the only system that was ready for
64-bit x86.
And there were NO compatibility reasons either: 64-bit code is
completely different from 32-bit code, with 8 new registers and (at
last) the possibility of using registers to pass arguments.

I can't think of any particular compatibility reasons here either - but
I can't think of any other reason why MS would pick a different ABI and
calling convention when they started Windows for x86-64. /Maybe/ the
compatibility was for their previous 64-bit Windows versions (such as
for the DEC Alpha betas, or for the Itanium) - I don't know any details
here, so I am guessing. I presume they had /some/ reason for picking
something more complicated and less efficient than the existing standard.

One ABI change that MS /did/ make for compatibility reasons is the size
of "long" - on most x86-64 systems, "long" is 64 bits, but on Win64 it
is 32 bits, as a concession to existing Windows programmers who made
assumptions about the size of "long".
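The difference is easy to see with a couple of sizeof's (LP64 on
Linux/Unix, LLP64 on Win64):

#include <stdio.h>

int main(void) {
    /* Linux x86-64 (LP64):  long = 8, long long = 8, void * = 8
     * Win64 (LLP64):        long = 4, long long = 8, void * = 8 */
    printf("long: %zu, long long: %zu, void *: %zu\n",
           sizeof(long), sizeof(long long), sizeof(void *));
    return 0;
}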
There is no basis for that, but keep your opinion if you wish. I agree
that we disagree.

My basis is the facts of the calling conventions, which are well
documented on the web. Opinions about what effect this would have on
the real-world speed of the code are merely opinions, as I haven't
tested them (and as noted I don't think the differences will be very
significant for most code).
 
David Brown

Maybe. But the basic, underlying tools often have command-line
interfaces (I vaguely remember CL.EXE for MS's C compiler, and there is
CSC.EXE for their C# compiler; in fact, all C compilers I've tried under
Windows, including gcc, have command-line interfaces, and I have hooked
these into my own IDE). So that is an irrelevant point. For the
actual installation and setup, I don't care how it works provided it
does work.

That's entirely true. I was merely saying that /many/ Windows
developers don't think that way - to them, "MSVS" is the "compiler" -
the "compiler" includes all libraries, the IDE, debugger, documentation,
etc., and they never see the command-line programs. I don't view you as
a "typical" Windows developer in this sense.
Well, that is what I intend to find out. I'm not writing 'most code',
but have my bytecode interpreter in mind.

One possibility is to dedicate one or two of the additional registers to
particular key variables, such as the bytecode's instruction pointer or
a pointer to a state structure - if these are used enough, it could save
loading and storing such critical values.
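gcc will even let you experiment with that directly, via its global
register variable extension. A minimal sketch (GCC-specific - the
choice of r15 and all the names here are just for illustration):

#include <stdint.h>

/* Pin the interpreter's instruction pointer to a callee-saved
 * register for the whole program (GCC extension; compile the rest
 * of the code with -ffixed-r15 so nothing else touches it). */
register const uint8_t *vm_ip asm("r15");

enum { OP_NOP, OP_HALT };

int vm_run(const uint8_t *code) {
    vm_ip = code;
    for (;;) {
        switch (*vm_ip++) {       /* dispatch straight from r15 */
        case OP_NOP:  continue;
        case OP_HALT: return 0;
        default:      return -1;  /* unknown opcode */
        }
    }
}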
I know that, from the point of view of writing x64 ASM code, this would
likely benefit from 64-bit load/store operations, 64-bit registers
instead of 32-bit, and the extra 64-bit registers, *if* I can keep the
underlying data structures the same size. I might even add some extra
'register'-based bytecode ops, instead of stack-based, if they can be
used advantageously.

But I also know that if I just blindly recompile the C sources I have
now, it's unlikely to be any faster, and could be slower. But I'm trying
to get to that point first!

(Not the best processor design in x64; I think there is less demand for
64-bit addressing than for 64-bit data, registers and operations. But
the former demands 64-bit pointers which are often unnecessary and will
slow things down.)

Actually, I don't think it is common to need 64-bit integer registers or
operations - except to speed up movement of data blocks. 64-bit
pointers are needed to access larger memory areas, but cost more memory
in themselves - and that hurts cache performance. The two key speed
improvements from 64-bit mode are the extra registers, and the saner
instruction coding.

As far as I can tell, 32-bit offsets are usually used when accessing
static data - that can be more efficient than when 64-bit pointers are used.
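For example, a static object is normally reached through a 32-bit
RIP-relative displacement rather than a full 64-bit address - the
assembly in the comment is roughly what gcc emits at -O2, slightly
tidied:

static int counter;

int bump_counter(void) {
    return ++counter;
    /* typically compiles to something like:
     *     mov eax, DWORD PTR counter[rip]
     *     add eax, 1
     *     mov DWORD PTR counter[rip], eax
     * - a 32-bit displacement, no 64-bit pointer needed. */
}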
 
Melzzzzz

Le 17/02/2014 12:31, Melzzzzz a écrit :


???

Excuse me, but you didn't notice that 32 is a multiple of 16?

Yes ;)
If your stack is 16-byte aligned, adding 16, 32, 48, 64, etc. does NOT
change anything.

But maybe this is too complex for you?
:)

Perhaps ;)
Debugging is not really important, of course. Only bad programmers
have bugs, and they are all under Windows. Linux programmers are
bug-free.

Heh ;)

To be serious: all in all, under Windows one has two fewer registers
to work with (rdi and rsi have to be preserved), and one instruction
more (to allocate the 32 bytes on the stack) ...
 
BartC

David Brown said:
On 17/02/14 09:56, jacob navia wrote:

And I gather the major difference is about stack allocation in the
calling convention, rather than the number of parameters (I agree that
not many functions use more than four parameters). It is the Win64
calling convention that is more complex, however, as it requires the
caller to allocate 32 bytes of "shadow space" on the stack before
calling a function - whether or not that space is ever used. On Linux,
there is no such requirement - functions allocate stack space if and
when they need it (and apparently leaf functions can use 128 bytes of
stack space "for free").

One common thing I've seen with any C code on x86 is the need for the
caller to adjust the stack after every call, e.g.:

call fn
add esp,8

which always seemed odd to me, as it's easy enough to do a 'ret 8' in the
callee instead of 'ret', and save the 'add esp' instruction.

This seems to be on a par with doing 'sub rsp,32' or whatever in win64. It's
difficult anyway with any modern x86 to find whether any set of instructions
is faster or slower than any other; sometimes adding an instruction makes
things faster.

But I agree it could be marginally less efficient to do 'sub rsp,32' when
there are no parameters at all, like doing 'add esp,0'.
Linux uses the x86-64 calling convention designed by AMD and Intel as
their standard ABI, aimed to be the most efficient general calling
convention they could make. Win64 uses an odd calling convention based
on their own ideas, and different from everybody else. I don't really
know why they did this - perhaps there are compatibility reasons somewhere.

So yes, I think the Linux x86-64 calling convention is more
efficient than the Windows one.

I admit I know virtually nothing about these ABIs, but surely you only
really need to worry about calling conventions when calling (or being called
from) foreign functions such as those residing in the OS? Within the code
generated by a compiler, can't it just do what it likes? Including *not*
reserving those 32 bytes.
 
Melzzzzz

Actually, I don't think it is common to need 64-bit integer registers
or operations - except to speed up movement of data blocks. 64-bit
pointers are needed to access larger memory areas, but cost more
memory in themselves - and that hurts cache performance. The two key
speed improvements from 64-bit mode are the extra registers, and the
saner instruction coding.

As far as I can tell, 32-bit offsets are usually used when accessing
static data - that can be more efficient than when 64-bit pointers
are used.

There is the x32 ABI on Linux for exactly that purpose: using 64-bit
mode but with 32-bit pointers.
The problem is that everything has to be recompiled as x32 ELF, and that
is not very practical, as one usually needs to do it oneself.
The biggest advantage of 64-bit pointers is the possibility of mmap-ing
huge files (the address space is not limited to 4GB).
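A sketch of that last point - mapping a file bigger than 4GB in one
piece needs the 64-bit address space, and would fail under x32 or
-m32 ("huge.dat" is just a placeholder name):

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void) {
    int fd = open("huge.dat", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

    /* With a 32-bit address space, a file over 4GB cannot be
     * mapped in one piece; with 64-bit pointers this just works. */
    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ,
                   MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    printf("mapped %lld bytes at %p\n", (long long)st.st_size, p);
    munmap(p, (size_t)st.st_size);
    close(fd);
    return 0;
}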
 
David Brown

One common thing I've seen with any C code on x86 is the need for the
caller to adjust the stack after every call, e.g.:

call fn
add esp,8

which always seemed odd to me, as it's easy enough to do a 'ret 8' in the
callee instead of 'ret', and save the 'add esp' instruction.

This seems to be on a par with doing 'sub rsp,32' or whatever in win64.
It's difficult anyway with any modern x86 to find whether any set of
instructions is faster or slower than any other; sometimes adding an
instruction makes things faster.

As you say, it is not always clear what instruction sequence is actually
the fastest. I know that with a 68k processor I used many years ago,
the equivalent of "ret 8" was slower than doing the return and stack
pointer manipulation separately.
But I agree it could be marginally less efficient to do 'sub rsp,32'
when there are no parameters at all, like doing 'add esp,0'.

I can't say what stack pointer manipulation instructions are the fastest
on x86-64 - but I am pretty confident that /no/ stack manipulation is
faster than any method of allocating (and deallocating) 32 bytes!
I admit I know virtually nothing about these ABIs, but surely you only
really need to worry about calling conventions when calling (or being
called from) foreign functions such as those residing in the OS? Within
the code generated by a compiler, can't it just do what it likes?
Including *not* reserving those 32 bytes.

You can only do that when the compiler has full knowledge of the code in
question. So when you are talking about functions declared "static"
within the same compilation unit, or inline functions, then the compiler
can do what it wants with calling conventions. (Although it may still
pick standard calling conventions, depending on optimisation settings
and debugger capabilities.) It can also optimise calls if you have some
sort of full-program optimisation (LTO in gcc). But for general
externally linked functions, the compiler has to stick to the standards.
 
David Brown

They love to do that in this group. Say things like: I don't know anything
about Windows (because I'm a hip Unix dude, doncha know - and Windows is
just so ... for the masses, doncha know...!), but ... (and then proceed to
pontificate about something having to do with Windows).

If that is targeted at me, then I use Linux mainly because it is the
most efficient OS for the job I do, and the way I do it. (And I use it
for home use, from personal preference.) I also use Windows for some
things. /You/ may choose your tools based on what is "hip", but I
don't. I don't program in C on Windows - few people do. Most of my
Windows programming these days is in Python, as is most of my Linux
programming - I use C for embedded development.

But despite not compiling more than the occasional piece of test code in
C on Windows, I am perfectly capable of reading information sites which
explain the calling conventions used on Windows and Linux. And it
doesn't take much experience to conclude that allocating extra stack
space for no purpose is less efficient than /not/ allocating that extra
stack space.

If someone could give a clear explanation or reference as to how a
caller-allocated 32-byte stack space is actually useful and gives
smaller, faster, clearer, more efficient or more debuggable code, then I
would be very happy to listen.

(My take on this follows)
As Tom Lehrer said: If people can't communicate, the very least they could
do is to ... shut up!

That would be completely against the spirit of this newsgroup :)
 
David Brown

There is the x32 ABI on Linux for exactly that purpose: using 64-bit
mode but with 32-bit pointers.
The problem is that everything has to be recompiled as x32 ELF, and that
is not very practical, as one usually needs to do it oneself.
The biggest advantage of 64-bit pointers is the possibility of mmap-ing
huge files (the address space is not limited to 4GB).

Yes, and there are other uses of such a large address space - it is
(AFAIUI) used by some memory leakage tracking tools to help in debugging.
 
Stephen Sprunk

Yes, all you can say for sure is that results will vary.

Well, you can test any given application both ways and see which mode
performs better. Or you may have other reasons to choose one or the
other, regardless of the performance difference (if any).
But generally, unless you have an application that significantly
benefits from 64-bit mode features,

Nearly all code will benefit from having more registers, which eases
register pressure and the related stack spills (a common performance
limitation on x86), as well as from a register-based calling convention.

OTOH, wider pointers means they use more D-cache and the REX prefix
means instructions use more I-cache, so there is a cost as well, which
may or may not outweigh the above gains, depending on your code.
it is probably not worth switching to 64-bit just for speed.

"Switching to 64-bit"? 64-bit mode has been the default for years; the
question is whether it's worth switching to 32-bit mode for a given
project, and IMHO the answer is "no" in almost all cases simply due to
the benefits of consistency.
They also vary a bit between platforms, apparently - while Linux and
Windows share the same basic calling convention for 32-bit, and
generate very similar object code from the same compiler, there are
more differences in the 64-bit world as Linux has a more efficient
calling convention.

AMD worked closely with Linux and GCC folks to define the ISA and ABI;
Intel and Microsoft were latecomers that wanted the architecture to fail
for business reasons but were forced by customers to adopt it anyway.

S
 
Stephen Sprunk

One common thing I've seen with any C code on x86 is the need for
the caller to adjust the stack after every call, e.g.:

call fn
add esp,8

which always seemed odd to me, as it's easy enough to do a 'ret 8'
in the callee instead of 'ret', and save the 'add esp' instruction.

That only works if the callee knows exactly how much data was pushed
onto the stack before the call. If the callee gets it wrong, Bad
Things(tm) will happen. The caller obviously knows how much data it
pushed, though, so let it do the cleanup itself.
This seems to be on a par with doing 'sub rsp,32' or whatever in
win64. It's difficult anyway with any modern x86 to find whether any
set of instructions is faster or slower than any other; sometimes
adding an instruction makes things faster.

Exactly. Intel and AMD optimize their CPUs so that the most common
instructions are the fastest, so it's generally best to copy what
everyone else (in particular, MSVC and GCC) does unless you're willing
to invest an extraordinary amount of time learning exactly how dozens of
different CPU models behave with various instruction sequences.
But I agree it could be marginally less efficient to do 'sub rsp,32'
when there are no parameters at all, like doing 'add esp,0'.

On some CPUs, stack operations are "free"; on others, the OOO stuff
still generally does a good job of hiding it, so the cost is pretty
close to zero in most cases.

If you're optimizing tight loops where every cycle counts, you don't
want to be making function calls, so it's moot anyway.
I admit I know virtually nothing about these ABIs, but surely you
only really need to worry about calling conventions when calling (or
being called from) foreign functions such as those residing in the
OS? Within the code generated by a compiler, can't it just do what it
likes? Including *not* reserving those 32 bytes.

Only if you don't care about being compatible with object code created
by other compilers/linkers, which is a reasonable customer expectation.

ABIs exist for a reason; ignore them at your peril.

S
 
David Brown

Well, you can test any given application both ways and see which mode
performs better. Or you may have other reasons to choose one or the
other, regardless of the performance difference (if any).


Nearly all code will benefit from having more registers, which eases
register pressure and the related stack spills (a common performance
limitation on x86), as well as from a register-based calling convention.

OTOH, wider pointers means they use more D-cache and the REX prefix
means instructions use more I-cache, so there is a cost as well, which
may or may not outweigh the above gains, depending on your code.

Yes, it all adds up to "results will vary depending on the code". It
will also depend on the exact cpu in question - different Intel and AMD
processors will have different balances. That is one of the reasons why
speed alone is seldom a good reason for picking 64-bit compilation over
32-bit.
"Switching to 64-bit"? 64-bit mode has been the default for years; the
question is whether it's worth switching to 32-bit mode for a given
project, and IMHO the answer is "no" in almost all cases simply due to
the benefits of consistency.

That depends on your platform - 64-bit has been the default for many
years in Linux, both for the system itself and for the applications.
But 64-bit Windows has only been the default for new Windows systems for
a few years, and though I don't have the figures, I expect it is still
in the minority for current installations. And 64-bit has definitely
been in the minority for Windows applications. In the Linux world, you
(typically) release source code and the distro can compile it as 32-bit
and 64-bit. But in the Windows world, you (typically) produce a binary
- if you make a 32-bit version, anyone can run it, but if you make a
64-bit version only something like half the user base can use it. So
you either stick to 32-bit, or you produce and maintain two binaries.
 
Alain Ketterlin

Stephen Sprunk said:
On 17-Feb-14 06:31, BartC wrote: [...]
I admit I know virtually nothing about these ABIs, but surely you
only really need to worry about calling conventions when calling (or
being called from) foreign functions such as those residing in the
OS? Within the code generated by a compiler, can't it just do what it
likes? Including *not* reserving those 32 bytes.

Only if you don't care about being compatible with object code created
by other compilers/linkers, which is a reasonable customer expectation.

ABIs exist for a reason; ignore them at your peril.

Maybe what BartC means is that: "The standard calling sequence
requirements apply only to global functions. Local functions that are
not reachable from other compilation units may use different
conventions." (from the AMD64 ABI Draft 0.99.4 -- Jan 13, 2010)

So, not "within code generated by a compiler", but "within a given
compilation unit, for local functions" only.

-- Alain.

P/S: the sentence right after the one quoted above is: "Nevertheless, it
is recommended that all functions use the standard calling sequence when
possible."
 
Stephen Sprunk

Maybe what BartC means is that: "The standard calling sequence
requirements apply only to global functions. Local functions that
are not reachable from other compilation units may use different
conventions." (from the AMD64 ABI Draft 0.99.4 -- Jan 13, 2010)

So, not "within code generated by a compiler", but "within a given
compilation unit, for local functions" only.

I was going to reply to this, but ...
P/S: the sentence right after the one quoted above is: "Nevertheless,
it is recommended that all functions use the standard calling
sequence when possible."

... this is almost exactly what I was going to say.

Also, it's a lot of work to track which functions use which calling
convention to avoid any possible problems; it's easier to use the same
one all the time, unless you provide users a means to explicitly
decorate function declarations with the convention to be used, e.g.
__stdcall, __fastcall, __cdecl, etc.
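Concretely, such decorated declarations look like this (MSVC
spellings; gcc and clang accept equivalent __attribute__ forms, and
64-bit compilers generally just ignore these keywords - the function
names here are made up):

int __cdecl    parse_args(int argc, char **argv); /* caller cleans up;
                                                     required for varargs */
int __stdcall  window_proc(void *hwnd, unsigned msg); /* callee cleans up;
                                                         the Win32 API style */
int __fastcall hash32(unsigned key);              /* first args in ECX/EDX */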

S
 
Ian Collins

David said:
Linux uses the x86-64 calling convention designed by AMD and Intel as
their standard ABI, aimed to be the most efficient general calling
convention they could make. Win64 uses an odd calling convention based
on their own ideas, and different from everybody else. I don't really
know why they did this - perhaps there are compatibility reasons somewhere.

It isn't just Linux. Most (if not all) Unix and UNIX-like operating
systems use the AMD64 calling convention. Omitting the frame pointer is
a common optimisation on those platforms.
 
Keith Thompson

Alain Ketterlin said:
Stephen Sprunk said:
On 17-Feb-14 06:31, BartC wrote: [...]
I admit I know virtually nothing about these ABIs, but surely you
only really need to worry about calling conventions when calling (or
being called from) foreign functions such as those residing in the
OS? Within the code generated by a compiler, can't it just do what it
likes? Including *not* reserving those 32 bytes.

Only if you don't care about being compatible with object code created
by other compilers/linkers, which is a reasonable customer expectation.

ABIs exist for a reason; ignore them at your peril.

Maybe what BartC means is that: "The standard calling sequence
requirements apply only to global functions. Local functions that are
not reachable from other compilation units may use different
conventions." (from the AMD64 ABI Draft 0.99.4 -- Jan 13, 2010)

So, not "within code generated by a compiler", but "within a given
compilation unit, for local functions" only.

-- Alain.

P/S: the sentence right after the one quoted above is: "Nevertheless, it
is recommended that all functions use the standard calling sequence when
possible."

C of course doesn't use the terms "local" and "global" for functions.

A compiler writer might be tempted to use a non-standard calling
sequence for *static* functions -- but it's still possible to take
the address of a static function and use that address to call it
from a different translation unit.

Probably the term "local functions" would apply only to functions that
the compiler can *prove* are never directly called from outside the
translation unit in which they're defined. That would apply to any
function defined as "static" whose address is never taken (except for
the implicit function-to-address conversion in a call).
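A small sketch of the distinction (the function names are made up):

/* 'doubler' is static, but its address escapes, so calls can come
 * from anywhere through the pointer - the compiler must keep the
 * standard calling sequence for it.  'bump' is static and never
 * has its address taken, so the compiler is free to use a private
 * calling convention, or inline it away entirely. */
static int doubler(int x) { return 2 * x; }

static int bump(int x) { return x + 1; }

int (*get_op(void))(int) {
    return doubler;   /* address escapes this translation unit */
}

int use(int x) {
    return bump(x);   /* only ever called directly */
}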
 
glen herrmannsfeldt

David Brown said:
On 17/02/14 12:17, jacob navia wrote:
(snip)
Adding stack space is not particularly complex - it is a standard
operation used in many functions. But it is obviously vastly more
complex than not adding stack space at all.

Well, I would consider the ratio of complexity, including the rest
of the calling sequence, in which case it isn't all that much.
A lot of functions can get away without doing any stack
allocation, especially on x86-64 which has quite a few registers
(and on other cpu architectures with more registers).
Omitting stack allocation, frame pointers, and stack
deallocation saves code. Forcing functions to allocate 32 bytes,
whether they are needed or not, means extra code and wasted time.

For many years now, processors have done enough instruction overlap that
you don't know that any time is wasted. If it saves more time later,
on average the total time might be less.
If the callee function needs to save the parameter registers
on the stack, then there is a high chance that it needs to put
more on the stack - so the pre-allocation of 32 bytes is wasted.

What actually matters is what happens on average, not what might
happen on a single call. If it helps in the average (weighted by
the number of calls) then it helps.
With the Linux ABI, there only needs to be /one/ stack allocation,
of exactly the size actually needed (and sometimes that can be
omitted too, if the function is a leaf function).
So no, allocating 32 bytes on the stack is not difficult or complex.
But it will almost always be a waste of effort - it is inefficient.

-- glen
 
glen herrmannsfeldt

BartC said:
(snip)
(snip)
One common thing I've seen with any C code on x86 is the need
for the caller to adjust the stack after every call, e.g.:
call fn
add esp,8
which always seemed odd to me, as it's easy enough to do a
'ret 8' in the callee instead of 'ret', and save the 'add esp'
instruction.

In the early days of the IBM PC, with Pascal and Fortran as the
more common compilers, callee-pops-the-stack conventions, such as
using "ret 8", were used. I believe the current incarnation is
the STDCALL convention.

When C, with its varargs routines, started to become popular, that became
a problem. With varargs, it is much harder for the called routine to
know the number of arguments, and to pop them. The number isn't a
constant, so the alternate ret instruction doesn't help. This is
often called CDECL, at least in the MS world.
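A varargs function shows the problem directly - nothing in the
callee gives it a compile-time constant to plug into 'ret n':

#include <stdarg.h>

/* With a stack-based convention, sum(2, 10, 20) and
 * sum(4, 1, 2, 3, 4) push different amounts of argument data;
 * only each call site knows how much to clean up - hence CDECL. */
int sum(int count, ...) {
    va_list ap;
    int total = 0;

    va_start(ap, count);
    while (count-- > 0)
        total += va_arg(ap, int);
    va_end(ap);

    return total;
}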

With ANSI C, you can have different calling conventions for varargs
and non-varargs routines, but that is rare.

For Unix, where C was always a popular language, caller-pops-the-stack
conventions were always more popular.

You could look at clock cycle counts for the 8086 and 8088 if you
wanted to know, but even then they did fetch overlap, with a
prefetch buffer, so it might not tell you much. Current processors
do so much overlap that you really have no idea which is faster.

-- glen
 
glen herrmannsfeldt

(snip, someone wrote)
Maybe what BartC means is that: "The standard calling sequence
requirements apply only to global functions. Local functions that
are not reachable from other compilation units may use different
conventions." (from the AMD64 ABI Draft 0.99.4 -- Jan 13, 2010)

I suppose, but even then you will confuse people using a debugger,
and expecting things to be in the usual place.
So, not "within code generated by a compiler", but "within a given
compilation unit, for local functions" only.

(snip)

-- glen
 
glen herrmannsfeldt

Keith Thompson said:
(snip)
(snip)

C of course doesn't use the terms "local" and "global" for functions.
A compiler writer might be tempted to use a non-standard calling
sequence for *static* functions -- but it's still possible to take
the address of a static function and use that address to call it
from a different translation unit.

Well, the compiler could check to see if you ever did that.

Also, there are languages with internal procedures - Fortran and PL/I,
for example, and I believe also Pascal. In PL/I, you could pass a
procedure pointer out, but again the compiler would know.

I think with Fortran 2008 you can now have pointers to internal
procedures. You couldn't previously.

It always seemed interesting to me that the faster computers get,
the more interest there is in speeding up calling conventions.
Probably the term "local functions" would apply only to functions that
the compiler can *prove* are never directly called from outside the
translation unit in which they're defined. That would apply to any
function defined as "static" whose address is never taken (except for
the implicit function-to-address conversion in a call).

Did anyone ever consider adding internal functions to C?

-- glen
 
glen herrmannsfeldt

David Brown said:
On 17/02/14 16:36, Stephen Sprunk wrote:
(snip)
(snip)
Yes, it all adds up to "results will vary depending on the code". It
will also depend on the exact cpu in question - different Intel and AMD
processors will have different balances. That is one of the reasons why
speed alone is seldom a good reason for picking 64-bit compilation over
32-bit.

(snip, someone wrote)
That depends on your platform - 64-bit has been the default for many
years in Linux, both for the system itself and for the applications.
But 64-bit Windows has only been the default for new Windows systems for
a few years, and though I don't have the figures, I expect it is still
in the minority for current installations.

Well, at least some Windows distributions give you two DVDs and you
decide which one to install. I believe Linux does that, too.

For many people "64" is bigger than "32" and so must be better.
And 64-bit has definitely been in the minority for Windows
applications. In the Linux world, you (typically) release source
code and the distro can compile it as 32-bit and 64-bit.
But in the Windows world, you (typically) produce a binary
- if you make a 32-bit version, anyone can run it, but if you make a
64-bit version only something like half the user base can use it. So
you either stick to 32-bit, or you produce and maintain two binaries.

Well, last I knew, 64-bit Linux would also run 32-bit binaries.
If you install the 64-bit OS, you have your choice of which programs
you can run.

-- glen
 
