Compile errors?

Keith Thompson

glen herrmannsfeldt said:
But isn't parent and child what you call the relationship in
the call tree? Seems to me you should have a different word for
this case.

In the call tree I'd use the terms caller and callee.
I haven't done much Pascal for a while, but I think some might
have only allowed internal functions. That is, no separate compilation.

Yes, though some Pascal dialects support separately compiled modules.
(Ada certainly does.)
That sounds about right.

Also, I believe Fortran still only allows one level of nesting.
That is, no internal to internal procedures. That avoids some of the
complications of variable association.

I can't confirm that restriction for Fortran, but yes, in that case the
static link and display mechanisms would reduce to the same thing.
 
Stephen Sprunk

Intel certainly would not have been interested, but MS worked with
AMD from pretty early on.

Not as early on as the Linux community, who helped define the ISA and
standard ABI. MS was a latecomer by far.
In fact it was MS who basically told Intel to do AMD64 because they
wouldn't support a second 64 bit x86 extension.

.... after realizing that AMD64 was going to be successful--and that
Itanic wasn't.
MS released a public AMD64 beta of XP something like nine months
after the first Opterons shipped. And pre-beta versions were
available well before that.

A public beta that was nearly impossible to get a copy of, had horrible
drivers (if they existed at all), had no support channel, and was not
followed by a GA product for _years_ afterward.

Meanwhile, Linux had production-quality code ready before the first
Opterons hit the store shelves.
In any event, any notion that AMD was not primarily courting *MS*
for AMD64 support overestimates the importance of Linux in the market
at the time by an order of magnitude.

AMD obviously needed MS's backing for it to stick in the marketplace,
but MS didn't seem interested until major customers demanded it.

Of course, MS had been burned by the PPC, MIPS and AXP ports in the
past, and they had just invested a ton of effort into the Itanic port,
so being reluctant to do yet another port was understandable.

S
 
Jorgen Grahn

Intel certainly would not have been interested, but MS worked with AMD
from pretty early on. In fact it was MS who basically told Intel to
do AMD64 because they wouldn't support a second 64 bit x86 extension.
MS has always had a somewhat uneasy partnership with Intel, and has
usually backed AMD as a balance - if often somewhat quietly.

MS released a public AMD64 beta of XP something like nine months after
the first Opterons shipped. And pre-beta versions were available well
before that.

In any event, any notion that AMD was not primarily courting *MS* for
AMD64 support overestimates the importance of Linux in the market at
the time by an order of magnitude.

Of course AMD aimed for the Windows market -- anything else would be
suicide. But if I designed a new CPU, I'd love to use Linux as a test
case. Get a gcc version with support for the instruction set, trick
someone into porting the Linux kernel, and you suddenly have gigabytes
of source code to play with (much of which had been running on SPARC64
already, so no stupid sizeof(int)==sizeof(int*) bugs to fix).

Of course, I don't know if that was what actually happened.

/Jorgen
 
BartC

BartC said:
Well, that is what I intend to find out. I'm not writing 'most code', but
have my bytecode interpreter in mind.
But I also know that if I just blindly recompile the C sources I have now,
it's unlikely to be any faster, and could be slower. But I'm trying to get
to that point first!

[Compiling existing programs to 64-bits]

Just been running some first tests. (Too many years of having
interchangeable ints and pointers have resulted in some lazy, sloppy code;
that took a while to sort out. If nothing else, the process has at least
improved my coding standards!)
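
The int/pointer interchange mentioned above usually means stashing a pointer in an `int`, which only works while both are 32 bits wide. A minimal sketch of the portable alternative, assuming a C99 compiler that provides `intptr_t`; `stash_pointer`, `fetch_pointer` and `pointer_round_trips` are hypothetical names used only for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* A pointer stored in a plain int survives on ILP32, but is truncated on
   LP64/LLP64, where int stays 32 bits and pointers grow to 64.
   intptr_t (where provided) is wide enough to hold a void *. */
static intptr_t stash_pointer(void *p)
{
    return (intptr_t)p;
}

static void *fetch_pointer(intptr_t v)
{
    return (void *)v;
}

/* Returns 1 if a pointer to a local survives the round trip. */
int pointer_round_trips(void)
{
    int x = 42;
    int *q = fetch_pointer(stash_pointer(&x));

    assert(sizeof(intptr_t) >= sizeof(void *));
    return q == &x && *q == 42;
}
```

Compiling old code with warnings enabled tends to flag the places where an `int` was used for this job instead.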

As expected, speed was about the same, or a bit slower. With 32-bits, I
relied on an ASM module to provide the maximum performance. I might embark
on the same with 64-bits, but it's not an attractive prospect, and a lot of
work if I just end up matching the 32-bit speed.

There are some architectural changes I might try, however I've had so many
problems grappling with militant-minded C compilers that I might take a step
back and see if I can bypass more of the language; I'm at its mercy at the
minute...
 
Seebs

I'm not disputing that /a/ version of gcc generating 64-bit code exists. I was
asking, if Jorgen had such a compiler, if he knew where I could download a
copy. However I forgot that he was probably talking about a non-Windows version.

Ahh, yes. Someone once showed me a surreal explanation of why mingw wasn't
doing 64-bit code.

-s
 
Seebs

Most commonly called nested functions, Pascal being somewhat famous
for them. The nested functions can usually access variables defined
in their parent functions (etc.). IMO, they have limited utility,
(private) member functions of a class present a much more general
solution.

They're very common in some languages. They're highly idiomatic in Lua,
because they allow you to hand around functions that maintain access to
local variables from the environment where they were declared.
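
Standard C has no nested functions (gcc offers them as an extension), so the usual way to get the effect described above is to pass the captured variables around explicitly. A minimal sketch of that idiom; `struct counter`, `counter_next`, `apply_n` and `counter_after_three` are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>

/* The "environment": variables a nested function would have captured. */
struct counter {
    int value;
    int step;
};

/* The "closure body": reaches its environment through an explicit pointer. */
static int counter_next(void *env)
{
    struct counter *c = env;
    c->value += c->step;
    return c->value;
}

/* A generic consumer that is handed a function plus its environment. */
int apply_n(int (*fn)(void *), void *env, int n)
{
    int last = 0;

    assert(fn != NULL && n >= 0);
    for (int i = 0; i < n; i++)
        last = fn(env);
    return last;
}

/* Example: a counter starting at `start`, advanced three times by `step`. */
int counter_after_three(int start, int step)
{
    struct counter c = { start, step };
    return apply_n(counter_next, &c, 3);
}
```

Unlike a Lua closure, the environment here lives only as long as the caller keeps `c` alive, which is the main thing the language-level feature takes care of.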

-s
 
David Brown

The rationale given in their description
(http://msdn.microsoft.com/en-us/library/ms235286.aspx) is, "This aids
in the simplicity of supporting C unprototyped functions, and vararg
C/C++ functions."

Not sure I really buy it.

Thanks - that's the first reasoning I have heard of. I can see why this
would help in such cases - the called function can copy the register
parameters onto the stack, and then all the parameters are on the stack
in a nice row. But apart from printf and friends (which are always slow
- being a little slower will not be noticed), how many variable argument
C functions are used? And how many unprototyped C functions are used -
especially on a tool that is rarely used for C (since it is a rather
poor C compiler)? And how many vararg functions are used in C++ ? It
sounds like they have speeded up a tiny proportion of C/C++ functions
whose speed never mattered anyway, at the cost of the speed and size of
every other function call. That's not a good tradeoff.
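
For reference, this is the kind of function the rationale is about. A minimal sketch of a variadic C function; `sum_ints` is a hypothetical example. On Win64 the caller reserves stack space where the callee can store the register arguments, so that `va_arg` can walk all the arguments as one contiguous run:

```c
#include <assert.h>
#include <stdarg.h>

/* Sums the `count` int arguments that follow `count`. */
int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;

    assert(count >= 0);
    va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);   /* fetch the next argument in order */
    va_end(ap);
    return total;
}
```

A call such as `sum_ints(3, 1, 2, 3)` passes its first four arguments in registers under that convention, which is why the callee needs somewhere to put them.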
 
David Brown

With an orthogonal register set--and x86(-64) mostly qualifies--this is
a non-issue. The compiler knows where each result needs to end up, so
it arranges the code above so that's where the result goes. It doesn't
calculate something, put the result in RBX and then copy that result to
RAX a few cycles later. Even if it did, that's free on OOO machines.

When a function is generated to follow the calling conventions, then
this register movement is /exactly/ what it does. The called function
will return the result in rax (or rax, rdx if necessary), and if the
caller function needs it in rdi (for another call, perhaps) it must move
it there. When inlining, the compiler can see that the result should be
put directly in rdi and thus skip some register movements.
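
A minimal sketch of the case described above, assuming the System V AMD64 convention (integer results returned in rax, first integer argument passed in rdi). `twice`, `plus_one` and `chain` are hypothetical, and the assembly in the comment is what a compiler might plausibly emit, not actual output:

```c
#include <assert.h>

/* noinline is a GCC/Clang extension; used here so the calls stay real calls. */
#if defined(__GNUC__)
#define NOINLINE __attribute__((noinline))
#else
#define NOINLINE
#endif

NOINLINE int twice(int x)    { return 2 * x; }
NOINLINE int plus_one(int x) { return x + 1; }

/* Without inlining, chain() has to shuffle registers between the calls:
 *     call twice          ; result arrives in eax
 *     mov  edi, eax       ; move it to where plus_one expects its argument
 *     call plus_one       ; (or a jmp, as a tail call)
 * With both callees inlined, the compiler can compute 2*x + 1 directly. */
int chain(int x)
{
    assert(x > -1000000 && x < 1000000);   /* stay well away from overflow */
    return plus_one(twice(x));
}
```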

On an OOO cpu with register renames, register-to-register movements are
cheap - but they are not free. They can often be handled within the
register rename and history mechanism and thus avoid any cycles or real
data movement, but every instruction means less buffer space, less L1
cache space, and less bus bandwidth available for other instructions.
So they are cheap - and perhaps insignificant in cost - but not
completely free.
(There are a few x86(-64) instructions with fixed registers, e.g. MUL
and DIV, but in general that hasn't been true for a long time.)


Indeed, and that can be a huge benefit--when inlining is possible.


OTOH, inlining can defeat instruction caches because the same code gets
loaded from different addresses, on top of any bloat issues.

Yes, once the inlining leads to bigger code, the space cost can mean a
speed cost. The best choice depends on the type of code, the number of
times it is called, and the savings you can get from optimisations. For
small functions, the function call overhead (including the call itself,
register movements, and - for Win64 - the stack allocation) can be
bigger than the inlined code. In such cases, it is always best to
inline, even though the code is duplicated.
Still, the other optimizations plus a decent prefetcher usually make
inlining a net win.

It's a powerful tool - but if you want the best from your compiler, you
need to use it carefully, examine your code, and do serious profiling.
 
Stephen Sprunk

But if I designed a new CPU, I'd love to use Linux as a test case.
Get a gcc version with support for the instruction set, trick
someone into porting the Linux kernel,

I don't think they needed to "trick" anyone; if AMD hadn't embraced the
Linux community as it did, folks would have "borrowed" some AMD64 chips
and done the work anyway for the fun (and fame) of it.
and you suddenly have gigabytes of source code to play with

Gigabytes of code that had previously been ported to _other_ 64-bit
architectures, meaning they had a fully-functional testbed as soon as
GCC and the kernel were ready. They could verify real-world results of
design decisions very early in the CPU development cycle, i.e. while
there was still time to make major changes to the ISA, perhaps even on
simulators. There's no way they could have done that with Windows even
if MS's full weight had been behind them--and it wasn't.

Part of Itanic's epic failure could be blamed on Intel setting the ISA
in stone and then expecting MS to magically polish their turd, and it's
no surprise that AMD chose a radically different strategy: they can't
afford a spectacular disaster like Intel has every few years.
(much of which had been running on SPARC64 already, so no stupid
sizeof(int)==sizeof(int*) bugs to fix).

Was SPARC64 really the only one to predate AMD64? The list of ports
also includes Alpha, PA-RISC, z/Arch, Itanic and PPC64.

S
 
Stephen Sprunk

BartC said:
Well, that is what I intend to find out. I'm not writing 'most
code', but have my bytecode interpreter in mind.

But I also know that if I just blindly recompile the C sources I
have now, it's unlikely to be any faster, and could be slower. But
I'm trying to get to that point first!

[Compiling existing programs to 64-bits]

Just been running some first tests. (Too many years of having
interchangeable ints and pointers have resulted in some lazy, sloppy
code; that took a while to sort out. If nothing else, the process
has at least improved my coding standards!)

Sounds like your code needs to get out more, try some new and
interesting systems that it hasn't seen before.

I started out with C on AIX/POWER, and I had loads of "fun" when I
ported that code to Linux/x86, but it made my code better. I didn't
have any problems at all when I (much later) moved to x86-64.
As expected, speed was about the same, or a bit slower. With
32-bits, I relied on an ASM module to provide the maximum
performance. I might embark on the same with 64-bits, but it's not an
attractive prospect, and a lot of work if I just end up matching the
32-bit speed.

You compared 64-bit compiled code with 32-bit assembly code, and the
former was "about the same, or a bit slower"? That sounds like a
ringing endorsement of 64-bit mode's performance, given how much of an
advantage one would expect assembly to have in general.
There are some architectural changes I might try, however I've had so
many problems grappling with militant-minded C compilers that I might
take a step back and see if I can bypass more of the language; I'm at
its mercy at the minute...

My experience is that if I bow to the militancy of the compiler (-W
-Wall -Werror), it ends up generating better code than if I try to
cheat. It's far smarter than I am.

S
 
BartC

Stephen Sprunk said:
BartC said:
Well, that is what I intend to find out. I'm not writing 'most
code', but have my bytecode interpreter in mind.

But I also know that if I just blindly recompile the C sources I
have now, it's unlikely to be any faster, and could be slower. But
I'm trying to get to that point first!

[Compiling existing programs to 64-bits]
As expected, speed was about the same, or a bit slower. With
32-bits, I relied on an ASM module to provide the maximum
performance. I might embark on the same with 64-bits, but it's not an
attractive prospect, and a lot of work if I just end up matching the
32-bit speed.

You compared 64-bit compiled code with 32-bit assembly code, and the
former was "about the same, or a bit slower"? That sounds like a
ringing endorsement of 64-bit mode's performance, given how much of an
advantage one would expect assembly to have in general.

No, I compared like with like. Typical figures look like this:

        ASM   LAB   SW    FN
gcc32   2.6   3.9   4.5   5.4
gcc64   ?     4.5   5.5   5.4

Where the columns represent different choices for the main loop of the
application (assembler main loop, using gcc's && label pointers, using a big
switch, and just a loop calling function pointers).
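
A minimal sketch of two of the dispatch styles listed above (the big switch and the loop calling function pointers), using a hypothetical three-opcode stack machine; none of the names come from the actual interpreter. The label-pointer variant relies on gcc's `&&label` extension and is left out to keep the sketch in standard C:

```c
#include <assert.h>

enum { OP_PUSH, OP_ADD, OP_HALT };   /* hypothetical opcodes */

struct vm {
    const int *pc;      /* next instruction word */
    int stack[16];
    int sp;             /* number of values on the stack */
    int running;
};

/* Demo program: push 2, push 3, add, halt. */
const int demo_code[] = { OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_HALT };

/* "SW": one big switch inside the loop. */
int run_switch(const int *code)
{
    struct vm m = { code, {0}, 0, 1 };

    while (m.running) {
        switch (*m.pc++) {
        case OP_PUSH:
            assert(m.sp < 16);
            m.stack[m.sp++] = *m.pc++;
            break;
        case OP_ADD:
            assert(m.sp >= 2);
            m.sp--;
            m.stack[m.sp - 1] += m.stack[m.sp];
            break;
        default:                        /* OP_HALT or unknown opcode */
            m.running = 0;
            break;
        }
    }
    return m.sp > 0 ? m.stack[m.sp - 1] : 0;
}

/* "FN": one handler function per opcode, called through a table. */
static void do_push(struct vm *m)
{
    assert(m->sp < 16);
    m->stack[m->sp++] = *m->pc++;
}

static void do_add(struct vm *m)
{
    assert(m->sp >= 2);
    m->sp--;
    m->stack[m->sp - 1] += m->stack[m->sp];
}

static void do_halt(struct vm *m)
{
    m->running = 0;
}

static void (*const handlers[])(struct vm *) = { do_push, do_add, do_halt };

int run_table(const int *code)
{
    struct vm m = { code, {0}, 0, 1 };

    while (m.running) {
        int op = *m.pc++;
        assert(op >= 0 && op <= OP_HALT);
        handlers[op](&m);
    }
    return m.sp > 0 ? m.stack[m.sp - 1] : 0;
}
```

Both loops compute the same result for `demo_code`; the difference being timed is only in how each opcode reaches its handler.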

(However, the gcc64 was TDM gcc, while gcc32 was the official gcc; the
latter has slightly better optimisation anyway.)

To do the same trick with ASM on 64-bits, I first of all have to get down to
2.6 seconds, and then try and beat that to make it worthwhile. But
considering that gcc64 couldn't manage to get this code faster than gcc32,
I'm not sure I can do better, at least with everything else being the same.
That's why reviewing the internal structures, among other aspects, might be
necessary (the main datatype that is manipulated is 50% bigger with 64-bits,
and is also now an odd size).
 
BartC

Richard said:
[Compiling existing programs to 64-bits]
No, I compared like with like. Typical figures look like this:

        ASM   LAB   SW    FN
gcc32   2.6   3.9   4.5   5.4
gcc64   ?     4.5   5.5   5.4
That's why reviewing the internal structures, among other aspects, might

100% bigger.

In addition there is frequently a resource overhead with 64 bit over
32 bit.

Amazing how many people assume things will be faster.

Well, it is pointers that are 100% bigger; my data structure was 24 bytes
instead of 16. So I've done a quick test where it is also 16 bytes in
64-bits (which means there are things missing, so some kinds of input can't
be run, but enough to highlight any differences).
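
A minimal sketch of how a 16-byte record can become 24 bytes, assuming a hypothetical layout with two 32-bit fields and two pointers (the actual structure is not shown in this thread) and the usual ILP32 and LP64/LLP64 alignment rules:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tagged value: only the pointers change size between
   32-bit and 64-bit builds. */
struct value {
    int32_t tag;        /* 4 bytes in both modes */
    int32_t length;     /* 4 bytes in both modes */
    void   *data;       /* 4 bytes on ILP32, 8 on LP64/LLP64 */
    void   *next;       /* 4 bytes on ILP32, 8 on LP64/LLP64 */
};

/* Size predicted from the fields: 16 with 4-byte pointers, 24 with 8-byte. */
size_t predicted_value_size(void)
{
    return 2 * sizeof(int32_t) + 2 * sizeof(void *);
}

/* Returns 1 if the compiler added no padding beyond that prediction. */
int value_has_no_padding(void)
{
    assert(offsetof(struct value, data) == 8);
    return sizeof(struct value) == predicted_value_size();
}
```

With this layout the record grows by 50% even though each pointer doubles, which matches the 16 versus 24 figures above.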

And /there was virtually no difference/. I suppose that if 16 bytes have to
be moved from A to B, the x86 is clever enough to do this in the most
efficient way whether it is invoked with N 64-bit operations or 2N 32-bit
ones.

Well, at least it saved me wasting more time on this. Although I will take a
closer look sometime at the actual compiler output, in case the generated
64-bit code isn't optimised as well as it could be.
 
Stephen Sprunk

When a function is generated to follow the calling conventions, then
this register movement is /exactly/ what it does. The called
function will return the result in rax (or rax, rdx if necessary),
and if the caller function needs it in rdi (for another call,
perhaps) it must move it there. When inlining, the compiler can see
that the result should be put directly in rdi and thus skip some
register movements.

If the argument comes directly from the return value of another
function, sure. OTOH, if the caller is computing the argument, it will
arrange for the result to end up in the right place.
On an OOO cpu with register renames, register-to-register movements
are cheap - but they are not free. They can often be handled within
the register rename and history mechanism and thus avoid any cycles
or real data movement, but every instruction means less buffer space,
less L1 cache space, and less bus bandwidth available for other
instructions. So they are cheap - and perhaps insignificant in cost -
but not completely free.

MOV reg, reg is a 2-3 byte instruction (depending on whether REX is
needed), so the I-cache cost is negligible. It's also a "simple"
instruction, so the decode cost is negligible as well. "Free" was an
overstatement, but it's still a big win over passing on the stack.
Yes, once the inlining leads to bigger code, the space cost can mean
a speed cost. The best choice depends on the type of code, the
number of times it is called, and the savings you can get from
optimisations. For small functions, the function call overhead
(including the call itself, register movements, and - for Win64 - the
stack allocation) can be bigger than the inlined code. In such
cases, it is always best to inline, even though the code is
duplicated.

Modern compiler heuristics should take such issues into account.
It's a powerful tool - but if you want the best from your compiler,
you need to use it carefully, examine your code, and do serious
profiling.

Of course, but a human can only examine so much output, often limited to
the tightest of loops. Beyond that, we have to assume that the compiler
is doing the right thing--and that the folks who designed the ABI made
the best choices for typical workloads.

S
 
Stephen Sprunk

If I understand the history, things like FS and GS support in long
mode were pushed for by MS, not the *nix community.

I can't find any historical evidence either way, but glibc uses GS in x86
mode and FS in x86-64 mode for TLS data, so it seems unlikely they would
have thrown that away, only to have it brought back by MS. (IIRC, MS
uses FS for TLS and GS for CPU-specific data or vice versa.)
I think by then everyone had realized that IPF was not going to be
successful on the desktop, except perhaps Intel. On servers there
was still a sliver of hope.

MS and Intel must have both realized that the then-recent success of
Wintel servers was due to leveraging the economy of scale from Wintel
desktops. To use another architecture for only servers would negate
that and cede their most profitable customers to competitors.

MS and Intel have made a lot of bone-headed moves over the years, but
neither of them would be stupid enough to do that. As soon as Itanic's
failure on the desktop became obvious, it was dead for servers too.
OK, if I have my dates straight... Opteron first shipped April 2003,
the non-public (developers only) beta* for XP/64 was September 2003,
the public beta** was February (I think) 2004, and the production
release April 2005.

IIRC that was "XP Pro, x64 Edition"; XP64 was for Itanic. And you
couldn't get it installed on new PCs, even with Athlon64/Opterons, in
large part because there was no Home version.

Windows for x64 didn't hit the shelves until Vista, and Vista was such a
flop that x64 support wasn't mainstream until Windows 7, and that's
about how long it took for the drivers to catch up anyway.
*I don't really know how hard the public beta was to access for the,
*ahem*, public, but for developers getting both the initial beta and
the public beta (and I want to say there were two major builds between
those, but memory fails me) was pretty much trivial (just download
them from your MSDN subscription).

I had to torrent it because I didn't have an MSDN account, and that
requires far more effort/skill than the general public can handle.
**Yes, it sucked, unless you had exactly the right hardware, then it
was a pretty unremarkable version of Windows, except for the 64 bit
thing.

It was unremarkable until you noticed that none of your peripherals,
e.g. your printer, worked with it. Luckily I chose to dual boot, or I
would have been reinstalling XP32 the next day.
There's no doubt that Linux was there with usable systems first. And
to a certain extent MS didn't fully commit to AMD64 until Intel did,
at which point its success was assured, but they certainly threw a
lot of code at it before that point.

They had already ported to three other 32-bit systems* and one other
64-bit system, so most of that was just porting MSVC, tweaking the
kernel a bit and recompiling the OS; they didn't port anything else
(e.g. Office) until much later.

* The Alpha port was 32-bit, which is a story in itself.

S
 
Jorgen Grahn

I don't think they needed to "trick" anyone; if AMD hadn't embraced the
Linux community as it did, folks would have "borrowed" some AMD64 chips
and done the work anyway for the fun (and fame) of it.


Gigabytes of code that had previously been ported to _other_ 64-bit
architectures, meaning they had a fully-functional testbed as soon as
GCC and the kernel were ready.
Exactly.

....

Was SPARC64 really the only one to predate AMD64? The list of ports
also includes Alpha, PA-RISC, z/Arch, Itanic and PPC64.

I mentioned Sparc because (a) that's what I have experience with and
(b) because it seemed to me that Solaris was the most popular Unix in
the 1990s. Software which showed up on Linux had usually spent many
cycles on Sun hardware already ...

/Jorgen
 
glen herrmannsfeldt

I mentioned Sparc because (a) that's what I have experience with and
(b) because it seemed to me that Solaris was the most popular Unix in
the 1990s. Software which showed up on Linux had usually spent many
cycles on Sun hardware already ...

I remember Sparc being popular for web servers in the late 1990's
and early 00's. That was when it wasn't yet so popular for desktop
use, as it had enough bugs to discourage people. Possibly ones you
don't notice with servers.

I remember at the time, when we were running both SunOS 4.x and
Solaris systems, more exploits for Solaris than SunOS. Not because
bugs were easier to find, but because they were a bigger target.

That was also around the time that Solaris-x86 came out, where
the hardware was cheaper. We had both running.

-- glen
 
Phil Carmody

Ian Collins said:
It isn't just Linux. Most (if not all) Unix and UNIX-like operating
systems use the AMD64 calling convention. Omitting the frame pointer
is a common optimisation on those platforms.

Fomitting the frame pointer, with the appropriately ambiguous pronunciation,
please!

Phil
--
What Alice Hill, President at Slashdot Media, writes:
Proven track record innovating and improving iconic websites
(Slashdot.org, ...) while protecting their voice and brand integrity
What Alice Hill means:
2013: Completely fucked up Slashdot, and ignored almost endless
negative feedback about her unwanted changes. 2014: Killed slashdot.
 
BartC

BartC said:
[Compiling to 64-bit instead of 32-bit]
Well, it is pointers that are 100% bigger; my data structure was 24 bytes
instead of 16. So I've done a quick test where it is also 16 bytes in
64-bits (which means there are things missing, so some kinds of input
can't be run, but enough to highlight any differences).

And /there was virtually no difference/.

I've discovered that one of the x64 compilers, TDM gcc, has a switch -mx32
which compiles for 64-bits but keeps pointers at 32-bits.

Sounds ideal! Unfortunately compiling my code (even the fragment shown
below, and which I've submitted as a bug report) generates an internal
compiler error. I seem to be collecting these regularly (3 different
compilers over several weeks!).

Anyway my tests with this 64/32-bit model will have to wait (it will
probably be quicker to superimpose my own 32-bit pointer model, with loads
of casts each way, on the source code).

(This generates an internal compiler error, so obviously something not quite
right:

int *fn(void){
    int *m;
    return m;
}

)
 
Ian Collins

BartC said:
I've discovered that one of the x64 compilers, TDM gcc, has a switch -mx32
which compiles for 64-bits but keeps pointers at 32-bits.

Sounds ideal!

Not if you want to use any system or other third party libraries...
 
