I know. Kinda makes ya gag just to mention it, doesn't it?
well, I actually use MSVC, but using MSVC and knowing ASM can make one
not exactly all that impressed with its ASM output.
That is rather oversimplifying the optimization options available.
it is very minimal, but this is what you get with basic optimization
options with MSVC.
AFAICT, these ones are on by default (without explicit optimization
settings), with GCC.
it is also a little annoying that it can't use optimization and
profile/debug settings at the same time.
however, there are a few merits:
it has full access to the Windows API;
it has Visual Studio;
it can do .NET stuff;
....
my 3D engine is also mostly GPU-bound, so being compiled with debug
settings doesn't really hurt the overall performance too badly.
What do you mean by the quotation marks?
these are actually fairly naive optimizations, compared with what is
possible.
their relative effectiveness implies either:
many commonly used compilers are not very good on this front;
more advanced optimizations tend not to actually buy a whole lot (it
amounts more to a case of diminishing returns).
I was going to give an example from another area, but it turned out to be
awkwardly long: basically, pointing out the cost/benefit tradeoffs which
lead to the present near-dominance of Huffman compression over
Arithmetic Coding and High-Order context modeling (PAQ / PPMd / ...),
despite Huffman not compressing nearly as well.
although, to be fair, many more recent codecs (LZMA, H.264, ...) use AC,
so things may be shifting slightly in its favor (the added compression
outweighing the higher time-cost).
but, yeah, there is often a lot more that could be done, except that the
costs may make it unreasonable or impractical to do so.
How long is a "long ways" and compared to what?
0x40d0cc  1252  obuf[l0+0]=r0; obuf[l0+1]=g0; obuf[l0+2]=b0; obuf[l0+3]=a;  0.68
0x40d0cc mov eax,[ebp-54h] 8B 45 AC 0.07
0x40d0cf add eax,[ebp-10h] 03 45 F0
0x40d0d2 mov cl,[ebp-78h] 8A 4D 88
0x40d0d5 mov [eax],cl 88 08 0.07
0x40d0d7 mov edx,[ebp-54h] 8B 55 AC 0.02
0x40d0da add edx,[ebp-10h] 03 55 F0
0x40d0dd mov al,[ebp-14h] 8A 45 EC 0.04
0x40d0e0 mov [edx+01h],al 88 42 01
0x40d0e3 mov ecx,[ebp-54h] 8B 4D AC 0.01
0x40d0e6 add ecx,[ebp-10h] 03 4D F0 0.08
0x40d0e9 mov dl,[ebp-08h] 8A 55 F8 0.02
0x40d0ec mov [ecx+02h],dl 88 51 02
0x40d0ef mov eax,[ebp-54h] 8B 45 AC 0.11
0x40d0f2 add eax,[ebp-10h] 03 45 F0 0.01
0x40d0f5 mov cl,[ebp-28h] 8A 4D D8 0.16
0x40d0f8 mov [eax+03h],cl 88 48 03 0.09
compiler = MSVC, source language = C.
it can actually get a lot worse than this, but it illustrates the basic
idea (without being too long).
for example, what if the compiler cached intermediate values in registers?
then the output above would more likely look something like:
mov eax,[ebp-54h]
add eax,[ebp-10h]
mov cl,[ebp-78h]
mov [eax],cl
mov dl,[ebp-14h]
mov [eax+01h],dl
mov cl,[ebp-08h]
mov [eax+02h],cl
mov dl,[ebp-28h]
mov [eax+03h],dl
many other (potentially more significant) optimizations are
higher-level, and don't necessarily make much of a difference at the
lower levels.
What do you mean by "higher-level" and "lower-levels [sic]"?
Of which particular optimizations do you speak?
higher-level:
constant folding;
object lifetime analysis;
ability to skip out on certain safety checks;
scope visibility analysis and type-inference (mostly N/A to Java, more
relevant to languages like ECMAScript);
....
lower-level:
register allocation strategies;
peephole optimization;
....
higher-level optimizations can usually be done in advance of generating
the output code, and they don't particularly depend on the type of
output being produced (target architecture, ...).
whereas things like register allocation depend much more on the target
architecture, and are more closely tied to the compiler output being
produced.
HotSpot and other Java JIT compilers have an advantage over static
optimizers such as you describe - they can account for current run-time
conditions.
For example, it might be that only one thread is using a section of
code, so all synchronization operations can be removed for a while.
Or perhaps there are no aliases extant for a given member variable, so
it is safe to enregister the value for a while, even though statically
it would not be safe.
HotSpot also will "unJIT" code - go back to the interpreted bytecode and
drop the machine-code compilation - when circumstances change.
I wasn't focusing solely on static compilers, as a lot of this applies
to JIT compilers as well.
yes, but the question would be how many of these would risk compromising
the ability of the VM to readily switch between the JIT output and bytecode.
very possibly, the JIT would be focusing more on optimizations which
would not hinder its own operation.
an example would be maintaining "sequential consistency": theoretically,
an optimizer could alter the relative order in which operations take
place, or reorganize the control flow within a method, ...
although possible, this would hinder the ability to easily jump into or
out of the JITed output code, so a JIT would likely refrain from doing
so (upholding the behavior that events in the native code take place in
the same relative order as they appear in the bytecode, ...).
very likely, the JIT would also arrange that the overall state is
consistent at points where it may jump into or out of the generated code
(all values properly stored in their respective variables, ...).