gcc 4.8 and SPEC benchmark

Johannes Bauer · Apr 19, 2013

Hi group,

with the 4.8 release, gcc announces that it will break some code whose
correct compilation relied on undefined behavior (UB):
http://gcc.gnu.org/gcc-4.8/changes.html

In particular, SPEC 2006 breaks with that release. An explanation of the
assumptions that gcc makes is given here:
http://blog.regehr.org/archives/918
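
For those who don't want to follow the links: as far as I recall from the linked pages, the offending loop has roughly the shape shown in the comment below. The sketch is an illustration, not the SPEC source; the name satd_fixed and the rewritten loop are my own assumptions about what a well-defined variant could look like.

```c
#define N 16

/*
 * Problematic shape (kept in a comment, because executing it is UB):
 *
 *     for (dd = d[k = 0]; k < N; dd = d[++k])
 *         satd += (dd < 0 ? -dd : dd);
 *
 * After the last useful iteration, the increment expression reads d[16],
 * one element past the end of the array, before k < N is tested again.
 * A compiler may conclude that k never reaches 16 and treat the exit
 * test as always true.
 */

/* Well-defined variant: d[k] is read only after k < N has been checked. */
int satd_fixed(const int d[N])
{
    int satd = 0;
    for (int k = 0; k < N; k++) {
        int dd = d[k];
        satd += (dd < 0 ? -dd : dd);   /* sum of absolute values */
    }
    return satd;
}
```

The only change is the order of the bounds check and the array read, which is why it is easy to overlook in existing code.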

From a language standpoint, gcc is doing perfectly fine: garbage in,
garbage out. My question is: why does a benchmark like SPEC (which is
quite popular) contain code that actually invokes UB? It sounds like
a recipe for disaster. This apparently also affects some real-world
H.264 code (ffmpeg et al). Is there a reason for this? Why?

Best regards,
Johannes

--

At least not publicly!

Ah, the latest and, to this day, most brilliant stunt of our great
cosmologists: the secret prediction.
- Karl Kaos on Rüdiger Thomas in dsa <[email protected]>

Eric Sosman · Apr 19, 2013

[...]
From a language-standpoint, gcc is doing perfectly fine: Garbage in,
garbage out. My question is: why does a benchmark like SPEC (which is
quite popular) consist of code that actually includes UB? [...]

My question is, "Why would bugs in benchmarks surprise anyone?"
Or, "Why would anyone expect that turning code into a benchmark
would rid it of bugs?"

The SPEC benchmarks were derived from real-world programs, those
real-world programs were probably not bug-free, some of those bugs
survived SPECification (and others may have been introduced in the
process), so the SPEC benchmarks have bugs. <Shrug.>

It seems to me your question is just a special case of "Why
does code have bugs?"

Johannes Bauer · Apr 19, 2013

It seems to me your question is just a special case of "Why
does code have bugs?"

Not really -- with normal bugs, you don't "fix" the compiler, you fix
the buggy code. Not in this case:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53265

Basically "fixes" the compiler to not optimize. This blows my mind.

Regards,
Johannes


Eric Sosman · Apr 19, 2013

Not really -- with normal bugs, you don't "fix" the compiler, you fix
the buggy code. Not in this case:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53265

Basically "fixes" the compiler to not optimize. This blows my mind.

Sorry for misunderstanding the thrust of your question. As
to why the GCC folks chose to kludge the compiler to let the buggy
code slide by, it seems more of a "marketing" decision than a
technical one. The new compiler exposed bugs that had (apparently)
gone undetected for a long time, in code that is both popular and
very difficult to change (fixing SPEC might invalidate a huge
pile of already-published measurements, or at least make them
incomparable with new ones). Or, "Hey, everybody: This new GCC
produces faster code than the old one. To see how much faster,
just run the SPEC benchmarks, oh, hey, wait, ..."

Compilers, like other programs, exist to satisfy a set of
needs, both technical and non-technical. Sometimes those needs
are in conflict.

Ken Brody · Apr 19, 2013

Sorry for misunderstanding the thrust of your question. As
to why the GCC folks chose to kludge the compiler to let the buggy
code slide by, it seems more of a "marketing" decision than a
technical one. The new compiler exposed bugs that had (apparently)

[...]

Consider, for example, that when the IBM PC was first introduced, there were bugs
in the BIOS. Clones, for the sake of "100% compatibility", purposely
included those same bugs in their BIOS, lest some program fail to run
because it depended on the buggy behavior.

glen herrmannsfeldt · Apr 19, 2013

Well, at the higher optimization levels it is not unusual for there
to be cases where the optimization shouldn't be done, but the compiler
has a difficult time detecting those cases.

A popular optimization is to move invariant expressions out of loops,
assuming that the expression will be executed a large number of times.

It is not always possible to decide if the move is safe.
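
As a small illustration of why the move is not always safe, here is a sketch in C. The function names are made up for this example. The expression x / y is invariant, but as written it is only evaluated when the loop runs at least once; hoisting it unconditionally would introduce a division by zero for n == 0 and y == 0 that the original code never performs.

```c
/* As written: x / y is loop-invariant, but evaluated only if n > 0. */
long sum_as_written(const int *a, int n, int x, int y)
{
    long total = 0;
    for (int i = 0; i < n; i++)
        total += a[i] + x / y;
    return total;
}

/*
 * Hoisted by hand: the division is computed once, but it stays behind
 * the same n > 0 guard, so it is never evaluated when the loop body
 * would not have run.
 */
long sum_hoisted(const int *a, int n, int x, int y)
{
    long total = 0;
    if (n > 0) {
        int q = x / y;              /* computed once instead of n times */
        for (int i = 0; i < n; i++)
            total += a[i] + q;
    }
    return total;
}
```

Here the guard is easy to see. With pointers, function calls, or possible traps inside the expression, the compiler often cannot prove that such a guard is enough.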

Sorry for misunderstanding the thrust of your question. As
to why the GCC folks chose to kludge the compiler to let the buggy
code slide by, it seems more of a "marketing" decision than a
technical one. The new compiler exposed bugs that had (apparently) [...]

Consider, for example, when the IBM-PC was first introduced, there were bugs
in the BIOS. Clones, for the sake of "100% compatibility", purposely
included those same bugs in their BIOS, lest some program fail to run
because it depended on the buggy behavior.

Q: What is the difference between a bug and a feature?

A: A feature is documented.

Since the commented BIOS assembly code was published (and copyrighted),
it was well documented. Still, some of the features were less useful
than they might have been.

There have been a number of cases where new hardware (or emulation in
hardware or software) had to implement bugs in the original, or failed
due to lack of such.

Some stories I remember had to do with incompatibilities between the
8080, Z80, and NEC V20 in 8080 emulation mode. Some, I believe, had to
do with flag bits that were left undefined in the 8080, where some
programs (especially the early MS BASIC) depended on the undocumented
behavior of the 8080.

In the case of the IBM PC, many programs didn't use the BIOS calls, but
went directly to hardware, mostly for speed reasons. Microsoft Flight
Simulator was the favorite test case for clone hardware. So, not only do
you have to implement the BIOS bugs, but the hardware bugs as well.

On the 8088, with its 20-bit address space, the address
(segment<<4)+offset would wrap when it was too big, but on the 80286
and later, with a 24-bit address bus, it would not wrap. Hardware was
added to 80286 and later machines to zero address bit 20 when running
in the appropriate mode.

Certainly seems to me a bug for software to depend on address wrapping,
but a hardware fix was needed.
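
A rough sketch of the arithmetic in C, for anyone who wants to see the wrap. The function names and the a20_enabled parameter are invented for this example; the masking hardware is commonly called the A20 gate.

```c
#include <stdint.h>

/* 8088: only 20 address lines, so any carry into bit 20 is lost. */
uint32_t linear_8088(uint16_t seg, uint16_t off)
{
    return (((uint32_t)seg << 4) + off) & UINT32_C(0xFFFFF);
}

/*
 * 80286 and later: the carry into bit 20 survives unless the added
 * hardware forces address bit 20 to zero (a20_enabled == 0).
 */
uint32_t linear_286(uint16_t seg, uint16_t off, int a20_enabled)
{
    uint32_t addr = ((uint32_t)seg << 4) + off;
    if (!a20_enabled)
        addr &= ~(UINT32_C(1) << 20);   /* emulate the 8088 wrap */
    return addr;
}
```

For example, FFFF:0010 is address 0 on the 8088, but 0x100000 on a 286 with bit 20 left alone. Since the largest possible sum is 0x10FFEF, clearing bit 20 is enough to reproduce the 20-bit wrap.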

-- glen

Jorgen Grahn · Apr 24, 2013

Well, at the higher optimization levels it is not unusual for there
to be cases where the optimization shouldn't be done, but the compiler
has a difficult time detecting those cases.

A popular optimization is to move invariant expressions out of loops,
assuming that the expression will be executed a large number of times.

It is not always possible to decide if the move is safe.

Simple -- if it's not possible to decide, you don't do the optimization.

But I think you're really talking about optimizations which may not be
optimizations. The code still works, but performance-wise it was a
bad idea. *That* will always be something compilers suffer from.

/Jorgen

Ken Brody · Apr 25, 2013

Consider, for example, when the IBM-PC was first introduced, there were bugs
in the BIOS. Clones, for the sake of "100% compatibility", purposely
included those same bugs in their BIOS, lest some program fail to run
because it depended on the buggy behavior.

Q: What is the difference between a bug and a feature?

A: A feature is documented.

[...]
Some stories I remember had to do with incompatibilities between the
8080, Z80, and NEC V20 in 8080 emulation mode. Some I believe had to
do with flag bits that were left undefined in the 8080, but that some
programs (especially the early MS BASIC) depended on the undocumented
implementation on the 8080.

[...]

Then there's the case where Intel specifically said "reserved for future
use" on hardware interrupt 5. Microsoft decided to use that interrupt for
the "print screen" function.

Lo and behold, the 286 came out, and used that interrupt for a hardware
fault. Now, every MS-DOS system had to put code at that interrupt trap that
would check the instruction just executed, and if it was an "INT 5" opcode,
go to the "print screen" code, otherwise, go to the fault handler.

Given that Microsoft documented their use of INT 5, I guess this qualifies
as a "feature"?
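
The check described above could look something like the following sketch in C. This is not actual MS-DOS or BIOS code; the names are hypothetical, and it assumes that a software interrupt leaves the saved return address just past the two-byte INT instruction (encoded as 0xCD followed by the interrupt number), while a fault leaves it pointing elsewhere.

```c
#include <stddef.h>
#include <stdint.h>

enum int5_action { INT5_PRINT_SCREEN, INT5_FAULT };

/*
 * mem: flat byte view of the interrupted code (an assumption made to
 *      keep the sketch testable without real hardware).
 * ret: offset of the saved return address within mem.
 *
 * If the two bytes before the return address are 0xCD 0x05, treat the
 * entry as a software "INT 5"; otherwise treat it as the CPU fault.
 */
enum int5_action classify_int5(const uint8_t *mem, size_t ret)
{
    if (ret >= 2 && mem[ret - 2] == 0xCD && mem[ret - 1] == 0x05)
        return INT5_PRINT_SCREEN;
    return INT5_FAULT;
}
```

Note that this is only a heuristic: the bytes in front of a faulting instruction could happen to be 0xCD 0x05 as well.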

russell.gallop · Apr 30, 2013

It seems to me your question is just a special case of "Why
does code have bugs?"

I think it's even worse than "benchmarks are code, so they have bugs." All code has bugs, but because benchmark code is preserved unchanged for a long period of time, it lags behind the tools for finding bugs in a way that actively developed code doesn't.

I'd like to pretend I can spot all bugs in code I write but I'm probably more dependent on the tools I use.
