Books for advanced C++ debugging


Anand Hariharan

On Jul 10, 3:05 am, (e-mail address removed) (Pascal J. Bourguignon)
wrote:
[snip]
The problem is that C compiler writers don't bother writing the
run-time checks that would detect these bugs, much less doing the type
inference that would be needed to detecht a small number of them at
compilation-time.

You seem to be taking the opinion that compilers should catch all
undefined behavior.

No, he does not seem to. He did say "detecht [sic] a *SMALL NUMBER*
of them at compilation-time" (my emphasis).

C++ is not Java. C++'s stated primary design goals
include
- runtime performance comparable with assembly
- don't pay for what you don't use
- portable
- easy to write code / programmer productivity (with less relative
emphasis on this one IMHO)

None of those goals gets in the way of the *compiler* pointing out
dubious code. AFAICT, no one wishes C++ prevented all instances of UB.
Code such as -
int *p = reinterpret_cast<int *>(0x123abc);
*p = 0x456def;
- is well outside the bounds of the standard, but there are several
people who would want to be able to write code like that. Except for
being an annoyance, I don't see why anyone would want the compiler
NOT to flag such code as dubious.

With these design goals in mind, it is not reasonable to expect a
compiler to catch all possible undefined behavior or errors. To do
that would necessarily restrict the language so that it's less
comparable to assembly in speed and/or you start paying for things you
don't use.

You seem to be under the impression that the compiler "catching
undefined behaviour" is synonymous with either *disallowing* undefined
behaviour or imposing a runtime penalty to track them. OP clearly
indicated that he only wishes the compiler to indicate to him that he
might be doing something that leads to UB.

In the C and C++ community, the assumption is that the programmer
knows what he's doing, and with that assumption, you can (relatively)
easily write really fast and portable code.

The assumption is made within reason, of course. Otherwise, the
standard would allow for many more implicit conversions than it
currently allows.

- Anand
 

Joshua Maurice

You seem to be under the impression that the compiler "catching
undefined behaviour" is synonymous with either *disallowing* undefined
behaviour or imposing a runtime penalty to track them.  OP clearly
indicated that he only wishes the compiler to indicate to him that he
might be doing something that leads to UB.

Within C++ as the language rules stand, determining at compile-time if
the program can give undefined behavior through an aliasing violation
is in general undecidable, equivalent to the halting problem.
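For example (a minimal sketch of the kind of violation meant here):

float f = 1.0f;
int *ip = reinterpret_cast<int *>(&f);
int bits = *ip; // UB: a float object read through an int lvalue

Whether a dereference like that is ever reached can depend on
arbitrary run-time input, which is where the equivalence to the
halting problem comes from.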

I did not claim such "catching undefined behavior" and "disallowing
certain constructs and/or runtime checks" are not synonymous. However,
they are related.

I believe I was correct and reasonable when I interpreted that the OP
was asking for a compiler which caught all bad aliasing, and I believe
I was correct and reasonable when I stated that doing so is impossible
without disallowing certain kinds of casting or imposing additional
runtime checks (both of which are contrary to the design goals of
C++). I noted that it's quite reasonable and desirable for a "debugging
compiler" to add runtime checks to catch all such aliasing errors in
development. I disagree with the overall theme of your reply: that I
was incorrect in my statement of fact or that I was incorrect in my
interpretation of the OP's desire to have all aliasing violations
caught.
 

Joshua Maurice

I did not claim such "catching undefined behavior" and "disallowing
certain constructs and/or runtime checks" are not synonymous. However,
they are related.

I meant: "I did not claim they are synonymous". That's what I get for
typing late at night.
 

jacob navia

Joshua said:
jacob navia said:
"Your code has aliasing problems. This is not the place to
educate you about C/C++"
[snip]

Since we do NOT rewrite all our software every time the C++ standard
changes, how can we find this kind of bug?

I'm sorry that you're working with code which violates the C89
standard and the C++ standard. Someone has to fix it, and that someone
appears to be you. I don't have much to add beyond that which has been
mentioned already in this thread as to strategies to accomplish this.
Either way, expecting help from the gcc developers via a false bug
report is unreasonable. Maybe in a feature request ... :)

Look.

Using the gcc compiler without any optimizations produces perfectly
valid code that works as intended. Using the 64 bit
gcc compiler (versions 3.3 to 4.3) produces the intended
result even with maximum optimization.

Using the PowerPC IBM compiler works with optimizations
and without them.

This code was working with gcc-3.3.6 and stopped working only with
gcc 4.1.2 with optimization levels higher than 2, and only on 32-bit
linux. The 32-bit windows MSVC compiler compiles that code correctly.

How are the maintainers supposed to know that?

Because after two weeks of work we finally examined the gcc-generated
assembler and discovered that gcc generates code that reads from an
UNINITIALIZED memory location.

When I write

char tab[5];
char *p = tab;

p += 10;
char c = *p;

this is UB too, but will be UB in debug mode AND in release mode.
The value in c will be undefined, but it will be CONSISTENT.

You are just saying the obvious:

C++ is not maintainable without huge efforts.

It is very easy to laugh at the maintenance programmers here. They
are just stupid of course, since if they weren't, they wouldn't be
in maintenance of course!
> Either way, expecting help from the gcc developers via a false bug
> report is unreasonable. Maybe in a feature request ... :)
>

yes, ":)"

VERY funny.


gcc (and this is a feature of course, not a bug) generates code that
is impossible to follow with -O2 or -O3. Then, the gcc compiler
considers that it has the right to generate code that reads from
an uninitialized memory location without even caring to (at least)
emit a warning.
 

Vaclav Haisman

jacob navia wrote, On 11.7.2009 14:27:
Joshua said:
jacob navia said:
"Your code has aliasing problems. This is not the place to
educate you about C/C++"
[snip]

Since we do NOT rewrite all our software every time the C++ standard
changes, how can we find this kind of bug?

I'm sorry that you're working with code which violates the C89
standard and the C++ standard. Someone has to fix it, and that someone
appears to be you. I don't have much to add beyond that which has been
mentioned already in this thread as to strategies to accomplish this.
Either way, expecting help from the gcc developers via a false bug
report is unreasonable. Maybe in a feature request ... :)

Look.

Using the gcc compiler without any optimizations produces perfectly
valid code that works as intended. Using the 64 bit
gcc compiler (versions 3.3 to 4.3) produces the intended
result even with maximum optimization.
I think you misunderstand what UB means. There is no such thing as
"perfectly valid code" when you are invoking UB. Not from the POV of
the standard.
Using the PowerPC IBM compiler works with optimizations
and without them.

This code was working with gcc-3.3.6 and stopped working only with
gcc 4.1.2 with optimization levels higher than 2, and only on 32-bit
linux. The 32-bit windows MSVC compiler compiles that code correctly.

How are the maintainers supposed to know that?
Maintainers are supposed to know the language.

struct { void* a; void* b; } x;
int64_t y = *(int64_t*)&x;

is a glaring bug screaming UB.
Because after two weeks of work we finally examined the gcc-generated
assembler and discovered that gcc generates code that reads from an
UNINITIALIZED memory location.

When I write

char tab[5];
char *p = tab;

p += 10;
char c = *p;

this is UB too, but will be UB in debug mode AND in release mode.
The value in c will be undefined, but it will be CONSISTENT.
That is not true. It might be the case for some combinations of OS and
compiler but it is not universal.
 

Ian Collins

jacob said:
Using the gcc compiler without any optimizations produces perfectly
valid code that works as intended. Using the 64 bit
gcc compiler (versions 3.3 to 4.3) produces the intended
result even with maximum optimization.

By chance, if the construct invokes UB.
Using the PowerPC IBM compiler works with optimizations
and without them.

By chance, if the construct invokes UB.
This code was working with gcc-3.3.6 and stopped working only with
gcc 4.1.2 with optimization levels higher than 2, and only on 32-bit
linux. The 32-bit windows MSVC compiler compiles that code correctly.

That's what happens if the construct invokes UB. Any tool change can
break the fragile code.
How are the maintainers supposed to know that?

Should they care?
Because after two weeks of work we finally examined the gcc-generated
assembler and discovered that gcc generates code that reads from an
UNINITIALIZED memory location.

Post the source and the generated assembler.
When I write

char tab[5];
char *p = tab;

p += 10;
char c = *p;

this is UB too, but will be UB in debug mode AND in release mode.
The value in c will be undefined, but it will be CONSISTENT.

No, it won't. It's undefined. p might point at a location that was
written by the last program to run. Even if the value did appear
consistent, as soon as the surrounding code changes, it is likely to change.
You are just saying the obvious:

C++ is not maintainable without huge efforts.

Poorly written code in any language that relies on undefined behaviour
is not maintainable. C and C++ just happen to give you more rope to
hang yourself. At least C++ has attempted to shorten the rope by
adding specific and easily searchable casts.
It is very easy to laugh at the maintenance programmers here. They
are just stupid of course, since if they weren't, they wouldn't be
in maintenance of course!

Boy, you have a flea up your arse this weekend. Most people here are
probably maintenance programmers. Anyone not working on a greenfield
project can be considered a maintenance programmer.
 

Stuart Redmann

[snipped discussion about run-time error when OP accessed
uninitialized memory. OP complained that C++ compiler (gcc) could not
detect this even though its warning level was set to highest]
You seem to be taking the opinion that compilers should catch all
undefined behavior. C++ is not Java. C++'s stated primary design goals
include
- runtime performance comparable with assembly
- don't pay for what you don't use
- portable
- easy to write code / programmer productivity (with less relative
emphasis on this one IMHO)

With these design goals in mind, it is not reasonable to expect a
compiler to catch all possible undefined behavior or errors. To do
that would necessarily restrict the language so that it's less
comparable to assembly in speed and/or you start paying for things you
don't use.

In the C and C++ community, the assumption is that the programmer
knows what he's doing, and with that assumption, you can (relatively)
easily write really fast and portable code.

Just to add my two cents:
1. C++ lets you do everything, so chances are not bad that you can go
beyond your depth. In contrast to this, JAVA restricts your abilities
(no messing around with pointers), which makes your code inherently
safer. I think both are inferior to programming languages like Ada95.
Ada has a real type system (something that neither C++ nor JAVA has)
and will perform zounds of checks (it is the only language I know that
handles integer overflows). Since these checks give you a lot of
performance penalties, you have to provide additional information
about which checks can be omitted. This is maybe the major difference
between C++ and Ada95: Out of the box C++ provides few checks in favor
of speed, whereas Ada95 has all checks turned on. So in C++ you have
to OPT IN to run-time checks, whereas Ada95 has the converse OPT-OUT
philosophy (see the sketch after point 2 below). Needless to say,
nobody uses Ada95 except the Bundeswehr in Germany (AFAIK).

2. Maybe even such fancy languages like Ada cannot reliably detect
memory aliasing issues because it may be the case that this task is
Turing hard. I haven't had time to think about it in detail, but I
think that you could reduce the HALTING problem to the problem of
accessing uninitialized memory through aliasing. This would explain
why the compiler industry didn't come up with a "decent" compiler: It
just may be that detecting _ALL_ such errors is simply impossible
(which doesn't mean that there may be a good heuristic algorithm for
detecting most of the obvious bugs).
I further assume that most cases where you get UB are also due to the
impossibility of checking for such cases algorithmically.
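To illustrate the opt-in/opt-out difference from point 1 (a minimal
sketch; in C++ the checked access must be requested per call):

#include <vector>

int main()
{
    std::vector<int> v(5);
    // v[10];          // unchecked access: undefined behavior
    return v.at(10);   // opt-in bounds check: throws std::out_of_range
}

An Ada array index, by contrast, is checked unless the check is
explicitly suppressed.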

@jacob:

Don't complain about the gcc team; the problem is definitely in your
code. Since you mess around with raw pointers, you're asking for
trouble (or rather, the guy who wrote the code was).
Cheer up, you have one of the worst jobs in the world of programming:
inheriting code from your predecessor (some people say that this is
what object orientation is all about ;-), and having to find the bugs
in this code. Practically no one will give you credit for this; you're
more or less just a scapegoat. Personally, I have done little else
than re-write code written by physicists (who should be prohibited by
law from writing code :) for the last ten years. I can imagine that
bugfixing such code must be a lot more frustrating, so be assured that
you have our deepest sympathy.

Regards,
Stuart
 

James Kanze

On 11 Jul., 00:54, Joshua Maurice <[email protected]> wrote:
[snipped discussion about run-time error when OP accessed
uninitialized memory. OP complained that C++ compiler (gcc)
could not detect this even though its warning level was set to
highest]
You seem to be taking the opinion that compilers should
catch all undefined behavior. C++ is not Java. C++'s stated
primary design goals include
- runtime performance comparable with assembly
- don't pay for what you don't use
- portable
- easy to write code / programmer productivity (with less relative
emphasis on this one IMHO)
With these design goals in mind, it is not reasonable to
expect a compiler to catch all possible undefined behavior
or errors. To do that would necessarily restrict the
language so that it's less comparable to assembly in speed
and/or you start paying for things you don't use.

That's not strictly true. Both the C and the C++ standards were
designed so that all undefined behavior can be caught.
Sometimes at a significant price, which means that very few
compilers do so. But there have been some (CenterLine, I
think), and of course, tools like Purify and valgrind catch a
lot (but not all) of the undefined behavior (without rendering
the implementation non-conforming).
Just to add my two cents:
1. C++ lets you do everything, so chances are not bad that you
can go beyond your depth. In contrast to this, JAVA restricts
your abilities (no messing around with pointers), which makes
your code inherently safer.

That's provably false. Java seriously restricts what you can
do, to the point of not allowing you to write safe code (for a
sufficiently high level of "safe"). Basically, C++
doesn't do anything by default to provide safety, but allows you
(or your organization) to take whatever steps are needed for the
level of safety you need. Java imposes a very specific level of
safety. If it's adequate, fine---you don't have to do anything
else. If it's not, you're stuck, because there's nothing else
you can do. (The specific level Java imposes is NOT adequate
for most of what I do.)
I think both are inferior to programming languages like Ada95.

From what I've heard of it, you're probably right. But I've
never had the occasion to really use it, to be sure.
Ada has a real type system (something that neither C++ nor
JAVA has) and will perform zounds of checks (it is the only
language I know that handles integer overflows).

Again, C++ leaves behavior in case of overflow of signed
integral types or floating point types "undefined behavior". So
an implementation can perform all of the checks it wants. The
problem is that most implementations define the behavior much
like Java does, which is useless (at least for "safe" software).
And the real problem is that most programmers accept such
implementations, and consider them normal---that most
programmers don't care about safety. (I've written C code in
the past which verified integral overflow, and I could do it in
Java or C++. But such code will never be as efficient as if the
compiler did it.)
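(For illustration, a hand-written sketch of such a check; a compiler
emitting it could use the hardware overflow flag and do far better:

#include <climits>
#include <stdexcept>

int checked_add(int a, int b)
{
    // Reject the overflow before performing the (otherwise undefined)
    // signed addition.
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        throw std::overflow_error("signed integer overflow");
    return a + b;
}
)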
Since these checks give you a lot of performance penalties,

Are you sure of that? I seem to recall reading that in typical
programs, a decent compiler is able to eliminate 90% of the
checks entirely. And if the compiler is generating the code,
it's one extra instruction per operation for the checks which
cannot be eliminated (at least on the machines I'm familiar
with). Not a killer for most applications.
you have to provide additional information about which checks
can be omitted. This is maybe the major difference between C++
and Ada95: Out of the box C++ provides few checks in favor of
speed, whereas Ada95 has all checks turned on. So in C++ you
have to OPT IN to run-time checks, whereas Ada95 has the
converse OPT-OUT philosophy. Needless to say, nobody uses Ada95
except the Bundeswehr in Germany (AFAIK).

Most C++ compilers don't allow you to opt-in, even though it's
the only reasonable option for most software.
2. Maybe even such fancy languages like Ada cannot reliably
detect memory aliasing issues because it may be the case that
this task is Turing hard.

I'm not sure which aliasing issues you're concerned about, but a
lot of languages I've seen used in the past don't allow you to
take the address of a variable (so pointers can only come from
dynamic allocation), use garbage collection (so a pointer can
never point to a non-allocated object---or worse, memory that
has since been allocated to a different object), and don't
support pointer arithmetic, so pointers can't point into the
middle of objects. Under such conditions, aliasing isn't
a difficult problem.
I haven't had time to think about it in detail, but I think
that you could reduce the HALTING problem to the problem of
accessing uninitialized memory through aliasing. This would
explain why the compiler industry didn't come up with a
"decent" compiler: It just may be that detecting _ALL_ such
errors is simply impossible (which doesn't mean that there may
be a good heuristic algorithm for detecting most of the
obvious bugs). I further assume that most cases where you get
UB are also due to the impossibility of checking for such cases
algorithmically.

Compile time or runtime. The C++ standard certainly allows
"fat" pointers, which contain enough information for the runtime
to be able to detect all undefined behavior. Such an
implementation would run slower; an even greater problem is that
it wouldn't be compatible with the defined ABI of most
platforms.
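A sketch of what such a fat pointer might carry (the layout is
hypothetical):

// Hypothetical fat pointer: base and limit travel with the pointer
// value, so every dereference can be checked against the bounds.
struct FatPtr {
    char *base;   // start of the pointed-to object
    char *limit;  // one past its end
    char *cur;    // the actual pointer value
};

Every dereference then becomes "check base <= cur && cur < limit,
then load", which is exactly the extra cost and the ABI break
mentioned above.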
 

Zachary Turner

Sure sure. How helpful. This is a HUGE code base of MB and MB of
C++. I did not write this code. It is my job to make it work, that's
all.

Obviously I am being blamed for asking a question, since asking
questions is obviously a NO-NO here.

(If you ask a question it means you do not know everything,
contrary to the gurus here)

"Don't hack"

And how can I know if in those MBs of code there is a hack?

That was my question. Now, please answer THAT, and if you can't
I hope you can at least keep your mouth SHUT!

The job of the C++ compiler is simply to compile your code. Yes, it
could in theory do all the things you want it to do, because sure,
there have been techniques invented that do such things. But might I
suggest your company invest in a static code analysis tool? While a
C++ *compiler's* job is to compile your code according to the
standard, a static code analysis tool's purpose is exactly what you
seem to be looking for. So there's really no point in GCC attempting
to add these kinds of features; they take programmer time away from
actually making the compiler more robust, stronger, producing faster
code, and conforming to the evolving standard. The static code
analysis writers, on the other hand, have all the time in the world to
do exactly what you're looking for. There are a number of really good
ones available, possibly even some free ones. I would have a look on
Google for some if I were you.
 

Zachary Turner

gcc (and this is a feature of course, not a bug) generates code that
is impossible to follow with -O2 or -O3. Then, the gcc compiler
considers that it has the right to generate code that reads from
an uninitialized memory location without even caring to (at least)
emit a warning.


Warnings are emitted at compile time. Reads from uninitialized memory
happen at run time. Doing a complete static data flow analysis of
your program to detect this is not an easy problem in the general
case. Use a combination of static & dynamic code analysis tools.
Honestly, you could have detected the exact location of the error in
about 5 minutes using Valgrind. Although you then would have been
scratching your head, wondering why the heck that was uninitialized in
the first place. Then a static analysis tool would have answered that
for you in about 5 minutes.
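Something like this (assuming Valgrind's default memcheck tool;
--track-origins has been available since Valgrind 3.4):

valgrind --track-origins=yes ./yourprog

Each use of an uninitialised value is reported with a stack trace,
and --track-origins=yes adds where the uninitialised memory came
from.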

Make it part of your build process to fix all the code analysis
warnings in your codebase once a week from now on, much like you do
for GCC warnings. GCC's a compiler; software development isn't a
one-tool job. You need debuggers, profilers, static analysis, dynamic
analysis, source code control, etc. I realize you're frustrated after
spending 2 weeks fixing this bug which you think is a stupid bug that
should never have happened in the first place. But hey, you learned an
important lesson. Don't let it happen in the first place next time.
Use the right tool for the job.
 

Pascal J. Bourguignon

Anand Hariharan said:
On Jul 10, 5:05 am, (e-mail address removed) (Pascal J. Bourguignon)
wrote:
(...)
Notice that of the same sort of bug that should be checked at run-time
are the array overflows and invalid pointer dereferences.  The C and
C++ standards explicitly say that dereferencing a pointer outside of
its pointed-to array is undefined; even holding a pointer outside of
its array limits (plus 1) is undefined...

Trying to read the value of an uninitialised variable results in UB as
well.

char a[5];
char* p=a; // valid
p+=4; // valid
*p;   // valid
p++;  // valid
*p;   // undefined
p++;  // undefined

The first *p that you state as valid results in undefined behaviour
because 'a' is not initialised.

Oops! Make it: char a[5]="abcd";
 

Pascal J. Bourguignon

Joshua Maurice said:
jacob navia said:
I thought there could be a book on *advanced* C++ debugging, but a
Google search, then an Amazon search, yielded nothing but books for
beginners or user manuals of the Visual C++ debugger written in book
form.
Is there a combination of gcc warnings (that is NOT included in -Wall,
since we already have that) that could be useful here?

I wouldn't hold my breath.
Is there a tool somewhere that could diagnose this problem?

It's done by the Zeta-C compiler (since the target is the
LispMachine).  Of course, today it might be easier to build a time
machine than to find a LispMachine with the Zeta-C compiler, and
anyway, it doesn't solve the problem of C++.

Perhaps one of the C/C++ interpreters does this type of check.  Try
them.

C INTERPRETERS:
    CINT - http://root.cern.ch/root/Cint.html
    EiC - http://eic.sourceforge.net/
    Ch - http://www.softintegration.com
    [ MPC (Multi-Platform C -> Java compiler) - http://www.axiomsol.com ]

Otherwise, your best chance would be to patch them, or gcc (or
lcc-win32), to generate tagged data and implement run-time type
checks.

Notice that of the same sort of bug that should be checked at run-time
are the array overflows and invalid pointer dereferences.  The C and
C++ standards explicitly say that dereferencing a pointer outside of
its pointed-to array is undefined; even holding a pointer outside of
its array limits (plus 1) is undefined...
[snip]

The problem is that C compiler writers don't bother writing the
run-time checks that would detect these bugs, much less doing the type
inference that would be needed to detecht a small number of them at
compilation-time.

You seem to be taking the opinion that compilers should catch all
undefined behavior.

Not necessarily ALL the implementations (compilers or interpreters),
but there should be such implementations, and those should be the
implementations used most of the time, because most of the time, C++
programs are mere application programs that would benefit much more
from run-time checking than from fast instructions (the more so on
modern processors, where it's pointless to go fast in the processor,
since you are always waiting on the RAM).

C++ is not Java. C++'s stated primary design goals
include
- runtime performance comparable with assembly

For most programs, we don't care about the speed.

- don't pay for what you don't use

I wish you'd paid for the uncaught bugs left in executables that
affect the users.

- easy to write code / programmer productivity (with less relative
emphasis on this one IMHO)

Programmers would be more productive if the implementations helped to
catch bugs at run-time.


With these design goals in mind, it is not reasonable to expect a
compiler to catch all possible undefined behavior or errors.

Implementations of other programming languages are able to do so, why
not implementations of C++? It's perfectly reasonable to expect it,
and as a user of C++, I'd rather use such an implementation for 100%
of my C++ development, and 99% of my C++ program deployment.
To do
that would necessarily restrict the language so that it's less
comparable to assembly in speed and/or you start paying for things you
don't use.

Not at all, the restrictions are already in the language. (Well,
s/undefined behavior/an error shall be signaled at compilation time
or thrown at run-time/.)

In the C and C++ community, the assumption is that the programmer
knows what he's doing, and with that assumption, you can (relatively)
easily write really fast and portable code.

But nobody needs really fast code. What we need is correct code, and
code that detects automatically when it goes awry, instead of going on
with invalid data in memory, or worse, viruses and worms.


That someone hasn't written a "debugging" compiler which catches all
possible violations of the standard, as a debugging tool only, is
indeed a shame if true.

Ah! You're conceding my point. Thank you.

However, Valgrind comes to mind as a useful tool
in this area.

But it's far from what we could expect.

Also, various versions of MSVC do have optional runtime
bounds checking and other runtime checking.

Good!

Unfortunately, on unix I know of no compiler implementing run-time
checks (only interpreters do; unfortunately, C++ interpreters have too
many restrictions on the language implemented, so they're generally
useless).

Finally, C interpreters can catch all such misuse which occurs at
runtime, the existence of which you reference in your post. Thus, it
appears the tools whose absence you bemoan do indeed exist, and thus
I am confused by your self-contradictions.

AFAIK, there's no production-level implementation of C++ on unix
(Linux) providing run-time checks for undefined behavior.

The interpreters that indeed provide run-time checks don't implement
the full C++ language, so they're not usable on real programs.
(eg. underC, http://home.mweb.co.za/sd/sdonovan/underc.html doesn't
implement multiple inheritance).




Basically, what we'd like is an option of gcc/g++ (independent of the
optimization level) which would let you deploy programs with full
run-time checks. No buffer overflow would go undetected in an
executable compiled with that option.
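Something along these lines, say (a sketch; these exact flags are an
assumption about a much later g++ than the versions discussed in this
thread):

g++ -g -O2 -fsanitize=address,undefined -fno-omit-frame-pointer prog.cpp

With that instrumentation, an out-of-bounds access aborts with a
report at the faulting line instead of going undetected.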
 

Pascal J. Bourguignon

Joshua Maurice said:
Within C++ as the language rules stand, determining at compile-time if
the program can give undefined behavior through an aliasing violation
is in general undecidable, equivalent to the halting problem.

This is the reason why it has to be done at run-time, when it occurs.

I did not claim such "catching undefined behavior" and "disallowing
certain constructs and/or runtime checks" are not synonymous. However,
they are related.

I believe I was correct and reasonable when I interpreted that the OP
was asking for a compiler which caught all bad aliasing, and I believe
I was correct and reasonable when I stated that doing so is impossible
without disallowing certain kinds of casting or imposing additional
runtime checks (both of which are contrary to the design goals of
C++).

Notice that the design goals of Common Lisp are the same. However,
most Common Lisp implementations implement run-time checks most of the
time. (It is possible to disable most of the run-time checks in speed
critical parts.)
 

Pascal J. Bourguignon

jacob navia said:
[...]
How are the maintainers supposed to know that?

By knowing the language, indeed. Some reading between the lines has
to be done, but still, it's well known that these constructs have no
standard defined behavior.
[...]

You are just saying the obvious:

C++ is not maintainable without huge efforts.

It is very easy to laugh at the maintenance programmers here. They
are just stupid of course, since if they weren't, they wouldn't be
in maintenance of course!

We may also laugh at the managers who chose to develop the software
in C++ in the first place, when better programming languages existed,
exist, and will exist.

gcc (and this is a feature of course, not a bug) generates code that
is impossible to follow with -O2 or -O3.

Yes, but it's FAST! :)

Then, the gcc compiler considers that it has the right to generate
code that reads from an uninitialized memory location without even
caring to (at least) emit a warning.

Yes, the C++ standard explicitly allows it to do so.
Bad standard, change standard.


That said, I don't know many languages whose standard doesn't give
a sizeable amount of leeway to the implementations. Even Common Lisp
leaves a lot of freedom to the implementations, so you have a lot of
constructs that are implementation-dependent.


When you want to write portable code, you have to be careful not to
use implementation-dependent (including option-dependent) constructs.
Yours was one.
 

Pascal J. Bourguignon

James Kanze said:
Are you sure of that? I seem to recall reading that in typical
programs, a decent compiler is able to eliminate 90% of the
checks entirely. And if the compiler is generating the code,
it's one extra instruction per operation for the checks which
cannot be eliminated (at least on the machines I'm familiar
with). Not a killer for most applications.

Indeed. Modern processors (eg. as old as the 680x0) provide software
traps to catch overflow/underflow that used to cost very little, and
that cost nothing with pipelined processors when the trap is not
taken.

I'm not sure which aliasing issues you're concerned about, but a
lot of languages I've seen used in the past don't allow you to
take the address of a variable (so pointers can only come from
dynamic allocation), use garbage collection (so a pointer can
never point to a non-allocated object---or worse, memory that
has since been allocated to a different object), and don't
support pointer arithmetic, so pointers can't point into the
middle of objects. Under such conditions, aliasing isn't
a difficult problem.


Compile time or runtime. The C++ standard certainly allows
"fat" pointers, which contain enough information for the runtime
to be able to detect all undefined behavior. Such an
implementation would run slower; an even greater problem is that
it wouldn't be compatible with the defined ABI of most
platforms.

Well, you would have to recompile the libraries, but since most if
not all libraries are written in C or C++, there would be no real
difficulty. (Common Lisp does not have the same luck here.)
 

James Kanze

Joshua Maurice said:
On Jul 10, 3:05 am, (e-mail address removed) (Pascal J. Bourguignon)
wrote:
[snip]
The problem is that C compiler writers don't bother
writing the run-time checks that would detect these bugs,
much less doing the type inference that would be needed to
detecht a small number of them at compilation-time.
You seem to be taking the opinion that compilers should
catch all undefined behavior.
Not necessarily ALL the implementations (compilers or
interpreters), but there should be such implementations, and
those should be the implementations used most of the time,
because most of the time, C++ programs are mere application
programs that would benefit much more from run-time checking
than from fast instructions (the more so on modern processors,
where it's pointless to go fast in the processor, since you
are always waiting on the RAM).

I think that there are some implementations. At least in the
past, CenterLine caught most cases of undefined behavior. I
don't know what its current status is, but it is still being
sold. (http://www.ics.com/products/centerline/objectcenter/,
for more information.)

I agree with you that such a compiler should be the default and
usually used compiler. I have the impression, however, that we
are in a very small minority---at any rate, I don't have the
impression that CenterLine is a market leader. (ICS, which owns
it, seems to push its GUI expertise and products considerably
more.)

[...]
Implementations of other programming languages are able to do
so, why not implementations of C++? It's perfectly reasonable
to expect it, and as a user of C++, I'd rather use such an
implementation for 100% of my C++ development, and 99% of my
C++ program deployment.

Implementations of C++ are capable of doing a lot more than they
do. Apparently, the market doesn't want it. (Should we
conclude that C++ programmers don't care about quality, or
programmer productivity?)

[...]

But only in the standard library, I think.
Unfortunately, on unix I know of no compiler implementing
run-time checks (only interpreters do; unfortunately, C++
interpreters have too many restrictions on the language
implemented, so they're generally useless).

My impression is that g++ and VC++ are about equal with regards
to verifications. (VC++ does emit a lot of warnings about using
functions which don't, or can't verify, e.g. strcpy and such.)

[...]
Basically, what we'd like is an option of gcc/g++ (independent
of the optimization level) which would let you deploy programs
with full run-time checks. No buffer overflow would go
undetected in an executable compiled with that option.

Arrays in C are very poorly designed, and C++ has inherited this.
In order to do full run-time checking, you need fat pointers,
which not only slow the code down considerably, but also break
the ABI. If you rigorously avoid C style arrays, and only use
std::vector, g++ does run-time checking. (But as soon as you do
something like &v, all bets are off with regards to the
resulting pointer.)
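Presumably via the libstdc++ debug mode; a sketch, assuming a g++
whose library supports it:

// compile with: g++ -D_GLIBCXX_DEBUG prog.cpp
#include <vector>

int main()
{
    std::vector<int> v(5);
    return v[10]; // aborts with a bounds diagnostic under _GLIBCXX_DEBUG
}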
 

Michael Oswald

Stuart said:
Just to add my two cents:
1. C++ lets you do everything, so chances are not bad that you can go
beyond your depth. In contrast to this, JAVA restricts your abilities
(no messing around with pointers), which makes your code inherently
safer. I think both are inferior to programming languages like Ada95.
Ada has a real type system (something that neither C++ nor JAVA has)
and will perform zounds of checks (it is the only language I know that
handles integer overflows). Since these checks give you a lot of
performance penalties, you have to provide additional information
about which checks can be omitted.

Well, I am no expert on Ada, but I had a look at Ada 2005 when
searching for other languages to learn and wrote only some simple
programs. I finally changed to Haskell and OCaml, just to learn some
new principles of programming.

Anyway, the Ada people claim that a lot of these checks can be
optimised out by the compiler and the remaining ones are rather
inexpensive.
This is maybe the major difference
between C++ and Ada95: Out of the box C++ provides few checks in favor
of speed, whereas Ada95 has all checks turned on. So in C++ you have
to OPT IN to run-time checks, whereas Ada95 has the converse OPT-OUT
philosophy. Needless to say, nobody uses Ada95 except the Bundeswehr
in Germany (AFAIK).

Even that is not quite true. Have a look at:
http://www.seas.gwu.edu/~mfeldman/ada-project-summary.html

Also, comp.lang.ada is quite active, and there is even a new language
for the dotnet framework called A#, which is an Ada derivative (like
F# is an ML derivative).
2. Maybe even such fancy languages like Ada cannot reliably detect
memory aliasing issues because it may be the case that this task is
Turing hard.

OK, my information here is very, very imprecise, because I just
skimmed over those chapters, but in Ada 2005 there is some construct
where you have to declare e.g. a pointer to Integer with the keyword
ALIASED when it should have the possibility of being set to already
allocated memory, which allows the compiler to detect such things.

I am absolutely not sure how safe this is or what the compiler
allows/disallows here; anyone more familiar with Ada could probably
explain.

Cheer up, you have one of the worst jobs in the world of programming:
inheriting code from your predecessor (some people say that this is
what object orientation is all about ;-), and having to find the bugs
in this code. Practically no one will give you credit for this; you're
more or less just a scapegoat.

I second that. I did a lot of maintenance/enhancements to existing C++
systems, which sometimes leads you to ludicrous laughs and sometimes to
deep depression :)

Personally, I have done little else
than re-write code written by physicists (who should be prohibited by
law from writing code :) for the last ten years.

Quite similar here: develop a system in C++, give it out to about 20
companies to develop/extend/evolve, where it uses some very old
libraries/methods which even prevent you from using e.g. valgrind or
even gdb in some cases, feed all of this into the main line, and then
give it to poor developers to go on a bug-hunt :)
Not to mention that the main reason why it is used is more of a
political issue...

One of our favourite discussions between the developers in my
department is about bashing this system...



Regards,
Michael
 

Nick Keighley

[snipped discussion about run-time error when OP accessed
uninitialized memory. OP complained that C++ compiler (gcc)
could not detect this even though its warning level was set to
highest]
You seem to be taking the opinion that compilers should
catch all undefined behavior. C++ is not Java. C++'s stated
primary design goals include
- runtime performance comparable with assembly
- don't pay for what you don't use
- portable
- easy to write code / programmer productivity (with less relative
emphasis on this one IMHO)
With these design goals in mind, it is not reasonable to
expect a compiler to catch all possible undefined behavior
or errors. To do that would necessarily restrict the
language so that it's less comparable to assembly in speed
and/or you start paying for things you don't use.

That's not strictly true.  Both the C and the C++ standards were
designed so that all undefined behavior can be caught.

really? Where does it say that? Do you mean at compile time or at
run-time?

I'd always thought about half of UB was in the spec precisely because
it was too hard to detect. The other half was hardware stuff: things
like what the modulo operator does with negative numbers, or gets().

Sometimes at a significant price, which means that very few
compilers do so.  But there have been some (CenterLine, I
think), and of course, tools like Purify and valgrind catch a
lot (but not all) of the undefined behavior (without rendering
the implementation non-conform).

ITYM detecting the access of uninitialized memory through aliasing
at compile time is equivalent to the Halting Problem.
Compile time or runtime.  The C++ standard certainly allows
"fat" pointers, which contain enough information for the runtime
to be able to detect all undefined behavior.  Such an
implementation would run slower; an even greater problem is that
it wouldn't be compatible with the defined ABI of most
platforms.

I can't quite work out how to break a fat-pointer implementation,
but can't you do some very nasty things with printf("%p") and
scanf("%p")?
 

Jerry Coffin

[ ... ]
[...]

But only in the standard library, I think.

Not so -- recent versions have flags to tell it to include runtime
checks in your code. A short description is available at:

http://msdn.microsoft.com/en-us/library/8wtf2dfz(VS.80).aspx
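For instance (assuming a reasonably recent cl.exe; /RTC1 combines the
stack-frame and uninitialized-variable checks):

cl /RTC1 /Zi prog.cpp

The /RTC switches apply only to unoptimized builds, which fits the
"debugging compiler" idea discussed earlier in the thread.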

[ ... ]
Arrays in C are very poorly designed, and C++ has inherited this.
In order to do full run-time checking, you need fat pointers,
which not only slow the code down considerably, but also break
the ABI. If you rigorously avoid C style arrays, and only use
std::vector, g++ does run-time checking. (But as soon as you do
something like &v, all bets are off with regards to the
resulting pointer.)


Interestingly, the run-time checks provided by MS VC++ have almost
exactly the same limitation in one respect -- they can track (to a
degree) whether you use uninitialized variables, but taking the
address is treated as equivalent to initialization.
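A sketch of that limitation:

int x;        // uninitialized
int *p = &x;  // taking the address: the tracker now treats x as set
int y = x;    // so this uninitialized read goes undetected

The checker has no cheap way of knowing whether anything was written
through *p before the read.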
 
