> I built an open-source package that comprises hundreds of source files
> in C, C++, and F90, among others. It has a few bugs suggesting memory
> corruption, such as SIGSEGV or malloc and free aborts.
> So I tried valgrind --tool=memcheck.
> It throws up hundreds of warnings about
> "Conditional jump or move depends on uninitialised value(s)".
> Most of these came from C code, a few from Intel libraries such
> as _intel_fast_memcmp or __intel_sse2_strlen.
They might have occurred inside those functions, but the real problem is
the arguments that were passed to those functions, directly or
indirectly. You should use the options that allow valgrind to transfer
control to a debugger, and then trace your way up the call stack until
you're inside your own code, so you can find out what it did to trigger
that message.
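
As a rough illustration of how a defect in your own code can surface
inside a library routine, here is a minimal C sketch. The names
copy_prefix and prefix_length are hypothetical, invented for this
example. When terminate is 0, the byte after the copied data is never
written, so strlen() reads uninitialized memory and Memcheck would
typically report the problem inside strlen(), even though the defect
is in copy_prefix().

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copies the first n bytes of src into a fresh
 * buffer. src must hold at least n bytes. When terminate is 0 this
 * reproduces the defect: buf[n] is never written, so it stays
 * uninitialized. */
char *copy_prefix(const char *src, size_t n, int terminate)
{
    char *buf = malloc(n + 1);
    if (buf == NULL)
        return NULL;
    memcpy(buf, src, n);        /* writes buf[0] .. buf[n-1] only */
    if (terminate)
        buf[n] = '\0';          /* the fix: terminate the string */
    return buf;
}

/* Hypothetical caller. With terminate == 0 the strlen() call reads
 * the uninitialized byte (undefined behaviour), so the report names
 * strlen(), one or more frames below the code that is at fault. */
size_t prefix_length(const char *src, size_t n, int terminate)
{
    char *buf = copy_prefix(src, n, terminate);
    if (buf == NULL)
        return 0;
    size_t len = strlen(buf);
    free(buf);
    return len;
}
```

To walk up the stack in a debugger, one possibility (my assumption
about which options are meant) is valgrind's built-in gdbserver: start
the program with "valgrind --vgdb=yes --vgdb-error=0 ./prog", where
./prog is a placeholder, then in a second terminal run "gdb ./prog"
and enter "target remote | vgdb". After "continue", gdb stops at each
reported error, and "bt" and "up" take you from the library frame back
to your own caller.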
With the proper command line arguments (which I don't remember right
now), if you build your executable in debug mode, valgrind can tell you
the memory address containing the uninitialized value, and where the
object was allocated that contains that address. This is not a precise
identification: if it points to the start of a block, it could be
referring to any of the objects initialized in that vicinity. However,
if you check the address of each of those items, you should find one
that fits.
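
The Memcheck option that reports where an uninitialized value came
from is probably --track-origins=yes; that is an assumption on my
part, since the option is not named above. The sketch below uses a
hypothetical struct job with helpers job_new and job_should_retry,
all invented for illustration. With init_retries set to 0, the
retries field is never written, and the comparison in
job_should_retry() is the kind of decision that message complains
about.

```c
#include <stdlib.h>

/* Hypothetical record type, for illustration only. */
struct job {
    int id;
    int retries;
};

/* Allocates a job. init_retries == 0 reproduces the defect: the
 * retries field is left uninitialized. */
struct job *job_new(int id, int init_retries)
{
    struct job *j = malloc(sizeof *j);  /* origin of the block */
    if (j == NULL)
        return NULL;
    j->id = id;
    if (init_retries)
        j->retries = 0;                 /* the fix */
    return j;
}

/* Decides what to do next based on j->retries. If that field was
 * never written, this comparison depends on an uninitialized value. */
int job_should_retry(const struct job *j)
{
    if (j->retries < 3)
        return 1;
    return 0;
}
```

Built with -g -O0 and run as "valgrind --track-origins=yes ./prog"
(again a placeholder name), the buggy variant should produce the
"Conditional jump or move" warning followed by "Uninitialised value
was created by a heap allocation" and the stack of the malloc() call
in job_new(). That identifies the block, not the field, so you still
have to work out which member of the block was left unset.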
> So I presume these are not really fatal, or else the program
> would not get to first base. ...
They are not, in themselves, fatal; otherwise the program would have
halted. However, those non-fatal defects are likely to have been at
least part of the cause of other defects. The message means that you
have a variable somewhere whose value was never initialized, and your
program is making a decision about what to do next based upon the
uninitialized value of that variable. That is almost always the result
of a code defect (I say "almost always" only because, on certain poorly
protected systems, malware sometimes looks at uninitialized memory in
the hopes that it still contains valuable information left over from the
last thing it was used for).
Assuming that your code was meant to base that decision on an
initialized value, the fact that it isn't doing so means that your
program is not working the way you intended.
> ... So when using valgrind, should
> one start at the end and work back, as the last few are
> more likely to indicate what killed it?
No. As with all debugging, it's generally better to fix the first
problem it finds, because that problem might have caused the other
problems, and even if it did not, the symptoms of that problem might
interfere with identification of the other problems.