valgrind spews avalanche of messages

bruce56 · Jun 29, 2013

I built an open-source package that comprises hundreds of source files
in C, C++, F90 among others. It has a few bugs suggesting memory
corruption, such as SIGSEGV or malloc and free aborts.

So I tried valgrind --tool=memcheck.
It throws up hundreds of warnings about
"Conditional jump or move depends on uninitialised value(s)".
Most of these came from C code, a few from Intel libraries such
as _intel_fast_memcmp or __intel_sse2_strlen.

So I presume these are not really fatal, or else the program
would not get to first base. So when using valgrind, should
one start at the end and work back, as the last few are
more likely to indicate what killed it?

James Kuyper · Jun 29, 2013

I built an open-source package that comprises hundreds of source files
in C, C++, F90 among others. It has a few bugs suggesting memory
corruption, such as SIGSEGV or malloc and free aborts.

So I tried valgrind --tool=memcheck
It throws up hundreds of warnings about
"Conditional jump or move depends on uninitialised value(s)"
Most of these came from c code, a few from Intel libraries such
as _intel_fast_memcmp or __intel_sse2_strlen

They might have occurred inside those functions, but the real problem is
the arguments that were passed to those functions, directly or
indirectly. You should use the options that allow valgrind to transfer
control to a debugger, and then trace your way up the call stack until
you're inside your own code, so you can find out what it did to trigger
that message.

With the proper command line arguments (which I don't remember right
now), if you build your executable in debug mode, valgrind can tell you
the memory address containing the uninitialized value, and where the
object containing that address was allocated. This is not a precise
identification: if it points to the start of a block, it could be
referring to any of the objects initialized in that vicinity. However,
if you check the address of each of those items, you should find one
that fits.
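Memcheck's `--track-origins=yes` option is what makes it report where an uninitialised value came from, at some cost in speed. Once you have worked out the byte offset of the suspect address within a heap block, checking "each of those items" can be done mechanically with `offsetof`. A minimal sketch, assuming a hypothetical `struct record` stands in for whatever type occupies the block:

```c
#include <stddef.h>

/* Hypothetical structure whose heap block appears in a Memcheck report. */
struct record {
    int    id;
    double weight;
    char   name[16];
    int    flags;
};

/* One member's name, offset and size within struct record. */
struct member_info {
    const char *name;
    size_t      offset;
    size_t      size;
};

/* sizeof does not evaluate its operand, so the null pointer is never used. */
#define MEMBER(m) { #m, offsetof(struct record, m), sizeof(((struct record *)0)->m) }

static const struct member_info record_members[] = {
    MEMBER(id), MEMBER(weight), MEMBER(name), MEMBER(flags)
};

/* Returns the name of the member covering byte offset `off` within a
   struct record, or "padding" if no member covers that byte. */
static const char *member_at_offset(size_t off)
{
    size_t n = sizeof record_members / sizeof record_members[0];
    for (size_t i = 0; i < n; i++) {
        const struct member_info *m = &record_members[i];
        if (off >= m->offset && off < m->offset + m->size)
            return m->name;
    }
    return "padding";
}
```

Listing the members in a table keeps the mapping in one place; the compiler supplies the real offsets, including any padding it inserted.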

So I presume these are not really fatal, or else the program
would not get to first base. ...

They are not, in themselves, fatal, otherwise the program would have
halted. However, those non-fatal defects are likely to have been at
least part of the cause of other defects. The message means that you
have a variable somewhere whose value was never initialized, and your
program is making a decision about what to do next based upon the
uninitialized value of that variable. That is almost always the result
of a code defect (I say "almost always" only because, on certain poorly
protected systems, malware sometimes looks at uninitialized memory in
the hopes that it still contains valuable information left over from the
last thing it was used for).

Assuming that your code was actually intended to be making a decision
based upon an initialized value, the fact that it isn't doing so means
that your program will not be functioning the way that you intended.
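A minimal C sketch of the kind of defect described above, assuming a hypothetical `struct settings`: a heap object has a member that is tested before it is ever written. The defective lines are kept in a comment; the compiled code shows one possible fix.

```c
#include <stdlib.h>

/* Hypothetical settings object used only for illustration. */
struct settings {
    int verbose;
    int level;
};

/* Defective pattern (comment only, not compiled):
 *
 *     struct settings *s = malloc(sizeof *s);
 *     s->level = level;
 *     if (s->verbose) ...   // verbose was never set; Memcheck would
 *                           // typically report "Conditional jump or move
 *                           // depends on uninitialised value(s)" here
 *
 * Fixed version: calloc() zero-fills the block, so every member has a
 * defined value before any decision is made on it. */
static struct settings *settings_new(int level)
{
    struct settings *s = calloc(1, sizeof *s);
    if (s != NULL)
        s->level = level;
    return s;
}

/* Makes a decision based on a member that is now always initialized. */
static int settings_is_verbose(const struct settings *s)
{
    return s->verbose != 0;
}
```

Zero-filling is only one choice; explicitly assigning every member after malloc() works just as well, as long as no path reads a member before it is set.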

... So when using valgrind, should
one start at the end and work back, as the last few are
more likely to indicate what killed it?

No, as with all debugging, it's generally better to remove the first
problem it finds, because that problem might have caused the other
problems, and even if it did not, the symptoms of that problem might
interfere with identification of the other problems.

Malcolm McLean · Jun 29, 2013


It's unlikely that intel_fast_memcmp really has a bug in it. More
likely it's written in a strange, highly optimal way that valgrind
can't understand.
When the program segfaults, valgrind will halt. Your last
message should be the instruction that caused it to crash. However,
the root of the problem is unlikely to be there. What will have
happened is that a pointer will have been set to an invalid
value earlier on. So the second port of call, after the crash itself,
is finding where the bad pointer got its value from.
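One common way to shorten the distance between where a structure goes bad and where the program finally crashes is to give important structures a "magic" header field and check it on every use. This is a hedged sketch with a hypothetical `struct node` and constant `NODE_MAGIC`; it is a general debugging technique, not something valgrind requires.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical magic value; any distinctive constant would do. */
#define NODE_MAGIC 0x4E4F4445u   /* the ASCII bytes 'N' 'O' 'D' 'E' */

struct node {
    uint32_t     magic;  /* set on creation, checked on every use */
    struct node *next;
    int          value;
};

static void node_init(struct node *n, int value)
{
    n->magic = NODE_MAGIC;
    n->next  = NULL;
    n->value = value;
}

/* Returns false if the header has been overwritten, so a caller can stop
   (or trap in a debugger) at the first use of a corrupted node instead
   of following a garbage `next` pointer later on. */
static bool node_is_intact(const struct node *n)
{
    return n != NULL && n->magic == NODE_MAGIC;
}
```

Checking the magic field at each function entry turns a later, distant crash into an early, local failure, which narrows down where the bad value was written.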

Öö Tiib · Jun 29, 2013

I built an open-source package that comprises hundreds of source files
in C, C++, F90 among others. It has a few bugs suggesting memory
corruption, such as SIGSEGV or malloc and free aborts.

Open source is often of rather terrible quality. Not as a rule, just often.

So I tried valgrind --tool=memcheck
It throws up hundreds of warnings about
"Conditional jump or move depends on uninitialised value(s)"
Most of these came from c code, a few from Intel libraries such
as _intel_fast_memcmp or __intel_sse2_strlen

So I presume these are not really fatal, or else the program
would not get to first base. So when using valgrind, should
one start at the end and work back, as the last few are
more likely to indicate what killed it?

Note that the places where a defect is in the code and where it manifests
itself (as a crash or other misbehavior) are often quite distant.
Therefore I would start from the first warnings. Warnings in a library
are often caused by the library's caller supplying invalid arguments. The
last warnings are usually where the fatally wounded program finally died.
That can be far from where it got the mortal wounds.

Les Cargill · Jun 29, 2013

I built an open-source package that comprises hundreds of source files
in C, C++, F90 among others. It has a few bugs suggesting memory
corruption, such as SIGSEGV or malloc and free aborts.

So I tried valgrind --tool=memcheck
It throws up hundreds of warnings about
"Conditional jump or move depends on uninitialised value(s)"
Most of these came from c code, a few from Intel libraries such
as _intel_fast_memcmp or __intel_sse2_strlen

So I presume these are not really fatal, or else the program
would not get to first base.

That is a bad assumption. Your "machine" has a loose part
that will wreck it every so-random often.

So when using valgrind, should
one start at the end and work back, as the last few are
more likely to indicate what killed it?

You have, somewhere, an uninitialized value used as an argument
to those functions, unless those libraries are seriously
broken.

Yes, you have to fix this.

James Kuyper · Jun 29, 2013

It's unlikely that intel_fast_memcmp really has a bug in it. More
likely it's written in a strange, highly optimal way that valgrind
can't understand.

The way that valgrind works, it doesn't need to understand how
_intel_fast_memcmp() is written - it essentially runs the executable in
an instrumented emulator of the target platform. It keeps track of which
pieces of memory have been initialized, and when a conditional jump is
executed based upon the value stored in such memory, valgrind generates
this message. It doesn't have any need to understand why the jump is
being executed.
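The tracking described here can be illustrated with a toy model. This is a deliberately simplified sketch, not Memcheck's implementation: Memcheck tracks definedness per bit, propagates it through arithmetic, and works on translated machine code, whereas the hypothetical `toy_memory` below keeps one flag per byte and only checks at branches.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_SIZE 16

/* Toy model of definedness tracking: each byte of "memory" has a
   shadow flag recording whether it has ever been written. */
struct toy_memory {
    unsigned char bytes[TOY_SIZE];
    bool          defined[TOY_SIZE];
    unsigned      warnings;   /* count of branches on undefined bytes */
};

static void toy_init(struct toy_memory *m)
{
    for (size_t i = 0; i < TOY_SIZE; i++) {
        m->bytes[i]   = 0;      /* zeroed only so the toy itself is valid C */
        m->defined[i] = false;  /* the shadow flag is what models "uninitialised" */
    }
    m->warnings = 0;
}

static void toy_store(struct toy_memory *m, size_t addr, unsigned char v)
{
    m->bytes[addr]   = v;
    m->defined[addr] = true;
}

/* A "conditional jump" on the byte at addr: the toy counts a warning
   when the byte is undefined, roughly where Memcheck would report
   "Conditional jump or move depends on uninitialised value(s)". */
static bool toy_branch_if_nonzero(struct toy_memory *m, size_t addr)
{
    if (!m->defined[addr])
        m->warnings++;
    return m->bytes[addr] != 0;
}
```

Note that the toy, like Memcheck, does not need to know why the program branches; it only needs the shadow state of the value being tested.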

This message was almost certainly the result of uninitialized memory
being passed to _intel_fast_memcmp() by higher level code. Assuming it's
reasonably named, that function is likely to execute a conditional jump
based upon the value stored in each and every byte of both buffers
passed to it.
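A plain byte-by-byte comparison shows why: the result is decided by branching on the buffer contents themselves. This is a simple sketch with the hypothetical name `bytewise_memcmp`; Intel's `_intel_fast_memcmp` is presumably far more elaborate, but it has to make the same kind of decisions.

```c
#include <stddef.h>

/* Byte-by-byte comparison in the style of memcmp(). Every byte it
   examines feeds a branch, so if either buffer holds uninitialised
   bytes among those examined, a tool like Memcheck sees a conditional
   jump that depends on them. */
static int bytewise_memcmp(const void *a, const void *b, size_t n)
{
    const unsigned char *p = a;
    const unsigned char *q = b;
    for (size_t i = 0; i < n; i++) {
        if (p[i] != q[i])              /* branch on buffer contents */
            return p[i] < q[i] ? -1 : 1;
    }
    return 0;
}
```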

bruce56 · Jun 30, 2013

Note that the places where a defect is in the code and where it manifests
itself (as a crash or other misbehavior) are often quite distant.
Therefore I would start from the first warnings. Warnings in a library
are often caused by the library's caller supplying invalid arguments. The
last warnings are usually where the fatally wounded program finally died.
That can be far from where it got the mortal wounds.

I know this. One of the free() crashes gives an address of 2020202020 hex,
which suggests a string of ASCII spaces is overwriting the allocated
chunk. But the module in question does no string handling.
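The inference from 2020202020 hex to "spaces" is simple byte arithmetic: the ASCII space is 0x20, so a run of spaces copied over a pointer-sized field produces a value made of 0x20 bytes. A sketch, assuming 64-bit pointers and a hypothetical helper `overwrite_leading_bytes`:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns `value` after its first n bytes in memory order have been
   overwritten with `fill`, mimicking a stray write of n copies of a
   character over a pointer-sized field (n must not exceed 8). */
static uint64_t overwrite_leading_bytes(uint64_t value, unsigned char fill, size_t n)
{
    unsigned char buf[sizeof value];
    memcpy(buf, &value, sizeof value);
    memset(buf, fill, n);
    memcpy(&value, buf, sizeof value);
    return value;
}
```

On a little-endian machine such as x86-64, five spaces written over the start of a zeroed 8-byte field give exactly 0x2020202020; eight spaces give 0x2020202020202020 on either byte order. This narrows the search to code that writes runs of spaces, which need not be the module where free() aborts.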

Malcolm McLean · Jun 30, 2013

The way that valgrind works, it doesn't need to understand how
_intel_fast_memcmp() is written - it essentially runs the executable in
an instrumented emulator of the target platform. It keeps track of which
pieces of memory have been initialized, and when a conditional jump is
executed based upon the value stored in such memory, valgrind generates
this message. It doesn't have any need to understand why the jump is
being executed.

This message was almost certainly the result of uninitialized memory
being passed to _intel_fast_memcmp() by higher level code. Assuming it's
reasonably named, that function is likely to execute a conditional jump
based upon the value stored in each and every byte of both buffers
passed to it.

I'm guessing (it's only a guess) that intel_fast_memcmp takes arbitrary
unsigned char *s and lengths, aligns them on 64-bit boundaries, and if
all 64-bit chunks match, returns 0. If the edge bits don't match, it does
AND and OR masking to get the correct answer.
So valgrind will think it's using uninitialised memory, which it is,
but legitimately. (Not legal in C, but it's not written in C).
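Malcolm's guess describes word-at-a-time comparison. A minimal portable sketch of the idea, with the hypothetical name `chunked_equal`, handles the ragged end with a byte loop rather than by reading a whole word and masking; the masking variant would read bytes that may lie outside the buffer, which portable C does not allow.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Equality test in 8-byte chunks, then byte by byte for the tail.
   This portable version never reads past the n bytes it was given,
   so it makes no claim about how _intel_fast_memcmp actually behaves. */
static bool chunked_equal(const void *a, const void *b, size_t n)
{
    const unsigned char *p = a;
    const unsigned char *q = b;
    size_t i = 0;

    for (; i + 8 <= n; i += 8) {
        uint64_t x, y;
        memcpy(&x, p + i, 8);   /* memcpy avoids misaligned access */
        memcpy(&y, q + i, 8);
        if (x != y)
            return false;
    }
    for (; i < n; i++)          /* remaining 0 to 7 bytes */
        if (p[i] != q[i])
            return false;
    return true;
}
```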

Ike Naar · Jun 30, 2013

Open source is often of rather terrible quality. Not as rule, just often.

Same for closed source.
At least with open source one can see how bad things are.

James Kuyper · Jun 30, 2013

On 06/30/2013 05:11 AM, Malcolm McLean wrote:
....

I'm guessing (it's only a guess) that intel_fast_memcmp takes arbitrary
unsigned char *s and lengths, aligns them on 64-bit boundaries, and if
all 64-bit chunks match, returns 0. If the edge bits don't match, it does
AND and OR masking to get the correct answer.
So valgrind will think it's using uninitialised memory, which it is,
but legitimately. (Not legal in C, but it's not written in C).

I'll concede that's a possibility, but it seems far more plausible to me
that calling code is defective by reason of calling _intel_fast_memcmp()
(directly or indirectly) to compare two buffers, one or both of which
has uninitialized memory within the specified length. Developers often
make mistakes like that, which is one of the main reasons for the very
existence of tools like valgrind.
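The kind of caller defect described here can be made concrete with a sketch, assuming a hypothetical `struct entry` with a fixed-size name field: the caller asks for a comparison over the whole field, although only the first few bytes were ever written.

```c
#include <stdbool.h>
#include <string.h>

#define NAME_CAP 16

/* Hypothetical record with a fixed-size name field. */
struct entry {
    char name[NAME_CAP];
};

/* Defective pattern (comment only, not compiled):
 *
 *     struct entry *a = malloc(sizeof *a);
 *     strcpy(a->name, "abc");                  // bytes 4..15 never written
 *     memcmp(a->name, b->name, NAME_CAP) == 0  // compares them anyway
 *
 * Memcheck can then report the uninitialised bytes inside the comparison
 * routine, even though the mistake is in the caller. Two possible fixes: */

/* Fix 1: define every byte of the field when storing a name (the name
   is truncated to NAME_CAP - 1 characters). */
static void entry_set_name(struct entry *e, const char *name)
{
    memset(e->name, 0, sizeof e->name);
    strncpy(e->name, name, sizeof e->name - 1);
}

/* Fix 2: compare only up to the terminator, never the unused tail. */
static bool entry_same_name(const struct entry *a, const struct entry *b)
{
    return strncmp(a->name, b->name, sizeof a->name) == 0;
}
```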

Öö Tiib · Jun 30, 2013

Same for closed source.
At least with open source one can see how bad things are.

That is not a benefit. When someone asks me to look into closed source
code, they usually offer money for it. With open source ... maybe
others are better at that ... but I have been paid noteworthy money for
working on open source only once.

Jorgen Grahn · Jun 30, 2013

I built an open-source package that comprises hundreds of source files
in C, C++, F90 among others. It has a few bugs suggesting memory
corruption, such as SIGSEGV or malloc and free aborts.

So I tried valgrind --tool=memcheck
It throws up hundreds of warnings about

These may well be your fault, as the others say.

But note that valgrind comes with "suppressions" -- warnings from
popular libraries (like the GNU libc) which have been investigated and
found harmless. If you're working on an exotic combination of OS,
compiler and libc you may not have the best possible set of
suppressions.

The valgrind documentation can tell you more.

/Jorgen

glen herrmannsfeldt · Jul 1, 2013

James Kuyper said:
On 06/30/2013 05:11 AM, Malcolm McLean wrote:
(snip)

Or maybe written in non-standard C. While one can't portably
write things like that, within an implementation one can either
write the assembler code for it, or write it using the C compiler.
(Note that I didn't say write it in C.)

I'll concede that's a possibility, but it seems far more
plausible to me that calling code is defective by reason of
calling _intel_fast_memcmp() (directly or indirectly) to
compare two buffers, one or both of which has uninitialized
memory within the specified length. Developers often make mistakes
like that, which is one of the main reasons for the very
existence of tools like valgrind.

If the implementation is doing that, then a version of valgrind for
that implementation should know about it.

-- glen

James Kuyper · Jul 1, 2013

If the implementation is doing that, then a version of valgrind for
that implementation should know about it.

I was referring to a defect in the user code, not a defect in the
implementation.

Nobody · Jul 1, 2013

It *is* possible that a highly-optimized C library reads beyond the end of
a string, say, 8 bytes at a time, yet still operates correctly because the
compiler knows that its malloc() implementation won't ever allocate the
last 7 bytes of a virtual memory chunk, so the code won't segfault by
going beyond the end of the string, but valgrind doesn't know this.

It doesn't matter whether malloc() allocates the last few bytes of a page.

A string-processing algorithm which works in e.g. 8-byte units will
invariably work in *aligned* 8-byte units, so it will only read bytes
beyond the end of a string when those bytes are within the same alignment
unit as one or more bytes of the string, and thus within the same page.
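The argument here is arithmetic: if the page size is a multiple of the unit size, an aligned unit can never straddle two pages. A small sketch that checks this for assumed sizes (8-byte units and 4096-byte pages, both hypothetical constants here):

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed sizes for illustration. The argument only needs the page size
   to be a multiple of the unit size, which holds for power-of-two sizes
   with unit <= page. */
#define ASSUMED_UNIT ((uintptr_t)8)
#define ASSUMED_PAGE ((uintptr_t)4096)

/* True if the aligned ASSUMED_UNIT-byte unit containing addr lies
   entirely inside the page containing addr. */
static bool aligned_unit_in_same_page(uintptr_t addr)
{
    uintptr_t unit_start = addr & ~(ASSUMED_UNIT - 1);
    uintptr_t unit_last  = unit_start + ASSUMED_UNIT - 1;
    return unit_start / ASSUMED_PAGE == addr / ASSUMED_PAGE
        && unit_last  / ASSUMED_PAGE == addr / ASSUMED_PAGE;
}
```

Checked over every address in a few pages, it never returns false. By contrast, a misaligned 8-byte read starting at offset 4092 would cover bytes 4092 to 4099 and cross the boundary at 4096.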

Stephen Sprunk · Jul 2, 2013

It doesn't matter whether malloc() allocates the last few bytes of a
page.

A string-processing algorithm which works in e.g. 8-byte units will
invariably work in *aligned* 8-byte units, so it will only read
bytes beyond the end of a string when those bytes are within the same
alignment unit as one or more bytes of the string, and thus within
the same page.

malloc() implementations often allocate smallish blocks in a set of
fixed sizes, typically multiples of 8, to reduce heap fragmentation.
Since the start of each block needs to be aligned for any data type,
also typically 8 bytes, that means that reading from such blocks in
aligned chunks of 8 bytes will be safe from segfaults. As long as an
optimized memcmp() or strcmp() doesn't actually _use_ data from beyond
the proper length, it's not an error--at least for code, e.g. Intel's
optimized libraries, that can be reasonably considered part of the
implementation.
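This point can be sketched as follows, assuming a hypothetical allocator that rounds each request up to a multiple of 8: a word-at-a-time strlen may then read the whole 8-byte chunk containing the terminator without leaving the block. The zero-byte test is the well-known bit trick used in many such routines; `__intel_sse2_strlen` presumably works on 16-byte SSE2 vectors instead, so this is an analogy rather than its implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Request size rounded up to a multiple of 8, as an allocator with
   8-byte size classes (an assumption here) might do. */
static size_t round_up_8(size_t n)
{
    return (n + 7) & ~(size_t)7;
}

/* Nonzero if any byte of v is zero: the classic SWAR test. */
static int has_zero_byte(uint64_t v)
{
    return ((v - 0x0101010101010101ULL) & ~v & 0x8080808080808080ULL) != 0;
}

/* Word-at-a-time strlen. Precondition: buf holds a '\0' and cap, the
   block size, is a multiple of 8, so every 8-byte read stays inside the
   block. Bytes after the terminator are read but never decide the result. */
static size_t padded_strlen(const char *buf, size_t cap)
{
    for (size_t i = 0; i < cap; i += 8) {
        uint64_t chunk;
        memcpy(&chunk, buf + i, sizeof chunk);
        if (has_zero_byte(chunk)) {
            for (size_t j = i; j < i + 8; j++)
                if (buf[j] == '\0')
                    return j;
        }
    }
    return cap;  /* precondition violated: no terminator found */
}
```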

S
