core due to memory corruption

deepak · Oct 8, 2010

Hi,

I'm working on a router, and a process is core-ing due to memory
corruption.
It's very difficult to recreate the core manually, but it happens
intermittently.

All I have is a core. I would like your suggestions for finding the memory
corruption
by looking at the core alone (without reproducing it again).

Could you please give me a checklist for solving this core?

Thanks,
Deepak

Seebs · Oct 8, 2010

I'm working on a router, and a process is core-ing due to memory
corruption.
It's very difficult to recreate the core manually, but it happens
intermittently.

Whee.

All I have is a core. I would like your suggestions for finding the memory
corruption
by looking at the core alone (without reproducing it again).

Could you please give me a checklist for solving this core?

You're probably not gonna get anywhere unless you're pretty good with
the debugger already.

Basically, look around a bit, and see if you can find an obviously
corrupted object. Now try to guess what it's been corrupted with.
If you get super lucky, it'll be corrupted because it's been overwritten
with legible text or something else you can easily identify, and then
you know what did it. Otherwise... Look around for other objects near
it, check to see whether it's been freed but is still being used, stuff
like that.
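
To make the "legible text" case concrete, here is a minimal sketch in C. It assumes you have copied the bytes of a suspicious pointer or integer out of the debugger; looks_like_text and bytes_as_text are hypothetical helper names, not part of any library. If every byte of a supposed pointer turns out to be printable ASCII, a string overrun is a likely suspect.

```c
#include <assert.h>
#include <stddef.h>

/* Return 1 if all n bytes at p are printable ASCII (0x20..0x7e), else 0. */
int looks_like_text(const void *p, size_t n)
{
    const unsigned char *b = p;
    size_t i;

    assert(p != NULL);
    if (n == 0)
        return 0;
    for (i = 0; i < n; i++)
        if (b[i] < 0x20 || b[i] > 0x7e)
            return 0;
    return 1;
}

/* Decode n bytes into out (which must hold n + 1 chars), showing
   unprintable bytes as '.', the way a hex dump's text column does. */
void bytes_as_text(const void *p, size_t n, char *out)
{
    const unsigned char *b = p;
    size_t i;

    assert(p != NULL && out != NULL);
    for (i = 0; i < n; i++)
        out[i] = (b[i] >= 0x20 && b[i] <= 0x7e) ? (char)b[i] : '.';
    out[n] = '\0';
}
```

A pointer field holding the bytes 'H', 'e', 'l', 'l' would pass looks_like_text, and searching the source for the decoded text may point at whatever wrote it.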

Really, you probably ought to look at a tool like valgrind. Or come up
with a reliable reproducer that you can trigger on purpose.

-s

James Waldby · Oct 8, 2010

I'm working on a router, and a process is core-ing due to memory
corruption.
It's very difficult to recreate the core manually, but it happens
intermittently.

All I have is a core. I would like your suggestions for finding the memory
corruption
by looking at the core alone (without reproducing it again).

Could you please give me a checklist for solving this core?

c.l.c people can ignore the following comment.

/* Your question is off topic in c.l.c -- try a platform-specific
newsgroup, or perhaps comp.programming.

How you debug a problem like this depends on what features
your platform has. For example, we don't know if you have a
file system, a display, memory management, back channels,
what C compiler if any, etc.

The following is an approach aimed at the situation where one bad
line of code is responsible for the problem being debugged.

To find the function where the problem occurs, create a
circular buffer in which you store procedure entry and
exit data. Put the buffer and its index at a known or fixed
address so you can find it in the core file. Each time you
make an entry in the buffer, check the integrity of any
areas that are getting corrupted, so you can break as
soon as possible after corruption. To get procedure entry/
exit/caller data, see the -finstrument-functions section in
<http://gcc.gnu.org/onlinedocs/gcc/Code-Gen-Options.html>.
In your __cyg_profile_func_enter and __cyg_profile_func_exit
functions, store entry/exit/caller and other relevant data in
the circular buffer, and then check integrity.
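
A rough sketch of such a buffer, assuming GCC and a build with -finstrument-functions. The two __cyg_profile_func_* hooks are GCC's; everything else (trace_buf, trace_next, watched, the guard value) is made up for illustration, and the guarded struct merely stands in for whatever area is getting corrupted in your program.

```c
#include <assert.h>
#include <stdlib.h>

#define TRACE_SLOTS 256
#define GUARD 0xDEADBEEFu  // arbitrary pattern, an assumption of this sketch
#define NO_INSTR __attribute__((no_instrument_function))

struct trace_entry {
    void *fn;      // address of the function entered or left
    void *caller;  // call site
    char kind;     // 'E' for entry, 'X' for exit
};

// Globals (not static) so the debugger can find them by name in the core.
// The newest record sits at index (trace_next - 1) % TRACE_SLOTS.
struct trace_entry trace_buf[TRACE_SLOTS];
unsigned long trace_next;

// Hypothetical stand-in for the area that keeps getting corrupted.
struct guarded {
    unsigned int before;
    char data[64];
    unsigned int after;
};
struct guarded watched = { GUARD, { 0 }, GUARD };

NO_INSTR int watched_is_intact(void)
{
    return watched.before == GUARD && watched.after == GUARD;
}

NO_INSTR static void trace_record(char kind, void *fn, void *caller)
{
    struct trace_entry *e = &trace_buf[trace_next % TRACE_SLOTS];

    assert(kind == 'E' || kind == 'X');
    e->fn = fn;
    e->caller = caller;
    e->kind = kind;
    trace_next++;
    if (!watched_is_intact())
        abort();  // dump core right away, while the trail is still fresh
}

// Called by GCC-instrumented code on every function entry and exit.
// The hooks themselves must not be instrumented, or they would recurse.
NO_INSTR void __cyg_profile_func_enter(void *fn, void *caller)
{
    trace_record('E', fn, caller);
}

NO_INSTR void __cyg_profile_func_exit(void *fn, void *caller)
{
    trace_record('X', fn, caller);
}
```

With something like this linked in, printing trace_next and trace_buf in the debugger on the next core should show the last calls made before the damage was noticed.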

After you find the guilty function, put a 'z' character at
the beginning of a few of the code lines of the function,
where 'z' stands for the name of a macro like the following:

#define z fprintf(stderr,"%s@%3d \n",__FILE__,__LINE__);

After you bracket the guilty line within a range of lines, fill in
more z's until you locate a specific guilty line. Note, put
check(__FILE__,__LINE__); in place of the fprintf() or before
the fprintf() if you don't have a display, or if you want to do
corruption checking at each line. */

Also see the -fstack-check section of the web page mentioned above.
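
The z-macro bracketing described above might look like the following minimal sketch. check(), area and the canary value are hypothetical stand-ins; in real code check() would test whatever data is getting corrupted. The overrun in suspect() is simulated by writing past data but still inside the enclosing struct, so the example itself stays well defined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define CANARY 0x5AA5C33Cu  /* arbitrary pattern, an assumption of this sketch */

/* Hypothetical guarded area: canaries on both sides of the real data. */
struct guarded {
    unsigned int before;
    char data[32];
    unsigned int after;
};
struct guarded area = { CANARY, { 0 }, CANARY };

/* Where damage was first noticed; globals so they show up in a core. */
const char *first_bad_file;
int first_bad_line;

/* Return 1 if the canaries are intact, 0 otherwise. The first failing
   call records its location and reports it on stderr. */
int check(const char *file, int line)
{
    assert(file != NULL);
    if (area.before == CANARY && area.after == CANARY)
        return 1;
    if (first_bad_file == NULL) {
        first_bad_file = file;
        first_bad_line = line;
        fprintf(stderr, "%s@%3d: corruption first noticed\n", file, line);
    }
    return 0;
}

#define z check(__FILE__, __LINE__);

/* A function under suspicion, with z markers at the start of each line. */
void suspect(size_t n)
{
    char *p = (char *)&area + offsetof(struct guarded, data);

z   p[0] = 'a';
z   memset(p, 'x', n);  /* tramples area.after when n > sizeof area.data */
z   p[1] = 'b';
}
```

The guilty line is the one just before the first z that reports damage; adding more z's between the last clean marker and the first dirty one narrows it down further.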

Jorgen Grahn · Oct 9, 2010

Usually this means you're stomping out of bounds of some array.

Some debuggers let you debug from an executable and a core file, like

    gdb executablepath core

and you can use the usual debugger commands for the state at the moment of the
dump, such as a traceback and examining stack frames and memory. How much
information you get depends on the debugging information in the executable.

To be more precise, you get all the information there is. The
debugging information only affects how *easy it is to read*.

You may also be able to rebuild the executable with debug info, and
still use the existing core dump.

/Jorgen

Jorgen Grahn · Oct 9, 2010

You're probably not gonna get anywhere unless you're pretty good with
the debugger already.

Yes. First advice -- seek help from some clever co-worker who's good
with debuggers already, and learn as much as you can from him. If
you're developing routers, you should have such people already!

Basically, look around a bit, and see if you can find an obviously
corrupted object. Now try to guess what it's been corrupted with.
If you get super lucky, it'll be corrupted because it's been overwritten
with legible text or something else you can easily identify, and then
you know what did it. Otherwise... Look around for other objects near
it, check to see whether it's been freed but is still being used, stuff
like that.

There is a lot you can do with just a core dump, but it's not possible
to list all the things -- it depends a lot on the exact problem. A
good troubleshooter comes up with techniques which fit the problem.

/Jorgen
