core due to memory corruption

deepak

Hi,

I'm working on a router, and one of its processes is dumping core because of
memory corruption.
The crash is very difficult to reproduce on demand, but it happens
intermittently.

All I have is a core file. I would like your suggestions on how to find the
memory corruption
by examining the core alone (without reproducing the crash).

Could you please give me a checklist for analysing this core?

Thanks,
Deepak
 
Seebs

> I'm working on a router, and one of its processes is dumping core because of
> memory corruption.
> The crash is very difficult to reproduce on demand, but it happens
> intermittently.

Whee.

> All I have is a core file. I would like your suggestions on how to find the
> memory corruption
> by examining the core alone (without reproducing the crash).
> Could you please give me a checklist for analysing this core?
You're probably not gonna get anywhere unless you're pretty good with
the debugger already.

Basically, look around a bit, and see if you can find an obviously
corrupted object. Now try to guess what it's been corrupted with.
If you get super lucky, it'll be corrupted because it's been overwritten
with legible text or something else you can easily identify, and then
you know what did it. Otherwise... Look around for other objects near
it, check to see whether it's been freed but is still being used, stuff
like that.
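
As a rough illustration of the "overwritten with legible text" case, here is a minimal sketch in C. It assumes you have already copied the bytes of the suspect object out of the core into a buffer (how you do that depends on your debugger). The name find_text_runs and its parameters are made up for this example, and min_len is assumed to be at least 1.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: report every run of at least min_len printable
 * bytes in buf[0..len), with its offset, and return how many runs were
 * found. A long printable run inside an object that should hold
 * pointers or counters is a hint about what overwrote it. */
size_t find_text_runs(const unsigned char *buf, size_t len,
                      size_t min_len, FILE *out)
{
    size_t runs = 0;
    size_t start = 0;   /* offset where the current printable run began */
    size_t i;

    /* Iterate one step past the end so a run touching the end is flushed. */
    for (i = 0; i <= len; i++) {
        int printable = (i < len) && isprint(buf[i]);
        if (!printable) {
            if (i - start >= min_len) {
                fprintf(out, "offset %zu: %.*s\n",
                        start, (int)(i - start),
                        (const char *)(buf + start));
                runs++;
            }
            start = i + 1;
        }
    }
    return runs;
}
```

If the text that turns up looks like a log message or a configuration string, searching the source for the code that builds that string is a reasonable next step.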

Really, you probably ought to look at a tool like valgrind. Or come up
with a reliable reproducer that you can trigger on purpose.

-s
 
James Waldby

> I'm working on a router, and one of its processes is dumping core because of
> memory corruption.
> The crash is very difficult to reproduce on demand, but it happens
> intermittently.
>
> All I have is a core file. I would like your suggestions on how to find the
> memory corruption
> by examining the core alone (without reproducing the crash).
>
> Could you please give me a checklist for analysing this core?

c.l.c people can ignore the following comment.

/* Your question is off topic in c.l.c -- try a platform-specific
newsgroup, or perhaps comp.programming.

How you debug a problem like this depends on what features
your platform has. For example, we don't know if you have a
file system, a display, memory management, back channels,
what C compiler if any, etc.

Following is an approach aimed at the situation where one bad
line of code is responsible for the problem being debugged.

To find the function where the problem occurs, create a
circular buffer in which you store procedure entry and
exit data. Put the buffer and its index at a known or fixed
address so you can find it in the core file. Each time you
make an entry in the buffer, check the integrity of any
areas that are getting corrupted, so you can break as
soon as possible after corruption. To get procedure entry/
exit/caller data, see the -finstrument-functions section in
<http://gcc.gnu.org/onlinedocs/gcc/Code-Gen-Options.html>.
In your __cyg_profile_func_enter and __cyg_profile_func_exit
functions, store entry/exit/caller and other relevant data in
the circular buffer, and then check integrity.

After you find the guilty function, put a 'z' character at
the beginning of a few of the code lines of the function,
where 'z' stands for the name of a macro like the following:

#define z fprintf(stderr,"%s@%3d \n",__FILE__,__LINE__);

After you bracket the guilty line within a range of lines, fill in
more z's until you locate a specific guilty line. Note, put
check(__FILE__,__LINE__); in place of the fprintf() or before
the fprintf() if you don't have a display, or if you want to do
corruption checking at each line. */
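
The circular-buffer idea described in the comment above can be sketched roughly as follows. This is only a sketch under assumptions: it targets GCC with -finstrument-functions, it is not thread-safe, and the names trace_buf, trace_next, watched_guard, trace_corruption_seen and the guard value are all invented for the example. watched_guard stands in for whatever area is getting corrupted in the real program.

```c
#include <stdint.h>

#define TRACE_SLOTS 256u          /* number of entries kept in the ring */
#define GUARD_VALUE 0xDEADBEEFu   /* hypothetical canary value */

struct trace_entry {
    void *fn;       /* function being entered or left */
    void *caller;   /* call site */
    int is_entry;   /* 1 = entry, 0 = exit */
};

/* Globals with external linkage, so they can be found by name in the core. */
struct trace_entry trace_buf[TRACE_SLOTS];
volatile unsigned trace_next;     /* events recorded so far */
volatile uint32_t watched_guard = GUARD_VALUE;
volatile int trace_corruption_seen;

/* The hooks themselves must not be instrumented, or they would recurse. */
static void trace_record(void *fn, void *caller, int is_entry)
    __attribute__((no_instrument_function));
void __cyg_profile_func_enter(void *this_fn, void *call_site)
    __attribute__((no_instrument_function));
void __cyg_profile_func_exit(void *this_fn, void *call_site)
    __attribute__((no_instrument_function));

static void trace_record(void *fn, void *caller, int is_entry)
{
    struct trace_entry *e = &trace_buf[trace_next % TRACE_SLOTS];

    e->fn = fn;
    e->caller = caller;
    e->is_entry = is_entry;
    trace_next++;

    /* Integrity check on every event. A real build might call abort()
     * here to get a core as soon as the damage is noticed. */
    if (watched_guard != GUARD_VALUE)
        trace_corruption_seen = 1;
}

void __cyg_profile_func_enter(void *this_fn, void *call_site)
{
    trace_record(this_fn, call_site, 1);
}

void __cyg_profile_func_exit(void *this_fn, void *call_site)
{
    trace_record(this_fn, call_site, 0);
}
```

In the core, the most recent event is at trace_buf[(trace_next - 1) % TRACE_SLOTS], and the entries before it show which functions ran shortly before the check first failed.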

Also see the -fstack-check section on the web page mentioned above.
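
For the line-by-line bracketing step with the 'z' macro, a check() function along the lines suggested above might look like the sketch below. Everything here is an assumption made for illustration: the guarded struct, the 0xA5 fill byte, guard_init() and the first_bad_* variables are not from any real code base. The idea is to put guard bytes on both sides of the object that keeps getting corrupted and remember the first source line at which they are found damaged.

```c
#include <stdio.h>
#include <string.h>

#define GUARD_LEN  8
#define GUARD_BYTE 0xA5

/* Hypothetical watched object with guard bytes on both sides. */
struct guarded {
    unsigned char before[GUARD_LEN];
    char payload[32];
    unsigned char after[GUARD_LEN];
};

struct guarded watched;
const char *first_bad_file;   /* NULL until corruption is first seen */
int first_bad_line;

/* Fill the guards and forget any earlier report. */
void guard_init(void)
{
    memset(watched.before, GUARD_BYTE, GUARD_LEN);
    memset(watched.after, GUARD_BYTE, GUARD_LEN);
    first_bad_file = NULL;
    first_bad_line = 0;
}

/* Return 1 if both guards are intact, 0 otherwise. Only the first
 * failing location is recorded and printed. */
int check(const char *file, int line)
{
    size_t i;

    for (i = 0; i < GUARD_LEN; i++) {
        if (watched.before[i] != GUARD_BYTE ||
            watched.after[i] != GUARD_BYTE) {
            if (first_bad_file == NULL) {
                first_bad_file = file;
                first_bad_line = line;
                fprintf(stderr, "guard damaged, seen at %s@%3d\n",
                        file, line);
            }
            return 0;
        }
    }
    return 1;
}

/* Same shape as the macro in the post, but calling check(). */
#define z check(__FILE__, __LINE__);
```

With this in place, a line such as "z do_something();" in the suspect function runs the check just before do_something(). The guilty line is then the one immediately before the first location reported.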
 
Jorgen Grahn

> Usually this means you're stomping out of bounds of some array.

> Some debuggers let you debug from an executable and a core file like
> gdb executablepath core
> and you can use the usual debugger commands for the state at the moment of the
> dump, such as a traceback and examining stack frames and memory. How much
> information you get depends on the debugging information in the executable.

To be more precise, you get all the information there is. The
debugging information only affects how *easy it is to read*.

You may also be able to rebuild the executable with debug info, and
still use the existing core dump.

/Jorgen
 
Jorgen Grahn

> You're probably not gonna get anywhere unless you're pretty good with
> the debugger already.

Yes. First advice -- seek help from some clever co-worker who's good
with debuggers already, and learn as much as you can from him. If
you're developing routers, you should have such people already!

> Basically, look around a bit, and see if you can find an obviously
> corrupted object. Now try to guess what it's been corrupted with.
> If you get super lucky, it'll be corrupted because it's been overwritten
> with legible text or something else you can easily identify, and then
> you know what did it. Otherwise... Look around for other objects near
> it, check to see whether it's been freed but is still being used, stuff
> like that.

There is a lot you can do with just a core dump, but it's not possible
to list all the things -- it depends a lot on the exact problem. A
good troubleshooter comes up with techniques which fit the problem.

/Jorgen
 
