debugging question

hpoliset

I have a debugging question about core dumps with signal 4 (Illegal
instruction) messages.

I analyzed the core file with gdb. In plain English, this is the
pattern I observed:

I have a program with a piece of code, let's call it function1(), that gets
called by different callers. This function is executed thousands of times
while the binary runs, and it works fine.

However, under some strange circumstances (which are not consistently
reproducible), if the call sequence happens to have a specific
function_root() in the stack, the binary crashes in function1() with a
signal 4 illegal argument exception. This never happens at the same place in
the code.

The crash stack looks as follows:
function1() {
...some code
......
.....
returncode = function2(.......);   <--- crashes on this line
....
}

When I look at the code in function1() and beyond, it all looks clean.

These are the questions I have:

1) Is it possible to identify from the core dump whether it's stack
corruption? What are the other possibilities?
2) Is there a systematic process for identifying the victim of
the corruption and then the actual culprit?

Please advise.
 
jacob navia

hpoliset said:
I have a debugging question about core dumps with signal 4 (Illegal
instruction) messages. [snip]
However, under some strange circumstances (which are not consistently
reproducible), if the call sequence happens to have a specific
function_root() in the stack, the binary crashes in function1() with a
signal 4 illegal argument exception.

Well, I do not get it. Illegal instruction or illegal argument???

Illegal instruction means in most cases:
1) The return address of the called function was corrupted (stack corruption),
and when executing a return instruction the program jumps to an arbitrary
address, in this case a place that contained no valid instructions, probably
somewhere in a data area.
To trap this kind of problem, lcc-win32 offers a g4 debug level that injects
instructions before each return to save the frame pointer in a global variable,
which is later interpreted to output the name of the faulty function. Other
compilers may use different schemes.

2) A function pointer was used that wasn't initialized, or that became invalid
because the code it points to was unloaded (a dynamic library unloaded too early).

3) A function declared with the _stdcall calling convention was called without
a prototype or with a wrong prototype. Under Windows this is caught by the
linker, but other systems may be different.

Look for temporary buffer variables like:
char tmpbuf[28];

Are there any strcpy, memcpy, or similar functions called with those variables?
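
As a rough illustration of what to replace such calls with, here is a minimal sketch of a length-checked copy. The helper name copy_checked() and its return convention are my own assumptions, not something from the original program; the point is only that every write into a buffer like tmpbuf is bounded by the buffer's real size, and that an oversized input is reported instead of silently overwriting the stack.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not from the thread): copy src into dst without ever
 * writing more than dst_size bytes. Returns 0 on success, -1 if src did not
 * fit, in which case dst holds a truncated, NUL-terminated prefix.
 */
static int copy_checked(char *dst, size_t dst_size, const char *src)
{
    size_t len;

    assert(dst != NULL && src != NULL && dst_size > 0);

    len = strlen(src);
    if (len >= dst_size) {
        memcpy(dst, src, dst_size - 1);   /* copy only what fits */
        dst[dst_size - 1] = '\0';
        return -1;                        /* caller can log the oversized input */
    }
    memcpy(dst, src, len + 1);            /* includes the terminating NUL */
    return 0;
}
```

A call would look like copy_checked(tmpbuf, sizeof tmpbuf, input). Logging every -1 result is one way to find out which caller occasionally passes data that is too long.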

Look for any function pointer calls. Are those pointers valid?

hpoliset said:
This never happens at the same place in
the code.

This would be consistent with a buffer overflow that only happens in special
circumstances, not always.
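
If no compiler support is available, one way to catch such an intermittent overflow closer to its origin is to put marker words around the suspect buffer and check them before returning. The following is only a sketch under assumptions: struct guarded_buf, GUARD_VALUE, and both function names are hypothetical, and the 28-byte size simply mirrors the tmpbuf example above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define GUARD_VALUE 0xDEADBEEFu   /* arbitrary marker pattern (an assumption) */

/* Hypothetical wrapper: the buffer sits between two marker words. */
struct guarded_buf {
    unsigned int head;
    char         data[28];
    unsigned int tail;
};

/* Set both markers and clear the buffer. */
static void guard_init(struct guarded_buf *g)
{
    assert(g != NULL);
    g->head = GUARD_VALUE;
    memset(g->data, 0, sizeof g->data);
    g->tail = GUARD_VALUE;
}

/* Returns 1 while both markers are intact, 0 once either was overwritten. */
static int guard_intact(const struct guarded_buf *g)
{
    assert(g != NULL);
    return g->head == GUARD_VALUE && g->tail == GUARD_VALUE;
}
```

Calling guard_intact() just before each return of the suspect function (and aborting when it yields 0) makes the program stop in the function that did the damage, instead of crashing later at an unrelated place.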

Good luck

jacob
 
Malcolm

hpoliset said:
However, under some strange circumstances (which are not
consistently reproducible), if the call sequence happens to have a
specific function_root() in the stack, the binary crashes in function1()
with a signal 4 illegal argument exception. This never happens at the
same place in the code.

If the crash doesn't show up consistently, that makes it very hard to debug.
Try to create a situation where the bug is reproducible, maybe by commenting
out sections of code or by feeding the program the same data set.

hpoliset said:
1) Is it possible to identify from the core dump whether it's stack
corruption? What are the other possibilities?

If the error code is "illegal argument", that would imply that the arguments
to the function have been corrupted. Arguments are not necessarily passed on
the stack. It could be that the caller is loading bad data into a register.

hpoliset said:
2) Is there any systematic process to be followed to identify the victim
of the corruption followed by the actual culprit

Unfortunately you are unlikely to have a "stack integrity check" function
supplied. A debugger will tell you where the program crashed, so you know
that the arguments to that function are corrupt. This probably means that
the error is in the caller. (You could try writing a stub version of the
function that crashed, one that adds one to each of its arguments, and see if
the program still crashes in the same place. This would rule out the callee
as a source of the crash.)
However, the calling function could be fine but itself be called with
invalid data. There is no magic bullet. You just have to trace back the
program flow until you come to the problem.
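
The stub idea could look roughly like the sketch below. The thread does not show the real parameter list of function2(), so the two-argument signature, the global variable names, and the return value of 0 are all assumptions made purely for illustration.

```c
#include <assert.h>

/* Last values seen by the stub, kept in globals so they can be inspected
 * from the debugger or from a core file. */
static volatile unsigned int  stub_seen_a;
static volatile unsigned long stub_seen_b;

/*
 * Hypothetical stand-in for function2(). The real parameter list is not
 * shown in the thread, so this two-argument signature is an assumption.
 * Unsigned types are used so that adding one wraps around instead of
 * invoking undefined behaviour, even if the caller passes garbage.
 */
static int function2_stub(unsigned int a, unsigned long b)
{
    stub_seen_a = a + 1u;   /* touch every argument, as suggested */
    stub_seen_b = b + 1ul;
    return 0;               /* assumed "success" return code */
}

/* Quick sanity check that the stub itself behaves as described. */
static void function2_stub_selfcheck(void)
{
    assert(function2_stub(1u, 2ul) == 0);
    assert(stub_seen_a == 2u && stub_seen_b == 3ul);
}
```

If the program still dies at the same call with the stub linked in place of the real function, the callee is effectively ruled out and attention can move to the caller and whatever ran before it.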
 
