jacob navia
CB Falconer boasts:
Well, he should not read this article, since we are going to see what
happens when, in the middle of the execution of your program, you see
"Segmentation fault"
And nothing else. Your program doesn't exist any more, and it is up
to you to find *where* in those thousands of lines the fault lies.
First, you have to compile your program for debug, i.e. switch on the
generation of debug information. Personally I always leave that switch
ON, and only tell the linker not to include that information in the
final executable if I am going to ship the program to the customer.
Otherwise, it is always in.
Second, you have to start the debugger, and run the program in the same
environment as the one where it crashed. This is important. A slight
difference in environment could mean that the crash disappears, or
that you get another crash, and not the one you were looking
for...

Let's suppose for now that you can reproduce the crash.
A "crash" is actually a signal from the CPU to the operating system
(an interrupt) that alters the flow of your program, since an interrupt
routine takes over.
This interrupt can be triggered by a wrong address, for instance, and in
the old times when I started learning C, the processor would signal that
the system bus did not accept some address. The program would stop with
"bus error".
Nowadays it is still the same. Only the appearances have changed. An
interrupt routine informs the OS that your program is trying to access
some bad address, the OS notices that the process is running under a
debugger, and then it notifies the debugger that the program under
debug crashed. The details of how this is done change from OS to OS,
but there is ONE piece of information that MUST reach the debugger
(at least): the program counter.
The debugger then searches its database of debug information,
specifically the tables that associate code addresses with line
numbers, to see where this address appears.
In the easy cases, it will find the address in question and will then be
able to display the source file and line where the crash happened.
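To make the lookup concrete, here is a minimal sketch in C of an
address-to-line search. The LineEntry layout, the function name
find_line and the end_of_code parameter are all made up for this
illustration; real debug formats are much more compact, and this is not
how lcc-win or gdb actually store the information.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical line-table entry: the first code address generated for
   one source line. */
typedef struct {
    uintptr_t   start;  /* first address belonging to this line */
    const char *file;
    int         line;
} LineEntry;

/* Returns the entry covering pc, or NULL when pc lies outside
   [table[0].start, end_of_code), i.e. outside the code for which we
   have debug information.
   The table must be sorted by ascending start address. */
const LineEntry *find_line(const LineEntry *table, size_t n,
                           uintptr_t end_of_code, uintptr_t pc)
{
    size_t lo = 0, hi = n;

    if (n == 0 || pc < table[0].start || pc >= end_of_code)
        return NULL;

    /* Binary search for the last entry whose start is <= pc. */
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if (table[mid].start <= pc)
            lo = mid;
        else
            hi = mid;
    }
    return &table[lo];
}
```

A NULL result is exactly the hard case discussed next: the faulting
address is not in any code the debugger knows about.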
In many cases, however, the debugger will NOT find any correspondence
between the address of the crash and the program. In those cases, the
debugger can either
1) Give up, and just display the address where the program crashed,
as gdb does sometimes, or
2) Try to use the st*** (Oh, excuse me, I was just going to use that
5 letter word)... Try again:
Try to use the space allocated for local variables and follow the
linked list of function activations.
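Following the linked list of function activations can be sketched as
below. This is only an illustration under loud assumptions: the locals
space is modelled as an array of machine words, frame pointers are
indices into that array, and each frame stores the caller's frame
pointer followed by the return address (the classic x86 layout). The
name walk_frames is invented for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Walks a chain of frames stored in a simulated locals space.
   Assumed layout (hypothetical):
     stack[fp]     = frame pointer of the caller (an index into stack)
     stack[fp + 1] = return address into the caller
   Stores at most max_out return addresses in out and returns how many
   were collected. Stops at the first link that looks corrupt. */
size_t walk_frames(const uintptr_t *stack, size_t nwords, size_t fp,
                   uintptr_t *out, size_t max_out)
{
    size_t count = 0;

    while (count < max_out && fp < nwords && nwords - fp >= 2) {
        uintptr_t next = stack[fp];

        out[count++] = stack[fp + 1];

        /* Callers live at higher addresses, so a sane link always moves
           upwards. Anything else means the chain ended or is broken. */
        if (next <= fp)
            break;
        fp = (size_t)next;
    }
    return count;
}
```

The bounds and direction checks are the important part: a debugger that
follows a garbage link blindly will crash itself.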
One way of trying to find the current point is the brutal approach of
the lcc-win debugger. I take the whole locals space and read all of it,
trying to find out if any of the numbers in the stack corresponds to any
address I have in my line number information.
If I find (as is usually the case) an integer that corresponds to a
program address, I assume that this address has been left by a CALL
instruction. To verify this, I read the contents around that address
and verify that a CALL instruction is in there. If the verification
passes I am certain that a CALL was made, and that this call is the
place in the program source code where things went wrong.
This is needed because, obviously, you can pass a wrong pointer to a
system routine that calls another system routine with that pointer, and
the chain can be quite deep until the pointer is actually used.
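The scan described above might look roughly like this. It is a sketch,
not the lcc-win source: the code image and the locals space are passed
in as plain arrays, only the 5-byte x86 near CALL (opcode 0xE8) is
recognized, and the names looks_like_return_address and
find_return_address are invented. Indirect calls use other encodings,
and a data word can pass the test by coincidence, so a real debugger
needs more checks.

```c
#include <stddef.h>
#include <stdint.h>

#define CALL_REL32      0xE8u  /* x86 near call, relative displacement */
#define CALL_REL32_LEN  5u     /* opcode byte + 4 displacement bytes   */

/* Does addr look like a return address, i.e. is it inside the code
   image and immediately preceded by a CALL instruction? */
static int looks_like_return_address(const unsigned char *code,
                                     uintptr_t code_base, size_t code_len,
                                     uintptr_t addr)
{
    size_t off;

    if (addr < code_base || addr - code_base > code_len)
        return 0;
    off = (size_t)(addr - code_base);
    if (off < CALL_REL32_LEN)
        return 0;
    return code[off - CALL_REL32_LEN] == CALL_REL32;
}

/* Scans the words of the locals space, starting at index 0 (the most
   recent data) and moving towards older frames. Returns the index of
   the first plausible return address, or -1 if none is found. */
long find_return_address(const uintptr_t *stack, size_t nwords,
                         const unsigned char *code,
                         uintptr_t code_base, size_t code_len)
{
    size_t i;

    for (i = 0; i < nwords; i++)
        if (looks_like_return_address(code, code_base, code_len, stack[i]))
            return (long)i;
    return -1;
}
```

The word found this way can then be looked up in the line number
information to get the source line of the call.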
Once the debugger has the point where the last user routine was active,
it can display the source code to the user. Local variables can be
retrieved only if the debugger is able to figure out the value of
the frame pointer when this routine was called. In general this is
enormously difficult, since the system routines are highly optimized
code, without stack frames, and may be using the frame pointer as
a normal register to hold values that have nothing to do with stack
frames, values that, if taken at face value, could make the debugger
crash.
---------------------------------------------------------------------
The above is a description of debugging in a workstation-type
environment. In other environments, all of this is much more
difficult.
Take the lcc-win debugger when used with a 16-bit DSP from Analog
Devices (80-90K of RAM + 512K EPROM).
For starters, there is no possibility of conventional breakpoints, the
kind you set by patching the code. EPROMs can only be written in 4K
chunks, and setting/unsetting breakpoints like that is out of the
question.
The DSP is running a version of a FORTH virtual machine, a very
small OS/controller. The code generated from C, then, is just
instructions for this monitor, which provides a single STOPWM
instruction that stops the program and enters the monitor.
This approach has been described by Dave Hanson et al.; see reference
[1]. In the general approach of Dave Hanson, the monitor is not
FORTH-based but just a small C routine.
In this case, the FORTH interpreter is used as the monitor.
In this environment, there is no MMU, and writing to a wrong address
will just destroy some data elsewhere, but not provoke any
visible problem immediately. There are no crashes, at least not
of the kind we saw above.
When the program stops because of a breakpoint, the first thing the
debug monitor does is to see if the address is in the table of 16
active breakpoints. If it is not in there, the program continues
execution.
If we have reached an active breakpoint, the monitor sends a character
through the serial line to lcc-win, which is patiently waiting, on the
PC hooked up to the circuit board, for news from the program. Then, the
debugger asks for the addresses of the key variables, stack contents,
etc. The user sees his/her source code, and the whole thing feels like
Visual Studio.
lcc-win can also send a break (holding the serial line at zero for 8 or
9 bit times, I do not remember exactly). The break provokes a monitor
interrupt, and we start a debugging session as if there were
an active breakpoint.
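The first check the monitor makes when the program stops can be sketched
in C as follows. Only the table size of 16 comes from the description
above; the 16-bit address type, the BreakpointTable layout and the
bp_set, bp_clear and bp_should_stop names are assumptions made for the
example (the real monitor is the FORTH interpreter, not C).

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_BREAKPOINTS 16  /* table size mentioned in the text */

/* Hypothetical table layout; addresses assumed to fit in 16 bits. */
typedef struct {
    uint16_t      addr[MAX_BREAKPOINTS];
    unsigned char active[MAX_BREAKPOINTS];
} BreakpointTable;

/* Activates a breakpoint in the first free slot.
   Returns 1 on success, 0 if all 16 slots are in use. */
int bp_set(BreakpointTable *t, uint16_t addr)
{
    size_t i;

    for (i = 0; i < MAX_BREAKPOINTS; i++) {
        if (!t->active[i]) {
            t->addr[i] = addr;
            t->active[i] = 1;
            return 1;
        }
    }
    return 0;
}

/* Deactivates the breakpoint at addr. Returns 1 if one was found. */
int bp_clear(BreakpointTable *t, uint16_t addr)
{
    size_t i;

    for (i = 0; i < MAX_BREAKPOINTS; i++) {
        if (t->active[i] && t->addr[i] == addr) {
            t->active[i] = 0;
            return 1;
        }
    }
    return 0;
}

/* Called when the program stops and enters the monitor.
   Returns 1 if addr is an active breakpoint (start a debug session),
   0 if the program should simply continue. */
int bp_should_stop(const BreakpointTable *t, uint16_t addr)
{
    size_t i;

    for (i = 0; i < MAX_BREAKPOINTS; i++)
        if (t->active[i] && t->addr[i] == addr)
            return 1;
    return 0;
}
```

With a table this small a linear search is enough, and setting or
clearing a breakpoint only touches RAM, never the EPROM.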
As you can see, the two environments are completely different. It would
be interesting to hear from people who use debuggers in other embedded
systems, and to have them share their experiences here.