Weird segfaults

Naveen Parihar

I have a binary that runs on most of our servers but segfaults on one of
them. Even on that machine, the binary sometimes runs successfully but
segfaults most of the time. While debugging this behaviour with gdb, I
found that the segfaults occur at random places in the code. For
example, in one case a pointer was not initialized to NULL in a
constructor (it was pointing to 0x8 instead of 0x0), even though
execution ran through the initialization code.

Has anyone come across this kind of behavior? Any help and/or
information is highly appreciated.

-Naveen

Here is the configuration of the machine where the problem occurs:
isip213_[1]: g++ -v
Reading specs from /usr/local/lib/gcc-lib/i386-pc-solaris2.7/3.2.1/specs
Configured with: ../gcc-3.2.1/configure --prefix=/usr/local
--with-gnu-as --with-as=/usr/local/bin/as --with-gnu-ld
--with-ld=/usr/local/bin/ld
Thread model: posix
gcc version 3.2.1
isip213_[1]: uname -a
SunOS isip213.isip.msstate.edu 5.7 Generic_106542-20 i86pc i386
isip213_[1]: psrinfo -v
Status of processor 0 as of: 04/01/04 14:38:57
Processor has been on-line since 03/29/04 16:22:01.
The i386 processor operates at 826 MHz,
and has an i387 compatible floating point processor.
Status of processor 1 as of: 04/01/04 14:38:57
Processor has been on-line since 03/29/04 16:22:06.
The i386 processor operates at 826 MHz,
and has an i387 compatible floating point processor.
isip213_[1]:
 
Chris Theis

Naveen Parihar said:
I have a binary that runs on most of our servers but segfaults on one of
them. Even on that machine, the binary sometimes runs successfully but
segfaults most of the time. While debugging this behaviour with gdb, I
found that the segfaults occur at random places in the code. For
example, in one case a pointer was not initialized to NULL in a
constructor (it was pointing to 0x8 instead of 0x0), even though
execution ran through the initialization code.

Has anyone come across this kind of behavior? Any help and/or
information is highly appreciated.

-Naveen

Did you step through the initialization and find that the pointer was still
not NULL afterwards? I somehow doubt that, unless your compiler is very
broken. Could it be that you're missing an appropriate copy constructor?
Without some example code that reproduces the problem, one cannot really
give a precise answer.
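
To illustrate what a missing copy constructor could look like, here is a
minimal sketch. IntList and all of its members are hypothetical names made
up for this example; this is not the ISIP SingleLinkedList code. If the copy
constructor and assignment operator below were left out, the
compiler-generated versions would copy the raw head and tail pointers, so two
objects would share (and later both delete) the same nodes, which can show up
as seemingly random crashes.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical list class used only for illustration.
class IntList {
public:
    // Default constructor: both pointers start out as NULL.
    IntList() : head_(NULL), tail_(NULL), size_(0) {}

    // Deep copy: each object gets its own nodes.
    IntList(const IntList& other) : head_(NULL), tail_(NULL), size_(0) {
        for (const Node* n = other.head_; n != NULL; n = n->next) {
            push_back(n->value);
        }
    }

    // Copy-and-swap assignment; safe for self-assignment.
    IntList& operator=(const IntList& other) {
        if (this != &other) {
            IntList tmp(other);
            swap(tmp);
        }
        return *this;
    }

    ~IntList() { clear(); }

    void push_back(int v) {
        Node* n = new Node(v);
        if (tail_ == NULL) {
            head_ = n;
        } else {
            tail_->next = n;
        }
        tail_ = n;
        ++size_;
    }

    // Precondition: the list is not empty.
    int back() const { return tail_->value; }

    std::size_t size() const { return size_; }
    bool empty() const { return head_ == NULL && tail_ == NULL; }

    // Frees all nodes and resets the pointers to NULL.
    void clear() {
        while (head_ != NULL) {
            Node* next = head_->next;
            delete head_;
            head_ = next;
        }
        tail_ = NULL;
        size_ = 0;
    }

    void swap(IntList& o) {
        Node* h = head_; head_ = o.head_; o.head_ = h;
        Node* t = tail_; tail_ = o.tail_; o.tail_ = t;
        std::size_t s = size_; size_ = o.size_; o.size_ = s;
    }

private:
    struct Node {
        int value;
        Node* next;
        explicit Node(int v) : value(v), next(NULL) {}
    };

    Node* head_;
    Node* tail_;
    std::size_t size_;
};
```

With this in place, copying a list and then modifying the original leaves the
copy untouched. Whether this applies to your classes I cannot tell without
seeing the code.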

Regards
Chris
 
Naveen Parihar

Chris Theis said:
Did you step through the initialization and find that the pointer was still
not NULL afterwards? I somehow doubt that, unless your compiler is very
broken. Could it be that you're missing an appropriate copy constructor?
Without some example code that reproduces the problem, one cannot really
give a precise answer.

Thanks for the reply.

This problem occurs while using a fairly complicated speech
recognition system. The code base and runtime are so large that it
is not feasible to step through every single step of execution. Note
that the segfaults occur randomly.
One instance of the segfault occurred immediately after declaring a
Vector of about 10,000 SingleLinkedList objects. The documentation for
both Vector and SingleLinkedList is available here:

http://www.isip.msstate.edu/projects/speech/software/documentation/

It turns out that the last (tail) pointer of one element of the vector,
a single linked list object, was not initialized to NULL. All the other
elements (single linked lists) that I looked at were properly initialized
through the default constructor.
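
Instead of inspecting elements by hand in gdb, a check along the following
lines could scan all of them right after construction. This is only a sketch
under assumptions: ListStub, Node, first_bad_element and assert_all_clean are
made-up names, and std::vector stands in for our own Vector class, whose
interface differs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a singly linked list with head and tail pointers.
struct ListStub {
    struct Node {
        Node* next;
    };

    Node* head;
    Node* tail;

    ListStub() : head(NULL), tail(NULL) {}
};

// Returns the index of the first element whose head or tail pointer is not
// NULL, or lists.size() if every element looks freshly initialized.
std::size_t first_bad_element(const std::vector<ListStub>& lists) {
    for (std::size_t i = 0; i < lists.size(); ++i) {
        if (lists[i].head != NULL || lists[i].tail != NULL) {
            return i;
        }
    }
    return lists.size();
}

// Aborts (in builds without NDEBUG) if any element is not clean.
void assert_all_clean(const std::vector<ListStub>& lists) {
    assert(first_bad_element(lists) == lists.size());
}
```

Calling assert_all_clean right after the declaration would at least show
whether the bad pointer is already there after construction or is written
later by something else.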

All these segfaults occur on one specific machine. We have other servers
that are identical (in hardware and OS), and everything runs fine on them.
Also, we share the same development binaries (compiler, linker, etc.)
across all the machines, so the compiler should not be an issue.

-Naveen
 
John Harrison

It turns out that the last (tail) pointer of one element of the vector,
a single linked list object, was not initialized to NULL. All the other
elements (single linked lists) that I looked at were properly initialized
through the default constructor.

All these segfaults occur on one specific machine. We have other servers
that are identical (in hardware and OS), and everything runs fine on them.
Also, we share the same development binaries (compiler, linker, etc.)
across all the machines, so the compiler should not be an issue.

-Naveen

Your errant server must be getting bombarded by cosmic rays or other ionizing
radiation. Try lead shielding; it is cheap.

john
 
