perl segfault - how to troubleshoot

Discussion in 'Perl Misc' started by James Harris, Dec 2, 2008.

  1. James Harris

    James Harris Guest

    Since a few days ago perl segfaults when running certain scripts. The
    scripts were ok before the problem started and have not been changed.
    I have checked my (Ubuntu) system for updates. None were made near the
    time of the first incidence of the problem so I started looking more
    widely.

    I found that even if I try to check syntax on one of the scripts that
    fails perl segfaults.

    $ perl -c mythrename.pl
    Segmentation fault
    $

    If I try to run the debugger it doesn't get as far as prompting for
    the first line

    $ perl -d mythrename.pl
    Loading DB routines from perl5db.pl version 1.28
    Editor support available.
    Enter h or `h h' for help, or `man perldebug' for more help.

    At this point CPU usage goes to 100%. Since I cannot even get the
    debugger to start at the first line where do I go next to try and fix
    this?

    Anyone else had similar problems recently - within a week?

    James
     
    James Harris, Dec 2, 2008
    #1

  2. James Harris

    Guest

    On Mon, 1 Dec 2008 17:55:53 -0800 (PST), James Harris <> wrote:

    >Since a few days ago perl segfaults when running certain scripts. The
    >scripts were ok before the problem started and have not been changed.
    >I have checked my (Ubuntu) system for updates. None were made near the
    >time of the first incidence of the problem so I started looking more
    >widely.
    >
    >I found that even if I try to check syntax on one of the scripts that
    >fails perl segfaults.
    >
    >$ perl -c mythrename.pl
    >Segmentation fault
    >$
    >
    >If I try to run the debugger it doesn't get as far as prompting for
    >the first line
    >
    >$ perl -d mythrename.pl
    >Loading DB routines from perl5db.pl version 1.28
    >Editor support available.
    >Enter h or `h h' for help, or `man perldebug' for more help.
    >
    >At this point CPU usage goes to 100%. Since I cannot even get the
    >debugger to start at the first line where do I go next to try and fix
    >this?
    >
    >Anyone else had similar problems recently - within a week?
    >
    >James


    Switch to Windows where you can see OS faults.
    Stay away from porno sites or disable active content.
    Run virus scan.
    Run memtest from the bios.
    Reformat/reinstall the OS/Perl.

    Er, that's all I can think of.

    Good luck!

    sln
     
    , Dec 2, 2008
    #2

  3. James Harris

    Tim Greer Guest

    James Harris wrote:

    > Since a few days ago perl segfaults when running certain scripts. The
    > scripts were ok before the problem started and have not been changed.
    > I have checked my (Ubuntu) system for updates. None were made near the
    > time of the first incidence of the problem so I started looking more
    > widely.
    >
    > I found that even if I try to check syntax on one of the scripts that
    > fails perl segfaults.
    >
    > $ perl -c mythrename.pl
    > Segmentation fault
    > $
    >
    > If I try to run the debugger it doesn't get as far as prompting for
    > the first line
    >
    > $ perl -d mythrename.pl
    > Loading DB routines from perl5db.pl version 1.28
    > Editor support available.
    > Enter h or `h h' for help, or `man perldebug' for more help.
    >
    > At this point CPU usage goes to 100%. Since I cannot even get the
    > debugger to start at the first line where do I go next to try and fix
    > this?
    >
    > Anyone else had similar problems recently - within a week?
    >
    > James


    I doubt anyone else has had similar problems recently just due to
    running Perl (did you perform an install/upgrade of the system, any
    modules or Perl itself recently?) This sounds like a system problem.
    Run a memory checker. Run strace (or similar) on the process to see
    what it's doing. Check the running process(es), if possible (check
    top, ps, pstree, lsof, etc.) if relevant. See how much memory and CPU
    is free before running the script(s) and how much the script(s) try and
    use. Check your dmesg and system logs for error reporting, etc. and
    ensure your kernel is configured to have proper error logging. Are
    these scripts doing anything interesting, using any specific
    commands/binaries, doing anything intensive? You said some scripts seg
    fault and some don't. What types of scripts run and what types fail?
    Please provide relevant information pertaining to this potentially
    being an issue with Perl.
    --
    Tim Greer, CEO/Founder/CTO, BurlyHost.com, Inc.
    Shared Hosting, Reseller Hosting, Dedicated & Semi-Dedicated servers
    and Custom Hosting. 24/7 support, 30 day guarantee, secure servers.
    Industry's most experienced staff! -- Web Hosting With Muscle!
     
    Tim Greer, Dec 2, 2008
    #3
  4. [A complimentary Cc of this posting was sent to
    James Harris
    <>], who wrote in article <>:
    > $ perl -c mythrename.pl
    > Segmentation fault


    Try running under debugger:

    gdb `which perl` -c mythrename.pl

    > $ perl -d mythrename.pl
    > Loading DB routines from perl5db.pl version 1.28
    > Editor support available.
    > Enter h or `h h' for help, or `man perldebug' for more help.


    Try enabling autotrace. Report.

    Hope this helps,
    Ilya
     
    Ilya Zakharevich, Dec 2, 2008
    #4
  5. James Harris

    smallpond Guest

    On Dec 1, 8:55 pm, James Harris <> wrote:
    > Since a few days ago perl segfaults when running certain scripts. The
    > scripts were ok before the problem started and have not been changed.
    > I have checked my (Ubuntu) system for updates. None were made near the
    > time of the first incidence of the problem so I started looking more
    > widely.
    >
    > I found that even if I try to check syntax on one of the scripts that
    > fails perl segfaults.
    >
    > $ perl -c mythrename.pl
    > Segmentation fault
    > $
    >
    > If I try to run the debugger it doesn't get as far as prompting for
    > the first line
    >
    > $ perl -d mythrename.pl
    > Loading DB routines from perl5db.pl version 1.28
    > Editor support available.
    > Enter h or `h h' for help, or `man perldebug' for more help.
    >
    > At this point CPU usage goes to 100%. Since I cannot even get the
    > debugger to start at the first line where do I go next to try and fix
    > this?
    >
    > Anyone else had similar problems recently - within a week?
    >
    > James



    What perl version?
    Was DB compiled for this perl version?
    What 'use' or 'require' statements do you have?
    Are you using threads?

    -S
     
    smallpond, Dec 2, 2008
    #5
  6. James Harris

    Guest

    James Harris <> wrote:
    > Since a few days ago perl segfaults when running certain scripts. The
    > scripts were ok before the problem started and have not been changed.
    > I have checked my (Ubuntu) system for updates. None were made near the
    > time of the first incidence of the problem so I started looking more
    > widely.
    >
    > I found that even if I try to check syntax on one of the scripts that
    > fails perl segfaults.
    >
    > $ perl -c mythrename.pl
    > Segmentation fault
    > $


    I'd run:

    strace perl -c mythrename.pl

    And capture the output. If it wasn't obvious what went wrong based on the
    last thing before the segfault, then I'd grep out all of the file opens
    (system libraries and such), and look at all those files to see if any
    had changed recently.
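
    A minimal Python sketch of that "grep out the file opens and check
    what changed" step. It assumes the trace was saved to a file first
    (for example with strace's -o option); the names opened_files and
    recently_changed are hypothetical helpers, not part of any tool.

```python
import os
import re
import time

# Matches successful open()/openat() lines in strace output, for example:
#   open("/usr/lib/perl/5.8/auto/IO/IO.so", O_RDONLY) = 9
# Failed opens end in "= -1 ENOENT ..." and are deliberately not matched.
OPEN_RE = re.compile(r'^open(?:at)?\((?:[A-Z_]+,\s*)?"([^"]+)".*\)\s*=\s*(\d+)')


def opened_files(strace_lines):
    """Return the paths of successfully opened files, in first-seen order."""
    seen = []
    for line in strace_lines:
        match = OPEN_RE.match(line.strip())
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


def recently_changed(paths, days=14, now=None):
    """Return (path, mtime) pairs for existing files modified within `days` days."""
    now = time.time() if now is None else now
    cutoff = now - days * 86400
    changed = []
    for path in paths:
        try:
            mtime = os.stat(path).st_mtime
        except OSError:
            continue  # file vanished or is unreadable; skip it
        if mtime >= cutoff:
            changed.append((path, mtime))
    return changed
```

    Typical use would be something like
    recently_changed(opened_files(open("trace.log"))) after running
    strace -o trace.log perl -c mythrename.pl. Note that an unchanged
    mod time does not prove the file contents on disk are intact.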

    Xho

    --
    -------------------- http://NewsReader.Com/ --------------------
    The costs of publication of this article were defrayed in part by the
    payment of page charges. This article must therefore be hereby marked
    advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate
    this fact.
     
    , Dec 2, 2008
    #6
  7. [A complimentary Cc of this posting was NOT [per weedlist] sent to
    Ilya Zakharevich
    <>], who wrote in article <gh2pnm$3mh$>:
    > Try running under debugger:
    >
    > gdb `which perl` -c mythrename.pl


    Oops: IIRC, gdb does not allow doing this the simple way. One
    needs something like

    gdb --args `which perl` -c mythrename.pl

    or

    gdb `which perl`
    run -c mythrename.pl
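
    The same session can also be run non-interactively. The sketch below
    builds a gdb command line using gdb's --batch, -ex and --args
    options; the helper names and the particular gdb commands chosen
    (run, bt, info registers) are assumptions for illustration.

```python
import subprocess


def gdb_batch_command(program, args, gdb_commands=("run", "bt", "info registers")):
    """Build an argv list that runs `program args...` under gdb in batch mode."""
    cmd = ["gdb", "--batch"]
    for gdb_command in gdb_commands:
        cmd += ["-ex", gdb_command]  # each -ex is executed in order
    cmd += ["--args", program, *args]  # everything after --args belongs to the program
    return cmd


def run_under_gdb(program, args):
    """Run the program under gdb and return gdb's combined output as text."""
    result = subprocess.run(
        gdb_batch_command(program, args),
        capture_output=True,
        text=True,
    )
    return result.stdout + result.stderr
```

    For example, run_under_gdb("/usr/bin/perl", ["-c", "mythrename.pl"])
    should print the faulting frame and a backtrace if the program
    receives SIGSEGV, provided gdb is installed.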

    Yours,
    Ilya
     
    Ilya Zakharevich, Dec 2, 2008
    #7
  8. James Harris

    James Harris Guest

    On 2 Dec, 16:57, wrote:
    > James Harris <> wrote:
    > > Since a few days ago perl segfaults when running certain scripts. The
    > > scripts were ok before the problem started and have not been changed.
    > > I have checked my (Ubuntu) system for updates. None were made near the
    > > time of the first incidence of the problem so I started looking more
    > > widely.

    >
    > > I found that even if I try to check syntax on one of the scripts that
    > > fails perl segfaults.

    >
    > > $ perl -c mythrename.pl
    > > Segmentation fault
    > > $

    >
    > I'd run:
    >
    > strace perl -c mythrename.pl
    >
    > And capture the output. If it wasn't obvious what went wrong based on the
    > last thing before the segfault, then I'd grep out all of the file opens
    > (system libraries and such), and look at all those files to see if any
    > had changed recently.


    Great suggestion. I'd never heard of strace but it looks like the
    quickest way to track down what was happening. The output ends as
    follows

    stat64("/usr/lib/perl/5.8/auto/IO/IO.so", {st_mode=S_IFREG|0644, st_size=15580, ...}) = 0
    stat64("/usr/lib/perl/5.8/auto/IO/IO.bs", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
    open("/usr/lib/perl/5.8/auto/IO/IO.so", O_RDONLY) = 9
    read(9, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0000\22\0"..., 512) = 512
    fstat64(9, {st_mode=S_IFREG|0644, st_size=15580, ...}) = 0
    mmap2(NULL, 18548, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 9, 0) = 0xb7c09000
    mmap2(0xb7c0d000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 9, 0x3) = 0xb7c0d000
    close(9) = 0
    brk(0x83e5000) = 0x83e5000
    read(8, "sage: $io->stat()\';\n stat($_["..., 4096) = 3496
    read(8, "", 4096) = 0
    close(8) = 0
    stat64("/etc/perl/Socket.pmc", 0xbf9eb05c) = -1 ENOENT (No such file or directory)
    stat64("/etc/perl/Socket.pm", 0xbf9eaf6c) = -1 ENOENT (No such file or directory)
    stat64("/usr/local/lib/perl/5.8.8/Socket.pmc", 0xbf9eb05c) = -1 ENOENT (No such file or directory)
    stat64("/usr/local/lib/perl/5.8.8/Socket.pm", 0xbf9eaf6c) = -1 ENOENT (No such file or directory)
    stat64("/usr/local/share/perl/5.8.8/Socket.pmc", 0xbf9eb05c) = -1 ENOENT (No such file or directory)
    stat64("/usr/local/share/perl/5.8.8/Socket.pm", 0xbf9eaf6c) = -1 ENOENT (No such file or directory)
    stat64("/usr/lib/perl5/Socket.pmc", 0xbf9eb05c) = -1 ENOENT (No such file or directory)
    stat64("/usr/lib/perl5/Socket.pm", 0xbf9eaf6c) = -1 ENOENT (No such file or directory)
    stat64("/usr/share/perl5/Socket.pmc", 0xbf9eb05c) = -1 ENOENT (No such file or directory)
    stat64("/usr/share/perl5/Socket.pm", 0xbf9eaf6c) = -1 ENOENT (No such file or directory)
    stat64("/usr/lib/perl/5.8/Socket.pmc", 0xbf9eb05c) = -1 ENOENT (No such file or directory)
    stat64("/usr/lib/perl/5.8/Socket.pm", {st_mode=S_IFREG|0644, st_size=3514, ...}) = 0
    open("/usr/lib/perl/5.8/Socket.pm", O_RDONLY|O_LARGEFILE) = 8
    ioctl(8, SNDCTL_TMR_TIMEBASE or TCGETS, 0xbf9ead78) = -1 ENOTTY (Inappropriate ioctl for device)
    _llseek(8, 0, [0], SEEK_CUR) = 0
    read(8, "package Socket;\n\nour($VERSION, @"..., 4096) = 3514
    read(8, "", 4096) = 0
    close(8) = 0
    stat64("/usr/lib/perl/5.8/auto/Socket/Socket.so", {st_mode=S_IFREG|0644, st_size=19676, ...}) = 0
    stat64("/usr/lib/perl/5.8/auto/Socket/Socket.bs", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
    open("/usr/lib/perl/5.8/auto/Socket/Socket.so", O_RDONLY) = 8
    read(8, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\360\16"..., 512) = 512
    fstat64(8, {st_mode=S_IFREG|0644, st_size=19676, ...}) = 0
    mmap2(NULL, 22644, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 8, 0) = 0xb7c03000
    mmap2(0xb7c08000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 8, 0x4) = 0xb7c08000
    close(8) = 0
    --- SIGSEGV (Segmentation fault) @ 0 (0) ---
    +++ killed by SIGSEGV +++
    Process 18883 detached

    Does this mean that the segfault occurred as a result of the close(8)
    call - or at least that close() was the last system call prior to the
    fault?

    Or would the segfault have happened while executing the next call
    which is not shown? (From the man page it looks as though strace will
    trap signals and still print the failing call after a signal
    occurs ... but I'm not sure I'm reading it correctly.)

    James
     
    James Harris, Dec 2, 2008
    #8
  9. On 2008-12-02, James Harris <> wrote:
    > Since a few days ago perl segfaults when running certain scripts. The
    > scripts were ok before the problem started and have not been changed.
    > I have checked my (Ubuntu) system for updates. None were made near the
    > time of the first incidence of the problem so I started looking more
    > widely.


    (just for the record) The motto of c.t.t is: "Please provide complete
    minimal example that clearly exhibits your problem". Point.

    *CUT*

    [To: all]

    perl -wle '
    $x = q|abc|;
    q|zyx| =~ m[(??{ q|zyx| =~ m,[$x-]+, })]'
    Out of memory!

    If B<perl> is replaced with B<debugperl> (that's Debian's Perl built
    with I<DEBUGGING> enabled), then the tail of the output is:

    Matching embedded REx "" against ""...
    104520 <zyx%0%0%0%0%0!%0%0%0%340%255P%tX%255P%t%320%355%16%10%3%0%0> <>|
    1: NOTHING(2)
    104520 <zyx%0%0%0%0%0!%0%0%0%340%255P%tX%255P%t%320%355%16%10%3%0%0> <>|
    2: END(0)
    EVAL trying tail ... 0
    104520 <zyx%0%0%0%0%0!%0%0%0%340%255P%tX%255P%t%320%355%16%10%3%0%0> <>|
    4: END(0)
    Freeing REx: ""
    Match successful!
    panic: malloc at -e line 1.
    Freeing REx: "(??{ q|zyx| =~ m,[$x-]+, })"
    Freeing REx: "[abc-]+"
    panic: free from wrong pool.

    Some time ago, when I experimented with that construct (I insist, I had
    a reason), that beast segfaulted too. This sample doesn't. That
    wasn't a one-liner, after all.

    --
    Torvalds' goal for Linux is very simple: World Domination
     
    Eric Pozharski, Dec 2, 2008
    #9
  10. James Harris

    James Harris Guest

    On 2 Dec, 07:46, Tim Greer <> wrote:
    > James Harris wrote:
    > > Since a few days ago perl segfaults when running certain scripts. The
    > > scripts were ok before the problem started and have not been changed.
    > > I have checked my (Ubuntu) system for updates. None were made near the
    > > time of the first incidence of the problem so I started looking more
    > > widely.

    ....
    > > Anyone else had similar problems recently - within a week?

    >
    > I doubt anyone else has had similar problems recently just due to
    > running Perl (did you perform an install/upgrade of the system, any
    > modules or Perl itself recently?)


    No upgrades. This is really weird. From syslog the problem started on
    Nov 27th at 17:56:54 with the following entry

    $ gunzip -c syslog.3.gz | grep segfault | more
    Nov 27 17:56:54 s01 kernel: [678797.925818] maps.pl[21703]: segfault at 00000000 eip b7bdd004 esp bf8a1e8f error 6

    Before that there was no segfault - at least for the two days the logs
    go back. (I have since - I think - changed the log rotate frequency
    for syslog to weekly rather than daily.)

    November's system upgrades were on Nov 20, Nov 23, and Nov 29 (and the
    fault started on 27th) so the only upgrade prior to the fault starting
    was that on the 23rd about four days beforehand.

    After that there are lots of similar segfault messages. Extracting the
    list of affected modules gives

    maps.pl
    bbccurrentxml.pl
    bbcthreedayxml.pl
    optimize_mythdb.pl
    tv_grab_uk_rt (a Perl script)
    mythrename.pl
    popcon-upload (a Perl script)

    Apart from popcon the rest seem to be related to mythtv or xmltv. They
    get upgraded along with the rest of the system so the last possible
    changes to them should also have been approx four days prior to the
    error starting.
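
    Extracting that list by hand gets tedious, so here is a small Python
    sketch. The regular expression is modelled only on the kernel lines
    quoted in this thread (32-bit x86, "eip"/"esp" wording); other kernel
    versions print the message differently, and affected_programs is a
    hypothetical helper name.

```python
import re

# Modelled on the lines quoted in this thread, e.g.
#   maps.pl[21703]: segfault at 00000000 eip b7bdd004 esp bf8a1e8f error 6
SEGV_RE = re.compile(
    r"(\S+)\[(\d+)\]: segfault at ([0-9a-f]+) "
    r"eip ([0-9a-f]+) esp ([0-9a-f]+) error (\d+)"
)


def affected_programs(log_lines):
    """Return the names of programs that segfaulted, in first-seen order."""
    names = []
    for line in log_lines:
        match = SEGV_RE.search(line)
        if match and match.group(1) not in names:
            names.append(match.group(1))
    return names
```

    Feeding it the decompressed syslog (for instance via gzip.open in
    text mode) should reproduce a list like the one above.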

    > This sounds like a system problem.
    > Run a memory checker. Run strace (or similar) on the process to see
    > what it's doing. Check the running process(es), if possible (check
    > top, ps, pstree, lsof, etc.) if relevant. See how much memory and CPU
    > is free before running the script(s) and how much the script(s) try and
    > use. Check your dmesg and system logs for error reporting, etc. and
    > ensure your kernel is configured to have proper error logging. Are
    > these scripts doing anything interesting, using any specific
    > commands/binaries, doing anything intensive? You said some scripts seg
    > fault and some don't. What types of scripts run and what types fail?
    > Please provide relevant information pertaining to this potentially
    > being an issue with Perl.


    Thanks for all the suggestions. I'll have to check further.

    James
     
    James Harris, Dec 3, 2008
    #10
  11. James Harris

    James Harris Guest

    On 2 Dec, 21:14, Ilya Zakharevich <> wrote:
    ....
    > gdb --args `which perl` -c mythrename.pl
    >
    > or
    >
    > gdb `which perl`
    > run -c mythrename.pl


    Use of gdb is new to me but running the file through to the end (at
    least that's what I think I have done) gives

    (gdb) run
    Starting program: /usr/bin/perl -c mythrename.pl
    (no debugging symbols found)
    (no debugging symbols found)
    (no debugging symbols found)
    (no debugging symbols found)
    (no debugging symbols found)
    [Thread debugging using libthread_db enabled]
    (no debugging symbols found)
    (no debugging symbols found)
    [New Thread 0xb7d568c0 (LWP 19771)]
    (no debugging symbols found)
    (no debugging symbols found)
    (no debugging symbols found)
    (no debugging symbols found)
    (no debugging symbols found)

    Program received signal SIGSEGV, Segmentation fault.
    [Switching to Thread 0xb7d568c0 (LWP 19771)]
    0xb7be0004 in boot_Socket () from /usr/lib/perl/5.8/auto/Socket/Socket.so
    (gdb)

    This sounds like the fault occurred in Socket.so. The file itself has
    not been updated recently

    $ ls -l /usr/lib/perl/5.8/auto/Socket/Socket.so
    -rw-r--r-- 1 root root 19676 2007-11-27 11:08 /usr/lib/perl/5.8/auto/Socket/Socket.so
    $

    so maybe it is being passed a null pointer that it was not before. I
    assume a pointer because the syslog message is

    Dec 3 00:23:17 s01 kernel: [287839.008837] mythrename.pl[19643]: segfault at 00000001 eip b7be5004 esp bfa104ff error 6

    This, I think, shows the instruction pointer as b7be5004 - i.e. not
    null - so presumably 00000001 is a memory address that's being passed
    to Socket.so or generated within it.

    I guess Socket.so is a shared library (shared object?) or similar.
    Maybe gdb can be used to find out if 00000001 is being passed to
    Socket.so (or a routine within it) and track down where 00000001 is
    being generated.

    I wish I could remember what changed, if anything, when the problem
    started....

    Thanks for the pointers to use gdb.

    James
     
    James Harris, Dec 3, 2008
    #11
  12. [A complimentary Cc of this posting was sent to
    James Harris
    <>], who wrote in article <>:
    > Program received signal SIGSEGV, Segmentation fault.
    > [Switching to Thread 0xb7d568c0 (LWP 19771)]
    > 0xb7be0004 in boot_Socket () from /usr/lib/perl/5.8/auto/Socket/Socket.so


    > Dec 3 00:23:17 s01 kernel: [287839.008837] mythrename.pl[19643]:
    > segfault at 00000001 eip b7be5004 esp bfa104ff error 6


    This "at" looks very suspicious to me. I suspect what happens is that
    code at b7be5004 is essentially jump(00000001). (gdb allows you to
    look at the assembler, hint hint.)

    My guess would be that you run under some flavor of Unix, and your
    dynamic linking happens in the usual Unix "Russian roulette" way
    (AIX and [IIRC] HPUX are the exceptions, but for some unfathomable
    reason the default build of Perl under AIX now ALSO uses "Russian
    roulette" dynalinking...).

    [E.g.: some newly installed module has an entry point named the same
    as a "legitimate" entry point in an "expected to be linked in"
    module, and now you are dynalinked with this "stray" module. To
    add insult to injury, this entry point was CODE, and now is DATA -
    so you get a constant 0x1 instead of an address of a subroutine...

    At least this is what I spent hours debugging when a Solaris update
    put a symbol `err' into nls.so...]
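
    One way to look for such a clash is to compare the dynamic symbol
    tables of the libraries involved. The sketch below parses the text
    printed by nm -D --defined-only and reports whether each library
    defines a given symbol as code or as data. The helper names and the
    library names in the usage note are hypothetical.

```python
import subprocess

CODE_TYPES = set("Tt")      # text (code) section
DATA_TYPES = set("DdBbRr")  # initialized data, BSS, read-only data


def parse_nm_output(text):
    """Map symbol name -> nm type letter for every defined symbol in `text`."""
    symbols = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3:  # "address type name"; undefined symbols lack an address
            _, kind, name = parts
            symbols[name] = kind
    return symbols


def kind_of(letter):
    """Classify an nm type letter as 'code', 'data' or 'other'."""
    if letter in CODE_TYPES:
        return "code"
    if letter in DATA_TYPES:
        return "data"
    return "other"


def definitions(symbol, nm_outputs):
    """nm_outputs maps library path -> nm text; return {path: kind} for definers."""
    found = {}
    for path, text in nm_outputs.items():
        table = parse_nm_output(text)
        if symbol in table:
            found[path] = kind_of(table[symbol])
    return found


def nm_dynamic(path):
    """Return the defined dynamic symbols of a shared object, as printed by nm."""
    result = subprocess.run(
        ["nm", "-D", "--defined-only", path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

    Calling definitions("err", {p: nm_dynamic(p) for p in libs}) over the
    libraries a process loads would flag a name that is code in one
    library and data in another, which is the situation described above.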

    Hope this helps,
    Ilya

    P.S. I can imagine many ways why (on a sanely designed architecture)
    an update of a DLL hits the fan only days later.
     
    Ilya Zakharevich, Dec 3, 2008
    #12
  13. [A complimentary Cc of this posting was sent to
    James Harris
    <>], who wrote in article <>:
    > open("/usr/lib/perl/5.8/auto/Socket/Socket.so", O_RDONLY) = 8
    > read(8, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\360\16"...,
    > 512) = 512
    > fstat64(8, {st_mode=S_IFREG|0644, st_size=19676, ...}) = 0
    > mmap2(NULL, 22644, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 8,
    > 0) = 0xb7c03000
    > mmap2(0xb7c08000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|
    > MAP_DENYWRITE, 8, 0x4) = 0xb7c08000
    > close(8) = 0
    > --- SIGSEGV (Segmentation fault) @ 0 (0) ---


    > Does this mean that the segfault occurred as a result of the close(8)
    > call - or at least that close() was the last system call prior to the
    > fault?


    This does not mean anything. Just ignore this listing. Your segfault
    happens in "user code"; strace would not give you any useful
    information, except a rough outline of where in "the program
    execution history" it happens...

    [When you know that segfault happens in boot_Socket, you know that
    the previous successful syscall was opening the module Socket...
    Hmm, on the other hand, I would expect that dlsym() should be called
    somewhere; probably it is done in user space as well, and is not
    reflected in syscalls...]

    Yours,
    Ilya
     
    Ilya Zakharevich, Dec 3, 2008
    #13
  14. [A complimentary Cc of this posting was sent to
    James Harris
    <>], who wrote in article <>:
    > Program received signal SIGSEGV, Segmentation fault.
    > [Switching to Thread 0xb7d568c0 (LWP 19771)]
    > 0xb7be0004 in boot_Socket () from /usr/lib/perl/5.8/auto/Socket/Socket.so


    Hmm, BTW: would

    perl -MSocket -e0

    segfault? If not, look in the listing of loaded modules, and try to
    get a minimal sequence of module loads which segfaults.
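
    A first step toward that minimal sequence can be automated: load
    each module in its own perl process and see which ones die. This is
    only a sketch; try_modules and describe_exit are hypothetical names,
    and it tests modules one at a time rather than in combination.

```python
import signal
import subprocess


def describe_exit(returncode):
    """Translate a subprocess return code into a short description."""
    if returncode < 0:
        # On POSIX, a negative return code means "killed by signal N".
        return "killed by " + signal.Signals(-returncode).name
    return "exit %d" % returncode


def try_modules(modules, perl="perl", runner=subprocess.run):
    """Load each module in a fresh perl process and report how it ended."""
    report = {}
    for module in modules:
        proc = runner([perl, "-M" + module, "-e0"], capture_output=True)
        report[module] = describe_exit(proc.returncode)
    return report
```

    For instance, try_modules(["IO", "Socket"]) would show "killed by
    SIGSEGV" next to any module whose loading alone crashes perl.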

    Yours,
    Ilya
     
    Ilya Zakharevich, Dec 3, 2008
    #14
  15. James Harris

    Guest

    James Harris <> wrote:
    > On 2 Dec, 16:57, wrote:
    > >
    > > I'd run:
    > >
    > > strace perl -c mythrename.pl
    > >
    > > And capture the output. If it wasn't obvious what went wrong based on
    > > the last thing before the segfault, then I'd grep out all of the file
    > > opens (system libraries and such), and look at all those files to see
    > > if any had changed recently.

    >
    > Great suggestion. I'd never heard of strace but it looks like the
    > quickest way to track down what was happening. The output ends as
    > follows
    >

    ....
    > open("/usr/lib/perl/5.8/auto/Socket/Socket.so", O_RDONLY) = 8
    > read(8, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\360\16"...,
    > 512) = 512
    > fstat64(8, {st_mode=S_IFREG|0644, st_size=19676, ...}) = 0
    > mmap2(NULL, 22644, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 8,
    > 0) = 0xb7c03000
    > mmap2(0xb7c08000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|
    > MAP_DENYWRITE, 8, 0x4) = 0xb7c08000
    > close(8) = 0
    > --- SIGSEGV (Segmentation fault) @ 0 (0) ---
    > +++ killed by SIGSEGV +++
    > Process 18883 detached
    >
    > Does this mean that the segfault occurred as a result of the close(8)
    > call - or at least that close() was the last system call prior to the
    > fault?


    No to the first--close(8) completed successfully before the segfault.
    Yes to the second, close(8) was the last system call prior to the fault.
    So the fault probably occurred in "user space".
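
    Reading a saved trace this way is easy to script. A minimal sketch,
    assuming the signal marker starts with "--- SIGSEGV" as in the
    output quoted above (last_syscall_before_signal is a hypothetical
    helper name):

```python
def last_syscall_before_signal(strace_lines, signal_name="SIGSEGV"):
    """Return the last syscall line printed before the '--- SIGNAL' marker.

    Returns None if the marker never appears or nothing precedes it.
    """
    previous = None
    for line in strace_lines:
        text = line.strip()
        if text.startswith("--- " + signal_name):
            return previous
        # Skip blank lines and other "---"/"+++" marker lines.
        if text and not text.startswith(("---", "+++")):
            previous = text
    return None
```

    It only tells you the last completed system call; the fault itself
    happened in whatever user-space code ran afterwards.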

    I'd grep through the trace output for "open"ed .so files and see what their
    mod times are.

    You could use ltrace rather than strace to get even more info, but that
    would be a last resort. (Well actually, learning how to use gdb would be
    my last resort. It might do the job better, but I'd rather use the tools I
    already know than learn new ones.)


    > Or would the segfault have happened while executing the next call
    > which is not shown? (From the man page it looks as though strace will
    > trap signals and still print the failing call after a signal
    > occurs ... but I'm not sure I'm reading it correctly.)


    I think it would print at least something upon the initiation of the system
    call, even if the call did not finish successfully. I can't guarantee it,
    but that is the working assumption I would make.

    Xho

     
    , Dec 3, 2008
    #15
  16. James Harris

    James Harris Guest

    On 3 Dec, 17:06, wrote:
    ....
    > > Does this mean that the segfault occurred as a result of the close(8)
    > > call - or at least that close() was the last system call prior to the
    > > fault?

    >
    > No to the first--close(8) completed successfully before the segfault.
    > Yes to the second, close(8) was the last system call prior to the fault.
    > So the fault probably occurred in "user space".


    To follow up on this, I shut the machine down and ran memtest86+ to
    check the ram. That checked out OK for all tests.

    On restart, however, problems with disk partitions were found. The
    problems reported included

    * Block bitmap differences
    * Free inode counts wrong
    * (Most alarming) Buffer I/O errors, for which the only options were
    a) ignore and b) force rewrite
    * (Most relevant, perhaps, as it relates to Perl's Socket.so though
    not directly):

    /usr/lib/perl/5.8.8/auto/Socket/Socket.so.dpkg-tmp mod time Nov 27, 2007
    has 2 multiply-claimed blocks shared with 0 files

    I spent a while running through the reported problems and then let
    (e2)fsck do the rest. It took some time but on subsequent reboot the Perl
    problem had gone away. I expected to have to reinstall Socket.so at
    least but so far it seems to be OK now. As scripts run I'll keep an
    eye on them. Hopefully they will all work now.

    I've added a Linux group because this has led to other queries:

    1. Is there a way to tell what file systems are corrupt while the
    machine is running normally? - I.e. was Linux (Ubuntu) telling me of
    the faults somewhere?
    2. If it was where does it report this?
    3. If it wasn't why not??? Fsck knew of faults on some of the file
    systems on bootup without having to scan the disks for them. If it
    knows there why not report it sooner?

    Thanks to all in the Perl group for the education in debugging tools.
    I'll find other uses for them.

    James
     
    James Harris, Dec 3, 2008
    #16
  17. James Harris

    James Harris Guest

    On 3 Dec, 02:31, Ilya Zakharevich <> wrote:
    > [A complimentary Cc of this posting was sent to
    > James Harris
    > <>], who wrote in article <>:
    >
    > > Program received signal SIGSEGV, Segmentation fault.
    > > [Switching to Thread 0xb7d568c0 (LWP 19771)]
    > > 0xb7be0004 in boot_Socket () from /usr/lib/perl/5.8/auto/Socket/Socket.so
    > > Dec 3 00:23:17 s01 kernel: [287839.008837] mythrename.pl[19643]:
    > > segfault at 00000001 eip b7be5004 esp bfa104ff error 6

    >
    > This "at" looks very suspicious to me. I suspect what happens is that
    > code at b7be5004 is essentially jump(00000001). (gdb allows you to
    > look at the assembler, hint hint.)


    Good call. The instruction at EIP was

    mov [ecx], edx

    and ecx held, not surprisingly, the value 1. From the disassembly, ecx
    was loaded a few instructions earlier, after a call to a function.
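
    That matches the "error 6" at the end of the kernel message. On x86
    the page-fault error code is a bit field: bit 0 distinguishes a
    protection violation from a page that is not present, bit 1 a write
    from a read, and bit 2 user mode from kernel mode. A small sketch of
    the decoding (the function name is hypothetical):

```python
def decode_page_fault_error(code):
    """Decode the low three bits of an x86 page-fault error code."""
    return {
        "cause": "protection violation" if code & 1 else "page not present",
        "access": "write" if code & 2 else "read",
        "mode": "user" if code & 4 else "kernel",
    }
```

    decode_page_fault_error(6) gives a user-mode write to a page that is
    not present, which is what a store through a pointer holding the
    value 1 would produce.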

    It's moot now as, I'm glad to say, the problem has gone away after
    errors on some file systems were found and fixed (per separate post).

    I'll monitor for a while and see if the remaining perl scripts work or
    fail but hopefully that's it fixed - with me a bit more aware of
    debugging facilities than I was before....

    James
     
    James Harris, Dec 3, 2008
    #17
  18. James Harris

    Tim Greer Guest

    James Harris wrote:

    > I've added a Linux group because this has led to other queries:


    Oddly, I check the Linux groups before I check the Perl group and I
    didn't see your post there.

    > 1. Is there a way to tell what file systems are corrupt while the
    > machine is running normally? - I.e. was Linux (Ubuntu) telling me of
    > the faults somewhere?


    fsck, badblocks, smartctl, and various tools for your drive (raid
    specific tools/checks for some raid drives and their health, if you use
    raid, too).

    > 2. If it was where does it report this?


    To the output or a log, depending on the tool and option used or
    direction of the output, or if you mean to see any warnings/errors as
    they happen, check dmesg as it happens and the messages log. Other
    logs if you use other tools to check and log automatically.
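
    As a rough illustration of checking dmesg or the messages log for
    disk trouble, the sketch below filters log lines for a couple of
    phrases the kernel typically uses. The patterns are assumptions
    about common message wording, not an exhaustive list, and
    disk_warnings is a hypothetical helper name.

```python
import re

# Assumed wording of typical kernel complaints about disks and ext2/3/4.
DISK_TROUBLE = re.compile(r"I/O error|EXT[234]-fs error", re.IGNORECASE)


def disk_warnings(log_lines):
    """Return the log lines that look like disk or file system errors."""
    return [line.rstrip("\n") for line in log_lines if DISK_TROUBLE.search(line)]
```

    It could be fed the lines of a saved dmesg dump or of the system log;
    an empty result does not prove the disk is healthy, so the tools
    listed above are still worth running.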

    > 3. If it wasn't why not??? Fsck knew of faults on some of the file
    > systems on bootup without having to scan the disks for them. If it
    > knows there why not report it sooner?


    Ensure your kernel has the proper error logging/debugging enabled and
    you're running the checks manually or automatically with the
    aforementioned tools.

    The above is just a very quick run down and nothing too involved, but
    the general idea.
     
    Tim Greer, Dec 4, 2008
    #18
  19. James Harris wrote:
    > On 3 Dec, 17:06, wrote:
    > ...
    >>> Does this mean that the segfault occurred as a result of the close(8)
    >>> call - or at least that close() was the last system call prior to the
    >>> fault?

    >> No to the first--close(8) completed successfully before the segfault.
    >> Yes to the second, close(8) was the last system call prior to the fault.
    >> So the fault probably occurred in "user space".

    >
    > To follow up on this, I shut the machine down and ran memtest86+ to
    > check the ram. That checked out OK for all tests.
    >
    > On restart, however, problems with disk partitions were found. The
    > problems reported included
    >
    > * Block bitmap differences
    > * Free inode counts wrong
    > * (Most alarming) Buffer I/O errors from which one can only a) ignore
    > and b) force rewrite
    > * (Most relevant, perhaps, as it relates to Perl's Socket.so though
    > not directly):
    >
    > /usr/lib/perl/5.8.8/auto/Socket/Socket.so.dpkg-tmp mod time Nov 27,
    > 2007
    > has 2 multiply-claimed blocks shared with 0 files
    >
    > I spent a while running through the reported problems and then let (e2)
    > fsck do the rest. It took some time but on subsequent reboot the Perl
    > problem had gone away. I expected to have to reinstall Socket.so at
    > least but so far it seems to be OK now. As scripts run I'll keep an
    > eye on them. Hopefully they will all work now.
    >
    > I've added a Linux group because this has led to other queries:
    >
    > 1. Is there a way to tell what file systems are corrupt while the
    > machine is running normally? - I.e. was Linux (Ubuntu) telling me of
    > the faults somewhere?
    > 2. If it was where does it report this?
    > 3. If it wasn't why not??? Fsck knew of faults on some of the file
    > systems on bootup without having to scan the disks for them. If it
    > knows there why not report it sooner?
    >
    > Thanks to all in the Perl group for the education in debugging tools.
    > I'll find other uses for them.
    >
    > James


    When this sort of thing started happening on my Mac OSX, there was so much
    corruption that I ended up reinstalling the OS.

    I had a bad RAM stick. And possibly a bad disk. Or maybe the RAM
    corrupted the disk. It was small and old, so it got replaced anyway.
     
    The Natural Philosopher, Dec 4, 2008
    #19
  20. James Harris

    James Harris Guest

    Re: Linux disk errors - any early indications

    On 4 Dec, 08:35, The Natural Philosopher <> wrote:

    ....

    > > 1. Is there a way to tell what file systems are corrupt while the
    > > machine is running normally? - I.e. was Linux (Ubuntu) telling me of
    > > the faults somewhere?
    > > 2. If it was where does it report this?
    > > 3. If it wasn't why not??? Fsck knew of faults on some of the file
    > > systems on bootup without having to scan the disks for them. If it
    > > knows there why not report it sooner?


    ....

    > When this sort of thing started happening on my Mac OSX, there was so much
    > corruption that I ended up reinstalling the OS.


    The really odd thing is that there seems to be no corruption - at
    least none that I've found so far. Perl modules that failed prior to
    fixing the file systems now work. I would have expected (e2)fsck to
    fix the structure of the partitions. I'm surprised it has apparently
    fixed or recovered the data too. Maybe that's something to do with
    using the journalling ext3...? I don't know - but I'm glad it is OK!

    James
     
    James Harris, Dec 4, 2008
    #20
