perl segfault - how to troubleshoot

James Harris

Since a few days ago perl segfaults when running certain scripts. The
scripts were ok before the problem started and have not been changed.
I have checked my (Ubuntu) system for updates. None were made near the
time of the first incidence of the problem so I started looking more
widely.

I found that even if I try to check syntax on one of the scripts that
fails perl segfaults.

$ perl -c mythrename.pl
Segmentation fault
$

If I try to run the debugger it doesn't get as far as prompting for
the first line

$ perl -d mythrename.pl
Loading DB routines from perl5db.pl version 1.28
Editor support available.
Enter h or `h h' for help, or `man perldebug' for more help.

At this point CPU usage goes to 100%. Since I cannot even get the
debugger to start at the first line where do I go next to try and fix
this?

Anyone else had similar problems recently - within a week?

James
 
sln

Switch to Windows where you can see OS faults.
Stay away from porno sites or disable active content.
Run virus scan.
Run memtest from the bios.
Reformat/reinstall the OS/Perl.

Er, that's all I can think of.

Good luck!

sln
 
Tim Greer

I doubt anyone else has had similar problems recently just due to
running Perl (did you perform an install/upgrade of the system, any
modules or Perl itself recently?) This sounds like a system problem.
Run a memory checker. Run strace (or similar) on the process to see
what it's doing. Check the running process(es), if possible (check
top, ps, pstree, lsof, etc.) if relevant. See how much memory and CPU
is free before running the script(s) and how much the script(s) try and
use. Check your dmesg and system logs for error reporting, etc. and
ensure your kernel is configured to have proper error logging. Are
these scripts doing anything interesting, using any specific
commands/binaries, doing anything intensive? You said some scripts seg
fault and some don't. What types of scripts run and what types fail?
Please provide relevant information pertaining to this potentially
being an issue with Perl.
 
Ilya Zakharevich

[A complimentary Cc of this posting was sent to James Harris]

James Harris said:
$ perl -c mythrename.pl
Segmentation fault

Try running under debugger:

gdb `which perl` -c mythrename.pl

James Harris said:
$ perl -d mythrename.pl
Loading DB routines from perl5db.pl version 1.28
Editor support available.
Enter h or `h h' for help, or `man perldebug' for more help.

Try enabling autotrace. Report.

Hope this helps,
Ilya
 
smallpond

What perl version?
Was DB compiled for this perl version?
What 'use' or 'require' statements do you have?
Are you using threads?

-S
 
xhoster

James Harris said:
Since a few days ago perl segfaults when running certain scripts. The
scripts were ok before the problem started and have not been changed.
I have checked my (Ubuntu) system for updates. None were made near the
time of the first incidence of the problem so I started looking more
widely.

I found that even if I try to check syntax on one of the scripts that
fails perl segfaults.

$ perl -c mythrename.pl
Segmentation fault
$

I'd run:

strace perl -c mythrename.pl

And capture the output. If it wasn't obvious what went wrong based on the
last thing before the segfault, then I'd grep out all of the file opens
(system libraries and such), and look at all those files to see if any
had changed recently.
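
A minimal sketch of that workflow, assuming the trace was first saved with `strace -o trace.log perl -c mythrename.pl`. The file name `trace.log` and the helper name `opened_paths` are illustrative and not from the thread. It also assumes plain strace output (no `-f` pid prefixes) with each call on a single line:

```sh
# List the unique paths a traced process opened successfully.
# Reads strace output on stdin, writes one path per line.
opened_paths() {
    # Keep open()/openat() calls, drop the ones that failed (= -1 ...),
    # then extract the first quoted string on the line, which is the path.
    grep -E '^(open|openat)\(' |
    grep -Ev '= -1 ' |
    sed -n 's/^[^"]*"\([^"]*\)".*/\1/p' |
    sort -u
}
```

Something like `opened_paths < trace.log | while read -r f; do ls -ld "$f"; done` would then show the modification time of each file that was opened.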

Xho

--
-------------------- http://NewsReader.Com/ --------------------
The costs of publication of this article were defrayed in part by the
payment of page charges. This article must therefore be hereby marked
advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate
this fact.
 
Ilya Zakharevich

[A complimentary Cc of this posting was NOT [per weedlist] sent to Ilya Zakharevich]

Ilya Zakharevich said:
Try running under debugger:

gdb `which perl` -c mythrename.pl

Oops, IIRC, gdb does not allow a simple way of doing things. One
needs something like

gdb --args `which perl` -c mythrename.pl

or

gdb `which perl`
run -c mythrename.pl

Yours,
Ilya
 
James Harris

xhoster said:
I'd run:

strace perl -c mythrename.pl

And capture the output. If it wasn't obvious what went wrong based on the
last thing before the segfault, then I'd grep out all of the file opens
(system libraries and such), and look at all those files to see if any
had changed recently.

Great suggestion. I'd never heard of strace but it looks like the
quickest way to track down what was happening. The output ends as
follows

stat64("/usr/lib/perl/5.8/auto/IO/IO.so", {st_mode=S_IFREG|0644,
st_size=15580, ...}) = 0
stat64("/usr/lib/perl/5.8/auto/IO/IO.bs", {st_mode=S_IFREG|0644,
st_size=0, ...}) = 0
open("/usr/lib/perl/5.8/auto/IO/IO.so", O_RDONLY) = 9
read(9, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0000\22\0"...,
512) = 512
fstat64(9, {st_mode=S_IFREG|0644, st_size=15580, ...}) = 0
mmap2(NULL, 18548, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 9,
0) = 0xb7c09000
mmap2(0xb7c0d000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|
MAP_DENYWRITE, 9, 0x3) = 0xb7c0d000
close(9) = 0
brk(0x83e5000) = 0x83e5000
read(8, "sage: $io->stat()\';\n stat($_["..., 4096) = 3496
read(8, "", 4096) = 0
close(8) = 0
stat64("/etc/perl/Socket.pmc", 0xbf9eb05c) = -1 ENOENT (No such file
or directory)
stat64("/etc/perl/Socket.pm", 0xbf9eaf6c) = -1 ENOENT (No such file or
directory)
stat64("/usr/local/lib/perl/5.8.8/Socket.pmc", 0xbf9eb05c) = -1 ENOENT
(No such file or directory)
stat64("/usr/local/lib/perl/5.8.8/Socket.pm", 0xbf9eaf6c) = -1 ENOENT
(No such file or directory)
stat64("/usr/local/share/perl/5.8.8/Socket.pmc", 0xbf9eb05c) = -1
ENOENT (No such file or directory)
stat64("/usr/local/share/perl/5.8.8/Socket.pm", 0xbf9eaf6c) = -1
ENOENT (No such file or directory)
stat64("/usr/lib/perl5/Socket.pmc", 0xbf9eb05c) = -1 ENOENT (No such
file or directory)
stat64("/usr/lib/perl5/Socket.pm", 0xbf9eaf6c) = -1 ENOENT (No such
file or directory)
stat64("/usr/share/perl5/Socket.pmc", 0xbf9eb05c) = -1 ENOENT (No such
file or directory)
stat64("/usr/share/perl5/Socket.pm", 0xbf9eaf6c) = -1 ENOENT (No such
file or directory)
stat64("/usr/lib/perl/5.8/Socket.pmc", 0xbf9eb05c) = -1 ENOENT (No
such file or directory)
stat64("/usr/lib/perl/5.8/Socket.pm", {st_mode=S_IFREG|0644,
st_size=3514, ...}) = 0
open("/usr/lib/perl/5.8/Socket.pm", O_RDONLY|O_LARGEFILE) = 8
ioctl(8, SNDCTL_TMR_TIMEBASE or TCGETS, 0xbf9ead78) = -1 ENOTTY
(Inappropriate ioctl for device)
_llseek(8, 0, [0], SEEK_CUR) = 0
read(8, "package Socket;\n\nour($VERSION, @"..., 4096) = 3514
read(8, "", 4096) = 0
close(8) = 0
stat64("/usr/lib/perl/5.8/auto/Socket/Socket.so", {st_mode=S_IFREG|
0644, st_size=19676, ...}) = 0
stat64("/usr/lib/perl/5.8/auto/Socket/Socket.bs", {st_mode=S_IFREG|
0644, st_size=0, ...}) = 0
open("/usr/lib/perl/5.8/auto/Socket/Socket.so", O_RDONLY) = 8
read(8, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\360\16"...,
512) = 512
fstat64(8, {st_mode=S_IFREG|0644, st_size=19676, ...}) = 0
mmap2(NULL, 22644, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 8,
0) = 0xb7c03000
mmap2(0xb7c08000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|
MAP_DENYWRITE, 8, 0x4) = 0xb7c08000
close(8) = 0
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
+++ killed by SIGSEGV +++
Process 18883 detached

Does this mean that the segfault occurred as a result of the close(8)
call - or at least that close() was the last system call prior to the
fault?

Or would the segfault have happened while executing the next call
which is not shown? (From the man page it looks as though strace will
trap signals and still print the failing call after a signal
occurs ... but I'm not sure I'm reading it correctly.)

James
 
Eric Pozharski

James Harris said:
Since a few days ago perl segfaults when running certain scripts. The
scripts were ok before the problem started and have not been changed.
I have checked my (Ubuntu) system for updates. None were made near the
time of the first incidence of the problem so I started looking more
widely.

(just for the record) The motto of c.t.t is: "Please provide complete
minimal example that clearly exhibits your problem". Point.

*CUT*

[To: all]

perl -wle '
$x = q|abc|;
q|zyx| =~ m[(??{ q|zyx| =~ m,[$x-]+, })]'
Out of memory!

If B<perl> is replaced with B<debugperl> (that's Debian's Perl built
with I<DEBUGGING> enabled), then the tail is:

Matching embedded REx "" against ""...
104520 <zyx%0%0%0%0%0!%0%0%0%340%255P%tX%255P%t%320%355%16%10%3%0%0> <>|
1: NOTHING(2)
104520 <zyx%0%0%0%0%0!%0%0%0%340%255P%tX%255P%t%320%355%16%10%3%0%0> <>|
2: END(0)
EVAL trying tail ... 0
104520 <zyx%0%0%0%0%0!%0%0%0%340%255P%tX%255P%t%320%355%16%10%3%0%0> <>|
4: END(0)
Freeing REx: ""
Match successful!
panic: malloc at -e line 1.
Freeing REx: "(??{ q|zyx| =~ m,[$x-]+, })"
Freeing REx: "[abc-]+"
panic: free from wrong pool.

Some time ago, when I experimented with that construct (I insist, I had
a reason), that beast segfaulted too. This sample doesn't. That
wasn't a one-liner, after all.
 
James Harris

Tim Greer said:
I doubt anyone else has had similar problems recently just due to
running Perl (did you perform an install/upgrade of the system, any
modules or Perl itself recently?)

No upgrades. This is really weird. From syslog the problem started on
Nov 27th at 17:56:54 with the following entry

$ gunzip -c syslog.3.gz | grep segfault | more
Nov 27 17:56:54 s01 kernel: [678797.925818] maps.pl[21703]: segfault
at 00000000 eip b7bdd004 esp bf8a1e8f error 6

Before that there was no segfault - at least for the two days the logs
go back. (I have since - I think - changed the log rotate frequency
for syslog to weekly rather than daily.)

November's system upgrades were on Nov 20, Nov 23, and Nov 29 (and the
fault started on 27th) so the only upgrade prior to the fault starting
was that on the 23rd about four days beforehand.

After that there are lots of similar segfault messages. Extracting the
list of affected modules gives

maps.pl
bbccurrentxml.pl
bbcthreedayxml.pl
optimize_mythdb.pl
tv_grab_uk_rt (a Perl script)
mythrename.pl
popcon-upload (a Perl script)

Apart from popcon the rest seem to be related to mythtv or xmltv. They
get upgraded along with the rest of the system so the last possible
changes to them should also have been approx four days prior to the
error starting.
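
A list like the one above can be pulled out of the logs mechanically. This is only a sketch: the helper name `segfault_programs` is made up here, and it assumes each kernel message sits on one line in the log (the lines quoted in this post are wrapped by the newsreader):

```sh
# Print the unique program names from kernel "segfault" log lines.
# Reads syslog text on stdin.
segfault_programs() {
    # A line looks like:
    #   ... kernel: [678797.925818] maps.pl[21703]: segfault at ...
    # Capture the name between the kernel timestamp and the [pid].
    sed -n 's/.*\] \([^ ]*\)\[[0-9]*\]: segfault.*/\1/p' | sort -u
}
```

Usage would be along the lines of `gunzip -c syslog.3.gz | segfault_programs`.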

Tim Greer said:
This sounds like a system problem.
Run a memory checker. Run strace (or similar) on the process to see
what it's doing. Check the running process(es), if possible (check
top, ps, pstree, lsof, etc.) if relevant. See how much memory and CPU
is free before running the script(s) and how much the script(s) try and
use. Check your dmesg and system logs for error reporting, etc. and
ensure your kernel is configured to have proper error logging. Are
these scripts doing anything interesting, using any specific
commands/binaries, doing anything intensive? You said some scripts seg
fault and some don't. What types of scripts run and what types fail?
Please provide relevant information pertaining to this potentially
being an issue with Perl.

Thanks for all the suggestions. I'll have to check further.

James
 
James Harris

Ilya Zakharevich said:
gdb --args `which perl` -c mythrename.pl

or

gdb `which perl`
run -c mythrename.pl

Use of gdb is new to me but running the file through to the end (at
least that's what I think I have done) gives

(gdb) run
Starting program: /usr/bin/perl -c mythrename.pl
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
[Thread debugging using libthread_db enabled]
(no debugging symbols found)
(no debugging symbols found)
[New Thread 0xb7d568c0 (LWP 19771)]
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb7d568c0 (LWP 19771)]
0xb7be0004 in boot_Socket () from /usr/lib/perl/5.8/auto/Socket/
Socket.so
(gdb)

This sounds like the fault occurred in Socket.so. The file itself has
not been updated recently

$ ls -l /usr/lib/perl/5.8/auto/Socket/Socket.so
-rw-r--r-- 1 root root 19676 2007-11-27 11:08 /usr/lib/perl/5.8/auto/
Socket/Socket.so
$

so maybe it is being passed a null pointer that it was not before. I
assume a pointer because the syslog message is

Dec 3 00:23:17 s01 kernel: [287839.008837] mythrename.pl[19643]:
segfault at 00000001 eip b7be5004 esp bfa104ff error 6

This, I think, shows the instruction pointer as b7be5004 - i.e. not
null - so presumably 00000001 is a memory address that's being passed
to Socket.so or generated within it.
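
The trailing "error 6" in that kernel line can also be decoded. On x86 it is the page-fault error code, a small bit field: bit 0 distinguishes a protection violation from a page that is not present, bit 1 a write from a read, and bit 2 user mode from kernel mode. A small sketch (the function name `decode_pf_error` is made up for illustration):

```sh
# Decode the low three bits of an x86 page-fault error code,
# as printed after "error" in kernel segfault messages.
decode_pf_error() {
    code=$1
    # bit 0: 0 = page not present, 1 = protection violation
    if [ $((code & 1)) -ne 0 ]; then cause="protection violation"; else cause="page not present"; fi
    # bit 1: 0 = read access, 1 = write access
    if [ $((code & 2)) -ne 0 ]; then access="write"; else access="read"; fi
    # bit 2: 0 = kernel mode, 1 = user mode
    if [ $((code & 4)) -ne 0 ]; then mode="user"; else mode="kernel"; fi
    printf '%s-mode %s, %s\n' "$mode" "$access" "$cause"
}
```

For this log line, `decode_pf_error 6` reports a user-mode write to a page that is not present, which fits a store through a bogus pointer such as 00000001.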

I guess Socket.so is a shared library (shared object?) or similar.
Maybe gdb can be used to find out if 00000001 is being passed to
Socket.so (or a routine within it) and track down where 00000001 is
being generated.

I wish I could remember what changed, if anything, when the problem
started....

Thanks for the pointers to use gdb.

James
 
Ilya Zakharevich

[A complimentary Cc of this posting was sent to James Harris]

James Harris said:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb7d568c0 (LWP 19771)]
0xb7be0004 in boot_Socket () from /usr/lib/perl/5.8/auto/Socket/Socket.so
Dec 3 00:23:17 s01 kernel: [287839.008837] mythrename.pl[19643]:
segfault at 00000001 eip b7be5004 esp bfa104ff error 6

This "at" looks very suspicious to me. I suspect what happens is that
code at b7be5004 is essentially jump(00000001). (gdb allows you to
look at the assembler, hint hint.)

My guess would be that you run under some flavor of Unix, and your
dynamic linking happens in the "russian roulette" way usual under Unix
(except under AIX and [IIRC] HPUX, but for some unfathomable reason
the default build of Perl under AIX now ALSO uses "russian roulette"
dynalinking...).

[E.g.: some newly installed module has an entry point named the same
as a "legitimate" entry point in an "expected to be linked in"
module, and now you are dynalinked with this "stray" module. To
add insult to injury, this entry point was CODE, and now is DATA -
so you get a constant 0x1 instead of an address of a subroutine...

At least this is what I spent hours debugging when Solaris update
put a symbol `err' into nls.so...]

Hope this helps,
Ilya

P.S. I can imagine many ways why (on a sanely designed architecture)
an update of a DLL hits the fan only days later.
 
Ilya Zakharevich

[A complimentary Cc of this posting was sent to James Harris]

James Harris said:
open("/usr/lib/perl/5.8/auto/Socket/Socket.so", O_RDONLY) = 8
read(8, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\360\16"...,
512) = 512
fstat64(8, {st_mode=S_IFREG|0644, st_size=19676, ...}) = 0
mmap2(NULL, 22644, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 8,
0) = 0xb7c03000
mmap2(0xb7c08000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|
MAP_DENYWRITE, 8, 0x4) = 0xb7c08000
close(8) = 0
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
Does this mean that the segfault occurred as a result of the close(8)
call - or at least that close() was the last system call prior to the
fault?

This does not mean anything. Just ignore this listing. Your segfault
happens in "user code"; strace would not give you any useful
information, except a rough outline of when in "the program
execution history"...

[When you know that segfault happens in boot_Socket, you know that
the previous successful syscall was opening the module Socket...
Hmm, on the other hand, I would expect that dlsym() should be called
somewhere; probably it is done in user space as well, and is not
reflected in syscalls...]

Yours,
Ilya
 
Ilya Zakharevich

[A complimentary Cc of this posting was sent to James Harris]

James Harris said:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb7d568c0 (LWP 19771)]
0xb7be0004 in boot_Socket () from /usr/lib/perl/5.8/auto/Socket/
Socket.so

Hmm, BTW: would

perl -MSocket -e0

segfault? If not, look in the listing of loaded modules, and try to
get a minimal sequence of module loads which segfaults.
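
One way to automate that probing, as a rough sketch. The helper names `classify_status` and `probe_modules` are invented for illustration. It relies on the shell convention that a child killed by a signal is reported as 128 plus the signal number, so SIGSEGV (11 on Linux) shows up as 139:

```sh
# Map a shell exit status to a short label.
classify_status() {
    case $1 in
        0)   echo ok ;;
        139) echo segfault ;;      # 128 + 11: killed by SIGSEGV (Linux numbering)
        *)   echo "error($1)" ;;
    esac
}

# Load each named module in a fresh perl and report what happened.
probe_modules() {
    for mod in "$@"; do
        perl -M"$mod" -e0 >/dev/null 2>&1
        status=$?
        printf '%s: %s\n' "$mod" "$(classify_status "$status")"
    done
}
```

For example, `probe_modules IO Socket` checks the two XS modules seen in the strace output, one at a time. If each loads cleanly alone, combinations would need to be tried by hand with several `-M` switches on one perl command line.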

Yours,
Ilya
 
xhoster

James Harris said:
Great suggestion. I'd never heard of strace but it looks like the
quickest way to track down what was happening. The output ends as
follows
....
open("/usr/lib/perl/5.8/auto/Socket/Socket.so", O_RDONLY) = 8
read(8, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\360\16"...,
512) = 512
fstat64(8, {st_mode=S_IFREG|0644, st_size=19676, ...}) = 0
mmap2(NULL, 22644, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 8,
0) = 0xb7c03000
mmap2(0xb7c08000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|
MAP_DENYWRITE, 8, 0x4) = 0xb7c08000
close(8) = 0
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
+++ killed by SIGSEGV +++
Process 18883 detached

Does this mean that the segfault occurred as a result of the close(8)
call - or at least that close() was the last system call prior to the
fault?

No to the first--close(8) completed successfully before the segfault.
Yes to the second, close(8) was the last system call prior to the fault.
So the fault probably occurred in "user space".

I'd grep through the trace output for "open"ed .so files and see what their
mod times are.

You could use ltrace rather than strace to get even more info, but that
would be a last resort. (Well actually, learning how to use gdb would be
my last resort. It might do the job better, but I'd rather use the tools I
already know than learn new ones.)

Or would the segfault have happened while executing the next call
which is not shown? (From the man page it looks as though strace will
trap signals and still print the failing call after a signal
occurs ... but I'm not sure I'm reading it correctly.)

I think it would print at least something upon the initiation of the system
call, even if the call did not finish successfully. I can't guarantee it,
but that is the working assumption I would make.

Xho

 
James Harris

On 3 Dec, 17:06, (e-mail address removed) wrote:
....
No to the first--close(8) completed successfully before the segfault.
Yes to the second, close(8) was the last system call prior to the fault.
So the fault probably occurred in "user space".

To follow up on this, I shut the machine down and ran memtest86+ to
check the ram. That checked out OK for all tests.

On restart, however, problems with disk partitions were found. The
problems reported included

* Block bitmap differences
* Free inode counts wrong
* (Most alarming) Buffer I/O errors, for which the only options offered
were a) ignore or b) force rewrite
* (Most relevant, perhaps, as it relates to Perl's Socket.so though
not directly):

/usr/lib/perl/5.8.8/auto/Socket/Socket.so.dpkg-tmp mod time Nov 27,
2007
has 2 multiply-claimed blocks shared with 0 files

I spent a while running through the reported problems and then let
(e2)fsck do the rest. It took some time but on subsequent reboot the Perl
problem had gone away. I expected to have to reinstall Socket.so at
least but so far it seems to be OK now. As scripts run I'll keep an
eye on them. Hopefully they will all work now.

I've added a Linux group because this has led to other queries:

1. Is there a way to tell what file systems are corrupt while the
machine is running normally? - I.e. was Linux (Ubuntu) telling me of
the faults somewhere?
2. If it was where does it report this?
3. If it wasn't why not??? Fsck knew of faults on some of the file
systems on bootup without having to scan the disks for them. If it
knows there why not report it sooner?

Thanks to all in the Perl group for the education in debugging tools.
I'll find other uses for them.

James
 
James Harris

Ilya Zakharevich said:
This "at" looks very suspicious to me. I suspect what happens is that
code at b7be5004 is essentially jump(00000001). (gdb allows you to
look at the assembler, hint hint.)

Good call. The instruction at EIP was

mov [ecx], edx

and ecx showed, not surprisingly, as 1. From the disassembly, ecx was
loaded a few instructions earlier, after a call to a function.

It's moot now as, I'm glad to say, the problem has gone away after
finding errors on some file systems (per separate post).

I'll monitor for a while and see if the remaining perl scripts work or
fail but hopefully that's it fixed - with me a bit more aware of
debugging facilities than I was before....

James
 
Tim Greer

James said:
I've added a Linux group because this has led to other queries:

Oddly, I check the Linux groups before I check the Perl group and I
didn't see your post there.

James said:
1. Is there a way to tell what file systems are corrupt while the
machine is running normally? - I.e. was Linux (Ubuntu) telling me of
the faults somewhere?

fsck, badblocks, smartctl, and various tools for your drive (plus
RAID-specific tools and health checks, if you use RAID).

James said:
2. If it was where does it report this?

To the output or a log, depending on the tool and the options used or
where the output is directed. If you mean seeing warnings/errors as
they happen, check dmesg and the messages log. Other logs apply if you
use other tools to check and log automatically.

James said:
3. If it wasn't why not??? Fsck knew of faults on some of the file
systems on bootup without having to scan the disks for them. If it
knows there why not report it sooner?

Ensure your kernel has the proper error logging/debugging enabled and
that you're running the checks, manually or automatically, with the
aforementioned tools.

The above is just a very quick run down and nothing too involved, but
the general idea.
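
As a rough illustration of checking while the machine is running, the two filters below scan text for signs of trouble. The helper names `fs_error_lines` and `fs_state` are made up, and the message patterns are assumptions based on what Linux kernels commonly print, so they may need adjusting for a given system. The second filter reads the output of `tune2fs -l`, which includes a "Filesystem state" field stored in the ext2/ext3 superblock. That stored state is plausibly how fsck knew about problems at boot without scanning the disks first:

```sh
# Keep only log lines that look like disk or filesystem errors.
# Intended for dmesg or syslog text on stdin.
fs_error_lines() {
    grep -E 'Buffer I/O error|EXT[234]-fs error|I/O error, dev '
}

# Print the value of the "Filesystem state" field.
# Reads `tune2fs -l DEVICE` output on stdin.
fs_state() {
    sed -n 's/^Filesystem state:[[:space:]]*//p'
}
```

Typical use would be `dmesg | fs_error_lines` and, as root, `tune2fs -l /dev/sda1 | fs_state`, where `/dev/sda1` is only a placeholder for the partition in question. Anything other than "clean" in the second case suggests a check is due.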
 
The Natural Philosopher

When this sort of thing started happening on my Mac OSX, there was so much
corruption that I ended up reinstalling the OS.

I had a bad RAM stick. And possibly a bad disk. Or maybe the RAM
corrupted the disk. It was small and old, so it got replaced anyway.
 
James Harris

....
....

The Natural Philosopher said:
When this sort of thing started happening on my Mac OSX, there was so much
corruption that I ended up reinstalling the OS.

The really odd thing is that there seems to be no corruption - at
least none that I've found so far. Perl modules that failed prior to
fixing the file systems now work. I would have expected (e2)fsck to
fix the structure of the partitions. I'm surprised it has apparently
fixed or recovered the data too. Maybe that's something to do with
using the journalling ext3...? I don't know - but I'm glad it is OK!

James
 
