perl segfault - how to troubleshoot

James Harris

James Harris wrote:
....


fsck, badblocks, smartctl, and various tools for your drive (raid
specific tools/checks for some raid drives and their health, if you use
raid, too).

I was thinking more of indications while Linux is running and the
disks are mounted, not of taking them down to scan them. I didn't
explain clearly enough but when restarting Linux the system knew
immediately that some file systems had errors. It didn't have to scan
the volumes to know there were errors. It simply said that file system
X has errors and will be scanned and checked.

If it knew that file system X had errors without scanning it there
must be a data value somewhere - probably in the affected partition -
that indicates error. If it wrote this value when closing the system
down it did so also without scanning the disks. Therefore Linux must
have known _prior to_ shutdown that file system X had errors. If it
did so I was wondering if this information is available to the system
admin prior to shutdown.

Does that make more sense now?
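
For what it's worth, ext2/ext3 does keep such a value: the superblock has a 16-bit state field with a "clean" bit and an "errors detected" bit. Below is a minimal Python sketch of decoding it, assuming the standard ext2 on-disk layout (superblock 1024 bytes into the partition, magic 0xEF53 at offset 56, state at offset 58). The helper names and the synthetic sample are made up for illustration, and reading a real device needs root.

```python
import struct

SUPERBLOCK_OFFSET = 1024   # superblock starts 1024 bytes into the partition
EXT2_MAGIC = 0xEF53        # s_magic: little-endian 16-bit value at offset 56
VALID_FS = 0x0001          # s_state bit: filesystem was cleanly unmounted
ERROR_FS = 0x0002          # s_state bit: the kernel recorded an error


def describe_state(superblock: bytes) -> str:
    """Describe the s_state field of a raw superblock (hypothetical helper)."""
    magic, state = struct.unpack_from("<HH", superblock, 56)
    if magic != EXT2_MAGIC:
        raise ValueError("not an ext2/ext3 superblock")
    text = "clean" if state & VALID_FS else "not clean"
    if state & ERROR_FS:
        text += " with errors"
    return text


def read_superblock(device: str) -> bytes:
    """Read the 1024-byte superblock from a device or image file."""
    with open(device, "rb") as f:
        f.seek(SUPERBLOCK_OFFSET)
        return f.read(1024)


# Synthetic superblock for illustration: magic set, only the error bit set.
sample = bytearray(1024)
struct.pack_into("<HH", sample, 56, EXT2_MAGIC, ERROR_FS)
print(describe_state(bytes(sample)))
```

On a real system the call would look like describe_state(read_superblock("/dev/sda1")), where the device name is only a placeholder.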

To the output or a log, depending on the tool and option used or
direction of the output, or if you mean to see any warnings/errors as
they happen, check dmesg as it happens and the messages log. Other
logs if you use other tools to check and log automatically.

Checked both of those but cannot see a notification of disk or file
system errors - at least not prior to reboot and running fsck.

Ensure your kernel has the proper error logging/debugging enabled and
you're running the checks manually or automatically with the
aforementioned tools.

Logging sounds good. I would prefer to avoid debugging as IMHO the
kernel or drivers should report the problem in all cases. It is fairly
important to know if there are file system corruptions, after all.

WRT logging syslog has *.* in syslog.conf. Just auth and authpriv are
set to none. I think that means syslog should contain any disk or file
system error messages but there are none I can see. Perhaps I need to
look for something specific...?
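
On the question of what to look for, here is a minimal Python sketch. It assumes the kernel logs ext3 problems with an "EXT3-fs error" prefix and block-layer failures with "I/O error"; exact wording varies by kernel version and driver, so treat the patterns as a starting point. The function name and the sample lines are invented for illustration.

```python
import re

# Assumed message fragments; adjust to what your kernel actually logs.
PATTERNS = [
    re.compile(r"EXT[234]-fs error"),
    re.compile(r"I/O error"),  # also matches "Buffer I/O error ..."
]


def find_disk_errors(lines):
    """Return the log lines that match any of the assumed error patterns."""
    return [line.rstrip("\n") for line in lines
            if any(p.search(line) for p in PATTERNS)]


# Made-up sample lines, not taken from the machine discussed here.
sample_log = [
    "Dec  3 17:06:01 host kernel: EXT3-fs error (device sda1): example\n",
    "Dec  3 17:06:02 host kernel: Buffer I/O error on device sda1\n",
    "Dec  3 17:06:03 host sshd[123]: session opened for user james\n",
]
for hit in find_disk_errors(sample_log):
    print(hit)
```

To run it against the real log, pass an open file instead of the sample list, for example open("/var/log/syslog") on Ubuntu; the path is an assumption about where syslog.conf sends kernel messages.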

James
 
The Natural Philosopher

James said:
...


The really odd thing is that there seems to be no corruption - at
least none that I've found so far. Perl modules that failed prior to
fixing the file systems now work. I would have expected (e2)fsck to
fix the structure of the partitions. I'm surprised it has apparently
fixed or recovered the data too. Maybe that's something to do with
using the journalling ext3...? I don't know - but I'm glad it is OK!

That means that you didn't get MUCH disk corruption. Not beyond fsck's
ability to fix, and the disk image is probably OK now.

The question of what hardware caused the issue still remains though.
 
sln

On 3 Dec, 17:06, (e-mail address removed) wrote:
...

To follow up on this, I shut the machine down and ran memtest86+ to
check the ram. That checked out OK for all tests.

On restart, however, problems with the disk partitions were found. The
problems reported included

* Block bitmap differences
* Free inode counts wrong
* (Most alarming) Buffer I/O errors, for which the only choices offered
are a) ignore and b) force rewrite
* (Most relevant, perhaps, as it relates to Perl's Socket.so though
not directly):

/usr/lib/perl/5.8.8/auto/Socket/Socket.so.dpkg-tmp mod time Nov 27,
2007
has 2 multiply-claimed blocks shared with 0 files

I spent a while running through the reported problems and then let
(e2)fsck do the rest. It took some time but on subsequent reboot the Perl
problem had gone away. I expected to have to reinstall Socket.so at
least but so far it seems to be OK now. As scripts run I'll keep an
eye on them. Hopefully they will all work now.

I've added a Linux group because this has led to other queries:

1. Is there a way to tell what file systems are corrupt while the
machine is running normally? - I.e. was Linux (Ubuntu) telling me of
the faults somewhere?
2. If it was where does it report this?
3. If it wasn't why not??? Fsck knew of faults on some of the file
systems on bootup without having to scan the disks for them. If it
knows there why not report it sooner?

Thanks to all in the Perl group for the education in debugging tools.
I'll find other uses for them.

James

I take it you hardly ever shut down/reboot. Every OS seems to do at
least a checksum on the file structure tables (without doing a file
scan) on boot-up.

If all you do is run a particular software all the time, odds are
disk/memory problems manifest in those areas of the disk producing
wild crazy cascading errors .. A typical indication of hardware fatigue.

Replace the whole machine if it is old.
Otherwise, some things to try:

* Run memtest for 24 straight hours, all tests.
* Stress test the cpu/ram, monitoring temperature.
Windows has ORTHOS and Core Temp 94.
* Stress test the disks. Windows has Everest Ultimate.
* Do a full scan on the IDE drives checking for bad physical
sectors.
* Repartition/format and move the software used most frequently
to a new area on the disk. This includes temporary files it
might create, to a new area.


sln
 
Bill Marcum

[Followup-To: comp.os.linux.misc]
I was thinking more of indications while Linux is running and the
disks are mounted, not of taking them down to scan them. I didn't
explain clearly enough but when restarting Linux the system knew
immediately that some file systems had errors. It didn't have to scan
the volumes to know there were errors. It simply said that file system
X has errors and will be scanned and checked.

If it knew that file system X had errors without scanning it there
must be a data value somewhere - probably in the affected partition -
that indicates error. If it wrote this value when closing the system
down it did so also without scanning the disks. Therefore Linux must
have known _prior to_ shutdown that file system X had errors. If it
did so I was wondering if this information is available to the system
admin prior to shutdown.

Does that make more sense now?
Watch for error messages in the log files. Logcheck could help.
 
Tim Greer

James said:
I was thinking more of indications while Linux is running and the
disks are mounted, not of taking them down to scan them.

You can run the checks while the drives/partitions are mounted. fsck
doesn't have to be told to actually try and fix the issues it finds,
but you can simply check with smartctl or similar tools, as well as
running badblocks on a live environment (again, you needn't tell it to
fix the bad blocks or issues it finds). What tools you use, depends on
your drives, drivers, kernel options and what you have installed. I'd
recommend asking in a Linux group or searching google for specifics.
 
sln

You can run the checks while the drives/partitions are mounted. fsck
doesn't have to be told to actually try and fix the issues it finds,
but you can simply check with smartctl or similar tools, as well as
running badblocks on a live environment (again, you needn't tell it to
fix the bad blocks or issues it finds). What tools you use, depends on
your drives, drivers, kernel options and what you have installed. I'd
recommend asking in a Linux group or searching google for specifics.

I'm not sure about anything Linux. Usually segmentation faults originate from
the cpu, filter up through the OS, then to the app. Or, it originates from
the OS, then up to the app; this is usually a pointer that is not in the virtual
address space. Like de-referencing address 0.

It depends on the fault level. Since all the app sees is the OS-generated
exceptions, it may be too serious for the app to continue.

In fact, sometimes, the app can do all the exception handling it wants but
it cannot trap the exception. The OS takes a dive and terminates without
prejudice.

When the OS tanks like this, usually the app sees nothing, or if anything
just spurts out a message like 'segfault' when it's notified it's about to be
terminated by the OS, and there's nothing you can do about it, and nothing I'm
gonna tell you about it.
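
A minimal Python sketch of what that looks like from the outside, on a POSIX system: the parent only learns that the child was killed by SIGSEGV. The child here sends itself the signal as a stand-in for a real bad-pointer access; that substitution is an assumption made to keep the example deterministic.

```python
import signal
import subprocess
import sys

# The child sends itself SIGSEGV, standing in for a real invalid memory access.
child_code = "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"
result = subprocess.run([sys.executable, "-c", child_code])

# On POSIX, a negative return code means the child was killed by that signal.
if result.returncode == -signal.SIGSEGV:
    print("child was killed by SIGSEGV")
else:
    print("child exited with", result.returncode)
```

The shell reports the same event as "Segmentation fault"; the process itself gets no chance to explain further unless it installed a handler beforehand.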

In Windows, a pretty system dialog comes up telling you Perl is gonna die,
do you want to Debug? Do you feel lucky sucker, well do ya? Of course, there
is no debug build, so click ok and watch it terminate.

It's a hardware problem, fix it. It will happen again and again and again.
OR, you can try to DEBUG the OS, because that's what it comes down to.
If you think the OS is faulty, you should contact Linux. But, they will
laugh to death before they hang up on you.


sln
 
sln

But remember, in your case, the OS wasn't faulty, neither was Perl.
The OS was fed faulty IDE electronic data from its file drivers.
BIG indication of hardware problems.

Here's a lesson and test for you all at the same time. Try to fix a
faulty drive, recover data, fix tables, try to get it right.
Reboot, then do it again. Watch 10,000 more errors come up.
Stick in some borderline memory, with known good drives, boot up, run some
stuff, reboot, then check the disk. Fix the disk, then run some more stuff.
Reboot, check the disk. Guess what? You're fixing the disk as much as when
the known bad disk was in there.

Kinda makes ya think, doesn't it? Chicken/Egg thing.
In reality, you don't know if it's bad disk or bad memory.
Hey, it's not over, you just spent $800 on new memory and disk
hardware... but man, it's still happening... the cpu is bad.
But wait, it gets better, the bios is trashed giving false info to
the major components.. oh man, the DMA controller is fried.
Now you have to replace the motherboard.

But wait, how many OS re-installs does this all take?
Oh, maybe 10-20 before you get smart. By this time you have respect
for your hardware captors and have given up the one-man-band
System Administrator job forever, and are praying at the feet of
the Hardware God, so help me Jesus, and chanting "I believe, I believe.."
when they find you catatonic in your house, frothing at the mouth with
a gun to your head !!!!!!!!!!!!!!!!!!


sln
 
sln

Oh, but it gets better. What if right now, the operating system files
are corrupt. Which ones? Where when, how to find out? When did this happen?

Seriously, are you telling me the OS could be corrupt? Why yes, yes I am.
I'm telling you right now the OS could be corrupt and you don't even know
about it.

Get a load of dem apples !

sln
 
sln

My home Windows XP system uses a DFI Lanparty SLI with a 4x250 GB
Fujitsu SATA II raid array, running an overclocked Opteron dual core at 3 GHz,
with 4G dual-core (the FSB is actually o/c'd to 270, and with the multiplier
gives the cpu its 3 GHz). No cooling problems because I have a massive sink on the cpu,
on air. All I can tell you is it's a beast, and very reliable.

These Fujitsus are/were the fastest in the world a year or so ago. Dunno now.
All I can tell you is. Whatever problems you 'think' can be solved on a computer
language usenet group, you are sadly mistaken.

sln
 
Tim Greer

Whatever problems you 'think' can be solved on a computer
language usenet group, you are sadly mistaken.

Who are you talking to? The OP asked the question and had the problem,
so reply to them (not me). Also, why did you reply to your own posts
four times? Why would you reply to your own posts and argue with
yourself? Why did you reply to _my_ post in the first place? I didn't
ask the question, and your poor method of quoting and replying to me,
makes it look like I asked the question, when I didn't.

You also said you killfiled me a few days ago when you embarrassed
yourself in another thread the other day. What happened there, can't
stay away? By the way, *I* did filter you out, but saw your replies to
me here on usenet archives when I was asked to refer to something
earlier for someone (so I've lifted the filter to reply to you here
about this). Anyway, your posting methods are just completely out
there. Please move on.
 
sln

Who are you talking to? The OP asked the question and had the problem,
so reply to them (not me). Also, why did you reply to your own posts
four times? Why would you reply to your own posts and argue with
yourself? Why did you reply to _my_ post in the first place? I didn't
ask the question, and your poor method of quoting and replying to me,
makes it look like I asked the question, when I didn't.

You also said you killfiled me a few days ago when you embarrassed
yourself in another thread the other day. What happened there, can't
stay away? By the way, *I* did filter you out, but saw your replies to
me here on usenet archives when I was asked to refer to something
earlier for someone (so I've lifted the filter to reply to you here
about this). Anyway, your posting methods are just completely out
there. Please move on.

Nobody killfiles me, you just got snookered by some other poster's
rhetoric. They don't killfile me, they never did, they never will.
I'm relevant in a slanted way, but more so, a prodigious implementer.
And I laugh it all off anyway.

I didn't post to you directly, but indirectly based on your snippings.
I guess that's a direct post to not only what your quoting was, but your
packaged answer. Which is felloneous in this context. It is merely a
jump point, nothing personal.

I never argue with myself. I always assume I know what's best.
It's a family thing.

sln
 
Tim Greer

Nobody killfiles me,

I do (and will again after this reply -- you can have "the last word").
I had removed you from my filter to see your post here so I could
remind you that you said *you* had killfiled me, which you clearly
didn't.

On Friday 28 November 2008 5:45:38 pm, you stated you had "plonked" me.

http://en.wikipedia.org/wiki/Plonk_(usenet)

If nothing else, at least be a man of your word.
you just got snookered by some other poster's
rhetoric.

Not really. What other people say they feel (negatively) about you, had
absolutely no bearing on my act of filtering you.
They don't killfile me, they never did, they never will.

I really don't know and don't care. However, since that's a matter of
interest, it seems you will never live up to your own word and killfile
me. That's fine, but why say you have or will?
I'm relevant in a slanted way, but more so, a prodigious implementer.
And I laugh it all off anyway.

You're irrelevant.
I didn't post to you directly,

I never said you did, I stated your method of quoting and replying to my
post is confusing and appears as if you were replying to me. You
appeared confused. You then replied to yourself three more times, as
you very often do on this group. It's strange behavior, not that I
care, I just asked why you would out of curiosity.
but indirectly based on your snippings.

Your reply had nothing to do with my reply to the OP. You should reply
to the OP's own posts if you want to reply to them.
I guess that's a direct post to not only what your quoting was, but
your packaged answer.

You're not making sense. My response was to their new post and
question, not to guess about what their problem was. The man asked how
to check the drive for errors (on a Linux system), and you went on a
rant about some basic things you can do in Windows. It wasn't relevant
or helpful to them.
Which is felloneous in this context. It is
merely a jump point, nothing personal.

I'm not offended, I didn't take it personally. If you didn't see the
OP's own posts, your news reader might be broken, or you need a new ISP
or news server. Additionally, you said you had kill filed me, yet you
reply to my posts, and not the OP's own. That seems odd.
I never argue with myself.

Odd that you have replied to yourself complaining about your own
previous posts. I guess you confused yourself with someone else on
occasion?
I always assume I know what's best.
Unfortunate.

It's a family thing.

A Manson Family thing?

*plonk*
 
sln

(e-mail address removed) wrote:



I do (and will again after this reply -- you can have "the last word").
I had removed you from my filter to see your post here so I could
remind you that you said *you* had killfiled me, which you clearly
didn't.

On Friday 28 November 2008 5:45:38 pm, you stated you had "plonked" me.

http://en.wikipedia.org/wiki/Plonk_(usenet)

If nothing else, at least be a man of your word.


Not really. What other people say they feel (negatively) about you, had
absolutely no bearing on my act of filtering you.


I really don't know and don't care. However, since that's a matter of
interest, it seems you will never live up to your own word and killfile
me. That's fine, but why say you have or will?


You're irrelevant.


I never said you did, I stated your method of quoting and replying to my
post is confusing and appears as if you were replying to me. You
appeared confused. You then replied to yourself three more times, as
you very often do on this group. It's strange behavior, not that I
care, I just asked why you would out of curiosity.


Your reply had nothing to do with my reply to the OP. You should reply
to the OP's own posts if you want to reply to them.


You're not making sense. My response was to their new post and
question, not to guess about what their problem was. The man asked how
to check the drive for errors (on a Linux system), and you went on a
rant about some basic things you can do in Windows. It wasn't relevant
or helpful to them.


I'm not offended, I didn't take it personally. If you didn't see the
OP's own posts, your news reader might be broken, or you need a new ISP
or news server. Additionally, you said you had kill filed me, yet you
reply to my posts, and not the OP's own. That seems odd.


Odd that you have replied to yourself complaining about your own
previous posts. I guess you confused yourself with someone else on
occasion?


A Manson Family thing?

*plonk*

There is just something phoney about you. You're a lightweight with
a big mouth. You parrot the behaviour of the more intellectual posters here
but with nothing to back it up.

Honestly, it's you that is unfortunate.

Good luck!

sln
 
sln

I just want to add: between software installs, defragment the disk
so that they are on contiguous sectors. Delete temporary files
after they are no longer needed. Defragment before and after upgrades,
etc. This will lengthen the life of your drive(s) as its arm will not
have to be subject to excessive physical 'jerk' when seeking data in
excessively scattered patterns.

Good luck!


sln
 
Tim Greer

I just want to add: between software installs, defragment the disk
so that they are on contiguous sectors. Delete temporary files
after they are no longer needed. Defragment before and after upgrades,
etc. This will lengthen the life of your drive(s) as its arm will not
have to be subject to excessive physical 'jerk' when seeking data in
excessively scattered patterns.

Good luck!


sln

I've decided to lift my filter on you after replying to another post
(third party replying is no fun). It just wouldn't be as fun here if I
can't see your posts (no wonder people don't actually filter you,
right?), and here's a great example of why no one filters you and the
fun they'd miss out on:

The user said, very clearly, that they are on a Linux system (Ubuntu). Linux
doesn't have defrag, and doesn't need it. This is already too far off
topic for this group, but to continually recommend these trivial things
that are related to Windows, when the user is on Linux, isn't helping
anyone. I'm not trying to be mean about it, but it is counter
productive.

A drive issue is unrelated to defragging a drive on Windows anyway, and
Linux doesn't use it (it doesn't need it). The OP was already informed
how to check the drives, as well as methods to tune them (not that that
is their problem anyway). Since this is no longer Perl related, reply
to them on the Linux news group (about Linux related answers).

Thanks for listening.
 
Tim Greer

Dr.Ruud said:
Tim Greer schreef:


*PLONK*

Good grief, calm down about the whole thing. Without seeing his posts,
besides the fact it makes your day more fun, is the fact you can
correct him for the sake of the OP, which I had done.
 
James Harris

You can run the checks while the drives/partitions are mounted. fsck
doesn't have to be told to actually try and fix the issues it finds,
but you can simply check with smartctl or similar tools, as well as
running badblocks on a live environment (again, you needn't tell it to
fix the bad blocks or issues it finds). What tools you use, depends on
your drives, drivers, kernel options and what you have installed. I'd
recommend asking in a Linux group or searching google for specifics.

(e2)fsck has no option to _only_ report disk error status. It has a -n
option to answer No to all questions but it still goes away and scans
the disk - at least it does so for a partition which it knows has
errors.

What I'd like to get at is the info about which partitions have
errors without scanning them.

I found tune2fs has a -e error-behaviour option but its only options
are


continue Continue normal execution.

remount-ro Remount filesystem read-only.

panic Cause a kernel panic.

None of these are what's needed unless "continue" generates an error
report somewhere which it doesn't seem to at the moment. What I want
is for an alert to be raised when an error is detected. For example
by:

1) e-mail,
2) syslog,
3) note the fact in memory etc.

Then these could be checked by:

1) e-mail is easy to read,
2) syslog can be scanned in a job overnight - if there is some known
string to look for,
3) a command to report on the partition errors noted in memory.

Any one of these would do, though the last is the best option as it
doesn't rely on disk storage. Doesn't seem much to expect one of these
but Linux has no such option (that I can find, anyway) to report a
filesystem problem.
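
One thing that may come close to option 3: tune2fs -l prints the superblock fields without scanning, including a "Filesystem state" line and the "Errors behavior" setting. Below is a minimal Python sketch of pulling those out. The helper names and the sample excerpt are made up for illustration, and running tune2fs on a real device needs root.

```python
import subprocess


def parse_tune2fs(output: str) -> dict:
    """Turn the 'Key:   value' lines printed by `tune2fs -l` into a dict."""
    fields = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def has_errors(fields: dict) -> bool:
    """True if the recorded filesystem state mentions errors."""
    return "with errors" in fields.get("Filesystem state", "")


def check_device(device: str) -> bool:
    """Run `tune2fs -l` on one device and report whether errors are flagged."""
    out = subprocess.run(["tune2fs", "-l", device], capture_output=True,
                         text=True, check=True).stdout
    return has_errors(parse_tune2fs(out))


# Made-up excerpt in the shape tune2fs -l prints, for illustration only.
sample = """\
Filesystem state:         clean with errors
Errors behavior:          Continue
"""
fields = parse_tune2fs(sample)
print(fields["Filesystem state"], "|", has_errors(fields))
```

A nightly job could call check_device on each partition and mail the result, which would cover option 1 as well; that wiring is left out here.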

James
 
Tim Greer

James said:
Any one of these would do, though the last is the best option as it
doesn't rely on disk storage. Doesn't seem much to expect one of these
but Linux has no such option (that I can find, anyway) to report a
filesystem problem.

You needn't scan the drive for errors (that was a suggestion, one of a few,
of how you could check the health of the current drive(s)), and with
normal usage it will report the error in the dmesg and messages log (if
it can), assuming you have the proper error reporting/debugging
enabled.

You should use tools such as smartctl (for example) and other commands
that will warn you both as errors happen and as it sees problems
starting to happen, but the rest were just tools you could use to check
the state of the drive now (you can use the other tools now that aren't
having to spend a long time scanning). It really depends on your
system and what you can do, such as a hardware raid card, software
raid, or just normal ide/scsi/sata drives. This should probably no
longer be posted to the Perl group.
 
James Harris

You needn't scan the drive for errors (that was a suggestion, one of a few,
of how you could check the health of the current drive(s)), and with
normal usage it will report the error in the dmesg and messages log (if
it can), assuming you have the proper error reporting/debugging
enabled.

You should use tools such as smartctl (for example) and other commands
that will warn you both as errors happen and as it sees problems
starting to happen, but the rest were just tools you could use to check

OK. I have installed the smartmontools and set it up. It looks good so
far though it must be said it is a disk-level tool rather than one
that identifies partitions or file systems that have errors. The early
indications in the log and the e-mail ability are helpful, though.

Am also running badblocks scans on some of the partitions.
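
For anyone scripting the same check, here is a minimal Python sketch that reads the overall health verdict from smartctl -H. It assumes the ATA-style output line "SMART overall-health self-assessment test result: PASSED" (SCSI drives word it differently). The helper names and the sample line are invented for illustration.

```python
import subprocess


def health_passed(smartctl_output: str) -> bool:
    """True if the overall-health line of `smartctl -H` output says PASSED."""
    for line in smartctl_output.splitlines():
        if "overall-health" in line:
            return line.rstrip().endswith("PASSED")
    raise ValueError("no overall-health line found")


def check_drive(device: str) -> bool:
    """Run `smartctl -H` on a drive (usually needs root)."""
    # No check=True: smartctl uses a non-zero exit status to flag problems.
    proc = subprocess.run(["smartctl", "-H", device],
                          capture_output=True, text=True)
    return health_passed(proc.stdout)


# Made-up sample line in the ATA format, for illustration only.
sample = "SMART overall-health self-assessment test result: PASSED\n"
print(health_passed(sample))
```

As noted, this is a drive-level verdict and says nothing about which file system is damaged.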

the state of the drive now (you can use the other tools now that aren't
having to spend a long time scanning). It really depends on your
system and what you can do, such as a hardware raid card, software
raid, or just normal ide/scsi/sata drives. This should probably no
longer be posted to the Perl group.

Agreed. Followups set accordingly.

James
 
