Long script "just stops" sometimes

Jerry Krinock

I've written a 1500-line script which converts several dozen files of
Markdown source text to HTML. It takes several minutes to
run, indicating progress by printf statements. However, about 20% of
the time, in the middle of processing a Markdown file, it just stops
progressing, as though it is in an infinite loop. If I kill the
process and restart, it always completes successfully.

My script, being a script, is of course not particularly efficient.
I was thinking that maybe Perl was running out of memory or something,
although that's not supposed to happen nowadays (Perl 5.10.0, Mac OS X
10.6). And when I check it in Apple's Activity Monitor during normal
operation, I find that its CPU and memory usage are hardly noticeable,
maybe 3% and a few tens of megabytes.

Are there any conditions under which Perl would "just stop"?

Any suggestions to troubleshoot this would be appreciated.

Thanks!

Jerry Krinock
 
Ilya Zakharevich

Jerry said:
[...] I find that its CPU and memory usage are hardly noticeable,
maybe 3% and a few tens of megabytes.

Are there any conditions under which Perl would "just stop"?

With no CPU usage? I would say it reads from STDIN. Did you try to
press Enter or C-d?
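A quick way to test that guess (a sketch, not something posted in the thread) is to make standard input something that can never block: reopen STDIN on the null device near the top of the script, which has the same effect as running it with "< /dev/null" from the shell. Any read from STDIN then returns end-of-file at once, and child processes that inherit file descriptor 0 see the same thing. This assumes the script never legitimately needs terminal input.

```perl
use strict;
use warnings;
use File::Spec;

# Point STDIN at the null device (/dev/null on Unix). A stray <STDIN>
# in the script, or in a child that inherits file descriptor 0, now
# sees end-of-file immediately instead of waiting for the terminal.
open STDIN, '<', File::Spec->devnull
    or die "Cannot reopen STDIN: $!";
```

If the hangs stop after this change, something was waiting on standard input.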

Any suggestions to troubleshoot this would be appreciated.

Haven't you heard of the debugger? If this happens often, you can
just wait for it to happen in the debugger. If worse comes to worst,
and interactive debugging does not help, you can always start in
NonStop mode with `tracing', and look at the last several thousand
lines of the trace when the `stop' happens.

Hope this helps,
Ilya
 
C.DeRykus

You might try just setting a timeout (perldoc -f alarm)
around whatever code section turns up in a stack
trace, then exec the program again (perldoc -f exec)
if there's a timeout. Of course, if memory is
the problem, you may be able to find some way
to reduce memory usage and eliminate the timeout
workaround.
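A minimal sketch of that timeout-and-restart idea, following the alarm idiom from perldoc -f alarm. The names run_with_timeout and restart_self are made up for this example, and the usage line's process_file() and 300-second limit are hypothetical placeholders, not part of Jerry's script.

```perl
use strict;
use warnings;

# Run $code, giving up after $seconds of wall-clock time.
# Returns 1 if $code finished, 0 if the alarm fired first;
# any other error raised by $code is passed on to the caller.
sub run_with_timeout {
    my ($seconds, $code) = @_;
    my $ok = eval {
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm $seconds;
        $code->();
        alarm 0;     # finished in time: cancel the pending alarm
        1;
    };
    alarm 0;         # also cancel it if $code died some other way
    return 1 if $ok;
    return 0 if $@ eq "timeout\n";
    die $@;
}

# Replace this process with a fresh run of the same script:
# same perl binary ($^X), same script path ($0), same arguments.
sub restart_self {
    exec($^X, $0, @ARGV) or die "Cannot re-exec $0: $!";
}

# Hypothetical usage; process_file() and the 300-second limit
# are placeholders, not part of the original script:
#
#   restart_self() unless run_with_timeout(300, sub { process_file($path) });
```

restart_self assumes the script has not changed directory or consumed @ARGV before it is called, and a real version would probably want a retry counter so a file that always hangs cannot cause an endless restart loop.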
 
Peter J. Holzer

Ilya said:
With no CPU usage? I would say it reads from STDIN. Did you try to
press Enter or C-d?

Two tools I find indispensable when trying to figure out "strange"
behaviour of programs are lsof and strace.

lsof lists the open files of a process. It has been ported to lots of
unixoid systems (I first encountered it on HP-UX) and should be
available on MacOS.

strace prints out the system calls a process invokes. Unfortunately,
while most unixoid systems have a program which does this, it seems to
have a different name on each ("truss" on Solaris, "tusc" on HP-UX, ...)
so the OP will have to find out for himself what it's called on MacOS.

In this case, if your guess is right, strace would show that the process
is currently waiting for a read on fd 0 to complete, and then lsof could
be used to find out which file fd 0 is (ok, so for fd 0 you may know
that it's the tty, but for (say) fd 43 you want a tool to look it up).

hp
 
Xho Jingleheimerschmidt

Peter said:
In this case, if your guess is right, strace would show that the process
is currently waiting for a read on fd 0 to complete, and then lsof could
be used to find out which file fd 0 is (ok, so for fd 0 you may know
that it's the tty, but for (say) fd 43 you want a tool to look it up).

Unfortunately, if you use strace's "-p" option to attach to an
already-running process, it often doesn't show you what call the process
was waiting on at the time of the attachment. You would have to start
stracing the process from the beginning, which is inconvenient if the
situation in question only happens occasionally.

Xho
 
Eric Pozharski

An earlier post said:
You can find out what call the process is currently blocking in with ps.

I've just checked (Linux 2.6.30, Debian). Neither '-n
/boot/System.map-2.6.30*' nor '-n /proc/*/wchan' helps 'ps' find the
syscall, and 'ps' fails with its default lookup as well, so the
'WCHAN' column is always either '-' or '?'. I remember it used to be
there; now it's missing.
 
Eric Pozharski

I've checked more thoroughly: all the 'wchan' files (there are also
'/proc/*/task/*/wchan') always contain '0'. BTW, they don't have a
trailing newline either. R.I.P.
 
Peter J. Holzer

Xho said:
Unfortunately, if you use strace's "-p" option to attach to an
already-running process, it often doesn't show you what call the process
was waiting on at the time of the attachment.

I don't think that ever happened to me in many years of using strace.
Except with multithreaded processes, but for those strace doesn't work
reliably anyway (I don't understand why).

hp
 
Xho Jingleheimerschmidt

Peter said:
I don't think that ever happened to me in many years of using strace.
Except with multithreaded processes, but for those strace doesn't work
reliably anyway (I don't understand why).

This is what I get:

$ perl -le '<>' &
[1] 17657
$ strace -p 17657
Process 17657 attached - interrupt to quit

And then no output until I break out of strace.

Xho
 
Xho Jingleheimerschmidt

Ben said:
Quoth Xho Jingleheimerschmidt:
Well, maybe *you* can. I've never been able to.

Really?

~% uname
FreeBSD
~% perl -MPOSIX=pause -epause &
[3] 68357
~% ps -lp 68357
UID PID PPID CPU PRI NI VSZ RSS MWCHAN STAT TT TIME COMMAND
1001 68357 14199 0 61 0 5204 3160 pause S 3 0:00.06 perl -MPOSI
~%

~$ uname
Linux
~$ perl -MPOSIX=pause -epause &
[1] 10335
~$ ps -lp 10335
F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD
0 S 1051 10335 10321 0 80 0 - 5759 pause pts/1 00:00:00 perl
~$

It only tells you *which* syscall, of course, not what arguments it was
called with, but that's something.

On my Linux:

$ perl -MPOSIX=pause -epause &
[2] 26855
$ ps -lp 26855
Warning: /boot/System.map-2.6.21.4-eeepc not parseable as a System.map
F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD
0 S 1000 26855 1790 0 77 0 - 943 11ff41 pts/0 00:00:00 perl

I don't have access to a more conventional Linux right now, but I don't
recall seeing anything other than '-' and '?' listed under wchan. I'll
have to try to remember to give it a try.

Thanks,

Xho
 
Jerry Krinock

I really appreciate all the thought that went into this thread. But
the first solution seems to work…

Quoth Jerry Krinock:
Add the following somewhere early on:

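    require Carp;
    # ^T sends SIGINFO (BSD, Mac OS X): print a backtrace, keep running
    $SIG{INFO} = sub { Carp::cluck("SIGINFO") };
    # ^\ sends SIGQUIT: print a backtrace, then die
    $SIG{QUIT} = sub { Carp::confess("SIGQUIT") };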

Now you can press ^T to get a backtrace, and ^\ to get a backtrace and
kill the program.
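The ^T part of this advice depends on the platform: BSD-derived systems such as Mac OS X send SIGINFO when you type ^T, while most Linux systems have no SIGINFO at all. The sketch below (not from the thread) installs the same cluck-style handler on SIGUSR1 as well, so a stuck process can be asked for a backtrace from another terminal with "kill -USR1 <pid>". Checking $Config{sig_name} before touching $SIG{INFO} is this sketch's own choice, made to avoid a warning on systems that lack the signal.

```perl
use strict;
use warnings;
use Carp ();
use Config;

# Signal names this perl knows about, from its build configuration.
my %known_signal = map { $_ => 1 } split ' ', $Config{sig_name};

# Print a backtrace to STDERR (via warn) and let the program continue.
# Perl passes the signal name, e.g. "USR1", as the first argument.
my $backtrace = sub { Carp::cluck("Got SIG$_[0]") };

# ^T only produces SIGINFO on BSD-derived systems such as Mac OS X,
# so install that handler only where the signal exists.
$SIG{INFO} = $backtrace if $known_signal{INFO};

# Portable alternative: from another terminal, run  kill -USR1 <pid>
$SIG{USR1} = $backtrace;

# ^\ (SIGQUIT): print a backtrace, then die.
$SIG{QUIT} = sub { Carp::confess("Got SIG$_[0]") };
```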

Actually, works better than advertised. When I hit ^T, it prints a
backtrace to the console, and then, surprise!, the script starts
running again, and eventually exits success. This happened twice in
the last few days.

The backtrace tells me that it's sticking when I invoke IPC::Run to
invoke another perl script which I have written. I haven't dug into
it yet, because, well, it's not a big issue if all I need to do is type
^T to unstick it. It might be misprocessing one little line of text
or something.

Thanks for all the help,

Jerry
 
Peter J. Holzer

Xho said, quoting me:
I don't think that ever happened to me in many years of using strace.
Except with multithreaded processes, but for those strace doesn't work
reliably anyway (I don't understand why).

This is what I get:

$ perl -le '<>' &
[1] 17657

Here I get

[1] + suspended (tty input) perl -le '<>'

(I expected that. A background process should not be able to
read from the TTY)

strace then very rapidly prints

read(0, 0x8bc1c90, 4096) = ? ERESTARTSYS (To be restarted)
--- SIGTTIN (Stopped (tty input)) @ 0 (0) ---
--- SIGTTIN (Stopped (tty input)) @ 0 (0) ---
read(0, 0x8bc1c90, 4096) = ? ERESTARTSYS (To be restarted)
--- SIGTTIN (Stopped (tty input)) @ 0 (0) ---
--- SIGTTIN (Stopped (tty input)) @ 0 (0) ---
....


If I try that again without the "&", and start strace in another
terminal, I get:

% strace -p 27233
Process 27233 attached - interrupt to quit
read(0,

and the cursor is to the right of "read(0, ", indicating that the read
system call is still in progress. The line is completed as soon as the
system call returns.

(Linux 2.6.32-5-686, strace 4.5.20-2, but AFAIR it was always like this)

hp
 
Eric Pozharski

Xho said:
I don't have access to a more conventional Linux right now, but I
don't recall seeing anything other than '-' and '?' listed under
wchan. I'll have to try to remember to give it a try.

Please note that these observations may depend on the distribution and
on whether the kernel is self-built or stock. I suspect it's a kind of
security-through-obscurity idealism, and it's not the first time I've had
to fight that, in Debian in particular.
 
