el.dodgero
Hey, I just had this weird issue come up. A schroedinbug of some sort.
I have a Perl script that deals with grabbing database updates from a
screen scrape off a tn3270 connection. It gets the list of records
(about 5,000 of them) and then gets the information from the tn3270
screen scrape, which generally amounts to about 30,000 rows (averaged
across items).
To make this whole thing run faster, since I have 19 dedicated separate
logins to the tn3270 connection, and since the screen scraping is SLOW
compared to most other things, rather than connecting and updating one
at a time, I slice the list up into 19 even chunks (or a little short
on the last chunk), and then hand them off to child processes. I then
write the PIDs I get back from the kids into a file as they return.
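The chunk-and-fork pattern described above can be sketched like so. This is a Python sketch rather than the original Perl, and the names (`chunk`, `spawn_workers`) and chunk-size arithmetic are mine, not the script's:

```python
import os

def chunk(items, n):
    """Split items into n nearly-even chunks (the last may run short)."""
    size = -(-len(items) // n)  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]

def spawn_workers(items, n_workers=19):
    """Fork one child per chunk; return the child PIDs as seen by the parent."""
    pids = []
    for part in chunk(items, n_workers):
        pid = os.fork()
        if pid == 0:
            # Child: here the real worker would scrape its slice and append
            # results to the shared tab-delimited file, then exit.
            os._exit(0)
        pids.append(pid)  # Parent records the PID fork() returned
    return pids
```

The parent-side list is what gets written to the PID file for the controller to check.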
A controller script that wraps around this runs this script, then
checks the PID file against ps to determine if those kids, by PID, are
still running. Each of the kids just works through its list and writes
the results into the bottom of the same tab-delimited text file.
When all the pids are done, i.e. when none of them are found in ps, the
controller script then runs sqlldr and loads all the stuff into the
database.
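A minimal version of that wait-then-load controller, again as a Python sketch. The original greps ps; here I probe liveness with kill(pid, 0), which asks the kernel directly without sending a signal. The sqlldr invocation is left as a comment because its arguments aren't in the post:

```python
import os
import time

def pid_alive(pid):
    """True if a process with this PID exists (signal 0 sends nothing)."""
    try:
        os.kill(pid, 0)
        return True
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but owned by another user

def wait_for_workers(pid_file, poll_seconds=5):
    """Block until no PID listed in pid_file is still running."""
    with open(pid_file) as fh:
        pids = [int(line) for line in fh if line.strip()]
    while any(pid_alive(p) for p in pids):
        time.sleep(poll_seconds)
    # At this point the controller would invoke sqlldr on the combined
    # tab-delimited output file (arguments omitted; not in the post).
```

One caveat with either approach: a PID found in ps (or answering signal 0) is only evidence that *some* process has that number, not necessarily the worker you spawned.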
The entire process takes about 12-20 minutes depending on how slow the
network is running at any given time. This, compared to the up to three
hours it used to take, is a real improvement, and the approach makes
sense.
It's also been working flawlessly for weeks.
Until this week. Things started flaking out, and I was racking my
brain to figure out why. I ran it, and it was acting like it was
returning way too fast, and doing pretty much nothing, or only putting
in a few rows (fewer than 100).
Finally, I ran the first script manually. Then I copied the PID file
into another file, opened it in vim, prepended kill -9 to every line,
made it executable, and tried to kill off all the kids.
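That kill-from-a-PID-file trick can also be done without the vim detour. Here is a hypothetical Python equivalent that reports which PIDs the kernel actually knows about:

```python
import os
import signal

def kill_from_pid_file(pid_file, sig=signal.SIGKILL):
    """Send sig to every PID listed in pid_file; report what happened."""
    results = {}
    with open(pid_file) as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            pid = int(line)
            try:
                os.kill(pid, sig)
                results[pid] = "signalled"
            except ProcessLookupError:
                results[pid] = "no such process"
    return results
```

A recorded PID that comes back "no such process" while the worker is visibly still running in ps is exactly the mismatch symptom.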
It didn't work. The PIDs were wrong.
The PIDs -- on Solaris, mind you, not like Windows or something weird
that would need to emulate fork() -- the PIDs returned from fork() were
*wrong*, at least compared to what ps showed running.
I can't for the life of me figure out why this would be. I checked for
possible hacker attempts, in case someone was running something that
deliberately tricked the kernel into offsetting process IDs, but didn't
see anything of the sort. All I know is that I got 19 process IDs back
from fork() that were NOT the process IDs that were actually running
and reported in the output of ps.
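For reference, fork() in the parent returns the child's PID, while the child sees 0 from fork() and its real PID via getpid(), and those normally agree with what ps shows. Here's a Python check of that invariant; one hedged guess at the mismatch (not confirmed by anything in this post) is an intermediate wrapper that re-forks, so the recorded PID belongs to a process that has already exited:

```python
import os

def fork_views():
    """Compare the PID fork() gives the parent with the child's getpid()."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # In the child, fork() returned 0; the real PID is os.getpid().
        os.write(w, str(os.getpid()).encode())
        os._exit(0)
    os.close(w)
    child_says = int(os.read(r, 64).decode())
    os.close(r)
    os.waitpid(pid, 0)
    return pid, child_says  # equal, unless something re-forks in between
```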
Anyone know anything, or is this going to be one of those mystery posts
that hangs around on the net where, a year later, when someone has the
same issue they google for it, find this message and get all excited
that they have found the solution, only to realise that it's just this
empty, unanswered question? *
* Blatant guilt-trip for anyone who knows the answer but doesn't feel
like posting. Have some chicken soup and matzo! Eat, eat, ya so skinny!
You nevah call anymoa!