LWP hangs

yahavba

Hi,
I'm using LWP on Win32. Sometimes, after a period of successful
communication, the Perl process just hangs, and it seems like LWP
has stopped. The debug messages are:

LWP::UserAgent::_need_proxy: Not proxied
LWP::protocol::http::request: ()

At this stage the script stalls and I have to stop it using Task
Manager.

Does anyone know why this might happen?

I'd appreciate your help!

Thanks
 

Jamie


Is it going through a proxy?

Can you tell us where/which method it's hanging on? Sometimes
I'll override the methods (get/post/etc.) and have it spit out
the URL and the parameters it's trying to use. Then I'll go in with
telnet and try to mimic what it would/should do, sort of working
around from there. The key is figuring out whether it's LWP or
something to do with the underlying network.

Course, if it's SSL, things get a little tricky..

Jamie
 

yahavba

--
http://www.geniegate.com Custom web programming
Perl * Java * UNIX User Management Solutions

Hi,
It's hanging on get. I'm using get on a URL which is secured (https),
and it works for quite a long time until suddenly it stops and
hangs.
When you say you override the method, how exactly is that done? How
can I verify what parameters it is trying to use?

Thanks for your help!
 

yahavba


Also, it's not going through a proxy (and it is using SSL).
Thanks!
 

Jamie


Do this: perldoc -m LWP::UserAgent

It'll give you the source code for LWP::UserAgent.

Then, in a sub or another package or a variety of ways..

NOTE:!!!!! Not-tested code, this is just a "for example" thing!

I'll probably goof this up, I'm editing "live" so beware...

sub get_ua {
    {
        package My::Ua;
        use LWP::UserAgent;
        use base 'LWP::UserAgent';
        use strict;

        # Use our bugged version to snoop in on things.
        sub get {
            my ($self, @args) = @_;
            print "CP1: $self called with " . join(',', @args), "\n";
            my $rv = $self->SUPER::get(@args);
            print "CP2: returning from get\n";
            return $rv;
        }
    }
    return My::Ua->new(@_);    # Create our own version of LWP::UserAgent.
}


When you construct your LWP::UserAgent object, call get_ua() instead.
In the customized get() method above, you can insert print statements
and so on, which will tell you the precise URL it's attempting to
fetch. (You can make a note of the URLs and observe whether it's
always the same URL; that would be a key piece of information.)

Confirm things are as they should be, then follow along the path of
LWP until you get to request() (at that point it's probably just as
easy to copy the whole thing over and pollute it with print
statements), placing "CPnnn" statements in along the way.

At the end of it all, you'll get to a point where there isn't a "CPnnn"
printed where you think there ought to be one. At that point, you'll
have found exactly where it's hanging, and, if you're lucky.. it'll
be something obvious. :) If not, at least you'll have a good idea what's
wrong.

Do NOT modify the source of LWP::UserAgent (or any other module for that
matter) directly, always copy, or if it's more convenient, do a
custom override as above. Otherwise you'll end up with corrupt modules.

See Also: LWP::Debug

Though I've never used it, every problem I've ever had was a result
of me passing the wrong stuff into get/post; I've never had to go
further than what I've described above. (And I usually override
LWP::UserAgent at the beginning anyway, just in case I might want to
change its behavior later on in program development, e.g. password
fetching.)

The above is just a debugging method I've found useful for "tough cases".

Jamie
 

yahavba


Hi Jamie,
Thanks a lot for your help and advice.
I've tried your solution, and I see that the GET actually receives the
correct URL. Afterwards, usually after fetching pages for 1-2 hours,
it hangs. I tried to use an alarm of 60 seconds (and mapped $SIG{ALRM}
to a subroutine of my own) but it didn't help; even Ctrl-C doesn't
kill the process - only Task Manager does.
I'm thinking of running the GET commands in a separate process or
thread, and then if I can't see the results of the GET in the main
process I will kill that thread. What do you think?
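A side note that may help here: before moving to processes or threads, it's worth ruling out LWP::UserAgent's own timeout option. Unlike alarm(), it's implemented with select() on the socket rather than with signals, so Win32's signal limitations don't apply to it (whether it fires reliably through the SSL layer is a separate question). A minimal, untested sketch with a placeholder URL:

```perl
use strict;
use warnings;
use LWP::UserAgent;

# LWP's built-in timeout: give up if no data arrives on the socket
# for 60 seconds. It's implemented with select() inside LWP rather
# than with signals, so it isn't subject to Win32's alarm() problems.
my $ua = LWP::UserAgent->new(timeout => 60);

my $resp = $ua->get('https://example.com/');   # placeholder URL
if ($resp->is_success) {
    print "fetched ", length($resp->decoded_content), " bytes\n";
}
else {
    warn "request failed or timed out: ", $resp->status_line, "\n";
}
```

If the process still hangs with a short timeout set, that points at the SSL layer rather than LWP's plain HTTP code.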
 

Jamie


Does it hang on the exact URL each time? The ^C sort of baffles
me. With UNIX, I would probably examine the process and see if it's
taking a lot of memory. (Even then, ^C should work.)

I suppose that would work, or, fork a new process for each URL, wait
and then process it, then fork another process each time you fetch
a URL. The hack being, keep resource allocations in a child proc where
they can be cleaned up on exit.

Long-running processes are sort of famous for memory leaks. (Usually
they get progressively slower and slower and eventually just stop
working / throw memory errors.)

The "right way" (IMO) is to find out what's happening, though. (This
can be really hard to do. Data::Dumper combined with
UNIVERSAL::DESTROY will sometimes help, but it's just not easy.)
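The UNIVERSAL::DESTROY trick mentioned above can be as small as this sketch (the Leaky class is made up for the demo; note the hook only fires for classes that don't define their own DESTROY, and it is noisy in a real program):

```perl
use strict;
use warnings;

# Log every object destruction for classes that don't define their
# own DESTROY; objects that are never logged are candidates for a leak.
sub UNIVERSAL::DESTROY {
    my ($obj) = @_;
    warn "DESTROY: ", ref($obj) || $obj, "\n";
}

{
    package Leaky;                 # toy class, just for demonstration
    sub new { bless {}, shift }
}

my $kept;
{
    my $gone = Leaky->new;         # freed when this block ends
    $kept    = Leaky->new;         # still referenced, so not freed yet
}
# At this point only $gone's "DESTROY: Leaky" has been warned;
# $kept is logged when the program exits.
```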

Doing a fork() is a cheap way around the problem: when the child
process dies (at least with UNIX), the memory is reclaimed. It's more
of a band-aid than a solution, though. (It's useful if you need to do
something you /know/ will take a lot of memory; it's the only way I
know of to give memory back when done.)
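The fork-per-URL idea above might look roughly like this UNIX-flavored sketch (untested; the URL list and the 120-second grace period are invented, and on Win32 fork() is emulated with threads, so the memory-reclamation benefit is weaker there):

```perl
use strict;
use warnings;
use POSIX ":sys_wait_h";           # for WNOHANG

# Hypothetical URL list; the child would do the actual LWP fetch.
my @urls = ('https://example.com/a', 'https://example.com/b');

for my $url (@urls) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: fetch $url here (e.g. LWP::UserAgent->new->get($url)),
        # write the result to a pipe or temp file, then exit. All the
        # memory it allocated goes back to the OS.
        exit 0;
    }

    # Parent: reap the child, but give up after 120 seconds.
    my $deadline = time() + 120;
    while (waitpid($pid, WNOHANG) == 0) {
        if (time() > $deadline) {
            kill 'KILL', $pid;
            waitpid($pid, 0);
            warn "fetch of $url killed after timeout\n";
            last;
        }
        sleep 1;
    }
}
```

The part where the child hands its result back to the parent (a pipe or a temp file) is omitted here.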

If it's practical, you might take just the part that GETs the URL,
without any other code, run that in a loop and see if it hangs. That
might tell you whether it's a problem with LWP or whether the rest of
your code is doing something that doesn't cause a problem until the
GET.

I don't know enough about Windows to understand the rest of the
story; it could be most anything. Sockets not being closed? Handles?
Collecting a boatload of UserAgent objects some place?


Jamie
 

yahavba


Hi Jamie,

I noticed that the Perl process gets more and more memory (up to 300M
and more) over time. I did some investigation on the web and found out
that this might happen because the Mechanize object saves each visited
page so that the back() method will work. I now suspect that this is
what causes the trouble, and not other issues. I'm trying my best to
find a way around it; so far I haven't found any way to disable this
page saving.
Did you ever come across such behavior of Mech?
Since I'm logged in to the website, it's not practical for me to
re-create the object each time...
I'll have to keep thinking about it.
Happy holiday and thanks!
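For anyone who lands on this thread later: WWW::Mechanize does expose a knob for exactly this. The stack_depth setting limits the page history that back() relies on, and 0 disables it entirely while leaving cookies, and therefore a logged-in session, intact. A sketch with a placeholder URL:

```perl
use strict;
use warnings;
use WWW::Mechanize;

# stack_depth => 0: keep no page history at all, so memory no longer
# grows with every get(). back() stops working, but cookies (and so
# the logged-in session) are untouched.
my $mech = WWW::Mechanize->new(
    stack_depth => 0,
    autocheck   => 0,    # don't die on HTTP errors
    timeout     => 60,
);

$mech->get('https://example.com/secure/page');   # placeholder URL
print $mech->status, "\n";
```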
 
