How to check the new URL of a redirected page


zawszedamian_p

I read a web page using HTTP::Response, but the page was redirected. How
can I read the new URL?
I tried HTTP::Response->base(), but it returns the original URL.

Thanks
 

Ben Morrow

Quoth (e-mail address removed):
> I read webpage using HTTP::Response but the page was redirected - how
> can I read new url?
> I tried HTTP::Response->base() but it returns original url.

If you mean a proper HTTP redirect rather than an HTML meta-refresh or
something more evil in JavaScript,

$response->header('Location');

However, LWP::UserAgent will follow redirects by default, so unless
you've turned that off this won't help :(. If the page is HTML with a
meta-refresh, you will need to parse it with e.g. HTML::Parser,
extract the <meta> elements, and find the one that carries the refresh. If
it's using JS, you're out of luck, unless the pages you are working with
have similar pieces of JS every time and you can see how to extract the
URL.

Ben
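
For the case where LWP::UserAgent has already followed the redirect, one option is to ask the final response which request produced it. The sketch below is only an illustration: it builds the response chain by hand instead of going over the network, and the example.com URLs are made up. It relies on the request, previous and redirects methods of HTTP::Response.

```perl
use strict;
use warnings;
use HTTP::Request;
use HTTP::Response;

# Simulate what LWP::UserAgent hands back after following one redirect.
# Both URLs are hypothetical.
my $first_req  = HTTP::Request->new( GET => 'http://example.com/old' );
my $first_resp = HTTP::Response->new( 302, 'Found',
    [ Location => 'http://example.com/new' ] );
$first_resp->request($first_req);

my $final_req  = HTTP::Request->new( GET => 'http://example.com/new' );
my $final_resp = HTTP::Response->new( 200, 'OK' );
$final_resp->request($final_req);
$final_resp->previous($first_resp);    # link back to the 302 response

# The request attached to the final response carries the URL that was
# actually fetched.
my $final_url = $final_resp->request->uri;

# redirects() walks the previous() chain and returns the intermediate
# responses, oldest first.
my @hops = map { $_->header('Location') } $final_resp->redirects;

print "final URL: $final_url\n";
print "hop: $_\n" for @hops;
```

With a real user agent the same two calls would be made on the object returned by $ua->get or $ua->request.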
 

comp.llang.perl.moderated

Quoth (e-mail address removed):
> If you mean a proper HTTP redirect rather than an HTML meta-refresh or
> something more evil in JavaScript,
>
> $response->header('Location');
>
> However, LWP::UserAgent will follow redirects by default, so unless
> you've turned it off this won't help :(.
...

A possibly more convenient alternative to turning off redirects
entirely is LWP's simple_request, which won't follow redirects:

my $resp = $ua->simple_request($request);
if ( $resp->is_redirect ) {    # any 3xx status, not only 302
    my $uri = URI->new($resp->header('Location'));
    ...
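
One detail the snippet above leaves open is that a Location header may be relative. A minimal sketch of resolving it against the requested URL with URI->new_abs follows; the two strings are hypothetical stand-ins for $request->uri and $resp->header('Location').

```perl
use strict;
use warnings;
use URI;

# Hypothetical values: the URL that was requested and a relative
# Location header such as a server might send back.
my $requested = 'http://example.com/old/page.html';
my $location  = '../new/page.html';

# new_abs resolves $location against $requested; an already absolute
# Location comes back unchanged.
my $target = URI->new_abs( $location, $requested );

print "redirect target: $target\n";
```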
 

Ted Zlatanov

PV> HTML::parser is "too big gun to small rabbit" :) For meta element
PV> base redirections is successful some like this

PV> # I precede that html page is in variable $content
PV> $content=~s/^.?(<meta\s+?HTTP-EQUIV=.REFRESH..+?>).+$/$1/si;
PV> $content=~s/^.+?url=(.+?)[\'\">]

PV> Now $content contain new URL.

This is like using garrote wire to catch and strangle the rabbit :)

Ted
 

Ben Morrow

Quoth "Petr Vileta":
> Sorry Ben, please do not kill me, but HTML::parser is "too big gun to small
> rabbit" :)
> For meta element base redirections is successful some like this
>
> # I precede that html page is in variable $content

my $content = <<HTML;
<html>
<head>
<!-- <meta HTTP-EQUIV="REFRESH" url="some/fake/url"> -->
< META content=10;url=foo http-equiv=refresh>
</head>
<body>
Hello world!
</body>
</html>
HTML

> $content=~s/^.?(<meta\s+?HTTP-EQUIV=.REFRESH..+?>).+$/$1/si;
> $content=~s/^.+?url=(.+?)[\'\">]

This line is not valid Perl.

> Now $content contain new URL.

No, it doesn't.

LWP::UserAgent will parse the <head> section of a text/html document for
you, and return the http-equiv headers along with the real HTTP headers.
For this purpose it uses HTML::HeadParser, which, guess what, is a
subclass of HTML::Parser. This means that a refresh can be detected with

$response->header('refresh');

Ben
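
To see what the Refresh header described above looks like, HTML::HeadParser can be run directly on a small page. This is a sketch under assumptions: the HTML and the example.com URL are invented, and the final regex that splits the delay from the URL is an illustrative choice, not something LWP provides.

```perl
use strict;
use warnings;
use HTML::HeadParser;

# Hypothetical page: a commented-out decoy followed by a real meta refresh.
my $html = <<'HTML';
<html>
<head>
<title>Moved</title>
<!-- <meta http-equiv="refresh" content="0; url=http://example.com/fake"> -->
<meta http-equiv="refresh" content="10; url=http://example.com/new">
</head>
<body>
Hello world!
</body>
</html>
HTML

# HeadParser collects <meta http-equiv=...> values as if they were headers.
my $parser = HTML::HeadParser->new;
$parser->parse($html);
my $refresh = $parser->header('Refresh');

# Split "delay; url=target" into its two parts (illustrative regex).
my ( $delay, $url ) =
    $refresh =~ /^\s*(\d+)\s*;\s*url\s*=\s*["']?([^"'\s]+)/i;

print "refresh after $delay seconds to $url\n";
```

The commented-out element is skipped because the parser only reacts to real start tags, which is where the regex approach quoted earlier goes wrong.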
 
