HTTP::Request, trailing slash

Sebastian Bauer · Jun 29, 2004

Hi,

I am trying to write a script that downloads images from a web page. I use

my $uagent = LWP::UserAgent->new();
my $request = HTTP::Request->new(GET => "$url");
my $result = $uagent->request($request);

but the problem is that - even if $url does not contain a trailing slash -
a slash is somehow added at the end. This behaviour keeps me from
downloading the file, because the download is refused this way. I think it
must be this slash: if I try to get the file via Mozilla (without the slash)
it works, with Mozilla with the slash it does not.

It would be great if anyone could help me out and tell me how to get rid of
the slash.
Thx, Sebastian

Gunnar Hjalmarsson · Jun 30, 2004

Sebastian said:
I am trying to write a script that downloads images from a web page. I use

my $uagent = LWP::UserAgent->new();
my $request = HTTP::Request->new(GET => "$url");
my $result = $uagent->request($request);

but the problem is that - even if $url does not contain a trailing
slash - a slash is somehow added at the end.

Where/when is the slash appended? Isn't $url what it is, and if it
contains a valid URL, the request is successful?

To make it easier to understand what you mean, please post a
*complete* program with a couple of valid URLs that your code fails to
get.
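
One way to answer the "where is the slash appended?" question is to print
the URI that the request object actually holds before anything is sent. A
minimal sketch, assuming a placeholder URL (www.example.com is not from
this thread):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Request;

# Placeholder URL for illustration; substitute the one your script builds.
my $url = "http://www.example.com/images/picture.gif";

my $request = HTTP::Request->new(GET => $url);

# The URI object stored in the request, turned back into a string.
my $requested = $request->uri->as_string;
print "Requested URL: $requested\n";

# The request line and headers as held by the request object.
print $request->as_string;
```

If the printed URL matches $url, the request object itself has not added
anything to it.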

Sebastian Bauer · Jun 30, 2004

Thx a lot for answering. Here is the program that fails.
It extracts the URL of an image from a web page and should then download
that file. If you load the $img_url in Mozilla it works. If you
append a slash or open the URL in Konqueror, it fails the same way the
script does...

#!/usr/bin/perl

use strict;
use warnings;
use LWP::UserAgent;

my $taz_url = "http://www.taz.de";
my $filename = "tom.gif";

my $uagent = LWP::UserAgent->new();
my $request = HTTP::Request->new(GET => $taz_url
    . "/pt/2004/06/30.nf/tomnf");

my $result = $uagent->request($request);
my $img_url;

if ($result->content() =~
    /<img src="(.*)" alt="TOM">\s+<br \/><b>Tom Touché vom/)
{
    $img_url = $1;
} else {
    die "url of todays image cannot be determined\n";
}

print "${taz_url}${img_url}\n";

$request = HTTP::Request->new(GET => "${taz_url}${img_url}");
$result = $uagent->simple_request($request, $filename);
if ($result->is_success) {
    print "todays image stored in $filename\n";
} else {
    die "could not store todays image\n";
}

Thx for your help

Gunnar Hjalmarsson · Jun 30, 2004

Sebastian said:
Thx a lot for answering. Here is the program that fails. It
extracts the URL of an image from a web page and should then
download that file. If you load the $img_url in Mozilla it
works.

$img_url is assigned the relative URL
'/pt/.nf/gif.t,tom.d,1088589600', and that alone is not enough for any
browser to find the image. I don't understand what you mean by that.

But your script concatenates $taz_url and $img_url to
'http://www.taz.de/pt/.nf/gif.t,tom.d,1088589600'
which seems to be a valid URL to a (copyright protected) image.

If you append a slash or open the URL in Konqueror, it fails the
same way the script does...

The script you posted does not fail for me. It prints "todays image
stored in tom.gif", and no slash is appended.

Sorry, but I still don't understand what the problem is.

<program snipped>
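
As an alternative to concatenating the two strings by hand, the relative
image URL can be resolved against the URL of the page it was found on. A
minimal sketch using the URI module's new_abs constructor with the URLs
quoted in this thread; the variable names $page_url and $abs_url are
only illustrative:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use URI;

# The page the image tag was scraped from, and the src value found in it.
my $page_url = "http://www.taz.de/pt/2004/06/30.nf/tomnf";
my $img_url  = "/pt/.nf/gif.t,tom.d,1088589600";

# Resolve the relative reference against the page URL.
my $abs_url = URI->new_abs($img_url, $page_url)->as_string;
print "$abs_url\n";
```

This also copes with src values that are relative to the page's directory
rather than to the server root, which plain concatenation would not.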

Sebastian Bauer · Jun 30, 2004

The script downloads something, but if you run

less tom.gif

you'll see that it did not download an image but plain text. If you follow
the link http://www.taz.de/pt/.nf/gif.t,tom.d,1088589600 in Mozilla you'll
get an image. If you follow
http://www.taz.de/pt/.nf/gif.t,tom.d,1088589600/
you'll get the same text that the tom.gif file contains. That's the reason
why I thought that there might be an additional slash.

Gunnar said:
which seems to be a valid URL to a (copyright protected) image.

This is just for personal use (I collect those images and it's hard work to
do it manually).

thx sebastian

Gunnar Hjalmarsson · Jun 30, 2004

Sebastian said:
The script downloads something, but if you run

less tom.gif

you'll see that it did not download an image but plain text.

Aha, I see that now. Actually it downloads an HTML error page.

If you follow the link
http://www.taz.de/pt/.nf/gif.t,tom.d,1088589600 in Mozilla you'll
get an image. If you follow
http://www.taz.de/pt/.nf/gif.t,tom.d,1088589600/ you'll get the
same text that the tom.gif file contains. That's the reason why I
thought that there might be an additional slash.

I see. Well, that error page is returned for any incorrect URL, so why
would it be caused by an appended slash?

The URL is not exactly the standard kind of URL. Maybe its special
nature makes LWP misinterpret it in some way. Or maybe the site owner
has taken measures to prevent people from doing what you are trying to
do. You can't view the image directly any longer, with or without the
slash, so I'd guess that the latter is the case.
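
Since the script reported success while actually saving an HTML error
page, is_success alone is not enough to tell the two cases apart. One
option is to look at the Content-Type header as well. A minimal sketch;
looks_like_image is a hypothetical helper (not part of LWP), and the two
responses are built by hand to stand in for what a server might return:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Response;

# Hypothetical helper: true only for a successful response whose
# Content-Type is some image/* media type.
sub looks_like_image {
    my ($res) = @_;
    return ($res->is_success && $res->content_type =~ m{^image/}) ? 1 : 0;
}

# Hand-built stand-ins for a real server's answers (assumptions).
my $html_res = HTTP::Response->new(200, 'OK',
    ['Content-Type' => 'text/html'], '<html>error page</html>');
my $gif_res = HTTP::Response->new(200, 'OK',
    ['Content-Type' => 'image/gif'], 'GIF89a');

print "html response is image: ", looks_like_image($html_res), "\n";
print "gif response is image:  ", looks_like_image($gif_res), "\n";
```

In the posted script the same check could be applied to $result before
printing the success message.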

Sebastian Bauer · Jun 30, 2004

I would accept that they don't allow me to download the image via a script,
but what drives me crazy is that I can obviously get the image via Mozilla
but NO other way. I tried telnet 5 mins ago -> didn't work. So it seems I
will need to download the images by hand unless I get it working some other
way.
Thx a lot, if you have any ideas please let me know ;-)
Sebastian

gnari · Jun 30, 2004

Sebastian Bauer said:
I would accept that they don't allow me to download the image via a script,
but what drives me crazy is that I can obviously get the image via Mozilla
but NO other way. I tried telnet 5 mins ago -> didn't work. So it seems I
will need to download the images by hand unless I get it working some other
way.
Thx a lot, if you have any ideas please let me know ;-)

sessions, referer, cookies, useragent ...

gnari

Gisle Aas · Jul 1, 2004

gnari said:
sessions, referer, cookies, useragent ...

They check the 'referer'. This program works for me:

#!/usr/bin/perl

use strict;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new(keep_alive => 1);
my $res = $ua->get("http://www.taz.de/pt/.nf/gif.t,tom.d,1088676000",
referer => "http://www.taz.de/pt/2004/07/01.nf/tomnf");
print $res->as_string;

Sebastian Bauer · Jul 1, 2004

Gisle said:
They check the 'referer'. This program works for me:

#!/usr/bin/perl

use strict;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new(keep_alive => 1);
my $res = $ua->get("http://www.taz.de/pt/.nf/gif.t,tom.d,1088676000",
referer => "http://www.taz.de/pt/2004/07/01.nf/tomnf");
print $res->as_string;

Thx a lot, this one worked. I found out the same thing today, but thank you
for the solution.

Sebastian
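
For readers who want to keep the HTTP::Request style of the original
script, the Referer header can also be set on the request object directly.
A minimal sketch that only builds and prints the request, using the URLs
from Gisle's post; the variable $page_url is illustrative, and the send
step is left as a comment because the 2004 URLs may no longer resolve:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Request;

my $taz_url  = "http://www.taz.de";
my $page_url = "$taz_url/pt/2004/07/01.nf/tomnf";   # page containing the image
my $img_url  = "/pt/.nf/gif.t,tom.d,1088676000";    # src value from that page

my $request = HTTP::Request->new(GET => "${taz_url}${img_url}");

# Tell the server which page the image request "comes from".
$request->header(Referer => $page_url);

# Show the request line and headers that would be sent.
print $request->as_string;

# To send it and store the body in a file, as in the original script:
# my $result = LWP::UserAgent->new->request($request, "tom.gif");
```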
