HTTP::Request, trailing slash

Discussion in 'Perl Misc' started by Sebastian Bauer, Jun 29, 2004.

  1. Hi,

    I am trying to write a script that downloads images from a web page. I use

    my $uagent  = LWP::UserAgent->new();
    my $request = HTTP::Request->new(GET => "$url");
    my $result  = $uagent->request($request);

    but the problem is that, even if $url does not contain a trailing slash,
    somehow a slash is added at the end. This behaviour keeps me from downloading
    the file, because the download is refused this way. I think it must be this
    slash: if I try to get the file via Mozilla (without the slash) it works,
    but in Mozilla with the slash it does not.

    It would be great if anyone could help me out and tell me how to get rid of
    the slash.
    Thx, Sebastian
     
    Sebastian Bauer, Jun 29, 2004
    #1

  2. Sebastian Bauer wrote:
    > I am trying to write a script that downloads images from a web page. I use
    >
    > my $uagent = LWP::UserAgent->new();
    > my $request = HTTP::Request->new(GET => "$url");
    > my $result = $uagent->request($request);
    >
    > but the problem is that, even if $url does not contain a trailing
    > slash, somehow a slash is added at the end.


    Where/when is the slash appended? Isn't $url what it is, and if it
    contains a valid URL, the request is successful?

    To make it easier to understand what you mean, please post a
    *complete* program with a couple of valid URLs that your code fails to
    get.
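
One way to answer the "where is the slash appended?" question is to ask the request object which URL it holds before anything is sent. This is only a minimal sketch: the URL is a hypothetical placeholder, not one from the thread, and it only inspects the HTTP::Request object, not what a server later does with the request.

```perl
use strict;
use warnings;
use HTTP::Request;

# Hypothetical URL, used only for illustration.
my $url = "http://www.example.com/images/picture.gif";

my $request = HTTP::Request->new(GET => $url);

# uri() returns a URI object; as_string gives the URL held by the request.
my $requested = $request->uri->as_string;

print "given:     $url\n";
print "requested: $requested\n";
print "a slash was appended\n" if $requested eq "$url/";
```

If the two printed lines match, the request object has not changed the URL, and the cause of the failure is more likely on the server side.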

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
     
    Gunnar Hjalmarsson, Jun 30, 2004
    #2

  3. Thx a lot for answering. Here is the program that fails.
    It extracts the URL of an image from a web page and should then download
    that file. If you load $img_url in Mozilla it works. If you
    append a slash or open the URL in Konqueror, it fails the same way the
    script does...

    #!/usr/bin/perl

    use strict;
    use warnings;
    use LWP::UserAgent;

    my $taz_url = "http://www.taz.de";
    my $filename = "tom.gif";

    my $uagent = LWP::UserAgent->new();
    my $request = HTTP::Request->new(GET => $taz_url
        . "/pt/2004/06/30.nf/tomnf");

    my $result = $uagent->request($request);
    my $img_url;


    if ($result->content() =~
        /<img src="(.*)" alt="TOM">\s+<br \/><b>Tom Touch&eacute; vom/)
    {
        $img_url = $1;
    } else {
        die "url of todays image cannot be determined\n";
    }

    print "${taz_url}${img_url}\n";

    $request = HTTP::Request->new(GET => "${taz_url}${img_url}");
    $result = $uagent->simple_request($request, $filename);
    if ($result->is_success) {
        print "todays image stored in $filename\n";
    } else {
        die "could not store todays image\n";
    }


    Thx for your help
     
    Sebastian Bauer, Jun 30, 2004
    #3
  4. Sebastian Bauer wrote:
    > Thx a lot for answering, here comes the program that fails. It
    > extracts the url of an image out of a webpage and then should
    > download this file. If you try to load the $img_url in mozilla it
    > works.


    $img_url is assigned the relative URL
    '/pt/.nf/gif.t,tom.d,1088589600', and on its own that's not enough for any
    browser to find the image, so I don't understand what you mean by that.

    But your script concatenates $taz_url and $img_url to
    'http://www.taz.de/pt/.nf/gif.t,tom.d,1088589600'
    which seems to be a valid URL to a (copyright protected) image.
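
As an alternative to plain string concatenation, the path found in the page can be resolved against the URL of the page it came from. The sketch below uses the URI module's new_abs constructor with the two values quoted in this thread; that the page URL is the right base is an assumption.

```perl
use strict;
use warnings;
use URI;

# Page the image reference was extracted from, and the reference itself.
my $page_url = "http://www.taz.de/pt/2004/06/30.nf/tomnf";
my $img_path = "/pt/.nf/gif.t,tom.d,1088589600";

# Resolve the reference against the page URL, the way a browser would.
my $img_url = URI->new_abs($img_path, $page_url);

print "$img_url\n";
```

For a reference that starts with a slash this gives the same result as the concatenation, but it also handles references without a leading slash or with a full host name.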

    > If you append a slash or open the url in konqueror it fails the
    > same way the script does...


    The script you posted does not fail for me. It prints "todays image
    stored in tom.gif", and no slash is appended.

    Sorry, but I still don't understand what the problem is.

    <program snipped>

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
     
    Gunnar Hjalmarsson, Jun 30, 2004
    #4
  5. The script downloads something, but if you run

    less tom.gif

    you'll see that it did not download an image but plain text. If you follow
    the link http://www.taz.de/pt/.nf/gif.t,tom.d,1088589600 in Mozilla you'll
    get an image. If you follow
    http://www.taz.de/pt/.nf/gif.t,tom.d,1088589600/
    you'll get the same text that the tom.gif file contains. That's why
    I thought there might be an additional slash.

    >which seems to be a valid URL to a (copyright protected) image.

    This is just for personal use (I collect those images, and it's hard work to
    do it manually).

    thx sebastian
     
    Sebastian Bauer, Jun 30, 2004
    #5
  6. Sebastian Bauer wrote:
    > the script downloads something, but if you make a
    >
    > less tom.gif
    >
    > you'll see that it did not download an image but plain text.


    Aha, I see that now. Actually it downloads an HTML error page.

    > If you follow the link
    > http://www.taz.de/pt/.nf/gif.t,tom.d,1088589600 in mozilla you'll
    > get an image. if you follow
    > http://www.taz.de/pt/.nf/gif.t,tom.d,1088589600/ you'll get the
    > same text as the tom.gif file contains. That's the reason why i
    > thought that there might an additional slash


    I see. Well, that error page is returned whichever incorrect URL you
    are using, so why would it be caused by an appended slash?

    The URL is not exactly the standard kind of URL. Maybe its special
    nature makes LWP misinterpret it in some way. Or maybe the site owner has
    taken action to prevent people from doing what you are trying to do (you
    can't view the image directly any longer, with or without the slash,
    so I'd guess that the latter is the case).
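
Because an error page can arrive with a successful status, is_success alone does not show that an image was stored. A possible extra check is the Content-Type of the response. The sketch below is an assumption-laden illustration: looks_like_image is a hypothetical helper, and the two responses are built by hand so the example runs without contacting any server.

```perl
use strict;
use warnings;
use HTTP::Response;

# Hypothetical helper: true only for a successful response labelled as an image.
sub looks_like_image {
    my ($response) = @_;
    return 0 unless $response->is_success;
    my $type = $response->content_type;    # media type without parameters
    return $type =~ m{^image/} ? 1 : 0;
}

# Hand-built stand-ins for what a server might send back.
my $image = HTTP::Response->new(
    200, "OK", [ 'Content-Type' => 'image/gif' ], "GIF89a");
my $error_page = HTTP::Response->new(
    200, "OK", [ 'Content-Type' => 'text/html; charset=iso-8859-1' ],
    "<html>error</html>");

for my $response ($image, $error_page) {
    printf "%-10s image? %s\n", $response->content_type,
        looks_like_image($response) ? "yes" : "no";
}
```

The same test could be applied to the object returned by simple_request before reporting that the image was stored.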

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
     
    Gunnar Hjalmarsson, Jun 30, 2004
    #6
  7. I would accept that they don't allow me to download the image via a script,
    but what drives me crazy is that I can obviously get the image via Mozilla
    but NO other way. I tried telnet 5 minutes ago and it didn't work. So it seems I
    will need to download the images by hand unless I get it working some other
    way.
    Thx a lot, if you have any ideas please let me know ;-)
    Sebastian
     
    Sebastian Bauer, Jun 30, 2004
    #7
  8. "Sebastian Bauer" <> wrote in message
    news:cbv80l$a8e$01$-online.com...
    > i would accept that they dont allow me to download the image via a script,
    > but what drives me crazy is that i can obviously get the image via mozilla
    > but NO other way. I tried telnet 5 mins ago -> didnt work. So it seems i
    > will need to download the images by myself or i get it working any other
    > way.
    > Thx a lot, if you have any ideas please let me know ;-)


    sessions, referer, cookies, useragent ...
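
Several of these can be set on the user agent itself, so that every request it sends carries them. This is only a sketch of where the knobs are: the agent string is an illustrative value, and using the page URL from the earlier script as the Referer is an assumption about what the site expects.

```perl
use strict;
use warnings;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new(
    agent      => "Mozilla/5.0 (illustrative value only)",  # User-Agent header
    cookie_jar => {},       # keep cookies in memory between requests
);

# Header added to every request made through this user agent.
$ua->default_header(Referer => "http://www.taz.de/pt/2004/06/30.nf/tomnf");

print "User-Agent: ", $ua->agent, "\n";
print "Referer:    ", $ua->default_header('Referer'), "\n";
```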

    gnari
     
    gnari, Jun 30, 2004
    #8
  9. "gnari" <> writes:

    > "Sebastian Bauer" <> wrote in message
    > news:cbv80l$a8e$01$-online.com...
    > > i would accept that they dont allow me to download the image via a script,
    > > but what drives me crazy is that i can obviously get the image via mozilla
    > > but NO other way. I tried telnet 5 mins ago -> didnt work. So it seems i
    > > will need to download the images by myself or i get it working any other
    > > way.
    > > Thx a lot, if you have any ideas please let me know ;-)

    >
    > sessions, referer, cookies, useragent ...


    They check the 'referer'. This program works for me:

    #!/usr/bin/perl

    use strict;
    use LWP::UserAgent;

    my $ua = LWP::UserAgent->new(keep_alive => 1);
    my $res = $ua->get("http://www.taz.de/pt/.nf/gif.t,tom.d,1088676000",
    referer => "http://www.taz.de/pt/2004/07/01.nf/tomnf");
    print $res->as_string;
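
The same header can also be attached to an HTTP::Request object, which is the style the original script used. The sketch below only builds and prints the request, using the two URLs from the program above; actually sending it is left as a comment.

```perl
use strict;
use warnings;
use HTTP::Request;

# URLs taken from the program above.
my $page_url = "http://www.taz.de/pt/2004/07/01.nf/tomnf";
my $img_url  = "http://www.taz.de/pt/.nf/gif.t,tom.d,1088676000";

# The third argument to new() is a list of header field/value pairs.
my $request = HTTP::Request->new(GET => $img_url, [ Referer => $page_url ]);

# Show the request line and headers that would be sent.
print $request->as_string;

# Sending it and storing the body in a file could then look like:
#   my $result = LWP::UserAgent->new->request($request, "tom.gif");
```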
     
    Gisle Aas, Jul 1, 2004
    #9
  10. Gisle Aas wrote:

    > "gnari" <> writes:
    >
    >> "Sebastian Bauer" <> wrote in message
    >> news:cbv80l$a8e$01$-online.com...
    >> > i would accept that they dont allow me to download the image via a
    >> > script, but what drives me crazy is that i can obviously get the image
    >> > via mozilla but NO other way. I tried telnet 5 mins ago -> didnt work.
    >> > So it seems i will need to download the images by myself or i get it
    >> > working any other way.
    >> > Thx a lot, if you have any ideas please let me know ;-)

    >>
    >> sessions, referer, cookies, useragent ...

    >
    > They check the 'referer'. This program works for me:
    >
    > #!/usr/bin/perl
    >
    > use strict;
    > use LWP::UserAgent;
    >
    > my $ua = LWP::UserAgent->new(keep_alive => 1);
    > my $res = $ua->get("http://www.taz.de/pt/.nf/gif.t,tom.d,1088676000",
    > referer => "http://www.taz.de/pt/2004/07/01.nf/tomnf");
    > print $res->as_string;


    Thx a lot, this one worked. I found out the same thing today, but thank you
    for the solution :)
    Sebastian
     
    Sebastian Bauer, Jul 1, 2004
    #10
