Error downloading page, some pages work great but can't seem to get this one

Discussion in 'Perl' started by Jack Schafer, Apr 23, 2004.

  1. Jack Schafer

    Jack Schafer Guest

    I am trying to download the source code for an array of different
    websites. Usually I will get something like this from Dilbert.com:

    HTTP/1.1 200 OK
    Date: Fri, 23 Apr 2004 00:04:54 GMT
    Server: Apache/1.3.27 (Unix) Resin/2.1.s030505 mod_ssl/2.8.14
    OpenSSL/0.9.7b
    Last-Modified: Thu, 22 Apr 2004 07:05:10 GMT
    ETag: "182ba6-9d7b-40876ea6"
    Accept-Ranges: bytes
    Content-Length: 40315
    Connection: close
    Content-Type: text/html


    then the whole HTML page prints
    .....


    The problem occurs when I try the same thing on www.kingsofchaos.com; I
    get the following header:

    HTTP/1.1 200 OK
    Date: Fri, 23 Apr 2004 00:16:49 GMT
    Server: Apache/1.3.29 (Unix) (Gentoo/Linux)
    Connection: close
    Content-Type: text/html

    without the page attached.
    I was wondering if you had any ideas on why I can't access the page,
    and any suggestions as to how I should do it. Right now I am using the
    following code:


    use IO::Socket::INET;
    my $host = $_[0];    # $_[0]/$_[1] suggest this runs inside a sub
    my $get  = $_[1];
    my $port = 80;
    my $protocol = "tcp";
    my $socket;
    my @page;
    $socket = IO::Socket::INET->new(PeerAddr => $host,
                                    PeerPort => $port,
                                    Proto    => $protocol)
        or die "Could not connect\n";
    # send request
    $socket->send("GET $get HTTP/1.0\nHOST: $host\n\n");
    # receive desired file
    @page = <$socket>;
    Jack Schafer, Apr 23, 2004
    #1

  2. Joe Smith

    Joe Smith Guest

    Re: Error downloading page, some pages work great but can't seem to get this one

    Jack Schafer wrote:
    > the problem occurs when I try the same thing on www.kingsofchaos.com
    > $socket->send("GET $get HTTP/1.0\nHOST: $host\n\n");
    > @page=<$socket>;


    1) You're doing it the hard way. Use the LWP modules instead.
    2) Because of 1, you're not sending all of the HTTP headers the
    web server wants to see.
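
    If you do stick with a raw socket, the request itself should also be
    better formed: HTTP header lines end in CRLF ("\r\n", not a bare "\n"),
    the header name is conventionally "Host:", and some servers refuse to
    serve a body without a User-Agent. A minimal sketch of that (the
    User-Agent string here is just a placeholder, not something the server
    is known to require):

    ```perl
    use strict;
    use warnings;
    use IO::Socket::INET;

    # Build the request as one string so every line gets the CRLF
    # terminator HTTP requires; a blank line ends the header block.
    my ($host, $get) = ('www.kingsofchaos.com', '/');
    my $request = join "\r\n",
        "GET $get HTTP/1.0",
        "Host: $host",
        "User-Agent: Mozilla/5.0 (compatible; perl-script)",
        "Accept: text/html",
        "", "";    # trailing empty fields yield the final "\r\n\r\n"

    my $socket = IO::Socket::INET->new(
        PeerAddr => $host,
        PeerPort => 80,
        Proto    => 'tcp',
    ) or die "Could not connect: $!\n";
    $socket->send($request);
    my @page = <$socket>;    # headers followed by the body, if any
    ```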

    According to the Web Scraping Proxy (http://www.research.att.com/~hpk/wsp/)
    you'll need to store and send cookies, and execute JavaScript.

    # Request: http://www.kingsofchaos.com/
    $request = new HTTP::Request('GET' => "http://www.kingsofchaos.com/");
    # Set-Cookie: koc_session=ea30aa58e36; path=/; domain=www.kingsofchaos.com
    # Set-Cookie: security_hash=323466; expires=Sun, 23-May-2004 08:17:26 GMT; path=/; domain=.kingsofchaos.com
    # Set-Cookie: cookie_hash=801f782dce8147; path=/
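
    The cookie handling above is something LWP does for you if you attach a
    cookie jar. A sketch, assuming LWP and HTTP::Cookies are installed (the
    User-Agent string is arbitrary; note LWP will not execute JavaScript):

    ```perl
    use strict;
    use warnings;
    use LWP::UserAgent;
    use HTTP::Cookies;

    # An in-memory cookie jar: LWP stores any Set-Cookie headers it
    # receives and sends them back automatically on later requests.
    my $ua = LWP::UserAgent->new(
        agent      => 'Mozilla/5.0 (compatible; perl-script)',
        cookie_jar => HTTP::Cookies->new,
    );

    my $response = $ua->get('http://www.kingsofchaos.com/');
    if ($response->is_success) {
        print $response->content;    # the HTML body
    } else {
        die 'Request failed: ' . $response->status_line . "\n";
    }
    ```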

    3) Post to comp.lang.perl.misc (instead of comp.lang.perl) next time.
    -Joe
    Joe Smith, Apr 23, 2004
    #2
