How to make a Perl program do concurrent downloading?

Discussion in 'Perl Misc' started by Adlene, May 1, 2004.

  1. Adlene

    Adlene (Guest)

    Hi, there:

    I wrote a program to download 500,000 HTML files from a website. I
    have compiled all the links in a file, and my grabber.pl will download
    all of them...

    I have a fast Internet connection. I think it would be better to run
    multiple downloads at the same time, but $INET = new Win32::Internet()
    only allows one at a time... what can I do?
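
    A minimal sketch of one way to get parallel downloads, assuming
    Parallel::ForkManager and LWP::UserAgent from CPAN in place of a single
    Win32::Internet handle (fork is emulated on Windows, so treat the worker
    count and file name below as placeholders to experiment with):

        use strict;
        use warnings;
        use LWP::UserAgent;
        use Parallel::ForkManager;

        # Read the compiled link list (the file name "links.txt" is an assumption).
        open my $list, '<', 'links.txt' or die "Cannot open links.txt: $!";
        chomp(my @urls = <$list>);
        close $list;

        my $pm = Parallel::ForkManager->new(5);        # at most 5 downloads at once
        my $ua = LWP::UserAgent->new(timeout => 30);   # give up on stalled reads

        for my $url (@urls) {
            $pm->start and next;        # parent: hand out next URL; child: falls through
            my $res = $ua->get($url);
            if ($res->is_success) {
                # write $res->content to a local file here
            }
            $pm->finish;                # child exits
        }
        $pm->wait_all_children;

    Each child process handles one URL and then exits; the parent only hands
    out work, so a worker that dies or times out does not stall the rest.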

    I also found that occasionally the grabber just hangs somewhere... In
    such a situation I need to skip $INET->FetchURL($url), write the
    offending URL to an error file, and continue on to the next
    iteration... How can I do that?
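
    And a sketch of one way to keep the loop moving past a stalled URL, again
    with LWP::UserAgent standing in for Win32::Internet so that a per-request
    read timeout turns a hang into an ordinary failure that can be logged
    (the file names are placeholders):

        use strict;
        use warnings;
        use LWP::UserAgent;

        # timeout => 30 makes a stalled transfer fail after ~30 seconds of
        # silence instead of hanging the whole run.
        my $ua = LWP::UserAgent->new(timeout => 30);

        open my $err, '>>', 'errors.txt' or die "Cannot open errors.txt: $!";

        my @urls = ();                      # the compiled link list, read as in the sketch above
        for my $url (@urls) {
            my $res = eval { $ua->get($url) };
            unless ($res && $res->is_success) {
                print {$err} "$url\n";      # record the offending URL ...
                next;                       # ... and continue with the next iteration
            }
            # write $res->content to a local file here
        }
        close $err;

    The eval is there so that any exception raised by the request (a malformed
    URL, for example) also ends up in the error file instead of killing the run.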

    Best Regards,
    Adlene
    Adlene, May 1, 2004
    #1

  2. "Adlene" <> wrote in message news:<c6vvmn$ck4$>...
    > Hi, there:
    >
    > I wrote a program to download 500,000 HTML files from a website. I
    > have compiled all the links in a file, and my grabber.pl will download
    > all of them...


    Depending on who owns the Internet site, they may find it rude that you
    want to download so many files and take as many resources as possible
    from their web server. Perhaps you should find a different way of
    retrieving the data, such as contacting the web site administrator and
    telling them what you want to do; they may give you a tar-gzipped
    archive of the site.


    >
    > I have a fast Internet connection. I think it would be better to run
    > multiple downloads at the same time,


    It may be better for you, but that is questionable for everyone else.

    Here is some information on web robots; you may want to do some more
    searching on the topic yourself.

    http://www.phantomsearch.com/usersguide/R04Robot.htm

    <from the above URL>

    The Four Laws of Web Robotics
    Law One: A Web Robot Must Show Identification
    Phantom supports this. You can set the "User-Agent" and "From E-Mail"
    fields in the preferences dialog. Both of these are reported in the
    HTTP header when Phantom makes requests of remote Web servers.

    Law Two: A Web Robot Must Obey Exclusion Standard
    Phantom fully supports the exclusion standard.

    Law Three: A Web Robot Must Not Hog Resources
    Phantom only retrieves files it can index (unless mirroring with
    binaries option on) and restricts its movement to the path specified
    by starting points. You can also set the minimum time between hits on
    the same server. Generally, 60 seconds is considered polite.

    For busy sites a greater hit rate may be acceptable, but do not assume
    whether a site is "busy" or not; contact the webmaster first. When
    crawling your own server, of course, you can set the hit interval to
    anything you like, including zero.

    Law Four: A Web Robot Must Report Errors
    Phantom can show you links that are no longer valid. Please contact
    the Webmaster and pass this information on if broken URLs are found.
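
    In Perl terms, most of these rules can be had almost for free from
    LWP::RobotUA, the robot-aware subclass of LWP::UserAgent that ships with
    libwww-perl (the agent string, contact address, and example URL below are
    just placeholders):

        use strict;
        use warnings;
        use LWP::RobotUA;

        # LWP::RobotUA identifies itself (Law One), honors robots.txt
        # (Law Two), and paces its requests to each server (Law Three).
        my $ua = LWP::RobotUA->new('grabber.pl/1.0', 'you@example.com');
        $ua->delay(1);                      # minutes between hits on one host

        my $res = $ua->get('http://www.example.com/page.html');
        print $res->status_line, "\n";      # disallowed URLs come back as
                                            # errors rather than being fetched

    Setting delay(1) matches the 60-second guideline above; when crawling your
    own server you could lower it.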



    > but $INET = new Win32::Internet() only allows one at a time... what
    > can I do?
    >
    > I also found that occasionally the grabber just hangs somewhere... In
    > such a situation I need to skip $INET->FetchURL($url), write the
    > offending URL to an error file, and continue on to the next
    > iteration... How can I do that?
    >
    > Best Regards,
    > Adlene
    Bryan Castillo, May 2, 2004
    #2
