web crawling for books

Discussion in 'Perl Misc' started by alexxx.magni@gmail.com, Nov 25, 2007.

  1. alexxx.magni@gmail.com (Guest)

    I have a large list of my library's books, and I would like to set
    up a Perl spider that goes on the web for each author/title and
    returns useful information I didn't put into the records (editor,
    year, topic, ISBN, ...).
    I already wrote down the basic spider's structure, but I'm not sure
    which site is best suited to such a search (considering also that
    its robots.txt should allow me access).
    Which site would you suggest for such a task?

    Thank you!


    Alessandro Magni
     
    alexxx.magni@gmail.com, Nov 25, 2007
    #1
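
    A note on the robots.txt concern above: rather than parsing
    robots.txt by hand, the spider can be built on LWP::RobotUA from
    libwww-perl, which fetches each host's robots.txt, obeys it, and
    rate-limits requests automatically. A minimal sketch (the lookup URL
    and contact address are placeholders, not a recommendation of any
    particular site):

        #!/usr/bin/perl
        use strict;
        use warnings;
        use LWP::RobotUA;

        # LWP::RobotUA fetches and honours each host's robots.txt and
        # spaces out requests; 'from' gives site admins a contact address.
        my $ua = LWP::RobotUA->new(
            agent => 'library-spider/0.1',
            from  => 'you@example.org',        # placeholder contact address
        );
        $ua->delay(1);                         # minutes between requests per host

        # Placeholder lookup URL -- substitute whichever site you settle on.
        my $url = 'http://books.example.com/search?author=Calvino&title=Palomar';
        my $response = $ua->get($url);

        if ($response->is_success) {
            print $response->decoded_content;  # hand this off to your parser
        }
        else {
            warn 'Request failed: ' . $response->status_line . "\n";
        }

    When robots.txt disallows a path, the get() call comes back as a 403
    response with "Forbidden by robots.txt" in the status line, so the
    spider can skip that site without any separate check.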

  2. On Nov 25, 9:58 am, alexxx.magni@gmail.com wrote:
    > I have a large list of my library's books, and I would like to set
    > up a Perl spider that goes on the web for each author/title and
    > returns useful information I didn't put into the records (editor,
    > year, topic, ISBN, ...).
    > I already wrote down the basic spider's structure, but I'm not sure
    > which site is best suited to such a search (considering also that
    > its robots.txt should allow me access).
    > Which site would you suggest for such a task?
    >
    > Thank you!
    >
    > Alessandro Magni


    Hi,

    Speaking from experience, I think you will be able to obtain
    higher-quality, more relevant results by using APIs instead of just
    scraping sites. For example, check out the Amazon Web Services API at
    http://www.amazon.com/AWS-home-page-Money/b?ie=UTF8&node=3435361
    You could also potentially use http://books.google.com/.

    Spiros
     
    Spiros Denaxas, Nov 25, 2007
    #2
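
    To make the API suggestion above concrete, here is a sketch of a
    title/author lookup against the Google Books "volumes" endpoint.
    Treat it as an assumption-laden example: the endpoint is the current
    REST interface (which postdates this thread), and the
    intitle:/inauthor: query operators and the volumeInfo field names
    should be double-checked against the documentation.

        #!/usr/bin/perl
        use strict;
        use warnings;
        use LWP::UserAgent;    # https URLs also need LWP::Protocol::https
        use URI::Escape;
        use JSON;              # exports decode_json

        # Query one book by title and author, then print basic
        # bibliographic data for each match.
        my $ua  = LWP::UserAgent->new(agent => 'library-spider/0.1');
        my $q   = uri_escape('intitle:"Programming Perl" inauthor:Wall');
        my $url = "https://www.googleapis.com/books/v1/volumes?q=$q";

        my $response = $ua->get($url);
        die 'Lookup failed: ' . $response->status_line . "\n"
            unless $response->is_success;

        # content() returns the raw UTF-8 bytes that decode_json expects.
        my $data = decode_json($response->content);
        for my $item (@{ $data->{items} || [] }) {
            my $info = $item->{volumeInfo} or next;
            printf "%s / %s (%s, %s)\n",
                $info->{title}         || '?',
                join(', ', @{ $info->{authors} || [] }),
                $info->{publisher}     || '?',
                $info->{publishedDate} || '?';
        }

    The same loop would work for the Amazon service mentioned above,
    with XML parsing (e.g. XML::Simple) in place of decode_json.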

  3. Adam Funk (Guest)

    On 2007-11-25, alexxx.magni@gmail.com wrote:

    > I have a large list of my library's books, and I would like to set
    > up a Perl spider that goes on the web for each author/title and
    > returns useful information I didn't put into the records (editor,
    > year, topic, ISBN, ...).
    > I already wrote down the basic spider's structure, but I'm not sure
    > which site is best suited to such a search (considering also that
    > its robots.txt should allow me access).
    > Which site would you suggest for such a task?


    You might want to look at Alexandria, which already does quite a bit
    of this. It's written in Ruby, but the source code might give you
    some ideas.

    http://alexandria.rubyforge.org/
     
    Adam Funk, Nov 28, 2007
    #3
