web crawling for books

Discussion in 'Perl Misc' started by alexxx.magni@gmail.com, Nov 25, 2007.

  1. alexxx.magni@gmail.com (Guest)

    I have a large list of my library's books, and I would like to set
    up a Perl spider that goes on the web for each author/title and
    returns useful information I didn't put into the records (editor,
    year, topic, ISBN, ...).
    I already wrote down the basic spider's structure, but I'm not sure
    which site is best suited to such a search (considering also that
    its robots.txt should allow me access).
    Which site would you suggest for such a task?

    Thank you!


    Alessandro Magni
     
    alexxx.magni@gmail.com, Nov 25, 2007
    #1
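
    A note on the robots.txt concern above: rather than parsing
    robots.txt by hand, the spider can be built on LWP::RobotUA from
    libwww-perl, which fetches each host's robots.txt, obeys it, and
    rate-limits requests automatically. A minimal sketch (the lookup URL
    and contact address are placeholders, not a recommendation of any
    particular site):

        #!/usr/bin/perl
        use strict;
        use warnings;
        use LWP::RobotUA;

        # LWP::RobotUA fetches and honours each host's robots.txt and
        # spaces out requests; 'from' gives site admins a contact address.
        my $ua = LWP::RobotUA->new(
            agent => 'library-spider/0.1',
            from  => 'you@example.org',        # placeholder contact address
        );
        $ua->delay(1);                         # minutes between requests per host

        # Placeholder lookup URL -- substitute whichever site you settle on.
        my $url = 'http://books.example.com/search?author=Calvino&title=Palomar';
        my $response = $ua->get($url);

        if ($response->is_success) {
            print $response->decoded_content;  # hand this off to your parser
        }
        else {
            warn 'Request failed: ' . $response->status_line . "\n";
        }

    When robots.txt disallows a path, the get() call comes back as a 403
    response with "Forbidden by robots.txt" in the status line, so the
    spider can skip that site without any separate check.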

  2. On Nov 25, 9:58 am, alexxx.magni@gmail.com wrote:
    > I have a large list of my library's books, and I would like to set
    > up a Perl spider that goes on the web for each author/title and
    > returns useful information I didn't put into the records (editor,
    > year, topic, ISBN, ...).
    > I already wrote down the basic spider's structure, but I'm not sure
    > which site is best suited to such a search (considering also that
    > its robots.txt should allow me access).
    > Which site would you suggest for such a task?
    >
    > Thank you!
    >
    > Alessandro Magni


    Hi,

    Speaking from experience, I think you will be able to obtain
    higher-quality, more relevant results by using APIs instead of just
    scraping sites. For example, check out the Amazon Web Services API at
    http://www.amazon.com/AWS-home-page-Money/b?ie=UTF8&node=3435361
    You could also potentially use http://books.google.com/.

    Spiros
     
    Spiros Denaxas, Nov 25, 2007
    #2
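
    To make the API suggestion above concrete, here is a sketch of a
    title/author lookup against the Google Books "volumes" endpoint.
    Treat it as an assumption-laden example: the endpoint is the current
    REST interface (which postdates this thread), and the
    intitle:/inauthor: query operators and the volumeInfo field names
    should be double-checked against the documentation.

        #!/usr/bin/perl
        use strict;
        use warnings;
        use LWP::UserAgent;    # https URLs also need LWP::Protocol::https
        use URI::Escape;
        use JSON;              # exports decode_json

        # Query one book by title and author, then print basic
        # bibliographic data for each match.
        my $ua  = LWP::UserAgent->new(agent => 'library-spider/0.1');
        my $q   = uri_escape('intitle:"Programming Perl" inauthor:Wall');
        my $url = "https://www.googleapis.com/books/v1/volumes?q=$q";

        my $response = $ua->get($url);
        die 'Lookup failed: ' . $response->status_line . "\n"
            unless $response->is_success;

        # content() returns the raw UTF-8 bytes that decode_json expects.
        my $data = decode_json($response->content);
        for my $item (@{ $data->{items} || [] }) {
            my $info = $item->{volumeInfo} or next;
            printf "%s / %s (%s, %s)\n",
                $info->{title}         || '?',
                join(', ', @{ $info->{authors} || [] }),
                $info->{publisher}     || '?',
                $info->{publishedDate} || '?';
        }

    The same loop would work for the Amazon service mentioned above,
    with XML parsing (e.g. XML::Simple) in place of decode_json.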

  3. Adam Funk (Guest)

    On 2007-11-25, alexxx.magni@gmail.com wrote:

    > I have a large list of my library's books, and I would like to set
    > up a Perl spider that goes on the web for each author/title and
    > returns useful information I didn't put into the records (editor,
    > year, topic, ISBN, ...).
    > I already wrote down the basic spider's structure, but I'm not sure
    > which site is best suited to such a search (considering also that
    > its robots.txt should allow me access).
    > Which site would you suggest for such a task?


    You might want to look at Alexandria, which already does quite a bit
    of this. It's written in Ruby, but the source code might give you
    some ideas.

    http://alexandria.rubyforge.org/
     
    Adam Funk, Nov 28, 2007
    #3
