Looking for Java web crawler API

Discussion in 'Java' started by pm, Jul 12, 2011.

  1. pm

    pm Guest

    Hello, I am working on a project that requires me to do custom searches on
    different websites. I am using Java, and while I could write this from the
    ground up, I am looking for existing APIs I can use because of time
    constraints. So far I have come across Apache's HttpClient.
    I am wondering if there are any others that are effective or
    give more options for web searching/scraping. I plan to create a GUI-based
    application and need something quick and effective without being
    too complex.
    I appreciate any feedback.
     
    pm, Jul 12, 2011
    #1
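A minimal sketch of fetching a page, assuming Java 11 or later: it uses the HTTP client that ships with the JDK in `java.net.http`, which postdates this 2011 thread and is a different library from the Apache HttpClient mentioned above. The class name `PageFetcher`, the User-Agent string `MySearchTool/0.1` and the cookie value are hypothetical placeholders; real sites may expect other headers.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

class PageFetcher {
    // Builds a GET request with a custom User-Agent and, optionally, a Cookie header.
    // "MySearchTool/0.1" is a hypothetical placeholder value.
    static HttpRequest buildRequest(String url, String cookie) {
        HttpRequest.Builder builder = HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofSeconds(20))
                .header("User-Agent", "MySearchTool/0.1")
                .GET();
        if (cookie != null) {
            builder.header("Cookie", cookie);
        }
        return builder.build();
    }

    // Creates a client that follows redirects (except HTTPS to HTTP).
    // Reuse one client for many requests rather than creating one per page.
    static HttpClient newClient() {
        return HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .connectTimeout(Duration.ofSeconds(10))
                .build();
    }

    // Sends the request and returns the body as a String; blocks until the response arrives.
    static String fetch(HttpClient client, String url, String cookie)
            throws IOException, InterruptedException {
        HttpResponse<String> response =
                client.send(buildRequest(url, cookie), HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("HTTP " + response.statusCode() + " for " + url);
        }
        return response.body();
    }
}
```

Typical use would be `String html = PageFetcher.fetch(PageFetcher.newClient(), "https://example.com/", null);`. Keeping request construction in its own method makes it easy to check headers without touching the network.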

  2. I found JSoup (jsoup.org) to be a fine library for web scraping. It
    lets you easily set cookies and headers, fetches the URL for you, and
    converts the tangled mess of HTML you tend to receive into a
    well-formed XML document model.

    Cheers,
    Bent D.
    --
    Bent Dalager - http://www.pvv.org/~bcd
    powered by emacs
     
    Bent C Dalager, Jul 12, 2011
    #2
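jsoup is a separate download, so the sketch below does not use it. With jsoup on the classpath, the fetch-and-parse step is roughly `Document doc = Jsoup.connect(url).cookie(name, value).get();` followed by `doc.select("a[href]")`. To show the same idea (tolerant parsing of messy HTML, then pulling out links) with only the JDK, this stand-in uses the lenient HTML parser bundled with Swing (`javax.swing.text.html.parser`). It copes with unclosed tags but only understands HTML 3.2-era markup and is far less capable than jsoup. `LinkExtractor` is a hypothetical name.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

class LinkExtractor {
    // Returns the href value of every <a> tag, in document order.
    // The parser tolerates unclosed tags and mixed-case tag names.
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.A) {
                    Object href = attrs.getAttribute(HTML.Attribute.HREF);
                    if (href != null) { // skip anchors such as <a name="top">
                        links.add(href.toString());
                    }
                }
            }
        };
        try {
            // true = ignore any charset declared inside the page; we already have a String.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return links;
    }
}
```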

  3. Durango2011

    Durango2011 Guest

    On Tue, 12 Jul 2011 09:44:38 +0000, Bent C Dalager wrote:

    > I found JSoup (jsoup.org) to be a fine library for web scraping. It lets
    > you easily set cookies and headers, fetches the URL for you, and
    > converts the tangled mess of HTML you tend to receive into a well-formed
    > XML document model.
    >
    > Cheers,
    > Bent D.


    Thank you very much, that looks like what I am looking for.
     
    Durango2011, Jul 13, 2011
    #3
  4. Roedy Green

    Roedy Green Guest

    On 12 Jul 2011 07:14:45 GMT, pm wrote, quoted
    or indirectly quoted someone who said:

    > I am wondering if there are any others that can be effective or
    > give more options for web searching/scraping. I plan to create a GUI
    > based application and need something quick and effective while not being
    > too complex.


    If you want something very simple, see
    http://mindprod.com/products1.html#HTTP

    For more on screen scraping, see
    http://mindprod.com/jgloss/screenscraping.html
    --
    Roedy Green Canadian Mind Products
    http://mindprod.com
    One thing I love about having a website, is that when I complain about
    something, I only have to do it once. It saves me endless hours of grumbling.
     
    Roedy Green, Jul 14, 2011
    #4
  5. iadb

    iadb Guest

    On Jul 12, 3:14 am, pm wrote:
    > Hello, I am working on a project that requires me to do custom search on
    > different websites.  I am using Java and while I can write this from
    > ground up, I am looking at using existing APIs that can be used due to
    > time limit.  So far I have came across Apache's HttpClient.  
    >         I am wondering if there are any others that can be effective or
    > give more options for web searching/scraping. I plan to create a GUI
    > based application and need something quick and effective while not being
    > too complex.
    > I appreciate any feedback.


    Look at the example linked below; it works fine with little
    customization.
    http://java.sun.com/developer/technicalArticles/ThirdParty/WebCrawler/


    http://www.internetarticlesdb.com
     
    iadb, Jul 19, 2011
    #5
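The crawler from the linked article is not reproduced here. As a hedged illustration of the core loop most crawlers share, this is a breadth-first crawl with a visited set and a page limit. `SimpleCrawler` is a hypothetical name, and `linksOf` is a hypothetical hook standing in for "fetch the page, then extract its links", which keeps the crawl logic testable without a network.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

class SimpleCrawler {
    // Breadth-first crawl from startUrl, visiting at most maxPages pages.
    // linksOf is a hypothetical hook: given a URL, it returns the absolute links on that page.
    static List<String> crawl(String startUrl, int maxPages,
                              Function<String, List<String>> linksOf) {
        List<String> visited = new ArrayList<>();    // pages in the order they were crawled
        Set<String> seen = new HashSet<>();          // everything already queued or crawled
        Deque<String> frontier = new ArrayDeque<>(); // FIFO queue gives breadth-first order
        frontier.add(startUrl);
        seen.add(startUrl);
        while (!frontier.isEmpty() && visited.size() < maxPages) {
            String url = frontier.poll();
            visited.add(url);
            for (String link : linksOf.apply(url)) {
                if (seen.add(link)) { // add() returns false for URLs we already know
                    frontier.add(link);
                }
            }
        }
        return visited;
    }
}
```

Pairing this loop with a real fetcher and link extractor gives a working crawler. Relative links such as `/docs` would first need resolving against the page URL, for example with `URI.create(pageUrl).resolve(href).toString()`, and the page limit keeps a crawl of a large site bounded.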
  6. Durango2011

    Durango2011 Guest

    On Tue, 12 Jul 2011 07:14:45 +0000, pm wrote:


    Thanks for all the great feedback :)
     
    Durango2011, Jul 21, 2011
    #6
  7. On 7/12/2011 3:14 AM, pm wrote:
    > Hello, I am working on a project that requires me to do custom search on
    > different websites. I am using Java and while I can write this from
    > ground up, I am looking at using existing APIs that can be used due to
    > time limit. So far I have came across Apache's HttpClient.
    > I am wondering if there are any others that can be effective or
    > give more options for web searching/scraping. I plan to create a GUI
    > based application and need something quick and effective while not being
    > too complex.


    http://nutch.apache.org/ includes a crawler, and it comes with
    a searchable index (Lucene).

    Arne
     
    Arne Vajhøj, Jul 21, 2011
    #7
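Lucene's index is far more elaborate (text analysis, relevance scoring, on-disk storage), but the structure that makes crawled pages searchable is an inverted index mapping each term to the pages that contain it. A toy in-memory version for illustration only; `TinyIndex`, its methods and its simple tokenizer are hypothetical and are not Lucene's API.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class TinyIndex {
    // Inverted index: term -> URLs of the pages that contain it.
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Lowercases the text and splits on runs of characters that are not letters or digits.
    private static String[] tokenize(String text) {
        return text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+");
    }

    void addPage(String url, String text) {
        for (String term : tokenize(text)) {
            if (!term.isEmpty()) { // split() yields "" when the text starts with a separator
                postings.computeIfAbsent(term, k -> new HashSet<>()).add(url);
            }
        }
    }

    // Returns the pages containing every query term (AND semantics), sorted by URL.
    Set<String> search(String query) {
        Set<String> result = null;
        for (String term : tokenize(query)) {
            if (term.isEmpty()) {
                continue;
            }
            Set<String> pages = postings.getOrDefault(term, Set.of());
            if (result == null) {
                result = new TreeSet<>(pages);
            } else {
                result.retainAll(pages);
            }
        }
        return result == null ? new TreeSet<>() : result;
    }
}
```

Intersecting posting sets gives AND queries cheaply; ranking the matches is where a library like Lucene earns its keep.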
Similar Threads
  1. Paul Morrison

    Web Crawler

    Paul Morrison, Oct 17, 2005, in forum: Java
    Replies: 3, Views: 4,938
    lamantpirate, Jun 30, 2012
  2. Sanjay Patra

    Web Crawler

    Sanjay Patra, Nov 17, 2004, in forum: C++
    Replies: 2, Views: 758
  3. abhinav

    web crawler in python or C?

    abhinav, Feb 16, 2006, in forum: Python
    Replies: 13, Views: 1,292
  4. Sanjay Patra

    C Web crawler code

    Sanjay Patra, Nov 18, 2004, in forum: C Programming
    Replies: 1, Views: 1,534
    Raymond Martineau, Nov 18, 2004
  5. Kev
    Replies: 6, Views: 196
    James Britt, Feb 2, 2006