JSP or httpservlet for Java spider?

Discussion in 'Java' started by Greg Peters, Dec 24, 2005.

  1. Greg Peters

    Greg Peters Guest

    Hi. I want to spider just a few websites, not the entire site, just 1 or 2
    levels deep. So can I use JSP or httpservlets for this? Does anyone know of
    some tutorial/code/book that explains this? I usually use JSP and
    httpservlets for processing requests, but I want to get the data from a
    different website.

    Or do I have to spider using perl, then store it in a database and retrieve
    it using JSP/httpservlets? Thank you.
     
    Greg Peters, Dec 24, 2005
    #1

  2. Roedy Green

    Roedy Green Guest

    On 24 Dec 2005 05:25:54 GMT, Greg Peters <> wrote,
    quoted or indirectly quoted someone who said :

    >Hi. I want to spider just a few websites, not the entire site, just 1 or 2
    >levels deep. So can I use JSP or httpservlets for this? Does anyone know of
    >some tutorial/code/book that explains this? I usually use JSP and
    >httpservlets for processing requests, but I want to get the data from a
    >different website.


    see http://mindprod.com/applets/fileio.htm
    for how to do GET.
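    Below is a minimal sketch of such a GET using java.net.HttpURLConnection. It assumes Java 9 or later (for InputStream.readAllBytes) and UTF-8 pages. The class name SimpleGet and the User-Agent string are hypothetical and are not taken from the fileio page.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

public class SimpleGet {
    /**
     * Fetches the body of an http(s) URL with a plain GET.
     * Assumes UTF-8; a real spider should honour the charset in the Content-Type header.
     */
    public static String fetch(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("GET");
        conn.setConnectTimeout(10_000);   // milliseconds; don't hang forever on a dead host
        conn.setReadTimeout(10_000);
        conn.setRequestProperty("User-Agent", "ExampleSpider/0.1"); // hypothetical agent name
        try {
            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("GET " + url + " returned HTTP " + status);
            }
            try (InputStream in = conn.getInputStream()) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        } finally {
            conn.disconnect();
        }
    }
}
```

    For example, SimpleGet.fetch("http://example.com/") returns the page's HTML as a String. Any non-200 status is reported as an IOException rather than silently returning an error page.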

    Then you have to find the links to spider, e.g. those matching
    the pattern

    <a href="xxxx"

    You can crudely use indexOf( "<a href=" ),
    or you can use a regex if you want to catch squirrelly stuff like
    extra spaces or other attributes before the href.
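    Here is a sketch of the regex route. The pattern is an assumption and is not the one from the mindprod page. It finds href values inside <a> tags and tolerates:

    - mixed case,
    - spaces around the =,
    - other attributes before href,
    - single or double quotes.

    It does not handle unquoted values or commented-out markup. LinkExtractor is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkExtractor {
    // <a, whitespace, any other attributes, then href = "..." or href = '...'.
    // Group 1 captures the opening quote so group 2 stops at the matching one.
    private static final Pattern HREF = Pattern.compile(
            "<a\\s[^>]*?\\bhref\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns the raw href values of <a> tags, in document order. */
    public static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(2).trim());
        }
        return links;
    }
}
```

    The values come back exactly as written in the page. Relative ones such as "../index.html" still need resolving against the page's own URL before they are queued.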

    See http://mindprod.com/jgloss/regex.html

    You add the links to a queue of links to be spidered.
    See http://mindprod.com/queue.html
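    To go only 1 or 2 levels deep, as in the original question, each queued link can carry its depth. A seen set keeps the same page from being queued twice. The sketch below is a single-threaded breadth-first crawl. The linksOf function is a hypothetical stand-in for "fetch this page and extract its links", which lets the queue logic be shown without touching the network. It assumes Java 16+ for the record.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Queue;
import java.util.Set;
import java.util.function.Function;

public class DepthLimitedCrawl {
    /** A URL waiting in the queue, tagged with how many links it is from the start page. */
    private record Pending(String url, int depth) {}

    /**
     * Breadth-first crawl from start, following links at most maxDepth levels.
     * Returns the URLs in the order they were visited.
     */
    public static List<String> crawl(String start, int maxDepth,
                                     Function<String, List<String>> linksOf) {
        List<String> visitedOrder = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        Queue<Pending> queue = new ArrayDeque<>();
        queue.add(new Pending(start, 0));
        seen.add(start);
        while (!queue.isEmpty()) {
            Pending p = queue.remove();
            visitedOrder.add(p.url());
            if (p.depth() == maxDepth) {
                continue; // visit this page, but don't follow its links any deeper
            }
            for (String link : linksOf.apply(p.url())) {
                if (seen.add(link)) { // add() returns false if already queued or visited
                    queue.add(new Pending(link, p.depth() + 1));
                }
            }
        }
        return visitedOrder;
    }
}
```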

    Then you spawn up to N threads that each grab the next item from the
    queue and spider it.
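    One simple way to spread the fetching over N threads is to process the crawl level by level. This is a variation on workers pulling from one shared queue:

    1. Submit every URL at the current depth to a fixed-size ExecutorService.
    2. Wait for all of them with invokeAll.
    3. Build the next level from the links they returned.

    Only the calling thread touches the seen set, so it needs no locking. The hypothetical linksOf function, however, runs on the worker threads and must be thread-safe.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class ParallelCrawl {
    /**
     * Visits every page within maxDepth links of start, using nThreads worker threads.
     * linksOf stands in for "fetch the page and extract its links"; it runs on the
     * worker threads, so it must be thread-safe. Returns the set of visited URLs.
     */
    public static Set<String> crawl(String start, int maxDepth, int nThreads,
                                    Function<String, List<String>> linksOf)
            throws InterruptedException, ExecutionException {
        Set<String> seen = new LinkedHashSet<>(); // touched only by the calling thread
        seen.add(start);
        List<String> frontier = List.of(start);
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        try {
            for (int depth = 0; depth <= maxDepth && !frontier.isEmpty(); depth++) {
                // One task per page at this depth; the pool runs at most nThreads at once.
                List<Callable<List<String>>> tasks = new ArrayList<>();
                for (String url : frontier) {
                    tasks.add(() -> linksOf.apply(url));
                }
                List<String> next = new ArrayList<>();
                for (Future<List<String>> result : pool.invokeAll(tasks)) {
                    List<String> links = result.get(); // rethrows a worker's failure
                    if (depth < maxDepth) {            // links found on the deepest level are ignored
                        for (String link : links) {
                            if (seen.add(link)) {
                                next.add(link);
                            }
                        }
                    }
                }
                frontier = next;
            }
        } finally {
            pool.shutdown();
        }
        return seen;
    }
}
```

    The cost of this design is that a slow page holds up the start of the next level. A shared-queue design avoids that stall but needs extra bookkeeping to know when the crawl is finished.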

    See http://mindprod.com/projects/htmlbrokenlink.html
    for more details.

    --
    Canadian Mind Products, Roedy Green.
    http://mindprod.com Java custom programming, consulting and coaching.
     
    Roedy Green, Dec 24, 2005
    #2

  3. John C. Bollinger

    Greg Peters wrote:
    > Hi. I want to spider just a few websites, not the entire site, just 1 or 2
    > levels deep. So can I use JSP or httpservlets for this? Does anyone know of
    > some tutorial/code/book that explains this? I usually use JSP and
    > httpservlets for processing requests, but I want to get the data from a
    > different website.
    >
    > Or do I have to spider using perl, then store it in a database and retrieve
    > it using JSP/httpservlets? Thank you.


    JSP and servlets are mechanisms for generating dynamic responses to HTTP
    requests. They are most often used for serving HTML pages. They have
    no special mechanism beyond any other Java code for making
    general-purpose HTTP requests or doing anything with the results of
    such a request.

    Even though JSP and servlets specifically would be inappropriate choices
    for a web spider, that does not mean that Java in general is wrong for
    the task. To the contrary, the Java platform library has good support
    for a wide variety of network- and web-oriented tasks, and there are a
    multitude of 3rd party libraries that build further on that foundation.
    Look at the URL, URLConnection, and HttpURLConnection classes in the
    java.net package to start, and perhaps at DOM (package org.w3c.dom) for
    document analysis. You might also find the Jakarta HTTP Client library
    useful (http://jakarta.apache.org/commons/httpclient/). There are many
    other resources available.
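    Links pulled out of a page are often relative, and some point off-site or at mailto: addresses. The sketch below handles this with java.net.URI. It uses URI rather than the URL constructors, which newer JDKs deprecate. It:

    - resolves an href against the page it came from,
    - drops any #fragment,
    - keeps only http/https links on the same host as that page.

    LinkResolver is a hypothetical name. Staying on one host is an assumption that fits spidering only a few sites.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Optional;

public class LinkResolver {
    /**
     * Resolves href against the URL of the page it was found on.
     * Returns empty for malformed hrefs, non-HTTP schemes (mailto:, javascript:, ...)
     * and links to a different host. Any #fragment is removed, since it names
     * a position within a page rather than a different page.
     */
    public static Optional<String> resolve(String pageUrl, String href) {
        try {
            URI base = new URI(pageUrl);
            URI target = base.resolve(new URI(href.trim())); // also normalizes "../" segments
            String scheme = target.getScheme();
            if (scheme == null
                    || !(scheme.equalsIgnoreCase("http") || scheme.equalsIgnoreCase("https"))) {
                return Optional.empty();
            }
            if (target.getHost() == null || !target.getHost().equalsIgnoreCase(base.getHost())) {
                return Optional.empty();
            }
            String s = target.toString();
            int hash = s.indexOf('#'); // '#' can only appear as the fragment delimiter
            return Optional.of(hash >= 0 ? s.substring(0, hash) : s);
        } catch (URISyntaxException e) {
            return Optional.empty();
        }
    }
}
```

    For instance, resolving "../top.html#sec" against http://example.com/dir/page.html gives http://example.com/top.html. An href containing raw spaces is rejected as malformed.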

    As for displaying pages previously retrieved by your spider, chances are
    that a fairly simple servlet could handle the job admirably. There
    might be reasons to do it with JSP / custom tags instead, but that
    approach wouldn't be my first inclination.


    --
    John Bollinger
     
    John C. Bollinger, Dec 26, 2005
    #3
