Web scraping

Discussion in 'Java' started by raybonds@gmail.com, May 3, 2007.

  1. Guest

    I am trying to extract data from a website and store it. Would
    someone pose different ways to approach this problem or even
    literature that I could read to help?
     
    , May 3, 2007
    #1

  2. Lulu58e2 Guest

    On May 3, 7:05 am, wrote:
    > I am trying to extract data from a website and store it. Would
    > someone pose different ways to approach this problem or even
    > literature that I could read to help?


    This is pretty quick in Groovy using the following:

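    // NekoHTML's SAX parser tolerates malformed HTML and, by default,
    // reports element names in upper case (hence BODY, DIV, ... below)
    def parser = new org.cyberneko.html.parsers.SAXParser()
    parser.setFeature('http://xml.org/sax/features/namespaces', false)
    def html = new XmlSlurper(parser).parse('http://www.somepage.html')
    // as an example: visit every row of one particular table
    html.BODY.DIV[2].P[4].LI[2].TABLE[0].TR.each { /* do something */ }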
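For anyone who wants the same path-style navigation in plain Java, here is a minimal sketch that uses only the JDK's DOM and XPath APIs. It assumes the page is well-formed XHTML with no DOCTYPE or namespace declaration; real-world HTML usually is not, which is why the Groovy version goes through NekoHTML. The class name TableRows and the sample path are made up for illustration. Note that XPath positions are 1-based, so Groovy's TABLE[0] corresponds to table[1].

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of any library mentioned in the thread.
final class TableRows {

    /** Returns the trimmed cell texts of every row matched by rowPath. */
    static List<List<String>> cells(String xhtml, String rowPath) throws Exception {
        // Assumption: the markup is well-formed, otherwise parse() throws.
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));

        NodeList rows = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(rowPath, doc, XPathConstants.NODESET);

        List<List<String>> result = new ArrayList<>();
        for (int i = 0; i < rows.getLength(); i++) {
            List<String> row = new ArrayList<>();
            NodeList children = rows.item(i).getChildNodes();
            for (int j = 0; j < children.getLength(); j++) {
                Node child = children.item(j);
                String name = child.getNodeName();
                // Keep only table cells; whitespace text nodes are skipped.
                if (name.equalsIgnoreCase("td") || name.equalsIgnoreCase("th")) {
                    row.add(child.getTextContent().trim());
                }
            }
            result.add(row);
        }
        return result;
    }
}
```

A call might look like TableRows.cells(pageSource, "/html/body/table[1]/tr"), where pageSource is the page text you have already downloaded.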

     
    Lulu58e2, May 3, 2007
    #2

  3. Thomas Fritsch

    wrote:
    > I am trying to extract data from a website and store it. Would
    > someone pose different ways to approach this problem or even
    > literature that I could read to help?

    Linux has the command-line tool "wget" for downloading websites.
    See http://www.google.com/search?q=wget
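Since this is the Java group, here is a rough sketch of the same "fetch a page and save it" step done from Java with nothing but the standard library. The class name PageFetcher and its method are made up for illustration; it handles a single URL only and does none of the recursion, retries, or rate limiting that wget offers.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: downloads one resource to a local file.
final class PageFetcher {

    /** Copies the resource at source into target and returns the byte count. */
    static long download(URI source, Path target) throws IOException {
        try (InputStream in = source.toURL().openStream()) {
            // Overwrite the target if it already exists.
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java PageFetcher.java <url> <output-file>");
            return;
        }
        long bytes = download(URI.create(args[0]), Path.of(args[1]));
        System.out.println("Saved " + bytes + " bytes to " + args[1]);
    }
}
```

Once the page is on disk, the extraction step can be run against the saved file as often as needed without hitting the site again.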

    --
    Thomas
     
    Thomas Fritsch, May 3, 2007
    #3
  4. Tris Orendorff

    burped up warm pablum in
    news::

    > I am trying to extract data from a website and store it. Would
    > someone pose different ways to approach this problem or even
    > literature that I could read to help?


    Here's the info from a spider I have used a few times:

    /**
    * This class implements a reusable spider. To use this
    * class you must have a class set up to receive
    * the information found by the spider. That class must
    * implement the ISpiderReportable interface. Written by
    * Jeff Heaton. Jeff Heaton is the author of "Programming
    * Spiders, Bots, and Aggregators" by Sybex. Jeff can be
    * contacted through his web site at http://www.jeffheaton.com.
    *
    * @author Jeff Heaton(http://www.jeffheaton.com)
    * @version 1.0
    */
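To make the "spider plus a class that receives what it finds" idea concrete, here is a small sketch of that pattern. It is not Jeff Heaton's code: MiniSpider, PageListener, and the fetcher parameter are all invented for illustration, and the real ISpiderReportable interface may look quite different. Links are pulled out with a simple regular expression over double-quoted href attributes, which is only an approximation of real HTML parsing.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Queue;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical breadth-first spider; not the class described above.
final class MiniSpider {

    /** Hypothetical callback that receives every page the spider fetches. */
    interface PageListener {
        void pageFound(String url, String html);
    }

    // Approximation: matches only double-quoted href attributes.
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*\"([^\"]+)\"", Pattern.CASE_INSENSITIVE);

    /** Crawls from startUrl; fetcher returns page text, or null if unavailable. */
    static List<String> crawl(String startUrl, int maxPages,
                              Function<String, String> fetcher, PageListener listener) {
        List<String> visited = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        Queue<String> queue = new ArrayDeque<>();
        queue.add(startUrl);
        seen.add(startUrl);

        while (!queue.isEmpty() && visited.size() < maxPages) {
            String url = queue.remove();
            String html = fetcher.apply(url);
            if (html == null) {
                continue; // page could not be fetched
            }
            visited.add(url);
            listener.pageFound(url, html);

            URI base = URI.create(url);
            Matcher m = HREF.matcher(html);
            while (m.find()) {
                String href = m.group(1);
                int hash = href.indexOf('#');
                if (hash >= 0) {
                    href = href.substring(0, hash); // drop fragments
                }
                if (href.isEmpty()) {
                    continue;
                }
                try {
                    String link = base.resolve(href).toString();
                    boolean web = link.startsWith("http://") || link.startsWith("https://");
                    if (web && seen.add(link)) {
                        queue.add(link); // first time this link is seen
                    }
                } catch (IllegalArgumentException e) {
                    // malformed href: skip it
                }
            }
        }
        return visited;
    }
}
```

Passing the fetcher in as a function keeps the crawl logic separate from the network code, so it can be tried against canned pages before it is pointed at a real site.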


    --
    Tris Orendorff
    [Q: What kind of modem did Jimi Hendrix use?
    A: A purple Hayes.]
     
    Tris Orendorff, May 3, 2007
    #4
  5. Ian Wilson Guest

    wrote:
    > I am trying to extract data from a website and store it. Would
    > someone pose different ways to approach this problem or even
    > literature that I could read to help?
    >


    1. Use the site's API or RSS feed instead, if one is available.
    2. Check the site's terms and conditions of use.
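As an illustration of the first point, reading a feed is usually far less fragile than picking apart page markup. The sketch below pulls the item titles out of a feed with the JDK's own XML parser. It assumes an RSS 2.0 style layout (item elements that each contain a title); the class name RssTitles is made up, and Atom feeds or namespaced elements would need different handling.

```java
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: extracts item titles from an RSS 2.0 style feed.
final class RssTitles {

    /** Returns the trimmed title of every item element in the feed. */
    static List<String> itemTitles(InputStream feed) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Refuse DOCTYPE declarations, since the feed is untrusted input.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        Document doc = factory.newDocumentBuilder().parse(feed);

        List<String> titles = new ArrayList<>();
        NodeList items = doc.getElementsByTagName("item");
        for (int i = 0; i < items.getLength(); i++) {
            Element item = (Element) items.item(i);
            NodeList titleNodes = item.getElementsByTagName("title");
            if (titleNodes.getLength() > 0) {
                titles.add(titleNodes.item(0).getTextContent().trim());
            }
        }
        return titles;
    }
}
```

The input stream can come from a saved file or from opening the feed's URL; either way the stored data then comes from a format the site intends programs to read.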
     
    Ian Wilson, May 4, 2007
    #5