Web scraping

raybonds

I am trying to extract data from a website and store it. Could
someone suggest different ways to approach this problem, or point
me to literature that would help?
 
Lulu58e2

This is pretty quick in Groovy using the following:

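// NekoHTML (org.cyberneko.html) must be on the classpath.
def parser = new org.cyberneko.html.parsers.SAXParser()
// Turn off namespace processing so elements can be addressed by plain name.
parser.setFeature('http://xml.org/sax/features/namespaces', false)
def HTML = new XmlSlurper(parser).parse('http://www.somepage.html')
// NekoHTML upper-cases element names by default, hence BODY, DIV, and so on.
HTML.BODY.DIV[2].P[4].LI[2].TABLE[0].TR.each { row ->
    /* do something with each table row */
} // as an example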

Tris Orendorff

Here's the info from a spider I have used a few times:

/**
* This class implements a reusable spider. To use this
* class you must have a class set up to receive
* the information found by the spider. That class must
* implement the ISpiderReportable interface. Written by
* Jeff Heaton. Jeff Heaton is the author of "Programming
* Spiders, Bots, and Aggregators" by Sybex. Jeff can be
* contacted through his web site at http://www.jeffheaton.com.
*
* @author Jeff Heaton(http://www.jeffheaton.com)
* @version 1.0
*/
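
To illustrate the callback pattern that comment describes, here is a minimal sketch in Java. It is not Jeff Heaton's code: the LinkReporter interface, the MiniSpider class, and the regex-based link extraction are all hypothetical stand-ins, and it scans an HTML string rather than fetching pages over the network.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical callback interface; NOT the real ISpiderReportable.
interface LinkReporter {
    void foundLink(String url);
}

// Hypothetical minimal spider step: scans one page's HTML and reports its links.
class MiniSpider {
    // Matches href="..." or href='...' inside anchor tags. This is a rough
    // heuristic, not a full HTML parser.
    private static final Pattern HREF = Pattern.compile(
        "<a\\s[^>]*?href\\s*=\\s*[\"']([^\"']+)[\"']",
        Pattern.CASE_INSENSITIVE);

    private final LinkReporter reporter;

    MiniSpider(LinkReporter reporter) {
        this.reporter = reporter;
    }

    // Passes every link found in the given HTML to the reporter.
    void scan(String html) {
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            reporter.foundLink(m.group(1));
        }
    }

    // Convenience helper: collects the links of one page into a list.
    static List<String> collectLinks(String html) {
        List<String> links = new ArrayList<>();
        new MiniSpider(links::add).scan(html);
        return links;
    }
}
```

A real spider would also fetch each reported link, resolve relative URLs, and keep track of pages it has already visited; the sketch leaves all of that out.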
 
Ian Wilson

1. Use the site's API or RSS feed instead, if available.
2. Check the site's terms and conditions of use.
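
If the site does offer an RSS feed, reading it can be much simpler than scraping HTML. Here is a rough sketch using only the XML parser that ships with the JDK. The RssTitles class and itemTitles method are names made up for this example, and it assumes a plain RSS 2.0 layout with item and title elements.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for this example only.
class RssTitles {
    // Returns the <title> text of every <item> in an RSS 2.0 document.
    static List<String> itemTitles(String rssXml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc =
                builder.parse(new InputSource(new StringReader(rssXml)));

            List<String> titles = new ArrayList<>();
            NodeList items = doc.getElementsByTagName("item");
            for (int i = 0; i < items.getLength(); i++) {
                Element item = (Element) items.item(i);
                // Look for the title inside this item only, so the
                // channel-level title is not picked up.
                NodeList titleNodes = item.getElementsByTagName("title");
                if (titleNodes.getLength() > 0) {
                    titles.add(titleNodes.item(0).getTextContent().trim());
                }
            }
            return titles;
        } catch (Exception e) {
            // Malformed XML or parser setup problems end up here.
            throw new IllegalArgumentException("Could not parse RSS", e);
        }
    }
}
```

The sketch takes the feed as a string to stay self-contained; for a live feed you would download the XML first and pass its text in.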
 
