HTML parsing using Java and Xerces

Camk

Hey, is it possible to do the following?

1. Enter a search term in ask.com (manually) and hit search.
2. Once the result page is shown, view the page source and save it to
the hard disk (manually).
3. Use a Java program with an embedded HTML parser to extract the
returned URLs.
4. Once the URLs are extracted, store them automatically in
a MySQL database.
The database has a single table with the following columns:
Query - stores the search query as a string.
SearchEngine - stores the name of the search engine as a string (e.g. Ask).
ReturnedURL - stores the returned URL as a string (taken from
the parsed page source).
URLNo - stores the position of the returned URL as an int (i.e. the first
URL is number 1, and so on).
 
Chris

Yes, it is possible, and there are lots of ways to do it. The trick is to
find a reliable way to recognize the various entities in the page.

I would start by reading the page into a String or char array, and then
see whether I could write regular expressions to recognize the URLs. See
java.util.regex.
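
As a rough sketch of that approach (not a definitive implementation), something like the following could read the saved page and pull out the href values of anchor tags. The class name LinkExtractor, its method names, and the rule of keeping only absolute http/https links are assumptions made for illustration. A real results page also contains navigation and advertising links, so the pattern would need tuning against the actual saved source.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts link targets from an HTML string with a regex.
class LinkExtractor {

    // Matches <a ... href="..."> or <a ... href='...'>, ignoring case.
    // Group 1 is the quote character, group 2 is the URL between the quotes.
    private static final Pattern HREF = Pattern.compile(
            "<a\\s[^>]*?href\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns absolute http/https URLs in the order they appear in the page.
    static List<String> extractUrls(String html) {
        List<String> urls = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            String url = m.group(2).trim();
            // Assumption: relative links are site navigation, so skip them.
            if (url.startsWith("http://") || url.startsWith("https://")) {
                urls.add(url);
            }
        }
        return urls;
    }

    // Reads the manually saved page (assumed to be UTF-8) and extracts its URLs.
    static List<String> extractUrlsFromFile(Path file) throws IOException {
        String html = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        return extractUrls(html);
    }
}
```

Because the list preserves page order, the index of each URL plus one can serve as its URLNo.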

Don't use Xerces. It is an XML parser and will choke on any ill-formed HTML.
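
For step 4, storing the URLs, a minimal JDBC sketch might look like the one below. The column names come from the question. The table name "results", the class name ResultStore, and the method name store are assumptions for illustration. It expects an already open java.sql.Connection, which for MySQL would typically come from DriverManager.getConnection with the MySQL JDBC driver on the classpath.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

// Hypothetical helper: writes one row per returned URL.
class ResultStore {

    // "results" is an assumed table name; the columns are those from the question.
    static final String INSERT_SQL =
            "INSERT INTO results (Query, SearchEngine, ReturnedURL, URLNo) "
            + "VALUES (?, ?, ?, ?)";

    // Inserts the URLs in list order, numbering them from 1.
    // Returns the number of rows that were batched.
    static int store(Connection conn, String query, String engine, List<String> urls)
            throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            int position = 1; // the first URL is number 1
            for (String url : urls) {
                ps.setString(1, query);
                ps.setString(2, engine);
                ps.setString(3, url);
                ps.setInt(4, position++);
                ps.addBatch();
            }
            return ps.executeBatch().length;
        }
    }
}
```

A PreparedStatement with placeholders avoids quoting problems when a URL or query contains apostrophes.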
 
