Parsing an HTML page

swetha

I'm working on a research project in which I need to extract specific
content from an HTML page using Java code. The page contains many URLs,
and the code has to follow each link and extract the content from the
page it points to, so that every link is eventually visited. Can anyone
help me with this? If anyone has code for this, please share it.
Thanks,
Swetha
 
Boris Werner

Hi!

I found this article about programming a web crawler:

http://www.devarticles.com/c/a/Java/Crawling-the-Web-with-Java/

It might help you, because it does exactly what you want.
Have a closer look at the methods, especially retrieveLinks().

Hope this helps

Boris
 
jcsnippets.atspace.com

Have a look at the following articles; I think they will help you get
started:
http://jcsnippets.atspace.com/java/network-stuff/how-to-save-a-webpage.html
http://jcsnippets.atspace.com/java/regular-expressions/regular-expressions-find-href.html

These show how to save a web page and extract the links from it. To
extract other information, you will need a different regular
expression.
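
As a rough illustration of the link-extraction step (this is not the code
from the linked articles), here is a minimal sketch using java.util.regex.
The class name LinkExtractor, the method extractLinks, and the pattern
itself are assumptions made for this example. The pattern only handles
quoted href values inside <a> tags, so it will miss unquoted attributes
and can be fooled by unusual markup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
final class LinkExtractor {
    // Matches href="..." or href='...' inside an <a ...> tag, ignoring case.
    // Group 1 is the quote character, group 2 is the link target.
    private static final Pattern HREF = Pattern.compile(
            "<a\\b[^>]*?\\bhref\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the href values in the order they appear in the HTML text.
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(2));
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<p><a href=\"a.html\">A</a> <A HREF='b.html'>B</A></p>";
        System.out.println(extractLinks(html)); // prints [a.html, b.html]
    }
}
```

To pull out other content, the same Matcher loop can be reused with a
different pattern.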

Once you have a list of links, repeat the process for each linked
page.
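
The "repeat the process" step could be sketched roughly as a breadth-first
loop with a set of already-seen URLs, so that no page is fetched twice.
Everything named here (SimpleCrawler, crawl, the fetcher parameter, the
maxPages limit, and the example.test URLs) is an assumption made for this
example. The fetcher is passed in as a function so the sketch can run
against an in-memory map; in a real program it could wrap whatever code
you use to download a page.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical crawler sketch for illustration only.
final class SimpleCrawler {
    // Quoted href values inside <a> tags (assumed pattern, not exhaustive).
    private static final Pattern HREF = Pattern.compile(
            "<a\\b[^>]*?\\bhref\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Visits pages breadth-first from startUrl and returns the URLs that
    // were loaded, in visiting order. The fetcher returns the HTML for a
    // URL, or null if the page cannot be loaded.
    static List<String> crawl(String startUrl,
                              Function<String, String> fetcher,
                              int maxPages) {
        Set<String> seen = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        List<String> visited = new ArrayList<>();
        seen.add(startUrl);
        queue.add(startUrl);
        while (!queue.isEmpty() && visited.size() < maxPages) {
            String url = queue.remove();
            String html = fetcher.apply(url);
            if (html == null) {
                continue; // page could not be loaded, skip it
            }
            visited.add(url);
            // This is the place to extract whatever content you need from html.
            Matcher m = HREF.matcher(html);
            while (m.find()) {
                String next;
                try {
                    // Turn relative links into absolute URLs.
                    next = URI.create(url).resolve(m.group(2)).toString();
                } catch (IllegalArgumentException e) {
                    continue; // malformed link, skip it
                }
                if (seen.add(next)) {
                    queue.add(next); // only queue URLs not seen before
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        // A tiny in-memory "site" standing in for real downloads.
        Map<String, String> site = Map.of(
                "http://example.test/index.html",
                "<a href=\"a.html\">A</a> <a href=\"b.html\">B</a>",
                "http://example.test/a.html",
                "<a href=\"index.html\">home</a> <a href=\"b.html\">B</a>",
                "http://example.test/b.html",
                "<p>no links here</p>");
        System.out.println(crawl("http://example.test/index.html", site::get, 10));
    }
}
```

The maxPages limit is there because following every link on a real site
can otherwise run for a very long time.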

Best regards,

JayCee
 
Wibble

We use HtmlUnit for testing servlets and JSPs, but it is also a pretty
good screen scraper, and it is JavaScript-aware.

http://htmlunit.sourceforge.net/
 
Parsing a web page in PHP

I'm working on a project in which I need to extract images
and their alt attributes from an HTML page using PHP code. The page
contains many images, and the code has to go through each <img>
element and extract its alt attribute.

Can anyone help me with this?

If anyone has code for this, please share it.

Thanks,
Rujuta
 
