Ezee
Hi,
I am trying to build a topic-focused web crawler. For this, I have to
run some calculations on the content of each URL before adding that URL
to my database.
I found a very useful word count program on the Sun Java forum, but its
problem is that it also includes the HTML tags in the count.
Can anybody please tell me whether there is a Java API or online help
available for:
i) A program that counts the words in an HTML file but doesn't include
the HTML tags.
ii) A program that counts only the bold and italic text in an HTML file.
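To make both requests concrete, here is a rough sketch of the kind of thing I am after. It uses the HTML parser that ships with the JDK (javax.swing.text.html.parser.ParserDelegator), so words are counted only from text nodes and never from the tags themselves. The class and method names (HtmlWordCounter, Counts, count) are made up for illustration. It assumes that (ii) means counting the words that sit inside b, strong, i or em elements, and that a "word" is simply a run of non-whitespace characters. This may well not be the best way to do it.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper; the names are illustrative, not from any library.
final class HtmlWordCounter {

    /** Result holder: all text words, and words inside bold/italic tags. */
    static final class Counts {
        final int total;
        final int emphasized;

        Counts(int total, int emphasized) {
            this.total = total;
            this.emphasized = emphasized;
        }
    }

    /** Receives parser events; tags are seen separately from text. */
    private static final class Callback extends HTMLEditorKit.ParserCallback {
        int total;
        int emphasized;
        int depth; // number of bold/italic elements currently open

        @Override
        public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
            if (isEmphasis(t)) {
                depth++;
            }
        }

        @Override
        public void handleEndTag(HTML.Tag t, int pos) {
            if (isEmphasis(t) && depth > 0) {
                depth--;
            }
        }

        @Override
        public void handleText(char[] data, int pos) {
            int words = countWords(new String(data));
            total += words;
            if (depth > 0) {
                emphasized += words;
            }
        }
    }

    // Assumption: these four tags are what "bold and italic" covers.
    static boolean isEmphasis(HTML.Tag t) {
        return t == HTML.Tag.B || t == HTML.Tag.STRONG
                || t == HTML.Tag.I || t == HTML.Tag.EM;
    }

    // Assumption: a word is any run of non-whitespace characters.
    static int countWords(String text) {
        int n = 0;
        for (String token : text.split("\\s+")) {
            if (!token.isEmpty()) {
                n++;
            }
        }
        return n;
    }

    static Counts count(String html) {
        Callback cb = new Callback();
        try {
            // true = ignore any charset declared inside the document
            new ParserDelegator().parse(new StringReader(html), cb, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new Counts(cb.total, cb.emphasized);
    }
}
```

Calling HtmlWordCounter.count(pageSource) would then give me both numbers for one page. A word split by a tag in the middle (for example un<b>believ</b>able) would be counted as several words here, which I could live with for a first version.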
Thanks in anticipation.