Is there any utility / code that will return a file containing the *content* differences in two HTML


balma

Hi,
I need to know whether there is any code that will return the textual
differences between two HTML pages.
I am curious what kind of algorithm that requires.
I do have an HTML parser library, but I have no idea what kind of
algorithm to use.
Thanks for any suggestions.
 

Marco Schmidt

balma:
I need to know whether there is any code that will return the textual
differences between two HTML pages.
I am curious what kind of algorithm that requires.
I do have an HTML parser library, but I have no idea what kind of
algorithm to use.

You'll have to throw away the markup and then normalize the text,
replacing every sequence of whitespace characters with a single space
character - unless that gets in the way of the comparison. Maybe
you _do_ care about whitespace.
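
As a rough illustration of that step, here is a minimal sketch in Python using only the standard library's html.parser module (not the parser library mentioned in the question). The names TextExtractor and html_to_text are made up for this example, and skipping the contents of script and style elements is an assumption about what counts as page text.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes and ignores the markup around them."""

    # Assumption: script and style bodies are not part of the visible text.
    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)  # turns &amp; into & etc.
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Joining with a space keeps words in adjacent elements apart,
        # at the cost of an extra space around inline tags.
        return " ".join(self._chunks)


def html_to_text(html):
    """Return the text of an HTML string with whitespace runs collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return re.sub(r"\s+", " ", parser.text()).strip()
```

If whitespace does matter for the comparison, the re.sub call is the part to drop or adjust.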

There is diff, a tool that determines the differences between two
texts. IIRC there is a Java implementation of diff.
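
For illustration only, Python's standard difflib module does a comparable job (it is neither the diff tool itself nor the Java port). The sketch below assumes both texts have already been stripped of markup and normalized, and it compares word by word rather than line by line, because a normalized page is effectively one long line. The helper name word_diff is hypothetical.

```python
import difflib


def word_diff(old_text, new_text):
    """List the word-level changes needed to turn old_text into new_text.

    Each entry is (operation, old_words, new_words), where operation is
    "replace", "delete" or "insert" as reported by SequenceMatcher.
    """
    old_words = old_text.split()
    new_words = new_text.split()
    # autojunk=False disables difflib's heuristic that treats very
    # frequent items as junk in long sequences.
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    changes = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            continue  # unchanged run, nothing to report
        changes.append(
            (op, " ".join(old_words[i1:i2]), " ".join(new_words[j1:j2]))
        )
    return changes
```

Comparing "the quick brown fox" with "the slow brown fox jumps" this way reports one replacement ("quick" by "slow") and one insertion ("jumps").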

Do you need all differences, or some percentage value of similarity,
or a simple decision "texts are equal" / "texts are not equal"?
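
The latter two kinds of answer are cheap to compute once the texts are normalized. As a hedged sketch in Python, with the hypothetical helper names texts_equal and similarity, and with difflib's ratio standing in for "some percentage value" (other similarity measures would be equally valid):

```python
import difflib


def texts_equal(old_text, new_text):
    """Simple decision: are the two normalized texts identical?"""
    return old_text == new_text


def similarity(old_text, new_text):
    """Similarity between 0.0 and 1.0, based on matching words.

    difflib defines the ratio as 2 * M / T, where M is the number of
    matching words and T the total number of words in both texts.
    """
    matcher = difflib.SequenceMatcher(
        a=old_text.split(), b=new_text.split(), autojunk=False
    )
    return matcher.ratio()
```

Multiplying the result of similarity by 100 gives a percentage; the full list of differences needs the opcodes or a diff-style listing instead.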

Regards,
Marco
 
