pabloski
I need to parse real-world HTML/XML documents, and I found two nice Python
solutions: BeautifulSoup and Tidy.
However, I also found pyXPCOM, which is a wrapper for Gecko. So I was thinking
that Gecko surely handles bad HTML in a more consistent and error-proof way
than BeautifulSoup and Tidy.
I'm interested in using the Mozilla DOM from inside a Python script, but
I'm a bit confused about how I can use pyXPCOM to accomplish this.
Any suggestions?