parse HTML by class rather than tag


lorean2007

Hello,

I would be interested in parsing an HTML file by its corresponding
opening and closing tags, but taking into account the class
attributes and their values:

<html>
<body>
....
<div class="one">
....
<div class="two">
</div>
....
</div>
....
<div class="one">...</div>
<a href="..." class="three">
</body>
</html>

In this example, I need all the content inside the div with class="two",
or only the divs with class="one".

So I am wondering whether I should go with regular expressions (I do not
think so, as I would have to skip past the inner closing div) or with a
simple parser. I searched and found
http://www.diveintopython.org/html_processing/basehtmlprocessor.html
but I would like the parser not to change anything at all (no
lowercasing).

Can you help?

Best.
 

gatti

lorean2007 wrote:

I would be interested in parsing an HTML file by its corresponding
opening and closing tags, but taking into account the class
attributes and their values, [...]
So I am wondering whether I should go with regular expressions (I do not
think so, as I would have to skip past the inner closing div) or with a
simple parser. I searched and found
http://www.diveintopython.org/html_processing/basehtmlprocessor.html
but I would like the parser not to change anything at all (no
lowercasing).

Regular expressions are a horribly brittle idea for this. Use a robust
HTML parser (e.g. http://www.crummy.com/software/BeautifulSoup/) to
build a document tree, then visit it top-down and look at the value of
each element's 'class' attribute.
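
For illustration only, here is a minimal sketch of the "match open and
close tags, filter on class" idea. It is not BeautifulSoup: it swaps in
the standard library's html.parser so that it has no dependencies, and
it slices the original string instead of re-serializing a tree, so the
markup comes back unchanged (no lowercasing). The names
extract_by_class, _ClassExtractor and SAMPLE are made up for this
example, and elements that are never closed are simply dropped.

```python
from html.parser import HTMLParser


class _ClassExtractor(HTMLParser):
    """Record the raw source span of every element carrying a given class."""

    def __init__(self, source, wanted):
        super().__init__()
        self.source = source
        self.wanted = wanted
        # Offsets where each line starts; the parser counts lines by "\n" only.
        self.line_starts = [0] + [i + 1 for i, ch in enumerate(source) if ch == "\n"]
        self.open_tags = []  # stack of (tag_name, start_offset or None)
        self.matches = []

    def _offset(self):
        # getpos() returns (1-based line, 0-based column) of the current tag.
        line, col = self.getpos()
        return self.line_starts[line - 1] + col

    def handle_starttag(self, tag, attrs):
        # Attribute names arrive lowercased; a valueless attribute has value None.
        classes = (dict(attrs).get("class") or "").split()
        start = self._offset() if self.wanted in classes else None
        self.open_tags.append((tag, start))

    def handle_endtag(self, tag):
        # Find the nearest open element with this name; ignore stray end tags.
        for i in range(len(self.open_tags) - 1, -1, -1):
            if self.open_tags[i][0] == tag:
                break
        else:
            return
        _, start = self.open_tags[i]
        # Also discards unclosed children (e.g. <br> or a dangling <a>).
        del self.open_tags[i:]
        if start is not None:
            end = self.source.index(">", self._offset()) + 1
            self.matches.append(self.source[start:end])


def extract_by_class(source, wanted):
    """Return the untouched source text of each closed element with class `wanted`."""
    parser = _ClassExtractor(source, wanted)
    parser.feed(source)
    parser.close()
    return parser.matches


SAMPLE = """<HTML>
<BODY>
<div class="one">
  Outer <B>text</B>
  <div class="two">inner</div>
</div>
<div class="one">second</div>
<a href="#" class="three">
</BODY>
</HTML>"""

if __name__ == "__main__":
    for chunk in extract_by_class(SAMPLE, "one"):
        print(repr(chunk))
```

Matches are returned in the order their closing tags are seen, so a
nested match comes before the element that contains it. For real-world,
badly formed pages a tree-building parser such as the one suggested
above is likely to cope better than this stack-based sketch.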

Regards,
Lorenzo Gatti
 
