Hpricot - best way to parse based on comments

Jerome

I am trying to parse some files that contain comments like this:

<html>
<body>

<!-- BEGIN ad_content -->

images, text, etc...

<!-- END ad_content -->

Interesting text of site here.

</body>
</html>


I am wondering how to go about extracting the data between the BEGIN and
END comments using Hpricot. I am not aware of a way to refer to HTML
comments through CSS or XPath selectors.

Thanks for any ideas!

- Jerome
 
Keith Fahlgren

I am trying to parse some files that contain comments like this:
...
I am not aware of a way to refer to commented HTML
through CSS or XPath selectors.

The XPath comment() node test selects every comment node in the document.

For example, using XMLStarlet (the XPath expression follows the -m flag):
keith@devel ~ $ xml sel -t -m '//comment()' -v '.' -n simple.xml
one comment
two comment

keith@devel ~ $ cat simple.xml
<simple>
<!-- one comment -->
<foo/>
<!-- two comment -->
<bar/>
</simple>
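
The same comment() query can be sketched in Ruby. This is only an illustration under assumptions: it uses REXML rather than Hpricot, it reuses the simple.xml content from above as an inline string, and the variable names are made up for the example. REXML needs well-formed XML, so real-world HTML may not parse this way.

```ruby
require 'rexml/document'

# Same content as simple.xml above, inlined so the sketch is self-contained.
xml = <<~XML
  <simple>
  <!-- one comment -->
  <foo/>
  <!-- two comment -->
  <bar/>
  </simple>
XML

doc = REXML::Document.new(xml)

# '//comment()' matches every comment node anywhere in the document.
# Comment#string is the raw comment text, including surrounding spaces.
comments = REXML::XPath.match(doc, '//comment()').map { |c| c.string.strip }

comments.each { |text| puts text }
```

Once the BEGIN and END comment nodes are found this way, the nodes between them could be collected by walking the siblings from one to the other.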


HTH,
Keith
 
Ken Bloom

Why not gsub out the unwanted sections before parsing with Hpricot? Or,
if the data you want is nested between the comments, use a regexp to
narrow the document down to only the text between them before parsing
with Hpricot.
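
Both variants might look roughly like the sketch below. It uses only plain Ruby string methods on the sample document from the original post; the pattern and the variable names (block_re, ad_content, without_ads) are assumptions made for illustration, and the pattern expects the markers to look exactly like the BEGIN/END pair shown.

```ruby
html = <<~HTML
  <html>
  <body>

  <!-- BEGIN ad_content -->

  images, text, etc...

  <!-- END ad_content -->

  Interesting text of site here.

  </body>
  </html>
HTML

# Group 1 is the block name, group 2 is everything between the markers.
# /m lets "." match newlines; ".*?" stops at the first matching END marker.
block_re = /<!--\s*BEGIN\s+(\S+)\s*-->(.*?)<!--\s*END\s+\1\s*-->/m

# Variant 1: narrow the document down to the text between the comments.
ad_content = html[block_re, 2].to_s.strip

# Variant 2: gsub out the commented sections and keep everything else.
without_ads = html.gsub(block_re, '')

puts ad_content
puts without_ads
```

Either string can then be handed to Hpricot(...) for normal parsing. A regexp like this is fragile if the markers are nested or formatted inconsistently, so it is worth checking against the real files first.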

--Ken Bloom
 
