I've been messing with Hpricot and I'm trying to do a few things that
apparently aren't documented or available as part of Hpricot. Can
someone verify the following...
1) Is there a simple way to determine the element's current path /
location? For example, if I find a text node, is there a simple way to
determine the path of that text node so I can find it again later using
that path / location as a parameter to the search method? I assume I
can use the parent method to find the parent and recurse through until
I get to the root node...is there an easier way?
2) Is there a simple way to find all elements with non-empty text
nodes? It appears that Hpricot is focused on providing methods for
finding something if you know the element tag / attributes / classes /
etc. I've been using traverse_text which requires going through every
text node and filtering out the ones that are empty / whitespace. Is
there an easier way to find all elements with non-empty text nodes?
This is in reference to parsing HTML pages which may or may not be
well-formed.
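To make both questions concrete, here is a minimal sketch of the two approaches described above: walking up through parent until the root to build a path, and visiting every text node while skipping the empty / whitespace ones. The Node struct and the make_elem, make_text, path_of and non_blank_texts helpers are hypothetical stand-ins written for illustration; they are not part of Hpricot's API, and the path format is only an XPath-like assumption.

```ruby
# Hypothetical stand-in for a parsed node; NOT Hpricot's own classes.
# name is nil for text nodes, text is nil for elements.
Node = Struct.new(:name, :text, :parent, :children) do
  def text?
    name.nil?
  end
end

# Build an element and attach it to its parent (if any).
def make_elem(name, parent = nil)
  node = Node.new(name, nil, parent, [])
  parent.children << node if parent
  node
end

# Build a text node and attach it to its parent.
def make_text(content, parent)
  node = Node.new(nil, content, parent, [])
  parent.children << node
  node
end

# (1) Walk up via parent until the root, recording at each level the
# 1-based position among siblings of the same kind and name.
def path_of(node)
  steps = []
  while node
    label = node.text? ? "text()" : node.name
    if node.parent
      same = node.parent.children.select do |c|
        c.text? == node.text? && c.name == node.name
      end
      label += "[#{same.index { |c| c.equal?(node) } + 1}]"
    end
    steps.unshift(label)
    node = node.parent
  end
  "/" + steps.join("/")
end

# (2) Visit every node and keep only text nodes that contain at least
# one non-whitespace character.
def non_blank_texts(node, found = [])
  found << node if node.text? && node.text =~ /\S/
  node.children.each { |child| non_blank_texts(child, found) }
  found
end

# Tiny example tree: <html><body><p></p>(whitespace)<p>Hello</p></body></html>
html  = make_elem("html")
body  = make_elem("body", html)
make_elem("p", body)
make_text("  \n ", body)
p2    = make_elem("p", body)
hello = make_text("Hello", p2)

puts path_of(hello)                    # prints "/html/body[1]/p[2]/text()[1]"
p non_blank_texts(html).map(&:text)    # prints ["Hello"]
```

Whether a path built this way can be passed straight back to the search method depends on how much XPath the parser accepts, so treat the output format as something to adapt.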
All in all, I really like Hpricot. I was using REXML and tidy before,
but this is a lot simpler and faster!
Thanks to _why the lucky stiff for a great little HTML parser...