Wild Al
Hi everyone:
I'm trying to strip HTML from a string while keeping a few specific tags.
I have found the following code:
def strip_tags(html)
  if html.index("<")
    text = ""
    tokenizer = HTML::Tokenizer.new(html)
    while token = tokenizer.next
      node = HTML::Node.parse(nil, 0, 0, token, false)
      # result is only the content of any Text nodes
      text << node.to_s if node.class == HTML::Text
    end
    # strip any comments, and if they have a newline at the end (i.e. a line
    # with only a comment) strip that too
    text.gsub(/<!--(.*?)-->[\n]?/m, "")
  else
    html # already plain text
  end
end
I'm trying to understand what is going on in this code, but I cannot find
documentation for HTML::Tokenizer or HTML::Node.parse. Does anyone know
what the parameters of the parse method are for?
In the while loop, how do you access the HTML tag itself? If I could get at
the tags, I could then decide whether to keep each one or not.
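To make the goal concrete, here is a rough sketch of the behaviour I am after. It does not use HTML::Tokenizer or HTML::Node at all, since I have no documentation for them. It walks the string with StringScanner from the standard library instead. The method name strip_tags_except and the ALLOWED_TAGS list are my own placeholders, not part of any library.

```ruby
require "strscan"

# Hypothetical whitelist of tag names to keep; adjust as needed.
ALLOWED_TAGS = %w[b i em strong p br].freeze

# Removes every tag whose name is not in `allowed`, keeps text as-is,
# and drops HTML comments (plus one trailing newline, if present).
def strip_tags_except(html, allowed = ALLOWED_TAGS)
  return html unless html.include?("<") # already plain text

  scanner = StringScanner.new(html)
  text = +""
  until scanner.eos?
    if scanner.scan(/<!--.*?-->\n?/m)
      # comment: emit nothing
    elsif (tag = scanner.scan(/<[^>]*>/))
      # pull the tag name out of "<name ...>" or "</name>"
      name = tag[/\A<\s*\/?\s*([A-Za-z][A-Za-z0-9]*)/, 1]
      text << tag if name && allowed.include?(name.downcase)
    elsif (chunk = scanner.scan(/[^<]+/))
      text << chunk
    else
      # a stray "<" with no closing ">": keep it as literal text
      text << scanner.getch
    end
  end
  text
end

puts strip_tags_except("<p>Hello <script>x</script><b>world</b></p>")
```

Note that a kept tag is passed through unchanged, attributes included, so this alone is not enough to sanitize untrusted input. A tokenizer-based version would presumably make the same keep-or-drop decision per tag node inside the while loop.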
Thanks for reading,
Wild Al