hpricot - parse html

K. R.

Hi all,

I would like to parse HTML code and remove all tags that start with
<!-- and end with -->

How can I remove these tags with a regex? I am using the gsub! method to
manipulate the string.

Thanks for helping...
 
Jim Clark

Try this...

C:\temp>irb
irb(main):001:0> mystring = "xxx<!-- and end with --> yy <!-- another comment --> zz"
=> "xxx<!-- and end with --> yy <!-- another comment --> zz"
irb(main):002:0> mystring.gsub(/<!--.*?-->/,'')
=> "xxx yy  zz"

Regards,
Jim
 
sishen

You should also handle the \n and \r characters, since a comment can span
more than one line.

So I think the regex should be "<!--(.|\n|\r)*?-->".
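
To illustrate the point, here is a minimal sketch comparing the single-line pattern with the alternation above. The sample string and variable names are made up for the example and are not from the original posts.

```ruby
# Hypothetical sample input: the first comment spans two lines.
html = "aaa<!-- one\ntwo -->bbb<!-- three -->ccc"

# By default '.' does not match "\n", so the two-line comment is left alone
# and only the single-line comment is removed.
single_line = html.gsub(/<!--.*?-->/, '')

# The alternation lets the pattern cross line breaks, so both comments go.
# ('.' already matches "\r" in Ruby, so that branch is harmless but redundant.)
with_alternation = html.gsub(/<!--(.|\n|\r)*?-->/, '')

puts single_line.inspect
puts with_alternation.inspect
```

The first result still contains the two-line comment; the second does not.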
 
Daniel Brumbaugh Keeney

> You should also process the \n, \r char.
>
> So I think the regex should be "<!--(.|\n|\r)*?-->".

Don't forget about the multiline option. It's easy: just stick an 'm'
after the regexp, and '.' will match newlines as well.

Daniel Brumbaugh Keeney
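
As a rough sketch of the multiline suggestion, assuming a made-up sample string with a comment that spans two lines (the names here are only for illustration):

```ruby
# Hypothetical sample input: the first comment spans two lines.
html = "aaa<!-- one\ntwo -->bbb<!-- three -->ccc"

# With the /m option '.' also matches "\n", so the simple non-greedy
# pattern removes comments that run over several lines.
stripped = html.gsub(/<!--.*?-->/m, '')

puts stripped.inspect
```

Note that gsub returns a new string, while gsub! modifies the receiver in place and returns nil when nothing was replaced, so it is safer not to rely on the return value of gsub!.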
 
