hpricot - parse html

K. R. · Jan 2, 2008

Hi all,

I would like to parse HTML code and remove all tags that start with


How can I remove these tags with a regex? I used the gsub! function to
manipulate the string.

Thanks for helping...

Jim Clark · Jan 3, 2008

Try this...

C:\temp>irb
irb(main):001:0> mystring = "xxx yy  zz"
=> "xxx yy  zz"
irb(main):002:0> mystring.gsub(//,'')
=> "xxx yy zz"

Regards,
Jim

sishen · Jan 3, 2008

[Note: parts of this message were removed to make it a legal post.]

You should also handle the \n and \r characters.

So I think the regex should be "".

Daniel Brumbaugh Keeney · Jan 3, 2008

> You should also handle the \n and \r characters.
>
> So I think the regex should be "".

Don't forget about the multiline option. It's easy: just stick an 'm'
after the regexp.

Daniel Brumbaugh Keeney
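
For anyone reading along later, here is a minimal sketch of the multiline suggestion. The tag name in the original question did not survive in this thread, so `font` below is purely a hypothetical stand-in, and the sample string is made up for illustration. In Ruby, the 'm' option makes `.` match newlines, which is what lets the pattern span a tag broken across lines.

```ruby
# Hypothetical input: "font" is an assumed tag name, not the one from the
# original question. The opening tag contains a newline on purpose.
html = "xxx <font\n color=\"red\">yy</font> zz"

# Without /m, '.' does not match "\n", so the opening tag is left in place
# and only the closing tag is removed.
without_m = html.gsub(/<\/?font\b.*?>/, '')

# With /m, '.' also matches "\n", so both the opening and closing tags go.
with_m = html.gsub(/<\/?font\b.*?>/m, '')

puts without_m.inspect  # "xxx <font\n color=\"red\">yy zz"
puts with_m.inspect     # "xxx yy zz"

# gsub! modifies the string in place but returns nil when nothing matched,
# so its return value should not be chained or assigned blindly.
plain = "no tags here".dup
result = plain.gsub!(/<\/?font\b.*?>/m, '')
puts result.inspect     # nil
puts plain.inspect      # "no tags here"
```

A character class such as `[^>]*` in place of `.*?` would also cross line breaks without the 'm' option, since a negated class matches newlines. For anything beyond simple cases, a real HTML parser is likely to be more robust than a regex.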
