hpricot - parse html

Discussion in 'Ruby' started by K. R., Jan 2, 2008.

  1. K. R.

    K. R. Guest

    Hi all,

    I would like to parse HTML code and remove all tags that start with
    <!-- and end with -->

    How can I remove these tags with a regex? I use the gsub! method to
    manipulate the string.

    Thanks for your help...
    --
    Posted via http://www.ruby-forum.com/.
    K. R., Jan 2, 2008
    #1

  2. Jim Clark

    Jim Clark Guest

    Try this...

    C:\temp>irb
    irb(main):001:0> mystring = "xxx<!-- and end with --> yy <!-- another comment --> zz"
    => "xxx<!-- and end with --> yy <!-- another comment --> zz"
    irb(main):002:0> mystring.gsub(/<!--.*?-->/, '')
    => "xxx yy  zz"

    Regards,
    Jim

    Jim Clark, Jan 3, 2008
    #2
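Since the question mentions gsub!, here is a minimal sketch of how gsub and gsub! differ, using Jim's sample string. The helper name strip_comments and the variable names are made up for illustration and do not come from the thread. Note that gsub! returns nil when nothing was replaced, so its return value is probably not something you want to chain on.

```ruby
# Hypothetical helper (not from the thread): returns a copy without HTML comments.
def strip_comments(html)
  # The non-greedy .*? stops at the first "-->", so each comment is removed on its own.
  html.gsub(/<!--.*?-->/, '')
end

original = "xxx<!-- and end with --> yy <!-- another comment --> zz"
cleaned  = strip_comments(original)   # gsub returns a new string; original is untouched

in_place = original.dup
in_place.gsub!(/<!--.*?-->/, '')      # gsub! modifies in_place itself

no_comments = "plain text".dup
result = no_comments.gsub!(/<!--.*?-->/, '')  # nil, because nothing matched
```

Both calls leave the same text behind; the only difference is whether the receiver is changed.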

  3. sishen

    sishen Guest

    You should also handle the \n and \r characters, because a comment can
    span more than one line.

    So I think the regex should be "<!--(.|\n|\r)*?-->".

    sishen, Jan 3, 2008
    #3
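To illustrate the point about line breaks, here is a small sketch with a made-up input string whose comment spans two lines. Without any option, . does not match "\n" in Ruby, so the first pattern leaves the comment in place, while the alternation suggested above removes it. As far as I know, . already matches "\r" in Ruby, so the \r alternative looks harmless but probably redundant.

```ruby
# Made-up sample input: the comment contains a line break.
html = "before<!-- first line\nsecond line -->after"

# Without any option, . does not match "\n", so nothing is replaced.
single_line = html.gsub(/<!--.*?-->/, '')

# The alternation from the post above matches the newline explicitly.
alternation = html.gsub(/<!--(.|\n|\r)*?-->/, '')

# Quick check that . matches a carriage return without any option.
dot_matches_cr = !("a\rb" =~ /a.b/).nil?
```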
  4. Daniel Brumbaugh Keeney

    On Jan 3, 2008 4:37 AM, sishen wrote:
    > You should also process the \n, \r char.
    >
    > So I think the regex should be "<!--(.|\n|\r)*?-->".

    Don't forget about the multiline option. It's easy: just stick an 'm'
    after the regexp, as in /<!--.*?-->/m, so that . also matches newlines.

    Daniel Brumbaugh Keeney
    Daniel Brumbaugh Keeney, Jan 3, 2008
    #4
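A short sketch of the multiline option, again with a made-up input string. In Ruby the m option is what makes . match newlines (other languages tend to call this "dotall"), so the simple pattern from earlier in the thread then also handles comments that span lines. Regexp::MULTILINE is the same option in constant form.

```ruby
# Made-up sample input: one comment spans two lines, one sits on a single line.
html = "keep<!-- spans\ntwo lines -->this <!-- short --> too"

# The m option lets . match "\n", so both comments are removed.
cleaned = html.gsub(/<!--.*?-->/m, '')

# The same pattern built with the constant instead of the literal flag.
pattern = Regexp.new('<!--.*?-->', Regexp::MULTILINE)
cleaned_with_constant = html.gsub(pattern, '')
```

The non-greedy .*? still matters here: with a greedy .* and the m option, everything from the first <!-- to the last --> would be removed in one go.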