Remove all HTML tags with Perl

Discussion in 'Perl' started by jjliu, Oct 10, 2003.

  1. jjliu

    jjliu Guest

    Could someone tell me how to remove all HTML tags (and anything inside the
    tags) with Perl? Some people suggested that I use HTML::TagFilter, but I
    could not find a Windows version. Thanks very much for your help.

    JJL
    jjliu, Oct 10, 2003
    #1

  2. Gunnar Hjalmarsson, Oct 10, 2003
    #2

  3. jjliu

    jjliu Guest

    Thanks. What I wanted is to remove the head tag and anything inside it.
    Could you help me out?

    "Gunnar Hjalmarsson" <> wrote in message
    news:KDxhb.32297$...
    > jjliu wrote:
    > > Could someone tell me how to remove all html tags (and anything
    > > inside tags) by perl.

    >
    > Sure.
    >
    > s/.*//s;
    >
    > --
    > Gunnar Hjalmarsson
    > Email: http://www.gunnar.cc/cgi-bin/contact.pl
    >
    jjliu, Oct 10, 2003
    #3
  4. Kris Wempa

    Kris Wempa Guest

    "Gunnar Hjalmarsson" <> wrote in message
    news:KDxhb.32297$...
    > jjliu wrote:
    > > Could someone tell me how to remove all html tags (and anything
    > > inside tags) by perl.

    >
    > Sure.
    >
    > s/.*//s;
    >


    That will remove ALL characters. He really needs something along the lines
    of:

    s/\<[^\<]+\>//;

    This only works if the entire TAG is within the same string. If the tag
    spans multiple lines, they will need to be concatenated into 1 string.
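
    A minimal sketch of the multi-line point, assuming the input arrives as a
    list of lines. The sample lines and variable names are hypothetical, and
    the pattern uses [^>] with /g (the correction suggested later in the
    thread) rather than the exact expression above.

```perl
use strict;
use warnings;

# Hypothetical sample input: the <a> tag is split across two lines.
my @lines = (
    "<p>Hello <a\n",
    "  href=\"x.html\">world</a></p>\n",
);

# Applied line by line, the pattern cannot see a tag that starts on one
# line and ends on the next, so the pieces of the split tag survive.
my $per_line = join '', map { (my $copy = $_) =~ s/<[^>]+>//g; $copy } @lines;

# Joined into one string first, every tag is matched. [^>] also matches
# newlines, so no /s modifier is needed here.
my $joined = join '', @lines;
(my $stripped = $joined) =~ s/<[^>]+>//g;

print "per line: $per_line";
print "joined:   $stripped";
```

    When reading from a file, setting "local $/;" before the read returns the
    whole file as one string, which has the same effect as the join.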
    Kris Wempa, Oct 10, 2003
    #4
  5. Eric J. Roode

    "Kris Wempa" <calmincents(NO_SPAM)@yahoo.com> wrote in
    news:bm6vql$:

    >
    > "Gunnar Hjalmarsson" <> wrote in message
    > news:KDxhb.32297$...
    >> jjliu wrote:
    >> > Could someone tell me how to remove all html tags (and anything
    >> > inside tags) by perl.

    >>
    >> Sure.
    >>
    >> s/.*//s;
    >>

    >
    > That will remove ALL characters.


    Gunnar knows that. :)


    > He really needs something along the
    > lines of:
    >
    > s/\<[^\<]+\>//;


    Why all the backslashes?
    Also, I suspect you meant the second < to be a >.


    > This only works if the entire TAG is within the same string. If the
    > tag spans multiple lines, they will need to be concatenated into 1
    > string.


    It also doesn't work if anything within the tag or its attributes contains
    a > symbol. Example:

    <img src="mathexpression.gif" alt="5 is > 4" />
    <input type="submit" onclick="if (count > 1) true else false" />
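
    To make the failure concrete, here is a rough sketch that runs the simple
    pattern on the first example and compares it with a quote-aware variant.
    The quote-aware pattern is an illustration only, not something proposed in
    this thread; it assumes attribute quotes are balanced and it still does not
    handle comments or script content.

```perl
use strict;
use warnings;

# The first example tag from above, followed by some hypothetical text.
my $html = '<img src="mathexpression.gif" alt="5 is > 4" />text';

# Simple pattern: the match ends at the first >, which here sits inside
# the quoted alt attribute, so the tail of the tag is left behind.
(my $naive = $html) =~ s/<[^>]+>//g;

# Quote-aware pattern: inside a tag, consume either a complete quoted
# string or a single character that is neither a quote nor >.
(my $aware = $html) =~ s/<(?:"[^"]*"|'[^']*'|[^>"'])*>//g;

print "naive: [$naive]\n";
print "aware: [$aware]\n";
```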

    --
    Eric
    $_ = reverse sort $ /. r , qw p ekca lre uJ reh
    ts p , map $ _. $ " , qw e p h tona e and print

    Eric J. Roode, Oct 11, 2003
    #5
  6. jjliu wrote:
    > Thanks.What i wanted is to remove head tag and anything inside it.
    > Could you help me out.


    Only the head tag? Well, in that case a regexp similar to what Kris
    suggested might be sufficient. But please note that normally you'd
    better use a module when dealing with HTML code, and even if I have
    never used the one you mentioned, it appears to be a good suggestion.
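
    For the head-only case, a regexp along these lines might do. This is a
    sketch under assumptions: the sample document is made up, there is exactly
    one head element, and the text "</head>" does not appear inside a script
    or comment within it. A parser module remains the safer route.

```perl
use strict;
use warnings;

# Hypothetical sample document.
my $html = <<'HTML';
<html>
<head>
<title>Test page</title>
</head>
<body><p>Hello</p></body>
</html>
HTML

# /i matches <HEAD> as well as <head>; /s lets . match newlines.
# \b keeps the pattern from matching other tags that start with "head",
# and .*? is non-greedy so the match stops at the first closing tag.
(my $no_head = $html) =~ s{<head\b[^>]*>.*?</head\s*>}{}is;

print $no_head;
```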

    > Some people suggested me to use HTML::TagFilter but i could not
    > find window version.


    What do you mean by Windows version? What makes you think that
    HTML::TagFilter doesn't work on Windows?

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
    Gunnar Hjalmarsson, Oct 15, 2003
    #6