Remove all HTML tags with Perl

Discussion in 'Perl' started by jjliu, Oct 10, 2003.

  1. jjliu

    jjliu Guest

    Could someone tell me how to remove all HTML tags (and anything inside the
    tags) with Perl? Some people suggested that I use HTML::TagFilter, but I
    could not find a Windows version. Thanks very much for your help.

    jjliu, Oct 10, 2003
  2. Sure.

    Gunnar Hjalmarsson, Oct 10, 2003
  3. jjliu

    jjliu Guest

    Thanks. What I wanted is to remove the head tag and anything inside it.
    Could you help me out?
    jjliu, Oct 10, 2003
  4. Kris Wempa

    Kris Wempa Guest

    That will remove ALL characters. He really needs something along the lines


    This only works if the entire tag is within the same string. If the tag
    spans multiple lines, the lines need to be concatenated into one string
    first.
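
    The point about joining lines can be sketched as follows. This is only a
    minimal illustration, not the code posted in this thread: the helper name
    strip_head and the sample lines are made up, and the pattern assumes a
    single, well-formed <head>...</head> pair.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: remove the <head> element and everything inside it.
    sub strip_head {
        my ($html) = @_;
        # /i ignores case, /s lets "." match newlines, and ".*?" stops at the
        # first closing </head>. The \b keeps <header> from matching.
        $html =~ s{<head\b[^>]*>.*?</head\s*>}{}is;
        return $html;
    }

    # Made-up input that arrives line by line; join it into one string first
    # so that an element spanning several lines can still be matched.
    my @lines = (
        "<html><head>\n",
        "<title>Test</title>\n",
        "</head>\n",
        "<body>Hi</body></html>\n",
    );
    my $page = join '', @lines;

    print strip_head($page);
    ```

    When reading from a file, the same effect can be had by slurping the whole
    file into one scalar instead of processing it line by line.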
    Kris Wempa, Oct 10, 2003
  5. Gunnar knows that. :)

    Why all the backslashes?
    It also doesn't work if anything within the tag or its attributes contains
    a > symbol. Example:

    <img src="mathexpression.gif" alt="5 is > 4" />
    <input type="submit" onclick="if (count > 1) true else false" />
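
    One way to cope with examples like these is to let the pattern skip over
    quoted attribute values. The following is a rough sketch under that
    assumption only; the helper name strip_tags is made up, and the pattern
    still ignores comments, CDATA sections and script contents, so it is not a
    substitute for a real parser.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: strip tags while treating quoted attribute values
    # as opaque, so a ">" inside quotes does not end the tag early.
    sub strip_tags {
        my ($html) = @_;
        # Inside a tag, accept either a character that is not ">" or a quote,
        # or a complete double-quoted or single-quoted string.
        $html =~ s{<(?:[^>'"]|"[^"]*"|'[^']*')*>}{}g;
        return $html;
    }

    print strip_tags('a<img src="mathexpression.gif" alt="5 is > 4" />b'), "\n";
    ```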

    Eric J. Roode, Oct 11, 2003
  6. Only the head tag? Well, in that case a regexp similar to what Kris
    suggested might be sufficient. But please note that normally you'd
    better use a module when dealing with HTML code, and even if I have
    never used the one you mentioned, it appears to be a good suggestion.
    What do you mean by Windows version? What makes you think that
    HTML::TagFilter doesn't work on Windows?
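
    As an illustration of the module route, here is a small sketch that uses
    HTML::Parser rather than HTML::TagFilter. It assumes HTML::Parser is
    installed from CPAN, the helper name text_without_head is made up, and it
    extracts the text while ignoring the head element, which may or may not be
    exactly what is wanted.

    ```perl
    use strict;
    use warnings;
    use HTML::Parser;   # CPAN module, assumed to be installed

    # Hypothetical helper: return the text of a page, skipping <head>.
    sub text_without_head {
        my ($html) = @_;
        my $text = '';
        my $parser = HTML::Parser->new(
            api_version => 3,
            # Collect text chunks; "dtext" passes them with entities decoded.
            text_h => [ sub { $text .= shift }, 'dtext' ],
        );
        # Suppress these elements together with everything inside them.
        $parser->ignore_elements(qw(head script style));
        $parser->parse($html);
        $parser->eof;
        return $text;
    }

    print text_without_head(
        '<html><head><title>T</title></head><body><p>Hello <b>world</b></p></body></html>'
    ), "\n";
    ```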
    Gunnar Hjalmarsson, Oct 15, 2003
