Remove javascript content from HTML page using Perl

Discussion in 'Perl' started by Mark, Aug 12, 2004.

  1. Mark

    Mark Guest

    I am creating a search engine that will scan pages on my Web site. I do not
    want any false-positive hits, which I define as any match on text that does
    not appear on the page when it is viewed in a Web browser.

    I am using a series of regexp substitutions to do this. The first removes
    everything before the <body> tag, since that part contains, among other
    things, general keywords and function definitions. The second is meant to
    remove <script including attributes>all code</script>. The third strips any
    remaining HTML tag, <anything>. I have a JavaScript function called
    presentphoto(...) that I don't want to be found when using the search
    function to find 'photo'. The first and third regexps work fine. I thought
    the second regexp, s/<script[^>]*>[^<]*<\/script>//gs, would do what I
    wanted. It does handle the photo problem, but something else is wrong:
    searches for 'write' still find document.write(...) inside
    <script>document.write(...)</script>.
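
    A minimal sketch of the three substitutions described above. The sample
    page, the variable names, and the exact form of the first and third
    patterns are assumptions made for illustration; only the second pattern is
    taken from the post. The sketch assumes the document.write(...) argument
    contains a '<', in which case [^<]* cannot reach </script> and that script
    element is left in place:

```perl
use strict;
use warnings;

# Hypothetical sample page; the first script body has a '<' inside a string.
my $html = join "\n",
    '<html><head><title>keywords</title></head>',
    '<body>',
    '<p>Holiday photo gallery</p>',
    '<script type="text/javascript">document.write("<b>hi</b>");</script>',
    '<script type="text/javascript">presentphoto(1);</script>',
    '</body></html>';

my $text = $html;
$text =~ s/^.*?(?=<body)//si;                 # 1: drop everything before <body> (assumed form)
$text =~ s/<script[^>]*>[^<]*<\/script>//gs;  # 2: the pattern from the post
$text =~ s/<[^>]*>//g;                        # 3: strip any remaining tag (assumed form)

# The presentphoto(...) call is removed, but document.write(...) is still there.
print $text, "\n";
```

    In this sample the second script element is removed, while the first one
    survives step 2 and only loses its tags in step 3, so its code stays
    searchable.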

    Can anyone help with a better expression to remove everything between <script
    type="text/javascript">...</script>?

    Thanks.
     
    Mark, Aug 12, 2004
    #1

  2. Joe Smith

    Joe Smith Guest

    Mark wrote:

    > I thought the second regexp,
    > s/<script[^>]*>[^<]*<\/script>//gs would do what I wanted.

    s%<script.*?>.*?</script>%%gsi;
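
    A short sketch of how this substitution behaves, assuming a made-up input
    in which one script body contains a '<' and another uses upper-case tags
    spread over several lines. The non-greedy .*? stops at the first
    </script>, /s lets '.' match newlines, and /i ignores case. The variable
    names and the follow-up tag-stripping pattern are illustrative only:

```perl
use strict;
use warnings;

# Hypothetical input; not taken from the original page.
my $html = join "\n",
    '<body>',
    '<p>Holiday photo gallery</p>',
    '<script type="text/javascript">document.write("<b>hi</b>");</script>',
    '<SCRIPT>',
    'presentphoto(1);',
    '</SCRIPT>',
    '</body>';

my $text = $html;
$text =~ s%<script.*?>.*?</script>%%gsi;  # remove each script element, body included
$text =~ s/<[^>]*>//g;                    # then strip the remaining tags (assumed form)

# Only the visible paragraph text is left.
print $text, "\n";
```

    This is still a regexp, not an HTML parser: a literal </script> inside a
    JavaScript string would end the match early, and <script.*?> would also
    match any other tag whose name starts with "script".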

    If that does not help, re-post your question to comp.lang.perl.misc
    instead of comp.lang.perl.
    -Joe
     
    Joe Smith, Aug 12, 2004
    #2