Mark
I am creating a search engine that will scan the pages on my Web site. I do not
want any false-positive hits, defined as any match on text that does not appear
when the page is viewed in a Web browser.

I am using a series of regexp substitutions to do this. The first one removes
everything before the <body> tag, since that part contains, among other things,
general keywords and function definitions. The second is meant to remove
<script including attributes>all code</script>. The third strips any
HTML tag, <anything>. I have a JavaScript function called presentphoto(...)
that I don't want to be found when someone searches for 'photo'.

The first and third regexps work fine. I thought the second one,
s/<script[^>]*>[^<]*<\/script>//gs, would do what I wanted. It does handle the
photo problem, but something else is wrong: a search for 'write' still finds
document.write(...) inside a <script>document.write(...)</script> element.

Can anyone help with a better expression to remove everything between <script
type="text/javascript">...</script>?
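
A minimal sketch of the three-step pipeline described above, written in Python for illustration (the original regexps look like Perl substitutions, but the patterns carry over). It assumes, without confirmation from the page itself, that the document.write(...) argument contains a '<' character; in that case [^<]* stops at that character and the second regexp never matches. The sample page, the function name visible_text, and the lazy pattern are all hypothetical, not part of the original code.

```python
import re

# Hypothetical sample page. The '<b>' inside document.write is an assumption
# about what the real page contains.
SAMPLE = (
    '<html><head><title>keywords photo</title></head>'
    '<body><p>Holiday pictures</p>'
    '<script type="text/javascript">presentphoto(1);</script>'
    '<script>document.write("<b>hello</b>");</script>'
    '<p>The end</p></body></html>'
)

# Step 1: drop everything up to and including the opening <body ...> tag.
HEAD_RE = re.compile(r'^.*?<body[^>]*>', re.IGNORECASE | re.DOTALL)

# Step 2, as posted: [^<]* cannot cross a '<' inside the script body.
ORIGINAL_SCRIPT_RE = re.compile(
    r'<script[^>]*>[^<]*</script>', re.IGNORECASE | re.DOTALL)

# Step 2, candidate replacement: lazy .*? takes any characters,
# including '<' and newlines, up to the nearest closing tag.
LAZY_SCRIPT_RE = re.compile(
    r'<script[^>]*>.*?</script\s*>', re.IGNORECASE | re.DOTALL)

# Step 3: strip any remaining tag.
TAG_RE = re.compile(r'<[^>]*>')


def visible_text(html, script_re):
    """Apply the three substitutions in order and collapse whitespace."""
    text = HEAD_RE.sub('', html, count=1)
    text = script_re.sub('', text)
    text = TAG_RE.sub(' ', text)
    return ' '.join(text.split())


out_orig = visible_text(SAMPLE, ORIGINAL_SCRIPT_RE)
out_lazy = visible_text(SAMPLE, LAZY_SCRIPT_RE)
print(out_orig)
print(out_lazy)
```

On this sample the posted pattern removes the presentphoto(...) block but leaves document.write in the output, while the lazy pattern removes both script blocks. The Perl form of the candidate would be s/<script[^>]*>.*?<\/script>//gis. A regexp is still not an HTML parser, so unusual markup may need separate handling.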
Thanks.