RegEx with multiple occurrences

Mike · May 4, 2006

Hi again.

I'm trying to strip all script blocks from HTML, and am using the
following re to do it:

p = re.compile("(\<script.*>*\</script>)",re.IGNORECASE | re.DOTALL)
m = p.search(data)

The problem is that I'm getting everything from the 1st script's start
tag to the last script's end tag in one group - so it seems like it
parses the string from both ends therefore removing far more from that
data than I want. What am I doing wrong?

Tim Chase · May 4, 2006

Mike said:
p = re.compile("(\<script.*>*\</script>)",re.IGNORECASE | re.DOTALL)
m = p.search(data)

First, I presume you didn't copy & paste your expression, as
it looks like you're missing a period before the second
asterisk. Otherwise, all you'd get is any number of
greater-than signs followed by a closing "</script>" tag.

Second, you're likely getting some foobar results because
you're not using a "raw" string of the form

r'(\<script.*>*\</script>)'

Mike said:
The problem is that I'm getting everything from the 1st
script's start tag to the last script's end tag in one
group - so it seems like it parses the string from both
ends therefore removing far more from that data than I
want. What am I doing wrong?

Looks like you want the non-greedy modifier to the "*"
described at

http://docs.python.org/lib/re-syntax.html

(searching the page for "greedy" should turn up the
paragraph on the modifiers)

You likely want something more like:

r'<script[^>]*>.*?</script>'

In the first atom, you're looking for the remainder of the
script tag (as much stuff that isn't a ">" as possible).
Then you close the tag with the ">", and then take as little
as possible (".*?") of anything until you find the closing
"</script>" tag.
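
A minimal sketch of the difference between the greedy and non-greedy forms, assuming a small made-up `data` string (the sample HTML and the variable names below are hypothetical, not taken from the original post):

```python
import re

# Hypothetical sample input with two separate script blocks.
data = (
    "<p>one</p>"
    "<script type='text/javascript'>var a = 1;</script>"
    "<p>two</p>"
    "<SCRIPT>\nvar b = 2;\n</SCRIPT>"
    "<p>three</p>"
)

flags = re.IGNORECASE | re.DOTALL

# Greedy: ".*" runs on to the LAST closing tag, so a single match
# swallows both script blocks and everything between them.
greedy = re.compile(r'<script.*</script>', flags)

# Non-greedy: ".*?" stops at the FIRST closing tag after each opening tag.
lazy = re.compile(r'<script[^>]*>.*?</script>', flags)

greedy_matches = greedy.findall(data)
lazy_matches = lazy.findall(data)

# sub() replaces every non-overlapping match, not just the first one.
stripped = lazy.sub('', data)

print(len(greedy_matches))  # 1
print(len(lazy_matches))    # 2
print(stripped)             # <p>one</p><p>two</p><p>three</p>
```

Note that search() only ever returns the first match, so to remove every script block you would probably want sub() as above, or findall()/finditer() if you need to inspect each match.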

HTH,

-tkc

Mike · May 4, 2006

Tim - you're a legend. Thanks.

Tim Chase · May 4, 2006

Mike said:
Tim - you're a legend. Thanks.

A leg-end? I always knew something was a-foot. Sorry to
make myself the butt of such joking.

My pleasure...glad it seems to be working for you.

-tkc (not much of a legend at all...just a regexp wonk)

