Tim_Mac
hi,
i have a tricky problem and my regex expertise has reached its limit.
i have read other posts on this newsgroup that pull out the plain text
from an html string, but that won't work for me because i want to
preserve the html and replace only some of the plain text.
i basically want to show the user's search terms highlighted in the
page, like google does, but i want to do this server side (i have the
mechanics of intercepting the html sorted out, by overriding the
Page.Render method). i can use a simple regex pattern like (keyword)
and replace with <span class='highlight'>$1</span> but this causes
problems because the keyword may appear in markup tags or attribute
values, which the above example will also replace, screwing up the html
structure.
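
to make the failure concrete, here is a minimal sketch in Python (not the ASP.NET code from the page; the function name and the sample markup are made up for illustration). it applies the simple (keyword) replacement to a link whose attributes also contain the keyword:

```python
import re

def naive_highlight(html, keyword):
    # Hypothetical helper: wraps every occurrence of the keyword in a span,
    # with no awareness of whether it sits inside a tag or in plain text.
    pattern = re.compile("(" + re.escape(keyword) + ")", re.IGNORECASE)
    return pattern.sub(r"<span class='highlight'>\1</span>", html)

# Made-up sample: the keyword appears in two attribute values and once in the text.
sample = '<a href="/keyword/page" title="keyword">the keyword here</a>'
broken = naive_highlight(sample, "keyword")
print(broken)
```

all three occurrences get wrapped, so the href and title values end up containing span markup, which is exactly the broken structure described above.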
what i want to express is: match the keyword only where it is not
contained inside an html tag, i.e. not between a < and a > character.
my most obvious attempt is too simplistic and doesn't work:
[^<]*(keyword)[^>]*
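
one possible way to state the "not between a < and a >" condition is a negative lookahead after the keyword: the match is rejected if a > is reached before the next <. the sketch below is in Python and is only an illustration under assumptions: the function name and sample are hypothetical, the html is assumed to be reasonably well formed (no stray < or > in text or attribute values), and script/style contents are not treated specially.

```python
import re

def highlight_outside_tags(html, keyword):
    # (?![^<>]*>) fails the match when the text after the keyword reaches a '>'
    # without first passing a '<', i.e. when the keyword is inside a tag.
    pattern = re.compile(
        "(" + re.escape(keyword) + r")(?![^<>]*>)", re.IGNORECASE
    )
    return pattern.sub(r"<span class='highlight'>\1</span>", html)

# Made-up sample: keyword in two attributes and twice in the visible text.
sample = '<a href="/keyword/page" title="keyword">the keyword and keyword</a>'
result = highlight_outside_tags(sample, "keyword")
print(result)
```

because the lookahead consumes nothing, each occurrence in the text is matched on its own, while the attribute values are left alone.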
i did come up with another regex, which i am almost embarrassed to show.
it essentially matches the keyword inside the inner text of an html tag
pair. but the problem is that it misses subsequent occurrences of the
keyword inside the same tag.
here is the pattern:
<(?<tag>\w+)([^>]*>[^<]*)(?<innerText>KeyWord)([^<]*</\k<tag>>)
and the replace: <$3$1<span class='highlight'>$4</span>$2
it actually works, but as i mentioned it does miss multiple occurrences
inside the same tag, and requires all the text to be within an open +
close html tag.
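
the missed occurrences can be reproduced with a rough Python port of the pattern (an approximation, not the original .NET code: Python writes named groups as (?P<name>...) and numbers all groups left to right, whereas .NET numbers the unnamed groups first, so the replacement indices differ). the sample string is made up.

```python
import re

# Port of the tag-pair pattern; group numbers here are
# 1 = tag, 2 = text before keyword, 3 = keyword, 4 = text after + closing tag.
tag_pair = re.compile(
    r"<(?P<tag>\w+)([^>]*>[^<]*)(?P<innerText>KeyWord)([^<]*</(?P=tag)>)"
)

def highlight_in_tag_pair(html):
    return tag_pair.sub(r"<\1\2<span class='highlight'>\3</span>\4", html)

# Two occurrences inside one element: the greedy [^<]* swallows the first one,
# and the whole element is consumed by a single match.
once = highlight_in_tag_pair("<p>KeyWord and KeyWord</p>")
print(once)
```

only the last occurrence in the element is wrapped, since one match covers the entire open/close pair and the replace never revisits it.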
i would be really grateful if anyone had a suggestion
thanks
tim