Search and Replace while ignoring HTML formatting?


Josiwe

I have a search program that returns an HTML string which I display to
the user. I want to highlight the search terms. However, a simple
search and replace on the HTML causes problems.

If the user searches on Georgia and I get back this:
<div style="font-name:Arial">Georgia, Alabama, and Louisiana</div>

It works fine:
<div style="font-name:Arial"><span style="background-color:yellow;">Georgia</span>, Alabama, and Louisiana</div>

However if the HTML that comes back is this:
<div style="font-name:Georgia">Georgia, Alabama, and Louisiana</div>

I get a serious problem which breaks the formatting and looks
terrible:
<div style="font-name:<span style="background-color:yellow;">Georgia</span>"><span style="background-color:yellow;">Georgia</span>, Alabama, and Louisiana</div>

The HTML I'm getting back is quite complex, with nested spans, style
tags, etc. I'm stuck on how to solve this problem. Is there a
regular expression I can use to match only the chunks of non-formatting
text, so the replacement never touches tags or attributes? I have neither
the time nor the resources to write a full-blown HTML tokenizer.
 

Brandon Gano

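You will probably want to use XHTML instead of HTML and let an XML parser
do the work. Attribute values such as style="font-name:Georgia" are not
text nodes, so if you loop through the text nodes only and apply the
search/replace there, the markup itself is never touched.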
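
A minimal sketch of that idea in Python, using the standard library's xml.dom.minidom. It assumes the markup is well-formed XHTML with a single root element; the function name highlight, the case-insensitive matching, and the yellow span style are illustrative choices, not something the original poster's program is known to use.

```python
import re
from xml.dom import minidom


def highlight(xhtml: str, term: str) -> str:
    """Wrap each occurrence of `term` found in text nodes in a highlight span.

    Assumes `xhtml` is well-formed XML with one root element (hypothetical input).
    """
    if not term:
        return xhtml  # nothing to search for

    doc = minidom.parseString(xhtml)
    # Escape the term so it is matched literally; ignoring case is an assumption.
    pattern = re.compile(re.escape(term), re.IGNORECASE)

    def wrap_matches(text_node):
        parent = text_node.parentNode
        text = text_node.data
        pos = 0
        for m in pattern.finditer(text):
            # Re-emit the plain text that precedes this match.
            if m.start() > pos:
                parent.insertBefore(doc.createTextNode(text[pos:m.start()]), text_node)
            # Emit the match wrapped in a highlighting span.
            span = doc.createElement("span")
            span.setAttribute("style", "background-color:yellow;")
            span.appendChild(doc.createTextNode(m.group(0)))
            parent.insertBefore(span, text_node)
            pos = m.end()
        if pos == 0:
            return  # no match: leave the node as it is
        # Re-emit any trailing text, then drop the original node.
        if pos < len(text):
            parent.insertBefore(doc.createTextNode(text[pos:]), text_node)
        parent.removeChild(text_node)

    def walk(node):
        # Iterate over a copy because wrap_matches edits the child list.
        for child in list(node.childNodes):
            if child.nodeType == child.TEXT_NODE:
                wrap_matches(child)
            elif child.nodeType == child.ELEMENT_NODE:
                walk(child)

    walk(doc.documentElement)
    return doc.documentElement.toxml()


if __name__ == "__main__":
    html = '<div style="font-name:Georgia">Georgia, Alabama, and Louisiana</div>'
    print(highlight(html, "Georgia"))
```

Only text nodes are rewritten, so the "Georgia" inside the style attribute is left alone. Text inside script or style elements would still be treated as ordinary text by this sketch, so those elements would need to be skipped in walk if they can appear in the results.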
 
