Search and Replace while ignoring HTML formatting?

Discussion in 'ASP .Net' started by Josiwe, Jul 24, 2007.

  1. Josiwe

    Josiwe Guest

    I have a search program that returns an HTML string which I display to
    the user. I want to highlight the search terms. However a simple
    search and replace on the HTML causes problems.

    If the user searches on Georgia and I get back this:
    <div style="font-name:Arial">Georgia, Alabama, and Louisiana</div>

    It works fine:
    <div style="font-name:Arial"><span style="background-color:yellow;">Georgia</span>, Alabama, and Louisiana</div>

    However if the HTML that comes back is this:
    <div style="font-name:Georgia">Georgia, Alabama, and Louisiana</div>

    I get a serious problem: the match inside the style attribute is also
    replaced, which breaks the formatting and looks like this:
    <div style="font-name:<span style="background-color:yellow;">Georgia</span>"><span style="background-color:yellow;">Georgia</span>, Alabama, and Louisiana</div>

    The HTML I'm getting back is quite complex, with nested spans, style
    tags, etc. I'm stuck on how to solve this problem. Is there a
    regular expression I can use to match chunks of non-formatting text to
    replace? I have neither the time nor the resources to write a
    full-blown HTML tokenizer.
    Josiwe, Jul 24, 2007

  2. Brandon Gano

    Brandon Gano Guest

    You will probably want to use XHTML instead of HTML and use an XML parser to
    do the work. You should be able to loop through text nodes and apply the
    search/replace there.
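
    As a rough illustration of the text-node idea, here is a minimal sketch
    in Python using the standard xml.dom.minidom parser. It assumes the
    markup is well-formed XHTML with a single root element; the function
    name highlight and the case-sensitive matching are choices made for
    this example, not part of the original question. The same approach
    should carry over to any DOM-style XML parser.

```python
import re
from xml.dom import minidom


def highlight(xhtml, term):
    """Wrap each occurrence of `term` found in text nodes in a yellow span.

    Attribute values (such as style="font-name:Georgia") are never touched,
    because only text nodes are searched.
    """
    doc = minidom.parseString(xhtml)
    pattern = re.compile(re.escape(term))  # literal, case-sensitive match

    def walk(node):
        # Yield every text node below `node`, depth first.
        for child in list(node.childNodes):
            if child.nodeType == child.TEXT_NODE:
                yield child
            else:
                yield from walk(child)

    # Collect the text nodes before modifying the tree, so the spans
    # inserted below are not visited again.
    for text_node in list(walk(doc)):
        parts = pattern.split(text_node.data)
        if len(parts) == 1:
            continue  # no match in this text node
        parent = text_node.parentNode
        for i, part in enumerate(parts):
            if i > 0:
                # One match sits between each pair of adjacent parts.
                span = doc.createElement("span")
                span.setAttribute("style", "background-color:yellow;")
                span.appendChild(doc.createTextNode(term))
                parent.insertBefore(span, text_node)
            if part:
                parent.insertBefore(doc.createTextNode(part), text_node)
        parent.removeChild(text_node)

    return doc.documentElement.toxml()


if __name__ == "__main__":
    sample = '<div style="font-name:Georgia">Georgia, Alabama, and Louisiana</div>'
    print(highlight(sample, "Georgia"))
```

    With the second example from the question, only the "Georgia" in the
    visible text is wrapped; the one inside the style attribute is left
    alone. Real-world HTML that is not well-formed would fail to parse
    here, so it would need to be cleaned up or converted to XHTML first.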
    Brandon Gano, Jul 24, 2007