Discussion in 'ASP General' started by DrewM, Oct 13, 2003.

    I'm attempting to clean up HTML in a database by quoting all unquoted
    attribute values.

    So far, I have this:

    oRegExp.Pattern = "<([^>]+)=([^>""]+)>"
    sHtml = oRegExp.Replace(sHtml, "<$1=""$2"">")

    which works for a tag with a single attribute:
    <p class=foo> becomes <p class="foo">

    Now I'm trying to deal with multiple attributes and am getting myself
    into a pickle converting:

    <p class=foo name=bar> into <p class="foo" name="bar">
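For comparison, here is the first pattern translated into JavaScript. In a VBScript string literal `""` is a single double quote, so `"<([^>]+)=([^>""]+)>"` becomes the regex literal below. The `g` flag assumes `Global = True` was set on the VBScript object (the post doesn't show it), and `quoteSingle` is a hypothetical name. Running it on both inputs shows why it can't handle several attributes: the greedy `[^>]+` runs up to the last `=` in the tag, so only the final value gets quoted.

```javascript
// Translation of the VBScript pattern: <([^>]+)=([^>""]+)>
var singlePattern = /<([^>]+)=([^>"]+)>/g;

// Hypothetical helper: quote the value after the LAST '=' in each tag.
function quoteSingle(html) {
  return html.replace(singlePattern, '<$1="$2">');
}

console.log(quoteSingle("<p class=foo>"));          // <p class="foo">
console.log(quoteSingle("<p class=foo name=bar>")); // <p class=foo name="bar">
```

An already-quoted value such as `<p class="foo">` is left alone, because `[^>"]+` cannot start on a quote character.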

    The best I've come up with so far is:

    oRegExp.Pattern = "<(\w*\s)(([^=>]+=)([^>""\s]+))+>"
    sHtml = oRegExp.Replace(sHtml, "<$1 $3""$4"">")

    which obviously isn't going to work! :)
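The reason this attempt fails can be seen by running the same pattern in JavaScript: a capturing group inside a repeated group (`(...)+`) holds only what it captured on the last repetition, so `$3` and `$4` only ever see the final attribute. This is a sketch of the diagnosis, translated from the VBScript above; the variable names are illustrative.

```javascript
// The second attempt, translated from VBScript ("" -> ").
var repeatedPattern = /<(\w*\s)(([^=>]+=)([^>"\s]+))+>/;

var result = "<p class=foo name=bar>".replace(repeatedPattern, '<$1 $3"$4">');
console.log(result); // <p   name="bar">   (class=foo has been dropped)

// exec() exposes the captures: groups 3 and 4 come from the last repetition only.
var m = repeatedPattern.exec("<p class=foo name=bar>");
console.log(m[3]); // " name=" with a leading space
console.log(m[4]); // bar
```

Because one regex match can only yield one set of captures, no single replacement string can rebuild every pair; each attribute needs its own match.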

    How can I match multiple unquoted attributes and replace them with quotes?


    You are going to have to do a two-pass capture. First capture each tag
    (<something>), then capture the attribute/value pairs in each tag and
    wrap the unquoted values in quotes. When regular expression tasks reach
    this level of complexity, I like to drop into JScript, as its native
    regex support is more capable; for example, its replace method accepts a
    function that computes each replacement. Here's an example:

    <script language="JavaScript" runat="SERVER">
    var s = "<p BadAttribute=unquoted GoodAttribute='<hello>'>Here is some text</p><p BadAttribute=NoQuotes>Here's another paragraph</p>";

    Chris Hohmann, Oct 13, 2003
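The two-pass approach described in the reply can be sketched in JavaScript as below. This is a minimal sketch, not the reply's original code: `quoteAttributes` and both patterns are assumptions. Pass 1 matches each opening tag, consuming quoted values as whole units so a `>` inside `'<hello>'` doesn't end the tag early. Pass 2 uses a replacement function (supported in JScript 5.5 and later) to quote each unquoted value while leaving quoted ones untouched.

```javascript
// Hypothetical helper: wrap every unquoted attribute value in double quotes.
function quoteAttributes(html) {
  // Pass 1: an opening tag. "..." and '...' runs are consumed whole, so a '>'
  // inside a quoted value does not end the match. Closing tags (</p>) are skipped.
  var tagPattern = /<[A-Za-z][^\s>\/]*(?:"[^"]*"|'[^']*'|[^'">])*>/g;

  // Pass 2: ' name=' followed by a quoted value or an unquoted one. Quoted
  // values are matched too, so their contents are never rescanned.
  var attrPattern = /(\s[^\s=>"'\/]+\s*=\s*)("[^"]*"|'[^']*'|[^\s"'>]+)/g;

  return html.replace(tagPattern, function (tag) {
    return tag.replace(attrPattern, function (pair, nameAndEquals, value) {
      var first = value.charAt(0);
      if (first === '"' || first === "'") {
        return pair; // already quoted: leave as-is
      }
      return nameAndEquals + '"' + value + '"';
    });
  });
}

var s = "<p BadAttribute=unquoted GoodAttribute='<hello>'>Here is some text</p>" +
        "<p BadAttribute=NoQuotes>Here's another paragraph</p>";
console.log(quoteAttributes(s));
console.log(quoteAttributes("<p class=foo name=bar>")); // <p class="foo" name="bar">
```

Because these are plain regular expressions rather than an HTML parser, odd markup can still trip them up; for instance, an unquoted value written directly before `/>` will absorb the slash.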
