Regexp: Matching unquoted attributes

Discussion in 'ASP General' started by DrewM, Oct 13, 2003.

  1. DrewM

    I'm attempting to clean up HTML in a database by quoting all unquoted
    attributes.

    So far, I have this:

    oRegExp.Pattern = "<([^>]+)=([^>""]+)>"
    sHtml = oRegExp.Replace(sHtml, "<$1=""$2"">")

    which I can use to replace single attributes:
    <p class=foo> becomes <p class="foo">

    Now I'm trying to deal with multiple attributes and am getting myself
    into a pickle converting:

    <p class=foo name=bar> into <p class="foo" name="bar">

    The best I've come up with so far is:

    oRegExp.Pattern = "<(\w*\s)(([^=>]+=)([^>""\s]+))+>"
    sHtml = oRegExp.Replace(sHtml, "<$1 $3""$4"">")

    which obviously isn't going to work! :)

    How can I match multiple unquoted attributes and replace them with quotes?

    Thanks

    Drew
     
    DrewM, Oct 13, 2003
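
For anyone wondering why the second pattern above cannot work: when a capture group sits inside a repeated group, it keeps only the text from its final iteration, so $3 and $4 only ever refer to the last attribute in the tag. A quick sketch in JavaScript, assuming the VBScript pattern string is transcribed faithfully into a regex literal (the variable names are only for illustration):

```javascript
// Drew's second pattern, transcribed from the VBScript string into a JavaScript literal.
var pattern = /<(\w*\s)(([^=>]+=)([^>"\s]+))+>/;
var input = "<p class=foo name=bar>";

// $3 and $4 are inside the repeated group, so each holds only its last iteration.
var result = input.replace(pattern, '<$1 $3"$4">');

console.log(result); // class=foo is gone; only name="bar" is left
```

The earlier attributes are matched but never reach the replacement, which is why a single Replace call with numbered groups is not enough here.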

  2. You are going to need a two-pass approach. First capture each tag
    (<something>), then capture the attribute/value pairs within that tag
    and quote the unquoted values. When a regular expression task reaches
    this level of complexity, I like to drop into JScript, as its native
    support for regular expressions is more robust. Here's an example:

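    <script language="JavaScript" runat="SERVER">
    var s = "<p BadAttribute=unquoted GoodAttribute='<hello>'>Here is some text</p>" +
        "<p BadAttribute=NoQuotes>Here's another paragraph</p>";
    // Pass 1: match one tag at a time, stepping over quoted values so that
    // a ">" inside quotes does not end the tag early.
    // Pass 2: inside that tag, wrap each unquoted name=value in double quotes.
    Response.Write(s.replace(/<(?:[^>"']|"[^"]*"|'[^']*')*>/g, function (m) {
        return m.replace(/(\w+=)(\w+)/g, "$1\"$2\"");
    }));
    </script>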

    HTH
    -Chris Hohmann
     
    Chris Hohmann, Oct 13, 2003
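
A slightly more defensive variant of the two-pass idea above, offered as a sketch rather than a definitive fix. Assumptions, none of which come from the thread: the helper name quoteUnquotedAttributes is made up, unquoted values may contain characters other than \w (for example href=/x/y.html), and already-quoted values should be left alone even when they contain "=". It is still a regex over HTML, so comments, script blocks and malformed markup can fool it.

```javascript
// Hypothetical helper; the name and both patterns are illustrative assumptions.
function quoteUnquotedAttributes(html) {
  // Pass 1: an opening tag. Quoted values are consumed whole, so a ">"
  // inside quotes does not terminate the tag.
  var tagPattern = /<[A-Za-z][^\s>\/]*(?:[^>"']|"[^"]*"|'[^']*')*>/g;

  // Pass 2: name=value pairs inside one tag. The first two alternatives
  // consume already-quoted values; the third captures an unquoted value.
  var attrPattern = /([\w:-]+)\s*=\s*(?:"[^"]*"|'[^']*'|([^\s"'>]+))/g;

  return html.replace(tagPattern, function (tag) {
    return tag.replace(attrPattern, function (pair, name, bare) {
      // `bare` is empty or undefined when the value was already quoted.
      if (!bare) {
        return pair;
      }
      return name + '="' + bare + '"';
    });
  });
}

var cleaned = quoteUnquotedAttributes("<p class=foo name=bar>");
console.log(cleaned);
```

In classic ASP the same function could sit in a server-side JScript block and be applied to each HTML string read from the database before it is written back. Because only opening tags are matched in the first pass, text such as a=b between tags is left untouched.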