Regular Expression help

Discussion in 'ASP General' started by Rob, Apr 26, 2007.

  1. Rob

    Rob Guest

    Hi,
    I need to convert our Word documents to HTML for our website. I've used
    MS Word's "Save as HTML" feature and ran "Microsoft Office HTML Filter
    2.0" to clean up the code, but I am stuck with a lot of additional code
    and I want to write a script that will do a custom cleanup.

    The Word document has a "Table of Contents" and when I convert, I get
    links at the top of my page that link to the appropriate section but I
    get code like this:

    <a name="_Toc54767572"></a><a name="_Toc58978952"></a><a
    name="_Toc58980987"></a><a
    name="_Toc58981749"></a><a name="_Toc90871301"></a><a
    name="_Toc93973545"></a><a
    name="_Toc126114863"></a>
    <a name="_Toc157391168">My Title</a>

    I get a whole bunch of empty anchor tags each with a different name and
    only the last anchor tag is correct. I would like to use regular
    expressions to remove all empty "a" tags.

    I know how to use regular expressions with ASP 3.0 but I don't know the
    pattern.

    Does anyone know the regex.pattern to replace all empty <a> tags with an
    empty string?

    Thanks
    Rob



     
    Rob, Apr 26, 2007
    #1

  2. "Rob" <> wrote in message
    news:...
    > Hi,
    > I need to convert our word documents to html for our website. I've used
    > MS Word's "Save as HTML" feature and ran "Microsoft Office HTML Filter
    > 2.0" to clean up the code but I am stuck with a lot of additional code
    > and I want to write a script that will do a custom cleanup.
    >
    > The Word document has a "Table of Contents" and when I convert, I get
    > links at the top of my page that link to the appropriate section but I
    > get code like this:
    >
    > <a name="_Toc54767572"></a><a name="_Toc58978952"></a><a
    > name="_Toc58980987"></a><a
    > name="_Toc58981749"></a><a name="_Toc90871301"></a><a
    > name="_Toc93973545"></a><a
    > name="_Toc126114863"></a>
    > <a name="_Toc157391168">My Title</a>
    >
    > I get a whole bunch of empty anchor tags each with a different name and
    > only the last anchor tag is correct. I would like to use regular
    > expressions to remove all empty "a" tags.
    >


    Rob, I think something similar to this should work:

    Set RegularExpressionObject = New RegExp

    With RegularExpressionObject
    .Pattern = "\<a(.|\n)*\>\<\/a\>"
    .IgnoreCase = True
    .Global = True
    End With

    ReplacedText = RegularExpressionObject.Replace(InitialText, "")
     
    Alexey Smirnov, Apr 27, 2007
    #2

  3. Evertjan.

    Evertjan. Guest

    Alexey Smirnov wrote on 27 apr 2007 in
    microsoft.public.inetserver.asp.general:

    >
    > "Rob" <> wrote in message
    > news:...

    [..]
    >>
    >> I get a whole bunch of empty anchor tags each with a different name
    >> and only the last anchor tag is correct. I would like to use regular
    >> expressions to remove all empty "a" tags.
    >>

    >
    > Rob, I think something similar to
    >
    > Set RegularExpressionObject = New RegExp
    >
    > With RegularExpressionObject
    > .Pattern = "\<a(.|\n)*\>\<\/a\>"
    > .IgnoreCase = True
    > .Global = True
    > End With
    >
    > ReplacedText = RegularExpressionObject.Replace(InitialText, "")


    .Pattern = "<a[^>]*>\s*<\/a>"

    will do.

    =================

    However, why not [yes, I know it is personal preference] use a bit of
    JScript, even if you use VBScript in ASP:


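    <% ' VBScript: "\n" is not an escape here, so vbLf supplies the line feeds
    Dim t, result
    t = "x<a " & vbLf & "href='bbb'> " & vbLf & " </a>" & vbLf & vbLf & _
        "<a href='bbb'> x </a>"
    result = deleteEmptyAnchors(t)
    %>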


    <script language='jscript' runat='server'>
    function deleteEmptyAnchors(t){
    return t.replace(/<a[^>]*>\s*<\/a>/gi,'');
    };
    </script>
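
For a quick check outside ASP, the same pattern can be tried against markup like the sample in the first post. This is only a sketch in plain JavaScript (the function body is the one shown above); the `sample` string is a shortened, hypothetical stand-in for the Word output, not the real document.

```javascript
// Same pattern as above: an <a ...> opening tag, optional whitespace, then </a>.
function deleteEmptyAnchors(t) {
  return t.replace(/<a[^>]*>\s*<\/a>/gi, '');
}

// Hypothetical sample modeled on the first post, including a tag
// that is split across two lines.
var sample =
  '<a name="_Toc54767572"></a><a\nname="_Toc58980987"></a>\n' +
  '<a name="_Toc157391168">My Title</a>';

// Both empty anchors are removed; the anchor wrapping "My Title" is kept.
var cleaned = deleteEmptyAnchors(sample);
```

Because `[^>]*` is a negated character class, it also matches line breaks, which is why anchors that Word splits across lines are still caught.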


    --
    Evertjan.
    The Netherlands.
    (Please change the x'es to dots in my emailaddress)
     
    Evertjan., Apr 27, 2007
    #3
  4. Rob

    Rob Guest

    Thanks Evertjan

    I tried the other example "\<a(.|\n)*\>\<\/a\>" but my page was taking
    too long to process it. Then I tried your example "<a[^>]*>\s*<\/a>" and
    it works great.

    Thanks again.

    Rob
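
A likely reason the first pattern misbehaves: `(.|\n)*` is greedy and can run across tag boundaries, so a single match may stretch from the first `<a` to the last `></a>` in the page, and the engine has to backtrack through the whole document to find it. The sketch below compares the two patterns in plain JavaScript; the `html` string is a made-up example, and the timing problem itself is an assumption that is not measured here.

```javascript
// Greedy pattern from post #2, written as a JavaScript regex literal.
var greedy = /<a(.|\n)*><\/a>/gi;
// Tighter pattern from post #3: [^>]* cannot run past the end of the opening tag.
var tight = /<a[^>]*>\s*<\/a>/gi;

// Hypothetical sample: two empty anchors around one real anchor.
var html = '<a name="a1"></a><a name="a2">My Title</a><a name="a3"></a>';

// The greedy pattern matches from the first "<a" to the last "></a>",
// so the real anchor and its text are deleted as well.
var greedyResult = html.replace(greedy, '');

// The tight pattern removes only the two empty anchors.
var tightResult = html.replace(tight, '');
```

So besides being slow on large pages, the first pattern can also delete content that should be kept.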



     
    Rob, Apr 27, 2007
    #4
