Regular Expression help

Rob

Hi,
I need to convert our Word documents to HTML for our website. I've used
MS Word's "Save as HTML" feature and run "Microsoft Office HTML Filter
2.0" to clean up the code, but I am still stuck with a lot of extra markup
and I want to write a script that will do a custom cleanup.

The Word document has a "Table of Contents", and when I convert it I get
links at the top of my page that point to the appropriate sections, but
the section headings come out with code like this:

<a name="_Toc54767572"></a><a name="_Toc58978952"></a><a
name="_Toc58980987"></a><a
name="_Toc58981749"></a><a name="_Toc90871301"></a><a
name="_Toc93973545"></a><a
name="_Toc126114863"></a>
<a name="_Toc157391168">My Title</a>

I get a whole bunch of empty anchor tags each with a different name and
only the last anchor tag is correct. I would like to use regular
expressions to remove all empty "a" tags.

I know how to use regular expressions with ASP 3.0 but I don't know the
pattern.

Does anyone know the regex.pattern to replace all empty <a> tags with an
empty string?

Thanks
Rob
 
Guest

Rob, I think something similar to

Set RegularExpressionObject = New RegExp

With RegularExpressionObject
.Pattern = "\<a(.|\n)*\>\<\/a\>"
.IgnoreCase = True
.Global = True
End With

ReplacedText = RegularExpressionObject.Replace(InitialText, "")
 
Evertjan.

Anon User wrote on 27 apr 2007 in
microsoft.public.inetserver.asp.general:
Rob, I think something similar to

Set RegularExpressionObject = New RegExp

With RegularExpressionObject
.Pattern = "\<a(.|\n)*\>\<\/a\>"
.IgnoreCase = True
.Global = True
End With

ReplacedText = RegularExpressionObject.Replace(InitialText, "")

.Pattern = "<a[^>]*>\s*<\/a>"

will do.
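
As a rough illustration of what that pattern does, here is a minimal sketch in plain JavaScript, run against a shortened copy of the sample from the question. The names removeEmptyAnchors and sample are made up for this example and are not from the thread; the sketch assumes the empty anchors contain nothing but optional whitespace.

```javascript
// Sketch only: removeEmptyAnchors and sample are illustrative names.
function removeEmptyAnchors(html) {
  // <a[^>]*>  an opening <a ...> tag; [^>] also matches line breaks inside the tag
  // \s*       optional whitespace (including line breaks) between the two tags
  // <\/a>     the closing tag
  return html.replace(/<a[^>]*>\s*<\/a>/gi, '');
}

// Shortened version of the markup posted in the question,
// including an opening tag that is split across two lines.
const sample =
  '<a name="_Toc54767572"></a><a name="_Toc58978952"></a><a\n' +
  'name="_Toc58980987"></a>\n' +
  '<a name="_Toc157391168">My Title</a>';

const cleaned = removeEmptyAnchors(sample);
console.log(cleaned.trim()); // <a name="_Toc157391168">My Title</a>
```

Because [^>]* can never run past the end of the opening tag, each match stays inside a single anchor, so the anchor that wraps "My Title" is left alone.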

=================

However, why not [yes, I know it is personal preference] use a bit of
JScript, even if you use VBScript in ASP:


<% ' vbs
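dim t,result
' VBScript has no "\n" escape, so build the line breaks with vbLf instead
t = "x<a " & vbLf & "href='bbb'> " & vbLf & " </a>" & vbLf & vbLf & "<a href='bbb'> x </a>"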
result = deleteEmptyAnchors(t)
%>


<script language='jscript' runat='server'>
function deleteEmptyAnchors(t){
return t.replace(/<a[^>]*>\s*<\/a>/gi,'');
};
</script>
 
Rob

Thanks Evertjan

I tried the other example "\<a(.|\n)*\>\<\/a\>" but my page was taking
too long to process it. Then I tried your example "<a[^>]*>\s*<\/a>" and
it works great.

Thanks again.

Rob
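
For anyone comparing the two patterns in this thread, the following is a small sketch in plain JavaScript on a made-up input (the input string and variable names are assumptions for illustration only). It does not measure speed; it only shows that the first pattern is greedy and matches from the first "<a" to the last "></a>" in the text, which is probably also why it has so much more work to do on a large page.

```javascript
// Sketch only: input and variable names are invented for this comparison.
const greedy = /<a(.|\n)*><\/a>/gi;    // first suggestion, redundant backslashes dropped
const negated = /<a[^>]*>\s*<\/a>/gi;  // the pattern that worked for Rob

const input =
  '<a name="x"></a><a name="y">My Title</a><p>text</p><a name="z"></a>';

// (.|\n)* grabs as much as it can, so one match runs from the first "<a"
// to the last "></a>" and swallows the titled anchor and the paragraph too.
const greedyResult = input.replace(greedy, '');

// [^>]* stops at the end of each opening tag, so only empty anchors match.
const negatedResult = input.replace(negated, '');

console.log(JSON.stringify(greedyResult));  // ""
console.log(JSON.stringify(negatedResult)); // "<a name=\"y\">My Title</a><p>text</p>"
```

So besides being slow, the first pattern would have deleted real content between the empty anchors.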
 
