About a class I wrote to filter bad html input

Owen Wong

Please take a look at a class I have just written. It is meant to
filter suspicious HTML input coming from an online HTML editor. I need
help with two things:
1. Does it need to filter more things? I am sure it does, but I don't
know what else to add.
2. As you can see, I try to filter every link. If the target address
does not start with "http://" or "mailto:", the address is replaced
with an empty string. I think the code I wrote could be rewritten to
perform better, but how?
=========================
Public Class strOp
    Public Function filterHtml(ByVal s As String) As String
        s = Regex.Replace(s, "<script>|</script>|<iframe.*?>|<!--#include.*?>", "", RegexOptions.IgnoreCase)
        s = Regex.Replace(s, "<.*? (?:onload|onclick|ondblclick)[ ]?=[ ]?.*?>", "", RegexOptions.IgnoreCase)
        ' Matches an opening <a> tag and captures the href target in group 1.
        Dim re As New Regex("<a .*?href\s*=\s*[""]?([^"" ]*)[""]?.*?>", RegexOptions.IgnoreCase Or RegexOptions.Singleline)
        Dim m As Match
        Dim s1, s2 As String
        Dim ms As MatchCollection
        ms = re.Matches(s)
        For Each m In ms
            ' Keep the tag in its original case so that s.Replace can find it.
            s1 = m.Value
            ' Lower-case only the extracted target for the prefix check.
            s2 = re.Replace(s1, "$1").ToLower()
            If Not (s2.StartsWith("mailto:") Or s2.StartsWith("http://")) Then
                s = s.Replace(s1, "<a href=''>")
            End If
        Next
        Return s
    End Function
End Class
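On the second point, one possible direction is a single pass with a MatchEvaluator, so that each anchor tag is examined once and the whole string is not rescanned with s.Replace for every match. The following is only a sketch under assumptions: the module name LinkFilterSketch and the functions IsAllowedHref, ReplaceAnchor and FilterLinks are hypothetical names, and the pattern is a tightened variant of the one above rather than the original.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module LinkFilterSketch
    ' Built once and reused. Matches an opening <a ...> tag and captures
    ' the href target in group 1 (assumed pattern, not the original one).
    Private ReadOnly AnchorRe As New Regex( _
        "<a\s[^>]*?href\s*=\s*[""']?([^""'\s>]*)[""']?[^>]*>", _
        RegexOptions.IgnoreCase Or RegexOptions.Compiled)

    ' Hypothetical helper: True when the target starts with an allowed scheme.
    Private Function IsAllowedHref(ByVal href As String) As Boolean
        Return href.StartsWith("http://", StringComparison.OrdinalIgnoreCase) OrElse _
               href.StartsWith("mailto:", StringComparison.OrdinalIgnoreCase)
    End Function

    ' Called once per matched tag: keeps allowed tags unchanged and
    ' swaps every other tag for an anchor with an empty href.
    Private Function ReplaceAnchor(ByVal m As Match) As String
        If IsAllowedHref(m.Groups(1).Value) Then Return m.Value
        Return "<a href=''>"
    End Function

    Public Function FilterLinks(ByVal s As String) As String
        Return AnchorRe.Replace(s, AddressOf ReplaceAnchor)
    End Function

    Sub Main()
        Console.WriteLine(FilterLinks( _
            "<A HREF=""javascript:alert(1)"">x</A> <a href=""http://example.com/"">ok</a>"))
    End Sub
End Module
```

With this input the first tag should come back as <a href=''> and the second should be left as it is. Reading the target from m.Groups(1) also avoids running the regex a second time on each matched tag.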
 
Owen Wong

A small update:
I changed the 4th line from:
----------------
s = Regex.Replace(s, "<.*? (?:onload|onclick|ondblclick)[ ]?=[ ]?.*?>", "", RegexOptions.IgnoreCase)
---------------------------
to:
================
s = Regex.Replace(s, "<.*?\s*(?:on)[a-z]*\s*=\s*.*?>", "", RegexOptions.IgnoreCase)
=============
so that it matches all DHTML event attributes.
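
One thing to watch with the updated pattern: \s* allows zero whitespace before "on", so an ordinary attribute such as action= also appears to match, and <.*? can run across several tags. A hedged sketch of a stricter variant follows. The module name EventAttributeSketch and the function StripEventTags are hypothetical, and the pattern assumes attribute values do not contain a ">" character.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module EventAttributeSketch
    ' \s before "on" requires a whitespace character in front of the
    ' attribute name, so "action=" is not treated as an event handler.
    ' [^>]* keeps the match inside a single tag.
    Private ReadOnly EventTagRe As New Regex( _
        "<[^>]*\son[a-z]+\s*=[^>]*>", _
        RegexOptions.IgnoreCase Or RegexOptions.Compiled)

    ' Removes every tag that carries an on... attribute, as the original
    ' code does; tags without such an attribute are left untouched.
    Public Function StripEventTags(ByVal s As String) As String
        Return EventTagRe.Replace(s, "")
    End Function

    Sub Main()
        Console.WriteLine(StripEventTags( _
            "<form action=""post.aspx""><body onload=""x()"">"))
    End Sub
End Module
```

Here the form tag should survive and the body tag with onload should be removed.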
 
