HTML filtering in weblog/BBS software


Alexey Verkhovsky

Hi all,

I am writing a BBS of sorts in Ruby (on Rails). I downloaded and
included RedCloth for template rendering (in 5 lines of code and 15
lines of test - wow!). It's cool, but it allows any HTML to be included.

Now, I don't want to let some kiddie include a <script> tag that
would make an innocent BBS thread pop up 50 new browser windows - no
matter how cool it might seem.

I wonder if there is any existing code to sanitize user input by
replacing dangerous HTML tags (like the aforementioned <script>)
that I could use with RedCloth to alleviate this risk.

Ditto for plain text inputs (user names, subjects and other such).

Alex
 

Austin Ziegler

Alexey Verkhovsky wrote:
> I am writing a BBS of sorts in Ruby (on Rails). I downloaded and
> included RedCloth for template rendering (in 5 lines of code and 15
> lines of test - wow!). It's cool, but it allows any HTML to be included.
>
> Now, I don't want to let some kiddie include a <script> tag that
> would make an innocent BBS thread pop up 50 new browser windows - no
> matter how cool it might seem.
>
> I wonder if there is any existing code to sanitize user input by
> replacing dangerous HTML tags (like the aforementioned <script>)
> that I could use with RedCloth to alleviate this risk.
>
> Ditto for plain text inputs (user names, subjects and other such).

There is some work that I'm doing with Ruwiki that is currently in CVS
that covers this -- it currently covers it too well, but it does cover
it. (I just fixed this.)

  # Find HTML tags
  SIMPLE_TAG_RE = %r{<[^<>]+?>}  # Ensure that only the tag is grabbed.
  HTML_TAG_RE   = %r{\A<         # Tag must be at start of match.
                     (/)?        # Closing tag?
                     ([\w:]+)    # Tag name
                     (?:\s+      # Space
                        ([^>]+)  # Attributes
                        (/)?     # Singleton tag?
                     )?          # The above three are optional
                     >}x         # End of tag
  ATTRIBUTES_RE = %r{([\w:]+)(=(?:\w+|"[^"]+?"|'[^']+?'))?}x
  ALLOWED_ATTR  = %w(style title type lang dir class id cite datetime abbr) +
                  %w(colspan rowspan compact start media)
  ALLOWED_HTML  = %w(abbr acronym address b big blockquote br caption cite) +
                  %w(code col colgroup dd del dfn dir div dl dt em h1 h2 h3) +
                  %w(h4 h5 h6 hr i ins kbd li menu ol p pre q s samp) +
                  %w(small span strike strong style sub sup table tbody) +
                  %w(td tfoot th thead tr tt u ul var)

  # Clean the content of unsupported HTML and attributes. This includes
  # XML namespaced HTML. Sorry, but there's too much possibility for
  # abuse.
  def clean(content)
    content = content.gsub(SIMPLE_TAG_RE) do |tag|
      tagset = HTML_TAG_RE.match(tag)

      if tagset.nil?
        # Not a recognizable tag: escape it.
        tag = Ruwiki.clean_entities(tag)
      else
        closer, name, attributes, single = tagset.captures

        if ALLOWED_HTML.include?(name.downcase)
          unless closer or attributes.nil?
            # Keep only the allowed attributes.
            attributes = attributes.scan(ATTRIBUTES_RE).map do |set|
              if ALLOWED_ATTR.include?(set[0].downcase)
                set.join
              else
                nil # Dropped by compact below.
              end
            end.compact.join(" ")
            tag = "<#{closer}#{name} #{attributes}#{single}>"
          else
            tag = "<#{closer}#{name}>"
          end
        else
          # Tag is not in the allowed list: escape it.
          tag = Ruwiki.clean_entities(tag)
        end
      end

      tag
    end
  end

Ruwiki.clean_entities converts every & to &amp;, every < to &lt;, and
every > to &gt;.
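
For readers who don't have Ruwiki at hand, here is a minimal sketch of what such an escaping helper might look like. It is not Ruwiki's actual implementation: the standalone method name clean_entities and the ENTITY_MAP constant are assumptions made for illustration. The same helper would also do for plain text inputs such as user names and subjects.

```ruby
# Hypothetical stand-in for Ruwiki.clean_entities; the real method may differ.
ENTITY_MAP = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;" }.freeze

def clean_entities(text)
  # One pass over the string, so the "&" of a freshly produced "&lt;"
  # is never escaped a second time.
  text.gsub(/[&<>]/, ENTITY_MAP)
end

puts clean_entities("<script>alert('x' & 'y')</script>")
# prints: &lt;script&gt;alert('x' &amp; 'y')&lt;/script&gt;
```

Doing all three replacements in a single gsub avoids the ordering problem of chained substitutions, where escaping & after < would turn &lt; into &amp;lt;.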

-austin
 

Florian Gross

Alexey said:
> Hi all,

Moin!

> I am writing a BBS of sorts in Ruby (on Rails). I downloaded and
> included RedCloth for template rendering (in 5 lines of code and 15
> lines of test - wow!). It's cool, but it allows any HTML to be included.

There are two options for disallowing user-specified HTML and style
sheets. (Even style sheets can contain JavaScript.) Just use RedCloth
like this:

  RedCloth.new("h1. A <b>bold</b> man", [:filter_html, :filter_styles]).to_html
  # => "<h1>A &lt;b&gt;bold&lt;/b&gt; man</h1>"

BlueCloth and RDoc have similar options AFAIK.

Regards,
Florian Gross
 

Mauricio Fernández

>> I am writing a BBS of sorts in Ruby (on Rails). I downloaded and
>> included RedCloth for template rendering (in 5 lines of code and 15
>> lines of test - wow!). It's cool, but it allows any HTML to be included.
>
> There are two options for disallowing user-specified HTML and style
> sheets. (Even style sheets can contain JavaScript.) Just use RedCloth
> like this:
>
>   RedCloth.new("h1. A <b>bold</b> man", [:filter_html, :filter_styles]).to_html
>   # => "<h1>A &lt;b&gt;bold&lt;/b&gt; man</h1>"
>
> BlueCloth and RDoc have similar options AFAIK.

IIRC RDoc doesn't allow raw HTML by design.
 
