security vs. XML-based formats

Ivan Shmakov

It seems that one reason people implement various (for lack of
a better word) “Wiki-like” formats is the presence of “security”
issues with some XML-based “document” formats.

In short, the question is: are those issues (at least the ones
related to XHTML) documented somewhere?

Here, by “security” issues I mean, mostly, the ability to trigger
fairly arbitrary code execution on the “client's” host. (Note
that even if confined to some language's interpreter, such as
the JavaScript one, the program will still, most of the time, be
able to eat CPU resources in a hard-to-control fashion.)

* The purpose

The overall purpose is to establish some kind of a collaborative
environment (say, a Wiki) for me and my occasional collaborators
to use. So far, I've set up an Ikiwiki instance as the base for
such an environment (mostly because of its support for Git [1],
which I've found convenient to use, as the storage backend.)

However, Ikiwiki is somewhat restrictive about the features that
can be used on the pages. E. g.:

--cut: http://ikiwiki.info/plugins/htmlscrubber/ --
The web's security model is fundamentally broken; ikiwiki's html
sanitisation is only a patch on the underlying gaping hole that is
your web browser.
--cut: http://ikiwiki.info/plugins/htmlscrubber/ --

The list of possibly dangerous tags in the default Ikiwiki
configuration apparently follows the considerations below.

--cut: http://feedparser.org/docs/html-sanitization.html --
Here is an incomplete list of potentially dangerous HTML tags and
attributes:

• script, which can contain malicious script

• applet, embed, and object, which can automatically download and
execute malicious code

• meta, which can contain malicious redirects

• onload, onunload, and all other on* attributes, which can contain
malicious script

• style, link, and the style attribute, which can contain malicious
script

style? Yes, style. CSS definitions can contain executable code.
--cut: http://feedparser.org/docs/html-sanitization.html --
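The feedparser list above amounts to a blocklist. Below is a minimal sketch of such a filter built on Python's standard `html.parser`. The class and function names are hypothetical, and the sketch covers only the listed items. As the quote itself warns, the list is incomplete: for instance, `javascript:` URLs in `href` pass through untouched.

```python
from html import escape
from html.parser import HTMLParser

# Elements from the list above that are dropped together with their content.
DROP_WITH_CONTENT = {"script", "style", "applet", "object"}
# Void elements from the list above: only the tag itself is dropped.
DROP_TAG_ONLY = {"embed", "meta", "link"}


class BlocklistSanitizer(HTMLParser):
    """Re-serialises HTML while omitting the blocklisted tags and attributes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a DROP_WITH_CONTENT element

    def _attrs(self, attrs):
        kept = []
        for name, value in attrs:
            # on* event handlers and the style attribute can carry script.
            if name.startswith("on") or name == "style":
                continue
            if value is None:
                kept.append(f" {name}")
            else:
                kept.append(f' {name}="{escape(value, quote=True)}"')
        return "".join(kept)

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth += 1
        elif self.skip_depth == 0 and tag not in DROP_TAG_ONLY:
            self.out.append(f"<{tag}{self._attrs(attrs)}>")

    def handle_startendtag(self, tag, attrs):
        # Self-closing form, e.g. <br />: never opens a skipped region.
        if self.skip_depth == 0 and tag not in DROP_WITH_CONTENT | DROP_TAG_ONLY:
            self.out.append(f"<{tag}{self._attrs(attrs)} />")

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0 and tag not in DROP_TAG_ONLY:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.out.append(escape(data, quote=False))

    # Comments, doctypes and processing instructions have no handler
    # here, so they are silently dropped.


def sanitize(html_text):
    parser = BlocklistSanitizer()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

For example, `sanitize('<p onclick="x()">Hi<script>alert(1)</script></p>')` should yield `<p>Hi</p>`.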

While I understand that such issues could be dealt with on the
“client's” side (I'd suggest using Lynx [2] or NoScript [3] for
that matter), it still makes me wonder: is there a smarter way
to manage things than to entirely disallow a number of HTML
(XHTML) features?
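One middle ground is to keep the `style` attribute and filter its declarations against an allowlist, instead of dropping the attribute outright. The sketch below assumes that a small set of presentational properties is enough. The property list and the value pattern are illustrative choices and do not come from Ikiwiki or feedparser. The filter rejects `expression(...)` because old Internet Explorer versions evaluated it as script.

```python
import re

# Hypothetical allowlist of purely presentational CSS properties.
ALLOWED_PROPERTIES = {"color", "background-color", "font-weight",
                      "font-style", "text-align", "text-decoration"}
# Values may only use keywords, numbers/units, #hex and rgb(...):
# no quotes, colons, slashes or backslash escapes.
SAFE_VALUE = re.compile(r"^[a-zA-Z0-9#%.,\s()-]*$")
# Functions that can run script (old IE) or fetch remote resources.
DANGEROUS = re.compile(r"expression|url\s*\(", re.IGNORECASE)


def filter_style(style):
    """Keep only allowlisted declarations from a style attribute value."""
    kept = []
    for declaration in style.split(";"):
        prop, sep, value = declaration.partition(":")
        prop, value = prop.strip().lower(), value.strip()
        if not sep or prop not in ALLOWED_PROPERTIES:
            continue
        if SAFE_VALUE.match(value) and not DANGEROUS.search(value):
            kept.append(f"{prop}: {value}")
    return "; ".join(kept)
```

For example, `filter_style("color: red; width: expression(alert(1))")` keeps only `color: red`.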

* References

[1] http://git-scm.com/
[2] http://lynx.isc.org/
[3] http://noscript.net/
 
Joe Kesselman

Ivan said:
> It seems that one reason people implement various (for lack of
> a better word) “Wiki-like” formats is the presence of “security”
> issues with some XML-based “document” formats.

XML is a shared syntax. It's up to each XML-based language, and the
tools which implement it, to apply whatever security is appropriate.

That said, the claims of security issues in XML appear to be badly
overstated. Yes, some specific tools have specific problems. Other tools
which implement the same standards may not share those problems. Quality
of implementation, rather than XML, appears to be the issue.

> • script, which can contain malicious script

Not an issue if the script is appropriately signed/sandboxed.

> • applet, embed, and object, which can automatically download and
> execute malicious code

Not an issue if the (non-XML) code does not have bugs.

> • meta, which can contain malicious redirects

Browsers can flag this for users. You mentioned the NoScript plug-in for
Firefox.

> • onload, onunload, and all other on* attributes, which can contain
> malicious script

See above re scripts.

> • style, link, and the style attribute, which can contain malicious
> script

See above re scripts.

> style? Yes, style. CSS definitions can contain executable code.

See above re scripts.


"Fixing this on the server end" is a standard server security issue; I
don't think XML changes that. If anything, the fact that XML languages
do not have to be as general/uncontrolled as HTML makes server security
easier.


--
Joe Kesselman,
http://www.love-song-productions.com/people/keshlam/index.html

{} ASCII Ribbon Campaign | "may'ron DaroQbe'chugh vaj bIrIQbej" --
/\ Stamp out HTML mail! | "Put down the squeezebox & nobody gets hurt."
 
Ivan Shmakov

IS> It seems that one reason people implement various (for lack of
IS> a better word) “Wiki-like” formats is the presence of “security”
IS> issues with some XML-based “document” formats.

JK> XML is a shared syntax. It's up to each XML-based language, and the
JK> tools which implement it, to apply whatever security is
JK> appropriate.

That's why I'm interested in the issues of the XML-based
/formats/, not of the XML itself.

This group probably wasn't the smartest choice for the question,
but I couldn't think of another group where it could be asked.

[...]

JK> Not an issue if the script is appropriately signed/sandboxed.

Regarding digital signatures: this solution merely allows
J. R. User to delegate judgment about whether a script is
well-behaved to a trusted party. It doesn't eliminate the
possibility of malicious code (nothing can).
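Here is a minimal sketch of “approval by a trusted party” that makes the point concrete. It uses a SHA-256 allowlist of reviewed scripts as a simplified stand-in for real digital signatures, which would rely on public-key cryptography rather than a shared list. The `APPROVED` set is hypothetical. The check only establishes that these exact bytes were approved, not that they are harmless: the approved infinite loop passes just as well.

```python
import hashlib


def digest(source: str) -> str:
    """SHA-256 hex digest of a script's exact UTF-8 bytes."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


# Hypothetical: digests a trusted reviewer has approved. Note that the
# reviewer approved an infinite loop too; the check cannot tell.
APPROVED = {
    digest("document.title = 'Hello';"),
    digest("while (true) {}"),
}


def is_approved(source: str) -> bool:
    """True if these exact bytes were approved; says nothing about safety."""
    return digest(source) in APPROVED
```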

With respect to “sandboxing”, my own, admittedly limited,
perception of the state of affairs in this field is that ways
to constrain the CPU allocation of a sandboxed environment are
still uncommon. Is there, for example, a browser that could
determine that all the script in question does is run a
sophisticated infinite loop whose only effect is to slow the
whole system down?
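In general, no program can decide whether arbitrary code will merely loop forever (this is the halting problem). Practical sandboxes therefore bound resources rather than analyse the code. The sketch below shows the idea at the process level on a POSIX system by calling `setrlimit(RLIMIT_CPU)` in a child process. It is a server-side illustration under that assumption, not a description of how any particular browser confines scripts. The function name is hypothetical.

```python
import resource
import subprocess
import sys


def run_with_cpu_limit(argv, cpu_seconds=1, wall_seconds=10):
    """Run argv with a CPU-time cap; return its exit status, or None on wall timeout.

    POSIX only. A negative status means the child was killed by a signal
    (SIGXCPU at the soft limit, SIGKILL at the hard limit).
    """
    def apply_limits():
        # Runs in the child between fork() and exec().
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds + 1))
        resource.setrlimit(resource.RLIMIT_CORE, (0, 0))  # no core dumps

    try:
        proc = subprocess.run(argv, preexec_fn=apply_limits,
                              capture_output=True, timeout=wall_seconds)
    except subprocess.TimeoutExpired:
        # E.g. the child is blocked (sleeping) rather than burning CPU.
        return None
    return proc.returncode


if __name__ == "__main__":
    busy_loop = [sys.executable, "-c", "while True: pass"]
    print(run_with_cpu_limit(busy_loop))  # expected negative: killed by a signal
```

The wall-clock timeout covers a different failure: a child that sleeps uses no CPU, so the rlimit alone would never stop it.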

JK> Not an issue if the (non-XML) code does not have bugs.

[...]

JK> See above re scripts.

That's the very problem. (Check “The purpose” just below the
question in the original posting.) Namely, if I allow the
elements listed above on my pages, then, basically, any user of
the site will be able to put any script on any page.

That's why they're disabled in Ikiwiki by default. Unfortunately,
this seems to impair functionality quite broadly. In particular,
it prevents the use of SVG, since the code used to clean HTML
knows nothing about it.

JK> "Fixing this on the server end" is a standard server security
JK> issue; I don't think XML changes that. If anything, the fact that
JK> XML languages do not have to be as general/uncontrolled as HTML
JK> makes server security easier.

Do you have a specific language to name?
 
Joe Kesselman

Ivan said:
> Do you have a specific language to name?

Almost anything other than XHTML. Unconstrained scripting is _RARE_ in
anything but browsers.

Note that the script isn't a threat to your server if you don't execute
scripts on the server side. It may be a threat between your users, which
is why everyone filters scripts out of input except for a few specially
privileged authors.

It's not an XML problem. It's an HTML and publishing problem. As
publisher, you need to exercise some editorial control. Most WIKIs do
that by supporting only a restricted tagset, often in a non-HTML syntax,
which they either trust they can automatically review or are willing to
review manually (depending on their size/community). Or only permit
contributions containing these from authenticated users, so anyone who
insists on Doing Something Rude can be identified and kicked off the system.

It's really not all that different from someone hitting your system with
a spambot and filling the WIKI with junk. The same solutions -- login
and captcha or similar -- should handle the problem.
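The policy described above gives everyone a restricted tagset and reserves extra tags for identified, privileged authors. It could be sketched as follows. The tag sets and the `User` record are hypothetical. The sketch checks tag names only, so attributes such as `onclick` on an otherwise allowed tag would still need separate filtering.

```python
from dataclasses import dataclass
from html.parser import HTMLParser

# Hypothetical restricted tagset for ordinary contributors.
BASIC_TAGS = {"p", "em", "strong", "a", "ul", "ol", "li", "pre", "code"}
# Hypothetical extra tags reserved for trusted, authenticated authors.
PRIVILEGED_TAGS = BASIC_TAGS | {"script", "object", "svg"}


@dataclass
class User:
    name: str
    authenticated: bool
    privileged: bool = False


class TagCollector(HTMLParser):
    """Collects the names of all start tags in a fragment."""

    def __init__(self):
        super().__init__()
        self.tags = set()

    def handle_starttag(self, tag, attrs):
        self.tags.add(tag)


def disallowed_tags(user, markup):
    """Return the tags this user may not submit; an empty list means accept."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    trusted = user.authenticated and user.privileged
    allowed = PRIVILEGED_TAGS if trusted else BASIC_TAGS
    return sorted(collector.tags - allowed)
```

Requiring both flags means a "privileged" account that has not logged in is treated like any anonymous contributor.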

 
Ivan Shmakov

IS> Do you have a specific language to name?

JK> Almost anything other than XHTML. Unconstrained scripting is
JK> _RARE_ in anything but browsers.

Well, I'd probably qualify as a “newbie” when it comes to XML.
Any pointers, please?

JK> Note that the script isn't a threat to your server if you don't
JK> execute scripts on the server side. It may be a threat between your
JK> users, which is why everyone filters scripts out of input except
JK> for a few specially privileged authors.

I understand that. However, I find both “arbitrary code runs on
my server” and “arbitrary … on client's” to be a problem.

JK> It's not an XML problem.

And I've never said it is.

JK> It's an HTML and publishing problem.

Yes. Now, I'm looking for sources that describe this problem in
more detail.

And does this problem apply to HTML only? What about, e. g., SVG
or X3D? Couldn't these reference code snippets that browsers
would execute just as happily as they do for HTML?
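SVG, at least, can. SVG 1.1 defines its own `script` element and event attributes such as `onload`, and its `xlink:href` links may use the `javascript:` scheme. Below is a minimal sketch that reports such constructs in an SVG document, using Python's `xml.etree.ElementTree`. The function names are hypothetical, and the checks are illustrative rather than exhaustive.

```python
import xml.etree.ElementTree as ET

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def local_name(qname):
    """Strip an ElementTree '{namespace}' prefix, if any."""
    return qname.rsplit("}", 1)[-1]


def script_vectors(svg_text):
    """List the script-bearing constructs found in an SVG document."""
    findings = []
    for el in ET.fromstring(svg_text).iter():
        tag = local_name(el.tag)
        if tag == "script":
            findings.append("script element")
        for name, value in el.attrib.items():
            attr = local_name(name)
            if attr.lower().startswith("on"):
                findings.append(f"{attr} attribute on <{tag}>")
            if name in (XLINK_HREF, "href") and value.strip().lower().startswith("javascript:"):
                findings.append(f"javascript: link on <{tag}>")
    return findings
```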

JK> As publisher, you need to exercise some editorial control. Most
JK> WIKIs do that by supporting only a restricted tagset, often in a
JK> non-HTML syntax, which they either trust they can automatically
JK> review or are willing to review manually (depending on their
JK> size/community). Or only permit contributions containing these from
JK> authenticated users, so anyone who insists on Doing Something Rude
JK> can be identified and kicked off the system.

JK> It's really not all that different from someone hitting your system
JK> with a spambot and filling the WIKI with junk.

Is it? To me, having arbitrary executable code on a wiki is a
completely different problem from having arbitrary data.

The reason to distinguish them is the relation between cause
and effect. E. g., what harm could a deliberately malicious
1 KiB script-free XHTML document do to a random browser? Sure,
it could crash particular versions of some browsers, but
eventually the users of those will update. Now, think of 1 KiB
worth of JavaScript. To me, the difference seems enormous.

That's why I hope to find a tool that will strip all traces of
executable code from an XHTML page while leaving the rest
reasonably intact. Or, if no such tool exists, I'd hope to find
some source that would enable me to write one. (Should
everything else fail, I'd resort to grepping the specifications
myself; but I still have a feeling that someone has done this
before, and I'd rather not reinvent the wheel.)
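Here is a sketch of the kind of tool described above, under two assumptions. First, the input is well-formed XHTML, possibly with inline SVG. Second, an allowlist of namespace-qualified elements and attributes decides what survives, so anything not explicitly listed is removed, including `script`, event attributes and non-HTTP(S) links. The allowlist itself is hypothetical and deliberately tiny. Parsing untrusted XML has its own pitfalls (e.g. entity expansion), which the Python documentation discusses under XML vulnerabilities.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
SVG = "http://www.w3.org/2000/svg"
XLINK = "http://www.w3.org/1999/xlink"
XLINK_HREF = f"{{{XLINK}}}href"

# Hypothetical allowlist: (namespace, local name) -> permitted attributes.
# Anything not listed here (script, object, foreignObject, ...) is removed.
ALLOWED = {
    (XHTML, "div"): {"class"},
    (XHTML, "p"): {"class"},
    (XHTML, "em"): set(),
    (XHTML, "strong"): set(),
    (XHTML, "a"): {"href", "title"},
    (SVG, "svg"): {"width", "height", "viewBox"},
    (SVG, "rect"): {"x", "y", "width", "height", "fill"},
    (SVG, "circle"): {"cx", "cy", "r", "fill"},
    (SVG, "a"): {XLINK_HREF},
}
URL_ATTRS = {"href", XLINK_HREF}
SAFE_SCHEMES = ("http:", "https:", "mailto:")

ET.register_namespace("svg", SVG)      # nicer prefixes on output only
ET.register_namespace("xlink", XLINK)


def split_tag(tag):
    """'{ns}local' -> ('ns', 'local'); un-namespaced tags get ns ''."""
    if tag.startswith("{"):
        ns, local = tag[1:].split("}", 1)
        return ns, local
    return "", tag


def safe_url(value):
    v = value.strip().lower()
    # Allow listed schemes, or references with no colon at all (relative).
    return v.startswith(SAFE_SCHEMES) or ":" not in v


def scrub_attrs(el, permitted):
    for name in list(el.attrib):
        if name not in permitted or (name in URL_ATTRS and not safe_url(el.attrib[name])):
            del el.attrib[name]


def scrub_children(parent):
    for child in list(parent):
        key = split_tag(child.tag)
        if key not in ALLOWED:
            # Drop the element and its subtree, but keep the text after it.
            if child.tail:
                idx = list(parent).index(child)
                if idx == 0:
                    parent.text = (parent.text or "") + child.tail
                else:
                    prev = parent[idx - 1]
                    prev.tail = (prev.tail or "") + child.tail
            parent.remove(child)
            continue
        scrub_attrs(child, ALLOWED[key])
        scrub_children(child)


def sanitize_xhtml(markup):
    root = ET.fromstring(markup)
    key = split_tag(root.tag)
    if key not in ALLOWED:
        raise ValueError(f"root element {key} is not allowed")
    scrub_attrs(root, ALLOWED[key])
    scrub_children(root)
    return ET.tostring(root, encoding="unicode")
```

An allowlist fails closed: markup the tool does not know about (such as SVG elements missing from `ALLOWED`) is dropped rather than passed through. That is the opposite trade-off to a list of known-dangerous tags.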

JK> The same solutions -- login and captcha or similar -- should handle
JK> the problem.
 
