security vs. XML-based formats

Discussion in 'XML' started by Ivan Shmakov, Mar 8, 2010.

  1. Ivan Shmakov

    It seems that one reason people implement various, for lack of
    a better word, “Wiki-like” formats is the presence of “security”
    issues with some XML-based “document” formats.

    Shortly put, the question is: are those issues (at least the
    ones related to XHTML) documented somewhere?

    Here, by “security” issues I mean, mostly, the ability to trigger
    fairly arbitrary code execution on the “client's” host. (Note
    that even if confined to some language's interpreter, such as
    the JavaScript one, the program will still, most of the time,
    be able to eat CPU resources in a hard-to-control fashion.)

    * The purpose

    The overall purpose is to establish some kind of a collaborative
    environment (say, a Wiki) for me and my occasional collaborators
    to use. So far, I've set up an Ikiwiki instance as the base for
    such an environment (mostly because of its support for Git [1],
    which I've found convenient to use, as the storage backend.)

    However, Ikiwiki is somewhat restrictive about the features that
    can be used on the pages. E. g.:

    --cut: http://ikiwiki.info/plugins/htmlscrubber/ --
    The web's security model is fundamentally broken; ikiwiki's html
    sanitisation is only a patch on the underlying gaping hole that is
    your web browser.
    --cut: http://ikiwiki.info/plugins/htmlscrubber/ --

    The list of the possibly dangerous tags in the default Ikiwiki
    configuration apparently follows the considerations below.

    --cut: http://feedparser.org/docs/html-sanitization.html --
    Here is an incomplete list of potentially dangerous HTML tags and
    attributes:

    • script, which can contain malicious script

    • applet, embed, and object, which can automatically download and
    execute malicious code

    • meta, which can contain malicious redirects

    • onload, onunload, and all other on* attributes, which can contain
    malicious script

    • style, link, and the style attribute, which can contain malicious
    script

    style? Yes, style. CSS definitions can contain executable code.
    --cut: http://feedparser.org/docs/html-sanitization.html --
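    A minimal Python sketch of the blacklist approach described in the quoted list, assuming well-formed XHTML parsed with the standard library's xml.etree.ElementTree. The function names are hypothetical. As the quoted text itself warns, the list is incomplete: this sketch does nothing about javascript: URLs in href or src attributes, for example, which is why allow-lists are generally considered the safer design.

```python
import xml.etree.ElementTree as ET

# Blacklist taken from the feedparser list quoted above (which calls itself incomplete).
DANGEROUS_ELEMENTS = {"script", "applet", "embed", "object", "meta", "style", "link"}


def local_name(name):
    """Drop an ElementTree '{namespace}' prefix: '{ns}script' -> 'script'."""
    return name.rsplit("}", 1)[-1].lower()


def remove_keeping_tail(parent, child):
    """Remove `child` from `parent` without losing the text that follows it."""
    if child.tail:
        siblings = list(parent)
        i = siblings.index(child)
        if i > 0:
            siblings[i - 1].tail = (siblings[i - 1].tail or "") + child.tail
        else:
            parent.text = (parent.text or "") + child.tail
    parent.remove(child)


def strip_blacklisted(root):
    """Delete blacklisted elements, on* attributes and style attributes in place.

    The root element itself is not checked (hypothetical simplification).
    """
    # Snapshot the element lists first so we never mutate what we iterate over.
    for parent in list(root.iter()):
        for child in list(parent):
            if local_name(child.tag) in DANGEROUS_ELEMENTS:
                remove_keeping_tail(parent, child)
    for elem in root.iter():
        for attr in list(elem.attrib):
            name = local_name(attr)
            if name.startswith("on") or name == "style":
                del elem.attrib[attr]
    return root


if __name__ == "__main__":
    doc = ET.fromstring(
        '<html xmlns="http://www.w3.org/1999/xhtml">'
        '<body onload="evil()"><p style="color:red">Hi<script>alert(1)</script> there</p>'
        '<object data="x.swf"/></body></html>')
    print(ET.tostring(strip_blacklisted(doc), encoding="unicode"))
```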

    While I understand that such issues could be dealt with on the
    “client's” side (I'd suggest using Lynx [2] or NoScript [3] for
    that matter), it still makes me wonder: is there a smarter way
    to manage things than to entirely disallow a number of HTML
    (XHTML) features?

    * References

    [1] http://git-scm.com/
    [2] http://lynx.isc.org/
    [3] http://noscript.net/

    --
    FSF associate member #7257

    Ivan Shmakov, Mar 8, 2010
    #1

  2. Ivan Shmakov wrote:
    > It seems that one reason people implement various, for lack of
    > a better word, “Wiki-like” formats is the presence of “security”
    > issues with some XML-based “document” formats.


    XML is a shared syntax. It's up to each XML-based language, and the
    tools which implement it, to apply whatever security is appropriate.

    That said, the claims of security issues in XML appear to be badly
    overstated. Yes, some specific tools have specific problems. Other tools
    which implement the same standards may not share those problems. Quality
    of implementation, rather than XML, appears to be the issue.

    > • script, which can contain malicious script


    Not an issue if the script is appropriately signed/sandboxed.

    > • applet, embed, and object, which can automatically download and
    > execute malicious code


    Not an issue if the (non-XML) code does not have bugs.

    > • meta, which can contain malicious redirects


    Browsers can flag this for users. You mentioned the NoScript plug-in for
    Firefox.

    > • onload, onunload, and all other on* attributes, which can contain
    > malicious script


    See above re scripts.

    > • style, link, and the style attribute, which can contain malicious
    > script


    See above re scripts.

    > style? Yes, style. CSS definitions can contain executable code.


    See above re scripts.


    "Fixing this on the server end" is a standard server security issue; I
    don't think XML changes that. If anything, the fact that XML languages
    do not have to be as general/uncontrolled as HTML makes server security
    easier.


    --
    Joe Kesselman,
    http://www.love-song-productions.com/people/keshlam/index.html

    {} ASCII Ribbon Campaign | "may'ron DaroQbe'chugh vaj bIrIQbej" --
    /\ Stamp out HTML mail! | "Put down the squeezebox & nobody gets hurt."

    Joe Kesselman, Mar 8, 2010
    #2

  3. Ivan Shmakov

    >>>>> "JK" == Joe Kesselman <> writes:
    >>>>> "IS" == Ivan Shmakov wrote:


    IS> It seems that one reason people implement various, for lack of
    IS> a better word, “Wiki-like” formats is the presence of “security”
    IS> issues with some XML-based “document” formats.

    JK> XML is a shared syntax. It's up to each XML-based language, and the
    JK> tools which implement it, to apply whatever security is
    JK> appropriate.

    That's why I'm interested in the issues of the XML-based
    /formats/, not of the XML itself.

    The choice of group for the question probably wasn't the
    smartest one, but I couldn't think of a different group
    where such a question could be asked.

    [...]

    >> • script, which can contain malicious script


    JK> Not an issue if the script is appropriately signed/sandboxed.

    Regarding digital signatures: this solution just allows
    J. R. User to delegate the judgment about whether a script is
    well-behaved to a trusted party. It doesn't eliminate the
    possibility of malicious code. (Nothing can.)

    With respect to “sandboxing”, my own, however poor, perception
    of the state of affairs in this field is that ways to constrain
    the CPU allocation of a sandboxed environment are still
    uncommon. Is there an example of a browser that could determine
    that all the script in question does is run a sophisticated
    infinite loop whose only effect is to slow down the whole
    system?
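    In general no program can decide whether an arbitrary script will ever terminate (the halting problem), so a sandbox can only bound the resources a script receives rather than recognize a "sophisticated infinite loop". A minimal sketch of that idea, assuming the untrusted code is a Python snippet run in a separate process under a wall-clock deadline; this is not how browsers run JavaScript, and run_with_deadline is a hypothetical name.

```python
import subprocess
import sys


def run_with_deadline(code: str, seconds: float) -> bool:
    """Run a snippet of (untrusted) Python in a child interpreter.

    Returns True if it exited within `seconds`, False if it had to be
    killed. This bounds wall-clock time only; it does not restrict memory,
    files or network access, so it is not a sandbox by itself.
    """
    try:
        subprocess.run([sys.executable, "-c", code],
                       capture_output=True, timeout=seconds)
        return True
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed and reaped the child at this point.
        return False


if __name__ == "__main__":
    print(run_with_deadline("print('done')", 10.0))    # finishes normally
    print(run_with_deadline("while True: pass", 1.0))  # killed after about 1 s
```

    The design choice here is to give up on analysing the script and simply stop it; a real sandbox would combine such a deadline with limits on memory and on what the process may touch.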

    >> • applet, embed, and object, which can automatically download and
    >> execute malicious code


    JK> Not an issue if the (non-XML) code does not have bugs.

    [...]

    >> style? Yes, style. CSS definitions can contain executable code.


    JK> See above re scripts.

    That's the very problem. (Check “The purpose” just below the
    question in the original posting.) Namely, if I'm going to
    allow the elements listed above on my pages, then, basically,
    any user of the site will be able to put any script on any page.

    That's why it's disabled in Ikiwiki by default. Unfortunately,
    it seems that it impairs the functionality quite broadly. In
    particular, this measure prevents the use of SVG, since the code
    used to clean HTML knows nothing about it.

    JK> "Fixing this on the server end" is a standard server security
    JK> issue; I don't think XML changes that. If anything, the fact that
    JK> XML languages do not have to be as general/uncontrolled as HTML
    JK> makes server security easier.

    Do you have a specific language to name?

    Ivan Shmakov, Mar 8, 2010
    #3
  4. Ivan Shmakov wrote:
    > Do you have a specific language to name?


    Almost anything other than XHTML. Unconstrained scripting is _RARE_ in
    anything but browsers.

    Note that the script isn't a threat to your server if you don't execute
    scripts on the server side. It may be a threat between your users, which
    is why everyone filters scripts out of input except for a few specially
    privileged authors.

    It's not an XML problem. It's an HTML and publishing problem. As
    publisher, you need to exercise some editorial control. Most WIKIs do
    that by supporting only a restricted tagset, often in a non-HTML syntax,
    which they either trust they can automatically review or are willing to
    review manually (depending on their size/community). Or only permit
    contributions containing these from authenticated users, so anyone who
    insists on Doing Something Rude can be identified and kicked off the system.

    It's really not all that different from someone hitting your system with
    a spambot and filling the WIKI with junk. The same solutions -- login
    and captcha or similar -- should handle the problem.

    Joe Kesselman, Mar 8, 2010
    #4
  5. Ivan Shmakov

    >>>>> "JK" == Joe Kesselman <> writes:
    >>>>> "IS" == Ivan Shmakov wrote:


    IS> Do you have a specific language to name?

    JK> Almost anything other than XHTML. Unconstrained scripting is
    JK> _RARE_ in anything but browsers.

    Well, I could probably qualify as a “newbie” when it comes to
    XML. Any pointers, please?

    JK> Note that the script isn't a threat to your server if you don't
    JK> execute scripts on the server side. It may be a threat between your
    JK> users, which is why everyone filters scripts out of input except
    JK> for a few specially privileged authors.

    I understand that. However, I find both “arbitrary code runs on
    my server” and “arbitrary … on the client's” to be problems.

    JK> It's not an XML problem.

    And I've never said it is.

    JK> It's an HTML and publishing problem.

    Yes. Now, I'm looking for sources where this problem is
    described in more detail.

    And does this problem apply to HTML only? What about, e. g.,
    SVG, or X3D? Couldn't these reference code snippets that
    browsers would execute just as happily as they do for HTML?
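    SVG does carry the same hooks as HTML: it has its own script element, event attributes such as onload and onclick, and links whose xlink:href may use a javascript: URL. A minimal Python sketch that reports such hooks in an SVG document, assuming it is well-formed XML; find_script_hooks is a hypothetical name, it covers SVG only (not X3D), and its javascript: test is a deliberately naive string check.

```python
import xml.etree.ElementTree as ET


def local_name(name):
    """Drop an ElementTree '{namespace}' prefix: '{ns}href' -> 'href'."""
    return name.rsplit("}", 1)[-1].lower()


def find_script_hooks(svg_text):
    """Return (element, attribute-or-None) pairs that could run script."""
    hooks = []
    for elem in ET.fromstring(svg_text).iter():
        tag = local_name(elem.tag)
        if tag == "script":
            hooks.append((tag, None))
        for attr, value in elem.attrib.items():
            name = local_name(attr)  # plain href and xlink:href both become 'href'
            if name.startswith("on"):
                hooks.append((tag, name))
            elif name == "href" and value.strip().lower().startswith("javascript:"):
                hooks.append((tag, name))
    return hooks


if __name__ == "__main__":
    svg = ('<svg xmlns="http://www.w3.org/2000/svg" '
           'xmlns:xlink="http://www.w3.org/1999/xlink" onload="f()">'
           '<script>alert(1)</script>'
           '<a xlink:href="javascript:alert(2)"><circle r="5" onclick="g()"/></a>'
           '<rect width="1" height="1"/></svg>')
    for hook in find_script_hooks(svg):
        print(hook)
```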

    JK> As publisher, you need to exercise some editorial control. Most
    JK> WIKIs do that by supporting only a restricted tagset, often in a
    JK> non-HTML syntax, which they either trust they can automatically
    JK> review or are willing to review manually (depending on their
    JK> size/community). Or only permit contributions containing these from
    JK> authenticated users, so anyone who insists on Doing Something Rude
    JK> can be identified and kicked off the system.

    JK> It's really not all that different from someone hitting your system
    JK> with a spambot and filling the WIKI with junk.

    Is it? To me, having arbitrary executable code on a wiki is a
    completely different problem to that of having arbitrary data.

    The reason to differentiate these is the relation between cause
    and effect. E. g., what harm could 1 KiB of deliberately
    malicious, script-free XHTML do to a random browser? Sure, it
    could crash particular versions of some browsers, but,
    eventually, the users of those will update. Now, think of 1 KiB
    worth of JavaScript. To me, the difference seems to be enormous.

    That's why I hope to find a tool that will strip all traces of
    executable code from an XHTML page, while leaving the rest
    reasonably intact. Or, if no such tool exists, I'd hope to find
    some source that would enable me to write it. (Should everything
    else fail, I'd resort to grepping the specifications myself; but
    I still have a feeling that someone has done it before, and I'd
    rather not reinvent the wheel.)
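    A common approach to this request is an allow-list rather than a blacklist: keep only elements and attributes known to be harmless, in namespaces the sanitizer understands, and drop everything else. A minimal Python sketch, assuming well-formed XHTML that may contain inline SVG; the allow-lists below are small hypothetical examples, not derived from the XHTML or SVG specifications, and an unknown element is dropped together with its content (its trailing text is kept). Because every element is checked by namespace, inline SVG survives instead of being rejected wholesale.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
SVG_NS = "http://www.w3.org/2000/svg"

# Hypothetical, deliberately small allow-lists (namespace -> local names).
ALLOWED_ELEMENTS = {
    XHTML_NS: {"html", "head", "title", "body", "div", "span", "p", "a",
               "em", "strong", "ul", "ol", "li", "pre", "code", "img",
               "h1", "h2", "h3"},
    SVG_NS: {"svg", "g", "path", "rect", "circle", "ellipse", "line",
             "polyline", "polygon", "text"},
}
# Attributes are matched by local name; on* handlers and style are simply absent.
ALLOWED_ATTRS = {"href", "src", "alt", "title", "class", "id", "width",
                 "height", "viewBox", "d", "x", "y", "r", "cx", "cy", "rx",
                 "ry", "x1", "y1", "x2", "y2", "points", "fill", "stroke"}
URL_ATTRS = {"href", "src"}
SAFE_SCHEMES = ("http:", "https:", "mailto:")


def split_name(name):
    """'{ns}local' -> ('ns', 'local'); an un-namespaced name -> ('', name)."""
    if name.startswith("{"):
        ns, local = name[1:].split("}", 1)
        return ns, local
    return "", name


def is_allowed_element(tag):
    ns, local = split_name(tag)
    return local in ALLOWED_ELEMENTS.get(ns, set())


def url_is_safe(value):
    """Allow relative URLs and a few known schemes; reject javascript: etc."""
    v = value.strip().lower()
    has_scheme = ":" in v.split("/", 1)[0]
    return not has_scheme or v.startswith(SAFE_SCHEMES)


def remove_keeping_tail(parent, child):
    """Remove child (and its subtree) but keep the text that follows it."""
    if child.tail:
        siblings = list(parent)
        i = siblings.index(child)
        if i > 0:
            siblings[i - 1].tail = (siblings[i - 1].tail or "") + child.tail
        else:
            parent.text = (parent.text or "") + child.tail
    parent.remove(child)


def sanitize(root):
    """Allow-list sanitize an XHTML/SVG tree in place and return it."""
    if not is_allowed_element(root.tag):
        raise ValueError("root element not allowed: " + root.tag)
    for parent in list(root.iter()):
        for child in list(parent):
            if not is_allowed_element(child.tag):
                remove_keeping_tail(parent, child)
    for elem in root.iter():
        for attr, value in list(elem.attrib.items()):
            local = split_name(attr)[1]
            if local not in ALLOWED_ATTRS or (
                    local in URL_ATTRS and not url_is_safe(value)):
                del elem.attrib[attr]
    return root
```

    The trade-off is the usual one for allow-lists: anything the lists do not mention is lost, but a construct nobody thought of is dropped rather than passed through.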

    JK> The same solutions -- login and captcha or similar -- should handle
    JK> the problem.

    Ivan Shmakov, Mar 8, 2010
    #5
