Security implications of using open() on untrusted strings.

Discussion in 'Python' started by r0g, Nov 24, 2008.

  1. r0g

    r0g Guest

    Hi there,

    I'm trying to validate some user input which is for the most part simple
    regexery however I would like to check filenames and I would like this
    code to be multiplatform.

    I had hoped the os module would have a function that would tell me if a
    proposed filename would be valid on the host system but it seems not. I
    have considered whitelisting but it seems a bit unfair to make the rest
    of the world suffer the naming restrictions of windows. Moreover it
    seems both inelegant and hard work to research the valid file/directory
    naming conventions of every platform that this app could conceivably run
    on and write regex's for all of them so...

    I'm tempted to go the witch dunking route, stick it in an open() between
    a Try: & Except: and see if it floats. However...

    Although it's a desktop (not internet facing) app I'm a little squeamish
    piping raw user input into a filesystem function like that and this app
    will be dealing with some particularly sensitive data so I want to be
    careful and minimize exposure where practical.

    Has programming PHP and Web stuff for years made me overly paranoid
    about this or should I still be scrubbing input like this before I
    feed it to filesystem functions? If so does anyone know of a module
    that may help or have any other advice?

    Note: In this particular case the user input is only specifying the name
    of a file that will be opened for writing _not_ reading and the
    interface is GUI only (wxWidgets).

    Regards,

    Roger.
     
    r0g, Nov 24, 2008
    #1

  2. On Mon, 24 Nov 2008 00:44:45 -0500, r0g wrote:

    > Hi there,
    >
    > I'm trying to validate some user input which is for the most part simple
    > regexery however I would like to check filenames and I would like this
    > code to be multiplatform.
    >
    > I had hoped the os module would have a function that would tell me if a
    > proposed filename would be valid on the host system but it seems not. I
    > have considered whitelisting but it seems a bit unfair to make the rest
    > of the world suffer the naming restrictions of windows. Moreover it
    > seems both inelegant and hard work to research the valid file/directory
    > naming conventions of every platform that this app could conceivably run
    > on and write regex's for all of them so...


    That's probably why nobody has written a function for the os module to do
    the same... and just wait until you get into the murky universe of cross-
    platform Unicode filenames.

    Honestly, I think your best bet is to just trust the file system to
    recognize a bad file name and raise an exception. What counts as a bad
    file name is surprisingly hard to define, especially if you want to be
    cross-platform. See here for more details:


    http://stackoverflow.com/questions/295135/turn-a-string-into-a-valid-filename-in-python
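
    A minimal sketch of that approach, assuming Python 3. The helper name
    try_open_for_writing is made up for illustration; it does no validation
    of its own and simply reports whatever the OS or file system objects to:

```python
def try_open_for_writing(path):
    """Return (file_object, None) on success or (None, error_message) on failure."""
    try:
        # No pre-validation: the OS and file system decide what is a legal name.
        # Note that mode "w" truncates an existing file of the same name.
        return open(path, "w", encoding="utf-8"), None
    except (OSError, ValueError) as exc:
        # OSError covers illegal names, missing directories, permissions, etc.
        # ValueError is raised for a name containing an embedded NUL character.
        return None, str(exc)
```

    The caller is responsible for closing the returned file object; the
    error message can be shown to the user as-is.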


    --
    Steven
     
    Steven D'Aprano, Nov 24, 2008
    #2

  3. r0g

    r0g Guest

    Steven D'Aprano wrote:
    > On Mon, 24 Nov 2008 00:44:45 -0500, r0g wrote:
    >
    >> Hi there,
    >>
    >> I'm trying to validate some user input which is for the most part simple
    >> regexery however I would like to check filenames and I would like this
    >> code to be multiplatform.
    >>
    >> I had hoped the os module would have a function that would tell me if a
    >> proposed filename would be valid on the host system but it seems not. I
    >> have considered whitelisting but it seems a bit unfair to make the rest
    >> of the world suffer the naming restrictions of windows. Moreover it
    >> seems both inelegant and hard work to research the valid file/directory
    >> naming conventions of every platform that this app could conceivably run
    >> on and write regex's for all of them so...

    >
    > That's probably why nobody has written a function for the os module to do
    > the same... and just wait until you get into the murky universe of cross-
    > platform Unicode filenames.
    >
    > Honestly, I think your best bet is to just trust the file system to
    > recognize a bad file name and raise an exception. What counts as a bad
    > file name is surprisingly hard to define, especially if you want to be
    > cross-platform. See here for more details:
    >
    >
    > http://stackoverflow.com/questions/295135/turn-a-string-into-a-valid-
    > filename-in-python
    >
    >


    Yep, I spotted that too which is why white-listing is my fallback plan.
    My question is really about the security of using unfiltered data in a
    filesystem function though. Are there particular exploits that could
    make use of such unfiltered calls? For example I'd imagine jailbreaking
    might be a concern if the app isn't run under its own restricted user
    account. Do others here consider this when designing applications and
    what techniques/modules, if any, do you use to sanitize path/filename input?

    Roger.
     
    r0g, Nov 24, 2008
    #3
  4. r0g wrote:

    > Although it's a desktop (not internet facing) app I'm a little squeamish
    > piping raw user input into a filesystem function like that and this app
    > will be dealing with some particularly sensitive data so I want to be
    > careful and minimize exposure where practical.


    > Has programming PHP and Web stuff for years made me overly paranoid
    > about this or should I still be scrubbing input like this before I
    > feed it to filesystem functions? If so does anyone know of a module
    > that may help or have any other advice?


    > Note: In this particular case the user input is only specifying the name
    > of a file that will be opened for writing _not_ reading and the
    > interface is GUI only (wxWidgets).


    Is the user *running* the application the same as the user who
    feeds it input? If it is, then there is no need to filter the
    filenames, since that user could just do "rm bad-file" (or "DEL
    BAD-FILE" on MS-Windows) anyway to destroy it.

    (Of course, if you are passing the filename to, e.g, os.system(),
    you would need to quote it properly, but that is to avoid
    surprising the user; it is one thing to let the user overwrite a
    file named "foo; rm -rf $HOME", quite another to pass that string
    unquoted to /bin/sh when the user thought he was just typing a
    filename.)
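
    As a rough sketch of that quoting, assuming Python 3 and a POSIX shell
    (shlex.quote is not meant for cmd.exe), with a made-up helper name
    shell_command_for and "ls -l" standing in for whatever command is run:

```python
import shlex


def shell_command_for(filename):
    """Build a /bin/sh command line that treats `filename` as one argument."""
    # shlex.quote wraps the string so that a POSIX shell sees a single word;
    # the "--" tells ls that no further options follow.
    return "ls -l -- " + shlex.quote(filename)


command = shell_command_for("foo; rm -rf $HOME")
# shlex.split parses the way a POSIX shell would: the name stays in one piece.
print(shlex.split(command))
```

    Only the quoted string should ever be handed to os.system(); the raw
    filename stays safe to pass to open() as it is.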


    --
    Thomas Bellman, Lysator Computer Club, Linköping University, Sweden
    "I don't think [that word] means what you ! bellman @ lysator.liu.se
    think it means." -- The Princess Bride ! Make Love -- Nicht Wahr!
     
    Thomas Bellman, Nov 24, 2008
    #4
  5. Terry Reedy

    Terry Reedy Guest

    r0g wrote:

    > Yep, I spotted that too which is why white-listing is my fallback plan.
    > My question is really about the security of using unfiltered data in a
    > filesystem function though. Are there particular exploits that could
    > make use of such unfiltered calls?


    The classic one would be submitting a filename such as 'a'*1000, but
    current OSes should be immune from that sort of thing by now.


    > For example I'd imagine jailbreaking
    > might be a concern if the app isn't run under its own restricted user
    > account. Do others here consider this when designing applications and
    > what techniques/modules, if any, do you use to sanitize path/filename input?
     
    Terry Reedy, Nov 24, 2008
    #5
  6. Jorgen Grahn

    Jorgen Grahn Guest

    On Mon, 24 Nov 2008 00:44:45 -0500, r0g wrote:
    > Hi there,
    >
    > I'm trying to validate some user input which is for the most part simple
    > regexery however I would like to check filenames and I would like this
    > code to be multiplatform.
    >
    > I had hoped the os module would have a function that would tell me if a
    > proposed filename would be valid on the host system but it seems not. I
    > have considered whitelisting but it seems a bit unfair to make the rest
    > of the world suffer the naming restrictions of windows. Moreover it
    > seems both inelegant and hard work to research the valid file/directory
    > naming conventions of every platform that this app could conceivably run
    > on and write regex's for all of them so...
    >
    > I'm tempted to go the witch dunking route, stick it in an open() between
    > a Try: & Except: and see if it floats. However...
    >
    > Although it's a desktop (not internet facing) app I'm a little squeamish
    > piping raw user input into a filesystem function like that and this app
    > will be dealing with some particularly sensitive data so I want to be
    > careful and minimize exposure where practical.


    Take the Unix 'ls' command (or MS-DOS 'dir'). That's two programs
    which let users pipe raw input into the filesystem functions, and they
    certainly have handled some very sensitive data over the years.

    > Has programming PHP and Web stuff for years made me overly paranoid
    > about this [...]


    Yes. ;-)

    Please explain one thing: what are you looking for? It's not
    "accesses a file outside the user's home directory", "accesses an
    infinite file like /dev/zero" or something like that, or you would
    have said so. Nor does the "user" input seem to come from some other user
    than the one your program is running as, nor from some input source
    which the user cannot be held responsible for.

    Seems to me you simply want to know beforehand that the reading will
    work. But you can never check that! You can stat(2) the file, or
    open-and-close it -- and then a microsecond later, someone deletes the
    file, or replaces it with another one, or write-protects it, or mounts
    a file system on top of its directory, or drops a nuke over the city,
    or ...
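
    One common way to live with that race, sketched here assuming Python 3,
    is to skip the advance check and let a single system call both test and
    act. The helper name create_new_file is hypothetical:

```python
import os


def create_new_file(path):
    """Create `path` only if it does not exist yet, without a separate check.

    Returns an open text file, or None if something is already there.
    """
    # Racy alternative (avoid): `if not os.path.exists(path): open(path, "w")`
    # leaves a window between the check and the open.
    flags = os.O_WRONLY | os.O_CREAT | os.O_EXCL
    try:
        fd = os.open(path, flags, 0o600)  # test and create in one system call
    except FileExistsError:
        return None
    return os.fdopen(fd, "w", encoding="utf-8")
```

    Other failures (bad name, missing directory, permissions) still surface
    as OSError, which is the point: the open itself is the check.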

    Two more notes:

    - os.open is not like os.system. If os.open ends up doing
    anything other than trying to open the file corresponding to the
    string you feed it, it's Python's fault, not yours.

    Compare with a language (does Perl allow this?) where if the string
    is "rm -rf /|", open will run "rm -rf /" and start reading its output.
    *That* interface would have been

    - if the OS ends up doing something different when calling open(2) or
    creat(2) or whatever using that string, it's the OS's fault, not
    yours.

    Or am I missing something?

    /Jorgen

    --
    // Jorgen Grahn <grahn@ Ph'nglui mglw'nafh Cthulhu
    \X/ snipabacken.se> R'lyeh wgah'nagl fhtagn!
     
    Jorgen Grahn, Nov 24, 2008
    #6
  7. r0g

    r0g Guest

    Jorgen Grahn wrote:
    > On Mon, 24 Nov 2008 00:44:45 -0500, r0g <> wrote:
    >> Hi there,
    >>
    >> I'm trying to validate some user input which is for the most part simple
    >> regexery however I would like to check filenames and I would like this
    >> code to be multiplatform.
    >>
    >> I had hoped the os module would have a function that would tell me if a
    >> proposed filename would be valid on the host system but it seems not. I
    >> have considered whitelisting but it seems a bit unfair to make the rest
    >> of the world suffer the naming restrictions of windows. Moreover it
    >> seems both inelegant and hard work to research the valid file/directory
    >> naming conventions of every platform that this app could conceivably run
    >> on and write regex's for all of them so...
    >>
    >> I'm tempted to go the witch dunking route, stick it in an open() between
    >> a Try: & Except: and see if it floats. However...
    >>
    >> Although it's a desktop (not internet facing) app I'm a little squeamish
    >> piping raw user input into a filesystem function like that and this app
    >> will be dealing with some particularly sensitive data so I want to be
    >> careful and minimize exposure where practical.

    >
    > Take the Unix 'ls' command (or MS-DOS 'dir'). That's two programs
    > which let users pipe raw input into the filesystem functions, and they
    > certainly have handled some very sensitive data over the years.
    >
    >> Has programming PHP and Web stuff for years made me overly paranoid
    >> about this [...]

    >
    > Yes. ;-)
    >
    > Please explain one thing: what are you looking for? It's not
    > "accesses a file outside the user's home directory", "accesses an
    > infinite file like /dev/zero" or something like that, or you would
    > have said so. Nor seems the "user" input come from some other user
    > than the one your program is running as, nor from some input source
    > which the user cannot be held responsible for.
    >
    > Seems to me you simply want to know beforehand that the reading will
    > work. But you can never check that! You can stat(2) the file, or
    > open-and-close it -- and then a microsecond later, someone deletes the
    > file, or replaces it with another one, or write-protects it, or mounts
    > a file system on top of its directory, or drops a nuke over the city,
    > or ...
    >
    > Two more notes:
    >
    > - os.open is not like os.system. If os.open ends up doing
    > anything other than trying to open the file corresponding to the
    > string you feed it, it's Python's fault, not yours.
    >
    > Compare with a language (does Perl allow this?) where if the string
    > is "rm -rf /|", open will run "rm -rf /" and start reading its output.
    > *That* interface would have been
    >
    > - if the OS ends up doing something different when calling open(2) or
    > creat(2) or whatever using that string, it's the OSes fault, not
    > yours.
    >
    > Or am I missing something?
    >
    > /Jorgen
    >


    No Jorgen, that's exactly what I needed to know i.e. that sending
    unfiltered text to open() is not negligent or likely to allow any
    badness to occur.

    As far as what I was looking for: I was not looking for anything in
    particular as I couldn't think of any specific cases where this could be
    a problem however... my background is websites (where input sanitization
    is rule number one) and some of the web exploits I've learned to
    mitigate over the years aren't ones I would have necessarily figured out
    for myself, e.g. CSRF. So I thought I'd ask you guys in case there's
    anything I haven't considered that I should consider! Thankfully it
    seems I don't have too much to worry about :)

    The only situation where I can foresee potential for mischief is if the
    program, or part thereof, is running as a more privileged user than the
    user it is accepting input from. Thankfully I don't think that will be
    necessary in the prog I'm working on right now as I don't need packet
    capture / low numbered ports etc.

    Thanks for your answer and thanks to everybody else for all their
    comments too.

    Roger.
     
    r0g, Nov 25, 2008
    #7
  8. Jorgen Grahn wrote:

    > Seems to me you simply want to know beforehand that the reading will
    > work. But you can never check that! You can stat(2) the file, or
    > open-and-close it -- and then a microsecond later, someone deletes the
    > file, or replaces it with another one, or write-protects it, or mounts
    > a file system on top of its directory, or drops a nuke over the city,
    > or ...


    Depends on what exactly you're trying to guard against. Your comments would apply, for example, to a set-uid program being run by a potentially hostile local user (except that Linux doesn't allow set-uid scripts).

    But if the script is being run, for example, via a Web interface, where processes on the local system can be trusted but the remote user cannot, then it is perfectly legitimate to use calls like stat(2) to enforce your own permission checks before allowing an operation.
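
    One such check, sketched under the assumption (not stated in the thread)
    that the remote user may only touch files below a single directory. The
    names is_within and base_dir are hypothetical, and the code assumes
    Python 3:

```python
import os


def is_within(base_dir, user_path):
    """Return True if `user_path`, resolved against `base_dir`, stays inside it."""
    base = os.path.realpath(base_dir)
    # realpath collapses ".." components and follows symlinks before comparing.
    # An absolute user_path replaces base entirely in os.path.join.
    target = os.path.realpath(os.path.join(base, user_path))
    try:
        return os.path.commonpath([base, target]) == base
    except ValueError:
        # Raised on Windows when the two paths are on different drives.
        return False
```

    This is only safe in the situation described above, where local
    processes are trusted not to swap files or symlinks between the check
    and the later open.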
     
    Lawrence D'Oliveiro, Nov 25, 2008
    #8
  9. Jorgen Grahn

    Jorgen Grahn Guest

    On Tue, 25 Nov 2008 20:40:57 +1300, Lawrence D'Oliveiro wrote:
    > Jorgen Grahn wrote:
    >
    >> Seems to me you simply want to know beforehand that the reading will
    >> work. But you can never check that! You can stat(2) the file, or
    >> open-and-close it -- and then a microsecond later, someone deletes the
    >> file, or replaces it with another one, or write-protects it, or mounts
    >> a file system on top of its directory, or drops a nuke over the city,
    >> or ...

    >


    > Depends on what exactly you're trying to guard against. Your
    > comments would apply, for example, to a set-uid program being run by a
    > potentially hostile local user


    Yeah, I know. I covered that in the part you snipped: "Nor does the
    'user' input seem to come from some other user than the one your
    program is running as, nor from some input source which the user
    cannot be held responsible for."

    /Jorgen

    --
    // Jorgen Grahn <grahn@ Ph'nglui mglw'nafh Cthulhu
    \X/ snipabacken.se> R'lyeh wgah'nagl fhtagn!
     
    Jorgen Grahn, Nov 25, 2008
    #9
  10. Jorgen Grahn

    Jorgen Grahn Guest

    On Tue, 25 Nov 2008 02:26:32 -0500, r0g wrote:
    > Jorgen Grahn wrote:

    ...
    >> Or am I missing something?


    > No Jorgen, that's exactly what I needed to know i.e. that sending
    > unfiltered text to open() is not negligent or likely to allow any
    > badness to occur.
    >
    > As far as what I was looking for: I was not looking for anything in
    > particular as I couldn't think of any specific cases where this could be
    > a problem however... my background is websites (where input sanitization
    > is rule number one) and some of the web exploits I've learned to
    > mitigate over the years aren't ones I would have necessarily figured out
    > for myself, e.g. CSRF


    I have no idea what CSRF is, but I know what you mean. And it applies
    in the safe and cozy Unix account world too -- that the exploits are
    surprising, I mean. Maybe I made it out to be *too* safe in my
    previous posting. But still ...

    > So I thought I'd ask you guys in case there's
    > anything I haven't considered that I should consider! Thankfully it
    > seems I don't have too much to worry about :)


    ... no, in this case you're just doing what everybody else does,
    and you have no alternative plan (filter for what?).

    There ought to be some list "common attacks on applications run by
    local Unix users" which one could learn from. Maybe it's not obvious
    that the content of a local file should, in many situations, be
    handled as untrusted. In the meantime, there's things like this:

    http://www.debian.org/security/2008/

    Many of them are local exploits.

    /Jorgen

    --
    // Jorgen Grahn <grahn@ Ph'nglui mglw'nafh Cthulhu
    \X/ snipabacken.se> R'lyeh wgah'nagl fhtagn!
     
    Jorgen Grahn, Nov 25, 2008
    #10
  11. News123

    News123 Guest

    Jorgen Grahn wrote:
    > Compare with a language (does Perl allow this?) where if the string
    > is "rm -rf /|", open will run "rm -rf /" and start reading its output.
    > *That* interface would have been


    Good example (for Perl):

    The problem doesn't exist in Python:
    open("rm -rf / |") would try to open a file with exactly that name, and
    it would fail if it doesn't exist.

    In Perl, the script author has the choice to be safe (three-argument
    open) or to allow stupid or nice things with a two-argument open.

    In Perl:
    open($fh, "rm -rf / |") would execute the command "rm -rf /" and pass
    its output to Perl.

    In Perl:
    open($fh, "<", "rm -rf / |") would work as in Python.


    The only similar pitfall for Python would be popen() in a context like
    filename = userinput()
    p = os.popen("md5sum " + filename)
    Here you would get unexpected behavior if filename were something like
    "bla ; rm -rf /"

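    A hedged sketch of how to sidestep this in Python 3: hand the child
    process an argument list, so that no shell ever parses the filename. The
    helper run_on_file is hypothetical, and a small Python child stands in
    for md5sum so that the example runs on any platform:

```python
import subprocess
import sys


def run_on_file(filename):
    """Pass `filename` to a child process as exactly one argument, no shell."""
    # Stand-in for md5sum: the child just echoes its argument back. With
    # md5sum installed, the list would be ["md5sum", "--", filename].
    argv = [sys.executable, "-c", "import sys; print(sys.argv[1])", filename]
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout.rstrip("\n")


# The ";" is never interpreted: the child receives the whole string unchanged.
print(run_on_file("bla ; rm -rf /"))
```

    Because no shell is involved, there is nothing to quote; the string
    reaches the child exactly as typed.
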

    Sometimes I miss the 'dangerous variation' in Python and I explicitly
    add code in Python so that the filename '-' will be treated as stdin for
    files to be read and as stdout for files to be written to
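
    Such a wrapper might look roughly like this, assuming Python 3 and
    text-mode files only; the name open_dash is invented for the example:

```python
import sys


def open_dash(filename, mode="r"):
    """Open `filename`, treating "-" as stdin (reading) or stdout (writing)."""
    if filename == "-":
        # Only an exact "-" is special; every other string is a plain filename.
        return sys.stdin if "r" in mode else sys.stdout
    return open(filename, mode, encoding="utf-8")
```

    Callers should take care not to close the returned object when it is
    sys.stdin or sys.stdout.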

    bye N
     
    News123, Nov 25, 2008
    #11
  12. Jorgen Grahn

    Jorgen Grahn Guest

    On Tue, 25 Nov 2008 23:37:25 +0100, News123 wrote:
    > Jorgen Grahn wrote:
    >> Compare with a language (does Perl allow this?) where if the string
    >> is "rm -rf /|", open will run "rm -rf /" and start reading its output.
    >> *That* interface would have been


    > Good example. (for perl):


    I should actually have removed that paragraph from my posting.
    I was about to write "*That* interface would have been dangerous!" but
    then I thought "Hm, isn't the user supposed to be in control of that
    string, and isn't it his fault if he enters 'rm -rf /|', just as if
    he entered the name of his most valuable file?"

    I don't know ...

    > The problem doesn't exist in python
    > open("rm -rf / |") would try to open a file with exactly that name and
    > it would fail if it doesn't exist.
    >
    > In perl the perl script author has the choice to be safe (three argument
    > open) or to allow stupid or nice things with a two argument open.


    ...

    > Sometimes I miss the 'dangerous variation' in Python and I explicitly
    > add code in Python so that the filename '-' will be treated as stdin for
    > files to be read and as stdout for files to be written to


    That's something I frequently do, too. And I see no harm in it, if I
    document it and people expect it (for those who don't know, reserving
    '-' for this is a Unix tradition).

    /Jorgen

    --
    // Jorgen Grahn <grahn@ Ph'nglui mglw'nafh Cthulhu
    \X/ snipabacken.se> R'lyeh wgah'nagl fhtagn!
     
    Jorgen Grahn, Nov 26, 2008
    #12
