Security implications of using open() on untrusted strings.

r0g · Nov 24, 2008

Hi there,

I'm trying to validate some user input which is for the most part simple
regexery however I would like to check filenames and I would like this
code to be multiplatform.

I had hoped the os module would have a function that would tell me if a
proposed filename would be valid on the host system but it seems not. I
have considered whitelisting but it seems a bit unfair to make the rest
of the world suffer the naming restrictions of windows. Moreover it
seems both inelegant and hard work to research the valid file/directory
naming conventions of every platform that this app could conceivably run
on and write regexes for all of them so...

I'm tempted to go the witch dunking route, stick it in an open() between
a try: & except: and see if it floats. However...

Although it's a desktop (not internet facing) app I'm a little squeamish
piping raw user input into a filesystem function like that and this app
will be dealing with some particularly sensitive data so I want to be
careful and minimize exposure where practical.

Has programming PHP and Web stuff for years made me overly paranoid
about this, or should I still be scrubbing input like this before I
feed it to filesystem functions? If so, does anyone know of a module
that may help, or have any other advice?

Note: In this particular case the user input is only specifying the name
of a file that will be opened for writing _not_ reading and the
interface is GUI only (wxWidgets).

Regards,

Roger.

Steven D'Aprano · Nov 24, 2008

r0g said:
Hi there,

I'm trying to validate some user input which is for the most part simple
regexery however I would like to check filenames and I would like this
code to be multiplatform.

I had hoped the os module would have a function that would tell me if a
proposed filename would be valid on the host system but it seems not. I
have considered whitelisting but it seems a bit unfair to make the rest
of the world suffer the naming restrictions of windows. Moreover it
seems both inelegant and hard work to research the valid file/directory
naming conventions of every platform that this app could conceivably run
on and write regexes for all of them so...

That's probably why nobody has written a function for the os module to do
the same... and just wait until you get into the murky universe of cross-
platform Unicode filenames.

Honestly, I think your best bet is to just trust the file system to
recognize a bad file name and raise an exception. What counts as a bad
file name is surprisingly hard to define, especially if you want to be
cross-platform. See here for more details:

http://stackoverflow.com/questions/295135/turn-a-string-into-a-valid-filename-in-python
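
A minimal sketch of the "let the file system decide" approach, assuming Python 3. The helper name `try_create` is hypothetical and only illustrates the idea; note that mode "w" truncates an existing file, which may or may not be what the application wants.

```python
def try_create(path):
    """Try to open `path` for writing.

    Returns (file_object, None) on success, or (None, exception) if the
    platform or Python rejects the name.
    """
    try:
        return open(path, "w"), None
    except (OSError, ValueError) as exc:
        # OSError: the OS refused (illegal name, missing directory,
        # no permission, ...). ValueError: Python refuses names that
        # contain an embedded NUL character.
        return None, exc
```

The caller can then report `exc` back to the user (for example in a GUI dialog) instead of trying to predict in advance which names the host system will accept.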

r0g · Nov 24, 2008

Steven said:
That's probably why nobody has written a function for the os module to do
the same... and just wait until you get into the murky universe of cross-
platform Unicode filenames.

Honestly, I think your best bet is to just trust the file system to
recognize a bad file name and raise an exception. What counts as a bad
file name is surprisingly hard to define, especially if you want to be
cross-platform. See here for more details:

http://stackoverflow.com/questions/295135/turn-a-string-into-a-valid-filename-in-python

Yep, I spotted that too which is why white-listing is my fallback plan.
My question is really about the security of using unfiltered data in a
filesystem function though. Are there particular exploits that could
make use of such unfiltered calls? For example I'd imagine jailbreaking
might be a concern if the app isn't run under its own restricted user
account. Do others here consider this when designing applications and
what techniques/modules, if any, do you use to sanitize path/filename input?

Roger.

Thomas Bellman · Nov 24, 2008

r0g said:
Although it's a desktop (not internet facing) app I'm a little squeamish
piping raw user input into a filesystem function like that and this app
will be dealing with some particularly sensitive data so I want to be
careful and minimize exposure where practical.

Has programming PHP and Web stuff for years made me overly paranoid
about this, or should I still be scrubbing input like this before I
feed it to filesystem functions? If so, does anyone know of a module
that may help, or have any other advice?

Note: In this particular case the user input is only specifying the name
of a file that will be opened for writing _not_ reading and the
interface is GUI only (wxWidgets).

Is the user *running* the application the same as the user who
feeds it input? If it is, then there is no need to filter the
filenames, since that user could just do "rm bad-file" (or "DEL
BAD-FILE" on MS-Windows) anyway to destroy it.

(Of course, if you are passing the filename to, e.g, os.system(),
you would need to quote it properly, but that is to avoid
surprising the user; it is one thing to let the user overwrite a
file named "foo; rm -rf $HOME", quite another to pass that string
unquoted to /bin/sh when the user thought he was just typing a
filename.)
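
As a rough illustration of the quoting point, here is a sketch using the standard library's `shlex.quote`, which targets POSIX shells. The function name `shell_command_for` and the choice of `ls -l` are assumptions made for the example, not something from the original post.

```python
import shlex


def shell_command_for(filename):
    """Build a command line that is safe to hand to a POSIX shell."""
    # shlex.quote wraps the name in single quotes when it contains
    # characters the shell would otherwise interpret.
    return "ls -l -- " + shlex.quote(filename)


print(shell_command_for("foo; rm -rf $HOME"))
# prints: ls -l -- 'foo; rm -rf $HOME'
```

With the quoting in place the shell sees one odd-looking filename rather than two commands.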

Terry Reedy · Nov 24, 2008

r0g said:
Yep, I spotted that too which is why white-listing is my fallback plan.
My question is really about the security of using unfiltered data in a
filesystem function though. Are there particular exploits that could
make use of such unfiltered calls?

The classic one would be submitting a filename such as 'a'*1000, but
current OSes should be immune from that sort of thing by now.

For example I'd imagine jailbreaking

Jorgen Grahn · Nov 24, 2008

r0g said:
Hi there,

I'm trying to validate some user input which is for the most part simple
regexery however I would like to check filenames and I would like this
code to be multiplatform.

I had hoped the os module would have a function that would tell me if a
proposed filename would be valid on the host system but it seems not. I
have considered whitelisting but it seems a bit unfair to make the rest
of the world suffer the naming restrictions of windows. Moreover it
seems both inelegant and hard work to research the valid file/directory
naming conventions of every platform that this app could conceivably run
on and write regexes for all of them so...

I'm tempted to go the witch dunking route, stick it in an open() between
a try: & except: and see if it floats. However...

Although it's a desktop (not internet facing) app I'm a little squeamish
piping raw user input into a filesystem function like that and this app
will be dealing with some particularly sensitive data so I want to be
careful and minimize exposure where practical.

Take the Unix 'ls' command (or MS-DOS 'dir'). That's two programs
which let users pipe raw input into the filesystem functions, and they
certainly have handled some very sensitive data over the years.

Has programming PHP and Web stuff for years made me overly paranoid
about this [...]

Yes. ;-)

Please explain one thing: what are you looking for? It's not
"accesses a file outside the user's home directory", "accesses an
infinite file like /dev/zero" or something like that, or you would
have said so. Nor does the "user" input seem to come from some other user
than the one your program is running as, nor from some input source
which the user cannot be held responsible for.

Seems to me you simply want to know beforehand that the reading will
work. But you can never check that! You can stat(2) the file, or
open-and-close it -- and then a microsecond later, someone deletes the
file, or replaces it with another one, or write-protects it, or mounts
a file system on top of its directory, or drops a nuke over the city,
or ...
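
One way to act on this, sketched under the assumption of Python 3: skip the separate check and let the open itself be the test. Mode "x" asks the OS to create the file only if it does not already exist, as a single step, so there is no gap between checking and creating. The name `create_new` is hypothetical.

```python
def create_new(path):
    """Create `path` for writing, or return None if it already exists."""
    try:
        # "x" = exclusive creation: fails instead of overwriting.
        return open(path, "x")
    except FileExistsError:
        return None
```

Other failures (bad name, no permission) still surface as `OSError` and can be handled by the caller in the same way.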

Two more notes:

- os.open is not like os.system. If os.open ends up doing
anything other than trying to open the file corresponding to the
string you feed it, it's Python's fault, not yours.

Compare with a language (does Perl allow this?) where if the string
is "rm -rf /|", open will run "rm -rf /" and start reading its output.
*That* interface would have been

- if the OS ends up doing something different when calling open(2) or
creat(2) or whatever using that string, it's the OS's fault, not
yours.

Or am I missing something?

/Jorgen

r0g · Nov 25, 2008

Jorgen said:
Or am I missing something?

/Jorgen

No Jorgen, that's exactly what I needed to know i.e. that sending
unfiltered text to open() is not negligent or likely to allow any
badness to occur.

As far as what I was looking for: I was not looking for anything in
particular as I couldn't think of any specific cases where this could be
a problem however... my background is websites (where input sanitization
is rule number one) and some of the web exploits I've learned to
mitigate over the years aren't ones I would have necessarily figured out
for myself, e.g. CSRF. So I thought I'd ask you guys in case there's
anything I haven't considered that I should consider! Thankfully it
seems I don't have too much to worry about.

The only situation where I can foresee potential for mischief is if the
program, or part thereof, is running as a more privileged user than the
user it is accepting input from. Thankfully I don't think that will be
necessary in the prog I'm working on right now as I don't need packet
capture / low numbered ports etc.

Thanks for your answer and thanks to everybody else for all their
comments too.

Roger.

Lawrence D'Oliveiro · Nov 25, 2008

Jorgen said:
Seems to me you simply want to know beforehand that the reading will
work. But you can never check that! You can stat(2) the file, or
open-and-close it -- and then a microsecond later, someone deletes the
file, or replaces it with another one, or write-protects it, or mounts
a file system on top of its directory, or drops a nuke over the city,
or ...

Depends on what exactly you're trying to guard against. Your comments would apply, for example, to a set-uid program being run by a potentially hostile local user (except that Linux doesn't allow set-uid scripts).

But if the script is being run, for example, via a Web interface, where processes on the local system can be trusted but the remote user cannot, then it is perfectly legitimate to use calls like stat(2) to enforce your own permission checks before allowing an operation.
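
For the case where the person supplying the name is not trusted, a common application-level check is to confine names to one directory. The following is only a sketch: `resolve_inside` is a hypothetical helper, and it assumes local processes are trusted, since a symlink created after the check could still redirect the path.

```python
import os


def resolve_inside(base_dir, user_name):
    """Return the absolute path for `user_name`, refusing to leave `base_dir`."""
    base = os.path.realpath(base_dir)
    # realpath collapses ".." components and follows existing symlinks.
    target = os.path.realpath(os.path.join(base, user_name))
    if os.path.commonpath([base, target]) != base:
        raise ValueError("path escapes the allowed directory: %r" % user_name)
    return target
```

A name such as "report.txt" resolves to a path under `base_dir`, while "../escape.txt" or an absolute path elsewhere raises `ValueError`.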

Jorgen Grahn · Nov 25, 2008

Depends on what exactly you're trying to guard against. Your
comments would apply, for example, to a set-uid program being run by a
potentially hostile local user

Yeah, I know. I covered that in the part you snipped: "Nor does the
'user' input seem to come from some other user than the one your program is
running as, nor from some input source which the user cannot be held
responsible for."

/Jorgen

Jorgen Grahn · Nov 25, 2008

r0g said:

No Jorgen, that's exactly what I needed to know i.e. that sending
unfiltered text to open() is not negligent or likely to allow any
badness to occur.

As far as what I was looking for: I was not looking for anything in
particular as I couldn't think of any specific cases where this could be
a problem however... my background is websites (where input sanitization
is rule number one) and some of the web exploits I've learned to
mitigate over the years aren't ones I would have necessarily figured out
for myself, e.g. CSRF

I have no idea what CSRF is, but I know what you mean. And it applies
in the safe and cozy Unix account world too -- that the exploits are
surprising, I mean. Maybe I made it out to be *too* safe in my
previous posting. But still ...

So I thought I'd ask you guys in case there's
anything I haven't considered that I should consider! Thankfully it
seems I don't have too much to worry about

... no, in this case you're just doing what everybody else does,
and you have no alternative plan (filter for what?).

There ought to be some list "common attacks on applications run by
local Unix users" which one could learn from. Maybe it's not obvious
that the content of a local file should, in many situations, be
handled as untrusted. In the meantime, there's things like this:

http://www.debian.org/security/2008/

Many of them are local exploits.

/Jorgen

News123 · Nov 25, 2008

Jorgen said:
Compare with a language (does Perl allow this?) where if the string
is "rm -rf /|", open will run "rm -rf /" and start reading its output.
*That* interface would have been

Good example (for Perl).

The problem doesn't exist in Python:
open("rm -rf / |") would try to open a file with exactly that name, and
it would fail if it doesn't exist.

In Perl the script author has the choice to be safe (three-argument
open) or to allow stupid or nice things with a two-argument open.

In Perl:
open($fh, "rm -rf / |") would execute the command "rm -rf /" and pass
its output to Perl.

In Perl:
open($fh, "<", "rm -rf / |") would work as in Python.

The only similar pitfall in Python would be popen() in a context like
filename = userinput()
p = os.popen("md5sum " + filename)
Here you would get unexpected behavior if filename were something like
"bla ; rm -rf /".
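
A sketch of one way around this pitfall in current Python 3 (3.7 or later): pass the command as a list to `subprocess.run`, so no shell ever parses the filename. To keep the example portable it runs the Python interpreter as a stand-in for md5sum; that substitution and the name `echo_argument` are assumptions made for illustration.

```python
import subprocess
import sys


def echo_argument(filename):
    """Run a child process with `filename` as one argument, without a shell."""
    # Stand-in for ["md5sum", filename]: the child prints the argument
    # it received, which shows the string arrives intact.
    child_code = "import sys; print(sys.argv[1])"
    result = subprocess.run(
        [sys.executable, "-c", child_code, filename],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.rstrip("\n")


print(echo_argument("bla ; rm -rf /"))
# prints: bla ; rm -rf /
```

Because the argument list goes straight to the child process, the ";" is just another character in the name.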

Sometimes I miss the 'dangerous variation' in Python, and I explicitly
add code so that the filename '-' is treated as stdin for
files to be read and as stdout for files to be written.

bye N

Jorgen Grahn · Nov 26, 2008

Good example. (for perl):

I should actually have removed that paragraph from my posting.
I was about to write "*That* interface would have been dangerous!" but
then I thought "Hm, isn't the user supposed to be in control of that
string, and isn't it his fault if he enters 'rm -rf / |', just as if
he entered the name of his most valuable file?"

I don't know ...

The problem doesn't exist in python
open("rm -rf / |") would try to open a file with exactly that name and
it would fail if it doesn't exist.

In perl the perl script author has the choice to be safe (three argument
open) or to allow stupid or nice things with a two argument open.
....

Sometimes I miss the 'dangerous variation' in python and I explicitly
add code in python that the filename '-' will be treated as stdin for
files to be read and as stdout for files to be written to

That's something I frequently do, too. And I see no harm in it, if I
document it and people expect it (for those who don't know, reserving
'-' for this is a Unix tradition).
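
A small sketch of that convention, assuming text-mode files; the helper name `open_dash` is hypothetical.

```python
import sys


def open_dash(name, mode="r"):
    """Open `name`, treating '-' as stdin (reading) or stdout (writing)."""
    if name == "-":
        writing = any(flag in mode for flag in "wax")
        return sys.stdout if writing else sys.stdin
    return open(name, mode)
```

Callers should take care not to close the returned object when it is one of the standard streams.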

/Jorgen
