suggestions please: "what should i watch for/guard against" in a file upload situation?

geekbuntu · Oct 6, 2010

in general, what are things i would want to 'watch for/guard against'
in a file upload situation?

i have my file upload working (in the self-made framework @ work
without any concession for multipart form uploads), but was told to
make sure it's cleansed and cannot do any harm inside the system.

my checklist so far is basically to check the extension - ensure it
has 3 places, ensure it's in the allowed list (like jpg gif etc...).

not sure what else i could do to guard against anything bad
happening. maybe the file name itself could cause grief?

not sure but any suggestions or examples are most welcome

Seebs · Oct 6, 2010

in general, what are things i would want to 'watch for/guard against'
in a file upload situation?

This question has virtually nothing to do with Python, which means you
may not get very good answers.

my checklist so far is basically to check the extension - ensure it
has 3 places, ensure it's in the allowed list (like jpg gif etc...).

This strikes me as 100% irrelevant. Who cares what the extension is?

not sure what else i could do to guard against anything bad
happening. maybe the file name itself could cause grief?

Obvious things:

* File name causes files to get created outside some particular
upload directory ("../foo")
* File name has spaces
* Crazy stuff like null bytes in file name
* File names which might break things if a user carelessly interacts
with them, such as "foo.jpg /etc/passwd bar.jpg" (all one file name
including two spaces).

Basically, the key question is, could a hostile user come up with
input to your script which could break something?
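
A minimal sketch of a check for the file-name hazards listed above, assuming the name arrives as a Python string. The function name `looks_hostile` is made up for illustration, and rejecting every whitespace character is one possible policy, not the only reasonable one.

```python
def looks_hostile(filename):
    """Return True if an untrusted file name shows one of the hazards above."""
    if not filename or "\x00" in filename:
        return True   # empty name, or an embedded null byte
    if "/" in filename or "\\" in filename:
        return True   # path separators, as in "../foo"
    if filename in (".", ".."):
        return True   # refers to a directory, not a file
    if any(ch.isspace() for ch in filename):
        return True   # spaces, tabs, newlines
    return False
```

A name that fails the check can be rejected outright or replaced by a server-generated one; which is better depends on the application.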

-s

Tim Chase · Oct 6, 2010

Obvious things:

* File name causes files to get created outside some particular
upload directory ("../foo")
* File name has spaces
* Crazy stuff like null bytes in file name
* File names which might break things if a user carelessly interacts
with them, such as "foo.jpg /etc/passwd bar.jpg" (all one file name
including two spaces).

And depending on the system, Win32 chokes on filenames like
"nul", "con", "com1"..."comN", "lpt1"..."lptN", and a bunch of
others.
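
A rough sketch of a check for those device names, assuming the usual list (CON, PRN, AUX, NUL, COM1 to COM9, LPT1 to LPT9) and assuming that a reserved name followed by an extension is treated the same way. The helper name `is_windows_reserved` is hypothetical, and the list may not be complete for every Windows version.

```python
# Device names that Windows reserves regardless of letter case.
WINDOWS_RESERVED = (
    {"CON", "PRN", "AUX", "NUL"}
    | {"COM%d" % i for i in range(1, 10)}
    | {"LPT%d" % i for i in range(1, 10)}
)


def is_windows_reserved(filename):
    """Return True if the part before the first dot is a reserved device name."""
    stem = filename.split(".", 1)[0]
    return stem.rstrip(" ").upper() in WINDOWS_RESERVED
```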

-tkc

Diez B. Roggisch · Oct 6, 2010

Seebs said:
This question has virtually nothing to do with Python, which means you
may not get very good answers.

In contrast to "comp.super.web.experts"? There are quite a few people
with web-experience here I'd say.

This strikes me as 100% irrelevant. Who cares what the extension is?

Given that most people are not computer savvy (always remember, the
default for windows is to hide extensions..), using it client-side can
be valuable to prevent long uploads that eventually need to be rejected
otherwise (no mom, you can't upload word-docs as profile pictures).

Obvious things:

* File name causes files to get created outside some particular
upload directory ("../foo")

Or rather just store that as a simple meta-info, as allowing even the
best-intended "me-in-cool-pose.jpg" to overwrite that of the one other
cool guy using the website isn't gonna fly anyway.
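
One way to treat the name as pure meta-information, sketched under the assumption that the server picks its own storage name and only records what the client sent. The function `new_storage_record` and its dictionary keys are invented for this example.

```python
import uuid


def new_storage_record(original_name, uploader):
    """Pick a server-side name; keep the client's name only as metadata."""
    return {
        "storage_name": uuid.uuid4().hex,  # random name, never taken from the client
        "original_name": original_name,    # for display only, never used as a path
        "uploader": uploader,
    }
```

Two users uploading "me-in-cool-pose.jpg" then get two different storage names, so neither overwrites the other.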

* File name has spaces

See above, but other than that - everything but shell-scripts deals well
with it.

* Crazy stuff like null bytes in file name
* File names which might break things if a user carelessly interacts
with them, such as "foo.jpg /etc/passwd bar.jpg" (all one file name
including two spaces).

Your strange focus on file-names that are pure meta information is a
little bit concerning...

Basically, the key question is, could a hostile user come up with
input to your script which could break something?

Certainly good advice. But that's focused less on filenames or file-uploads than
on the whole subject of processing HTTP requests. Which would make a
point for *not* using a home-grown framework.

But then, Python is a bit less likely to suffer from buffer overflow or
similar kind of attacks.

Diez

Martin Gregorie · Oct 6, 2010

in general, what are things i would want to 'watch for/guard against' in
a file upload situation?

i have my file upload working (in the self-made framework @ work without
any concession for multipart form uploads), but was told to make sure
it's cleansed and cannot do any harm inside the system.

Off the top of my head, and assuming that you get passed the exact
filename that the user entered:

- The user may need to use an absolute pathname to upload a file
that isn't in his current directory, so retain only the basename
by discarding the rightmost slash and everything to the left of it:
/home/auser/photos/my_photo.jpg ===> my_photo.jpg
c:\My Photos\My Photo.jpg ===> My Photo.jpg

- If your target system doesn't like spaces in names or you want to be
on the safe side there, replace spaces in the name with underscores:
My Photo.jpg ===> My_Photo.jpg

- reject any filenames that could cause the receiving system to do
dangerous things, e.g. .EXE or .SCR if the upload target is Windows.
This list will be different for each upload target, so make it
configurable.

You can't assume anything else about the extension.
.py .c .txt and .html are all valid in the operating systems I use
and so are their capitalised equivalents.

- check whether the file already exists. You need
rules about what to do if it exists: do you reject the upload,
silently overwrite, or alter the name, e.g. by adding a numeric
suffix to make the name unique?

my_photo.jpg ===> my_photo-01.jpg

- run the application in your upload target directory and put the
uploaded file there or, better, into a configured uploads directory
by prepending it to the file name:

my_photo.jpg ===> /home/upload_user/uploads/my_photo.jpg

- make sure you document the process so that a user can work out
what has happened to his file and why if you have to reject it
or alter its name.
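
The basename, underscore, numeric-suffix and uploads-directory steps from the list above could be sketched roughly as follows. The helper names are made up, and the existence check is not safe against two uploads racing for the same name; a real implementation would need an exclusive create or a lock.

```python
import os
import re


def client_basename(name):
    """Drop everything up to the rightmost slash or backslash."""
    return re.split(r"[\\/]", name)[-1]


def tidy(name):
    """Basename with spaces replaced by underscores."""
    return client_basename(name).replace(" ", "_")


def unique_path(upload_dir, name):
    """Prepend the uploads directory; add -01, -02, ... if the name is taken."""
    stem, ext = os.path.splitext(name)
    candidate = os.path.join(upload_dir, name)
    n = 0
    while os.path.exists(candidate):  # note: racy without a lock or exclusive create
        n += 1
        candidate = os.path.join(upload_dir, "%s-%02d%s" % (stem, n, ext))
    return candidate
```

For example, `tidy(r"c:\My Photos\My Photo.jpg")` gives `My_Photo.jpg`, which can then be passed to `unique_path` together with the configured uploads directory.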

not sure but any suggestions or examples are most welcome

There's probably something I've forgotten, but that list should get you
going.

MRAB · Oct 6, 2010

Off the top of my head, and assuming that you get passed the exact
filename that the user entered:

- The user may need to use an absolute pathname to upload a file
that isn't in his current directory, so retain only the basename
by discarding the rightmost slash and everything to the left of it:
/home/auser/photos/my_photo.jpg ===> my_photo.jpg
c:\My Photos\My Photo.jpg ===> My Photo.jpg

- If your target system doesn't like spaces in names or you want to be
on the safe side there, replace spaces in the name with underscores:
My Photo.jpg ===> My_Photo.jpg

- reject any filenames that could cause the receiving system to do
dangerous things, e.g. .EXE or .SCR if the upload target is Windows.
This list will be different for each upload target, so make it
configurable.

You can't assume anything else about the extension.
.py .c .txt and .html are all valid in the operating systems I use
and so are their capitalised equivalents.

A whitelist is better than a blacklist; instead of rejecting what you
know could be dangerous, accept what you know _isn't_ dangerous.
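
A small sketch of the whitelist approach, assuming an image-only upload. The set of extensions and the name `extension_allowed` are placeholders, and the check says nothing about what the file actually contains.

```python
import os

# Hypothetical whitelist; make it configurable per upload target.
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".gif", ".png"}


def extension_allowed(filename, allowed=ALLOWED_EXTENSIONS):
    """Accept only names whose final extension is on the whitelist."""
    ext = os.path.splitext(filename)[1].lower()
    return ext in allowed
```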

- check whether the file already exists. You need
rules about what to do if it exists: do you reject the upload,
silently overwrite, or alter the name, e.g. by adding a numeric
suffix to make the name unique?

my_photo.jpg ===> my_photo-01.jpg

- run the application in your upload target directory and put the
uploaded file there or, better, into a configured uploads directory
by prepending it to the file name:

my_photo.jpg ===> /home/upload_user/uploads/my_photo.jpg

- make sure you document the process so that a user can work out
what has happened to his file and why if you have to reject it
or alter its name.

There's probably something I've forgotten, but that list should get you
going.

Maximum file size, perhaps?

Seebs · Oct 6, 2010

In contrast to "comp.super.web.experts"? There are quite a few people
with web-experience here I'd say.

Oh, certainly. But in general, I try to ask questions in a group focused
on their domain, rather than merely a group likely to contain people who
would for other reasons have the relevant experience. I'm sure that a great
number of Python programmers have experience with sex, that doesn't make
this a great newsgroup for sex tips. (Well, maybe it does.)

Given that most people are not computer savvy (always remember, the
default for windows is to hide extensions..), using it client-side can
be valuable to prevent long uploads that eventually need to be rejected
otherwise (no mom, you can't upload word-docs as profile pictures).

That's a good point. On the other hand, there's a corollary; you may want
to look at the contents of the file in case they're not really what they're
supposed to be.
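
Looking at the contents can start with the first few bytes. The sketch below assumes only JPEG, GIF and PNG matter and compares the start of the data with their well-known signatures; `sniff_image_type` is an invented name, and a matching signature does not prove the rest of the file is well-formed or harmless.

```python
# Leading bytes ("magic numbers") of a few common image formats.
SIGNATURES = {
    "jpeg": (b"\xff\xd8\xff",),
    "gif": (b"GIF87a", b"GIF89a"),
    "png": (b"\x89PNG\r\n\x1a\n",),
}


def sniff_image_type(head):
    """Guess the format from the first bytes of the upload, or return None."""
    for kind, prefixes in SIGNATURES.items():
        if head.startswith(prefixes):
            return kind
    return None
```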

Your strange focus on file-names that are pure meta information is a
little bit concerning...

If you're uploading files "into a directory", then it is quite likely that
you're getting file names from somewhere. Untrusted file names are a much
more effective attack vector, in most cases, than EXIF information.

Certainly good advice. But that's focused less on filenames or file-uploads than
on the whole subject of processing HTTP requests. Which would make a
point for *not* using a home-grown framework.

Well, yeah. I was assuming that the home-grown framework was mandatory for
some reason. Possibly a very important reason, such as "otherwise we won't
have written it ourselves".

-s

Terry Reedy · Oct 6, 2010

in general, what are things i would want to 'watch for/guard against'
in a file upload situation?

i have my file upload working (in the self-made framework @ work
without any concession for multipart form uploads), but was told to
make sure it's cleansed and cannot do any harm inside the system.

my checklist so far is basically to check the extension - ensure it
has 3 places, ensure it's in the allowed list (like jpg gif etc...).

not sure what else i could do to guard against anything bad
happening. maybe the file name itself could cause grief?

not sure but any suggestions or examples are most welcome

I am not sure whether anyone mentioned limiting the file size, checking
the incoming header, and aborting an upload if it goes over anyway. Most
sites do not want 10 gigabyte files ;-).
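
A minimal sketch of the "abort if it goes over anyway" part, assuming the request body is available as a file-like object. `copy_limited` and `UploadTooLarge` are hypothetical names, and the chunk size is arbitrary.

```python
class UploadTooLarge(Exception):
    """Raised when the body exceeds the configured limit."""


def copy_limited(src, dst, max_bytes, chunk_size=64 * 1024):
    """Copy src to dst in chunks, aborting once more than max_bytes were read."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            return total
        total += len(chunk)
        if total > max_bytes:
            raise UploadTooLarge("upload exceeds %d bytes" % max_bytes)
        dst.write(chunk)
```

Because the copy works chunk by chunk, an oversized upload is cut off without ever being held in memory as a whole; the caller still has to delete the partial file.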

Steven D'Aprano · Oct 6, 2010

in general, what are things i would want to 'watch for/guard against' in
a file upload situation?

i have my file upload working (in the self-made framework @ work without
any concession for multipart form uploads), but was told to make sure
it's cleansed and cannot do any harm inside the system.

Make sure *what* is cleansed? Your code? The uploaded files? Define
"cleansed".

Do you have to block viruses, malware, spybots, illegal pornography,
legal pornography, illegal content, warez, copyright violations, stolen
trade secrets, "dirty" words, pictures of cats?

What operating system are you uploading to?

What happens if somebody tries to upload a 1 TB file to your server?

What happens if they try to upload a billion 1 KB files instead?

my checklist so far is basically to check the extension - ensure it has
3 places, ensure it's in the allowed list (like jpg gif etc...).

Do you have something against file extensions like .gz or .jpeg ?

I'm not sure why you think you need to check the file extension.

not sure what else i could do to guard against anything bad happening.
maybe the file name itself could cause grief?

You think?

What happens if the file name has characters in it that your file system
can't deal with? Bad unicode, binary bytes, slashes, colons, question
marks, asterisks, etc.
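
One conservative answer, sketched here, is to keep only a small set of ASCII characters and replace everything else. The allowed set and the name `conservative_name` are assumptions for illustration; a site that needs non-ASCII names would want a different policy.

```python
import re
import unicodedata

# Anything outside letters, digits, dot, underscore and hyphen is replaced.
_UNSAFE = re.compile(r"[^A-Za-z0-9._-]")


def conservative_name(filename):
    """Reduce an untrusted name to a small, portable character set."""
    text = unicodedata.normalize("NFKD", filename)
    text = text.encode("ascii", "ignore").decode("ascii")  # drop non-ASCII leftovers
    text = _UNSAFE.sub("_", text).lstrip(".")              # no separators, no leading dots
    return text or "unnamed"
```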

What about trying to break out of your file storage area using .. paths?
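
A sketch of one way to catch that, assuming the storage area is a single directory: resolve the final path and confirm it is still inside that directory. `resolve_inside` is a made-up helper name.

```python
import os


def resolve_inside(base_dir, untrusted_name):
    """Return the resolved path, or raise ValueError if it leaves base_dir."""
    base = os.path.realpath(base_dir)
    target = os.path.realpath(os.path.join(base, untrusted_name))
    # commonpath compares whole path components, so "/uploads_evil"
    # is not mistaken for something inside "/uploads".
    if os.path.commonpath([base, target]) != base:
        raise ValueError("path escapes the upload directory")
    return target
```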

Without knowing what your file upload code actually does, it's hard to
give specific advice.

Diez B. Roggisch · Oct 6, 2010

Seebs said:
Oh, certainly. But in general, I try to ask questions in a group focused
on their domain, rather than merely a group likely to contain people who
would for other reasons have the relevant experience. I'm sure that a great
number of Python programmers have experience with sex, that doesn't make
this a great newsgroup for sex tips. (Well, maybe it does.)

As the OP asked about a Python web framework (self written or not), I
think all advice that can be given is certainly more related to Python
than to airy references to general web programming such as
"oh, make sure if your server side application environment hasn't any
security issues."

Or, to be more concrete: what NG would you suggest for frameworks or webapps
written in python to ask this question?

That's a good point. On the other hand, there's a corollary; you may want
to look at the contents of the file in case they're not really what they're
supposed to be.

For sure. But the focus of you and others seems to be the file-name,
as if that was anything especially dangerous. As a matter of fact, it's a
parameter to a multipart/form-data encoded request body parameter
definition, and as such is rather locked down in terms of
null-bytes and such. So you are pretty safe as long as you

- use standard library request parsing modules such as cgi. If
one insists on reading streams bytewise and using ctypes to poke the
results into memory, you can of course provoke unimaginable havoc..

- don't use the filename for anything but meta-info. And usually, they
are simply regarded as "nice that you've provided us with it, we try
& make our best to fill an <img alt> attribute with the basename".
But not more. Worth pointing out to the OP to do that. But this is
*not* a matter of mapping HTTP-request paths to directories I'd wager
to say.

Something that is of much more importance (I should have mentioned
earlier, shame on me) is of course file-size. Denying requests that come
with CONTENT_LENGTH over a specified limit, of course respecting
CONTENT_LENGTH and not reading beyond it, and possibly dealing with
chunked-encodings in similarly safe ways (I have to admit I haven't yet
dealt with one of those myself on a visceral level -
but as they are part of the HTTP-spec...) is important,
as otherwise DOS attacks are possible.
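
The CONTENT_LENGTH check might look roughly like this, assuming a CGI- or WSGI-style environment mapping. The limit value and the name `declared_length_ok` are placeholders, and refusing requests without a usable length (which includes chunked bodies) is just one possible policy.

```python
MAX_UPLOAD_BYTES = 5 * 1024 * 1024  # hypothetical limit, make it configurable


def declared_length_ok(environ, limit=MAX_UPLOAD_BYTES):
    """Check the declared body size before reading anything from the request."""
    raw = environ.get("CONTENT_LENGTH", "")
    try:
        length = int(raw)
    except (TypeError, ValueError):
        return False  # missing or malformed length: refuse rather than guess
    return 0 <= length <= limit
```

The declared length is only a claim by the client, so the code that reads the body should still stop after that many bytes.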

If you're uploading files "into a directory", then it is quite likely that
you're getting file names from somewhere. Untrusted file names are a much
more effective attack vector, in most cases, than EXIF information.

The "into a directory" quote coming from where? And given that EXIF
information is probably read by some C-lib, I'd say it is much more
dangerous. This is a gut feeling only, but fed by problems with libpng a
year or two ago.

Well, yeah. I was assuming that the home-grown framework was mandatory for
some reason. Possibly a very important reason, such as "otherwise we won't
have written it ourselves".

In Python, it's usually more along the lines of "well, we kinda started,
and now we have it, and are reluctant to switch."

But of course one never knows...

Diez

Diez B. Roggisch · Oct 6, 2010

Martin Gregorie said:
Off the top of my head, and assuming that you get passed the exact
filename that the user entered:

- The user may need to use an absolute pathname to upload a file
that isn't in his current directory, so retain only the basename
by discarding the rightmost slash and everything to the left of it:
/home/auser/photos/my_photo.jpg ===> my_photo.jpg
c:\My Photos\My Photo.jpg ===> My Photo.jpg

- If your target system doesn't like spaces in names or you want to be
on the safe side there, replace spaces in the name with underscores:
My Photo.jpg ===> My_Photo.jpg

- reject any filenames that could cause the receiving system to do
dangerous things, e.g. .EXE or .SCR if the upload target is Windows.
This list will be different for each upload target, so make it
configurable.

Erm, this assumes that the files are executed in some way. Why should
they? It's perfectly fine to upload *anything*, and of course filenames
mean nothing with respect to the actual file contents ("Are you sure you want to
change the extension of this file?").

It might make no sense for the user, because you can't show an exe as profile
image. But safe-guarding against that has nothing to do with OS. And
even "safe" file formats such as PNGs have been attack
vectors. Precisely because they are processed client-side in the browser
through some library with security issues.

For serving the files, one could rely on the "file"-command or similar
means to determine the mime-type. So far, I've never done that - as
faking the extension for something else doesn't buy you something unless
there is a documented case of "internet explorer ignoring mime-type, and
executing downloaded file as program".

You can't assume anything else about the extension.
.py .c .txt and .html are all valid in the operating systems I use
and so are their capitalised equivalents.

- check whether the file already exists. You need
rules about what to do if it exists: do you reject the upload,
silently overwrite, or alter the name, e.g. by adding a numeric
suffix to make the name unique?

my_photo.jpg ===> my_photo-01.jpg

Better, associate the file with the uploader and/or its hash. Use the
name as pure meta-information only.
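
Keying a file by its hash could be sketched as below, assuming SHA-256 and a file-like object for the upload. `content_digest` is an invented name; the resulting hex string would serve as the storage key, with the client's file name kept alongside as metadata.

```python
import hashlib


def content_digest(stream, chunk_size=64 * 1024):
    """Hash an upload chunk by chunk so it never has to fit in memory at once."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()
```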

There's probably something I've forgotten, but that list should get you
going.

Dealing with too large upload requests I'd say is much more important, as
careless reading of streams into memory has at least the potential for a
DOS-attack.

Diez

Lawrence D'Oliveiro · Oct 7, 2010

in general, what are things i would want to 'watch for/guard against'
in a file upload situation?

If you stored the file contents as a blob in a database field, you wouldn't
have to worry about filename problems.
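
A minimal sketch of the blob idea using the standard-library sqlite3 module. The table layout and the function names are assumptions for illustration, and a real deployment would probably use whatever database the site already has.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a database with a table for uploaded files."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS uploads ("
        " id INTEGER PRIMARY KEY,"
        " original_name TEXT NOT NULL,"
        " content BLOB NOT NULL)"
    )
    return conn


def save_upload(conn, original_name, data):
    """Store the bytes as a blob; the client's name is just another column."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO uploads (original_name, content) VALUES (?, ?)",
            (original_name, data),
        )
    return cur.lastrowid


def load_upload(conn, upload_id):
    """Return (original_name, content) for a stored upload, or None."""
    return conn.execute(
        "SELECT original_name, content FROM uploads WHERE id = ?",
        (upload_id,),
    ).fetchone()
```

The file name is passed as a bound parameter and never touches the file system, so a name like "../../etc/passwd" is stored as plain text. Size limits still apply, since a large blob costs as much as a large file.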
