Sanitizing filename strings across platforms

Tim Chase

Scenario: file names from potentially untrusted sources may contain
odd characters that need to be sanitized for the underlying OS.
On *nix, this generally just means "don't use '/' or \x00 in your
string", while on Win32, there are a host of verboten characters
and file-names. Then there's also checking the abspath/normpath
of the resulting name to make sure it's still in the intended folder.
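
A minimal sketch of that abspath/normpath check, assuming the sanitized name gets joined onto a known base folder. The helper name `is_within` is made up for illustration and is not from the thread or any library:

```python
import os

def is_within(base_dir, candidate):
    """Return True if candidate, joined onto base_dir, stays inside base_dir."""
    # abspath also runs normpath, so "a/../b" and similar are collapsed
    base = os.path.abspath(base_dir)
    target = os.path.abspath(os.path.join(base, candidate))
    # compare against base plus a trailing separator so that a sibling
    # folder such as "uploads_evil" does not pass as "uploads"
    return target == base or target.startswith(base.rstrip(os.sep) + os.sep)
```

This only looks at the path string. If symlinks are a concern, swapping os.path.abspath for os.path.realpath would resolve them first, at the cost of touching the filesystem.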


I've read through [1] and have started to glom together various
bits from that thread. My current course of action is something like

import string

# reserved device names on Win32 (compared case-insensitively)
SACRED_WIN32_FNAMES = set(
    ['CON', 'PRN', 'CLOCK$', 'AUX', 'NUL'] +
    ['LPT%i' % i for i in range(32)] +
    ['COM%i' % i for i in range(32)]
    )

def sanitize_filename(fname):
    # keep only characters from a conservative whitelist
    sane = set(string.ascii_letters + string.digits + '-_.[]{}()$')
    results = ''.join(c for c in fname if c in sane)
    # might have to check sans-extension
    if results.upper() in SACRED_WIN32_FNAMES:
        results = "_" + results
    return results

but if somebody already has war-hardened code they'd be willing
to share, I'd appreciate any thoughts.
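
For the "sans-extension" comment in the snippet above, one possible approach is to compare only the part of the name before the first dot, since Windows also reserves device names followed by an extension (for example NUL.txt). This is a rough sketch rather than hardened code; `RESERVED` and `guard_reserved` are hypothetical names, and the list of device names is an assumption based on the ones already mentioned:

```python
RESERVED = set(
    ['CON', 'PRN', 'AUX', 'NUL', 'CLOCK$'] +
    ['COM%i' % i for i in range(1, 10)] +
    ['LPT%i' % i for i in range(1, 10)]
)

def guard_reserved(fname):
    # compare the part before the first dot, so "NUL.txt" and
    # "nul.tar.gz" are both caught
    stem = fname.split('.', 1)[0]
    if stem.upper() in RESERVED:
        return '_' + fname
    return fname
```

Names that merely start with a reserved word, such as console.log, pass through unchanged because the whole stem has to match.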

Thanks,

-tkc

[1]
http://stackoverflow.com/questions/295135/turn-a-string-into-a-valid-filename-in-python
 
Jean-Paul Calderone

There's http://pypi.python.org/pypi/filepath/0.1 (taken from
twisted.python.filepath).

Jean-Paul
 
