Which characters are really unsafe to use in Linux filenames (from Perl)?

Craig Manley

Hi,

From testing (using Perl + Slackware Linux) I've found that the only
characters I can't use in a directory/file name are the 0 byte and the path
separator /. Below is my test script and a function that makes tainted
strings safe to use as directory/file names. Because a mistake or wrong
assumption here can open a huge security hole, I'd like to know whether this
is really correct in the opinions of others and whether the idea is valid for
all *nix variants. My goal is to create a filename validator for HTML form
uploaded file names that is as unrestrictive as possible (yet safe).

Another question for those of you who know much about MSWin32: which
characters can't be used in an MSWin32 directory/filename (I think there are
many more than on Linux)?

Another question: are these single byte character file systems?

-Craig Manley.

#!/usr/bin/perl -w
use strict;
use bytes;

# Make a string usable as a single directory/file name.
sub safe {
    my $s = shift;
    # replace path separators
    $s =~ s|/|_|g;
    # replace 0 bytes
    $s =~ s|\000|_|g;
    # keep length <= 255 bytes
    return substr($s, 0, 255);
}

my $backslash = '\\';

# these all work
#mkdir('hoi' . $backslash . 'nbla') || warn $!;
#mkdir('hoi..bla') || warn $!;
#mkdir('hoi' . $backslash . 'bla') || warn $!;
#mkdir('..hoi') || warn $!;

# these don't work
#mkdir($backslash . '/hoi') || warn $!;
#mkdir($backslash . '../hoi') || warn $!;

# try all possible bytes, in order 0..255
my $s = join('', map { chr } 0 .. 255);

if (mkdir(safe($s))) {
    # dump the long directory entries to t.bin for inspection
    my $h;
    opendir($h, '.') or die $!;
    my @entries = grep(/.{20,}/, readdir($h));
    closedir($h);
    open($h, '>t.bin') or die $!;
    binmode $h;
    print $h join("\n\n", @entries);
    close($h);
}
else {
    warn($!);
}
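
One gap worth noting in safe() as posted: after the replacements, the names ".", ".." and the empty string still pass through unchanged, and none of them can name a new file. Below is a minimal sketch of a variant that also covers those cases. The name safe_name is hypothetical, the "_" prefix is an arbitrary choice, and the input is assumed to be a byte string as in the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical variant of safe(): same two replacements, plus handling
# for names that every directory already reserves ("." and "..") and
# for the empty string. Assumes $s holds bytes, not wide characters.
sub safe_name {
    my ($s) = @_;
    $s = '' unless defined $s;
    $s =~ tr{/\0}{_};              # path separator and NUL byte
    $s = substr($s, 0, 255);       # keep length <= 255 bytes
    return '_'    if $s eq '';     # an empty name cannot be created
    return "_$s"  if $s eq '.' || $s eq '..';
    return $s;
}
```

This only makes the result a valid single path component. It does not make the name friendly to shells or other tools.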
 
Juha Laiho

Craig Manley said:
From testing (using Perl + Slackware Linux) I've found that the only
characters I can't use in a directory/file name are the 0 byte and the path
separator /.

Correct, in the strictly technical sense. The reason to forbid '/' is that
it is the directory separator character, so the ability to use it would
make path names ambiguous: for example, is "/tmp/x" a file named "tmp/x"
in the root directory, or a file named "x" in the /tmp directory? The reason
to forbid \0 comes from its use as the string terminator in the C language.
No such reasons exist for any of the other possible byte values, so they
are allowed.

Words of warning, though; there are tools that have problems properly
understanding anything except printable US-ASCII (so, byte values from
32 to 126 inclusive), and even within this range there are some characters
that I'd consider ill-advised. Space (32) is perhaps the hardest one; there
are tools that emit/expect lists of file names using whitespace as the
separator, and for them whitespace within a file name is a problem that
cannot be overcome. The most commonly seen pair of such tools is "find"
and "xargs" ("cpio" being yet another tool having this problem). There
are implementations (GNU) of these tools that have workarounds for this
problem ("find -print0" paired with "xargs -0", for instance), but the
workarounds are not generally applicable (as the availability of the GNU
tools cannot be universally assumed).
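
Since the thread is about Perl, one way to sidestep the separator problem entirely is to never turn the names into a delimited text list at all. The sketch below walks a tree with the core File::Find module and keeps each name as its own list element; list_files is a hypothetical helper name, and the choice to collect only regular files is an assumption made for illustration.

```perl
use strict;
use warnings;
use File::Find;

# Collect every regular file below $dir. Names are kept as separate
# list elements, so spaces or newlines inside a name cannot be
# mistaken for a separator.
sub list_files {
    my ($dir) = @_;
    my @found;
    find(
        {
            # with no_chdir, $_ is the same full path as $File::Find::name
            wanted   => sub { push @found, $File::Find::name if -f $_ },
            no_chdir => 1,
        },
        $dir
    );
    return sort @found;
}
```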

Other characters that I would consider problematic are
!, ", ', `, *, ?, $, {, }, [, ], (, ), ~, <, >, |, #, &, ; and \,
as these have special meanings in various shells and tools.

So, complementing this would leave characters
-, _, ,, ., :, ^, =, +, %, 0-9, a-z and A-Z as the safe ones.
"-" and "." are ill-advised as the first characters in a file name.
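
A conservative check along these lines might look like the sketch below. It accepts a deliberately narrower subset of the list above (letters, digits, and - _ . , + =), refuses "-" and "." as the first character, and caps the length at 255 bytes. The function name is_conservative_name and the exact character set are assumptions for illustration, not a standard.

```perl
use strict;
use warnings;

# Returns 1 if $name uses only a small set of shell-neutral characters
# and does not start with "-" or ".", 0 otherwise.
# \A and \z anchor the whole string, so a trailing newline is rejected.
sub is_conservative_name {
    my ($name) = @_;
    return 0 unless defined $name;
    return $name =~ /\A[A-Za-z0-9_][A-Za-z0-9_.,+=-]{0,254}\z/ ? 1 : 0;
}
```

A check like this rejects far more than the filesystem requires, which is the trade-off between "as unrestrictive as possible" and "works with every tool".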

Then, lately I've heard some reports of the XFS filesystem on Linux having
trouble coping with UTF-8 byte sequences; it apparently is trying to
do something smart with non-US-ASCII file names and failing miserably.

Craig Manley said:
Another question: are these single byte character file systems?

Unix filesystems tend to be; until I heard about the XFS issues, I had
assumed all Unix filesystems to be purely byte-oriented.

What you might do to allow "any" character in names at application level,
though, is to encode known problematic characters -- something like URL
encoding (%xx, where xx is the two-digit hexadecimal value of the
character) should be usable. Just remember that when using this, %
becomes an unsafe character, so it needs to be encoded, too.
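
As a rough sketch of that encoding idea in Perl: everything outside a small set of characters is written as %XX, including "%" itself, and a leading "." or "-" is encoded as well. The names encode_name and decode_name and the choice of which characters stay literal are assumptions; the input is assumed to be a byte string (every character value at most 255).

```perl
use strict;
use warnings;

# Encode a byte string so that only A-Z a-z 0-9 _ . , + = - and %XX
# escapes remain. "%" is not in the literal set, so it becomes %25.
sub encode_name {
    my ($name) = @_;
    $name =~ s/([^A-Za-z0-9_.,+=-])/sprintf('%%%02X', ord($1))/ge;
    # a leading "." or "-" is legal but awkward, so escape it too
    $name =~ s/\A([.-])/sprintf('%%%02X', ord($1))/e;
    return $name;
}

# Reverse of encode_name: turn every %XX escape back into its byte.
sub decode_name {
    my ($enc) = @_;
    $enc =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $enc;
}
```

Note that an encoded name can be up to three times as long as the original, so any length limit has to be checked after encoding, and an empty input stays empty and still needs separate handling.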
 
Randal L. Schwartz

Craig> From testing (using Perl + Slackware Linux) I've found that the only
Craig> characters I can't use in a directory/file name are the 0 byte and the
Craig> path separator /.

This has been true for every version of Unix I've used since 1977.
Can't say what it was before Unix V6 though... didn't get to use
those. :)

Gets fun when you permit \n in a filename. Lots of programs
don't expect that, and break. But those are broken programs, I say.
Not a broken filename.
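
For a program that has to read a list of names and wants to stay on the non-broken side, one option is to use the NUL byte as the record separator, since it is the one byte (besides "/") that can never occur inside a name. A minimal sketch, assuming the producer writes each name followed by "\0"; read_names is a hypothetical helper name.

```perl
use strict;
use warnings;

# Read NUL-terminated names from a filehandle. Newlines and spaces
# inside a name are preserved because only "\0" ends a record.
sub read_names {
    my ($fh) = @_;
    local $/ = "\0";        # record separator for this scope only
    my @names = <$fh>;
    chomp @names;           # chomp removes the current $/, i.e. "\0"
    return @names;
}
```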

print "Just another Perl hacker,"; # the first!
 
Bryan Castillo

Randal L. Schwartz said:
Gets fun when you permit \n in a filename. Lots of programs
don't expect that, and break. But those are broken programs, I say.
Not a broken filename.

I like putting "\x07" in file names. It's great when listing a
directory makes noise.
 
