Which characters are really unsafe to use in Linux filenames (from Perl)?

Discussion in 'Perl Misc' started by Craig Manley, Apr 27, 2004.

  1. Craig Manley

    Craig Manley Guest

    Hi,

    From testing (using Perl + Slackware Linux) I've found that the only
    characters I can't use in a directory/file name are the 0 byte and the path
    separator /. Below is my test script and function that makes tainted strings
    safe to use as directory/file names. Because a mistake or wrong assumption
    here can open a huge security hole, I'd like to know whether this is really
    correct in the opinions of others, and whether it holds for all *nix
    variants. My goal is to create a filename validator for HTML form uploaded
    file names that is as unrestrictive as possible (yet safe).

    Another question for those of you who know much about MSWin32: which
    characters can't be used in a MSWin32 directory/filename (I think it's many
    more than on Linux)?

    Another question: are these single byte character file systems?

    -Craig Manley.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use bytes;

    # Make a string usable as a single directory/file name.
    sub safe {
        my $s = shift;
        # replace path separators
        $s =~ s|/|_|g;
        # replace 0 bytes
        $s =~ s|\000|_|g;
        # keep length <= 255 bytes
        return substr($s, 0, 255);
    }

    my $backslash = '\\';

    # these all work
    #mkdir('hoi' . $backslash . 'nbla') || warn $!;
    #mkdir('hoi..bla') || warn $!;
    #mkdir('hoi' . $backslash . 'bla') || warn $!;
    #mkdir('..hoi') || warn $!;

    # these don't work
    #mkdir($backslash . '/hoi') || warn $!;
    #mkdir($backslash . '../hoi') || warn $!;

    # try all possible bytes, in byte order
    my $s = join('', map { chr } 0 .. 255);

    if (mkdir(safe($s))) {
        # find the long entry just created and dump it to t.bin for inspection
        opendir(my $h, '.') or die $!;
        my @entries = grep(/.{20,}/, readdir($h));
        closedir($h);
        open(my $out, '>', 't.bin') or die $!;
        binmode $out;
        print $out join("\n\n", @entries);
        close($out) or die $!;
    }
    else {
        warn($!);
    }
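
    One gap the test above does not exercise: safe() passes ".", ".." and the
    empty string through unchanged, and none of those is usable as the name of
    an uploaded file. A minimal sketch of a stricter variant follows. The name
    safe_name and the choice of "_" as a prefix are assumptions for
    illustration, not part of the original script, and the input is assumed to
    be a byte string already.

```perl
use strict;
use warnings;

# Hypothetical helper: like safe(), but also refuses names that are
# empty or that refer to the current/parent directory.
sub safe_name {
    my ($s) = @_;
    $s = '' unless defined $s;
    $s =~ s{[/\0]}{_}g;                    # path separator and NUL byte
    $s = substr($s, 0, 255);               # per-name limit, assuming bytes
    $s = '_' . $s if $s =~ /\A\.{0,2}\z/;  # '', '.' and '..'
    return $s;
}

print safe_name('../etc/passwd'), "\n";    # prints .._etc_passwd
```

    Note that the result is still only a single path component; the caller
    decides which directory it is created in.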
    Craig Manley, Apr 27, 2004
    #1

  2. Juha Laiho

    Juha Laiho Guest

    "Craig Manley" <> said:
    >From testing (using Perl + Slackware Linux) I've found that the only
    >characters I can't use in a directory/file name are the 0 byte and path
    >seperator /.


    Correct, in the strictly technical sense. '/' is forbidden because it is
    the directory separator character, so allowing it in names would make path
    names ambiguous: for example, is "/tmp/x" a file named "tmp/x" in the root
    directory, or a file named "x" in the /tmp directory? \0 is forbidden
    because it is the string terminator in the C language. No such reasons
    exist for any of the other possible byte values, so they are allowed.

    Words of warning, though: there are tools that have problems properly
    understanding anything except printable US-ASCII (byte values from 32 to
    126 inclusive), and even within this range there are some characters that
    I'd consider ill-advised. Space (32) is perhaps the hardest one; there
    are tools that emit/expect lists of file names using whitespace as the
    separator, and for them whitespace within a file name is a problem that
    cannot be overcome. The most commonly seen pair of such tools is "find"
    and "xargs" ("cpio" being yet another tool with this problem). The GNU
    implementations of these tools have workarounds (NUL-separated lists via
    "find -print0" and "xargs -0"), but the workarounds are not generally
    applicable, as the availability of the GNU tools cannot be universally
    assumed.
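
    From Perl, the whitespace problem can be sidestepped whenever the list
    producer can emit NUL-separated names (as GNU "find -print0" does): since
    \0 cannot occur inside a name, it is an unambiguous delimiter. A minimal
    sketch, assuming the list has already been read into a string;
    split_nul_list is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: split a NUL-separated list of file names
# into the individual names.
sub split_nul_list {
    my ($data) = @_;
    return split /\0/, $data;
}

# Names containing a space or a newline survive intact.
my $list  = "plain.txt\0with space.txt\0two\nlines.txt\0";
my @names = split_nul_list($list);
printf "%d names\n", scalar @names;    # prints "3 names"
```

    When reading from a pipe or file handle instead, setting the input record
    separator with local $/ = "\0" gives the same effect one name at a time.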

    Other characters that I would consider problematic are
    !, ", ', `, *, ?, $, {, }, [, ], (, ), ~, <, >, |, #, &, ; and \,
    as these have special meanings in various shells and tools.

    So, complementing this would leave characters
    -, _, ,, ., :, ^, =, +, %, 0-9, a-z and A-Z as the safe ones.
    "-" and "." are ill-advised as the first characters in a file name.

    Then, lately I've heard some reports of XFS filesystem on Linux having
    trouble coping with UTF-8 byte sequences; it apparently is trying to
    do something smart with non-US-ASCII file names and failing miserably.

    >Another question: are these single byte character file systems?


    Unix filesystems tend to be; until I heard about the XFS issues, I had
    assumed all Unix filesystems to be purely byte-oriented.

    What you might do to allow "any" character in names at application level,
    though, is to encode known problematic characters. Something like URL
    encoding (%xx, where xx is the two-digit hexadecimal value of the
    byte) should be usable. Just remember that when using this, % itself
    becomes an unsafe character, so it needs to be encoded, too.
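
    A minimal sketch of that encoding idea in Perl. The helper names
    encode_name and decode_name are hypothetical, the safe set used here
    (letters, digits, "_", "." and "-") is a deliberately narrow assumption,
    and the input is assumed to be a byte string. A leading "." or "-" is
    encoded as well, so "." and ".." never come out unchanged.

```perl
use strict;
use warnings;

# Hypothetical helpers: percent-encode every byte outside a small
# safe set, and reverse the encoding. "%" is not in the safe set, so
# it is always encoded and decoding stays unambiguous.
sub encode_name {
    my ($name) = @_;
    $name =~ s/([^A-Za-z0-9_.-])/sprintf('%%%02X', ord($1))/ge;
    $name =~ s/\A([.-])/sprintf('%%%02X', ord($1))/e;   # no leading "." or "-"
    return $name;
}

sub decode_name {
    my ($enc) = @_;
    $enc =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $enc;
}

my $enc = encode_name("my file/100%.txt");
print "$enc\n";                  # prints my%20file%2F100%25.txt
print decode_name($enc), "\n";   # prints my file/100%.txt
```

    An encoded name can be up to three times as long as the original, so any
    length limit has to be checked after encoding, and an empty input still
    needs separate handling.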
    --
    Wolf a.k.a. Juha Laiho Espoo, Finland
    (GC 3.0) GIT d- s+: a C++ ULSH++++$ P++@ L+++ E- W+$@ N++ !K w !O !M V
    PS(+) PE Y+ PGP(+) t- 5 !X R !tv b+ !DI D G e+ h---- r+++ y++++
    "...cancel my subscription to the resurrection!" (Jim Morrison)
    Juha Laiho, Apr 27, 2004
    #2

  3. Tom

    Tom Guest

    Re: Which characters are really unsafe to use in Linux filenames(from Perl)?

    Craig Manley wrote...
    > Another question for those of you know much about MSWin32: which
    > characters can't be used in a MSWin32 directory/filename (I think
    > it's much more than Linux)?



    \/:*?"<>|


    You shouldn't use a leading space or a leading dot.
    You also need to avoid reserved device names like:
    COM1
    LPT1
    PRN
    AUX

    ...etc.

    http://support.microsoft.com/default.aspx?scid=kb;EN-US;120716
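
    A rough Perl sketch of such a check. The sub name is_windows_safe is
    hypothetical, and beyond the list above it makes a few assumptions based
    on commonly documented Windows behavior: control characters are rejected,
    a trailing space or dot is rejected, and the reserved device names are
    taken to be CON, PRN, AUX, NUL, COM1-COM9 and LPT1-LPT9, with or without
    an extension.

```perl
use strict;
use warnings;

# Hypothetical check for names that Windows rejects or treats specially.
sub is_windows_safe {
    my ($name) = @_;
    return 0 if $name eq '';
    return 0 if $name =~ m{[\\/:*?"<>|]};   # forbidden punctuation
    return 0 if $name =~ /[\x00-\x1F]/;     # control characters
    return 0 if $name =~ /[ .]\z/;          # trailing space or dot
    # reserved device names, case-insensitive, with or without extension
    return 0 if $name =~ /\A(?:CON|PRN|AUX|NUL|COM[1-9]|LPT[1-9])(?:\.|\z)/i;
    return 1;
}

for my $name ('report.txt', 'a:b', 'PRN', 'com1.txt', 'notes.') {
    printf "%-12s %s\n", $name, is_windows_safe($name) ? 'ok' : 'rejected';
}
```

    Only the first name in the loop is reported as ok; the others trip the
    punctuation, reserved-name or trailing-dot rules.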
    Tom, Apr 27, 2004
    #3
  4. >>>>> "Craig" == Craig Manley <> writes:

    Craig> From testing (using Perl + Slackware Linux) I've found that the only
    Craig> characters I can't use in a directory/file name are the 0 byte and path
    Craig> seperator /.

    This has been true for every version of Unix I've used since 1977.
    Can't say what it was before Unix V6 though... didn't get to use
    those. :)

    Gets fun when you permit \n in a filename. Lots of programs
    don't expect that, and break. But those are broken programs, I say.
    Not a broken filename.

    print "Just another Perl hacker,"; # the first!
    --
    Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
    <> <URL:http://www.stonehenge.com/merlyn/>
    Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
    See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!
    Randal L. Schwartz, Apr 27, 2004
    #4
  5. Randal L. Schwartz wrote:
    > >>>>> "Craig" == Craig Manley <> writes:

    >
    > Craig> From testing (using Perl + Slackware Linux) I've found that the only
    > Craig> characters I can't use in a directory/file name are the 0 byte and path
    > Craig> seperator /.
    >
    > This has been true for every version of Unix I've used since 1977.
    > Can't say what it was before Unix V6 though... didn't get to use
    > those. :)
    >
    > Gets fun when you permit \n in a filename. Lots of programs
    > don't expect that, and break. But those are broken programs, I say.
    > Not a broken filename.
    >


    I like putting "\x07" in file names. It's great when listing a
    directory makes noise.

    > print "Just another Perl hacker,"; # the first!
    Bryan Castillo, Apr 28, 2004
    #5
