Sanitize file name

Philipp

Hello,

On some platforms, file names cannot contain certain characters (e.g. on
Windows, ? is not allowed in a file name or path).
Is there a way in the API to sanitize a user-supplied string so that it
can be used as a valid filename?
Is there a way to test if a filename is valid on a certain platform?

Thanks Phil
 
Andrew Thompson

Philipp wrote:
...
Is there a way to test if a filename is valid on a certain platform?

This E.G. makes for some interesting results, though I
am not sure if it really helps with the problem. The
programmer would need to specially account for the
'last situation' where the user puts a character in the
name that is used as (or is generally understood to be)
a path separator.

Irritatingly, although Win's path separator is '\', '/'
will apparently also work (here, on this Win XP pro
box).

<sscce>
import java.io.File;
import java.io.IOException;

class TestFileName {

    // Print the canonical path for a name, or the error message if resolving it fails.
    static void testFileName(String name) {
        try {
            File f = new File(name);
            System.out.println( f.getCanonicalPath() );
        } catch(IOException ioe) {
            System.err.println( ioe.getMessage() + " '" + name + "'");
        }
    }

    public static void main(String[] args) {
        testFileName("123.txt");
        testFileName("12?3.txt");
        testFileName("12[3.txt");
        testFileName("12{3.txt");
        testFileName("12!3.txt");
        testFileName("12/3.txt");
    }
}
</sscce>

[OP]
D:\projects\123.txt
Invalid argument '12?3.txt'
D:\projects\12[3.txt
D:\projects\12{3.txt
D:\projects\12!3.txt
D:\projects\12\3.txt
Press any key to continue . . .
[\OP]

--
Andrew Thompson
http://www.athompson.info/andrew/

 
Sabine Dinis Blochberger

Philipp said:
Hello,

On some platforms, file names cannot contain certain characters (eg. on
windows no ? is allowed in a file name and path).
Is there a way in the API to sanitize a user-supplied string so that it
can be used as a valid filename?
Is there a way to test if a filename is valid on a certain platform?

Thanks Phil

There's a possible solution in this thread:
http://forum.java.sun.com/thread.jspa?threadID=629458&start=0&tstart=0
 
Philipp

Andrew said:
Philipp wrote:
..
Is there a way to test if a filename is valid on a certain platform?

This E.G. makes for some interesting results, though I
am not sure if it really helps with the problem. The
programmer would need to specially account for the
'last situation' where the user puts a character in the
name that is used as (or is generally understood to be)
a path separator.

Irritatingly, although Win's path separator is '\', '/'
will apparently also work (here, on this Win XP pro
box).

<sscce>
import java.io.File;
import java.io.IOException;

class TestFileName {

    // Print the canonical path for a name, or the error message if resolving it fails.
    static void testFileName(String name) {
        try {
            File f = new File(name);
            System.out.println( f.getCanonicalPath() );
        } catch(IOException ioe) {
            System.err.println( ioe.getMessage() + " '" + name + "'");
        }
    }

    public static void main(String[] args) {
        testFileName("123.txt");
        testFileName("12?3.txt");
        testFileName("12[3.txt");
        testFileName("12{3.txt");
        testFileName("12!3.txt");
        testFileName("12/3.txt");
    }
}
</sscce>

[OP]
D:\projects\123.txt
Invalid argument '12?3.txt'
D:\projects\12[3.txt
D:\projects\12{3.txt
D:\projects\12!3.txt
D:\projects\12\3.txt
Press any key to continue . . .
[\OP]

As far as I know the invalid characters for filenames are:
On Windows \ / : * ? " < > |
On UNIX :

Running your SSCCE with these characters (although in a different order)
gives (on WinXP):

[OP]
Invalid argument '12?3.txt'
D:\workspace\test\123.txt
The filename, directory name, or volume label syntax is incorrect '12:3.txt'
D:\workspace\test\12[3.txt
D:\workspace\test\12{3.txt
D:\workspace\test\12!3.txt
D:\workspace\test\12\3.txt
D:\workspace\test\12;3.txt
D:\workspace\test\12<3.txt
D:\workspace\test\12>3.txt
Invalid argument '12*3.txt'
D:\workspace\test\12"3.txt
The filename, directory name, or volume label syntax is incorrect '12|3.txt'
[/OP]

Note that getCanonicalPath() throws an IOException for only some of
them, so it does not seem to be a reliable way to identify bad characters.
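For comparison, here is a minimal sketch of a syntax-only check using java.nio.file, which arrived in Java 7 and so was not available when this thread was written. The class and method names (PathSyntaxCheck, hasValidSyntax) are made up for illustration. Paths.get throws InvalidPathException when the default file system cannot parse the string. As far as I know it is stricter than getCanonicalPath() on Windows, while on Unix-like systems it rejects little more than the NUL character, so treat it as a first filter rather than a guarantee.

```java
import java.nio.file.InvalidPathException;
import java.nio.file.Paths;

// Hypothetical helper: checks path syntax only, nothing is created or opened on disk.
class PathSyntaxCheck {

    // Returns true if the default file system can parse the string as a path.
    static boolean hasValidSyntax(String name) {
        try {
            Paths.get(name);
            return true;
        } catch (InvalidPathException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] samples = {"123.txt", "12?3.txt", "12:3.txt", "12*3.txt", "12|3.txt"};
        for (String s : samples) {
            // The result depends on the platform the JVM is running on.
            System.out.println(s + " -> " + hasValidSyntax(s));
        }
    }
}
```

A name that passes this check can still fail when the file is actually created, for instance because of length limits or permissions.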

The method described in
http://forum.java.sun.com/thread.jspa?threadID=629458&start=0&tstart=0
(thanks Sabine for the link) actually creates the file. Well, that
definitely works, but it's really ugly (IMHO).


Best regards
Phil
 
Gordon Beaton

As far as I know the invalid characters for filenames are:
On Windows \ / : * ? " < > |
On UNIX :

No, ':' should be valid on Unix, but read on...

AFAIK the only invalid char on traditional Unixy filesystems is '/'
because it's the path component separator, anything else should be ok.
That doesn't mean that other characters are *easy* to use or supported
by every tool though.

Note that what's valid or not is actually file system dependent, so
the answer really depends. For example, you can mount a VFAT, HPFS or
NTFS volume on your unix host, and the names are then limited by VFAT
or HPFS or NTFS as appropriate. There's more to consider than just
valid filename characters: max filename length and file length are
there too, among other things.

Personally I wouldn't try to put this logic into my application; leave
it in the OS where it belongs (and where it will always be correct).
If the user specifies a filename then just use it and be prepared to
fail gracefully.
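A rough sketch of what "just use it and be prepared to fail gracefully" could look like in Java. The names TryCreate and tryCreate are invented for this example, and the wording of the messages is only a placeholder. The idea is to attempt the creation and turn any failure into something you can show the user, instead of pre-validating the name.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical helper: let the OS decide whether the name is acceptable.
class TryCreate {

    // Attempts to create an empty file called name inside dir.
    // Returns null on success, otherwise a message suitable for the user.
    static String tryCreate(File dir, String name) {
        File f = new File(dir, name);
        try {
            if (f.createNewFile()) {
                return null;
            }
            return "A file named '" + name + "' already exists";
        } catch (IOException | SecurityException e) {
            // The OS (or the security manager) rejected the request.
            return "Cannot create '" + name + "': " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        File dir = new File(System.getProperty("java.io.tmpdir"));
        String name = args.length > 0 ? args[0] : "example.txt";
        String problem = tryCreate(dir, name);
        System.out.println(problem == null ? "created" : problem);
    }
}
```

In a GUI application the returned message would go into an error dialog, and the user would be asked for another name.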

/gordon

--
 
Stefan Ram

Philipp said:
On some platforms, file names cannot contain certain characters (eg. on
windows no ? is allowed in a file name and path).
Is there a way in the API to sanitize a user-supplied string so that it
can be used as a valid filename?

I have specified and implemented a conversion for this,
which is called »Filode«.

Since one can never know in advance under which FileSystem a
JVM will be hosted, Filode only assumes that a filename may
contain characters A-Z of a single case.

http://www.purl.org/stefan_ram/pub/filode

http://www.purl.org/stefan_ram/html/ram.jar/de/dclj/ram/notation/filode/Text.html
 
Gordon Beaton

I have specified and implemented a conversion for this,
which is called »Filode«.

Since one can never know in advance under which FileSystem a
JVM will be hosted, Filode only assumes that a filename may
contain characters A-Z of a single case.

One really irritating thing an application can do is prevent me from
using the full capabilities offered by my system. Why would you want
to enforce such a limitation? The OS will tell you whether a filename
was valid or not when you try to create a file with it.

/gordon

--
 
Gordon Beaton

On my Fedora 7 box:

$ echo foo > temp\:.txt
$ ls *.txt
temp:.txt
$

And there was probably no need to escape the ':' either (but that
depends on your choice of shell).

/gordon

--
 
Eric Sosman

Gordon said:
One really irritating thing an application can do is prevent me from
using the full capabilities offered by my system. Why would you want
to enforce such a limitation? The OS will tell you whether a filename
was valid or not when you try to create a file with it.

On at least some versions of Windows, certain filenames
are valid but surprising. Try using the file name "con.txt"
and see what happens (on my XP box, "type con.anything" echoes
what's typed at the keyboard).
 
Gordon Beaton

On at least some versions of Windows, certain filenames
are valid but surprising. Try using the file name "con.txt"
and see what happens (on my XP box, "type con.anything" echoes
what's typed at the keyboard).

The same is true of "cat /dev/stdin" on Linux, and the technique is
actually pretty useful for getting a program that requires a filename
to read from stdin or print to stdout.

/gordon

--
 
Eric Sosman

Gordon Beaton wrote On 10/25/07 09:37,:
The same is true of "cat /dev/stdin" on Linux, and the technique is
actually pretty useful for getting a program that requires a filename
to read from stdin or print to stdout.

Perhaps I wasn't clear enough. The surprising thing
isn't that devices have names in the file system, but that
Windows "imports" those names to every directory, and also
gives them an unlimited number of aliases in every directory.
"cat /dev/stdin.dat" will tell you it can't find any such
file, while "type con.dat" and "type con.foobar" will both
go straight to the CON: device.

I had to deal with this once, at a PPOE. Users could
give their documents whatever names they liked, and could
even have multiple identically-named documents in the same
directory. Behind the scenes, the product constructed file
names by "mangling" the user-provided names and attaching
various extensions and disambiguating goodies. When we did
our first Windows port, somebody created a pair of short
documents describing the arguments for and against something,
and assigned them the names "pro" and "con". These name
stems passed through our mangler largely unchanged, yielding
file names like "PRO.DOC" and "PRO.DC@" -- and I was the
guy who fielded the bug report that resulted when we tried
to store data in "CON.DOC" and "CON.DC@" ...
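A name mangler that wants to avoid this trap could screen the stem against the Windows device names before attaching extensions. The sketch below is only an illustration: the class and method names are invented, and the list (CON, PRN, AUX, NUL, COM1 to COM9, LPT1 to LPT9) is the commonly documented set, which may not be complete for every Windows version.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: flags names whose stem is a Windows device name.
class WindowsDeviceNames {

    // Commonly documented reserved device names (assumed, possibly incomplete).
    private static final Set<String> RESERVED = new HashSet<>(Arrays.asList(
            "CON", "PRN", "AUX", "NUL",
            "COM1", "COM2", "COM3", "COM4", "COM5", "COM6", "COM7", "COM8", "COM9",
            "LPT1", "LPT2", "LPT3", "LPT4", "LPT5", "LPT6", "LPT7", "LPT8", "LPT9"));

    // True if the part before the first dot is a device name, whatever the extension.
    static boolean isReservedOnWindows(String fileName) {
        int dot = fileName.indexOf('.');
        String stem = (dot < 0) ? fileName : fileName.substring(0, dot);
        return RESERVED.contains(stem.toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        for (String s : new String[] {"PRO.DOC", "CON.DOC", "con.foobar", "console.txt"}) {
            System.out.println(s + " -> " + isReservedOnWindows(s));
        }
    }
}
```

The check is case-insensitive and ignores the extension, because "con.dat" and "con.foobar" both reach the device.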
 
Roedy Green

On some platforms, file names cannot contain certain characters (eg. on
windows no ? is allowed in a file name and path).
Is there a way in the API to sanitize a user-supplied string so that it
can be used as a valid filename?
Is there a way to test if a filename is valid on a certain platform?

See StringTools.isLegal.

This will test if the filename contains only some limited safe set.

What do you do with the bad chars? Convert them to something else?
Delete them? Complain to the user?

I suppose you could test if the file exists, and if it does not,
create it, and see if it exists. If not, you have a problem. Then
delete it.

If you are going to be moving files from platform to platform, you
want to restrict them ALL to same safe set, e.g.

A-Z a-z 0-9 .

Then allow the platform-specific file separator (File.separator), e.g.
/ or \, or only allow /, which seems to work universally.

I would avoid spaces, particularly leading or trailing spaces.

You might also allow _, but to be safe, I would leave that out too. It
has magic meaning to various people.
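A minimal sketch of the restricted safe set described above (A-Z a-z 0-9 and the dot), with one method that tests a name and one that takes the "delete them" option for bad characters. The class and method names are invented here and are not the StringTools API.

```java
// Hypothetical helper for the conservative safe set: letters, digits and '.'.
class SafeNames {

    // True if the name is non-empty and uses only A-Z, a-z, 0-9 and '.'.
    static boolean isSafe(String name) {
        return name.matches("[A-Za-z0-9.]+");
    }

    // Deletes every character outside the safe set.
    static String strip(String name) {
        return name.replaceAll("[^A-Za-z0-9.]", "");
    }

    public static void main(String[] args) {
        String input = args.length > 0 ? args[0] : "12?3 .txt";
        System.out.println(isSafe(input) + " " + strip(input));
    }
}
```

Deleting characters can leave an empty string or make two different inputs collide, so the caller still has to check the result before using it.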
 
Daniel Pitts

Philipp said:
Hello,

On some platforms, file names cannot contain certain characters (eg. on
windows no ? is allowed in a file name and path).
Is there a way in the API to sanitize a user-supplied string so that it
can be used as a valid filename?
Is there a way to test if a filename is valid on a certain platform?

Thanks Phil

If you don't need to protect the current system from the user (e.g.
you're running locally on the user's computer), then let the user enter
whatever they want. If it's not a valid filename, let the file operation
throw the exception. Of course, you should catch it and display the
appropriate dialog.

On the other hand, just because a string has no "forbidden" characters
doesn't mean it's a valid file name for your purpose. If it happens to be
the same name as a directory, then reading and/or writing to it will fail
in most cases. If it happens to be a read-only file, then writing to it
might fail, depending on user privilege. If the file doesn't exist, but
the parent path is read-only, you won't be able to create it.
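These conditions can be probed with plain java.io.File calls before writing. The sketch below uses invented names (WriteTargetCheck, problemWith) and is advisory only: the file system can change between the check and the write, so the actual operation still needs its exception handling.

```java
import java.io.File;

// Hypothetical pre-flight check for a file the program is about to write.
class WriteTargetCheck {

    // Returns null if nothing obvious prevents writing, otherwise a description.
    static String problemWith(File target) {
        if (target.isDirectory()) {
            return "'" + target + "' is a directory";
        }
        if (target.exists()) {
            return target.canWrite() ? null : "'" + target + "' is not writable";
        }
        // The file does not exist yet, so look at the directory it would go into.
        File parent = target.getAbsoluteFile().getParentFile();
        if (parent == null || !parent.isDirectory()) {
            return "parent directory of '" + target + "' does not exist";
        }
        if (!parent.canWrite()) {
            return "parent directory of '" + target + "' is not writable";
        }
        return null;
    }

    public static void main(String[] args) {
        File target = new File(args.length > 0 ? args[0] : "example.txt");
        String problem = problemWith(target);
        System.out.println(problem == null ? "looks writable" : problem);
    }
}
```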

So, in short: if it's not a security issue, let the OS tell you the
validity. If it *is* a security issue, let the security manager tell
you the validity.

Hope this helps,
Daniel.
 
Daniel Pitts

Gordon said:
One really irritating thing an application can do is prevent me from
using the full capabilities offered by my system. Why would you want
to enforce such a limitation? The OS will tell you whether a filename
was valid or not when you try to create a file with it.

/gordon

Heh, that was basically my reply. Oh well.
 
Daniel Pitts

Eric said:
Gordon Beaton wrote On 10/25/07 09:37,:

Perhaps I wasn't clear enough. The surprising thing
isn't that devices have names in the file system, but that
Windows "imports" those names to every directory, and also
gives them an unlimited number of aliases in every directory.
"cat /dev/stdin.dat" will tell you it can't find any such
file, while "type con.dat" and "type con.foobar" will both
go straight to the CON: device.

I had to deal with this once, at a PPOE. Users could
give their documents whatever names they liked, and could
even have multiple identically-named documents in the same
directory. Behind the scenes, the product constructed file
names by "mangling" the user-provided names and attaching
various extensions and disambiguating goodies. When we did
our first Windows port, somebody created a pair of short
documents describing the arguments for and against something,
and assigned them the names "pro" and "con". These name
stems passed through our mangler largely unchanged, yielding
file names like "PRO.DOC" and "PRO.DC@" -- and I was the
guy who fielded the bug report that resulted when we tried
to store data in "CON.DOC" and "CON.DC@" ...

Wow, talk about namespace clutter. You can't even use relative or
absolute paths. Stupid M$.
 
Sherman Pendley

Andrew Thompson said:
Irritatingly, although Win's path separator is '\', '/'
will apparently also work (here, on this Win XP pro
box).

Any programmatic API I can think of that runs on Windows, in any language,
will accept both '/'s and '\'s as path delimiters. The command shell still
wants you to use backslashes, but that's pretty much it.

That being the case, I'd be irritated if '/' *didn't* work - it's a valid
path delimiter on Windows, so there's no reason it should be rejected.

sherm--
 
