Making a string file-safe (file-encode?)

adamorn

I was wondering if there was a quick way to ensure that a filename is
safe.

What I mean is that if I am creating a file from a string variable, I
want to ensure that the file will actually be able to be created. So
if it contains a "?", then clearly I would want to eliminate it.

I know that there is something like URL encode that encodes strings
for use in urls, but is there another function that works similarly
for strings for files that I want to create?

Thanks!
 
Stefan Ram

I know that there is something like URL encode that encodes strings
for use in urls, but is there another function that works similarly
for strings for files that I want to create?

The GPL library »ram.jar« contains a class to convert an
arbitrary Unicode string to a string of only uppercase Latin
letters and digits. This is intended to convert any text to a
text acceptable across most file systems as a filename.

http://www.purl.org/stefan_ram/pub/filode
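The scheme that ram.jar actually uses is not described in this thread. Purely to illustrate the idea, here is a minimal sketch under a made-up scheme: the letters A-Y and the digits pass through unchanged, and every other UTF-16 code unit (including 'Z' itself, which serves as the escape marker) is written as 'Z' followed by four uppercase hex digits. The class name FileCode and the whole scheme are hypothetical, not taken from the library.

```java
final class FileCode {
    private FileCode() {}

    /** Encodes any string using only A-Z and 0-9 (hypothetical scheme, not ram.jar's). */
    static String encode(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean literal = (c >= 'A' && c <= 'Y') || (c >= '0' && c <= '9');
            if (literal) {
                out.append(c);
            } else {
                // 'Z' is the escape marker, so a literal 'Z' is escaped as well.
                out.append('Z').append(String.format("%04X", (int) c));
            }
        }
        return out.toString();
    }

    /** Reverses encode(); assumes the input was produced by encode(). */
    static String decode(String encoded) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < encoded.length()) {
            char c = encoded.charAt(i);
            if (c == 'Z') {
                // Four hex digits after the marker hold one UTF-16 code unit.
                out.append((char) Integer.parseInt(encoded.substring(i + 1, i + 5), 16));
                i += 5;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }
}
```

An empty input still yields an empty name, and long inputs can exceed a file system's length limit, so both cases would need separate handling.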
 
Daniel Pitts

I was wondering if there was a quick way to ensure that a filename is
safe.

What I mean is that if I am creating a file from a string variable, I
want to ensure that the file will actually be able to be created. So
if it contains a "?", then clearly I would want to eliminate it.

I know that there is something like URL encode that encodes strings
for use in urls, but is there another function that works similarly
for strings for files that I want to create?

Thanks!

? is not invalid on all systems; Linux handles it perfectly well. The
characters that are invalid are system-specific, and some systems don't
have limitations at all.

The only portable way to handle this is to catch exceptions and report
them to the user.
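A minimal sketch of that approach, assuming the java.nio.file API (Java 7 or later); the class and method names are hypothetical. It makes no attempt to predict validity from the name: it attempts the creation and turns any failure into a message for the user.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;

final class SafeCreate {
    private SafeCreate() {}

    /**
     * Tries to create an empty file called name inside dir.
     * Returns null on success, or a message suitable for showing to the user.
     */
    static String tryCreate(Path dir, String name) {
        try {
            Files.createFile(dir.resolve(name));
            return null;
        } catch (InvalidPathException e) {
            // The platform rejected the name itself (for example '?' on Windows).
            return "Invalid file name '" + name + "': " + e.getReason();
        } catch (IOException e) {
            // The name was accepted but creation failed: file exists, no permission, disk full...
            return "Could not create '" + name + "': " + e;
        }
    }
}
```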
 
adamorn

? is not invalid on all systems; Linux handles it perfectly well. The
characters that are invalid are system-specific, and some systems don't
have limitations at all.

The only portable way to handle this is to catch exceptions and report
them to the user.


Ah, but I'm actually pulling the filename from a variable that the user
does not set...
 
Daniel Pitts

Ah, but I'm actually pulling the filename from a variable that the user
does not set...

Then make sure the variable is being set by something that doesn't add
invalid characters. More details would help us help you.
 
Tom Anderson

The "alphabets" for file names vary from system to system,
and there are systems on which '?' is perfectly legal. So your
"clearly" isn't really all that clear ...

Oh come on, this is ridiculous. The only safe and sane thing to do is to
target the common set of valid filenames - so exclude ?, /, \, , *, ",
etc. Surely this is blindingly obvious? This is not a complicated
question, it's quite clear what the OP wants to know, and you're not
helping anyone by making a mountain out of a molehill.

The answer to the question, though, is no - there's no library method that
checks if a filename is safe, or escapes one to make it safe, at least
none that i know of. However, it wouldn't be too hard to write a regular
expression to validate filenames, or a sequence of replace calls to
replace dangerous characters with safe versions.
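As a rough illustration of the "sequence of replace calls" idea, here is a minimal sketch that folds the calls into one regular expression. The class name, the choice of '_' as the replacement, and the exact character set are assumptions: it covers the characters listed above plus ':', '<', '>', '|' and control characters, which Windows also rejects.

```java
import java.util.regex.Pattern;

final class FileNameSanitizer {
    // Characters assumed unsafe here: ? / \ * " : < > | and control characters.
    private static final Pattern UNSAFE =
            Pattern.compile("[?/\\\\*\":<>|\\p{Cntrl}]");

    private FileNameSanitizer() {}

    /** Replaces every character matched by UNSAFE with an underscore. */
    static String sanitize(String name) {
        return UNSAFE.matcher(name).replaceAll("_");
    }
}
```

The mapping is lossy: "a?b" and "a*b" both become "a_b", so two different inputs can collide on the same file name.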

Roedy's advice is pretty good:

http://mindprod.com/jgloss/filenames.html

I'd be tempted to go wild and insist that filenames contain only letters,
digits, underscores, dashes and full stops, and don't have a punctuation
symbol as the first character. If a user came up with a good reason to use
some other character, i'd happily consider adding it, but until then, keep
it simple, keep it safe.
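A minimal sketch of such a strict check, assuming "letters" means ASCII letters only and that an empty name is rejected; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

final class StrictFileNames {
    // First character: letter or digit. Remaining characters: letters, digits, '_', '.', '-'.
    private static final Pattern ALLOWED =
            Pattern.compile("[A-Za-z0-9][A-Za-z0-9_.-]*");

    private StrictFileNames() {}

    /** Returns true only if the whole name matches the whitelist pattern. */
    static boolean isAllowed(String name) {
        return ALLOWED.matcher(name).matches();
    }
}
```

With this pattern a name such as "report-2008_v1.txt" passes, while ".hidden", "-rf" and "a?b" are rejected.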
In general, though, you can't guarantee that a file will be creatable
just by examining its name. On one widespread system, "D:\\README.TXT"
is a perfectly valid file name but you are unlikely to succeed in
creating a new file on a CD-ROM ... Or you may lack permission to create
files in some folders, or the file system may be full, or ...

True. And completely unconnected to what the OP asked.

tom
 
RedGrittyBrick

Tom said:
Oh come on, this is ridiculous. The only safe and sane thing to do is to
target the common set of valid filenames - so exclude ?, /, \, , *, ",
etc. Surely this is blindingly obvious? This is not a complicated
question, it's quite clear what the OP wants to know, and you're not
helping anyone by making a mountain out of a molehill.

$ perl file.pl
'aaa*bbb?ccc.txt' written.
'aaa*bbb?ccc.txt' contains ...
Hello File


$ ls -l aaa*
-rw-rw-r-- 1 rgb rgb 11 Jun 18 10:24 aaa*bbb?ccc.txt


$ cat file.pl
#!/usr/bin/perl
#
use strict;
use warnings;

# A name containing shell wildcard characters.
my $filename = 'aaa*bbb?ccc.txt';

# Create the file and write one line to it.
open my $fh, '>', $filename
    or die "can't write '$filename' because $!\n";
print $fh "Hello File\n";
close $fh;
print "'$filename' written.\n";

# Read the file back and echo its contents.
open my $fh2, '<', $filename
    or die "can't read '$filename' because $!\n";
print "'$filename' contains ...\n";
while (<$fh2>) {
    print;
}
close $fh2;


I was too lazy to write it in Java. Sorry :)
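For comparison, a rough Java counterpart to the Perl script might look like the sketch below, assuming Java 7 or later for java.nio.file; the class and method names are made up here. On Linux it should behave like the Perl version, whereas on Windows the '*' and '?' would be expected to trigger an InvalidPathException.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class OddFileName {
    private OddFileName() {}

    /** Writes one line to dir/name, reads it back, and returns what was read. */
    static String writeAndRead(Path dir, String name) throws IOException {
        Path file = dir.resolve(name);
        Files.write(file, "Hello File\n".getBytes(StandardCharsets.UTF_8));
        System.out.println("'" + name + "' written.");

        String content = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        System.out.println("'" + name + "' contains ...");
        System.out.print(content);
        return content;
    }

    public static void main(String[] args) throws IOException {
        // Same file name as in the Perl script, created in the current directory.
        writeAndRead(Paths.get("."), "aaa*bbb?ccc.txt");
    }
}
```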
 
RedGrittyBrick

Lew said:
But doesn't everyone use the Joliet extensions?

It is only a few days since I received an ISO 9660 CD without Joliet
extensions. So no.

The originator admitted he'd made a mistake though.
 
Hendrik Maryns

Lew wrote:
| Tom Anderson wrote:
|> Oh come on, this is ridiculous. The only safe and sane thing to do is
|> to target the common set of valid filenames - so exclude ?, /, \, , *,
|> ", etc. Surely this is blindingly obvious? This is not a complicated
|> question, it's quite clear what the OP wants to know, and you're not
|> helping anyone by making a mountain out of a molehill.
|
| Many people's situation differs, and they are fine with using those
| characters in file names, even from Java, so no, the common subset is
| not the only "safe and sane thing to do".

I’ve been using names like ‘(∃y)(y∈--).mona’ and ‘E1 x (E1 y (& (& (>+ x
y) (cat x NF)) (cat y PX))).gta’ without problems, on Linux. Haven’t
been able to test my program on Windows until now, though, since I
haven’t managed to compile the JNI on it. But since these are files the
user doesn’t need to care about, it would be no problem to use ‘safe’
names once it turns out not to work. So I guess I’m interested in this
routine as well.

Cheers, H.
--
Hendrik Maryns
http://tcl.sfs.uni-tuebingen.de/~hendrik/
==================
http://aouw.org
Ask smart questions, get good answers:
http://www.catb.org/~esr/faqs/smart-questions.html
 
Gordon Beaton

I’ve been using names like ‘(∃y)(y∈--).mona’ and ‘E1 x (E1 y (& (& (>+ x
y) (cat x NF)) (cat y PX))).gta’ without problems, on Linux.

I like to use filenames like "-rf ~ &;" followed by random strings.
Keeps my users on their toes.

/gordon

 
Tom Anderson

I surmise you've never needed to write code for multiple
file systems.


Very well, then: "All portable file names shall consist
of one to six decimal digits or upper-case English letters, one
period, and zero to three decimal digits or upper-case English
letters." If you're content with this as a least common denominator, you're
all set.

Ah, i had indeed forgotten that there were filesystems like that!

Okay, lowest common denominator of filesystems in widespread use on
computers at present. Pre-LFN FAT32 and non-Joliet ISO 9660 don't qualify.

Although, is LFNless FAT32 used on memory cards for cameras?

He asked for a file name that would quote ensure that the file will
actually be able to be created end quote.

It was pretty clear to me from his post that that wasn't what he was
asking.

tom
 
Tom Anderson

Tom said:
It was pretty clear to me from his post that that wasn't what he was
asking.

The OP asked, in the first post:
What I mean is that if I am creating a file from a string variable,
I want to ensure that the file will actually be able to be created.

Eric gave an exact quote, and even said, "quote ... end quote". How was
it "pretty clear to [you] that that wasn't what [the OP] was asking",
when it was word for word exactly what they asked?

Because the OP, i believe, expressed himself imperfectly, and the text
quoted did not accurately represent his query. If you examine the rest of
his post, that is clear. Cherry-picking sentences and interpreting them
literally is a useful rhetorical tool, but it doesn't help answer
questions.

tom
 
Tom Anderson

Tom said:
Eric Sosman wrote:
He asked for a file name that would quote ensure that the file will
actually be able to be created end quote.

Tom Anderson wrote:
It was pretty clear to me from his post that that wasn't what he was
asking.

The OP asked, in the first post:
What I mean is that if I am creating a file from a string variable,
I want to ensure that the file will actually be able to be created.

Eric gave an exact quote, and even said, "quote ... end quote". How was
it "pretty clear to [you] that that wasn't what [the OP] was asking", when
it was word for word exactly what they asked?

Because the OP, i believe, expressed himself imperfectly, and the text
quoted did not accurately represent his query. If you examine the rest of
his post, that is clear. Cherry-picking sentences and interpreting them
literally is a useful rhetorical tool, but it doesn't help answer
questions.

The original post contained three count them three paragraphs. The
first was introductory, sort of a title for the rest. The second had
two sentences, one whose operative portion was the material I quoted,
and a second making it clear that the poster was thinking of lexical
tests. The third made an analogy with lexical manipulation of URLs.

Right, so it was obvious that he was thinking of lexical tests, then?

You've called me ridiculous,

Eric, i don't think i called you ridiculous, and i certainly didn't mean
to imply that. I apologise if it came across like that. I called something
you said ridiculous, and i think it was.

you've accused me of twisting the O.P.'s words, and I'm starting to find
your style of argumentation lacking in, well, style.

Again, my apologies. I'll try and be more entertaining in future.

tom
 
