Making a string, file-safe (file-encode??)

Discussion in 'Java' started by adamorn@gmail.com, Jun 17, 2008.

  1. Guest

    I was wondering if there was a quick way to ensure that a filename is
    safe.

    What I mean is that if I am creating a file from a string variable, I
    want to ensure that the file will actually be able to be created. So
    if it contains a "?", then clearly I would want to eliminate it.

    I know that there is something like URL encode that encodes strings
    for use in urls, but is there another function that works similarly
    for strings for files that I want to create?

    Thanks!
    , Jun 17, 2008
    #1

  2. Stefan Ram Guest

    Re: Making a string, file-safe (file-encode??)

    writes:
    >I know that there is something like URL encode that encodes strings
    >for use in urls, but is there another function that works similarly
    >for strings for files that I want to create?


    The GPL library »ram.jar« contains a class to convert an
    arbitrary Unicode string to a string of only uppercase Latin
    letters and digits. This is intended to convert any text to a
    text acceptable across most file systems as a filename.

    http://www.purl.org/stefan_ram/pub/filode
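    The ram.jar class itself is not shown in this thread. As an illustration of
    the same idea only, here is a minimal sketch that encodes the UTF-8 bytes of
    any string as upper-case hexadecimal, so the result contains nothing but the
    digits 0-9 and the letters A-F and can be decoded back. The class
    HexFileNames is hypothetical, invented for this example, and is not the
    ram.jar API:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical example class; NOT the class from ram.jar.
final class HexFileNames {

    private static final char[] DIGITS = "0123456789ABCDEF".toCharArray();

    /** Encodes each UTF-8 byte of text as two upper-case hex digits. */
    static String encode(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(DIGITS[(b >> 4) & 0x0F]); // high nibble
            sb.append(DIGITS[b & 0x0F]);        // low nibble
        }
        return sb.toString();
    }

    /** Reverses encode(); throws IllegalArgumentException on malformed input. */
    static String decode(String name) {
        if (name.length() % 2 != 0) {
            throw new IllegalArgumentException("odd length: " + name);
        }
        byte[] bytes = new byte[name.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            int hi = Character.digit(name.charAt(2 * i), 16);
            int lo = Character.digit(name.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("not hex: " + name);
            }
            bytes[i] = (byte) ((hi << 4) | lo);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String safe = encode("a*b?c.txt");
        System.out.println(safe);         // 612A623F632E747874
        System.out.println(decode(safe)); // a*b?c.txt
    }
}
```

    The trade-off is length: every byte becomes two characters, so a long input
    can still exceed a file system's maximum name length.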
    Stefan Ram, Jun 17, 2008
    #2

  3. Daniel Pitts Guest

    wrote:
    > I was wondering if there was a quick way to ensure that a filename is
    > safe.
    >
    > What I mean is that if I am creating a file from a string variable, I
    > want to ensure that the file will actually be able to be created. So
    > if it contains a "?", then clearly I would want to eliminate it.
    >
    > I know that there is something like URL encode that encodes strings
    > for use in urls, but is there another function that works similarly
    > for strings for files that I want to create?
    >
    > Thanks!

    '?' is not invalid on all systems; Linux handles it perfectly. The
    characters that are invalid are system specific, and some systems don't
    have limitations at all.

    The only portable way to handle this is to catch exceptions and report
    them to the user.
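    A minimal sketch of the try-and-report approach with java.nio.file (Java 7
    or later; the class TryCreate and its method are illustrative names only).
    Path.resolve(String) throws InvalidPathException when the platform rejects
    the name's syntax (on Windows, for example, a name containing '?'), and
    Files.createFile throws an IOException for everything else: an existing
    file, missing permissions, a read-only medium, and so on:

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;

// Illustrative helper: try to create the file and turn any failure into a message.
final class TryCreate {

    /** Returns null if dir/name was created, otherwise a message describing the failure. */
    static String tryCreate(Path dir, String name) {
        try {
            Path target = dir.resolve(name); // may throw InvalidPathException
            Files.createFile(target);        // may throw IOException
            return null;
        } catch (InvalidPathException e) {
            return "Illegal file name on this system: " + e.getMessage();
        } catch (FileAlreadyExistsException e) {
            return "File already exists: " + e.getFile();
        } catch (IOException e) {
            return "Could not create file: " + e;
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("names");
        // "what?.txt" is expected to succeed on Linux and fail on Windows.
        String[] candidates = { "report.txt", "what?.txt", "report.txt" };
        for (String name : candidates) {
            String error = tryCreate(dir, name);
            System.out.println(name + " -> " + (error == null ? "created" : error));
        }
    }
}
```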
    --
    Daniel Pitts' Tech Blog: <http://virtualinfinity.net/wordpress/>
    Daniel Pitts, Jun 17, 2008
    #3
  4. Guest

    Re: Making a string, file-safe (file-encode??)

    On Jun 17, 3:13 pm, Daniel Pitts
    <> wrote:
    > wrote:
    > > I was wondering if there was a quick way to ensure that a filename is
    > > safe.
    >
    > > What I mean is that if I am creating a file from a string variable, I
    > > want to ensure that the file will actually be able to be created.  So
    > > if it contains a "?", then clearly I would want to eliminate it.
    >
    > > I know that there is something like URL encode that encodes strings
    > > for use in urls, but is there another function that works similarly
    > > for strings for files that I want to create?
    >
    > > Thanks!
    >
    > '?' is not invalid on all systems; Linux handles it perfectly. The
    > characters that are invalid are system specific, and some systems don't
    > have limitations at all.
    >
    > The only portable way to handle this is to catch exceptions and report
    > them to the user.
    > --
    > Daniel Pitts' Tech Blog: <http://virtualinfinity.net/wordpress/>

    Ah, but I'm actually pulling the filename from a variable that the user
    does not set...
    , Jun 17, 2008
    #4
  5. Roedy Green Guest

    On Tue, 17 Jun 2008 11:54:43 -0700 (PDT), wrote,
    quoted or indirectly quoted someone who said :

    >I was wondering if there was a quick way to ensure that a filename is
    >safe.


    see http://mindprod.com/jgloss/filenames.html for some thoughts on the
    problem.
    --

    Roedy Green Canadian Mind Products
    The Java Glossary
    http://mindprod.com
    Roedy Green, Jun 17, 2008
    #5
  6. Daniel Pitts Guest

    Re: Making a string, file-safe (file-encode??)

    wrote:
    > On Jun 17, 3:13 pm, Daniel Pitts
    > <> wrote:
    >> wrote:
    >>> I was wondering if there was a quick way to ensure that a filename is
    >>> safe.
    >>> What I mean is that if I am creating a file from a string variable, I
    >>> want to ensure that the file will actually be able to be created. So
    >>> if it contains a "?", then clearly I would want to eliminate it.
    >>> I know that there is something like URL encode that encodes strings
    >>> for use in urls, but is there another function that works similarly
    >>> for strings for files that I want to create?
    >>> Thanks!
    >>
    >> '?' is not invalid on all systems; Linux handles it perfectly. The
    >> characters that are invalid are system specific, and some systems don't
    >> have limitations at all.
    >>
    >> The only portable way to handle this is to catch exceptions and report
    >> them to the user.
    >> --
    >> Daniel Pitts' Tech Blog: <http://virtualinfinity.net/wordpress/>
    >
    > Ah, but I'm actually pulling the filename from a variable that the user
    > does not set...

    Then make sure the variable is being set by something that doesn't add
    invalid characters. Details might help us better help you.


    --
    Daniel Pitts' Tech Blog: <http://virtualinfinity.net/wordpress/>
    Daniel Pitts, Jun 17, 2008
    #6
  7. Tom Anderson Guest

    On Tue, 17 Jun 2008, Eric Sosman wrote:

    > wrote:
    >> I was wondering if there was a quick way to ensure that a filename is
    >> safe.
    >>
    >> What I mean is that if I am creating a file from a string variable, I
    >> want to ensure that the file will actually be able to be created. So
    >> if it contains a "?", then clearly I would want to eliminate it.
    >
    > The "alphabets" for file names vary from system to system,
    > and there are systems on which '?' is perfectly legal. So your
    > "clearly" isn't really all that clear ...


    Oh come on, this is ridiculous. The only safe and sane thing to do is to
    target the common set of valid filenames - so exclude ?, /, \, , *, ",
    etc. Surely this is blindingly obvious? This is not a complicated
    question, it's quite clear what the OP wants to know, and you're not
    helping anyone by making a mountain out of a molehill.

    The answer to the question, though, is no - there's no library method that
    checks if a filename is safe, or escapes one to make it safe, at least
    none that i know of. However, it wouldn't be too hard to write a regular
    expression to validate filenames, or a sequence of replace calls to
    replace dangerous characters with safe versions.
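    A rough sketch of the regular-expression approach described above. The set
    of characters is an assumption: it is the union of the characters Windows
    rejects in file names (< > : " / \ | ? * and control characters) with the
    '/' and NUL that Unix-like systems reject. It does not deal with Windows
    reserved names such as CON or NUL, or with names ending in a dot or space.
    The class FileNameSanitizer is a made-up name:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper illustrating a blacklist regex plus replacement.
final class FileNameSanitizer {

    // Characters rejected by at least one common platform: Windows forbids
    // < > : " / \ | ? * and ASCII control characters; Unix forbids '/' and NUL.
    private static final Pattern DANGEROUS =
            Pattern.compile("[<>:\"/\\\\|?*\\x00-\\x1F]");

    /** True if the name is non-empty and contains none of the characters above. */
    static boolean isSafe(String name) {
        return !name.isEmpty() && !DANGEROUS.matcher(name).find();
    }

    /** Replaces each dangerous character with the given replacement character. */
    static String sanitize(String name, char replacement) {
        // quoteReplacement stops '$' or '\' from being treated as group references.
        String quoted = Matcher.quoteReplacement(String.valueOf(replacement));
        return DANGEROUS.matcher(name).replaceAll(quoted);
    }

    public static void main(String[] args) {
        String raw = "Q3 results: draft?.txt";
        System.out.println(isSafe(raw));        // false
        System.out.println(sanitize(raw, '_')); // Q3 results_ draft_.txt
    }
}
```

    Replacement is lossy: "a?b" and "a*b" both become "a_b", so distinct
    inputs can collide. A reversible encoding, like the one Stefan Ram points
    to, avoids that at the cost of readability.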

    Roedy's advice is pretty good:

    http://mindprod.com/jgloss/filenames.html

    I'd be tempted to go wild and insist that filenames contain only letters,
    digits, underscores, dashes and full stops, and don't have a punctuation
    symbol as the first character. If a user came up with a good reason to use
    some other character, i'd happily consider adding it, but until then, keep
    it simple, keep it safe.
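    The stricter whitelist above can be checked with a single pattern. In this
    sketch "letters" is taken to mean ASCII letters only, which is an
    assumption, and the hypothetical helper toSafe maps every disallowed
    character to '_' and prefixes 'x' when the name would otherwise start with
    punctuation:

```java
import java.util.regex.Pattern;

// Hypothetical helper implementing a letters/digits/_/-/. whitelist.
final class WhitelistFileNames {

    // First character: ASCII letter or digit. Remaining characters may also be
    // underscore, dash or full stop.
    private static final Pattern SAFE = Pattern.compile("[A-Za-z0-9][A-Za-z0-9_.-]*");

    static boolean isSafe(String name) {
        return SAFE.matcher(name).matches();
    }

    /** Maps every disallowed character to '_' and ensures a letter or digit comes first. */
    static String toSafe(String name) {
        String s = name.replaceAll("[^A-Za-z0-9_.-]", "_");
        // Only allowed characters remain, so s can fail the pattern only if it
        // is empty or starts with '_', '.' or '-'.
        return isSafe(s) ? s : "x" + s;
    }

    public static void main(String[] args) {
        System.out.println(isSafe("notes-2008.txt")); // true
        System.out.println(toSafe("my report?.txt")); // my_report_.txt
        System.out.println(toSafe(".hidden"));        // x.hidden
    }
}
```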

    > In general, though, you can't guarantee that a file will be creatable
    > just by examining its name. On one widespread system, "D:\\README.TXT"
    > is a perfectly valid file name but you are unlikely to succeed in
    > creating a new file on a CD-ROM ... Or you may lack permission to create
    > files in some folders, or the file system may be full, or ...


    True. And completely unconnected to what the OP asked.

    tom

    --
    Judge Dredd. Found dead. Face down in Snoopy's bed.
    Tom Anderson, Jun 18, 2008
    #7
  8. RedGrittyBrick Guest

    Tom Anderson wrote:
    > On Tue, 17 Jun 2008, Eric Sosman wrote:
    >
    >> wrote:
    >>> I was wondering if there was a quick way to ensure that a filename is
    >>> safe.
    >>>
    >>> What I mean is that if I am creating a file from a string variable, I
    >>> want to ensure that the file will actually be able to be created. So
    >>> if it contains a "?", then clearly I would want to eliminate it.
    >>
    >> The "alphabets" for file names vary from system to system,
    >> and there are systems on which '?' is perfectly legal. So your
    >> "clearly" isn't really all that clear ...
    >
    > Oh come on, this is ridiculous. The only safe and sane thing to do is to
    > target the common set of valid filenames - so exclude ?, /, \, , *, ",
    > etc. Surely this is blindingly obvious? This is not a complicated
    > question, it's quite clear what the OP wants to know, and you're not
    > helping anyone by making a mountain out of a molehill.


    $ perl file.pl
    'aaa*bbb?ccc.txt' written.
    'aaa*bbb?ccc.txt' contains ...
    Hello File


    $ ls -l aaa*
    -rw-rw-r-- 1 rgb rgb 11 Jun 18 10:24 aaa*bbb?ccc.txt


    $ cat file.pl
    #!/usr/bin/perl
    #
    use strict;
    use warnings;

    my $filename = 'aaa*bbb?ccc.txt';
    open my $fh, '>', $filename
        or die "can't write '$filename' because $!\n";
    print $fh "Hello File\n";
    close $fh;
    print "'$filename' written.\n";

    open my $fh2, '<', $filename
        or die "can't read '$filename' because $!\n";
    print "'$filename' contains ...\n";
    while (<$fh2>) {
        print;
    }
    close $fh2;


    I was too lazy to write it in Java. Sorry :)
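    For anyone who wants the Java equivalent, a rough translation of the Perl
    script above might look like this (an illustrative sketch; Java 9 or later
    for List.of, and OddFileName is a made-up class name). On Linux it should
    create the same file; on Windows, resolve() is expected to throw
    InvalidPathException because '*' and '?' are not allowed there:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Illustrative port of the Perl script: write a file with an odd name, read it back.
final class OddFileName {

    /** Writes one line to dir/name, reads the file back and returns its first line. */
    static String writeAndRead(Path dir, String name, String text) {
        // resolve() throws InvalidPathException where the platform rejects the name.
        Path file = dir.resolve(name);
        try {
            Files.write(file, List.of(text), StandardCharsets.UTF_8);
            List<String> lines = Files.readAllLines(file, StandardCharsets.UTF_8);
            return lines.isEmpty() ? "" : lines.get(0);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String name = "aaa*bbb?ccc.txt";
        String contents = writeAndRead(Paths.get("."), name, "Hello File");
        System.out.println("'" + name + "' written.");
        System.out.println("'" + name + "' contains ...");
        System.out.println(contents);
    }
}
```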

    --
    RGB
    RedGrittyBrick, Jun 18, 2008
    #8
  9. RedGrittyBrick Guest

    Lew wrote:
    > Lew wrote:
    >>> And why are we forbidding lower-case letters?
    >
    > Eric Sosman wrote:
    >> Because of file systems that don't support them.
    >> ISO 9660 (aka HSFS), for example.
    >>
    >> http://en.wikipedia.org/wiki/ISO_9660
    >
    > But doesn't everyone use the Joliet extensions?

    It is only a few days since I received an ISO 9660 CD without Joliet
    extensions. So no.

    The originator admitted he'd made a mistake though.

    --
    RGB
    RedGrittyBrick, Jun 18, 2008
    #9
  10. Hendrik Maryns Guest

    Lew wrote:
    | Tom Anderson wrote:
    |> Oh come on, this is ridiculous. The only safe and sane thing to do is
    |> to target the common set of valid filenames - so exclude ?, /, \, , *,
    |> ", etc. Surely this is blindingly obvious? This is not a complicated
    |> question, it's quite clear what the OP wants to know, and you're not
    |> helping anyone by making a mountain out of a molehill.
    |
    | Many people's situation differs, and they are fine with using those
    | characters in file names, even from Java, so no, the common subset is
    | not the only "safe and sane thing to do".

    I’ve been using names like ‘(∃y)(y∈--).mona’ and ‘E1 x (E1 y (& (& (>+ x
    y) (cat x NF)) (cat y PX))).gta’ without problems, on Linux. Haven’t
    been able to test my program on Windows until now, though, since I
    haven’t managed to compile the JNI on it. But since these are files the
    user doesn’t need to care about, it would be no problem to use ‘safe’
    names once it turns out not to work. So I guess I’m interested in this
    routine as well.

    Cheers, H.
    --
    Hendrik Maryns
    http://tcl.sfs.uni-tuebingen.de/~hendrik/
    ==================
    http://aouw.org
    Ask smart questions, get good answers:
    http://www.catb.org/~esr/faqs/smart-questions.html
    Hendrik Maryns, Jun 18, 2008
    #10
  11. Gordon Beaton Guest

    On Wed, 18 Jun 2008 12:41:10 +0200, Hendrik Maryns wrote:
    > I’ve been using names like ‘(∃y)(y∈--).mona’ and ‘E1 x (E1 y (& (& (>+ x
    > y) (cat x NF)) (cat y PX))).gta’ without problems, on Linux.


    I like to use filenames like "-rf ~ &;" followed by random strings.
    Keeps my users on their toes.

    /gordon

    --
    Gordon Beaton, Jun 18, 2008
    #11
  12. Tom Anderson Guest

    On Tue, 17 Jun 2008, Eric Sosman wrote:

    > Tom Anderson wrote:
    >> On Tue, 17 Jun 2008, Eric Sosman wrote:
    >>
    >>> wrote:
    >>>> I was wondering if there was a quick way to ensure that a filename is
    >>>> safe.
    >>>>
    >>>> What I mean is that if I am creating a file from a string variable, I
    >>>> want to ensure that the file will actually be able to be created. So
    >>>> if it contains a "?", then clearly I would want to eliminate it.
    >>>
    >>> The "alphabets" for file names vary from system to system,
    >>> and there are systems on which '?' is perfectly legal. So your
    >>> "clearly" isn't really all that clear ...
    >>
    >> Oh come on, this is ridiculous.
    >
    > I surmise you've never needed to write code for multiple
    > file systems.
    >
    >> The only safe and sane thing to do is to target the common set of valid
    >> filenames - so exclude ?, /, \, , *, ", etc. Surely this is blindingly
    >> obvious? This is not a complicated question, it's quite clear what the OP
    >> wants to know, and you're not helping anyone by making a mountain out of a
    >> molehill.
    >
    > Very well, then: "All portable file names shall consist
    > of one to six decimal digits or upper-case English letters, one
    > period, and zero to three decimal digits or upper-case English
    > letters." If you're content with this as a least common denominator, you're
    > all set.


    Ah, i had indeed forgotten that there were filesystems like that!

    Okay, lowest common denominator of filesystems in widespread use on
    computers at present. Pre-LFN FAT32 and non-Joliet ISO 9660 don't qualify.

    Although, is LFN-less FAT32 used on memory cards for cameras?
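    For comparison, Eric's least-common-denominator rule quoted above is easy
    to state as a pattern. This sketch is a literal reading of his sentence
    (StrictPortableName is an invented class name) and mainly shows how
    restrictive that rule is:

```java
import java.util.regex.Pattern;

// Hypothetical checker for the strict 6.3 upper-case rule quoted in this thread.
final class StrictPortableName {

    // 1-6 digits or upper-case A-Z, exactly one period, then 0-3 digits or upper-case A-Z.
    private static final Pattern RULE = Pattern.compile("[0-9A-Z]{1,6}\\.[0-9A-Z]{0,3}");

    static boolean conforms(String name) {
        return RULE.matcher(name).matches();
    }

    public static void main(String[] args) {
        String[] names = { "README.TXT", "REPORT.", "report.txt", "REPORTS.TXT", "A*B.TXT" };
        for (String name : names) {
            // Expected: true, true, false, false, false
            System.out.println(name + " -> " + conforms(name));
        }
    }
}
```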

    >>> In general, though, you can't guarantee that a file will be creatable
    >>> just by examining its name. On one widespread system,
    >>> "D:\\README.TXT" is a perfectly valid file name but you are unlikely
    >>> to succeed in creating a new file on a CD-ROM ... Or you may lack
    >>> permission to create files in some folders, or the file system may be
    >>> full, or ...
    >>
    >> True. And completely unconnected to what the OP asked.
    >
    > He asked for a file name that would quote ensure that the file will
    > actually be able to be created end quote.


    It was pretty clear to me from his post that that wasn't what he was
    asking.

    tom

    --
    The future is still out there, somewhere.
    Tom Anderson, Jun 18, 2008
    #12
  13. Roedy Green Guest

    On Tue, 17 Jun 2008 20:41:47 GMT, Roedy Green
    <> wrote, quoted or indirectly quoted
    someone who said :

    >see http://mindprod.com/jgloss/filenames.html for some thoughts on the
    >problem


    I have revised the essay based on some thoughts from Eric Sosman.
    --

    Roedy Green Canadian Mind Products
    The Java Glossary
    http://mindprod.com
    Roedy Green, Jun 18, 2008
    #13
  14. Tom Anderson Guest

    On Wed, 18 Jun 2008, Lew wrote:

    > Eric Sosman wrote:
    >>> He asked for a file name that would quote ensure that the file will
    >>> actually be able to be created end quote.
    >
    > Tom Anderson wrote:
    >> It was pretty clear to me from his post that that wasn't what he was
    >> asking.
    >
    > The OP asked, in the first post:
    >> What I mean is that if I am creating a file from a string variable,
    >> I want to ensure that the file will actually be able to be created.
    >
    > Eric gave an exact quote, and even said, "quote ... end quote". How was
    > it "pretty clear to [you] that that wasn't what [the OP] was asking",
    > when it was word for word exactly what they asked?


    Because the OP, i believe, expressed himself imperfectly, and the text
    quoted did not accurately represent his query. If you examine the rest of
    his post, that is clear. Cherry-picking sentences and interpreting them
    literally is a useful rhetorical tool, but it doesn't help answer
    questions.

    tom

    --
    IT'S OVER NINE THOUSAND!!!
    Tom Anderson, Jun 19, 2008
    #14
  15. Tom Anderson Guest

    On Thu, 19 Jun 2008, Eric Sosman wrote:

    > Tom Anderson wrote:
    >> On Wed, 18 Jun 2008, Lew wrote:
    >>
    >>> Eric Sosman wrote:
    >>>>> He asked for a file name that would quote ensure that the file will
    >>>>> actually be able to be created end quote.
    >>>
    >>> Tom Anderson wrote:
    >>>> It was pretty clear to me from his post that that wasn't what he was
    >>>> asking.
    >>>
    >>> The OP asked, in the first post:
    >>>> What I mean is that if I am creating a file from a string variable,
    >>>> I want to ensure that the file will actually be able to be created.
    >>>
    >>> Eric gave an exact quote, and even said, "quote ... end quote". How was
    >>> it "pretty clear to [you] that that wasn't what [the OP] was asking", when
    >>> it was word for word exactly what they asked?
    >>
    >> Because the OP, i believe, expressed himself imperfectly, and the text
    >> quoted did not accurately represent his query. If you examine the rest of
    >> his post, that is clear. Cherry-picking sentences and interpreting them
    >> literally is a useful rhetorical tool, but it doesn't help answer
    >> questions.
    >
    > The original post contained three count them three paragraphs. The
    > first was introductory, sort of a title for the rest. The second had
    > two sentences, one whose operative portion was the material I quoted,
    > and a second making it clear that the poster was thinking of lexical
    > tests. The third made an analogy with lexical manipulation of URLs.


    Right, so it was obvious that he was thinking of lexical tests, then?

    > You've called me ridiculous,


    Eric, i don't think i called you ridiculous, and i certainly didn't mean
    to imply that. I apologise if it came across like that. I called something
    you said ridiculous, and i think it was.

    > you've accused me of twisting the O.P.'s words, and I'm starting to find
    > your style of argumentation lacking in, well, style.


    Again, my apologies. I'll try and be more entertaining in future.

    tom

    --
    Throwin' Lyle's liquor away is like pickin' a fight with a meat packing
    plant! -- Ray Smuckles
    Tom Anderson, Jun 20, 2008
    #15