Making safe file names

Andrew Berg

Currently, I keep Last.fm artist data caches to avoid unnecessary API calls and have been naming the files using the artist name. However,
artist names can have characters that are not allowed in file names for most file systems (e.g., C/A/T has forward slashes). Are there any
recommended strategies for naming such files while avoiding conflicts (I wouldn't want to run into problems for an artist named C-A-T or
CAT, for example)? I'd like to make the files easily identifiable, and there really are no limits on what characters can be in an artist name.
 
Jens Thoms Toerring

Andrew Berg said:
Currently, I keep Last.fm artist data caches to avoid unnecessary API calls
and have been naming the files using the artist name. However, artist names
can have characters that are not allowed in file names for most file systems
(e.g., C/A/T has forward slashes). Are there any recommended strategies for
naming such files while avoiding conflicts (I wouldn't want to run into
problems for an artist named C-A-T or CAT, for example)? I'd like to make
the files easily identifiable, and there really are no limits on what
characters can be in an artist name.

It's not clear what context you need this for. You
could, e.g., replace all characters not allowed by the file
system with their hexadecimal (ASCII) values, preceded by a
'%' (so '/' would be changed to '%2F'), and also encode a '%'
itself in a name as '%25'. Then you have a well-defined
two-way mapping ("isomorphic", if I remember my math-learning
days correctly) between the original name and the
way you store it. E.g.

"C/A/T" would become "C%2FA%2FT"

and

"C%2FA/T" would become "C%252FA%2FT"

You can translate back and forth between them with not too
much effort.
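
A minimal Python sketch of that two-way mapping might look like the
following. The set of characters in FORBIDDEN is an assumption (roughly
what POSIX and Windows file systems reject, plus '%' itself), and the
function names are made up for illustration; adjust both to your needs.

```python
import re

# Assumption: the characters to escape. '/' is forbidden on POSIX file
# systems, the others additionally on Windows; '%' is the escape
# character itself and therefore must be escaped as well.
FORBIDDEN = '%/\\:*?"<>|'

_ESCAPE_RE = re.compile("[" + re.escape(FORBIDDEN) + "]")
_UNESCAPE_RE = re.compile(r"%([0-9A-F]{2})")


def escape_name(name):
    """Replace each forbidden character by '%' plus its two-digit hex code."""
    return _ESCAPE_RE.sub(lambda m: "%{:02X}".format(ord(m.group())), name)


def unescape_name(escaped):
    """Invert escape_name(): turn every '%XX' back into the character."""
    return _UNESCAPE_RE.sub(lambda m: chr(int(m.group(1), 16)), escaped)


print(escape_name("C/A/T"))           # C%2FA%2FT
print(escape_name("C%2FA/T"))         # C%252FA%2FT
print(unescape_name("C%252FA%2FT"))   # C%2FA/T
```

Everything outside FORBIDDEN, including non-ASCII characters, passes
through unchanged, so "C/A/T", "C-A-T" and "CAT" still end up as three
different file names.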

Of course, that assumes that '%' is a character allowed by
your file system; otherwise pick some other one, any one
will do in principle. It's a bit harder for a human to
interpret, but that's rather likely not much of a problem. You
will probably have seen that kind of scheme used in URLs.
The concept is rather old and is called an 'escape character',
i.e. you have one character that assumes some special meaning
and that is itself escaped wherever it appears literally.

If, on the other hand, those names never need to be translated back
to the original name, another strategy would be to use the SHA1
hash value of the artist's name. Since clashes between SHA1 hash
values are rather hard to produce, it's a rather safe method of
converting something (i.e. the artist's name) to a number. The
drawback, of course, is that you can't translate back from the
hash value to the original name (if that were simple, the
whole thing wouldn't work ;-)
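
In Python the hash approach could be sketched like this, using the
standard hashlib module. The function name and the ".json" suffix are
just assumptions for the example, and encoding the name as UTF-8 before
hashing is a choice you would have to keep consistent.

```python
import hashlib


def hashed_name(artist, suffix=".json"):
    """Return a file name built from the SHA1 hex digest of the artist name.

    The suffix is a made-up default; use whatever your cache files need.
    """
    digest = hashlib.sha1(artist.encode("utf-8")).hexdigest()
    return digest + suffix


for artist in ("C/A/T", "C-A-T", "CAT"):
    print(artist, "->", hashed_name(artist))
```

The result is always 40 hexadecimal digits plus the suffix, so it is
safe on any file system, but you would need to store the original name
somewhere (e.g. inside the cache file) if you ever want to list the
artists again.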

Regards, Jens
 
Andrew Berg

Jens Thoms Toerring said:
You could, e.g., replace all characters not allowed by the file
system with their hexadecimal (ASCII) values, preceded by a
'%' (so '/' would be changed to '%2F'), and also encode a '%'
itself in a name as '%25'. Then you have a well-defined
two-way mapping ("isomorphic", if I remember my math-learning
days correctly) between the original name and the
way you store it. E.g.

"C/A/T" would become "C%2FA%2FT"

and

"C%2FA/T" would become "C%252FA%2FT"

You can translate back and forth between them with not too
much effort.

Of course, that assumes that '%' is a character allowed by
your file system; otherwise pick some other one, any one
will do in principle. It's a bit harder for a human to
interpret, but that's rather likely not much of a problem.

Yes, something like this is what I am trying to achieve. Judging by the responses I've gotten so far, I think I'll have to roll my own
transformation scheme since URL encoding and the like transform Unicode characters. I can memorize that 植松伸夫 is a Japanese composer who
is well-known for his works in the Final Fantasy series of video games. Trying to match up the URL-encoded version to an artist would be
almost impossible when I have several other artist names that have no ASCII characters.
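
To illustrate the difference, here is a small Python sketch comparing
urllib.parse.quote() with a hand-rolled escape that only touches a
chosen set of characters. The helper escape_selected() and its default
set of forbidden characters are assumptions for the example, not an
existing library function.

```python
from urllib.parse import quote, unquote

name = "植松伸夫"

# quote() percent-encodes every UTF-8 byte of a non-ASCII character,
# so nothing readable is left of the name.
url_encoded = quote(name, safe="")
print(url_encoded)   # %E6%A4%8D%E6%9D%BE%E4%BC%B8%E5%A4%AB


def escape_selected(text, forbidden="%/"):
    """Escape only the characters listed in `forbidden` (assumed ASCII)."""
    return "".join(
        "%{:02X}".format(ord(ch)) if ch in forbidden else ch for ch in text
    )


print(escape_selected(name))      # 植松伸夫
print(escape_selected("C/A/T"))   # C%2FA%2FT
```

Both schemes are reversible (unquote() undoes either one, as long as
'%' itself is in the forbidden set), but only the second keeps
non-ASCII names identifiable at a glance.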
 
