Methods to handle filename extensions?

gmtonyhoyt

I need assistance coming up with a clean way to handle filename
extensions with my application.

While I can come up with several ways of doing so on my own, I felt
it would be worth asking here to see which methods are considered more
effective or more widely accepted.

The issue is this. My application accepts file names from the command
line and is intended for a cross-platform Unix and Windows environment.
So the application may get a file name such as '/home/joe/myExample.ct',
'C:\Work\myExample.ct' or, more generically, 'myExample.ct'. My
application is to convert the data in the .ct file into something new,
then write the result to a file named 'myExample.out' in the directory
from which the application was run.

Is there a simplified way to handle this problem?

Tony
 
Eric Sosman

The C language and library don't say much about the
syntax of file names. They're represented as strings and
therefore can't contain '\0' characters, there are a few
values like FILENAME_MAX and L_tmpnam, and that's about
all C has to say about filenames. In particular, C has
no built-in facilities for picking apart filenames and
putting them back together again to generate new ones --
well, there's tmpnam(), but that hardly meets your need.

That said, if you happen to know that file names on
the systems you care about look like those above, you can
use straightforward string-bashing to get the job done.
Something like this might work:

/* needs <stdlib.h> and <string.h> */
const char *old = "/home/joe/myExample.ct";
const char *ext = ".out";
char *new;
const char *p, *q;

/* Find rightmost "component" */
p = strrchr(old, '/');
p = (p == NULL) ? old : p + 1;
q = strrchr(p, '\\');
p = (q == NULL) ? p : q + 1;

/* Find start of "extension" */
q = strrchr(p, '.');
p = (q == NULL) ? p + strlen(p) : q;

/* Assemble new name; ext is a pointer, so use strlen, not sizeof */
new = malloc((p - old) + strlen(ext) + 1);
if (new == NULL) abort();
memcpy (new, old, p - old);
strcpy (new + (p - old), ext);

This will mis-handle some unusual file names -- for example,
it would botch the valid POSIX file name "myExample.\\", and
its treatment of "/home/joe/.cshrc" is suspect. You might
be able to improve matters with some extra tests and a few
system-specific #ifdef's. In any case, thank your lucky
stars that you aren't dealing with (Open)VMS!

$DISK1:<ROOT.>[HOME.JOE]MYEXAMPLE.CT;3
 
No Such Luck

Is there a simplified way to handle this problem?

I've used these functions before, without problems. Any suggestions?

#include <stdlib.h>
#include <string.h>

#define MAXFNAME 2048 // max length of filenames expected
#define EXTLEN 10 // max extension length expected

char * get_ext (const char * fname) // returns the extension from a filename
{
    char * ext;
    size_t i, j;
    ext = malloc(EXTLEN + 1);
    if (ext == NULL)
        return NULL;
    ext[0] = '\0'; // empty string if there is no '.'

    for (i = strlen(fname); i > 0; i--)
    {
        if (fname[i-1] == '.')
        {
            // copy what follows the '.', at most EXTLEN characters
            for (j = 0; j < EXTLEN && fname[i+j] != '\0'; j++)
            {
                ext[j] = fname[i+j];
            }
            ext[j] = '\0';
            break;
        }
    }
    return ext;
}

// returns the filename minus the extension
char * get_basefilename (const char * fname)
{
    char * base;
    size_t i, len = strlen(fname);
    if (len >= MAXFNAME)
        return NULL;
    base = malloc(MAXFNAME);
    if (base == NULL)
        return NULL;
    strcpy(base, fname); // whole name if there is no '.'

    for (i = len; i > 0; i--)
    {
        if (fname[i-1] == '.')
        {
            base[i-1] = '\0'; // cut at the last '.'
            break;
        }
    }
    return base;
}
 
Martin Ambuhl

There is a simple way to generate such names (you have not specified the
problem). The following replaces the extension; I'm sure you can work
out the simple problem of replacing an initial segment. Note that C has
no concept of directories, local or otherwise. From the standpoint of C
you are doing nothing more than trivial manipulation of the character
strings that represent the names associated with the streams.

#include <stdio.h>
#include <string.h>

int main(void)
{
    char input_name[] = "/home/joe/myExample.ct", new_extension[] = ".out";
    char output_name[FILENAME_MAX], *dot;
    strcpy(output_name, input_name);
    if ((dot = strrchr(output_name, '.')))
        *dot = 0;
    strcat(output_name, new_extension);
    printf("The input filename was \"%s\"\n"
           "The output filename is \"%s\"\n", input_name, output_name);
    return 0;
}

The input filename was "/home/joe/myExample.ct"
The output filename is "/home/joe/myExample.out"
 
Chris Johnson

While I have no direct experience with it, I believe pcre.h would
provide powerful capabilities that the str* functions do not.

This would allow you to deal with files and directories that have
unusual period placement.

Chris
 
Lawrence Kirby

Is there a simplified way to handle this problem?

I've used these functions before, without problems. Any suggestions?

#define MAXFNAME 2048 // max length of filenames expected
#define EXTLEN 10 // max extension length expected

char * get_ext (const char * fname) // returns the extension from a filename
{
    char * ext;
    size_t i, j;
    ext = malloc(EXTLEN + 1);
    if (ext == NULL)
        return NULL;
    ext[0] = '\0'; // empty string if there is no '.'

    for (i = strlen(fname); i > 0; i--)
    {
        if (fname[i-1] == '.')
        {
            // copy what follows the '.', at most EXTLEN characters
            for (j = 0; j < EXTLEN && fname[i+j] != '\0'; j++)
            {
                ext[j] = fname[i+j];
            }
            ext[j] = '\0';
            break;
        }
    }
    return ext;
}


You could simplify this by using the standard C library function strrchr()
(note the 2 r's). But consider what it does for pathnames like foo.bar/baz
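
A minimal sketch of the strrchr() approach that also guards against the foo.bar/baz case. The name find_ext is hypothetical, and the code assumes '/' and '\' are the only directory separators that matter:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return a pointer to the last '.' in the final
 * path component, or NULL if that component contains no '.'.
 * Assumes '/' and '\\' are the only directory separators. */
const char *find_ext(const char *path)
{
    const char *dot, *sep;

    assert(path != NULL);
    dot = strrchr(path, '.');
    if (dot == NULL)
        return NULL;
    /* A separator after the dot means the dot belongs to a directory */
    sep = strpbrk(dot, "/\\");
    return (sep == NULL) ? dot : NULL;
}
```

With this, find_ext("foo.bar/baz") yields NULL rather than pointing at ".bar/baz".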

Lawrence
 
CBFalconer

It is up to you to construct a suitable file name from the user's
input. Once you have done that, the rest of the work is done by
passing that name to fopen.

The construction is system dependent, and has nothing to do with
the language. If you can come up with exact definitions, and want
help on code to do specific conversions, then this is the place to
ask. The place to ask about the definitions is on newsgroups
dedicated to the systems involved.
 
gmtonyhoyt

CBFalconer said:
The construction is system dependent, and has nothing to do with
the language. If you can come up with exact definitions, and want
help on code to do specific conversions, then this is the place to
ask. The place to ask about the definitions is on newsgroups
dedicated to the systems involved.

I know how to open a file, insert data, close it, move the file
pointer around, etc. That stuff I do know. I also understand that
what I'm being passed is initially a string, and it's up to me to open
the input and output file handles.

What I'm asking about is more related to string manipulation. That
is, I want a way to find the actual 'filename' without extension
or path, and then create a new file using that filename alone with the
new extension I've created on my own. I've already defined what .ct
and .out files are. This isn't about the specifications of a
specific file type; it's about how, in a generic C sense, to manipulate
a string containing what's expected to be a valid file name.

Ya know, there has to be a small library for this sort of thing. I
bet someone's got some kind of class or at least a structure and
function library that takes care of all this for me. I should google
around for that idea and see what comes up. It might even support file
access too.

Tony
 
Gordon Burditt

What I'm asking about is more related to string manipulation. That
is, I want a way to find the actual 'filename' without extension
or path, and then create a new file using that filename alone with the
new extension I've created on my own.

A key thing to remember here is that *YOU*, the programmer, are
responsible for all the memory management. You want to copy
something? You find a place to put it. And it's your problem
to make sure there's enough space for it.

I've already defined what .ct
and .out files are. This isn't about the specifications of a
specific file type; it's about how, in a generic C sense, to manipulate
a string containing what's expected to be a valid file name.

First, you need to define what a file extension is.
This isn't necessarily easy. First try at it:

1. The file extension is the part of the file/path name between
the last period and the end of the file name.

In this case, you can copy from the beginning of the file name
up to and including the last period (copy into a buffer that's
big enough for the new file name), then append your own new extension.
Finding the last period is easy with strrchr(). Be sure to deal
with the case where there isn't one.

This has a number of problems. The normal use of the term doesn't
match up with the definition above in a number of cases, such as:

a. /home/bob/.login no extension
b. /usr/bin/diff no extension
c. /home/bob/main. no extension
d. /home/bob.j/main no extension
e. .login no extension

2. The file extension is the part of the file/path name between
the last period and the end of the file name. If there is a directory
separator (/ in UNIX, typically \ in Windows or MS-DOS) between the
last period and the end of the file name, the extension is null
(takes care of (d)). If there is a directory separator or the
beginning of the file name immediately preceding the last period,
the extension is null (takes care of (a) and (e)). If there is
no period in the file name, the extension is null (takes care of
(b)). If the file name ends with a period, the extension is null
(takes care of (c)), but be careful not to end up doubling up on
the periods.

You divide the file name into three parts: the prefix, the period,
and the extension. The extension may be null (and does not include
the period). The period may or may not actually be there (see cases
(a), (b), (d), and (e)). The prefix is the part of the file name
before the extension and before the immediately preceding period,
if there is one. If the extension is null, the prefix is the whole
file name, minus a trailing period, if any.

To make a new file name with a different extension, concatenate the
prefix, a period, and the new extension. To make a new file name
without an extension, use the prefix.
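
Definition 2 and the prefix/period/extension split could be sketched roughly as follows. The names prefix_len and with_ext are made up for illustration, '/' and '\' are assumed to be the only directory separators, and new_ext is passed without its period:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Length of the "prefix" under definition 2. Assumes '/' and '\\'
 * are the only directory separators. */
size_t prefix_len(const char *name)
{
    size_t len;
    const char *dot;

    assert(name != NULL);
    len = strlen(name);
    dot = strrchr(name, '.');
    if (dot == NULL)                    /* (b) no period at all */
        return len;
    if (strpbrk(dot, "/\\") != NULL)    /* (d) period is in a directory */
        return len;
    if (dot == name || dot[-1] == '/' || dot[-1] == '\\')
        return len;                     /* (a), (e) leading period */
    return (size_t)(dot - name);        /* normal case, and (c) */
}

/* Build prefix + "." + new_ext in malloc'ed storage; the caller frees it.
 * Returns NULL if allocation fails. */
char *with_ext(const char *name, const char *new_ext)
{
    size_t n = prefix_len(name);
    char *out = malloc(n + 1 + strlen(new_ext) + 1);

    if (out == NULL)
        return NULL;
    memcpy(out, name, n);
    out[n] = '.';
    strcpy(out + n + 1, new_ext);
    return out;
}
```

Because the trailing period in case (c) is dropped from the prefix, with_ext("/home/bob/main.", "out") gives "/home/bob/main.out" and does not double the period.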

I'm sure someone is going to come up with some special cases I've
forgotten. And someone may not like the way I handled case (c)
identically to leaving off the trailing period.

Gordon L. Burditt
 
Jean-Claude Arbaut

On 15/06/2005 at 22:58, Gordon Burditt said:
I'm sure someone is going to come up with some special cases I've
forgotten.

More or less in your list, but:
INSTALL.macppc one dot, but not really an extension
patch-2.6.11.12.bz2 many dots
linux-2.6.11.12.tar.gz many dots and extension with 2 dots
(if we consider .tar.gz is the extension)

There is yet another problem: on some architectures, there may be
non-ASCII characters, and depending on their encoding, things can
get complicated. Not for the extension, but for filename length at
least, if multibyte characters are used (e.g. UTF-8 on HFS+ partitions).
 
CBFalconer

You haven't come up with the definitions. There are string
manipulating routines, such as strcat and strcpy, available. We
haven't the foggiest idea what string can represent a file on your
system. Maybe it needs a 12 digit hexadecimal representation, for
all I know. If you are dealing with MsDos there are some rules,
CP/M has other rules, Unix still others, Windoze others,
UncleBobsEmbeddedMachines has still another set. Not to mention
VMS, OS2, MACs, Burroughs, etc. Some systems have paths and/or
extensions, others do not. So go somewhere where the mouth of the
horse can be seen and heard.
 
Gordon Burditt

I'm sure someone is going to come up with some special cases I've
forgotten.

More or less in your list, but:
INSTALL.macppc one dot, but not really an extension

I think this one really *IS* an extension.

patch-2.6.11.12.bz2 many dots

Here, it works like I think it should: bz2 is the extension.
Although, after you uncompress it, .12 is the extension,
and that doesn't really fit in with the usual use of extensions.

linux-2.6.11.12.tar.gz many dots and extension with 2 dots
(if we consider .tar.gz is the extension)

You have a point here, but I really don't know how to generalize
it. "If the string immediately preceding an extension commonly
used by a compression program is ".tar.", the "tar." is part of the
extension?" That's a bit weird. It's also true that since the
LAST extension is .gz, the appropriate thing to feed it to is
gunzip, or you could feed it to tar with the z option. On the
other hand, .tar.bz2 should be fed to bunzip2, and as far as I
know, there is *NO* option to tar to directly unpack a .tar.bz2
file.
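
One way to sidestep the generalization problem is to stop generalizing and match against a fixed table of known suffixes, longest match winning. This is only a sketch: known_ext is a hypothetical name, and the table is illustrative, not a complete list:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return a pointer to the start of the longest suffix of name that
 * appears in a small table of known extensions, or NULL if none does.
 * The table is illustrative only. */
const char *known_ext(const char *name)
{
    static const char *const table[] = {
        ".tar.gz", ".tar.bz2", ".tgz", ".gz", ".bz2"
    };
    size_t len, i;
    const char *best = NULL;

    assert(name != NULL);
    len = strlen(name);
    for (i = 0; i < sizeof table / sizeof table[0]; i++) {
        size_t n = strlen(table[i]);
        if (n <= len && strcmp(name + len - n, table[i]) == 0) {
            /* an earlier start means a longer match */
            if (best == NULL || name + len - n < best)
                best = name + len - n;
        }
    }
    return best;
}
```

The obvious cost is that any extension missing from the table, such as the one in INSTALL.macppc, is simply not recognized.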

There is yet another problem: on some architectures, there may be
non-ASCII characters, and depending on their encoding, things can
get complicated. Not for the extension, but for filename length at
least, if multibyte characters are used (e.g. UTF-8 on HFS+ partitions).

If you know what character set your file name is in, you shouldn't
have too much trouble, except for the problem of "what's a period"
if there is more than one character code used for one.

Gordon L. Burditt
 
Jean-Claude Arbaut

On 16/06/2005 at 00:33, Gordon Burditt said:
I think this one really *IS* an extension.

Meh... It's the help file for installing OpenBSD on Macintosh.
Only a plain ASCII text file. The extension should be .txt.
I should have clarified this I think.

Sometimes, files have dots, but no extension (I thought
this was a valid example, if not I'm sure you will find some).
I once thought it was possible to cope with that by enforcing
length(ext)<=5 (or any other value), but MacOSX showed up
with some exotic extensions, like "blabla.webarchive".

There is even a strange situation on MacOSX: some folders
are treated like plain files, under unix command line
they are folders, but they should be managed like files.
An example: "blabla.rtfd" would be a folder containing
an RTF file, and some images. An "application" is
merely a unix executable, plus all its resource files,
etc... Not much to do with your problem, but if you
write a program to manage rtfd "files", you need to
be careful with that.
 
Keith Thompson

You have a point here, but I really don't know how to generalize
it. "If the string immediately preceding an extension commonly
used by a compression program is ".tar.", the "tar." is part of the
extension?" That's a bit weird. It's also true that since the
LAST extension is .gz, the appropriate thing to feed it to is
gunzip, or you could feed it to tar with the z option. On the
other hand, .tar.bz2 should be fed to bunzip2, and as far as I
know, there is *NO* option to tar to directly unpack a .tar.bz2
file.

<OT>
Yes, there is (at least in recent versions of GNU tar).
</OT>

To add to the confusion, compressed tar files are commonly called
"foo.tgz" rather than "foo.tar.gz".
 
Keith Thompson

You might consider simplifying the problem. When I do this kind of
thing, I generally just append a suffix to the original file name,
without worrying about what the initial suffix is, if any.

For example, "/home/joe/myExample.ct" would be mapped to
"/home/joe/myExample.ct.out".

Whether you can get away with this depends on the requirements, and
possibly on what the OS supports. Some systems (VMS, for example)
don't allow multiple '.' characters in file names, and some may impose
limits on how long a file name can be.
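
The append-only approach needs nothing beyond plain string concatenation. A minimal sketch, with append_suffix as a hypothetical name and no check against any system limit on file name length:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return name + suffix in malloc'ed storage
 * (the caller frees it), or NULL if allocation fails. */
char *append_suffix(const char *name, const char *suffix)
{
    size_t n, m;
    char *out;

    assert(name != NULL && suffix != NULL);
    n = strlen(name);
    m = strlen(suffix);
    out = malloc(n + m + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, name, n);
    memcpy(out + n, suffix, m + 1);   /* copies the terminating '\0' too */
    return out;
}
```

Calling append_suffix("/home/joe/myExample.ct", ".out") produces "/home/joe/myExample.ct.out", with no parsing of the original name at all.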

If you really need to replace the extension with a different one,
you're going to have to define exactly what you mean by that before we
can help you. Show us a few more examples, and be prepared to tell us
what you need for any examples we might offer. Consider what to do if
the file name has no extension. On Unix, a '\' is a valid (but
unlikely) character in a file name, so "ab.cd\ef" probably has an
extension of "ef" on Windows and "cd\ef" on Unix. Or you can reject
anything that's too weird, but then you need to define what "too
weird" means.

Defining the problem precisely is up to you.
 
Lawrence Kirby

On Thu, 16 Jun 2005 05:52:31 +0000, Keith Thompson wrote:

....

On Unix, a '\' is a valid (but
unlikely) character in a file name, so "ab.cd\ef" probably has an
extension of "ef" on Windows and "cd\ef" on Unix.

Under Windows I would expect it to have a base filename of ef with no
extension.

Lawrence
 
Chris Croughton

What I'm asking about is more related to string manipulation. That
is, I want a way to find the actual 'filename' without extension
or path, and then create a new file using that filename alone with the
new extension I've created on my own. I've already defined what .ct
and .out files are. This isn't about the specifications of a
specific file type; it's about how, in a generic C sense, to manipulate
a string containing what's expected to be a valid file name.

strchr(), strrchr(), strcpy(), strncpy(), etc.

More than that can't be determined until you know which filename syntax
you are using.

Ya know, there has to be a small library for this sort of thing. I
bet someone's got some kind of class or at least a structure and
function library that takes care of all this for me. I should google
around for that idea and see what comes up. It might even support file
access too.

I have a function which works for MSDOS/Windows and Unix file syntax.
It doesn't always get things right (as I recall it treats .bashrc as
having no base name and all extension) but works for the files I use it
with. Email me if you want a copy.

Chris C
 
gmtonyhoyt

Keith said:
If you really need to replace the extension with a different one,
you're going to have to define exactly what you mean by that before we
can help you.

Okay, the intended and only supported platforms are going
to be Unix-based (Solaris and Linux) and Microsoft-based (Windows and
MS-DOS). Here, I'll simplify the whole solution even further.

-- 1) If the file name ends with '.ct', everything before that which is
not part of a directory specification will be used as the filename, and
have the '.out' extension appended to it.

-- 2) All other paths+filenames will have the path portion stripped off,
and the remaining filename used, with '.out' appended to the end of it.

-- 3) A filename must contain only alphanumeric characters and in
addition may contain a period, dash, underscore and the left and right
brackets []. While the Unix environment may support more, I'm going to
limit it to this for now. This may actually add complexity to the
code, so excuse me if it does.

So, for example

i) 'myFile.ct' -> 'myFile.out'
ii) '.myfile' -> '.myfile.out'
iii) '/home/myDir/myFile.dat' -> 'myFile.dat.out'
iv) 'C:\home\myDir.ct\myFile' -> 'myFile.out'
v) 'C:\myFile.dat.bak.ct' -> 'myFile.dat.bak.out'
vi) 'my_file[123].' -> 'my_file[123].out'

Defining the problem precisely is up to you.

Well, I can only hope that this sufficiently resolves the issue. In
general, I'm not exactly sure what more you could use, but ask away.

Tony
 
Eric Sosman

Hmmm: Example vi seems to contradict Requirement 2.
I'm going to take Example vi as indicating your true intent.

#define VALID_CHARS "abcdefghijklmnopqrstuvwxyz" \
                    "ABCDEFGHIJKLMNOPQRSTUVWXYZ" \
                    "0123456789.-_[]"
#define OLD_EXTENSION ".ct"
#define NEW_EXTENSION ".out"

const char *old = "C:\\home\\myDir.ct\\myFile";
char *new;
const char *p;

/* Skip past directory components */
p = strrchr(old, '/');
if (p != NULL)
    old = p + 1;
p = strrchr(old, '\\');
if (p != NULL)
    old = p + 1;

/* Check for valid alphabet (Examples iii, iv, and v
 * suggest that you don't want this check to be made
 * on "the path portion")
 */
if (strspn(old, VALID_CHARS) < strlen(old))
    reject_bad_filename();

/* Find extension (if any) and decide whether to
 * replace or retain it; a bare trailing '.' is also
 * replaced, as in Example vi
 */
p = strrchr(old, '.');
if (p == NULL || (strcmp(p, OLD_EXTENSION) != 0 && p[1] != '\0'))
    p = old + strlen(old);

/* Assemble new name */
new = malloc( (p - old) + sizeof NEW_EXTENSION );
if (new == NULL)
    die_horribly();
memcpy (new, old, p - old);
strcpy (new + (p - old), NEW_EXTENSION);

Note that there's no special handling for inputs with
a Windows drive designator but no path components: "c:file"
will be rejected as bad because of the colon in what the
code thinks of as "the file name." If that's not what you
want, change it.
 
CBFalconer

.... snip ...

Now you are getting somewhere. Does the char set apply to paths,
and what constitutes a path? Does the presence of a '\' or '/'
define the system and thus the path portions? How, if at all, is an
MS-DOS disk identifier to be translated to or from?

Well I can only hope that this sufficiently resolves the issue but in
general, I'm not exactly sure what more you could use, but ask away.

This is for us to use? I didn't realize you were expecting to pay
good money for us to do your programming for you. The rate here,
for quickie jobs like this, is USD200 per hour, with a 4 hour
minimum. We can throw in prodding you into defining the job for
free in this case, although that would normally be an additional
charge. We will have to define the gazinta and gazouta for the
supplied module. I suggest a char string for each, with the
gazouta being a malloced string, which will require eventual
freeing. So there will be two routines, with a and b depending on
your naming preference:

char *afromb(char *bstring);

Boy, look at all you are getting for free!
 
