Getting path components

Aaron Gray

Has anyone got a code snippet to separate out the path components, ie drive,
path, filename, and extension ?

Many thanks in advance,

Aaron
 
Walter Roberson

Has anyone got a code snippet to separate out the path components, ie drive,
path, filename, and extension ?

No, there is no portable way to do that.

Standard C does not place any interpretation upon filenames
passed to the Standard I/O library functions, and has no idea
what a "drive", "path", or "extension" is. Standard C also has
no idea of what character or characters are used to separate
the various components, including having no opinion on the very
existence of directories, let alone the directory separator.

Furthermore, with Microsoft's NTFS filesystem, the old drive lettering
system is superseded, and drive names there can look like directory
names; NTFS also supports virtual mount points, so what -looks- like
a file X in directory Y might in fact be program Y with
command argument X. This use of Alternate Data Streams is not
distinguished in any syntactic way.

There are valid Unix file names that look -exactly- the same
as some valid MS Windows file names, and yet have completely
different interpretation. Then there's openVMS, and MVS, and lots
of other strange filesystem naming schemes. Oh yes, and remember
that in MacOS (before MacOS X) ':' was the directory separator...
 
Antoninus Twink

Has anyone got a code snippet to separate out the path components, ie drive,
path, filename, and extension ?

With POSIX, you can use dirname() and basename() in combination. There's
also realpath() if you need it.
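
A minimal sketch of that combination, assuming a POSIX system that provides <libgen.h>. The helper name split_posix and the 1024-byte scratch buffer are assumptions made for illustration only. Because dirname() and basename() are allowed to modify the string they are given, each call works on its own copy:

```c
#include <assert.h>
#include <libgen.h>   /* POSIX dirname(), basename() */
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copies the directory part and the last component
   of `path` into caller-supplied buffers of `size` bytes each.
   Returns 0 on success, -1 if something does not fit. */
int split_posix(const char *path, char *dir, char *base, size_t size)
{
    char tmp[1024];   /* scratch copy; the size is an arbitrary assumption */
    int n;

    assert(path != NULL && dir != NULL && base != NULL);
    if (strlen(path) >= sizeof tmp)
        return -1;

    /* dirname() may modify its argument, so hand it a copy. */
    strcpy(tmp, path);
    n = snprintf(dir, size, "%s", dirname(tmp));
    if (n < 0 || (size_t)n >= size)
        return -1;

    /* Fresh copy for basename(), for the same reason. */
    strcpy(tmp, path);
    n = snprintf(base, size, "%s", basename(tmp));
    if (n < 0 || (size_t)n >= size)
        return -1;

    return 0;
}
```

For "/usr/lib/libc.so" this should give "/usr/lib" and "libc.so"; for a bare "file.txt" the directory part comes back as ".". Neither function knows anything about drives or extensions.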

I don't know what the equivalent functions are on MS-Windows - maybe
jacob navia will drop in with the answer.
 
Aaron Gray

Antoninus Twink said:
With POSIX, you can use dirname() and basename() in combination. There's
also realpath() if you need it.

I don't know what the equivalent functions are on MS-Windows - maybe
jacob navia will drop in with the answer.

basename.c from BSD :-

http://www.krugle.org/kse/files?query=basename#3

is just :-

strrchr(path, '/')
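
For illustration, a minimal sketch of what a strrchr()-based lookup amounts to. This is not the BSD or OpenSSH source, and the name last_component is made up here:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: returns a pointer to the character after the
   last '/', or `path` itself if it contains no '/'. */
const char *last_component(const char *path)
{
    const char *slash;

    assert(path != NULL);
    slash = strrchr(path, '/');
    return slash ? slash + 1 : path;
}
```

Note that a trailing slash makes this return an empty string ("/usr/lib/" gives ""), whereas POSIX basename() would return "lib" for the same input, so the one-liner is only a rough equivalent.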

basename from OpenSSH :-

http://www.krugle.org/kse/codespaces/DXRo7v

dirname from OpenSSH :-

http://www.krugle.org/kse/codespaces/B4cBVF

realpath from OpenSSH :-

http://www.krugle.org/kse/codespaces/Cn10Ja

Should be nice and secure :)

Thanks,

Aaron


 
Walter Roberson

With POSIX, you can use dirname() and basename() in combination. There's
also realpath() if you need it.

Those are not part of POSIX.1-1990; they were not added until
Single Unix Specification Issue 4, Version 2, and were not
moved from X/OPEN UNIX extension to BASE until Issue 5 (in 2002, I think,
but I'm not sure.)
 
Bartc

Aaron Gray said:
Has anyone got a code snippet to separate out the path components, ie
drive, path, filename, and extension ?

If on Windows, I think there are the Path functions in the Windows Shell
(although I've never used them).

But code to parse a filespec isn't that difficult. I put together some code
below which may or may not be useful. It doesn't extract drive letters,
that's considered part of the path.

This may be specific to Windows of course and probably won't work for
anything else or for network/internet paths.

#include <stdio.h>
#include <string.h>

/* Extract only the path from a filespec */
/* All dest parameters here must point to a buffer of at least
   strlen(fs)+1 chars */
void extractpath(char *fs, char *dest) {
    int l=strlen(fs),i;

    *dest=0;

    /* scan backwards for the right-most separator */
    for (i=l; i>=1; --i)
        if (fs[i-1]=='\\' || fs[i-1]=='/') {
            strcpy(dest,fs);
            dest[i]=0; /* keep everything up to and including the separator */
            return;
        }
}

/* Extract a full filename including extension */
void extractfile(char *fs, char *dest) {
    int a,b;

    extractpath(fs,dest); /* start with path */

    a=strlen(fs);
    b=strlen(dest);

    if (b>=a) return; /* no file? */

    /* return the a-b chars at the end of fs */
    strcpy(dest,fs+b);
}

/* Extract the extension only. Period = 0 or 1 to return a null extension as
   "" or "." */
void extractext(char *fs,char *dest,int period) {
    int i,n;

    extractfile(fs,dest);
    n=strlen(dest);
    if (!n) return; /* no file so no ext */

    /* look for right-most period */
    for (i=n; i>=1; --i)
        if (dest[i-1]=='.') {
            /* source and destination overlap, so memmove rather than strcpy */
            memmove(dest,dest+i,n-i+1);
            if (*dest==0 && period)
                strcpy(dest,".");
            return;
        }

    /* no period in file */
    *dest=0;
    if (period)
        strcpy(dest,".");
}

/* Extract the filename excluding the extension, and with no trailing period */
void extractbasefile(char *fs, char *dest) {
    int a,b;

    extractext(fs,dest,0);
    a=strlen(dest); /* chars in extension */

    extractfile(fs,dest); /* filename with extension */
    b=strlen(dest)-a; /* chars in file w/o ext */

    dest[b]=0; /* exclude extension */

    if (b>0 && dest[b-1]=='.') dest[b-1]=0; /* avoid trailing period */
}

/* MAIN() */
int main(void) {
    char filespec[300];
    char path[300];
    char file[300];
    char basefile[300];
    char ext[300];

    strcpy(filespec,"C:\\ABC\\DEF\\GHI.JKL");

    extractpath(filespec,path);
    extractfile(filespec,file);
    extractbasefile(filespec,basefile);
    extractext(filespec,ext,0);

    printf("File spec = %s\n",filespec);
    printf("Path = %s\n",path);
    printf("File = %s\n",file);
    printf("Basefile = %s\n",basefile);
    printf("Ext = %s\n",ext);

    return 0;
}
 
Walter Roberson

But code to parse a filespec isn't that difficult. I put together some code
below which may or may not be useful. It doesn't extract drive letters,
that's considered part of the path.

Where is the check for leading // ? In POSIX, starting a path with
exactly two / has an implementation defined result for the *entire*
path, but if there are three or more / at the beginning of the path,
then all of them are to be collapsed to a single / .

If your code doesn't handle leading / on paths according to POSIX rules
then it isn't portable to POSIX systems. And since the behaviour
for // is implementation defined, no portable code using only C standard
libraries can determine the proper system behaviour for // .
So contrary to your assertion, code to parse a filespec is not just
difficult but impossible using only standard C.
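
The leading-slash rule described above can at least be detected syntactically, even though what a double slash then means cannot be decided from standard C. A small sketch, in which the enum and function names are assumptions made for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical classification of how a path begins. */
enum lead_kind {
    LEAD_RELATIVE,      /* no leading slash */
    LEAD_ROOT,          /* one, or three or more: treated as a single '/' */
    LEAD_IMPL_DEFINED   /* exactly two: meaning is up to the implementation */
};

enum lead_kind classify_leading_slashes(const char *path)
{
    size_t n = 0;

    assert(path != NULL);

    /* count the run of '/' at the start of the path */
    while (path[n] == '/')
        n++;

    if (n == 0)
        return LEAD_RELATIVE;
    if (n == 2)
        return LEAD_IMPL_DEFINED;
    return LEAD_ROOT;
}
```

A parser could use this to refuse, or pass through untouched, any path that falls in the implementation-defined case rather than guessing.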

Remember: the original poster did not specify a target OS, so the
question was a generalized one about parsing *all* possible filenames
on *all* hosted systems that C exists on.
 
Aaron Gray

Bartc said:
Aaron Gray said:
Has anyone got a code snippet to separate out the path components, ie
drive, path, filename, and extension ?
If on Windows, I think there are the Path functions in the Windows Shell
(although I've never used them).

But code to parse a filespec isn't that difficult. I put together some code
below which may or may not be useful. It doesn't extract drive letters,
that's considered part of the path.

This may be specific to Windows of course and probably won't work for
anything else or for network/internet paths.

Sorry Bartc, I'll go with the OpenSSH code; it's been tested properly as part of
a secure application.

Drive does not really matter for what I need, I just wrote that for
completeness.

Aaron
 
user923005

drive, path, filename, and extension ?

If on Windows, I think there are the Path functions in the Windows Shell
(although I've never used them).

But code to parse a filespec isn't that difficult. I put together some code
below which may or may not be useful. It doesn't extract drive letters,
that's considered part of the path.

This may be specific to Windows of course and probably won't work for
anything else or for network/internet paths.

#include <stdio.h>
#include <string.h>

/* Extract only the path from a filespec */
/* All dest parameters here must point to a string at least the same length
as fs */
void extractpath(char *fs, char *dest) {
int l=strlen(fs),i;

*dest=0;

for (i=l; i>=1; --i)
  if (fs[i-1]=='\\' || fs[i-1]=='/') {

For some file systems, '\' is a valid part of a filename (as opposed
to a separator).
For some file systems, '/' is a valid part of a filename (as opposed
to a separator).
For *either* file system, this loop will misinterpret the character
that is not a separator as a separator.
On OpenVMS, the separator is '.' and paths start with a drive or
symbol, followed by colon, followed by '[' and ending with ']' before
the file name.
    strcpy(dest,fs);
    dest[i]=0;
    return;
  };
return;

}

[snip]

Relatively portable solutions to problems of this nature are found in
SFL:
http://legacy.imatix.com/html/sfl/
see: strip_file_path(), strip_file_name(), file_is_directory()

and ACE:
http://www.cs.wustl.edu/~schmidt/ACE.html
See: basename()
 
Kenny McCormack

Walter Roberson said:
Remember: the original poster did not specify a target OS, so the
question was a generalized one about parsing *all* possible filenames
on *all* hosted systems that C exists on.

Only if you are insane.

Q: Can I get a drink of water?
A: What is water - do you mean all possible liquids, in all possible
containers, in all possible molecular configurations, etc, etc, etc?
Q: No, I'm just thirsty. If you can't help, just don't bother.
 
Bartc

Sorry Bartc, I'll go with the OpenSSH code; it's been tested properly as part
of a secure application.

No problem. I translated those functions from some interpreted code into C,
and will use them myself.
 
Bartc

Where is the check for leading // ? In POSIX, starting a path with

What's the problem with // or ///? They are just considered part of the
path. Any semantics are applied by something else. This is just a bit of
syntax handling after all, not half a complete file system.

Anyway, I did say:
This may be specific to Windows of course and probably won't work for
anything else or for network/internet paths.

The OP mentioned drive letters, which suggested Windows. And he wanted a
'code snippet', which indicated something informal. To do what you say
probably wouldn't be a snippet!

(And I've since tried this code on URLs and it seems well-behaved, on those
3 or 4 anyway.)

Remember: the original poster did not specify a target OS, so the
question was a generalized one about parsing *all* possible filenames
on *all* hosted systems that C exists on.

Let's see:

Aaron Gray said:
Has anyone got a code snippet to separate out the path components, ie drive,
path, filename, and extension ?

No, he didn't ask anything of the kind. I doubt if the terms drive, path,
filename and extension are meaningful on every system.
 
Bartc

But code to parse a filespec isn't that difficult.

<code snipped>

Don't know if anyone is taking this code fragment seriously, but, to deal
with filespecs containing only drives not paths, and a filespec containing
only a path, not a file, these two fixes are needed on the similar lines in
the code:

if (fs[i-1]=='\\' || fs[i-1]=='/' || fs[i-1]==':') {
....
if (b>=a) { *dest=0; return;}; /* no file? */

Paths containing periods (".") not bounded on the right by / or \ are not
handled at all. This part /is/ difficult because of some ambiguities. That
is left as an exercise :)

Malformed filespecs are also not handled; to deal with user input for
example requires a different approach.
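
With both of those fixes folded in, the path/file split might look roughly like the sketch below. The function names are made up for this illustration, and it still makes no attempt at periods inside paths or at malformed input:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: index just past the last '\\', '/' or ':' in fs,
   or 0 if none of them is present. */
static size_t path_length(const char *fs)
{
    size_t i = strlen(fs);

    while (i > 0 && fs[i-1] != '\\' && fs[i-1] != '/' && fs[i-1] != ':')
        --i;
    return i;
}

/* Splits fs into path (including its trailing separator) and file.
   Both buffers must hold at least strlen(fs)+1 chars. */
void split_filespec(const char *fs, char *path, char *file)
{
    size_t n;

    assert(fs != NULL && path != NULL && file != NULL);

    n = path_length(fs);
    memcpy(path, fs, n);
    path[n] = '\0';
    strcpy(file, fs + n);   /* empty when fs ends in a separator */
}
```

A drive-only spec such as "C:" then yields path "C:" with an empty file, and a spec ending in a separator also yields an empty file rather than a copy of the path.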
 
Aaron Gray

Kenny McCormack said:
Only if you are insane.

Q: Can I get a drink of water?
A: What is water - do you mean all possible liquids, in all possible
containers, in all possible molecular configurations, etc, etc, etc?
Q: No, I'm just thirsty. If you can't help, just don't bother.

Don't be predantic :)

Aaron
 
Flash Gordon

Bartc wrote, On 22/04/08 11:02:
Let's see:



No, he didn't ask anything of the kind. I doubt if the terms drive, path,
filename and extension are meaningful on every system.

They are certainly applicable on VMS where, IIRC, a file name including
full path could be:
SYS$USERDISK1:[some.dir.path]filename.ext;42
Where SYS$USERDISK1 is the (possibly logical) name of a disk, the bit in
square brackets is the path, and the 42 at the end is the file version
and *not* part of the file name or extension. So the question would be
applicable to at least one system with a completely different way of
doing things.
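
To make that concrete, here is a rough sketch that pulls apart a spec of exactly the shape shown above (device:[directory]name.ext;version). Everything in it is an assumption for illustration: the struct, the function names and the 64-char limit are invented, and real VMS file specifications allow parts to be omitted and have further forms that this does not attempt to handle:

```c
#include <assert.h>
#include <string.h>

#define PART_MAX 64   /* arbitrary limit chosen for this sketch */

struct vms_spec {
    char device[PART_MAX];
    char directory[PART_MAX];
    char name[PART_MAX];
    char ext[PART_MAX];
    char version[PART_MAX];
};

/* Copies the n characters at src into dst; fails if they do not fit. */
static int copy_part(char *dst, const char *src, size_t n)
{
    if (n >= PART_MAX)
        return -1;
    memcpy(dst, src, n);
    dst[n] = '\0';
    return 0;
}

/* Returns 0 on success, -1 if spec does not have the assumed shape. */
int parse_vms_spec(const char *spec, struct vms_spec *out)
{
    const char *colon, *open, *close, *dot, *semi;

    assert(spec != NULL && out != NULL);

    colon = strchr(spec, ':');              /* end of device name */
    if (colon == NULL || colon[1] != '[')
        return -1;
    open = colon + 1;                       /* the '[' itself */
    close = strchr(open, ']');              /* end of directory */
    if (close == NULL)
        return -1;
    dot = strchr(close, '.');               /* name/extension boundary */
    if (dot == NULL)
        return -1;
    semi = strchr(dot, ';');                /* start of version */
    if (semi == NULL)
        return -1;

    if (copy_part(out->device, spec, (size_t)(colon - spec)) != 0 ||
        copy_part(out->directory, open + 1, (size_t)(close - open - 1)) != 0 ||
        copy_part(out->name, close + 1, (size_t)(dot - close - 1)) != 0 ||
        copy_part(out->ext, dot + 1, (size_t)(semi - dot - 1)) != 0 ||
        copy_part(out->version, semi + 1, strlen(semi + 1)) != 0)
        return -1;

    return 0;
}
```

The same four components come out, but none of the separators match the Windows or Unix ones, which is why a single parser for every system is hard to write.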
 
Kenneth Brody

santosh said:
Aaron Gray wrote: [...]
Don't be predantic :)

To be pedantic:

s/predantic/pedantic

:)

<mode pedantic="extra">

s:$:/:

(-:

</mode>

--
+-------------------------+--------------------+-----------------------+
| Kenneth J. Brody | www.hvcomputer.com | #include |
| kenbrody/at\spamcop.net | www.fptech.com | <std_disclaimer.h> |
+-------------------------+--------------------+-----------------------+
Don't e-mail me at: <mailto:[email protected]>
 
