Separating directory from file name: Code Review!

J

Joe Laughlin

I've not used C much before, so I don't know how robust or good this code
is. I'd appreciate any feedback or criticisms anyone has!

Thanks,
Joe



#include <stdio.h>
#include <string.h>

#define BUF_LENGTH 1000

int main()
{
char *string = "/some/file/here/file_name";

/* File should be "/file_name", and "file_name" after incrementing */
char* file_name = strrchr( string, '/');
file_name++;

printf ("The file name is %s\n", file_name);

/* Initialize the path */
char *path = (char*)malloc(BUF_LENGTH);
memset(path, 0, BUF_LENGTH);

/* The length of the file path should be the difference between the
* location of file pointer and the string pointer. I can't always
assume
* this, right? */
int length_of_file_path = file_name - string;

/* Put the first length_of_file_path characters into path */
strncat(path, string, length_of_file_path);

printf("The file path is %s\n", path);

}
 
J

Jack Klein

I've not used C much before, so I don't know how robust or good this code
is. I'd appreciate any feedback or criticisms anyone has!

Thanks,
Joe

Oops, I just answered your original question suggesting the use of
strrchr().
#include <stdio.h>
#include <string.h>

Need to add:

#include <stdlib.h>

....here to have a proper prototype for malloc() in scope.
#define BUF_LENGTH 1000

Did you know that <stdio.h> defines a macro FILENAME_MAX? While I
don't know of any system in common use today that allows a path name
to exceed 1000 characters, if you use FILENAME_MAX the buffer will
always be large enough for any legal file name on any system where it
is compiled.
int main()
{
char *string = "/some/file/here/file_name";

/* File should be "/file_name", and "file_name" after incrementing */
char* file_name = strrchr( string, '/');

You need to check for NULL here. While strrchr() can't return a null
pointer in this example, I assume in the real world you will want to
work with arbitrary strings entered interactively by a user or read
from files or the command line. Sooner or later your program will be
handed a string that does not contain a '/' character. Then when you
increment the null pointer or pass it to printf() or strncat() you
generate undefined behavior, most likely causing your operating system
to terminate your program.
file_name++;

printf ("The file name is %s\n", file_name);

/* Initialize the path */
char *path = (char*)malloc(BUF_LENGTH);

Never cast the return value of malloc() in C. If you did this to shut
up compiler warnings, we fixed that properly by including <stdlib.h>
for malloc's prototype. Without that prototype, the cast eliminates
the warning but specifically does the wrong thing on some platforms.
memset(path, 0, BUF_LENGTH);

First, you don't need to set the entire buffer to 0 for what you are
doing. Second, if you do need to allocate memory for an array of
characters and have it all zeroed, calloc() will do both operations in
one call. Note that calloc() is guaranteed to initialize an array of
characters to all valid '\0' values, it is not so guaranteed to do so
with any other type.

But all you really need is one '\0' at the beginning, so you could
just code:

*path = '\0';
/* The length of the file path should be the difference between the
* location of file pointer and the string pointer. I can't always
assume
* this, right? */
int length_of_file_path = file_name - string;

The length of the string prior to the final '/', excluding the final
'/' but allowing for a '\0' in its place to terminate it, is exactly
file_name - string. It can't ever be anything else (if strrchr() does
not return NULL). You can always assume this.
/* Put the first length_of_file_path characters into path */
strncat(path, string, length_of_file_path);

The problem with this is length_of_file_path includes the final '/',
which you do not want to appear in the path string (or do you?). If
you don't, use length_of_file_path - 1 as the length argument.
 
C

Christian Bau

"Joe Laughlin said:
I've not used C much before, so I don't know how robust or good this code
is. I'd appreciate any feedback or criticisms anyone has!

Thanks,
Joe



#include <stdio.h>
#include <string.h>

#define BUF_LENGTH 1000

int main()
{
char *string = "/some/file/here/file_name";

/* File should be "/file_name", and "file_name" after incrementing */
char* file_name = strrchr( string, '/');
file_name++;

There is no guarantee that there is a '/' in file_name. What will happen
then?
printf ("The file name is %s\n", file_name);

/* Initialize the path */
char *path = (char*)malloc(BUF_LENGTH);
memset(path, 0, BUF_LENGTH);

There is no guarantee that the filename is limited to 1000 characters.
/* The length of the file path should be the difference between the
* location of file pointer and the string pointer. I can't always
assume
* this, right? */
int length_of_file_path = file_name - string;

/* Put the first length_of_file_path characters into path */
strncat(path, string, length_of_file_path);

printf("The file path is %s\n", path);

}

Goes horribly wrong on my Macintosh with filenames like
"CB:Documents:Letters:Letter of 15/May/2004".
 
M

Martin Dickopp

Jack Klein said:
Did you know that <stdio.h> defines a macro FILENAME_MAX? While I
don't know of any system in common use today that allows a path name
to exceed 1000 characters,

On Linux, FILENAME_MAX is 4096. In fact, I don't know of any Unix
system in common use today that does /not/ allow a path name of at least
1024 characters.
if you use FILENAME_MAX the buffer will always be large enough for any
legal file name on any system where it is compiled.

What if the system poses no arbitrary limit on the maximum file name
size? If I take the standard literally (and ignore the non-normative
footnote), `FILENAME_MAX' must be defined to the size of the whole
address space on such systems.

Martin
 
J

Joe Laughlin

Jack said:
Oops, I just answered your original question suggesting
the use of strrchr().


Need to add:

#include <stdlib.h>
Noted.


...here to have a proper prototype for malloc() in scope.


Did you know that <stdio.h> defines a macro FILENAME_MAX?
While I don't know of any system in common use today that
allows a path name to exceed 1000 characters, if you use
FILENAME_MAX the buffer will always be large enough for
any legal file name on any system where it is compiled.

Noted again.
You need to check for NULL here. While strrchr() can't
return a null pointer in this example, I assume in the
real world you will want to work with arbitrary strings
entered interactively by a user or read from files or the
command line. Sooner or later your program will be
handed a string that does not contain a '/' character.
Then when you increment the null pointer or pass it to
printf() or strncat() you generate undefined behavior,
most likely causing your operating system to terminate
your program.

Good point...
Never cast the return value of malloc() in C. If you did
this to shut up compiler warnings, we fixed that properly
by including <stdlib.h> for malloc's prototype. Without
that prototype, the cast eliminates the warning but
specifically does the wrong thing on some platforms.

Why shouldn't I cast the return value of malloc()?
First, you don't need to set the entire buffer to 0 for
what you are doing. Second, if you do need to allocate
memory for an array of characters and have it all zeroed,
calloc() will do both operations in one call. Note that
calloc() is guaranteed to initialize an array of
characters to all valid '\0' values, it is not so
guaranteed to do so with any other type.

calloc()? I tried 'man calloc' on my linux system, but it didn't find the
man page.
But all you really need is one '\0' at the beginning, so
you could just code:

*path = '\0';


The length of the string prior to the final '/',
excluding the final '/' but allowing for a '\0' in its
place to terminate it, is exactly file_name - string. It
can't ever be anything else (if strrchr() does not return
NULL). You can always assume this.

Ok. I was wondering what would happen if the character array was not stored
in one contigous block in memory (i.e. if it were really long). The
subtracting the two addresses of the pointers would not give me the correct
length.
The problem with this is length_of_file_path includes the
final '/', which you do not want to appear in the path
string (or do you?). If you don't, use
length_of_file_path - 1 as the length argument.

I did want the final '/' to be included.

Thanks for your help!
 
J

Joe Laughlin

Martin said:
On Linux, FILENAME_MAX is 4096. In fact, I don't know of
any Unix system in common use today that does /not/ allow
a path name of at least 1024 characters.


What if the system poses no arbitrary limit on the
maximum file name size? If I take the standard literally
(and ignore the non-normative footnote), `FILENAME_MAX'
must be defined to the size of the whole address space on
such systems.

Martin

non-normative footnote?
 
M

Martin Dickopp

Joe Laughlin said:
non-normative footnote?

The standard defines `FILENAME_MAX' in 7.19.1#3:

| [...]
|
| FILENAME_MAX
|
| which expands to an integer constant expression that is the size
| needed for an array of char large enough to hold the longest file
| name string that the implementation guarantees can be opened; [...]

Attached to that text is a footnote which reads

| If the implementation imposes no practical limit on the length of
| file name strings, the value of FILENAME_MAX should instead be the
| recommended size of an array intended to hold a file name string. Of
| course, file name string contents are subject to other system-specific
| constraints; therefore all possible strings of length FILENAME_MAX
| cannot be expected to be opened successfully.

However, footnotes in the standard are not normative.

Martin
 
M

Malcolm

Martin Dickopp said:
What if the system poses no arbitrary limit on the maximum file
name size? If I take the standard literally (and ignore the non-
normative footnote), `FILENAME_MAX' must be defined to the
size of the whole address space on such systems.
This of course is the problem with trying to be totally portable.

People want to do the following

void foo(char *directory)
{
char fname[FILENAME_MAX];

strcpy(fname, directory);
strcat(fname, "afile.name");
fopen(fname);
}

If FILENAME_MAX is too large then this will overflow the stack, whilst if it
is too small you have UB. The code is perfect on a system where a legal
directory will never exceed the limit, and "good enough" for most systems,
but still leaves open potential security holes if the OS somehow allows
arbitrarily long paths to get into the "directory" pointer
 
K

Keith Thompson

Martin Dickopp said:
Joe Laughlin said:
non-normative footnote?

The standard defines `FILENAME_MAX' in 7.19.1#3:

| [...]
|
| FILENAME_MAX
|
| which expands to an integer constant expression that is the size
| needed for an array of char large enough to hold the longest file
| name string that the implementation guarantees can be opened; [...]
[...]

Suppose I write a conforming C implementation (compiler and library)
for some operating system. Suppose the OS itself supports file names
up to, say, 8192 characters (passing a longer string to the OS's "open
file" routine causes an error), but I'm only willing to guarantee up
to 512 characters. Then I can set FILENAME_MAX to 512. Even though
fopen() will happy handle file names up to 8192 characters the
implementation only guarantees 512.

I'm not convinced that FILENAME_MAX is particularly useful.
 
J

Jack Klein

Martin Dickopp said:
What if the system poses no arbitrary limit on the maximum file
name size? If I take the standard literally (and ignore the non-
normative footnote), `FILENAME_MAX' must be defined to the
size of the whole address space on such systems.
This of course is the problem with trying to be totally portable.

People want to do the following

void foo(char *directory)
{
char fname[FILENAME_MAX];

strcpy(fname, directory);
strcat(fname, "afile.name");
fopen(fname);
}

If FILENAME_MAX is too large then this will overflow the stack, whilst if it
is too small you have UB. The code is perfect on a system where a legal
directory will never exceed the limit, and "good enough" for most systems,
but still leaves open potential security holes if the OS somehow allows
arbitrarily long paths to get into the "directory" pointer

Any use of strcpy(), strcat(), sprintf(), and several other functions
can produce buffer overflow. In robust programs, appropriate checks
are performed.

In any case, it is unlikely that FILENAME_MAX is a poorer choice than
an arbitrary value picked because it seemed "bit enough".
 
J

Jack Klein

Jack said:
On Wed, 2 Jun 2004 00:48:45 GMT, "Joe Laughlin"


Need to add:

#include <stdlib.h>
Noted.


...here to have a proper prototype for malloc() in scope.
[snip]
Never cast the return value of malloc() in C. If you did
this to shut up compiler warnings, we fixed that properly
by including <stdlib.h> for malloc's prototype. Without
that prototype, the cast eliminates the warning but
specifically does the wrong thing on some platforms.

Why shouldn't I cast the return value of malloc()?

Casting the return value of malloc() is:

1. Not necessary in C as the memory allocation functions (malloc,
calloc, and realloc) all return a pointer to void. In C, a pointer to
void may be assigned to a pointer to any object type directly, with no
cast.

2. Prevents a required diagnostic if a proper prototype for malloc is
not in scope, as in this case where <stdlib.h> was not included.
Prototypes, or at least declarations, for all functions are required
in the latest (1999) version of the C standard, but most compilers
today still operate according to an earlier standard version. Without
a declaration, the compiler must assume that the function returns an
int and accepts whatever arguments (after default promotions) that you
pass to it. Without the cast the compiler is required to issue a
diagnostic because assigning the assumed int returned by malloc to a
pointer without a cast is a constraint violation, thus warning you
that you have missed something.

3. By preventing the diagnostic that would cause you to fix the
problem properly (by including <stdlib.h>), this can cause some very
nasty errors. There are some platforms where pointers and ints are
returned differently, for example in different processor registers.
And there are 64-bit platforms today, and more of them all the time,
where int and pointers are different sizes, 32 bits for int and 64
bits for pointers.

With the prototype and without the cast, the compiler will happily
take the irrelevant value in the integer return register, convert it
to a pointer, and store that, completely ignoring the real pointer
value in the proper register. Or if ints and pointers are returned in
the same way, it will happily throw away the top 32 bits of the real
64-bit pointer, then zero extend or sign extend the lower 32 bits into
64. The most likely result is a seg fault when the pointer is
dereferenced.

There have been a large number of platforms at various times where int
and pointer happened to have the same size. This was very common in
the early days of 8 and 16-bit processors, and on today's 32-bit
desktops (Windows, *NIX, etc.). The majority of these platforms also
happen to return ints and pointers in the same register, so this sort
of sloppy programming "works by accident" a large part of the time on
many platforms. But it has never been correct C, and there have
always been platforms where it doesn't work. With the ongoing
transition from 32 to 64 bits on the desktop, there will be a lot more
platforms in the future where it doesn't work by accident.
calloc()? I tried 'man calloc' on my linux system, but it didn't find the
man page.

Then your man pages are deficient. The calloc() function has part of
the C library since before the original K&R boot (1978) and part of
every C standard since the original 1989 ANSI version. Get a better
reference.
Ok. I was wondering what would happen if the character array was not stored
in one contigous block in memory (i.e. if it were really long). The
subtracting the two addresses of the pointers would not give me the correct
length.

No matter how long they are, if they exist they are contiguous.
Arrays are always contiguous, as are the bytes allocated by malloc,
calloc, or realloc. Your operating system might provide virtual
memory, so they might be scattered all over physical memory or even
partially in a swap file, but that is all transparent to a C program.
To the program they are always contiguous and always will be.
I did want the final '/' to be included.

OK, but that is not how your original post read.
Thanks for your help!

You're welcome.
 
M

Malcolm

Jack Klein said:
In any case, it is unlikely that FILENAME_MAX is a poorer
choice than an arbitrary value picked because it seemed "bit
enough".
Except that it gives the impression that the code is completely robust, when
this may not be the case.
 
M

Martin Dickopp

Malcolm said:
Martin Dickopp said:
What if the system poses no arbitrary limit on the maximum file
name size? If I take the standard literally (and ignore the non-
normative footnote), `FILENAME_MAX' must be defined to the
size of the whole address space on such systems.
This of course is the problem with trying to be totally portable.

People want to do the following

void foo(char *directory)
{
char fname[FILENAME_MAX];

strcpy(fname, directory);
strcat(fname, "afile.name");
fopen(fname);
}

People /want/ to use fixed sized buffers for a lot of things. AFAIKT,
this is just another case where a fixed size buffer is slightly more
convenient, but can easily be avoided.
If FILENAME_MAX is too large then this will overflow the stack, whilst
if it is too small you have UB. The code is perfect on a system where
a legal directory will never exceed the limit,

If `directory' is a user-provided string, a check that it isn't too long
is still needed.
and "good enough" for most systems, but still leaves open potential
security holes if the OS somehow allows arbitrarily long paths to get
into the "directory" pointer

IMHO, having `FILENAME_MAX' in the standard is a bad idea. It encourages
bad coding practices.

Martin
 
M

Martin Dickopp

Jack Klein said:
Martin Dickopp said:
What if the system poses no arbitrary limit on the maximum file
name size? If I take the standard literally (and ignore the non-
normative footnote), `FILENAME_MAX' must be defined to the
size of the whole address space on such systems.
This of course is the problem with trying to be totally portable.

People want to do the following

void foo(char *directory)
{
char fname[FILENAME_MAX];

strcpy(fname, directory);
strcat(fname, "afile.name");
fopen(fname);
}

If FILENAME_MAX is too large then this will overflow the stack, whilst if it
is too small you have UB. The code is perfect on a system where a legal
directory will never exceed the limit, and "good enough" for most systems,
but still leaves open potential security holes if the OS somehow allows
arbitrarily long paths to get into the "directory" pointer

Any use of strcpy(), strcat(), sprintf(), and several other functions
can produce buffer overflow. In robust programs, appropriate checks
are performed.

In any case, it is unlikely that FILENAME_MAX is a poorer choice than
an arbitrary value picked because it seemed "bit enough".

IMHO, /any/ fixed (at compile time) value is a poor choice.

Martin
 
M

Martin Dickopp

Keith Thompson said:
Martin Dickopp said:
Joe Laughlin said:
Martin Dickopp wrote:

Did you know that <stdio.h> defines a macro
FILENAME_MAX? While I don't know of any system in
common use today that allows a path name to exceed 1000
characters,

On Linux, FILENAME_MAX is 4096. In fact, I don't know of
any Unix system in common use today that does /not/ allow
a path name of at least 1024 characters.

if you use FILENAME_MAX the buffer will always be large
enough for any legal file name on any system where it is
compiled.

What if the system poses no arbitrary limit on the
maximum file name size? If I take the standard literally
(and ignore the non-normative footnote), `FILENAME_MAX'
must be defined to the size of the whole address space on
such systems.

non-normative footnote?

The standard defines `FILENAME_MAX' in 7.19.1#3:

| [...]
|
| FILENAME_MAX
|
| which expands to an integer constant expression that is the size
| needed for an array of char large enough to hold the longest file
| name string that the implementation guarantees can be opened; [...]
[...]

Suppose I write a conforming C implementation (compiler and library)
for some operating system. Suppose the OS itself supports file names
up to, say, 8192 characters (passing a longer string to the OS's "open
file" routine causes an error), but I'm only willing to guarantee up
to 512 characters. Then I can set FILENAME_MAX to 512. Even though
fopen() will happy handle file names up to 8192 characters the
implementation only guarantees 512.

While this aspect of such an implementation would be conforming, I'd
still find it objectionable from a QoI point of view.
I'm not convinced that FILENAME_MAX is particularly useful.

Neither am I.

Martin
 
C

Christian Bau

Martin Dickopp said:
Malcolm said:
Martin Dickopp said:
What if the system poses no arbitrary limit on the maximum file
name size? If I take the standard literally (and ignore the non-
normative footnote), `FILENAME_MAX' must be defined to the
size of the whole address space on such systems.
This of course is the problem with trying to be totally portable.

People want to do the following

void foo(char *directory)
{
char fname[FILENAME_MAX];

strcpy(fname, directory);
strcat(fname, "afile.name");
fopen(fname);
}

People /want/ to use fixed sized buffers for a lot of things. AFAIKT,
this is just another case where a fixed size buffer is slightly more
convenient, but can easily be avoided.
If FILENAME_MAX is too large then this will overflow the stack, whilst
if it is too small you have UB. The code is perfect on a system where
a legal directory will never exceed the limit,

If `directory' is a user-provided string, a check that it isn't too long
is still needed.
and "good enough" for most systems, but still leaves open potential
security holes if the OS somehow allows arbitrarily long paths to get
into the "directory" pointer

IMHO, having `FILENAME_MAX' in the standard is a bad idea. It encourages
bad coding practices.

Absolutely. Just assume that I want to open a file in the same directory
as another file. The other file is "verylongdirectoryname/x" (a bit
longer, of course). The new filename without the directory is
"verylongfilename". If the complete name of the other file just fits
into FILENAME_MAX, and you extract the directory portion, then call
something similar to foo then you might blindly concatenate two strings
that are much longer than FILENAME_MAX and crash or worse.
 
D

Dave Thompson

Casting the return value of malloc() is:

1. Not necessary in C as [void* converts ...]
2. Prevents a required diagnostic if a proper prototype for malloc is
not in scope, as in this case where <stdlib.h> was not included. [...]
3. By preventing the diagnostic that would cause you to fix the
problem properly (by including <stdlib.h>), this can cause some very
nasty errors. There are some platforms where pointers and ints are
returned differently, [...]
With the prototype and without the cast, the compiler will happily
take the [wrong] register [... or bits ...]

"Shirley" YM without the declaration and with the cast.
Then your man pages are deficient. The calloc() function has part of
the C library since before the original K&R boot (1978) and part of

The rare typo which actually produces a meaningful (or nearly so,
should be R&T) but wrong statement. YM book.
every C standard since the original 1989 ANSI version. Get a better
reference.

- David.Thompson1 at worldnet.att.net
 
I

Ike Naar

:> Did you know that <stdio.h> defines a macro FILENAME_MAX? While I
:> don't know of any system in common use today that allows a path name
:> to exceed 1000 characters,
: On Linux, FILENAME_MAX is 4096. In fact, I don't know of any Unix
: system in common use today that does /not/ allow a path name of at least
: 1024 characters.
:> if you use FILENAME_MAX the buffer will always be large enough for any
:> legal file name on any system where it is compiled.

FYI, on HP-UX, FILENAME_MAX equals 14 :-(

Kind regards,
Ike
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,756
Messages
2,569,534
Members
45,007
Latest member
OrderFitnessKetoCapsules

Latest Threads

Top