Parsing a PATH null-terminated variable

D

DiAvOl

Hello everyone,

I've recently implemented a function which I use in order to check if
a binary file exists in the PATH environment variable and returns the
full path. I'm not sure however how efficient and fast my
implementation is or if it contains bugs (although it works fine at
the moment) and I am looking forward for a better one.

First I am pasting the code that contains non-portable non-ANSI C code
(please ignore those portions - stat etc -) and the second one is the
same function ( check_variable() ) without the *nix functions which
just prints the contents of the path variable after parsing it (in
short just the code that I am interested in making faster/better)

The full implementation:
-----------------------------------

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>

#ifndef PATH_MAX
#define PATH_MAX 4096
#endif

#define UNUSED(x) ((void) x)

char *check_variable(const char *var, const char *program) {
char *lvar = NULL;
char *p;
char *t;
const char *pname;
static char path[PATH_MAX] = {0};
struct stat sb;

if (var == NULL)
return NULL;

lvar = strdup(var);

p = t = lvar;

if ( (pname = strchr(program, '/')) != NULL)
pname++;
else
pname = program;

for (;;) {
if ( (*p == '\0') && (*t == '\0')) {
free(lvar);
break;
}

while(*p && *p != ':')
p++;

if (*p)
*p++ = '\0';

strcat(path, t);
strcat(path, "/");
strcat(path, pname);

if (stat(path, &sb) == 0) {
free(lvar);
return path;
}

path[0] = '\0';
t = p;
}

return NULL;
}

int main(int argc, char *argv[]) {
char *path;
UNUSED(argc); /* Ignore argument checks, testing purposes */

if ( (path = check_variable(getenv("PATH"), argv[1])) == NULL) {
fprintf(stderr, "Could not find the executable in the path\n");
exit(EXIT_FAILURE);
} else {
fprintf(stdout, "Found path: %s\n", path);
}

return 0;
}

The check_variable function without *nix functions
------------------------------------------------------------------------

void check_variable_demo(const char *var) {
char *lvar = NULL;
char *p;
char *t;
static char path[PATH_MAX] = {0};

if (var == NULL)
return;

lvar = strdup(var);

p = t = lvar;

for (;;) {
if ( (*p == '\0') && (*t == '\0')) {
free(lvar);
break;
}

while(*p && *p != ':')
p++;

if (*p)
*p++ = '\0';

strcat(path, t);
strcat(path, "/");

printf("Path: %s\n", path);

path[0] = '\0';
t = p;
}
}

Thanks for your time and sorry for my bad english
 
S

scholz.lothar

Hello everyone,

Hello, this is typical unix crap programming style.
You forget about escaped characters. So your code is just buggy.

**** the lazy unix programmer and file system designer for all
the whitespace, escape, path problems the world is still having.

I have only scanned the code, so i don't tell you about
performance, but code like this does not need to be performant
anyway.
 
D

DiAvOl

Hello, this is typical unix crap programming style.
You forget about escaped characters. So your code is just buggy.

**** the lazy unix programmer and file system designer for all
the whitespace, escape, path  problems the world is still having.

I have only scanned the code, so i don't tell you about
performance, but code like this does not need to be performant
anyway.

Ok, I don't handle escape characters (not sure if I need to do that -
probably not)
I will keep that and ignore the rest of your reply

Anyway read the whole post pal, I write "please ignore the unix code
portion"
 
U

user923005

Hello everyone,

I've recently implemented a function which I use in order to check if
a binary file exists in the PATH environment variable and returns the
full path.
[snip]
See:
http://legacy.imatix.com/html/sfl/

Which has this routine (portable across POSIX and Windows
environments):
file_locate
#include "sflfile.h"
FILE *
file_locate (
const char *path,
const char *name,
const char *ext)

Synopsis
Combines the functions of file where() and file_open when you want to
read a file. Searches for a file on a specified path, opens the file
if found, and returns a FILE * for the open file. Returns NULL if the
file was not found or could not be opened for reading.

Source Code - (sflfile.c)
{
char
*filename;

ASSERT (name);
filename = file where ('r', path, name, ext);
if (filename)
return (file open (filename, 'r'));
else
return (NULL);
}
 
C

Chris McDonald

user923005 said:
Combines the functions of file where() and file_open when you want to
read a file. Searches for a file on a specified path, opens the file
if found, and returns a FILE * for the open file. Returns NULL if the
file was not found or could not be opened for reading.


It's an unusual design to expect to *open* an file on the PATH - the
PATH usually provides directories of executable programs (yes, possibly
scripts), but there's usually little need to have read-access to them.

Yes, nothing wrong, but it 'confuses' the typical role of PATH.
 
U

user923005

It's an unusual design to expect to *open* an file on the PATH - the
PATH usually provides directories of executable programs (yes, possibly
scripts), but there's usually little need to have read-access to them.

Yes, nothing wrong, but it 'confuses' the typical role of PATH.

I do it all the time, especially with LoadLibrary()/GetProcAddress()
or dlopen()/dlcose() and the like.

There are times when you *have* to do it that way. Oracle (for
instance) has different versions of the OCI with different
capabilities. You may or may not be able to call the FastPath API and
other advanced features. The only way to know if they are present is
to dynamically load the library and then check for the entry points.

I guessed that is what the OP was doing, since he was clearly building
a path that included a program name from the path variable.
 
U

user923005

Eric said:
DiAvOl said:
[...]
  static char path[PATH_MAX] = {0};
    Why make this static?  [...]

     Never mind; I see now what I overlooked on first reading:

Still, it might be better to return a pointer to a dynamically-
allocated region whose size is *known* to be sufficient than to
return a pointer to a static region that is *hoped* to be big
enough.

We know it is a Posix system. If _LIMITS_EXTENSION is defined,
PATH_MAX must be defined when <limits.h> is included.

PATH_MAX corresponds {roughly} to ANSI/ISO FILENAME_MAX and is
guaranteed to be long enough to hold any valid path.

Personally, I would have gone with FILENAME_MAX because it provides a
better {more portable and more certain} guarantee than PATH_MAX.
 
K

Keith Thompson

user923005 said:
Eric said:
DiAvOl wrote:
[...]
  static char path[PATH_MAX] = {0};
    Why make this static?  [...]

     Never mind; I see now what I overlooked on first reading:
      return path;

Still, it might be better to return a pointer to a dynamically-
allocated region whose size is *known* to be sufficient than to
return a pointer to a static region that is *hoped* to be big
enough.

We know it is a Posix system. If _LIMITS_EXTENSION is defined,
PATH_MAX must be defined when <limits.h> is included.

PATH_MAX corresponds {roughly} to ANSI/ISO FILENAME_MAX and is
guaranteed to be long enough to hold any valid path.

Personally, I would have gone with FILENAME_MAX because it provides a
better {more portable and more certain} guarantee than PATH_MAX.

Using FILENAME_MAX is still tricky. C99 7.19.1p3 says that FILENAME_MAX

expands to an integer constant expression that is the size needed
for an array of char large enough to hold the longest file name
string that the implementation guarantees can be opened

with a footnote:

If the implementation imposes no practical limit on the length of
file name strings, the value of FILENAME_MAX should instead be the
recommended size of an array intended to hold a file name
string. Of course, file name string contents are subject to other
system-specific constraints; therefore all possible strings of
length FILENAME_MAX cannot be expected to be opened successfully.

So the implementation guarantees that it can open files with names of
length up to FILENAME_MAX-1, but it doesn't guarantee that no file
name can be longer than that.

For real safety, you have to either expand your buffer dynamically or
be prepared to reject very long file names.
 
B

Barry Schwarz

Hello everyone,

I've recently implemented a function which I use in order to check if
a binary file exists in the PATH environment variable and returns the
full path. I'm not sure however how efficient and fast my
implementation is or if it contains bugs (although it works fine at
the moment) and I am looking forward for a better one.

First I am pasting the code that contains non-portable non-ANSI C code
(please ignore those portions - stat etc -) and the second one is the
same function ( check_variable() ) without the *nix functions which
just prints the contents of the path variable after parsing it (in
short just the code that I am interested in making faster/better)

The full implementation:
-----------------------------------

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>

#ifndef PATH_MAX
#define PATH_MAX 4096
#endif

#define UNUSED(x) ((void) x)

char *check_variable(const char *var, const char *program) {
char *lvar = NULL;
char *p;
char *t;
const char *pname;
static char path[PATH_MAX] = {0};

Static objects are initialized only once prior to program execution.
If you call this function a second time, the calls to strcat call will
not remove the old data, making it possible to overrun the array. You
probably need something like
path[0] = '\0';
prior to the for loop.
struct stat sb;

if (var == NULL)
return NULL;

lvar = strdup(var);

p = t = lvar;

if ( (pname = strchr(program, '/')) != NULL)
pname++;
else
pname = program;

for (;;) {
if ( (*p == '\0') && (*t == '\0')) {
free(lvar);
break;
}

while(*p && *p != ':')
p++;

if (*p)
*p++ = '\0';

strcat(path, t);
strcat(path, "/");
strcat(path, pname);

if (stat(path, &sb) == 0) {
free(lvar);
return path;
}

path[0] = '\0';
t = p;
}

return NULL;
}

int main(int argc, char *argv[]) {
char *path;
UNUSED(argc); /* Ignore argument checks, testing purposes */

if ( (path = check_variable(getenv("PATH"), argv[1])) == NULL) {

Don't you think you should check if argv[1] exists before using it? It
is possible for argc to be 0.
fprintf(stderr, "Could not find the executable in the path\n");
exit(EXIT_FAILURE);
} else {
fprintf(stdout, "Found path: %s\n", path);
}

return 0;
}

The check_variable function without *nix functions
------------------------------------------------------------------------

void check_variable_demo(const char *var) {
char *lvar = NULL;
char *p;
char *t;
static char path[PATH_MAX] = {0};

if (var == NULL)
return;

lvar = strdup(var);

This is not a standard function.
p = t = lvar;

for (;;) {
if ( (*p == '\0') && (*t == '\0')) {
free(lvar);
break;
}

while(*p && *p != ':')
p++;

if (*p)
*p++ = '\0';

strcat(path, t);
strcat(path, "/");

printf("Path: %s\n", path);

path[0] = '\0';
t = p;
}
}

Thanks for your time and sorry for my bad english
 
J

James Kuyper

Richard said:
In an otherwise excellent article, Eric Sosman said:


...or will read it without ever writing it, in which case the
initialisation value protects you against reading an indeterminate value
and invoking undefined behaviour.

Keep in mind that the real bug is reading the value without having
written a value that was intended to be used. Initializing with a value
that is NOT intended to be used serves only to make the failure mode
less nasty; this can have the undesirable result of making the failure
much harder to detect.

I've had a lot of personal experience with this, because the SGI C
compiler will zero-initialize automatic variables that are not
explicitly initialized. As a result, several bugs in code that I did not
write, but am responsible for, went unnoticed for years. 0 was not the
correct value, but is was a value that caused behavior that differed
from the correct behavior only in ways that were too subtle to be easily
noticed. As a result, we did not detect those errors until we ported the
code to Linux machines using gcc, which does not initialize those variables.

Initializing variables with carefully chosen values could, in some
cases, cause the catastrophic failure to occur more reliably and in a
safer fashion than it would when they are uninitialized. However, many
compilers can diagnose the reading of uninitialized variables;
initializing them disables this feature, without actually solving the
underlying problem. I greatly prefer finding these problems by a
diagnostic message at compile time, than by a catastrophic failure at
run time. I prefer either of those, to not finding the bug at all,
because it has been masked by a "safe" initialization that leaves me
with a subtly-incorrect program.
 
D

DiAvOl

Initializing variables with carefully chosen values could, in some
cases, cause the catastrophic failure to occur more reliably and in a
safer fashion than it would when they are uninitialized. However, many
compilers can diagnose the reading of uninitialized variables;
initializing them disables this feature, without actually solving the
underlying problem. I greatly prefer finding these problems by a
diagnostic message at compile time, than by a catastrophic failure at
run time. I prefer either of those, to not finding the bug at all,
because it has been masked by a "safe" initialization that leaves me
with a subtly-incorrect program.

In case you miss the compiler warning though and use an uninitialized
variable the program "may" work because the uninitialized pointer can
point anywhere. On the other hand if you initialize the pointer to a
NULL value and use it the buggy program will always crash with a
segmentation fault.

Thanks for your replies
 
J

James Kuyper

DiAvOl said:
In case you miss the compiler warning though and use an uninitialized
variable the program "may" work because the uninitialized pointer can
point anywhere. On the other hand if you initialize the pointer to a
NULL value and use it the buggy program will always crash with a
segmentation fault.

That's an example of what I meant by a "carefully chosen value". In
practice, it doesn't make much of a difference. An uninitialized pointer
value is extremely unlikely to be dereferencable, at least on the
systems I've used. Other types are more forgiving.
 
K

Keith Thompson

James Kuyper said:
DiAvOl wrote: [...]
In case you miss the compiler warning though and use an uninitialized
variable the program "may" work because the uninitialized pointer can
point anywhere. On the other hand if you initialize the pointer to a
NULL value and use it the buggy program will always crash with a
segmentation fault.

That's an example of what I meant by a "carefully chosen value". In
practice, it doesn't make much of a difference. An uninitialized
pointer value is extremely unlikely to be dereferencable, at least on
the systems I've used. Other types are more forgiving.

Unless the pointer happens to occupy the same memory address as a
previous pointer object that had a valid, or at least dereferencable,
value.

Uninitialized garbage isn't mathematically random.
 
J

jameskuyper

Keith said:
Unless the pointer happens to occupy the same memory address as a
previous pointer object that had a valid, or at least dereferencable,
value.

You do have a point; this isn't even incredibly unlikely. On at least
some systems, repeated calls to the same function, with no intervening
calls to any other function, allocate the same exact pieces of memory
to hold all of the automatic variables, which still hold the values
they had at the end of the previous call to that function. Still, I
prefer relying upon compile-time detection, rather than run-time.
 
C

Chris Torek

You do have a point; this isn't even incredibly unlikely. On at least
some systems, repeated calls to the same function, with no intervening
calls to any other function, allocate the same exact pieces of memory
to hold all of the automatic variables, which still hold the values
they had at the end of the previous call to that function.

Local variables that act as if they were static. Interesting
failure mode. :)

One other "interesting failure mode" that I have seen in real code,
namely mh (the RAND "mail handler" suite), many years ago, when code
for Unix systems ran on PDP-11s and VAXen (and no other machines).

The code structure was something like this:

int somefunc() {
register MSG *p;
...
p = get_current_msg();
...
if (otherfunc())
do something;
else
do something else;
...
}

int otherfunc() {
register MSG *cur;
...
use(cur->field); /* without ever setting "cur" */
...
return some result or another;
}

This code actually worked, because the "register" keyword told the
C compiler to use a machine register, and it always did. (There were
only two C compilers -- the PDP-11 C Compiler and the VAX C Compiler
-- at the time, so it was easy to say what "the C compiler" did.
Actually there were two PDP-11 C compilers, dmr's and scj's, but
both behaved the same way with regard to "register" variables.)

The code thus passed the register's value by "register inference".
On the PDP-11, the first three "register" variables went in r5,
r4, and r3. On the VAX, the first six "register" variables went
in r11, r10, r9, r8, r7, and r6. As long as the two pointers were
both in the same machine register, and the register was already
set in the caller (somefunc() above), that register held the correct
value in the callee (otherfunc() above). (The fact that the
register was callee-save -- pushed at entry to otherfunc(), and
popped again at exit -- was irrelevant since it was not modified
inside otherfunc().)

We (actually Fred Blonder) found the problem when we compiled mh
for the Pyramid, a machine on which registers worked quite differently.
Finding the bug was not too difficult -- it was obvious from code
inspection that the pointer in otherfunc() was never initialized
-- but figuring out why it worked at all on the VAX was interesting.
Still, I prefer relying upon compile-time detection, rather than run-time.

Indeed, had gcc existed at the time, compiling with gcc (with
optimization and warnings both turned on) would have found the bug
right away, even before running the code. (Initializing the pointer
to NULL might not have found the problem, since on the VAX and
PDP-11, *(MSG *)NULL would not fault. Depending on what the
sub-function was doing, it might have *seemed* to be working, up
until cur->field was supposed to have some useful value anyway.)
 
J

jameskuyper

Chris Torek wrote:
....
Local variables that act as if they were static. Interesting
failure mode. :)

It can even be worse. On a single user system (I found this out on a
DOS machine), you can have a program allocate the same exact memory
for all variables in two consecutive runs of a program, with the final
values for those variables used as the values for uninitialized
variables for the second run. I found this out while debugging a
program which would alternate, with perfect regularity, between two
different failure modes. My first clue to finding the problem was the
realization that, in order to alternate with perfect regularity, it
had to be storing the information about which way to behave somewhere
- it took me an embarrasingly long time to figure out where.
 
A

Andrew Poelstra

In case you miss the compiler warning though and use an uninitialized
variable the program "may" work because the uninitialized pointer can
point anywhere. On the other hand if you initialize the pointer to a
NULL value and use it the buggy program will always crash with a
segmentation fault.

Actually, we just found a bug in our code for our embedded controllers
that potentially dereferenced a NULL pointer (some code was outside an
if(pPointer != NULL) {} block and shouldn't have been), but we never
saw the bug actually cause any issues in real life, since the processor
we use does not have any memory protection.

Therefore, '0' was a value pointer value and dereferencing it did not
cause any visible problems. Nothing reset, nothing failed, and when I
looked through the code, it looked like the value was checked to make
sure it was one of two specific vales, and in the case that it /was/
one of those values, everything would be peachy, and if not an error
code would have been returned just as it would've if the NULL value
had been checked.

So NULL isn't always "safe".
 
B

Barry Schwarz

In case you miss the compiler warning though and use an uninitialized
variable the program "may" work because the uninitialized pointer can
point anywhere. On the other hand if you initialize the pointer to a
NULL value and use it the buggy program will always crash with a
segmentation fault.

Only for a very limited meaning of the word "always" and a very
expansive meaning of the phrase "segmentation fault". There is a lot
more variety in the real world than what you use on your home or
office system.
 
J

Jean-Marc Bourguet

Chris Torek said:
This code actually worked, because the "register" keyword told the
C compiler to use a machine register, and it always did. (There were
only two C compilers -- the PDP-11 C Compiler and the VAX C Compiler
-- at the time, so it was easy to say what "the C compiler" did.
Actually there were two PDP-11 C compilers, dmr's and scj's, but
both behaved the same way with regard to "register" variables.)

Historical precisions.

Reading DMR's "The development of the C Programming Language" in HOPL-II.
DMR's compiler seems to have been ported to Honeywell 635 and IBM 360/370.
Then Steve Johnson wrote pcc at the same time as he, dmr and Thompson
ported Unix to Interdata 8/32. The port induced changes in the language
(unsigned, cast, tying struct members to the struct). The success of the
port led another port to the VAX. (It isn't mentionned if the VAX compiler
was based on pcc or not -- which makes me think it probably was).

Yours,
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,769
Messages
2,569,582
Members
45,065
Latest member
OrderGreenAcreCBD

Latest Threads

Top