Questions about the behavior of argv[0]

C Guy

The function signature for main is:

int main (int argc, char *argv[], char **envp)

argc tells you how many argv[] there are.

argv[0] is the name of the executable - the name of the executable
containing main().

Under Win-XP, I've noticed that argv[0] does not include the full
filespec when invoked within a command shell. By filespec, I mean the
full path and the full name of the program file, including the suffix
(i.e. .exe).

What I am seeing in argv[0] is just the file stem - as typed by the
user.

Eg, from a command shell, if I invoke "example" from c:\hello\there\, I
am seeing argv[0] return simply "example". In other words, exactly what
the user typed, not the full and complete filespec.

If I perform the same command in, say, a command prompt in win98,
argv[0] returns "c:\hello\there\example.exe" (the full and complete
filespec).

Maybe I'm dreaming, but I could swear that once upon a time that from a
command prompt on XP that I would see the same thing as I see on win98.

When launched from explorer (under XP) argv[0] seems to behave as I
expect.

Did XP always exhibit this behavior?

How do NT and 2K behave in this regard?
 
Keith Thompson

C Guy said:
The function signature for main is:

int main (int argc, char *argv[], char **envp)

The envp parameter is non-standard.

argc tells you how many argv[] there are.

Close enough. argv[argc] is a null pointer.

argv[0] is the name of the executable - the name of the executable
containing main().

All the C standard requires is that it "represents the program name".
In more detail:

If the value of argc is greater than zero, the string pointed to
by argv[0] represents the program name; argv[0][0] shall be the
null character if the program name is not available from the host
environment. If the value of argc is greater than one, the strings
pointed to by argv[1] through argv[argc-1] represent the program
parameters.

Both behaviors you describe are consistent with this requirement.
Under Win-XP, I've noticed that argv[0] does not include the full
filespec when invoked within a command shell. [...]
When launched from explorer (under XP) argv[0] seems to behave as I
expect.

Did XP always exhibit this behavior?

How do NT and 2K behave in this regard?

That's a Windows question, not a C (or C++) question. If you don't
get an answer from microsoft.public.vc.mfc, try
comp.os.ms-windows.programmer.win32.

Followups redirected, dropping comp.lang.c and comp.lang.c++.
 
James Kanze

C Guy said:
The function signature for main is:
int main (int argc, char *argv[], char **envp)
The envp parameter is non-standard.

The envp parameter makes the entire line implementation defined.
So he really does need to ask in an implementation specific
forum.
argc tells you how many argv[] there are.
Close enough. argv[argc] is a null pointer.
argv[0] is the name of the executable - the name of the executable
containing main().
All the C standard requires is that it "represents the program
name". In more detail:
If the value of argc is greater than zero, the string pointed to
by argv[0] represents the program name; argv[0][0] shall be the
null character if the program name is not available from the host
environment. If the value of argc is greater than one, the strings
pointed to by argv[1] through argv[argc-1] represent the program
parameters.
Both behaviors you describe are consistent with this requirement.

The C++ standard differs slightly here. (I'm reading this in
comp.lang.c++. Yet another problematic cross-posting, although
I guess one could be forgiven for not realizing the C and C++
are different here.) In C++, ``[...] and argv[0] shall be the
pointer to the initial character of a NTMBS that represents the
name used to invoke the program or "".'' Of course, I'm not
sure what "name used to invoke the program" means if I've
invoked it by clicking on some icon. And neither Windows nor
Unix really conforms in this regard: in both, it's
relatively simple to start a program with a totally arbitrary
string in argv[0]. (In Unix, for example, some programs will
prepend a '-' to the program name.)
 
C Guy

James said:
The C++ standard differs slightly here. (I'm reading this in
comp.lang.c++. Yet another problematic cross-posting, although
I guess one could be forgiven for not realizing the C and C++
are different here.)

Is there a better newsgroup than microsoft.public.vc.mfc to post this
question in then?

comp.os.ms-windows.programmer.win32 has been suggested. Any others?

Others have written:
argv[0] does not have to be a full path name nor even a file name.

I'm more interested in knowing why the behavior of argv[0] is different
(on an XP machine) when a program is invoked within a command shell vs
windows explorer.

I'm also interested to know why the behavior of argv[0] is NOT different
on a win-98 box under the same two conditions.

I'm also interested to know if the behavior I currently see for argv[0]
on an XP box has always been there, or if some service pack, update or
patch is the reason for the current behavior.

I'm also interested to know of an alternative method that returns a
consistent result. The following has been suggested and I will
investigate:
Call the Windows API GetModuleFileName(NULL, ...) to get the
fully qualified path of your .exe
 
jameskuyper

C said:
James Kanze wrote: ....
argv[0] does not have to be a full path name nor even a file name.

I'm more interested in knowing why the behavior of argv[0] is different
(on an XP machine) when a program is invoked within a command shell vs
windows explorer.

I'm also interested to know why the behavior of argv[0] is NOT different
on a win-98 box under the same two conditions.

I'm also interested to know if the behavior I currently see for argv[0]
on an XP box has always been there, or if some service pack, update or
patch is the reason for the current behavior.

I'm also interested to know of an alternative method that returns a
consistent result. The following has been suggested and I will
investigate:
Call the Windows API GetModuleFileName(NULL, ...) to get the
fully qualified path of your .exe

Since neither the C standard nor the C++ standard imposes sufficiently
strict restrictions on the value of argv[0] for your needs, every
single one of those questions is more appropriately directed to a
Windows-specific newsgroup than to either comp.lang.c or comp.lang.c++.
 
James Kanze

Is there a better newsgroup than microsoft.public.vc.mfc to
post this question in then?
comp.os.ms-windows.programmer.win32 has been suggested. Any
others?

I'm not sure; comp.os.ms-windows.programmer is the hierarchy
where I'd go.

Others have written:
argv[0] does not have to be a full path name nor even a file
name.
I'm more interested in knowing why the behavior of argv[0] is
different (on an XP machine) when a program is invoked within
a command shell vs windows explorer.
I'm also interested to know why the behavior of argv[0] is NOT
different on a win-98 box under the same two conditions.
I'm also interested to know if the behavior I currently see
for argv[0] on an XP box has always been there, or if some
service pack, update or patch is the reason for the current
behavior.

The answer for all of these is simple (and exactly like the
answer for Unix, so slightly portable): in both Unix and
Windows, the invoking process provides whatever it likes as
argv[0]. Which doesn't conform to either the C standard or the
C++ standard, but a conformant implementation of C or C++ isn't
possible under Unix or Windows.

I'm also interested to know of an alternative method that
returns a consistent result. The following has been suggested
and I will investigate:
Call the Windows API GetModuleFileName(NULL, ...) to get the
fully qualified path of your .exe

That's what I use.
 
Nate Eldredge

James Kanze said:
The answer for all of these is simple (and exactly like the
answer for Unix, so slightly portable): in both Unix and
Windows, the invoking process provides whatever it likes as
argv[0]. Which doesn't conform to either the C standard or the
C++ standard, but a conformant implementation of C or C++ isn't
possible under Unix or Windows.

I wouldn't say this makes it non-conformant. The C standard says that
"the string pointed to by argv[0] represents the program name"
(5.1.2.2.1 (2)). It does not further define "program name". So one
could argue that the string provided by the calling process *is* the
program name for that particular run of the program. Indeed, in the
Unix world, this is the terminology people actually use; the program's
name is independent of filename of its executable. ("If you run this
program with the name 'foo', it does something; if you run it with the
name 'bar', it does something else.")

The fact that the standard does not define "program name" suggests to me
that the authors intended to let the implementation decide what the
"program name" should be. This seems very reasonable, since those
details are clearly beyond the scope of the standard. Note also the
previous paragraph, wherein the argv strings are described as having
"implementation-defined values".

Finally, I find it very unlikely that the standard authors would
knowingly include specifications that would make all existing Unix and
Windows implementations non-conformant, as you seem to suggest they did.

That's what I use.

Note that under Unix, it is generally not possible to do this reliably
at all, so if you intend to port someday, you should design your program
in a manner that does not require this information.
 
Lew Pitcher

James Kanze writes:
[other attributions lost prior to this post]
Note that under Unix, it is generally not possible to do this reliably
at all, so if you intend to port someday, you should design your program
in a manner that does not require this information.

Indeed. In my experience, the usual reason for a Windows programmer to
complain about Unix not providing the full pathname (through whatever
mechanism) of the executable is that the programmer intends to use the
supplied path in a manner not compatible with Unix system configuration
and/or usage. Typically, the Windows programmer wishes to continue to use
the Windows convention of having program configuration and data files
reside in the same directory as the program executable, which is contrary
to the spirit and often to the configuration of a Unix standard
environment.

I won't debate on the rightness or wrongness of this mechanism here, but I
will say that, in general, programmers either need to write programs for
portability (and not depend on /any/ of the platform and language-specific
features available to them), or they need to write for /specific/
platforms. Porting "portable" code (like code written exclusively to the C
standard) is easy and needs only a recompilation for the target
environment; porting "platform specific" code is hard, needs
platform-specific substitutions, and often /really needs/ (but this is
seldom acted upon) a complete rearchitecting.

--
Lew Pitcher

Master Codewright & JOAT-in-training | Registered Linux User #112576
http://pitcher.digitalfreehold.ca/ | GPG public key available by request
---------- Slackware - Because I know what I'm doing. ------
 
BobF

Nate said:
Indeed, in the
Unix world, this is the terminology people actually use; the program's
name is independent of filename of its executable. ("If you run this
program with the name 'foo', it does something; if you run it with the
name 'bar', it does something else.")

Which Unix world? It's been a while, but IIRC the command typed at the
Unix prompt had to match an actual filename.

Using your example, I would need two shell scripts (files), one named
'foo' and the other named 'bar' to invoke a different *binary* with
different args to get different behavior between the two runs.

However, the scripts themselves are executable ... if you had said
'binary' instead of 'executable' it would make more sense ... to me.
 
Lew Pitcher

Which Unix world? It's been a while, but IIRC the command typed at the
Unix prompt had to match an actual filename.

But, with Unix, two (or more) different filenames can point to the same
file. Thus, a Unix executable file can have multiple names, and can test
argv[0] to see /which/ name it had been called for execution.

Using your example, I would need two shell scripts (files), one named
'foo' and the other named 'bar' to invoke a different *binary* with
different args to get different behavior between the two runs.

However, the scripts themselves are executable ... if you had said
'binary' instead of 'executable' it would make more sense ... to me.

The technique works for both executable binaries and executable shell
scripts. To me, it makes more sense to refer to 'executables'
than 'binaries' in the context of Unix filename/file linkage. OTOH, in the
context of C programming, it would make more sense to refer to 'binaries',
as C is seldom treated as an interpreted language or a shell script.

 
Nate Eldredge

BobF said:
Which Unix world? It's been a while, but IIRC the command typed at
the Unix prompt had to match an actual filename.

Using your example, I would need two shell scripts (files), one named
'foo' and the other named 'bar' to invoke a different *binary* with
different args to get different behavior between the two runs.

However, the scripts themselves are executable ... if you had said
'binary' instead of 'executable' it would make more sense ... to me.

What I have in mind is running the same binary with two different
names. This is most commonly done with a hard or soft link.

For instance, on my FreeBSD machine, /usr/bin/sum and /usr/bin/cksum are
hard links to the same binary. That binary does something like

#include <string.h>

int main(int argc, char *argv[]) {
    /* Dispatch on the name the program was started under. */
    if (strcmp(argv[0], "sum") == 0)
        compute_16_bit_checksum();
    else if (strcmp(argv[0], "cksum") == 0)
        compute_crc32();
    return 0;
}

So running "sum foo.dat" gives you a 16-bit checksum, while "cksum
foo.dat" computes a CRC32. This way the two algorithms can share the
code that they have in common, without needing to duplicate it in two
binaries or use a shared library.

However, you could also do

execl("/usr/bin/sum", "cksum", "foo.dat", (char *)NULL);

which would also compute a CRC32.
 
Richard Bos

James Kanze said:
The answer for all of these is simple (and exactly like the
answer for Unix, so slightly portable): in both Unix and
Windows, the invoking process provides whatever it likes as
argv[0]. Which doesn't conform to either the C standard or the
C++ standard, but a conformant implementation of C or C++ isn't
possible under Unix or Windows.

Do elucidate. AFAIAC, that Windows doesn't supply its own conformant
argv[0] needn't stop an implementation making its own best stab at it.

Richard
 
Lew Pitcher

In addition to my previous reply...

Which Unix world? It's been a while, but IIRC the command typed at the
Unix prompt had to match an actual filename.

which is irrelevant.

In Unix, the exec() family of system calls governs the values of the various
strings passed to a process through the argc/*argv[] arguments to main().
Shells use one of the exec() calls to start processes, and, /by
convention/, fill in argv[0] with the command path as entered into the
shell (this may be a full path or a relative path, including a path implied
relative to the current working directory, or implied relative to one of
the directories in the $PATH environment variable). Note that the value
given to argv[0] is /by convention/ a filename. The exec() syscall makes no
requirement that the value to argv[0] /must be/ a filename.

It is entirely possible to start a process such that the argv[0] it sees has
*no direct relationship* to the filename of the executable file. Take a
look at the argv[0] of a Unix login shell, for instance, which typically is
set to "-" without there needing to be an executable called "-" anywhere on
the system.


 
BobF

Nate said:
BobF said:
Which Unix world? It's been a while, but IIRC the command typed at
the Unix prompt had to match an actual filename.

Using your example, I would need two shell scripts (files), one named
'foo' and the other named 'bar' to invoke a different *binary* with
different args to get different behavior between the two runs.

However, the scripts themselves are executable ... if you had said
'binary' instead of 'executable' it would make more sense ... to me.

What I have in mind is running the same binary with two different
names. This is most commonly done with a hard or soft link.

For instance, on my FreeBSD machine, /usr/bin/sum and /usr/bin/cksum are
hard links to the same binary. That binary does something like

#include <string.h>

int main(int argc, char *argv[]) {
    /* Dispatch on the name the program was started under. */
    if (strcmp(argv[0], "sum") == 0)
        compute_16_bit_checksum();
    else if (strcmp(argv[0], "cksum") == 0)
        compute_crc32();
    return 0;
}

So running "sum foo.dat" gives you a 16-bit checksum, while "cksum
foo.dat" computes a CRC32. This way the two algorithms can share the
code that they have in common, without needing to duplicate it in two
binaries or use a shared library.

However, you could also do

execl("/usr/bin/sum", "cksum", "foo.dat", (char *)NULL);

which would also compute a CRC32.

Way Back When, we would accomplish this with "crc -16" and "crc -32".
But things were simpler then and people weren't afraid of simplicity or
c/l args :)
 
Nate Eldredge

BobF said:
Way Back When, we would accomplish this with "crc -16" and "crc
-32". But things were simpler then and people weren't afraid of
simplicity or c/l args :)

I hardly think you can accuse early Unix designers of fearing simplicity
or command line arguments.

Rather, originally there was a `sum' command that did one thing.
Someone wanted a different kind of checksum so they wrote a new
command, and it also became standard. Then people realized that the two
programs shared a lot of code and could be merged into one, but invoking
them as `sum' and `cksum' was already established, and changing it would
break scripts. Hence the current solution.
 
BobF

Nate said:
I hardly think you can accuse early Unix designers of fearing simplicity
or command line arguments.

Nope, I said just the opposite.

Rather, originally there was a `sum' command that did one thing.
Someone wanted a different kind of checksum so they wrote a new
command, and it also became standard. Then people realized that the two
programs shared a lot of code and could be merged into one, but invoking
them as `sum' and `cksum' was already established, and changing it would
break scripts. Hence the current solution.

I would have used a different approach that would have ultimately
resulted in updated scripts.

In the simplest cases using multiple links to the same file and testing
argv[0] seems fine. IMHO, however, long term maintenance would be much
simpler using explicit args.

To each his own.
 
James Kanze

James Kanze said:
The answer for all of these is simple (and exactly like the
answer for Unix, so slightly portable): in both Unix and
Windows, the invoking process provides whatever it likes as
argv[0]. Which doesn't conform to either the C standard or
the C++ standard, but a conformant implementation of C or C++
isn't possible under Unix or Windows.
I wouldn't say this makes it non-conformant. The C standard
says that "the string pointed to by argv[0] represents the
program name" (5.1.2.2.1 (2)). It does not further define
"program name". So one could argue that the string provided
by the calling process *is* the program name for that
particular run of the program.

The C++ standard says that it is the name used to invoke the
program, which is more restrictive :).

(Of course, strictly speaking, conformance is always possible,
since it's always conformant for argv[0] to be simply "", an
empty string.)

Indeed, in the Unix world, this is the terminology people
actually use; the program's name is independent of filename of
its executable. ("If you run this program with the name
'foo', it does something; if you run it with the name 'bar',
it does something else.")

That's not quite the situation: under Unix, a file can have many
names, and the rule is if it is invoked with one name, it does
one thing, and if it is invoked with another, it does something
else. Normally, however, these are still names of the
executable file (although the case of aliases still exists).

There's also a convention in Unix that if the first character in
argv[0] is a '-', the program should consider itself a login
shell. And in this case, there really isn't any program with
that name on the disk.

The fact that the standard does not define "program name"
suggests to me that the authors intended to let the
implementation decide what the "program name" should be. This
seems very reasonable, since those details are clearly beyond
the scope of the standard. Note also the previous paragraph,
wherein the argv strings are described as having
"implementation-defined values".
Finally, I find it very unlikely that the standard authors
would knowingly include specifications that would make all
existing Unix and Windows implementations non-conformant, as
you seem to suggest they did.

An implementation has several possible solutions: it can always
make argv[0] an empty string, of course. Or it can document
that it is only conformant for programs which are invoked from
the Bourne shell (or one of a number of pre-defined shells which
do pass the right thing to argv[0]). In practice, although not
without drawbacks, I prefer the way Unix (and Windows) does it
to something that would be 100% conformant everywhere.

Note that under Unix, it is generally not possible to do this
reliably at all, so if you intend to port someday, you should
design your program in a manner that does not require this
information.

It's not possible to do it 100% reliably, but in practice, you
can probably get close enough for most uses. The goal, of
course, is to find related files: resources, text and messages,
etc.; if the user has created an environment confused enough
that you can't find the executable under Unix, then tough luck:
you output an error message and exit with an error status.
(FWIW: I actually had the code working under Unix before porting
it to Windows.)
 
James Kanze

James Kanze said:
The answer for all of these is simple (and exactly like the
answer for Unix, so slightly portable): in both Unix and
Windows, the invoking process provides whatever it likes as
argv[0]. Which doesn't conform to either the C standard or
the C++ standard, but a conformant implementation of C or C++
isn't possible under Unix or Windows.
Do elucidate. AFAIAC, that Windows doesn't supply its own
conformant argv[0] needn't stop an implementation making its
own best stab at it.

It's true that under Windows, the C runtime could use
GetModuleFileName to determine argv[0]. In practice,
implementations don't do it this way, because the solution
(which is more or less similar to the one in Unix) actually used
is more useful in practice.
 
Richard Bos

James Kanze said:
James Kanze said:
The answer for all of these is simple (and exactly like the
answer for Unix, so slightly portable): in both Unix and
Windows, the invoking process provides whatever it likes as
argv[0]. Which doesn't conform to either the C standard or
the C++ standard, but a conformant implementation of C or C++
isn't possible under Unix or Windows.
Do elucidate. AFAIAC, that Windows doesn't supply its own
conformant argv[0] needn't stop an implementation making its
own best stab at it.

It's true that under Windows, the C runtime could use
GetModuleFileName to determine argv[0]. In practice,
implementations don't do it this way, because the solution
(which is more or less similar to the one in Unix) actually used
is more useful in practice.

Which means that most Windows (and/or Unix) implementations aren't
perfect in this regard. However, this still does not mean that a
conformant C implementation is not possible under those systems, and
that for two reasons.
First, what most implementations _choose_ to do does not mean that other
implementations cannot choose another, more ideally conforming method.
It's still possible to write one which gets it right.

Second, the Standard (5.1.2.2.1#2, fourth point) actually states:
# — If the value of argc is greater than zero, the string pointed to by
# argv[0] represents the /program name/;
which, first, does not require anything about where this program name
comes from, be it the (or _a_!) file name of the executable, a field
within that executable, the file name of the C file being interpreted,
the file name of the symlink used to invoke the executable, or any other
reasonable choice; and second, actually _defines_ what "program name"
means, as far as C is concerned: it means whatever is in argv[0],
regardless of how it got there.
So, apart from QoI desires, any implementation can put whatever it wants
in argv[0], and this will be conforming.

Richard
 
