Is the argv array modifiable?

mnaydin

Assume the main function is defined with
int main(int argc, char *argv[]) { /*...*/ }

So, is it permitted to modify the argv array? The standard says
"The parameters argc and argv and the strings pointed to by the
argv array shall be modifiable by the program,[...]". According to
my reading of the standard, for example, ++argv and ++argv[0][0]
are both permitted, but not ++argv[0], because it says nothing about
the elements of the argv array itself. Is my interpretation correct?
 
Richard Heathfield

mnaydin said:
Assume the main function is defined with
int main(int argc, char *argv[]) { /*...*/ }

So, is it permitted to modify the argv array? The standard says
"The parameters argc and argv and the strings pointed to by the
argv array shall be modifiable by the program,[...]". According to
my reading of the standard, for example, ++argv and ++argv[0][0]
are both permitted, but not ++argv[0] because it says nothing about
the argv array itself. Is my interpretation correct ?

<caveat class="this is from memory, not the Standard">
I believe so, yes. You can modify argv because you get a
copy of the caller's value, so why should the caller care
what you do with it? You can modify the contents of each
string because there's no particular reason to forbid you
to, so long as you don't try to stretch the string - i.e.
scribble over or past the null terminator. But for all you
know, the implementation might have used dynamic allocation
to get the memory it needs for storing those strings, and
might have no spare copy of the pointer values returned by
the allocator - so (if I recall correctly) the Standard
doesn't offer any behaviour guarantees whatsoever if you
mess with those pointers.
</caveat>
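
A minimal sketch of the "don't stretch the string" rule described above. The helper name upcase_in_place is hypothetical, not something from the Standard or this thread; it only overwrites characters that come before the terminating null, so the string never grows.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: upper-cases s in place. It only overwrites
   characters that precede the terminating null, so the string keeps
   its length and the terminator is never touched. */
static size_t upcase_in_place(char *s)
{
    size_t n = 0;
    for (; s[n] != '\0'; ++n)
        s[n] = (char)toupper((unsigned char)s[n]);
    return n;   /* the length, unchanged by the rewrite */
}
```

Inside main this would be called as upcase_in_place(argv[i]) for 0 <= i < argc; argv[argc] is a null pointer and must not be passed.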
 
Jordan Abel

mnaydin said:
Assume the main function is defined with
int main(int argc, char *argv[]) { /*...*/ }

So, is it permitted to modify the argv array? The standard says
"The parameters argc and argv and the strings pointed to by the
argv array shall be modifiable by the program,[...]". According to
my reading of the standard, for example, ++argv and ++argv[0][0]
are both permitted, but not ++argv[0] because it says nothing about
the argv array itself. Is my interpretation correct ?

<caveat class="this is from memory, not the Standard">
I believe so, yes. You can modify argv because you get a
copy of the caller's value, so why should the caller care
what you do with it? You can modify the contents of each
string because there's no particular reason to forbid you
to, so long as you don't try to stretch the string - i.e.
scribble over or past the null terminator. But for all you
know, the implementation might have used dynamic allocation
to get the memory it needs for storing those strings, and
might have no spare copy of the pointer values returned by
the allocator - so (if I recall correctly) the Standard
doesn't offer any behaviour guarantees whatsoever if you
mess with those pointers.
</caveat>

Can you swap two of them? [suppose you want to bring all arguments
starting with '-' to the beginning of the array]
 
Eric Sosman

Jordan said:
mnaydin said:

Assume the main function is defined with
int main(int argc, char *argv[]) { /*...*/ }

So, is it permitted to modify the argv array? The standard says
"The parameters argc and argv and the strings pointed to by the
argv array shall be modifiable by the program,[...]". According to
my reading of the standard, for example, ++argv and ++argv[0][0]
are both permitted, but not ++argv[0] because it says nothing about
the argv array itself. Is my interpretation correct ?

<caveat class="this is from memory, not the Standard">
I believe so, yes. You can modify argv because you get a
copy of the caller's value, so why should the caller care
what you do with it? You can modify the contents of each
string because there's no particular reason to forbid you
to, so long as you don't try to stretch the string - i.e.
scribble over or past the null terminator. But for all you
know, the implementation might have used dynamic allocation
to get the memory it needs for storing those strings, and
might have no spare copy of the pointer values returned by
the allocator - so (if I recall correctly) the Standard
doesn't offer any behaviour guarantees whatsoever if you
mess with those pointers.
</caveat>


Can you swap two of them? [suppose you want to bring all arguments
starting with '-' to the beginning of the array]

Not reliably. There are three different things one might
be talking about when one says `argv':

- The function parameter variable: This is modifiable.

- The individual pointers argv[0], argv[1], ... The
Standard says nothing about whether these are modifiable.

- The strings whose first characters are *argv[0],
*argv[1], ... The Standard says these are modifiable.

Section 5.1.2.2.1, paragraph 2, final constraint.
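
The three cases can be lined up in one sketch. It is written as an ordinary function (the name first_char_after_skip is made up for illustration) so that it can be handed an array the caller owns; the comments say what the quoted sentence from 5.1.2.2.1 promises if the same statements were applied to main's own parameters.

```c
#include <stddef.h>

/* Hypothetical demo: skips the first element, overwrites the first
   character of the next string, and returns that character. */
static char first_char_after_skip(int argc, char **argv)
{
    if (argc < 2 || argv[1][0] == '\0')
        return '\0';

    ++argv;              /* case 1: the parameter itself, stated to be modifiable */
    --argc;              /*         the same goes for argc */

    argv[0][0] = '#';    /* case 3: a character of a string, stated to be modifiable */

    /* Case 2 would be an assignment to an element, for example
           argv[0] = argv[0] + 1;
       That is the case the quoted sentence does not mention. */

    return argv[0][0];
}
```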
 
bluejack

Given that there are no const keywords in use, one would expect that
argv is modifiable in any and all senses. Naturally, main is something
of an exception case, but even so, I trust that the people who
established the standard were reasonably sensible and rigorous people,
and if they had meant for something to be const, they would have used
the const keyword to so indicate.

As for compiler designers...

-bluejack
 
mnaydin

Jordan said:
mnaydin said:
Assume the main function is defined with
int main(int argc, char *argv[]) { /*...*/ }

So, is it permitted to modify the argv array? The standard says
"The parameters argc and argv and the strings pointed to by the
argv array shall be modifiable by the program,[...]". According to
my reading of the standard, for example, ++argv and ++argv[0][0]
are both permitted, but not ++argv[0] because it says nothing about
the argv array itself. Is my interpretation correct ?

<caveat class="this is from memory, not the Standard">
I believe so, yes. You can modify argv because you get a
copy of the caller's value, so why should the caller care
what you do with it? You can modify the contents of each
string because there's no particular reason to forbid you
to, so long as you don't try to stretch the string - i.e.
scribble over or past the null terminator. But for all you
know, the implementation might have used dynamic allocation
to get the memory it needs for storing those strings, and
might have no spare copy of the pointer values returned by
the allocator - so (if I recall correctly) the Standard
doesn't offer any behaviour guarantees whatsoever if you
mess with those pointers.
</caveat>

Can you swap two of them? [suppose you want to bring all arguments
starting with '-' to the beginning of the array]

Yes, my primary intention is to bring some arguments to the beginning
of the array. But swapping two of them in the argv array is not a
solution, because the assignment argv[i] = argv[j] is not guaranteed to
work: the argv array may not be modifiable, as Richard and Eric said
in this thread. On the other hand, I thought this was common
practice. At least in K&R2 there is an example on page 117,
section 5.10, where argv[0] is modified, though for a different
purpose from mine. Interestingly, in the K&R1 version of the same
example, on page 113, section 5.11, argv[0] was not modified
and a pointer to char, named s, was used to loop through the string.

In any case, I think one easy and guaranteed solution is to
clone the original argv array and work on the clone,
something like this:
char **arglist = malloc((argc + 1) * sizeof *arglist);
if (arglist == NULL) ... Ouch ! ...
memcpy(arglist, argv, (argc + 1) * sizeof *arglist);
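
A fuller sketch of the clone-then-reorder idea, under the assumption that "options" are simply the arguments whose first character is '-'. The names clone_args and options_first are hypothetical. Only the cloned pointer array is rearranged; the original array and the strings themselves are left alone.

```c
#include <stdlib.h>
#include <string.h>

/* Copies the argc + 1 pointers of argv (including the final null
   pointer) into freshly allocated storage. Returns NULL on failure;
   the caller frees the result. The strings are shared, not copied. */
static char **clone_args(int argc, char **argv)
{
    size_t n = (size_t)argc + 1;
    char **copy = malloc(n * sizeof *copy);
    if (copy != NULL)
        memcpy(copy, argv, n * sizeof *copy);
    return copy;
}

/* Moves every element of list[1..n-1] that starts with '-' ahead of
   the others, keeping the relative order within both groups. list[0]
   stays in place. Returns the index of the first non-option. */
static int options_first(int n, char **list)
{
    int next = (n > 0) ? 1 : 0;   /* slot for the next option */
    for (int i = 1; i < n; ++i) {
        if (list[i][0] == '-') {
            char *opt = list[i];
            /* shift the non-options in list[next..i-1] up by one slot */
            memmove(&list[next + 1], &list[next],
                    (size_t)(i - next) * sizeof *list);
            list[next++] = opt;
        }
    }
    return next;
}
```

In main this would be used as arglist = clone_args(argc, argv), followed by options_first(argc, arglist) once the allocation has been checked.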
 
Eric Sosman

bluejack said:
Given that there are no const keywords in use, one would expect that
argv is modifiable in any and all senses. Naturally, main is something
of an exception case, but even so, I trust that the people who
established the standard were reasonably sensible and rigorous people,
and if they had meant for something to be const, they would have used
the const keyword to so indicate.

On the other hand, the authors of the Standard stated
explicitly that the pointed-to strings are modifiable, even
though the "no `const' appears" argument would apply to them
with equal force. Why did they bother?

Keep in mind the large body of C code already in existence
before `const' entered the language. The ANSI committee could
not invalidate two-plus decades' worth of existing code because
they'd thought of a better way. They codified existing practice,
even though (with the new tools) more explicit practice was
possible.

It seems to me not unlike the situation with string literals:
They are not `const', yet you are forbidden to try to alter them.
The Rationale explains that they were not made `const' because a
lot of existing code would break; instead, they are non-`const'
and the Standard has special language warning you not to modify
them.

The argv question seems similar (although the Rationale does
not confirm it): Pre-`const' code declared argv as `char**', and
the Standard adopted that use but added special language describing
the writeability of argv[i][j]. I think it a "curious incident"
that the Standard says nothing about the writeability of argv[i].
 
mnaydin

bluejack said:
Given that there are no const keywords in use, one would expect that
argv is modifiable in any and all senses. Naturally, main is something
of an exception case, but even so, I trust that the people who
established the standard were reasonably sensible and rigorous people,
and if they had meant for something to be const, they would have used
the const keyword to so indicate.

As for compiler designers...

-bluejack

But, by the same logic, one could argue that it is explicitly stated in
the standard that the parameters argc, argv, and the strings pointed to
by argv array shall be modifiable, even though there is no const
keyword qualifying them, but nothing is stated on the modifiability
of the argv array itself (i.e., argv[0],...,argv[argc]), so there is a
strong indication that the argv array is not supposed to be modifiable.
I think relying on the absence of the const keyword is not a valid
argument.
 
Peter Nilsson

[You might like to quote some context. If your message was not related
to Eric Sosman's then perhaps you should reply to the OP's message
rather than somewhere downthread.]
Given that there are no const keywords in use, one would expect
that argv is modifiable in any and all senses. I trust that the
people who established the standard were reasonably sensible and
rigorous people, and if they had meant for something to be const,
they would have used the const keyword to so indicate.

That is a naive, even dangerous, form of reasoning. C has many quirks
which are counter-intuitive. Some of them are far from sensible, e.g.
gets.

Trusting (or blaming) the Committee is an irrelevance. At the end of
the day, the language is that written in the Standard. It is up to
programmers to educate themselves on what that language is.

C is one of the worst languages for programming by intuition and hope!
 
bluejack

Peter said:
That is a naive, even dangerous, form of reasoning. C has many quirks
which are counter-intuitive. Some of them are far from sensible, e.g.
gets.

Granted.

Trusting (or blaming) the Committee is an irrelevance. At the end of
the day, the language is that written in the Standard. It is up to
programmers to educate themselves on what that language is.

And, while there are several good approaches to educating yourself
on what the language is ... and I realize this is going to endear me
to nobody ... my preferred method is "trial and error" -- despite my
"naive and dangerous" form of reasoning, it's a perfectly effective
approach, assuming you start out by trusting nobody. I don't trust
the standard (in part because there's no guarantee it has been
implemented correctly, but mostly because I don't have a copy),
I don't trust compiler designers (because they don't necessarily
implement correctly), I don't trust secondary documentation (it's
like a photocopy of a photocopy), I *certainly* don't trust usenet,
and I trust my own memory *least of all*. What I trust are demonstrable
results.

Naturally, with that mentality, I tend to code defensively. It would
never even occur to me to *want* to change argv (or use gets). Still I
do find these conversations fascinating, and I always enjoy the cranky
attitude found on usenet!

-bluejack
 
Chuck F.

bluejack said:
Given that there are no const keywords in use, one would expect
that argv is modifiable in any and all senses. Naturally, main
is something of an exception case, but even so, I trust that the
people who established the standard were reasonably sensible and
rigorous people, and if they had meant for something to be const,
they would have used the const keyword to so indicate.

This is fairly meaningless due to the total lack of context. See
my sig below for a way to use the broken google interface sanely.
 
Flash Gordon

bluejack said:

These quirks won't be learnt by trial and error. The *most* you will
learn is how the specific version of the specific implementation you are
using works.

> And, while there are several good approaches to educating yourself
> on what the language is ... and I realize this is going to endear me
> to nobody ... my preferred method is "trial and error" -- despite my
> "naive and dangerous" form of reasoning, it's a perfectly effective
> approach,

No, it is most definitely NOT a perfectly effective method. All sorts of
things that you might think are correct, and might work on your compiler
this week, might fail abysmally when it actually matters to you.

> assuming you start out by trusting nobody.

Start by not trusting trial and error, because it has repeatedly
been shown that the people posting here who relied on it to learn C
have learnt to do things which are definitely wrong.

> I don't trust
> the standard (in part because there's no guarantee it has been
> implemented correctly,

In that case build your own chip factory, design and build your own
chips, and write your own compiler.

> but mostly because I don't have a copy),

Google for n1124.pdf to get a free public draft of the next version, or
buy a copy of the current version from a standards body (you can get it
for $18 last I heard).

> I don't trust compiler designers (because they don't necessarily
> implement correctly),

In that case don't use any you have not implemented. You also can't
trust assemblers, text editors or the OS by that reasoning.

> I don't trust secondary documentation (it's
> like a photocopy of a photocopy),

It is easy to find reviews of books to see if they are reliable, and you
can cross-reference to the standard if you are not sure.

> I *certainly* don't trust usenet,
> and I trust my own memory *least of all*.
> What I trust are demonstrable
> results.

I can demonstrate with one compiler that you can safely modify string
literals and get the expected result. I can also demonstrate with a
later version of the *same* compiler that you can't modify string
literals because it causes a SIGSEGV (I might be wrong on the exact
signal, but definitely a crash). The reality is that anything can happen
because it is undefined behaviour. However, had I relied on your method
of trial and error all my code could have suddenly gone from "working"
to "crashing".

If I could be bothered I could come up with lots of other examples, but
the above is one I know to be demonstrably true.

> Naturally, with that mentality, I tend to code defensively.

Coding defensively REQUIRES understanding how the language is DEFINED to
work. What you are doing by relying on trial and error rather than a
reliable source of information is coding stupidly.

> It would never even occur to me to *want* to change argv (or use
> gets). Still I do find these conversations fascinating, and I always
> enjoy the cranky attitude found on usenet!

Well, if you think trial and error is a substitute for a good text book
expect responses a lot more cranky than mine.
 
Jordan Abel

On the other hand, the authors of the Standard stated
explicitly that the pointed-to strings are modifiable, even
though the "no `const' appears" argument would apply to them
with equal force. Why did they bother?

Keep in mind the large body of C code already in existence
before `const' entered the language. The ANSI committee could not
invalidate two-plus decades' worth of existing code because they'd
thought of a better way. They codified existing practice, even though
(with the new tools) more explicit practice was possible.

They could have permitted an additional prototype:

int main(int argc, char * const *argv);

which I think they would have done if they had intended that the
pointers may not be modifiable.
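
A small sketch of what that const placement would express, using an ordinary function rather than main (whether an implementation accepts such a signature for main itself is a separate question). The function name mask_dashes is hypothetical. With char * const *, the characters stay writable while the pointers in the array do not.

```c
#include <stddef.h>

/* Hypothetical function using the const placement from the prototype
   above: args points to const pointers to (non-const) char. Replaces a
   leading '-' with '+' and returns how many strings were changed. */
static int mask_dashes(char *const *args)
{
    int changed = 0;
    for (size_t i = 0; args[i] != NULL; ++i) {
        if (args[i][0] == '-') {
            args[i][0] = '+';   /* fine: the characters are not const */
            ++changed;
        }
        /* args[i] = NULL;         constraint violation: args[i] is a
                                   const-qualified pointer, so a compiler
                                   must diagnose this assignment */
    }
    return changed;
}
```

A plain char ** argument converts implicitly to char * const *, so a caller needs no cast.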
It seems to me not unlike the situation with string literals:
They are not `const', yet you are forbidden to try to alter them.

Except, of course, that you are inferring that by lack of analogy to the
explicit permission to write their targets, not from any actual language
in the standard.
The Rationale explains that they were not made `const' because a lot of
existing code would break; instead, they are non-`const' and the
Standard has special language warning you not to modify them.

The standard does not have such special language for the argv pointers.
The behavior in modifying a non-const variable that is not a string
literal and was not cast from the address of a const variable is
well-defined.
The argv question seems similar (although the Rationale does not
confirm it): Pre-`const' code declared argv as `char**', and the
Standard adopted that use but added special language describing the
writeability of argv[i][j]. I think it a "curious incident" that the
Standard says nothing about the writeability of argv[i].

I think it's more curious that it does add such language for the
writeability of argv[i][j], given that it's non-const (and not a string
literal) and hence "should" be modifiable anyway.
 
Keith Thompson

Chuck F. said:
This is fairly meaningless due to the total lack of context. See my
sig below for a way to use the broken google interface sanely.

--
"If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers." - Keith Thompson

Or, better yet, read the more detailed description at
<http://cfaj.freeshell.org/google/>.
 
Netocrat

On 2005-12-15, Eric Sosman wrote:
[on string literals as an analogy to argv]
The standard does not have such special language for the argv pointers.
The behavior in modifying a non-const variable that is not a string
literal and was not cast from the address of a const variable is
well-defined.

The ultimate declaration of the argv variable passed into the program is
not specified though, all the program gets is the declaration of the
function parameter.

It's legal to cast a const-qualified variable to a non-const version of
the same and pass it into a function, it's just not legal to write to it
within the function.
The argv question seems similar (although the Rationale does not
confirm it): Pre-`const' code declared argv as `char**', and the
Standard adopted that use but added special language describing the
writeability of argv[i][j]. I think it a "curious incident" that the
Standard says nothing about the writeability of argv[i].

I think it's more curious that it does add such language for the
writeability of argv[i][j], given that it's non-const (and not a string
literal) and hence "should" be modifiable anyway.

Without that language it would be implicitly undefined behaviour to
attempt to modify argv and argv[i][j], as it is now for argv[i]. The
curiosity is that the Standard left it implicit rather than making it
explicit.
 
Jordan Abel

On 2005-12-15, Eric Sosman wrote:
[on string literals as an analogy to argv]
The standard does not have such special language for the argv pointers.
The behavior in modifying a non-const variable that is not a string
literal and was not cast from the address of a const variable is
well-defined.

The ultimate declaration of the argv variable passed into the program is
not specified though, all the program gets is the declaration of the
function parameter.

It's legal to cast a const-qualified variable to a non-const version of
the same and pass it into a function, it's just not legal to write to it
within the function.

There is, however no basis in the text for supposing that this is the
case for *argv (...etc).
I think it's more curious that it does add such language for the
writeability of argv[i][j], given that it's non-const (and not a string
literal) and hence "should" be modifiable anyway.

Without that language it would be implicitly undefined behaviour

It would not. Without that language, **argv (...etc) would still be of
type char, not const char, and since it's not a string literal (a listed
exception to an object of type char being modifiable), there's no basis
for supposing that it would be non-modifiable.

to attempt to modify argv and argv[i][j], as it is now for argv[i].
The curiosity is that the Standard left it implicit rather than making
it explicit.

There is no basis in the text for believing that it might be the case,
other than your interpretation of a conspicuous lack of a similar
statement for argv[i] as for argv[i][j].
 
Netocrat

[I worded the above sloppily. More correctly the first sentence should
begin: "It's legal to take the address of a const-declared variable, cast
it to a pointer to a non-const qualified version of the variable's type,
and pass that pointer into a function, ..."]
There is, however no basis in the text for supposing that this is the
case for *argv (...etc).

(Assuming that you interpreted my sloppy wording as intended) I'd express
that in reverse: there's no basis in the text for supposing that the
variables passed into main() are uniquely unaffected by this possibility.

The mention that argc and argv are modifiable does seem redundant, but
it is a useful clarification given that they are coming from an external
environment. The claim in my last post that it would be implicit UB to
attempt to modify argv without this mention may have been too strong, but
I'm not convinced that modifying argv[i][j] would be legal and defined
without mention.
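
The general point about casting away const can be sketched with plain ints. All names here (peek, poke, demo) are made up for illustration. Passing the address of a const object through a pointer to non-const is fine as long as the callee only reads; a write through that pointer is what makes the behaviour undefined.

```c
/* Reads through a pointer to non-const; never writes. */
static int peek(int *p)
{
    return *p;
}

/* Writes through the pointer; only defined if *p is not a const object. */
static int poke(int *p)
{
    return ++*p;
}

static int demo(void)
{
    const int fixed = 41;
    int plain = 41;

    int a = peek((int *)&fixed);   /* defined: the const object is only read */
    int b = poke(&plain);          /* defined: plain is not const */
    /* poke((int *)&fixed);           undefined: would write to an object
                                      defined with a const-qualified type */
    return a + b;
}
```

Nothing inside peek or poke can tell which kind of object it was handed, which is the situation main would be in if the environment had done something similar with the argv pointers.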
 
Jordan Abel

comp.std.c added, it seems appropriate: for those who haven't been
following along, the issue is whether the text of the standard supports
a view that modification of the elements of argv [i.e. the individual
pointers] results in undefined behavior.

[I worded the above sloppily. More correctly the first sentence
should begin: "It's legal to take the address of a const-declared
variable, cast it to a pointer to a non-const qualified version of the
variable's type, and pass that pointer into a function, ..."]

But there's no reason to think that this has been done by whatever calls
main.
(Assuming that you interpreted my sloppy wording as intended) I'd
express that in reverse: there's no basis in the text for supposing
that the variables passed into main() are uniquely unaffected by this
possibility.

It's hardly unique.
 
Jonathan Leffler

Jordan said:
comp.std.c added, it seems appropriate: for those who haven't been
following along, the issue is whether the text of the standard supports
a view that modification of the elements of argv [i.e. the individual
pointers] results in undefined behavior.

Yes.

Section 5.1.2.2.1 of the ISO/IEC 9899:1999 seems quite explicit:

The parameters argc and argv and the strings pointed to by the argv
array shall be modifiable by the program, and retain their last-stored
values between program startup and program termination.


There's a note about the conventional but non-mandatory use of argc and
argv as the names of the parameters. Looks pretty clear to me...
 
Chris Torek

Jordan said:
comp.std.c added, it seems appropriate: for those who haven't been
following along, the issue is whether the text of the standard supports
a view that modification of the elements of argv [i.e. the individual
pointers] results in undefined behavior.


I think this claim is a bit premature....
Section 5.1.2.2.1 of the ISO/IEC 9899:1999 seems quite explicit:

The parameters argc and argv and the strings pointed to by the argv
array shall be modifiable by the program, and retain their last-stored
values between program startup and program termination.

That text guarantees that, in the "code" part of:

int main(int argc, char **argv) {
... code ...
}

the programmer may change argv itself (though this is hardly
controversial) and, for appropriate values of i and j, the programmer
may change argv[i][j] by ordinary assignment. Thus, e.g., the code
fragment below is fine given suitable i, p, and q:

/* suppose at this point, strcmp(argv[i], "this:that") == 0 */
p = argv[i];
p[4] = '\0';
q = p + 5;
/* now strcmp(p, "this") == 0 && strcmp(q, "that") == 0 */
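
The same in-place split can be written as a small helper that finds the colon instead of assuming its position. This is only a sketch, and the name split_at_colon is hypothetical. It writes a single character inside the existing string, which is the kind of modification the quoted text permits.

```c
#include <string.h>

/* Hypothetical helper: splits s at its first ':' by overwriting the
   colon with a null character. Returns a pointer to the part after
   the colon, or NULL (leaving s untouched) if there is no colon. */
static char *split_at_colon(char *s)
{
    char *colon = strchr(s, ':');
    if (colon == NULL)
        return NULL;
    *colon = '\0';
    return colon + 1;
}
```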

The question in question (it is late, pardon the phrasing :) ) is
whether this is also proper, given suitable i, k, and p:

p = argv[i];
argv[i] = argv[k];
argv[k] = p;

This writes on argv[i] and argv[k], rather than argv[i][j]. The
fact that the Standard explicitly allows the programmer to write
on argv[i][j] should make one wonder why it fails to mention whether
the programmer may write on argv[i] itself. The lack of a "const"
qualifier is not in itself permission, since:

void f(void) {
    char *x = "this:that";

    x[4] = '\0';
    ...
}

violates a "shall" outside a constraints section, rendering the
behavior undefined, yet no part of the declaration of x uses "const".
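
For contrast, a sketch of the well-defined counterpart: initialising an array from the literal gives the function its own modifiable copy, so the same assignment is fine. The function name head_length is made up for illustration.

```c
#include <string.h>

/* Same assignment as in f above, but applied to an array that is
   initialised from the literal, not to the literal itself. */
static size_t head_length(void)
{
    char x[] = "this:that";   /* x is a modifiable array holding a copy */

    x[4] = '\0';              /* defined: writes to the array */
    return strlen(x);         /* length of "this" */
}
```

Neither declaration uses const, yet only this one may be written to, which is why the absence of const settles nothing on its own.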
 
