modifying the strings pointed to by argv

Francine.Neary

./foo
Segmentation fault (core dumped)
./foo -h
1

Well, of course, you could bloat the code with error messages, but
there's the old saying, Garbage In Garbage Out (GIGO).
./foo 0 /* it actually gets this right, which was a mild surprise */
1

Every non-negative integer argument up to and including 12 works fine,
but observe what happens when we exceed 12:

./foo 13
1932053504

(the correct value is of course 6227020800).

Well, you can modify the struct to maintain the factorial as a long
long if you want...
So you have a seriously convoluted program which only works for a
vanishingly small percentage (less than 0.2%) of possible valid inputs.
My own factorial program is, alas, twice as long as yours (58 lines),
not including library code of course - but it continues to get the
answers right long past 13, despite not taking advantage of ISO's
licence to write into argv. For example, compare this:

./foo 52
0

with this:

~/path/to/rjhfactorial 52
80658175170943878571660636856403766975289505440883277824000000000000

...or even to use GMP or some other library, and it will then work for
100% of valid inputs. This was just a proof of concept - like anything
it can be extended and generalized out of sight.
 
Richard Bos

Mark McIntyre said:

The risk to the stability of your program is potentially high, and you
gain nothing that you can't get more safely in some other way.

What risk?
Sorry to go on about this, but to date I'm not aware of anyone in this
thread actually producing some evidence that this is bad. After all,
argv[n] is just another array of chars, so long as you don't walk off
the end of it, there's no more risk than with any other array of
chars.

Well, he seems to think that we're all going to make like C++
programmers and forget about "so long as". The natural solution to this
disease is not, however, to forbid writing to argv[n], but to make
arrays in C size-limited, with run-time checks on transgressions
thereof, and a sizeofobject operator for pointers (and arrays decayed
into pointers), including any which are passed as function parameters.
Unfortunately this is a lot of work, but there _is_ previous art in this
area; I suggest that he get together with jacob navia.

Richard, g,d,rlb
 
Mark McIntyre

Mark McIntyre said:
After all,
argv[n] is just another array of chars, so long as you don't walk off
the end of it, there's no more risk than with any other array of
chars.

You're right. As long as your code is perfect, nothing bad can happen.

But this is axiomatic. By this logic, one would never write to
character arrays at all.
--
Mark McIntyre

"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it."
--Brian Kernighan
 
Richard Heathfield

Mark McIntyre said:
By this logic, one would never write to character arrays at all.

It's a deal. We're probably still in time for the next Standard - are
you going to write up the DR or shall I? :)
 
Francine.Neary

The standard allows that we can copy strings onto argv[0], argv[1], etc.
Why is it allowed?
What can be the maximum length of such a string that is copied?

The more I think about this thread, the more it seems to me that the
standard is quite broken on this point.

I was thinking along these lines: OK, so my program starts, and let's
say I get one command line argument, so argv[0] and argv[1] are both
pointers to strings. Perhaps argv[1] is enormously long! Now I change
it to point to something else. Alarm bells ring - I've just lost all
record of whatever argv[1] was pointing to - boom, potential memory
leak alert.

But on further reflection, even if I don't change argv[1], it's still
as good as a memory leak! Once my program has finished doing whatever
it wants to do with its command line arguments, there's still never
any way of releasing the memory they occupy.

One practical solution would be: not only should each argv be
modifiable, but in fact it should be a pointer allocated (or that
behaves as if it was allocated) by malloc. This would let the
programmer free() the memory taken up by the command line arguments
once she'd finished processing them.
 
Default User

Mark said:
wrote:
The risk to the stability of your program is potentially high,

C&V please.


Chapter and Verse for a probability? When has the standard ever
concerned itself with that sort of thing?

Brian
 
matevzb

How about this little factorial program? I think it's pretty cute.
E.g.
$ ./factorial 4
24

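#include <stdio.h>
#include <stdlib.h>

struct s {
    unsigned int f;   /* running product */
    unsigned int n;   /* next factor to multiply in */
};

int main(int argc, char **argv)
{
    unsigned int i;

    if (*argv) {   /* first call: argv[0] is still the program name */
        i = (unsigned int) atoi(argv[1]);
        *argv = 0;   /* writes to the argv array itself; marks later calls */
        argv[1] = malloc(sizeof(struct s));   /* argv[1] now holds the state */
        ((struct s *) argv[1])->n = i;
        ((struct s *) argv[1])->f = 1;
    }

    if (((struct s *) argv[1])->n) {
        ((struct s *) argv[1])->f *= ((struct s *) argv[1])->n--;
        main(argc, argv);   /* recurse with the updated state */
        return 0;
    }
    printf("%u\n", ((struct s *) argv[1])->f);
    free(argv[1]);
    return 0;
}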
Um, functionality aside, now I know what "typecasting" is =)
Seriously though, why abuse main() in such a manner, unless one would
like to apply for IOCCC?
 
Chris Torek

The more I think about this thread, the more it seems to me that the
standard is quite broken on this point.

Maybe, from a "very small memory machine" point of view anyway:
I was thinking along these lines: OK, so my program starts, and let's
say I get one command line argument, so argv[0] and argv[1] are both
pointers to strings. Perhaps argv[1] is enormously long! Now I change
it to point to something else. Alarm bells ring - I've just lost all
record of whatever argv[1] was pointing to - boom, potential memory
leak alert.

But on further reflection, even if I don't change argv[1], it's still
as good as a memory leak! Once my program has finished doing whatever
it wants to do with its command line arguments, there's still never
any way of releasing the memory they occupy.

Maybe not for you, but the implementation could.

Remember that your main() is called, somehow, by the implementation.
Typically this is accomplished with a bit of sneakily-written (and
usually machine-dependent in some way) code that vaguely resembles:

void __start(void) {
    register struct __OS_start_info *args __sneaky("%r29");
    int status;

    ... do some stuff with the OS-provided startup info ...
    ... in the process, set up argc and argv ...

    __init_C_library_part_A();
    __init_C_library_part_B();
    ...
    __init_C_library_part_P();

    status = main(argc, argv);

    exit(status); /* __shutdown_C_library_* calls are done from exit() */
    /* NOTREACHED */
}

Some of the __init calls may arrange to release the space occupied
by the argv array and strings.
One practical solution would be: not only should each argv be
modifiable, but in fact it should be a pointer allocated (or that
behaves as if it was allocated) by malloc. This would let the
programmer free() the memory taken up by the command line arguments
once she'd finished processing them.


While that might offer some advantage to a program that is running
short of memory otherwise, it would be a big change to existing
implementations, in which the "OS-provided startup info" includes
the argv array and/or strings (and/or "environment" text and
pointers), stored in a stack frame that resides just "above" (or
co-incident with, depending on architecture) the frame for __start()
itself.
 
Mark McIntyre

Mark McIntyre said:


It's a deal. We're probably still in time for the next Standard - are
you going to write up the DR or shall I? :)

I'm guessing in that case that there's no logic behind your aversion.
That's fair enough - I don't like rats much, no reason why.
 
Mark McIntyre

Chapter and Verse for a probability? When has the standard ever
concerned itself with that sort of thing?

Never, of course.
My point is that if someone is posting a prejudice ^w highly
subjective opinion, they ought to make that clear, and not try to
dress it up as anything more.
 
Richard Heathfield

CBFalconer said:

Mine produces <grin> :

[1] c:\c\junk>fact 52
Factorial(52) == 2147483648e12 * pow(3,23) * pow(7,8) * pow(11,4) *
pow(13,4) * pow(17,3) * pow(19,2) * pow(23,2) * pow(29,1) * pow(31,1) *
pow(37,1) * pow(41,1) * pow(43,1) * pow(47,1) * pow(2,6)
or approximately
806581751709438768400000000000000000000000000000000000000000000
00000.

without any fancy bignum arithmetic packages.

It's very sweet of you to call my bignum arithmetic routines "fancy",
but I don't think you'd say that if you saw them. "An ill-favoured
thing, sir, but mine own", as I think the Bard has it.
 
Francine.Neary

Um, functionality aside, now I know what "typecasting" is =)
Seriously though, why abuse main() in such a manner, unless one would
like to apply for IOCCC?

Heh heh, I just looked that up on Wikipedia - I think I should enter
it, I'd be a natural :) I always love slipping in clever little
shortcuts where possible. To be honest I'm really enjoying finding my
feet in C... Basically, I'm going to finish my degree this summer and
after that I want a programming job. Unfortunately in my course we
only learned Java - nothing lower level (no C++, no C, no assembler).
I really hate hand-holding high-level langs like Java, so I've been
giving myself a crash course in C, hoping I'll be programming it
professionally by the end of this year! It's just so liberating -
anything I want to do, it lets me just do. And you've got to love a
language whose only runtime error message is "Segmentation fault"...

Anyway, back to your question - of course I'm not suggesting my
factorial program as a model for a larger program, but I think it's
kind of neat, and a good illustration of the flexibility of a char *.
 
Francine.Neary

The more I think about this thread, the more it seems to me that the
standard is quite broken on this point.

Maybe, from a "very small memory machine" point of view anyway:
I was thinking along these lines: OK, so my program starts, and let's
say I get one command line argument, so argv[0] and argv[1] are both
pointers to strings. Perhaps argv[1] is enormously long! Now I change
it to point to something else. Alarm bells ring - I've just lost all
record of whatever argv[1] was pointing to - boom, potential memory
leak alert.
But on further reflection, even if I don't change argv[1], it's still
as good as a memory leak! Once my program has finished doing whatever
it wants to do with its command line arguments, there's still never
any way of releasing the memory they occupy.

Maybe not for you, but the implementation could.

Remember that your main() is called, somehow, by the implementation.
Typically this is accomplished with a bit of sneakily-written (and
usually machine-dependent in some way) code that vaguely resembles:

void __start(void) {
    register struct __OS_start_info *args __sneaky("%r29");
    int status;

    ... do some stuff with the OS-provided startup info ...
    ... in the process, set up argc and argv ...

    __init_C_library_part_A();
    __init_C_library_part_B();
    ...
    __init_C_library_part_P();

    status = main(argc, argv);

    exit(status); /* __shutdown_C_library_* calls are done from exit() */
    /* NOTREACHED */
}

That's interesting! I'd never guessed there was an extra bit of C
slipped in there - I guess I'd assumed the operating system directly
passed control to my main().

So maybe one way forward would be to set up some sort of callback
function - some mechanism for the programmer to signal to whatever can
control argc & argv that it's OK to free them up (I guess that would
mean moving the stack pointer an appropriate amount, from what you
said).
Some of the __init calls may arrange to release the space occupied
by the argv array and strings.
One practical solution would be: not only should each argv be
modifiable, but in fact it should be a pointer allocated (or that
behaves as if it was allocated) by malloc. This would let the
programmer free() the memory taken up by the command line arguments
once she'd finished processing them.


While that might offer some advantage to a program that is running
short of memory otherwise, it would be a big change to existing
implementations, in which the "OS-provided startup info" includes
the argv array and/or strings (and/or "environment" text and
pointers), stored in a stack frame that resides just "above" (or
co-incident with, depending on architecture) the frame for __start()
itself.
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: forget about it http://web.torek.net/torek/index.html
Reading email is like searching for food in the garbage, thanks to spammers.
 
Chris Torek

That's interesting! I'd never guessed there was an extra bit of C
slipped in there - I guess I'd assumed the operating system directly
passed control to my main().

It may not be C code (it might be hand-coded assembly, for instance,
or written in some other convenient language). It is pretty rare
to get a call straight from the OS, though, if only because the OS
has no idea how to arrange for the proper cleanup calls to happen
if main() returns. (Simply executing the "terminate process" OS
call is not sufficient, since there may be atexit() calls to handle,
stdio buffers to flush, and so on.)
So maybe one way forward would be to set up some sort of callback
function - some mechanism for the programmer to signal to whatever can
control argc & argv that it's OK to free them up (I guess that would
mean moving the stack pointer an appropriate amount, from what you
said).

This is not really practical on most implementations (at least,
most of the ones I know of). An "I am done with arguments" call
would turn into a no-op. You could still include it, and have it
do nothing on those implementations, but do something on others,
of course.
 
Stephen Sprunk

That's interesting! I'd never guessed there was an extra bit of C
slipped in there - I guess I'd assumed the operating system
directly passed control to my main().

That's unlikely, since various languages and object formats will have
different entry mechanisms.

For instance, ELF has a pointer in the header to the function that is called
as an entry point. GCC links in a static hand-coded routine called
_start(), which grabs argc, argv, and environment variables, does certain
other OS-dependent things like setting up stdio, calls main(), and passes
the returned value back to the OS as part of the process tear-down
procedure. The compiler can't secretly insert that work into main() because
that would conflict with being able to call main() recursively.

S
 
David Thompson

On 14 Mar 2007 06:41:36 -0700, "Cong Wang"
wrote:
The standard just says "argc and argv and the strings pointed to by
the argv array shall be modifiable by the program." But the argv array
_itself_ is _not_ required to be modifiable.
Right, although many seem to feel this was an oversight and it was/is
intended to be. It's difficult to conceive of any reasonable (non-DS9K)
implementation where it isn't.
The following buggy code only works well when argc >=2 and
strlen(argv[1]) >=1:

#include <stdio.h>
#include <string.h>
(You don't actually need stdio.)
int main(int argc, char *argv[])
{
char buf[2] = {0};

Did you intend something else here, like {'X',0}?
As written, you only need strlen(argv[1])>=0, which in fact is
redundant; if argc >= 2, argv[1] must be a valid string, and strlen of
any valid string is >= 0.
strcpy(argv[1], buf);
return 0;
}

I guess making that modifiable may be considered for exec*() functions
or for the recursion of main(). And the maximum length may be found in
POSIX.

POSIX allows an implementation-dependent maximum on the aggregate
total of all arguments and environment passed to an exec'ed program,
optionally including some overhead. But there is no allowance for a
separate limit on the length of a single argument, if that was meant.