char **argv & char *argv[]

jab3

(again :))

Hello everyone.

I'll ask this even at risk of being accused of not researching adequately.
My question (before longer reasoning) is: How does declaring (or defining,
whatever) a variable **var make it an array of pointers?

I realize that 'char **var' is a pointer to a pointer of type char (I hope).
And I realize that with var[], var is actually a memory address (or at
least as it is represented by C, IIRC (an internal copy which is a fixed
pointer)) pointing (permanently) to the first element of an array. And I
realize that *var[] is an array of pointers where each pointer can point to
the beginning of a string (or whatever). But then there is **var. How
does that then become an array of pointers?

Hmmm. It's coming to me. Wait. So we declare 'char **var'. *var
is/contains a memory address of size char, which could point to the
beginning of a string (**var). (?) *var+1 would be the next char memory
address, which could point to a string (*(*var+1)) (moving char bytes
through memory (1)). That's my hangup. How is the **argv structure
formed? Are the arguments added, then memory allocated for that many, then
dividing them up across the argv variable? Because I've learned to accept
that **argv points to a string, *(*argv+1) points to the next, etc. (Are
the () necessary? Am I even right?) But how does it get that way? I feel
like I'm almost to a satori experience with this aspect of pointers (which
would be nice :)), but there's something holding me back (my mind maybe?).
I think I just need to get a grasp of the mechanics behind the creation of
argv. (Don't ask; whenever I'm studying pointers I get stuck on these
issues and I can't stop thinking about how, so I become unable to wrap my
head around it)

Where does the program store the arguments before putting them in argv? Is
there a buffer it puts each argument in, then copies it into argv? It's
driving me crazy. (Similar to how passing a pointer to printf with %s
(char *str = "Confused";printf("%s", str);) is the same as a string. How
does it (the compiler, program, ??) know? I then figured when it receives
the memory address, expecting a string, it dereferences the pointer,
traversing it until it gets a '\0'? Close?) Although it actually just hit
me that if I were to pass a normal string variable (char str[6] = "idiot")
as 'printf("hello, %s", str)' then str is actually a pointer to the first
element of str[6]. Ahhhh... :)

I realize that perhaps the argv example is implementation specific and not
topical. Perhaps you could imagine a similar situation, i.e. passing a
**var in a function that is in fact an array of pointers. Is the **var
construction often used without being an array of pointers? Also, why is
it technically more accurate to define argv as **argv and not *argv[]?
(according to a book I have, Linux Programming by Example).


Please excuse the rambling. I know I'm not being very clear. There's a
reason for that; hence the post :). Thanks for any help or guidance, and
patience.

-jab3
 
Yan

(Those who see errors in my answers, please point them out.)
(again :))

Hello everyone.

I'll ask this even at risk of being accused of not researching adequately.
My question (before longer reasoning) is: How does declaring (or defining,
whatever) a variable **var make it an array of pointers?

I realize that 'char **var' is a pointer to a pointer of type char (I hope).
And I realize that with var[], var is actually a memory address (or at
least as it is represented by C, IIRC (an internal copy which is a fixed
pointer)) pointing (permanently) to the first element of an array. And I
realize that *var[] is an array of pointers where each pointer can point to
the beginning of a string (or whatever). But then there is **var. How
does that then become an array of pointers?

In the declaration 'char var[];' var, when used by itself is just a
pointer to the first element, and when you access the first or second
element your compiler even turns that var[0] into *(var+0) and var[1]
into *(var+1), literally using the number you gave it as the offset. So
you can think of var[] as being sort of equal to *var, and so *var[] as
sort of equal to **var. I don't know if I'm making much sense, but check
out K&R's book; it explains it well.
Hmmm. It's coming to me. Wait. So we declare 'char **var'. *var
is/contains a memory address of size char, which could point to the
beginning of a string (**var). (?) *var+1 would be the next char memory
address, which could point to a string (*(*var+1)) (moving char bytes
through memory (1)). That's my hangup. How is the **argv structure
formed? Are the arguments added, then memory allocated for that many, then
dividing them up across the argv variable? Because I've learned to accept
that **argv points to a string, *(*argv+1) points to the next, etc. (Are
the () necessary?

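Yes, the parentheses matter, because + has lower precedence than unary *.
Note, though, that *(*argv+1) is the second character of the first string;
the next string is *(argv+1), i.e. argv[1].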
Am I even right?) But how does it get that way? I feel
like I'm almost to a satori experience with this aspect of pointers (which
would be nice :)), but there's something holding me back (my mind maybe?).
I think I just need to get a grasp of the mechanics behind the creation of
argv. (Don't ask; whenever I'm studying pointers I get stuck on these
issues and I can't stop thinking about how, so I become unable to wrap my
head around it)

really check out K&R's book and go to the chapter on pointers, it's
really of great help
Where does the program store the arguments before putting them in argv?

When a process is created (at least under Unix), the operating system
copies the argument strings and the environment variables into the new
process's memory, typically near the top of the initial stack, along with
the argument count and an array of pointers to those strings. The startup
code then hands the count and that pointer array to main() as argc and
argv. All of this is done by the operating system and the C runtime
before your own code runs.
Is
there a buffer it puts each argument in, then copies it into argv? It's
driving me crazy. (Similar to how passing a pointer to printf with %s
(char *str = "Confused";printf("%s", str);) is the same as a string.

any string that's in quotes in a C program gets stored in a read-only
part of your program when it's running, so the line:

char *str = "Confused";

gets copied into the read-only memory as soon as your program sees it,
then assigns that address to str, which is why you can't change strings
like that. when you call printf() with that str pointer as one of the
args, it simply goes to that location in read-only memory and reads it.


How
does it (the compiler, program, ??) know? I then figured when it receives
the memory address, expecting a string, it dereferences the pointer,
traversing it until it gets a '\0'? Close?

yup that's how c does strings
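
A minimal sketch of that walk, assuming a hypothetical helper named
walk_string (this is not how any particular printf is implemented, only
the idea of following the pointer until the '\0'):

```c
#include <stddef.h>

/* Hypothetical helper: counts the chars before the terminating '\0',
   which is roughly the traversal "%s" has to perform on its argument. */
size_t walk_string(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')   /* s[n] is *(s + n) */
        n++;
    return n;
}
```

Calling walk_string("Confused") passes a pointer to the 'C'; the function
never learns whether that pointer came from a literal or from an array.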

) Although it actually just hit
me that if I were to pass a normal string variable (char str[6] = "idiot")

now saying:

char str[6] = "idiot";

is different from what I said above, since in that statement you declare
an array of chars of length 6 and initialize it with the string "idiot"
(read: writable memory). That statement is equivalent to:

char str[6] = { 'i', 'd', 'i', 'o', 't', '\0' };

as 'printf("hello, %s", str)' then str is actually a pointer to the first
element of str[6]. Ahhhh... :)

So, as I said above, str by itself is just a pointer to the location of
the first char, both for the constant string and for the array I
mentioned at the start of this response, so to printf it looks pretty
much like the same thing.
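
As a small check of the two initializer forms (array_init_demo is a
made-up name for illustration), the following sketch compares them byte
for byte and then modifies the array, which is allowed because the array
is the program's own storage:

```c
#include <string.h>

/* Returns 1 if both initializer forms give the same six bytes and the
   array can be modified afterwards. */
int array_init_demo(void)
{
    char a[6] = "idiot";
    char b[6] = { 'i', 'd', 'i', 'o', 't', '\0' };
    int same = memcmp(a, b, sizeof a) == 0;

    a[0] = 'I';   /* fine: a is a writable array, not a string literal */
    return same && a[0] == 'I' && sizeof a == 6;
}
```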
I realize that perhaps the argv example is implementation specific and not
topical. Perhaps you could imagine a similar situation, i.e. passing a
**var in a function that is in fact an array of pointers. Is the **var
construction often used without being an array of pointers?

Also, why is
it technically more accurate to define argv as **argv and not *argv[]?
(according to a book I have, Linux Programming by Example).

It's more accurate because in your system argv is exactly that: a
pointer to a pointer. The first dereference (*argv) gives the pointer to
the first string (in argv's case, your program's name), and the next
dereference (**argv) gives the first letter of that string. *(argv+1),
i.e. argv[1], is the pointer to the second string, and **(argv+1) is its
first letter. (Note that *(*argv+1) is something else: the second letter
of the first string.)
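
The expressions are easy to mix up, so here is a sketch that evaluates
them on a hand-built stand-in for argv (fake_argv and
check_argv_expressions are hypothetical names; a real argv is set up
before main() runs). It shows that *(*argv + 1) stays inside the first
string, while **(argv + 1) moves to the second one:

```c
/* Returns 1 if each expression yields the character noted beside it. */
int check_argv_expressions(void)
{
    char *fake_argv[] = { "prog", "one", "two", 0 };
    char **argv = fake_argv;          /* what main() would receive */

    return **argv == 'p'              /* first char of first string  */
        && *(*argv + 1) == 'r'        /* second char of FIRST string */
        && **(argv + 1) == 'o'        /* first char of second string */
        && *argv[1] == 'o'            /* same thing, subscript form  */
        && argv[2][1] == 'w';         /* second char of third string */
}
```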
 
CBFalconer

jab3 said:
I'll ask this even at risk of being accused of not researching
adequately. My question (before longer reasoning) is: How does
declaring (or defining, whatever) a variable **var make it an
array of pointers?

It doesn't. It makes it a variable holding a pointer to some other
type of pointer. The confusion arises because this is exactly what
you get when you pass an array of those pointers to a function. A
passed array is represented by a pointer to its zeroth element.
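
A minimal sketch of that situation, with made-up names (count_strings,
count_words, words) and a NULL sentinel in the style of argv:

```c
#include <stddef.h>

/* Counts entries up to the NULL sentinel. The parameter is a plain
   char **; the function cannot tell that it came from an array. */
size_t count_strings(char **list)
{
    size_t n = 0;
    while (list[n] != NULL)
        n++;
    return n;
}

/* The caller owns a real array of pointers. */
size_t count_words(void)
{
    char *words[] = { "alpha", "beta", "gamma", NULL };
    /* "words" is passed as &words[0], which has type char **. */
    return count_strings(words);
}
```

The char ** parameter is not an array of pointers; it merely points at
the first element of one that lives in the caller.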
 
Chris Torek

The short answer is, "it does not".

More precisely, "char **var" declares "var" as a variable of *type*
"pointer to pointer to char". Whether "var" actually points to
anything at all (much less "anything useful") is up to you, the
programmer.
And I realize that with var[], var is actually a memory address (or at
least as it is represented by C, IIRC (an internal copy which is a fixed
pointer)) pointing (permanently) to the first element of an array.

This is also wrong, or at least, not quite right. :)

There are some "gotchas" with array declarations that do not occur
with pointers, so we have to start adding more context. If we write,
for instance:

int arr1[8] = { 1, 2, 3, 0 };

outside of a function, or inside a block, we have both declared
and defined "arr1" as a variable of type "array 8 of int" (to use
the "cdecl" program's syntax). Because we initialized the array,
we can omit the size, and have the compiler figure it out:

int arr2[] = { 1, 2, 3, 0 };

but now we get an "array 4 of int", because we only used four
initializers.
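
To see the two sizes, one can ask sizeof for the element counts. This is
only a sketch; count_arr1 and count_arr2 are names invented here:

```c
#include <stddef.h>

int arr1[8] = { 1, 2, 3, 0 };   /* remaining four elements are zero */
int arr2[]  = { 1, 2, 3, 0 };   /* size taken from the initializer list */

/* sizeof sees the whole array, so dividing by one element gives N. */
size_t count_arr1(void) { return sizeof arr1 / sizeof arr1[0]; }
size_t count_arr2(void) { return sizeof arr2 / sizeof arr2[0]; }
```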

On the other hand, we have a peculiar feature of the C language in
which function parameters that *look like* arrays are actually
declared as pointers. If we write:

void somefunc(char s[]) {
/* code */
}

the compiler is obligated to pretend that we actually wrote:

void somefunc(char *s) {
/* code */
}

That is, the local-variable "s" within the function somefunc() has
type "pointer to char", rather than "array MISSING_SIZE of char".
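
One way to convince yourself, sketched with hypothetical names
(length_via_param, length_demo): the parameter below is declared with
array syntax, yet it can be incremented, which is something only a
pointer allows.

```c
#include <stddef.h>

/* Declared as "char s[]", but s really has type char *. */
size_t length_via_param(char s[])
{
    char *start = s;
    while (*s != '\0')
        s++;                        /* legal only because s is a pointer */
    return (size_t)(s - start);
}

size_t length_demo(void)
{
    char buf[32] = "hello";
    return length_via_param(buf);   /* buf is passed as &buf[0] */
}
```

Trying the same s++ on buf inside length_demo() would not compile,
because buf is a true array.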

The reason for this peculiar feature has to do with what I call
"The Rule" about arrays and pointers in C, combined with the fact
that C passes arguments by value. For (much) more about The Rule,
see <http://web.torek.net/torek/c/pa.html>.

Except for some new features in C99 intended for optimization,
there is never any reason you *have* to use the array notation to
declare formal parameter names in function definitions, and I
encourage programmers to use the pointer notation, so that the
declaration is not misleading: since "s" inside somefunc() has type
"char *", we should all declare it as "char *" in the first place.

Ever since the C89 standard came out, something peculiar happens
if we write:

int arr3[];

outside a function. This is a "tentative definition" of the
array "arr3", and if we reach the end of a translation unit (roughly,
"C source file") without coming across any more details for arr3[],
it acts as if we had written:

int arr3[1] = { 0 };

On the other hand, though, if we try to use empty square brackets
*inside* a function (not as a parameter but inside the {}s):

void wrong(void) {
int arr4[]; /* ERROR */
/* more stuff */
}

we have done something wrong. Empty square brackets are not allowed
here.

Finally, C99 has something called a "flexible array member" of
structures, which we can ignore for now, but does give you one more
place where you can write empty square brackets and have it mean
something special.

All of these are just things you have to memorize -- quirks about
C that "are just the way they are": not for any particular reason
other than that Dennis Ritchie and/or the C standards folks said
so. They all make it a little more tricky to talk about arrays in
C.
And I realize that *var[] is an array of pointers where each
pointer can point to the beginning of a string (or whatever).

If it is indeed an array at all -- for instance, if we write:

char *arr5[100];

either outside or inside a function (not as a parameter to a
function), then arr5 has type "array 100 of pointer to char", and
each of those 100 "pointer to char"s can point to the first of a
sequence of "char"s making up a C string.

Again, "it does not"...

In the declaration 'char var[];' var, when used by itself is just a
pointer to the first element ...

I think it is better to say that it *becomes* a pointer to the
first element of the array.

This is The Rule about arrays and pointers in C:

In a value context, an object of type "array N of T" becomes
a value of type "pointer to T", pointing to the first element
of that array, i.e., the one with subscript zero.

The compiler has to *produce* this pointer, often using a single
machine instruction. The array itself is a C object -- something
occupying memory, and (we can hope) holding some useful values --
but the pointer the C compiler comes up with is a mere "value"
(an "rvalue", in typical computer-science lingo).

The Rule is yet another arbitrary rule, something else you have to
memorize about C. It is *so* important, and used so often, though,
that it is not "just" another rule, it is *The* Rule: The Rule
about arrays and pointers in C. Memorize it, work with it, until
it seems natural, and then all this pointer stuff in C will start
to make sense.

Note that The Rule applies only to *objects* in *value contexts*.
You have to be able to distinguish between objects and values, and
spot the contexts, but this is pretty easy if you have done any C
programming, or even much programming in other languages. If you
have statements like:

a = 17;
b = a + 25;

you know that "a" gets set to 17, and "b" gets set to 42. But how
is it that we *set* "a" on the first line, then *get* its value to
add 25 to it on the second line? The answer is "object" vs "value"
contexts. The "a =" part means "set a" -- set the object. The
"17" part just means "the value 17". The "b =" part means we will
set "b" (the object), and the "a + 25" part means we will fetch
the value in "a" and add 25.

Most of these contexts are obvious -- the left side of an "="
operator is an "object context", and the right side is a "value
context". C has a lot of operators, though, and there are two
important ones that have "object context": the unary-& operator,
which takes the address of an object, and the sizeof operator,
which produces the size (in "C bytes") of an object.

Most of the other operators have value context, and if you name an
object, such as an ordinary variable, you get the object's value.
For ordinary "int"s and "double"s and such, the value of the object
is whatever value you last stored in the variable. For arrays,
the value is that produced by The Rule.

You can either use this value right away -- printing it out, or
applying some operator to it, for instance -- or you can store it
in an object. Consider what happens if we choose to store the
value the compiler produces when we apply The Rule to "arr5".

Remember that "arr5" has type "array 100 of pointer to char":
char *arr5[100];
and that The Rule says:
In a value context,
(yep, got one of those)
an object of type "array N
(check -- that is what we have; N here is 100)
of T"
(and T is "pointer to char")
becomes a value of type "pointer to T",
(pointer to pointer to char)
pointing to the first element of the array, i.e., the one
with subscript zero.

So if we want to store this value in an object, we need one of
type "pointer to pointer to char", or "char **":

char **holder;

holder = arr5;

(Note that the array's size -- the constant named N, 100 in this
case -- gets thrown away. We are allowed to ignore it when working
with The Rule. It is a darn good idea to save it away somewhere
else, though, because if the array has 100 elements, we had better
not write over arr5[231], which does not exist. The Rule tosses
the constant, so in practical code, *we* have to save it -- the
language threw it away, but it really does matter. For now, we
will ignore it, and perhaps cross fingers, toes, and/or eyes and
hope we do not use an out-of-bounds array subscript later. Or
maybe we will occasionally check, remembering the size is 100.)

Now "holder" stores the value produced by The Rule: a value of
type "pointer to pointer to char", pointing to the first element
of arr5 -- &arr5[0], in other words.

This is where things get interesting. Suppose we now want to use
the value with the subscript operators, or with ordinary pointer
arithmetic. It may be time to remember that subscripts are in fact
defined in terms of pointer arithmetic, and the unary "*"
pointer-following operator:

a[i]

"means":

*((a) + (i))

where the addition uses pointer arithmetic. We fetch the values
of the two operands -- the array "a" and the index "i" -- and add
them, then we use pointer-follower-"*" to find the object in that
slot in the array.

But wait! I just said "we use the value of the array"! There it
goes again: The Rule tells us how to find the "value" of an array
object in a value context. If "a" is an array, The Rule says that
we find its value by dropping the constant N, and then taking the
address of its first element.

This is exactly what we did when we stuck the "value" of arr5 into
the variable named "holder"! If "holder" holds the value that
gets produced by The Rule, what difference is there between:

arr5[i] /* i.e., *((arr5) + (i)) */

and:

holder[i] /* i.e., *((holder) + (i)) */

? The answer is: there is no significant difference at all --
"arr5" has The Rule applied, producing a value, but "holder" is
used in a value context, pulling the *same* value out of it. The
only real difference is in the machine instruction(s) used to create
the value the first time (by applying The Rule), or to pull the
value out of the "holder" variable.
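
A sketch of that equivalence, reusing the arr5 and holder names from the
text (same_element_demo and the three sample strings are invented for
the example):

```c
/* Returns 1 when subscripting the array and the pointer agree. */
int same_element_demo(void)
{
    char *arr5[100] = { "zero", "one", "two" };   /* the rest are null */
    char **holder = arr5;                         /* The Rule: &arr5[0] */
    int i = 2;

    return holder == &arr5[0]
        && &holder[i] == &arr5[i]       /* both name the same object */
        && *(holder + i) == arr5[i]     /* a subscript is *(p + i)   */
        && holder[i][0] == 't';         /* first char of "two"       */
}
```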

Of course, if we use *different* operators, we can get different
results. In particular, the sizeof operator has "object context"
instead of "value context":

size1 = sizeof arr5;
size2 = sizeof holder;

will set size1 to some fairly large number (like 200, 400, or 800)
and size2 to some small number (like 2, 4, or 8) on today's machines;
so "arr5" and "holder" really are very different. Their *values*
are the same, though, due to The Rule, and to us setting "holder"
to the value The Rule creates for "arr5".
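
The difference can be checked without assuming any particular pointer
size. In this sketch (sizeof_demo is a hypothetical name) the sizes are
compared against sizeof(char *) instead of literal numbers:

```c
#include <stddef.h>

/* Returns 1 if sizeof sees the whole array but only a single pointer. */
int sizeof_demo(void)
{
    char *arr5[100];
    char **holder = arr5;
    size_t size1 = sizeof arr5;     /* 100 * sizeof(char *) */
    size_t size2 = sizeof holder;   /* sizeof(char **)      */

    return size1 == 100 * sizeof(char *)
        && size2 == sizeof(char **)
        && size1 > size2
        && sizeof arr5 / sizeof arr5[0] == 100;   /* N is still known here */
}
```

The last line is the usual way to recover N while the name still refers
to the array itself; the same expression applied to holder would not
give 100.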

In short: arrays are not pointers; but, due to The Rule, the "value"
of an array *is* a pointer, so a pointer is "just as good" as the
actual array, if you only want the value. (But the pointer *MUST*
be set to the right value first! Arrays are collections of lots
of objects -- N of them, where N is the size in the array definition
-- while a pointer is just *one* object. To use it like an array,
you have to point it at enough memory to hold all N objects.)

[skipping a bunch of stuff]
any string that's in quotes in a C program gets stored in a read-only
part of your program when it's running ...

Well, maybe: it *may* be read-only, and as a programmer, you are
expected not to write on it. If you *do* write on it, the language
makes no promises; things can go very wrong. So you should treat
it as if it is read-only, even if it happens not to be.

Also, this is not true for "any" string, just "most" of them (as
you noted later, in something I snipped later).
so the line:

char *str = "Confused";

gets copied into the read-only memory as soon as your program sees it,
then assigns that address to str, which is why you can't change strings
like that.

More precisely, string literals -- those things inside double quotes
-- are a shorthand for creating anonymous arrays of "char". These
arrays may be stored in read-only areas (and high-quality C compilers
should strive to make sure they are). There is an exception for
string literals used to initialize arrays of "char"s, so that:

char buf[] = "hello";

is not required to create one of those anonymous arrays (you have
a perfectly good, non-anonymous, array named "buf"; why bother with
two copies of the character sequence 'h' 'e' 'l' 'l' 'o' '\0'?).
But other cases do create the anonymous arrays.

Logically speaking, these anonymous arrays *should* have type
"const char [N]":

const char __internal_string_00000[9] =
{ 'C', 'o', 'n', 'f', 'u', 's', 'e', 'd', '\0' };

because they are supposed to be read-only; but for historical
compatibility with C from the 1970s and 1980s, the "const" is
left off of the type. The string literal "Confused" thus has
type "array 9 of char" -- 9 because there are nine bytes inside
it, counting the terminating '\0' -- instead of "array 9 of
const char".

Since this is an array, it -- like all arrays -- is once again
subject to The Rule. You can write:

str = "Confused"; /* just like: str = __internal_string_00000; */

to make str point to the uppercase 'C' -- the first element
of the array.

Every string literal (except those that initialize arrays of char)
generates, inside the compiler, another one of these
"__internal_string_01234[]" style arrays, and every one of these
arrays is another candidate for The Rule. (Identical string literals
may or may not reuse an already-internally-generated array -- this
part is up to the compiler. Note also that "Hel-LO, World", and
"O, World" could actually share a single array, if the compiler is
sufficiently clever/sneaky, because they both *end* with the same
sequence.)
when you call printf() with that str pointer as one of the
args, it simply goes to that location in read-only memory and reads it.

Indeed, every time you pass a string literal to printf() for the
format argument:

printf("str is `%s'\n", str);

you create another one of these anonymous arrays and apply The
Rule. (See how often The Rule gets used? Almost every printf()
in every C program has at least one occurrence.)

Note that the anonymous array *is* still an array, not a pointer;
if you apply one of the "object context" operators to it, it stays
an array. In particular, if you apply the sizeof operator, you
should get the size of the array, *not* the size of a pointer:

int sz = sizeof "the anonymous array for this string literal";

*must* set sz to 44 (if I counted right) -- 43 characters plus the
terminating '\0'. Likewise, we can even do weird things like:

char (*p1)[44] = &"the anonymous array for this string literal";

The "&" operator produces the address of the anonymous array,
just as if we had written out a named array:

char some_name[44] = "...";
char (*p2)[44] = &some_name;

except that the entire array to which "p1" points is read-only,
despite not being const-qualified. (We could const-qualify the
some_name array and p2, and even const-qualify p1 except for
some brokenness in C's type system. This brokenness is what you
get when a committee designs the thing. :) )

(Some C compilers have gotten string literals wrong, historically,
producing pointers instead of arrays. There are only two ways to
tell in legitimate C code, using sizeof and unary "&". That is,
only the tricks shown here with "sz" and "p1" will expose the
difference, because The Rule is so darn ubiquitous.)
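
Both tricks can be put in one small function. This is a sketch only;
literal_is_array_demo is a made-up name, and the checks assume a
conforming C compiler:

```c
#include <stddef.h>

/* Returns 1 if string literals behave as arrays under sizeof and &. */
int literal_is_array_demo(void)
{
    size_t sz = sizeof "the anonymous array for this string literal";
    char (*p1)[44] = &"the anonymous array for this string literal";
    const char *s = "Confused";     /* The Rule: points at the 'C' */

    return sz == 44                 /* 43 characters plus '\0'     */
        && sizeof *p1 == 44         /* p1 points at a whole array  */
        && (*p1)[0] == 't'
        && sizeof "Confused" == 9
        && s[8] == '\0';
}
```

The function only reads through p1 and s; writing through either would
be modifying a string literal, which the language does not permit.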
 
jab3

Chris Torek graciously wrote on Sunday 05 December 2004 05:12 pm:
The short answer is, "it does not".

So I hear :).
More precisely, "char **var" declares "var" as a variable of *type*
"pointer to pointer to char". Whether "var" actually points to
anything at all (much less "anything useful") is up to you, the
programmer.

Is "var" an identifier, an object, an lvalue, or a variable? :) Seriously,
it could also be a value in certain contexts I see, but what is the
situation with lvalue and variable and identifier and object? I see in
K&R2 that an object is a "named region of storage," and an lvalue is an
"expression referring to an object." (197; I don't have the C Standard)
Then it says an identifier is a sequence of letters and digits. (192) Then
in your C for Smarties (which is good BTW; I'll have to digest it some
more), at first you say lvalues and objects are the same (the former being
ISO's term and the latter yours, sort of :)), but then you clarify it by
saying that an lvalue names an object, which is how I see K&R2. Then you
say variables are the best examples of objects. Is that the name or the
location/storage? And where do identifiers fit in? :) Am I right to think
of objects, strictly speaking, as the hardware location of 'stuff'? And
the lvalue is the name I've given that 'stuff,' for instance char stf[] =
"blah". 'stf' is the lvalue, and its location in memory is the object? I
think the more I type the more I confuse myself :).
And I realize that with var[], var is actually a memory address (or at
least as it is represented by C, IIRC (an internal copy which is a fixed
pointer)) pointing (permanently) to the first element of an array.

This is also wrong, or at least, not quite right. :)

That IIRC comment was an attempt at remembering your article about "The
Rule" I read a couple of months ago, but I didn't have the
conceptual...framework to process it (I didn't read the previous 3 articles
about types and objects and values and contexts, etc. then) Oh well.
There are some "gotchas" with array declarations that do not occur
with pointers, so we have to start adding more context. If we write,
for instance:

int arr1[8] = { 1, 2, 3, 0 };

outside of a function, or inside a block, we have both declared
and defined "arr1" as a variable of type "array 8 of int" (to use
the "cdecl" program's syntax). Because we initialized the array,
we can omit the size, and have the compiler figure it out:

int arr2[] = { 1, 2, 3, 0 };

but now we get an "array 4 of int", because we only used four
initializers.

On the other hand, we have a peculiar feature of the C language in
which function parameters that *look like* arrays are actually
declared as pointers. If we write:

void somefunc(char s[]) {
/* code */
}

the compiler is obligated to pretend that we actually wrote:

void somefunc(char *s) {
/* code */
}

That is, the local-variable "s" within the function somefunc() has
type "pointer to char", rather than "array MISSING_SIZE of char".

The reason for this peculiar feature has to do with what I call
"The Rule" about arrays and pointers in C, combined with the fact
that C passes arguments by value.

Ahh...That's why "The Rule" takes effect. The argument is in a value
context, and C stipulates that the value of an array is a pointer to its
first element, so "The Rule" happens (close?).
Except for some new features in C99 intended for optimization,
there is never any reason you *have* to use the array notation to
declare formal parameter names in function definitions, and I
encourage programmers to use the pointer notation, so that the
declaration is not misleading: since "s" inside somefunc() has type
"char *", we should all declare it as "char *" in the first place.

Ah, I see. (At least I think I do. Right now. Tonight :))
Ever since the C89 standard came out, something peculiar happens
if we write:

int arr3[];

outside a function. This is a "tentative definition" of the
array "arr3", and if we reach the end of a translation unit (roughly,
"C source file") without coming across any more details for arr3[],
it acts as if we had written:

int arr3[1] = { 0 };

On the other hand, though, if we try to use empty square brackets
*inside* a function (not as a parameter but inside the {}s):

void wrong(void) {
int arr4[]; /* ERROR */
/* more stuff */
}

we have done something wrong. Empty square brackets are not allowed
here.

(BTW, what _is_ a translation unit? I see it used in the K&R2 Appendix A,
and I see it here, but I couldn't find that K&R2 defined what it meant.
They just say "a program consists of one or more _translation units_ stored
in files." (191) Granted, I haven't made it through the book yet. Just
skipped to that Appendix :))

So why is it wrong to declare an 'incomplete' type inside a function?
Finally, C99 has something called a "flexible array member" of
structures, which we can ignore for now, but does give you one more
place where you can write empty square brackets and have it mean
something special.

All of these are just things you have to memorize -- quirks about
C that "are just the way they are": not for any particular reason
other than that Dennis Ritchie and/or the C standards folks said
so. They all make it a little more tricky to talk about arrays in
C.
And I realize that *var[] is an array of pointers where each
pointer can point to the beginning of a string (or whatever).

If it is indeed an array at all -- for instance, if we write:

char *arr5[100];

either outside or inside a function (not as a parameter to a
function), then arr5 has type "array 100 of pointer to char", and
each of those 100 "pointer to char"s can point to the first of a
sequence of "char"s making up a C string.

Umm...I think that's what I meant :).

Again, "it does not"...

In the declaration 'char var[];' var, when used by itself is just a
pointer to the first element ...

I think it is better to say that it *becomes* a pointer to the
first element of the array.

Yeah, that's what I didn't understand before this reply and further reading
on your site. I had forgotten that "The Rule" is something that happens in
certain situations; not something that is persistent. Right? I mean,
let's say a function is called with a parameter of (char *str) but the
argument passed is "char a_str[20]". So inside of the function, a_str
'becomes' a pointer to char, the first element specifically. So then when
the function is over, is the pointer destroyed?
This is The Rule about arrays and pointers in C:

In a value context, an object of type "array N of T" becomes
a value of type "pointer to T", pointing to the first element
of that array, i.e., the one with subscript zero.

The compiler has to *produce* this pointer, often using a single
machine instruction. The array itself is a C object -- something
occupying memory, and (we can hope) holding some useful values --
but the pointer the C compiler comes up with is a mere "value"
(an "rvalue", in typical computer-science lingo).

The Rule is yet another arbitrary rule, something else you have to
memorize about C. It is *so* important, and used so often, though,
that it is not "just" another rule, it is *The* Rule: The Rule
about arrays and pointers in C. Memorize it, work with it, until
it seems natural, and then all this pointer stuff in C will start
to make sense.

Note that The Rule applies only to *objects* in *value contexts*.
You have to be able to distinguish between objects and values, and
spot the contexts, but this is pretty easy if you have done any C
programming, or even much programming in other languages. If you
have statements like:

a = 17;
b = a + 25;

you know that "a" gets set to 17, and "b" gets set to 42. But how
is it that we *set* "a" on the first line, then *get* its value to
add 25 to it on the second line? The answer is "object" vs "value"
contexts. The "a =" part means "set a" -- set the object. The
"17" part just means "the value 17". The "b =" part means we will
set "b" (the object), and the "a + 25" part means we will fetch
the value in "a" and add 25.

Most of these contexts are obvious -- the left side of an "="
operator is an "object context", and the right side is a "value
context". C has a lot of operators, though, and there are two
important ones that have "object context": the unary-& operator,
which takes the address of an object, and the sizeof operator,
which produces the size (in "C bytes") of an object.

Most of the other operators have value context, and if you name an
object, such as an ordinary variable, you get the object's value.
For ordinary "int"s and "double"s and such, the value of the object
is whatever value you last stored in the variable. For arrays,
the value is that produced by The Rule.
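
A minimal sketch of the two contexts, assuming nothing beyond standard C. The function name context_demo and the pointer pa are made up for illustration; they do not appear in the posts above.

```c
#include <assert.h>

/* Hypothetical helper: returns 1 if the contexts behave as described. */
int context_demo(void)
{
    int a;
    int b;
    int *pa;

    a = 17;        /* object context: set the object "a" */
    b = a + 25;    /* value context: fetch 17 from "a", then add 25 */

    pa = &a;       /* unary & wants the object itself, not the value 17 */
    *pa = 1;       /* writes into the object that "a" names */

    /* sizeof also looks at the object (its type), not the stored value */
    assert(sizeof a == sizeof(int));

    return b == 42 && a == 1;
}
```

Calling context_demo() from main() should return 1: "b" picked up the value 17 + 25 before the object "a" was overwritten through the pointer.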

This reminds me of scalar and list context in Perl. Sort of. :) Not as far
as what each context means, but just the different contexts and how a
'variable' behaves/is treated differently based on how it is being used. I
can get that, for the most part; I'm sure there are tricky ones. But that
still doesn't clarify my confusion over objects, lvalues, identifiers, and
variables.

For instance, what is an example of an object that is not named? The
pointer produced by "The Rule?"
You can either use this value right away -- printing it out, or
applying some operator to it, for instance -- or you can store it
in an object. Consider what happens if we choose to store the
value the compiler produces when we apply The Rule to "arr5".

Remember that "arr5" has type "array 100 of pointer to char":
char *arr5[100];
and that The Rule says:
In a value context,
(yep, got one of those)
an object of type "array N
(check -- that is what we have; N here is 100)
of T"
(and T is "pointer to char")
becomes a value of type "pointer to T",
(pointer to pointer to char)
pointing to the first element of the array, i.e., the one
with subscript zero.

So if we want to store this value in an object, we need one of
type "pointer to pointer to char", or "char **":

char **holder;

holder = arr5;

(Note that the array's size -- the constant named N, 100 in this
case -- gets thrown away. We are allowed to ignore it when working
with The Rule. It is a darn good idea to save it away somewhere
else, though, because if the array has 100 elements, we had better
not write over arr5[231], which does not exist. The Rule tosses
the constant, so in practical code, *we* have to save it -- the
language threw it away, but it really does matter. For now, we
will ignore it, and perhaps cross fingers, toes, and/or eyes and
hope we do not use an out-of-bounds array subscript later. Or
maybe we will occasionally check, remembering the size is 100.)

Now "holder" stores the value produced by The Rule: a value of
type "pointer to pointer to char", pointing to the first element
of arr5 -- &arr5[0], in other words.
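
As a rough sketch of that assignment, here is one way to keep the size around while the array is still in view. The names ARR5_SIZE and holder_demo are assumptions added for illustration only.

```c
#include <assert.h>
#include <stddef.h>

#define ARR5_SIZE 100   /* hypothetical name for the constant N */

/* Hypothetical helper: returns 1 if holder points at arr5's first element
   and the element count was recovered correctly. */
int holder_demo(void)
{
    char *arr5[ARR5_SIZE] = {0};   /* array 100 of pointer to char */
    char **holder;
    size_t n;

    holder = arr5;   /* The Rule: same value as &arr5[0] */

    /* Save N while we still have the array; holder alone cannot tell us. */
    n = sizeof arr5 / sizeof arr5[0];
    assert(n == ARR5_SIZE);

    return holder == &arr5[0] && n == ARR5_SIZE;
}
```

The sizeof division only works on the array itself; once only "holder" is available, the count has to be passed along separately.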

This is where things get interesting. Suppose we now want to use
the value with the subscript operators, or with ordinary pointer
arithmetic. It may be time to remember that subscripts are in fact
defined in terms of pointer arithmetic, and the unary "*"
pointer-following operator:

a[i]

"means":

*((a) + (i))

where the addition uses pointer arithmetic. We fetch the values
of the two operands -- the array "a" and the index "i" -- and add
them, then we use pointer-follower-"*" to find the object in that
slot in the array.

But wait! I just said "we use the value of the array"! There it
goes again: The Rule tells us how to find the "value" of an array
object in a value context. If "a" is an array, The Rule says that
we find its value by dropping the constant N, and then taking the
address of its first element.

This is exactly what we did when we stuck the "value" of arr5 into
the variable named "holder"! If "holder" holds the value that
gets produced by The Rule, what difference is there between:

arr5[i] /* i.e., *((arr5) + (i)) */

and:

holder[i] /* i.e., *((holder) + (i)) */

? The answer is: there is no significant difference at all --
"arr5" has The Rule applied, producing a value, but "holder" is
used in a value context, pulling the *same* value out of it. The
only real difference is in the machine instruction(s) used to create
the value the first time (by applying The Rule), or to pull the
value out of the "holder" variable.
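
One way to check this claim, as a small sketch: subscript through the array and through the pointer, and compare the addresses reached. The function name subscript_demo is hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: returns 1 if arr5[i] and holder[i] always name
   the same object, and subscripting matches pointer arithmetic. */
int subscript_demo(void)
{
    char *arr5[100] = {0};
    char **holder = arr5;      /* holds the value The Rule produces */
    int i;

    arr5[3] = "three";         /* store through the array name */

    for (i = 0; i < 100; i++) {
        if (&arr5[i] != &holder[i])   /* same slot either way */
            return 0;
        if (&arr5[i] != arr5 + i)     /* a[i] is *((a) + (i)) */
            return 0;
    }

    /* Read back through the pointer, with both notations. */
    return holder[3] == arr5[3] && *(holder + 3) == arr5[3];
}
```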

Of course, if we use *different* operators, we can get different
results. In particular, the sizeof operator has "object context"
instead of "value context":

size1 = sizeof arr5;
size2 = sizeof holder;

will set size1 to some fairly large number (like 200, 400, or 800)
and size2 to some small number (like 2, 4, or 8) on today's machines;
so "arr5" and "holder" really are very different. Their *values*
are the same, though, due to The Rule, and to us setting "holder"
to the value The Rule creates for "arr5".
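
A short sketch of the sizeof difference, without assuming any particular pointer size. The function name sizeof_demo is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if sizeof sees the whole array for arr5
   but only a single pointer for holder. */
int sizeof_demo(void)
{
    char *arr5[100];
    char **holder = arr5;
    size_t size1 = sizeof arr5;     /* object context: The Rule not applied */
    size_t size2 = sizeof holder;   /* size of one pointer object */

    assert(size1 == 100 * sizeof(char *));
    assert(size2 == sizeof(char **));

    return size1 > size2;
}
```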

In short: arrays are not pointers; but, due to The Rule, the "value"
of an array *is* a pointer, so a pointer is "just as good" as the
actual array, if you only want the value. (But the pointer *MUST*
be set to the right value first! Arrays are collections of lots
of objects -- N of them, where N is the size in the array definition
-- while a pointer is just *one* object. To use it like an array,
you have to point it at enough memory to hold all N objects.)
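
As a sketch of "point it at enough memory", here malloc() supplies room for n pointers before the pointer is used like an array. The function name pointer_as_array_demo and its parameter n are assumptions for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: returns 1 on success, 0 if allocation fails. */
int pointer_as_array_demo(size_t n)
{
    char **holder;
    size_t i;
    int ok = 1;

    /* A pointer is one object; give it n objects to point at. */
    holder = malloc(n * sizeof *holder);
    if (holder == NULL)
        return 0;

    for (i = 0; i < n; i++)     /* now holder[i] is valid for i < n */
        holder[i] = NULL;

    for (i = 0; i < n; i++)
        if (holder[i] != NULL)
            ok = 0;

    free(holder);
    return ok;
}
```

Note that n has to travel with the pointer; nothing in "holder" itself records how many objects it points at.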


Everything between this and my last comment I'll have to read some more and
think about some more. I'm getting it, but you know. (It's getting
late....for me; work comes early at 6:15am) But anyway, that was a lot of
good stuff. :) But I may have questions about it later :). If you're
still paying attention by then.

[I probably should have snipped some of the above, but I didn't know if you
wanted to refer to any of it for whatever reason, so I figured it'd be
easier if I just left it]
[skipping a bunch of stuff]
any string that's in quotes in a C program gets stored in a read-only
part of your program when it's running ...

Well, maybe: it *may* be read-only, and as a programmer, you are
expected not to write on it. If you *do* write on it, the language
makes no promises; things can go very wrong. So you should treat
it as if it is read-only, even if it happens not to be.

Also, this is not true for "any" string, just "most" of them (as
you noted later, in something I snipped later).
so the line:

char *str = "Confused";

gets copied into the read-only memory as soon as your program sees it,
then assigns that address to str, which is why you can't change strings
like that.

More precisely, string literals -- those things inside double quotes
-- are a shorthand for creating anonymous arrays of "char". These
arrays may be stored in read-only areas (and high-quality C compilers
should strive to make sure they are). There is an exception for
string literals used to initialize arrays of "char"s, so that:

char buf[] = "hello";

is not required to create one of those anonymous arrays (you have
a perfectly good, non-anonymous, array named "buf"; why bother with
two copies of the character sequence 'h' 'e' 'l' 'l' 'o' '\0'?).
But other cases do create the anonymous arrays.

Logically speaking, these anonymous arrays *should* have type
"const char [N]":

const char __internal_string_00000[9] =
{ 'C', 'o', 'n', 'f', 'u', 's', 'e', 'd', '\0' };

because they are supposed to be read-only; but for historical
compatibility with C from the 1970s and 1980s, the "const" is
left off of the type. The string literal "Confused" thus has
type "array 9 of char" -- 9 because there are nine bytes inside
it, counting the terminating '\0' -- instead of "array 9 of
const char".

Since this is an array, it -- like all arrays -- is once again
subject to The Rule. You can write:

str = "Confused"; /* just like: str = __internal_string_00000; */

to make str point to the uppercase 'C' -- the first element
of the array.

Every string literal (except those that initialize arrays of char)
generates, inside the compiler, another one of these
"__internal_string_01234[]" style arrays, and every one of these
arrays is another candidate for The Rule.
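
A small sketch of the two cases. The "const" on str is an addition here, made on purpose so the compiler rejects writes through it; the function name literal_demo is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the literal and the initialized
   array behave as described above. */
int literal_demo(void)
{
    const char *str = "Confused";   /* points at the anonymous array's 'C' */
    char buf[] = "hello";           /* our own array of 6 chars, no anonymous copy needed */

    buf[0] = 'j';                   /* fine: buf is an ordinary modifiable array */

    return sizeof "Confused" == 9   /* sizeof: object context, array 9 of char */
        && sizeof buf == 6          /* five letters plus the '\0' */
        && str[0] == 'C'
        && strcmp(buf, "jello") == 0;
}
```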

Are these candidates for objects that aren't 'named'? (For instance like
the 'int sz = sizeof "This is a string"' below, that I snipped) If so,
what about:

const char buf[] = "This is a string literal";

Is "This is a string literal" an object? What about buf? Isn't that an
lvalue referring to the object, i.e. naming it? Is there an internal
representation for "This is a string literal" (internal name) *and* my own
name (lvalue) for it?

What about 'int i = 15'? Is 15 an object and i an lvalue? :)

[snipped more good stuff]



Thanks for any help, and patience -
jab3
 

Chris Torek

I do not have time to answer all of this now, but I will put in two
short answers... (well, short-ish; they got longer than I expected.)

[I wrote]
Is "var" an identifier, an object, an lvalue, or a variable? :)

All of them, in fact.

The name -- the three-letter sequence v, a, r -- is an identifier.
(This is a syntactic element, i.e., something the compiler uses to
figure out what you wrote. Each token is a syntactic element of
some sort; some tokens are identifiers, like the keyword "char",
some are single character thingies like the '*'; some are two-character
thingies like an && operator. This particular syntactic element
is an identifier.)

The compiler must look up the identifier to see how it is declared
and/or defined. If it is defined as, for instance, a typedef-name
-- such as the ST_TYPE in:

typedef struct st ST_TYPE;

-- then it would be an identifier, but not a variable or lvalue.
But here, it has now been declared (and also defined, eventually)
as a variable:

char **var;

so it is a variable. Identifiers have a bunch of properties, such
as scopes and name-spaces, and a single identifier can actually
have multiple meanings, as in the (really awful) code:

#include <stdio.h>

void x(void) {
    int x;
    goto x;
y:
    x += 17;
    printf("the answer is %d\n", x);
    return;
x:
    x = 25;
    goto y;
}

Here the single identifier "x" has three different meanings: it is
the name of the function x(), it is the name of a variable of type
int also called x, and it is a goto-label just like "y". (Yuck!)

C99 has kind of mucked up the word "lvalue", which was pretty well
defined in C89; but it is safe to say that all ordinary variables
are lvalues. Even array variables are still lvalues, except that,
confusingly enough, they are "non-modifiable" lvalues. (The term
lvalue dates back to compiler guys saying "the thing on the left
of an assignment", so if you cannot put an array on the left of an
assignment -- because the array is not modifiable -- then why call
it an lvalue at all? Probably it was a bad idea, just like us
USAliens using the word "gas" to refer to both petrol and methane.
But, as Kurt Vonnegut wrote, so it goes.)
... but what is the
situation with lvalue and variable and identifier and object?

The kind of problem we want to solve, by using different words like
"lvalue" and "identifier" and "object", is to be able to talk about
what *p or p[i] means when p has a value from malloc():

char *p;

p = malloc(len + 1);
if (p == NULL) ... handle error ...
strcpy(p, str);

The strcpy() writes on various p[i]'s, e.g., setting p[0] to 'h'
and p[1] to 'e' and so on to put "hello world" into it. These
p[i]'s must be storage, but it is, at least in how we can talk
about it, *different* from that for, e.g.:

char buf[100];
p = &buf[0];
strcpy(p, str);

because in this second case we know that p[0] is the same thing as
buf[0], and so on. When the memory comes from malloc(), p[0] has
no other name like buf[0] -- but it is still memory; it can still
hold values. I call p[0] an object (and so do both C89 and C99).
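
Both cases can be put side by side in a rough sketch. The function name unnamed_object_demo and the length check against buf are additions for illustration, not part of the fragments above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns 1 if both copies succeed, 0 otherwise. */
int unnamed_object_demo(const char *str)
{
    char buf[100];
    char *p;
    size_t len = strlen(str);
    int ok;

    if (len >= sizeof buf)      /* refuse strings that would overflow buf */
        return 0;

    /* Case 1: p[0] is the same object as buf[0], which has a name. */
    p = &buf[0];
    strcpy(p, str);
    ok = (&p[0] == &buf[0]) && strcmp(buf, str) == 0;

    /* Case 2: p[0] is an object with no name of its own. */
    p = malloc(len + 1);
    if (p == NULL)
        return 0;
    strcpy(p, str);
    ok = ok && strcmp(p, str) == 0;

    free(p);
    return ok;
}
```

In the second case the only way to reach the storage is through the pointer value, which is why a word like "object" is needed at all.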
What about 'int i = 15'? Is 15 an object and i an lvalue? :)

15 is not an object, it is just a value. Objects hold values (or
hold garbage); values are the things you stick into objects. The
name "i" is an identifier that, in this case, names the object;
the C standards (both C89 and C99) say it is indeed an lvalue.
 
