char **argv & char *argv[]

Discussion in 'C Programming' started by jab3, Dec 4, 2004.

  1. jab3

    jab3 Guest

    (again :))

    Hello everyone.

    I'll ask this even at risk of being accused of not researching adequately.
    My question (before longer reasoning) is: How does declaring (or defining,
    whatever) a variable **var make it an array of pointers?

    I realize that 'char **var' is a pointer to a pointer of type char (I hope).
    And I realize that with var[], var is actually a memory address (or at
    least as it is represented by C, IIRC (an internal copy which is a fixed
    pointer)) pointing (permanently) to the first element of an array. And I
    realize that *var[] is an array of pointers where each pointer can point to
    the beginning of a string (or whatever). But then there is **var. How
    does that then become an array of pointers?

    Hmmm. It's coming to me. Wait. So we declare 'char **var'. *var
    is/contains a memory address of size char, which could point to the
    beginning of a string (**var). (?) *var+1 would be the next char memory
    address, which could point to a string (*(*var+1)) (moving char bytes
    through memory (1)). That's my hangup. How is the **argv structure
    formed? Are the arguments added, then memory allocated for that many, then
    dividing them up across the argv variable? Because I've learned to accept
    that **argv points to a string, *(*argv+1) points to the next, etc. (Are
    the () necessary? Am I even right?) But how does it get that way? I feel
    like I'm almost to a satori experience with this aspect of pointers (which
    would be nice :)), but there's something holding me back (my mind maybe?).
    I think I just need to get a grasp of the mechanics behind the creation of
    argv. (Don't ask; whenever I'm studying pointers I get stuck on these
    issues and I can't stop thinking about how, so I become unable to wrap my
    head around it)

    Where does the program store the arguments before putting them in argv? Is
    there a buffer it puts each argument in, then copies it into argv? It's
    driving me crazy. (Similar to how passing a pointer to printf with %s
    (char *str = "Confused";printf("%s", str);) is the same as a string. How
    does it (the compiler, program, ??) know? I then figured when it receives
    the memory address, expecting a string, it dereferences the pointer,
    traversing it until it gets a '\0'? Close?) Although it actually just hit
    me that if I were to pass a normal string variable (char str[6] = "idiot")
    as 'printf("hello, %s", str)' then str is actually a pointer to the first
    element of str[6]. Ahhhh... :)

    I realize that perhaps the argv example is implementation specific and not
    topical. Perhaps you could imagine a similar situation, i.e. passing a
    **var in a function that is in fact an array of pointers. Is the **var
    construction often used without being an array of pointers? Also, why is
    it technically more accurate to define argv as **argv and not *argv[]?
    (according to a book I have, Linux Programming by Example).


    Please excuse the rambling. I know I'm not being very clear. There's a
    reason for that; hence the post :). Thanks for any help or guidance, and
    patience.

    -jab3
    jab3, Dec 4, 2004
    #1

  2. jab3

    Yan Guest

    (please those that see errors in my answers, point them out)

    jab3 wrote:
    > (again :))
    >
    > Hello everyone.
    >
    > I'll ask this even at risk of being accused of not researching adequately.
    > My question (before longer reasoning) is: How does declaring (or defining,
    > whatever) a variable **var make it an array of pointers?
    >
    > I realize that 'char **var' is a pointer to a pointer of type char (I hope).
    > And I realize that with var[], var is actually a memory address (or at
    > least as it is represented by C, IIRC (an internal copy which is a fixed
    > pointer)) pointing (permanently) to the first element of an array. And I
    > realize that *var[] is an array of pointers where each pointer can point to
    > the beginning of a string (or whatever). But then there is **var. How
    > does that then become an array of pointers?


    In the declaration 'char var[];' var, when used by itself is just a
    pointer to the first element, and when you access the first or second
    element your compiler even turns that var[0] into *(var+0) and var[1]
    into *(var+1), literally using the number you gave it as the offset. So
    you can think of var[] as being sort of equal to *var, so *var[] is sort
    of equal to **var. I don't know if I'm making much sense, but check out
    K&R's book; it explains this well.

    >
    > Hmmm. It's coming to me. Wait. So we declare 'char **var'. *var
    > is/contains a memory address of size char, which could point to the
    > beginning of a string (**var). (?) *var+1 would be the next char memory
    > address, which could point to a string (*(*var+1)) (moving char bytes
    > through memory (1)). That's my hangup. How is the **argv structure
    > formed? Are the arguments added, then memory allocated for that many, then
    > dividing them up across the argv variable? Because I've learned to accept
    > that **argv points to a string, *(*argv+1) points to the next, etc. (Are
    > the () necessary?


    yeah they are necessary because + is of lower precedence than *

    > Am I even right?) But how does it get that way? I feel
    > like I'm almost to a satori experience with this aspect of pointers (which
    > would be nice :)), but there's something holding me back (my mind maybe?).
    > I think I just need to get a grasp of the mechanics behind the creation of
    > argv. (Don't ask; whenever I'm studying pointers I get stuck on these
    > issues and I can't stop thinking about how, so I become unable to wrap my
    > head around it)


    really check out K&R's book and go to the chapter on pointers, it's
    really of great help

    >
    > Where does the program store the arguments before putting them in argv?


    When a process is created (at least under Unix), the operating system
    typically places your environment variables, your arguments, and the
    argument count at the top of the new process's stack before your program
    starts running. The startup code then picks up the count and the
    argument pointers from there and hands them to main() as argc and argv.
    All of that is done for you by the operating system and the C runtime.

    > Is
    > there a buffer it puts each argument in, then copies it into argv? It's
    > driving me crazy. (Similar to how passing a pointer to printf with %s
    > (char *str = "Confused";printf("%s", str);) is the same as a string.


    any string that's in quotes in a C program gets stored in a read-only
    part of your program when it's running, so the line:

    char *str = "Confused";

    gets copied into the read-only memory as soon as your program sees it,
    then assigns that address to str, which is why you can't change strings
    like that. when you call printf() with that str pointer as one of the
    args, it simply goes to that location in read-only memory and reads it.


    How
    > does it (the compiler, program, ??) know? I then figured when it receives
    > the memory address, expecting a string, it dereferences the pointer,
    > traversing it until it gets a '\0'? Close?


    yup that's how c does strings

    ) Although it actually just hit
    > me that if I were to pass a normal string variable (char str[6] = "idiot")


    now saying:

    char str[6] = "idiot";

    is different from what I said above, since in that statement you declare
    an array of chars of length 6 and initialize it with the string "idiot"
    (read: writable memory). That statement is equivalent to:

    char str[6] = { 'i', 'd', 'i', 'o', 't', '\0' };


    > as 'printf("hello, %s", str)' then str is actually a pointer to the first
    > element of str[6]. Ahhhh... :)


    So, as I said above, str by itself is just a pointer to the location of
    the first char, both for the constant string and for the array I
    mentioned at the start of this response, so to the printf call the two
    look pretty much the same.

    >
    > I realize that perhaps the argv example is implementation specific and not
    > topical. Perhaps you could imagine a similar situation, i.e. passing a
    > **var in a function that is in fact an array of pointers. Is the **var
    > construction often used without being an array of pointers?


    Also, why is
    > it technically more accurate to define argv as **argv and not *argv[]?
    > (according to a book I have, Linux Programming by Example).
    >


    It's more accurate because on your system argv is exactly that: a
    pointer to a pointer. The first dereference (*argv) gives the pointer
    to the first string (in argv's case, your program's name), and the
    next dereference (**argv) gives the first letter of that string.
    Note that *(*argv+1) is the second letter of the first string; the
    first letter of the second string is **(argv+1), i.e. *argv[1].

    >
    > Please excuse the rambling. I know I'm not being very clear. There's a
    > reason for that; hence the post :). Thanks for any help or guidance, and
    > patience.
    >
    > -jab3
    >
    Yan, Dec 4, 2004
    #2

  3. jab3

    CBFalconer Guest

    jab3 wrote:
    >
    > I'll ask this even at risk of being accused of not researching
    > adequately. My question (before longer reasoning) is: How does
    > declaring (or defining, whatever) a variable **var make it an
    > array of pointers?


    It doesn't. It makes it a variable holding a pointer to some other
    type of pointer. The confusion arises because this is exactly what
    you get when you pass an array of those pointers to a function. A
    passed array is represented by a pointer to its zeroth element.

    --
    Chuck F
    Available for consulting/temporary embedded and systems.
    <http://cbfalconer.home.att.net> USE worldnet address!
    CBFalconer, Dec 4, 2004
    #3
  4. jab3

    Chris Torek Guest

    >jab3 wrote:
    >>I'll ask this even at risk of being accused of not researching
    >>adequately. My question (before longer reasoning) is: How does
    >>declaring (or defining, whatever) a variable **var make it an array
    >>of pointers?


    The short answer is, "it does not".

    >>I realize that 'char **var' is a pointer to a pointer of type char
    >>(I hope).


    More precisely, "char **var" declares "var" as a variable of *type*
    "pointer to pointer to char". Whether "var" actually points to
    anything at all (much less "anything useful") is up to you, the
    programmer.

    >> And I realize that with var[], var is actually a memory address (or at
    >> least as it is represented by C, IIRC (an internal copy which is a fixed
    >> pointer)) pointing (permanently) to the first element of an array.


    This is also wrong, or at least, not quite right. :)

    There are some "gotchas" with array declarations that do not occur
    with pointers, so we have to start adding more context. If we write,
    for instance:

    int arr1[8] = { 1, 2, 3, 0 };

    outside of a function, or inside a block, we have both declared
    and defined "arr1" as a variable of type "array 8 of int" (to use
    the "cdecl" program's syntax). Because we initialized the array,
    we can omit the size, and have the compiler figure it out:

    int arr2[] = { 1, 2, 3, 0 };

    but now we get an "array 4 of int", because we only used four
    initializers.

    On the other hand, we have a peculiar feature of the C language in
    which function parameters that *look like* arrays are actually
    declared as pointers. If we write:

    void somefunc(char s[]) {
    /* code */
    }

    the compiler is obligated to pretend that we actually wrote:

    void somefunc(char *s) {
    /* code */
    }

    That is, the local-variable "s" within the function somefunc() has
    type "pointer to char", rather than "array MISSING_SIZE of char".

    The reason for this peculiar feature has to do with what I call
    "The Rule" about arrays and pointers in C, combined with the fact
    that C passes arguments by value. For (much) more about The Rule,
    see <http://web.torek.net/torek/c/pa.html>.

    Except for some new features in C99 intended for optimization,
    there is never any reason you *have* to use the array notation to
    declare formal parameter names in function definitions, and I
    encourage programmers to use the pointer notation, so that the
    declaration is not misleading: since "s" inside somefunc() has type
    "char *", we should all declare it as "char *" in the first place.

    Ever since the C89 standard came out, something peculiar happens
    if we write:

    int arr3[];

    outside a function. This is a "tentative definition" of the
    array "arr3", and if we reach the end of a translation unit (roughly,
    "C source file") without coming across any more details for arr3[],
    it acts as if we had written:

    int arr3[1] = { 0 };

    On the other hand, though, if we try to use empty square brackets
    *inside* a function (not as a parameter but inside the {}s):

    void wrong(void) {
    int arr4[]; /* ERROR */
    /* more stuff */
    }

    we have done something wrong. Empty square brackets are not allowed
    here.

    Finally, C99 has something called a "flexible array member" of
    structures, which we can ignore for now, but does give you one more
    place where you can write empty square brackets and have it mean
    something special.

    All of these are just things you have to memorize -- quirks about
    C that "are just the way they are": not for any particular reason
    other than that Dennis Ritchie and/or the C standards folks said
    so. They all make it a little more tricky to talk about arrays in
    C.

    >>And I realize that *var[] is an array of pointers where each
    >>pointer can point to the beginning of a string (or whatever).


    If it is indeed an array at all -- for instance, if we write:

    char *arr5[100];

    either outside or inside a function (not as a parameter to a
    function), then arr5 has type "array 100 of pointer to char", and
    each of those 100 "pointer to char"s can point to the first of a
    sequence of "char"s making up a C string.

    >>But then there is **var. How does that then become an array
    >>of pointers?


    Again, "it does not"...

    In article <Ckbsd.218265$>
    Yan <> wrote:
    >In the declaration 'char var[];' var, when used by itself is just a
    >pointer to the first element ...


    I think it is better to say that it *becomes* a pointer to the
    first element of the array.

    This is The Rule about arrays and pointers in C:

    In a value context, an object of type "array N of T" becomes
    a value of type "pointer to T", pointing to the first element
    of that array, i.e., the one with subscript zero.

    The compiler has to *produce* this pointer, often using a single
    machine instruction. The array itself is a C object -- something
    occupying memory, and (we can hope) holding some useful values --
    but the pointer the C compiler comes up with is a mere "value"
    (an "rvalue", in typical computer-science lingo).

    The Rule is yet another arbitrary rule, something else you have to
    memorize about C. It is *so* important, and used so often, though,
    that it is not "just" another rule, it is *The* Rule: The Rule
    about arrays and pointers in C. Memorize it, work with it, until
    it seems natural, and then all this pointer stuff in C will start
    to make sense.

    Note that The Rule applies only to *objects* in *value contexts*.
    You have to be able to distinguish between objects and values, and
    spot the contexts, but this is pretty easy if you have done any C
    programming, or even much programming in other languages. If you
    have statements like:

    a = 17;
    b = a + 25;

    you know that "a" gets set to 17, and "b" gets set to 42. But how
    is it that we *set* "a" on the first line, then *get* its value to
    add 25 to it on the second line? The answer is "object" vs "value"
    contexts. The "a =" part means "set a" -- set the object. The
    "17" part just means "the value 17". The "b =" part means we will
    set "b" (the object), and the "a + 25" part means we will fetch
    the value in "a" and add 25.

    Most of these contexts are obvious -- the left side of an "="
    operator is an "object context", and the right side is a "value
    context". C has a lot of operators, though, and there are two
    important ones that have "object context": the unary-& operator,
    which takes the address of an object, and the sizeof operator,
    which produces the size (in "C bytes") of an object.

    Most of the other operators have value context, and if you name an
    object, such as an ordinary variable, you get the object's value.
    For ordinary "int"s and "double"s and such, the value of the object
    is whatever value you last stored in the variable. For arrays,
    the value is that produced by The Rule.

    You can either use this value right away -- printing it out, or
    applying some operator to it, for instance -- or you can store it
    in an object. Consider what happens if we choose to store the
    value the compiler produces when we apply The Rule to "arr5".

    Remember that "arr5" has type "array 100 of pointer to char":
    char *arr5[100];
    and that The Rule says:
    In a value context,
    (yep, got one of those)
    an object of type "array N
    (check -- that is what we have; N here is 100)
    of T"
    (and T is "pointer to char")
    becomes a value of type "pointer to T",
    (pointer to pointer to char)
    pointing to the first element of the array, i.e., the one
    with subscript zero.

    So if we want to store this value in an object, we need one of
    type "pointer to pointer to char", or "char **":

    char **holder;

    holder = arr5;

    (Note that the array's size -- the constant named N, 100 in this
    case -- gets thrown away. We are allowed to ignore it when working
    with The Rule. It is a darn good idea to save it away somewhere
    else, though, because if the array has 100 elements, we had better
    not write over arr5[231], which does not exist. The Rule tosses
    the constant, so in practical code, *we* have to save it -- the
    language threw it away, but it really does matter. For now, we
    will ignore it, and perhaps cross fingers, toes, and/or eyes and
    hope we do not use an out-of-bounds array subscript later. Or
    maybe we will occasionally check, remembering the size is 100.)

    Now "holder" stores the value produced by The Rule: a value of
    type "pointer to pointer to char", pointing to the first element
    of arr5 -- &arr5[0], in other words.

    This is where things get interesting. Suppose we now want to use
    the value with the subscript operators, or with ordinary pointer
    arithmetic. It may be time to remember that subscripts are in fact
    defined in terms of pointer arithmetic, and the unary "*"
    pointer-following operator:

    a[i]

    "means":

    *((a) + (i))

    where the addition uses pointer arithmetic. We fetch the values
    of the two operands -- the array "a" and the index "i" -- and add
    them, then we use pointer-follower-"*" to find the object in that
    slot in the array.

    But wait! I just said "we use the value of the array"! There it
    goes again: The Rule tells us how to find the "value" of an array
    object in a value context. If "a" is an array, The Rule says that
    we find its value by dropping the constant N, and then taking the
    address of its first element.

    This is exactly what we did when we stuck the "value" of arr5 into
    the variable named "holder"! If "holder" holds the value that
    gets produced by The Rule, what difference is there between:

    arr5[i] /* i.e., *((arr5) + (i)) */

    and:

    holder[i] /* i.e., *((holder) + (i)) */

    ? The answer is: there is no significant difference at all --
    "arr5" has The Rule applied, producing a value, but "holder" is
    used in a value context, pulling the *same* value out of it. The
    only real difference is in the machine instruction(s) used to create
    the value the first time (by applying The Rule), or to pull the
    value out of the "holder" variable.

    Of course, if we use *different* operators, we can get different
    results. In particular, the sizeof operator has "object context"
    instead of "value context":

    size1 = sizeof arr5;
    size2 = sizeof holder;

    will set size1 to some fairly large number (like 200, 400, or 800)
    and size2 to some small number (like 2, 4, or 8) on today's machines;
    so "arr5" and "holder" really are very different. Their *values*
    are the same, though, due to The Rule, and to us setting "holder"
    to the value The Rule creates for "arr5".

    In short: arrays are not pointers; but, due to The Rule, the "value"
    of an array *is* a pointer, so a pointer is "just as good" as the
    actual array, if you only want the value. (But the pointer *MUST*
    be set to the right value first! Arrays are collections of lots
    of objects -- N of them, where N is the size in the array definition
    -- while a pointer is just *one* object. To use it like an array,
    you have to point it at enough memory to hold all N objects.)

    [skipping a bunch of stuff]

    >any string that's in quotes in a C program gets stored in a read-only
    >part of your program when it's running ...


    Well, maybe: it *may* be read-only, and as a programmer, you are
    expected not to write on it. If you *do* write on it, the language
    makes no promises; things can go very wrong. So you should treat
    it as if it is read-only, even if it happens not to be.

    Also, this is not true for "any" string, just "most" of them (as
    you noted later, in something I snipped later).

    >so the line:
    >
    > char *str = "Confused";
    >
    >gets copied into the read-only memory as soon as your program sees it,
    >then assigns that address to str, which is why you can't change strings
    >like that.


    More precisely, string literals -- those things inside double quotes
    -- are a shorthand for creating anonymous arrays of "char". These
    arrays may be stored in read-only areas (and high-quality C compilers
    should strive to make sure they are). There is an exception for
    string literals used to initialize arrays of "char"s, so that:

    char buf[] = "hello";

    is not required to create one of those anonymous arrays (you have
    a perfectly good, non-anonymous, array named "buf"; why bother with
    two copies of the character sequence 'h' 'e' 'l' 'l' 'o' '\0'?).
    But other cases do create the anonymous arrays.

    Logically speaking, these anonymous arrays *should* have type
    "const char [N]":

    const char __internal_string_00000[9] =
    { 'C', 'o', 'n', 'f', 'u', 's', 'e', 'd', '\0' };

    because they are supposed to be read-only; but for historical
    compatibility with C from the 1970s and 1980s, the "const" is
    left off of the type. The string literal "Confused" thus has
    type "array 9 of char" -- 9 because there are nine bytes inside
    it, counting the terminating '\0' -- instead of "array 9 of
    const char".

    Since this is an array, it -- like all arrays -- is once again
    subject to The Rule. You can write:

    str = "Confused"; /* just like: str = __internal_string_00000; */

    to make str point to the uppercase 'C' -- the first element
    of the array.

    Every string literal (except those that initialize arrays of char)
    generates, inside the compiler, another one of these
    "__internal_string_01234[]" style arrays, and every one of these
    arrays is another candidate for The Rule. (Identical string literals
    may or may not reuse an already-internally-generated array -- this
    part is up to the compiler. Note also that "Hel-LO, World", and
    "O, World" could actually share a single array, if the compiler is
    sufficiently clever/sneaky, because they both *end* with the same
    sequence.)

    >when you call printf() with that str pointer as one of the
    >args, it simply goes to that location in read-only memory and reads it.


    Indeed, every time you pass a string literal to printf() for the
    format argument:

    printf("str is `%s'\n", str);

    you create another one of these anonymous arrays and apply The
    Rule. (See how often The Rule gets used? Almost every printf()
    in every C program has at least one occurrence.)

    Note that the anonymous array *is* still an array, not a pointer;
    if you apply one of the "object context" operators to it, it stays
    an array. In particular, if you apply the sizeof operator, you
    should get the size of the array, *not* the size of a pointer:

    int sz = sizeof "the anonymous array for this string literal";

    *must* set sz to 44 (if I counted right) -- 43 characters plus the
    terminating '\0'. Likewise, we can even do weird things like:

    char (*p1)[44] = &"the anonymous array for this string literal";

    The "&" operator produces the address of the anonymous array,
    just as if we had written out a named array:

    char some_name[44] = "...";
    char (*p2)[44] = &some_name;

    except that the entire array to which "p1" points is read-only,
    despite not being const-qualified. (We could const-qualify the
    some_name array and p2, and even const-qualify p1 except for
    some brokenness in C's type system. This brokenness is what you
    get when a committee designs the thing. :) )

    (Some C compilers have gotten string literals wrong, historically,
    producing pointers instead of arrays. There are only two ways to
    tell in legitimate C code, using sizeof and unary "&". That is,
    only the tricks shown here with "sz" and "p1" will expose the
    difference, because The Rule is so darn ubiquitous.)
    --
    In-Real-Life: Chris Torek, Wind River Systems
    Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
    email: forget about it http://web.torek.net/torek/index.html
    Reading email is like searching for food in the garbage, thanks to spammers.
    Chris Torek, Dec 5, 2004
    #4
  5. jab3

    jab3 Guest

    Chris Torek graciously wrote on Sunday 05 December 2004 05:12 pm:

    >>jab3 wrote:
    >>>I'll ask this even at risk of being accused of not researching
    >>>adequately. My question (before longer reasoning) is: How does
    >>>declaring (or defining, whatever) a variable **var make it an array
    >>>of pointers?

    >
    > The short answer is, "it does not".
    >


    So I hear :).

    >>>I realize that 'char **var' is a pointer to a pointer of type char
    >>>(I hope).

    >
    > More precisely, "char **var" declares "var" as a variable of *type*
    > "pointer to pointer to char". Whether "var" actually points to
    > anything at all (much less "anything useful") is up to you, the
    > programmer.
    >


    Is "var" an identifier, an object, an lvalue, or a variable? :) Seriously,
    it could also be a value in certain contexts I see, but what is the
    situation with lvalue and variable and identifier and object? I see in
    K&R2 that an object is a "named region of storage," and an lvalue is an
    "expression referring to an object." (197; I don't have the C Standard)
    Then it says an identifier is a sequence of letters and digits. (192) Then
    in your C for Smarties (which is good BTW; I'll have to digest it some
    more), at first you say lvalues and objects are the same (the former being
    ISO's term and the latter yours, sort of :)), but then you clarify it by
    saying that an lvalue names an object, which is how I see K&R2. Then you
    say variables are the best examples of objects. Is that the name or the
    location/storage? And where do identifiers fit in? :) Am I right to think
    of objects, strictly speaking, as the hardware location of 'stuff'? And
    the lvalue is the name I've given that 'stuff,' for instance char stf[] =
    "blah". 'stf' is the lvalue, and its location in memory is the object? I
    think the more I type the more I confuse myself :).

    >>> And I realize that with var[], var is actually a memory address (or at
    >>> least as it is represented by C, IIRC (an internal copy which is a fixed
    >>> pointer)) pointing (permanently) to the first element of an array.

    >
    > This is also wrong, or at least, not quite right. :)


    That IIRC comment was an attempt at remembering your article about "The
    Rule" I read a couple of months ago, but I didn't have the
    conceptual...framework to process it (I didn't read the previous 3 articles
    about types and objects and values and contexts, etc. then) Oh well.

    >
    > There are some "gotchas" with array declarations that do not occur
    > with pointers, so we have to start adding more context. If we write,
    > for instance:
    >
    > int arr1[8] = { 1, 2, 3, 0 };
    >
    > outside of a function, or inside a block, we have both declared
    > and defined "arr1" as a variable of type "array 8 of int" (to use
    > the "cdecl" program's syntax). Because we initialized the array,
    > we can omit the size, and have the compiler figure it out:
    >
    > int arr2[] = { 1, 2, 3, 0 };
    >
    > but now we get an "array 4 of int", because we only used four
    > initializers.
    >
    > On the other hand, we have a peculiar feature of the C language in
    > which function parameters that *look like* arrays are actually
    > declared as pointers. If we write:
    >
    > void somefunc(char s[]) {
    > /* code */
    > }
    >
    > the compiler is obligated to pretend that we actually wrote:
    >
    > void somefunc(char *s) {
    > /* code */
    > }
    >
    > That is, the local-variable "s" within the function somefunc() has
    > type "pointer to char", rather than "array MISSING_SIZE of char".
    >
    > The reason for this peculiar feature has to do with what I call
    > "The Rule" about arrays and pointers in C, combined with the fact
    > that C passes arguments by value.


    Ahh...That's why "The Rule" is effected. The argument is in a value
    context, and C stipulates that the value of an array is a pointer to its
    first element, so "The Rule" happens (close?).

    > Except for some new features in C99 intended for optimization,
    > there is never any reason you *have* to use the array notation to
    > declare formal parameter names in function definitions, and I
    > encourage programmers to use the pointer notation, so that the
    > declaration is not misleading: since "s" inside somefunc() has type
    > "char *", we should all declare it as "char *" in the first place.


    Ah, I see. (At least I think I do. Right now. Tonight :))

    > Ever since the C89 standard came out, something peculiar happens
    > if we write:
    >
    > int arr3[];
    >
    > outside a function. This is a "tentative definition" of the
    > array "arr3", and if we reach the end of a translation unit (roughly,
    > "C source file") without coming across any more details for arr3[],
    > it acts as if we had written:
    >
    > int arr3[1] = { 0 };
    >
    > On the other hand, though, if we try to use empty square brackets
    > *inside* a function (not as a parameter but inside the {}s):
    >
    > void wrong(void) {
    > int arr4[]; /* ERROR */
    > /* more stuff */
    > }
    >
    > we have done something wrong. Empty square brackets are not allowed
    > here.


    (BTW, what _is_ a translation unit? I see it used in the K&R2 Appendix A,
    and I see it here, but I couldn't find that K&R2 defined what it meant.
    They just say "a program consists of one or more _translation units_ stored
    in files." (191) Granted, I haven't made it through the book yet. Just
    skipped to that Appendix :))

    So why is it wrong to declare an 'incomplete' type inside a function?

    > Finally, C99 has something called a "flexible array member" of
    > structures, which we can ignore for now, but does give you one more
    > place where you can write empty square brackets and have it mean
    > something special.
    >
    > All of these are just things you have to memorize -- quirks about
    > C that "are just the way they are": not for any particular reason
    > other than that Dennis Ritchie and/or the C standards folks said
    > so. They all make it a little more tricky to talk about arrays in
    > C.
    >
    >>>And I realize that *var[] is an array of pointers where each
    >>>pointer can point to the beginning of a string (or whatever).

    >
    > If it is indeed an array at all -- for instance, if we write:
    >
    > char *arr5[100];
    >
    > either outside or inside a function (not as a parameter to a
    > function), then arr5 has type "array 100 of pointer to char", and
    > each of those 100 "pointer to char"s can point to the first of a
    > sequence of "char"s making up a C string.
    >


    Umm...I think that's what I meant :).


    >>>But then there is **var. How does that then become an array
    >>>of pointers?

    >
    > Again, "it does not"...
    >
    > In article <Ckbsd.218265$>
    > Yan <> wrote:
    >>In the declaration 'char var[];' var, when used by itself is just a
    >>pointer to the first element ...

    >
    > I think it is better to say that it *becomes* a pointer to the
    > first element of the array.


    Yeah, that's what I didn't understand before this reply and further reading
    on your site. I had forgotten that "The Rule" is something that happens in
    certain situations; not something that is persistent. Right? I mean,
    let's say a function is called with a parameter of (char *str) but the
    argument passed is "char a_str[20]". So inside of the function, a_str
    'becomes' a pointer to char, the first element specifically. So then when
    the function is over, is the pointer destroyed?

    >
    > This is The Rule about arrays and pointers in C:
    >
    > In a value context, an object of type "array N of T" becomes
    > a value of type "pointer to T", pointing to the first element
    > of that array, i.e., the one with subscript zero.
    >
    > The compiler has to *produce* this pointer, often using a single
    > machine instruction. The array itself is a C object -- something
    > occupying memory, and (we can hope) holding some useful values --
    > but the pointer the C compiler comes up with is a mere "value"
    > (an "rvalue", in typical computer-science lingo).
    >
    > The Rule is yet another arbitrary rule, something else you have to
    > memorize about C. It is *so* important, and used so often, though,
    > that it is not "just" another rule, it is *The* Rule: The Rule
    > about arrays and pointers in C. Memorize it, work with it, until
    > it seems natural, and then all this pointer stuff in C will start
    > to make sense.
    >
    > Note that The Rule applies only to *objects* in *value contexts*.
    > You have to be able to distinguish between objects and values, and
    > spot the contexts, but this is pretty easy if you have done any C
    > programming, or even much programming in other languages. If you
    > have statements like:
    >
    > a = 17;
    > b = a + 25;
    >
    > you know that "a" gets set to 17, and "b" gets set to 42. But how
    > is it that we *set* "a" on the first line, then *get* its value to
    > add 25 to it on the second line? The answer is "object" vs "value"
    > contexts. The "a =" part means "set a" -- set the object. The
    > "17" part just means "the value 17". The "b =" part means we will
    > set "b" (the object), and the "a + 25" part means we will fetch
    > the value in "a" and add 25.
    >
    > Most of these contexts are obvious -- the left side of an "="
    > operator is an "object context", and the right side is a "value
    > context". C has a lot of operators, though, and there are two
    > important ones that have "object context": the unary-& operator,
    > which takes the address of an object, and the sizeof operator,
    > which produces the size (in "C bytes") of an object.
    >
    > Most of the other operators have value context, and if you name an
    > object, such as an ordinary variable, you get the object's value.
    > For ordinary "int"s and "double"s and such, the value of the object
    > is whatever value you last stored in the variable. For arrays,
    > the value is that produced by The Rule.


    This reminds me of scalar and list context in Perl. Sort of. :) Not as far
    as what each context means, but just the different contexts and how a
    'variable' behaves/is treated differently based on how it is being used. I
    can get that, for the most part; I'm sure there are tricky ones. But that
    still doesn't clarify my confusion over objects, lvalues, identifiers, and
    variables.

    For instance, what is an example of an object that is not named? The
    pointer produced by "The Rule?"

    > You can either use this value right away -- printing it out, or
    > applying some operator to it, for instance -- or you can store it
    > in an object. Consider what happens if we choose to store the
    > value the compiler produces when we apply The Rule to "arr5".
    >
    > Remember that "arr5" has type "array 100 of pointer to char":
    > char *arr5[100];
    > and that The Rule says:
    > In a value context,
    > (yep, got one of those)
    > an object of type "array N
    > (check -- that is what we have; N here is 100)
    > of T"
    > (and T is "pointer to char")
    > becomes a value of type "pointer to T",
    > (pointer to pointer to char)
    > pointing to the first element of the array, i.e., the one
    > with subscript zero.
    >
    > So if we want to store this value in an object, we need one of
    > type "pointer to pointer to char", or "char **":
    >
    > char **holder;
    >
    > holder = arr5;
    >
    > (Note that the array's size -- the constant named N, 100 in this
    > case -- gets thrown away. We are allowed to ignore it when working
    > with The Rule. It is a darn good idea to save it away somewhere
    > else, though, because if the array has 100 elements, we had better
    > not write over arr5[231], which does not exist. The Rule tosses
    > the constant, so in practical code, *we* have to save it -- the
    > language threw it away, but it really does matter. For now, we
    > will ignore it, and perhaps cross fingers, toes, and/or eyes and
    > hope we do not use an out-of-bounds array subscript later. Or
    > maybe we will occasionally check, remembering the size is 100.)
    >
    > Now "holder" stores the value produced by The Rule: a value of
    > type "pointer to pointer to char", pointing to the first element
    > of arr5 -- &arr5[0], in other words.
    >
    > This is where things get interesting. Suppose we now want to use
    > the value with the subscript operators, or with ordinary pointer
    > arithmetic. It may be time to remember that subscripts are in fact
    > defined in terms of pointer arithmetic, and the unary "*"
    > pointer-following operator:
    >
    > a[i]
    >
    > "means":
    >
    > *((a) + (i))
    >
    > where the addition uses pointer arithmetic. We fetch the values
    > of the two operands -- the array "a" and the index "i" -- and add
    > them, then we use pointer-follower-"*" to find the object in that
    > slot in the array.
    >
    > But wait! I just said "we use the value of the array"! There it
    > goes again: The Rule tells us how to find the "value" of an array
    > object in a value context. If "a" is an array, The Rule says that
    > we find its value by dropping the constant N, and then taking the
    > address of its first element.
    >
    > This is exactly what we did when we stuck the "value" of arr5 into
    > the variable named "holder"! If "holder" holds the value that
    > gets produced by The Rule, what difference is there between:
    >
    > arr5[i] /* i.e., *((arr5) + (i)) */
    >
    > and:
    >
    > holder[i] /* i.e., *((holder) + (i)) */
    >
    > ? The answer is: there is no significant difference at all --
    > "arr5" has The Rule applied, producing a value, but "holder" is
    > used in a value context, pulling the *same* value out of it. The
    > only real difference is in the machine instruction(s) used to create
    > the value the first time (by applying The Rule), or to pull the
    > value out of the "holder" variable.
    >
    > Of course, if we use *different* operators, we can get different
    > results. In particular, the sizeof operator has "object context"
    > instead of "value context":
    >
    > size1 = sizeof arr5;
    > size2 = sizeof holder;
    >
    > will set size1 to some fairly large number (like 200, 400, or 800)
    > and size2 to some small number (like 2, 4, or 8) on today's machines;
    > so "arr5" and "holder" really are very different. Their *values*
    > are the same, though, due to The Rule, and to us setting "holder"
    > to the value The Rule creates for "arr5".
    >
    > In short: arrays are not pointers; but, due to The Rule, the "value"
    > of an array *is* a pointer, so a pointer is "just as good" as the
    > actual array, if you only want the value. (But the pointer *MUST*
    > be set to the right value first! Arrays are collections of lots
    > of objects -- N of them, where N is the size in the array definition
    > -- while a pointer is just *one* object. To use it like an array,
    > you have to point it at enough memory to hold all N objects.)
    >


    Everything between this and my last comment I'll have to read some more and
    think about some more. I'm getting it, but you know. (It's getting
    late....for me; work comes early at 6:15am) But anyway, that was a lot of
    good stuff. :) But I may have questions about it later :). If you're
    still paying attention by then.

    [I probably should have snipped some of the above, but I didn't know if you
    wanted to refer to any of it for whatever reason, so I figured it'd be
    easier if I just left it]

    > [skipping a bunch of stuff]
    >
    >>any string that's in quotes in a C program gets stored in a read-only
    >>part of your program when it's running ...

    >
    > Well, maybe: it *may* be read-only, and as a programmer, you are
    > expected not to write on it. If you *do* write on it, the language
    > makes no promises; things can go very wrong. So you should treat
    > it as if it is read-only, even if it happens not to be.
    >
    > Also, this is not true for "any" string, just "most" of them (as
    > you noted later, in something I snipped later).
    >
    >>so the line:
    >>
    >> char *str = "Confused";
    >>
    >>gets copied into the read-only memory as soon as your program sees it,
    >>then assigns that address to str, which is why you can't change strings
    >>like that.

    >
    > More precisely, string literals -- those things inside double quotes
    > -- are a shorthand for creating anonymous arrays of "char". These
    > arrays may be stored in read-only areas (and high-quality C compilers
    > should strive to make sure they are). There is an exception for
    > string literals used to initialize arrays of "char"s, so that:
    >
    > char buf[] = "hello";
    >
    > is not required to create one of those anonymous arrays (you have
    > a perfectly good, non-anonymous, array named "buf"; why bother with
    > two copies of the character sequence 'h' 'e' 'l' 'l' 'o' '\0'?).
    > But other cases do create the anonymous arrays.
    >
    > Logically speaking, these anonymous arrays *should* have type
    > "const char [N]":
    >
    > const char __internal_string_00000[9] =
    > { 'C', 'o', 'n', 'f', 'u', 's', 'e', 'd', '\0' };
    >
    > because they are supposed to be read-only; but for historical
    > compatibility with C from the 1970s and 1980s, the "const" is
    > left off of the type. The string literal "Confused" thus has
    > type "array 9 of char" -- 9 because there are nine bytes inside
    > it, counting the terminating '\0' -- instead of "array 9 of
    > const char".
    >
    > Since this is an array, it -- like all arrays -- is once again
    > subject to The Rule. You can write:
    >
    > str = "Confused"; /* just like: str = __internal_string_00000; */
    >
    > to make str point to the uppercase 'C' -- the first element
    > of the array.
    >
    > Every string literal (except those that initialize arrays of char)
    > generates, inside the compiler, another one of these
    > "__internal_string_01234[]" style arrays, and every one of these
    > arrays is another candidate for The Rule.


    Are these candidates for objects that aren't 'named'? (For instance like
    the 'int sz = sizeof "This is a string"' below, that I snipped) If so,
    what about:

    const char buf[] = "This is a string literal";

    Is "This is a string literal" an object? What about buf? Isn't that an
    lvalue referring to the object, i.e. naming it? Is there an internal
    representation for "This is a string literal" (internal name) *and* my own
    name (lvalue) for it?

    What about 'int i = 15'? Is 15 an object and i an lvalue? :)

    [snipped more good stuff]



    Thanks for any help, and patience -
    jab3
    jab3, Dec 8, 2004
    #5
  6. jab3

    Chris Torek Guest

    I do not have time to answer all of this now, but I will put in two
    short answers... (well, short-ish; they got longer than I expected.)

    [I wrote]
    >> More precisely, "char **var" declares "var" as a variable of *type*
    >> "pointer to pointer to char". Whether "var" actually points to
    >> anything at all (much less "anything useful") is up to you, the
    >> programmer.


    In article <>
    jab3 <> wrote:
    >Is "var" an identifier, an object, an lvalue, or a variable? :)


    All four, in fact.

    The name -- the three-letter sequence v, a, r -- is an identifier.
    (This is a syntactic element, i.e., something the compiler uses to
    figure out what you wrote. Each token is a syntactic element of
    some sort; some tokens are identifiers, like the keyword "char",
    some are single character thingies like the '*'; some are two-character
    thingies like an && operator. This particular syntactic element
    is an identifier.)

    The compiler must look up the identifier to see how it is declared
    and/or defined. If it is defined as, for instance, a typedef-name
    -- such as the ST_TYPE in:

    typedef struct st ST_TYPE;

    -- then it would be an identifier, but not a variable or lvalue.
    But here, it has now been declared (and also defined, eventually)
    as a variable:

    char **var;

    so it is a variable. Identifiers have a bunch of properties, such
    as scopes and name-spaces, and a single identifier can actually
    have multiple meanings, as in the (really awful) code:

    #include <stdio.h>

    void x(void) {
        int x;
        goto x;
    y:
        x += 17;
        printf("the answer is %d\n", x);
        return;
    x:
        x = 25;
        goto y;
    }

    Here the single identifier "x" has three different meanings: it is
    the name of the function x(), it is the name of a variable of type
    int also called x, and it is a goto-label just like "y". (Yuck!)

    C99 has kind of mucked up the word "lvalue", which was pretty well
    defined in C89; but it is safe to say that all ordinary variables
    are lvalues. Even array variables are still lvalues, except that,
    confusingly enough, they are "non-modifiable" lvalues. (The term
    lvalue dates back to compiler guys saying "the thing on the left
    of an assignment", so if you cannot put an array on the left of an
    assignment -- because the array is not modifiable -- then why call
    it an lvalue at all? Probably it was a bad idea, just like us
    USAliens using the word "gas" to refer to both petrol and methane.
    But, as Kurt Vonnegut wrote, so it goes.)

    >... but what is the
    >situation with lvalue and variable and identifier and object?


    The kind of problem we want to solve, by using different words like
    "lvalue" and "identifier" and "object", is to be able to talk about
    what *p or p[i] means when p has a value from malloc():

    char *p;

    p = malloc(len + 1);
    if (p == NULL) ... handle error ...
    strcpy(p, str);

    The strcpy() writes on various p[i]'s, e.g., setting p[0] to 'h'
    and p[1] to 'e' and so on to put "hello world" into it. These
    p[i]'s must be storage, but it is, at least in how we can talk
    about it, *different* from that for, e.g.:

    char buf[100];
    p = &buf[0];
    strcpy(p, str);

    because in this second case we know that p[0] is the same thing as
    buf[0], and so on. When the memory comes from malloc(), p[0] has
    no other name like buf[0] -- but it is still memory; it can still
    hold values. I call p[0] an object (and so do both C89 and C99).

    >What about 'int i = 15'? Is 15 an object and i an lvalue? :)


    15 is not an object, it is just a value. Objects hold values (or
    hold garbage); values are the things you stick into objects. The
    name "i" is an identifier that, in this case, names the object;
    the C standards (both C89 and C99) say it is indeed an lvalue.
    --
    In-Real-Life: Chris Torek, Wind River Systems
    Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
    email: forget about it http://web.torek.net/torek/index.html
    Reading email is like searching for food in the garbage, thanks to spammers.
    Chris Torek, Dec 8, 2004
    #6