simple question regarding 5.5 of Ritchie & Kernighan

Discussion in 'C Programming' started by niclane, Jun 19, 2005.

  1. niclane

    niclane Guest

    Hi,

    I was reading section 5.5 of Ritchie and Kernighan and saw the
    following:

    "
    .....

    char amessage[] = "now is the time";
    char *pmessage = "now is the time";

    .....

    pmessage is a pointer, initialized to point to a string constant; the
    pointer may subsequently be modified to point elsewhere, but the result
    is undefined if you try to modify the string contents.
    "

    Why would the result be undefined? Doesn't the initialization create an
    array of chars in memory terminated with a NULL and this is pointed to
    by pmessage? In this case why could one of these elements of the array
    be altered? The book says that the declaration of amessage would allow
    individual chars to be altered. Which only makes me more confused since
    aren't these two statements in the respect of the rhs the same?

    Thanks,

    Nic
     
    niclane, Jun 19, 2005
    #1

  2. Michael Mair

    Michael Mair Guest

    niclane wrote:
    > Hi,
    >
    > I was reading section 5.5 of Ritchie and Kernighan and saw the
    > following:
    >
    > "
    > ....
    >
    > char amessage[] = "now is the time";
    > char *pmessage = "now is the time";
    >
    > ....
    >
    > pmessage is a pointer, initialized to point to a string constant; the
    > pointer may subsequently be modified to point elsewhere, but the result
    > is undefined if you try to modify the string contents.
    > "
    >
    > Why would the result be undefined?


    Because the C standard says so.

    > Doesn't the initialization create an
    > array of chars in memory terminated with a NULL and this is pointed to
    > by pmessage?


    No. NULL is a null pointer constant and as such not part of a C string.

    In case you mean the string terminator '\0' or 0:
    This is possible.
    However, string literals have static storage duration (i.e. throughout
    the program's life time) and the implementation may do things like
    - reusing a string literal which already exists. I.e. you have a
    string literal "now is the time" and there is another one, maybe in
    another translation unit (or even in a library you are linking with),
    which also says "now is the time"; so pmessage may point at a string
    literal "of its own" or it may point at a shared one.
    - reusing the end of an already existing string literal.
    Imagine you need the string literal "time" somewhere else, then the
    implementation may point at pmessage + strlen("now is the ") .

    In addition to that, string literals may be stored in a storage
    area which cannot be modified by your program (maybe even burned
    into some kind of ROM).

    > In this case why could one of these elements of the array
    > be altered?


    As pmessage is not a pointer to const char but to char, the
    string literal _could_ be modified using pmessage if this had not
    been outlawed elsewhere.
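
    A minimal sketch of the usual safeguard, assuming you control the
    declaration: if the pointer is declared as const char *, the compiler
    rejects writes through it, and any change is made to a private copy
    instead. The helper name capitalised_copy is made up for illustration.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies src into dst (capacity n bytes) and
   upper-cases the first character of the copy. The text that src points
   to is never written. Returns dst, or NULL if the copy would not fit. */
char *capitalised_copy(char *dst, size_t n, const char *src)
{
    if (strlen(src) + 1 > n)
        return NULL;                 /* no room for the text plus '\0' */

    strcpy(dst, src);                /* work on our own storage */
    if (dst[0] != '\0')
        dst[0] = (char)toupper((unsigned char)dst[0]);
    return dst;
}
```

    With const char *pmessage = "now is the time"; a line such as
    pmessage[0] = 'X'; is then refused at compile time rather than
    misbehaving at run time.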

    > The book says that the declaration of amessage would allow
    > individual chars to be altered. Which only makes me more confused since
    > aren't these two statements in the respect of the rhs the same?


    No. amessage is an array of char containing a string, pmessage is
    a pointer to a string literal.
    The right hand side is the initializer. This initializer is treated
    differently.

    In the case of pmessage, we copy the start address of "now is the time"
    into pmessage.
    In the case of amessage, we create an array of char with size
    strlen("now is the time")+1; then we
    strcpy(amessage, "now is the time").

    This array's contents are yours to modify, the array may have automatic
    storage duration.
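
    As a rough check of the description above, the sketch below compares
    an array initialised from the literal with one that is sized and
    filled by hand. The function name init_matches_strcpy is invented
    for this example.

```c
#include <string.h>

/* Returns 1 if an array initialised from a literal has the same size and
   contents as one sized and filled by hand with strcpy, 0 otherwise. */
int init_matches_strcpy(void)
{
    char amessage[] = "now is the time";      /* size deduced: 15 + 1 */
    char by_hand[sizeof "now is the time"];   /* same size, spelled out */

    strcpy(by_hand, "now is the time");

    return sizeof amessage == strlen("now is the time") + 1
        && sizeof amessage == sizeof by_hand
        && memcmp(amessage, by_hand, sizeof amessage) == 0;
}
```

    Calling init_matches_strcpy() from main should yield 1 on any
    conforming implementation.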

    Cheers
    Michael
    --
    E-Mail: Mine is an /at/ gmx /dot/ de address.
     
    Michael Mair, Jun 19, 2005
    #2

  3. On Sun, 19 Jun 2005 04:13:08 -0700, niclane wrote:

    > Hi,
    >
    > I was reading section 5.5 of Ritchie and Kernighan and saw the
    > following:
    >
    > "
    > ....
    >
    > char amessage[] = "now is the time";
    > char *pmessage = "now is the time";
    >
    > ....
    >
    > pmessage is a pointer, initialized to point to a string constant; the
    > pointer may subsequently be modified to point elsewhere, but the result
    > is undefined if you try to modify the string contents.
    > "
    >
    > Why would the result be undefined?


    The standard specifies that string literals (i.e. "..." in the source
    code) define static objects and any attempt to modify these objects
    results in undefined behaviour. That means that an implementation could
    put them in read-only memory, a read-only segment and so on. It also
    explicitly permits string literal objects to be merged so, for example

    char *p1 = "ABCD";
    char *p2 = "BCD";

    the compiler could just make things so that p2 ends up at p1+1. So even if
    you did manage to write to a string literal it may have unexpected effects
    on the rest of the program.
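
    Whether any merging actually happens is up to the implementation. A
    small sketch that merely observes it (the function names are invented
    here, and either answer is a conforming result):

```c
/* Reports whether two identical literals ended up at the same address.
   The result is unspecified: 0 and 1 are both conforming outcomes. */
int identical_literals_shared(void)
{
    const char *a = "now is the time";
    const char *b = "now is the time";
    return a == b;
}

/* Reports whether "BCD" was stored as the tail of "ABCD". */
int tail_literal_shared(void)
{
    const char *p1 = "ABCD";
    const char *p2 = "BCD";
    return p2 == p1 + 1;
}
```

    A program must not rely on either function returning a particular
    value; the point is only that both outcomes are allowed.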

    > Doesn't the initialization create an
    > array of chars in memory terminated with a NULL and this is pointed to
    > by pmessage?


    Yes, that is correct. pmessage points at the actual non-modifiable static
    object defined by the string literal.

    > In this case why could one of these elements of the array
    > be altered?


    For example

    pmessage[0] = 'X';

    would attempt to modify that static object.

    > The book says that the declaration of amessage would allow
    > individual chars to be altered. Which only makes me more confused since
    > aren't these two statements in the respect of the rhs the same?


    In the case of amessage you are defining a separate array which is
    initialised in effect by copying in the string data from the string
    literal object. amessage is a normal array, not the string literal object
    and is modifiable. So

    amessage[0] = 'X';

    writes to the array amessage and not the string literal object, which is
    fine.
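
    To illustrate that the array is a separate object, here is a short
    sketch (the function name is made up) that writes to the array and
    then checks that the text reached through the pointer is unchanged:

```c
#include <string.h>

/* Modifies the array copy, then checks that the text reached through the
   pointer is untouched. Returns 1 when both expectations hold. */
int array_write_leaves_literal_alone(void)
{
    char amessage[] = "now is the time";
    const char *pmessage = "now is the time";

    amessage[0] = 'X';               /* defined: amessage is our own array */

    return strcmp(amessage, "Xow is the time") == 0
        && strcmp(pmessage, "now is the time") == 0;
}
```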

    Lawrence
     
    Lawrence Kirby, Jun 19, 2005
    #3
  4. niclane

    niclane Guest

    Thanks Michael and Lawrence. You both cleared things up for me. I
    think my confusion mainly stemmed from the fact that both of these
    variables were being initialized by a constant string literal, which
    brings in the element of immutability. In one case the literal is used
    to initialize an array and hence is copied (with the copy being
    modifiable); in the other case it is just pointed to and hence is
    never modifiable.

    Cheers guys,

    Nic
     
    niclane, Jun 19, 2005
    #4
  6. "Lawrence Kirby" <> wrote in message
    news:p...
    > On Sun, 19 Jun 2005 04:13:08 -0700, niclane wrote:
    >
    > > Hi,
    > >
    > > I was reading section 5.5 of Ritchie and Kernighan and saw the
    > > following:
    > >
    > > "
    > > ....
    > >
    > > char amessage[] = "now is the time";
    > > char *pmessage = "now is the time";
    > >
    > > ....
    > >
    > > pmessage is a pointer, initialized to point to a string constant; the
    > > pointer may subsequently be modified to point elsewhere, but the result
    > > is undefined if you try to modify the string contents.
    > > "
    > >
    > > Why would the result be undefined?

    >
    > The standard specifies that string literals (i.e. "..." in the source
    > code) define static objects and any attempt to modify these objects
    > results in undefined behaviour. That means that an implementation could
    > put them in read-only memory, a read-only segment and so on. It also
    > explicitly permits string literal objects to be merged so, for example
    >
    > char *p1 = "ABCD";
    > char *p2 = "BCD";
    >
    > the compiler could just make things so that p2 ends up at p1+1. So even if
    > you did manage to write to a string literal it may have unexpected effects
    > on the rest of the program.
    >
    > > Doesn't the initialization create an
    > > array of chars in memory terminated with a NULL and this is pointed to
    > > by pmessage?

    >
    > Yes, that is correct. pmessage points at the actual non-modifiable static
    > object defined by the string literal.
    >
    > > In this case why could one of these elements of the array
    > > be altered?

    >
    > For example
    >
    > pmessage[0] = 'X';
    >
    > would attempt to modify that static object.
    >
    > > The book says that the declaration of amessage would allow
    > > individual chars to be altered. Which only makes me more confused since
    > > aren't these two statements in the respect of the rhs the same?

    >
    > In the case of amessage you are defining a separate array which is
    > initialised in effect by copying in the string data from the string
    > literal object. amessage is a normal array, not the string literal object
    > and is modifiable. So
    >
    > amessage[0] = 'X';
    >
    > writes to the array amessage and not the string literal object, which is
    > fine.
    >
    > Lawrence


    I'm still confused. K&R 5.5 states "Individual characters within the array
    may be changed but amessage will always refer to the same storage" and
    pictures amessage as pointing to a string constant (look at the picture in
    the book! there is no mention of string copying), though amessage is an
    array, not a pointer. But if the string constant is stored in a read-only
    segment, this would keep us from modifying the string in the statement:

    pmessage[0] = 'X';

    When I run the following program in gcc/cygwin:

    #include <stdio.h>
    int main(void)
    {
    char * pmessage = "now is the time";
    pmessage[0] = 'X';
    return 0;
    }

    I get a segmentation fault.
    So, when K&R states that characters within the array may be changed with
    amessage, do I misunderstand, or is gcc buggy? If "may" means
    implementation-possible, that means that the behaviour is undefined, just as
    with pmessage.

    André.
     
    André Brière, Jun 19, 2005
    #6
  7. Netocrat

    Netocrat Guest

    On Sun, 19 Jun 2005 15:11:27 -0400, André Brière wrote:

    >> In the case of amessage you are defining a separate array which is
    >> initialised in effect by copying in the string data from the string
    >> literal object. amessage is a normal array, not the string literal
    >> object and is modifiable. So
    >>
    >> amessage[0] = 'X';
    >>
    >> writes to the array amessage and not the string literal object, which is
    >> fine.


    > I'm still confused. K&R 5.5 states "Individual characters within the
    > array may be changed but amessage will always refer to the same storage"


    "Same storage" doesn't mean the same storage as pmessage. It means that
    if you change the contents of amessage, the new contents are stored in
    the same place as the previous contents were.

    > and pictures amessage as pointing to a string constant


    > (look at the picture in the book!


    I would if I had not left it behind in a move.

    > there is no mention of string copying)


    There may be no mention of it in the book, nevertheless that is the effect
    of the initialisation.

    > though amessage is an array, not a pointer.


    Without the book in front of me I can't explain why you are
    misinterpreting it as representing an array as a pointer, but you
    must be, because K&R would not do so.

    Regardless of what you think the book is trying to represent, the two
    statements do the following:

    char amessage[] = "now is the time";

    causes storage for an array of characters with enough size to hold the
    string "now is the time" (including terminating '\0') to be allocated and
    for that string to be effectively copied into that storage. Since it is an
    array and not declared const, the contents may be modified, but the
    storage itself - i.e. where amessage is located and how large it
    is - may not.

    char *pmessage = "now is the time";

    causes the character pointer pmessage to point to the start of a constant
    string "now is the time". Since the string is constant, you may not
    modify any part of it and may (will?) get errors if you try to do so.

    Since pmessage itself is not declared const, there is nothing to stop you
    from pointing it to another place at a later point in time. So in this
    case you can modify pmessage, but not the contents of what it initially
    points to.
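
    A brief sketch of that last point, assuming the pointer is declared
    const char * so that only reads go through it. The second string and
    the function name are invented for illustration.

```c
#include <string.h>

/* Re-aims the pointer twice and checks what it reads each time.
   Returns 1 if both reads give the expected text. */
int pointer_can_be_repointed(void)
{
    const char *pmessage = "now is the time";
    int ok;

    pmessage += 4;                       /* skip "now ": pointer changes */
    ok = strcmp(pmessage, "is the time") == 0;

    pmessage = "for all good men";       /* point at a different literal */
    ok = ok && strcmp(pmessage, "for all good men") == 0;

    /* By contrast, with char amessage[] = "..."; the line
       amessage = "..."; would not compile: an array is not assignable. */
    return ok;
}
```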

    > But if the string constant is
    > stored in a read-only segment, this would keep us from modifying the
    > string in the statement:
    >
    > pmessage[0] = 'X';


    Correct.

    > When I run the following program in gcc/cygwin:
    >
    > #include <stdio.h>
    > int main(void)
    > {
    > char * pmessage = "now is the time";
    > pmessage[0] = 'X';
    > return 0;
    > }
    > }
    > I get a segmentation fault.


    As expected, and in accordance with what everyone in this thread has explained.

    > So, when K&R states that characters within the array may be changed with
    > amessage, do I misunderstand, or is gcc buggy?


    How is your code in any way related to amessage? Neither it nor an array
    of any type even appears in the code. If your code had used amessage
    instead of pmessage it would not have crashed.
     
    Netocrat, Jun 20, 2005
    #7
  8. "Netocrat" <> wrote in message
    news:p...
    > On Sun, 19 Jun 2005 15:11:27 -0400, André Brière wrote:
    >
    > > When I run the following program in gcc/cygwin:
    > >
    > > #include <stdio.h>
    > > int main(void)
    > > {
    > > char * pmessage = "now is the time";
    > > pmessage[0] = 'X';
    > > return 0;
    > > }
    > > }
    > > I get a segmentation fault.

    >
    > As expected, and in accordance with what everyone in this thread has explained.
    >
    > > So, when K&R states that characters within the array may be changed with
    > > amessage, do I misunderstand, or is gcc buggy?

    >
    > How is your code in any way related to amessage? It or an array of any
    > type doesn't even appear in the code. If your code had used amessage
    > instead of pmessage it would not have crashed.
    >


    Deeply sorry! I really should read my own postings before sending them. The
    piece of code that I produced showed no relation to what I was trying to
    say, and replacing the pointer pmessage with the array amessage makes the
    code work.
    Now if I understand correctly,

    char * pmessage = "now is the time";

    defines a pointer that points to a string constant: the pointer can be
    made to point elsewhere, but the string it points to is unmodifiable,
    in the sense that trying to modify it leads to undefined behaviour.

    char amessage[] = "now is the time";

    defines (and allocates) an array long enough to contain the string "now is
    the time", but amessage itself, i.e. &amessage[0], does not "point" to the
    string literal "now is the time" as it appears in the program code itself,
    or as it may be stored in a special zone of memory, read-only or not. Hence
    modifying the array contents does not affect the string literal at all.

    The two statements seem to be different in nature; while

    char * pmessage = "now is the time";

    is a one-line way of expressing a definition and an initialisation:

    char * pmessage;
    pmessage = "now is the time";

    the statement

    char amessage[] = "now is the time";

    seems to be a shortcut the C syntax allows for initialising an array of
    chars right in its definition, one that would not make sense in any other
    context; we could not write for example:

    char amessage[];
    amessage = "now is the time";

    The statement char amessage[] = "now is the time"; looks like any other
    legal one-line way of expressing two statements, like:

    int i = 5;

    for:

    int i;
    i = 5;

    so I had always thought of it as a one-line way of expressing a definition
    and the pointing of amessage to a constant string literal, which it is not.
    Am I right?

    André.
     
    André Brière, Jun 20, 2005
    #8
  9. Netocrat

    Netocrat Guest

    On Sun, 19 Jun 2005 22:54:52 -0400, André Brière wrote:

    <snip>

    > The two statements seem to be different in nature ...


    Yes they are. Also consider this illustration of the difference between
    pointers to char and arrays of char:

    #include <stdio.h>

    int main(void)
    {
        char * pmessage = "now is the time";
        char amessage[] = "now is the time";

        /* %p expects a void *, so every pointer argument is cast */
        printf("pmessage holds: %s\n", pmessage);
        printf("pmessage: %p; &pmessage: %p\n", (void *)pmessage, (void *)&pmessage);
        printf("amessage holds: %s\n", amessage);
        printf("amessage: %p; &amessage: %p\n", (void *)amessage, (void *)&amessage);
        return 0;
    }

    Output:

    pmessage holds: now is the time
    pmessage: 0x8048534; &pmessage: 0xbffff29c
    amessage holds: now is the time
    amessage: 0xbffff280; &amessage: 0xbffff280

    Notice that amessage and &amessage have the same value, whereas pmessage
    and &pmessage differ. Since we can modify the pointer, we require a
    memory address to hold the value it points to, and that memory address
    is - obviously - different to the address of the first character of the
    string that it points to.

    Whereas since we can't make the array refer to different storage, there
    is no need for a separate memory address to hold a pointer to the start
    of the string, so taking the array's address simply gives back the same
    address as the first character in the array.
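
    The same observation can be checked without printing addresses. The
    sketch below (function names invented for this example) also shows
    that sizeof sees the whole array but only the pointer object itself:

```c
#include <stddef.h>

/* Returns 1 if the array's address equals the address of its first
   element once both are converted to void *. */
int array_address_equals_first_element(void)
{
    char amessage[] = "now is the time";
    return (void *)&amessage == (void *)&amessage[0];
}

/* Size of the array object: 15 characters plus the terminating '\0'. */
size_t array_size(void)
{
    char amessage[] = "now is the time";
    return sizeof amessage;
}

/* Size of the pointer object itself, whatever it points at. */
size_t pointer_size(void)
{
    const char *pmessage = "now is the time";
    return sizeof pmessage;
}
```

    array_size() gives 16 everywhere, while pointer_size() gives
    sizeof(char *), which depends on the platform.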

    This can be a source of confusion with dynamically allocated (i.e. at
    runtime) multidimensional arrays. These are usually implemented as
    arrays of pointers to arrays, and are fundamentally different from
    statically assigned multidimensional arrays, for - by extension - the same
    reasons as above. But I won't continue with that unless you ask because
    it can be confusing without a reasonable level of familiarity with
    pointers and arrays.
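
    For readers who do want a glimpse, here is a hedged sketch of the two
    layouts. ROWS, COLS, alloc_rows and free_rows are names made up for
    this example, not anything from the book.

```c
#include <stdlib.h>

enum { ROWS = 3, COLS = 4 };

/* A true 2-D array is one contiguous block, so row 1 starts exactly
   COLS ints after the start of row 0. Returns 1 if that holds. */
int static_rows_are_contiguous(void)
{
    int grid[ROWS][COLS];
    return &grid[1][0] == &grid[0][0] + COLS;
}

/* An array of row pointers: every row is a separate, zeroed allocation.
   Returns NULL if any allocation fails. */
int **alloc_rows(size_t rows, size_t cols)
{
    int **g = malloc(rows * sizeof *g);
    if (g == NULL)
        return NULL;
    for (size_t r = 0; r < rows; r++) {
        g[r] = calloc(cols, sizeof **g);
        if (g[r] == NULL) {
            while (r > 0)            /* undo the rows already allocated */
                free(g[--r]);
            free(g);
            return NULL;
        }
    }
    return g;
}

/* Releases everything obtained from alloc_rows. */
void free_rows(int **g, size_t rows)
{
    if (g == NULL)
        return;
    for (size_t r = 0; r < rows; r++)
        free(g[r]);
    free(g);
}
```

    Both grid[r][c] and g[r][c] look the same in use, but the first is
    pure address arithmetic within one block, while the second loads a
    row pointer before indexing into that row.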

    > the statement
    > char amessage[] = "now is the time"
    > seems to be a C-syntax allowed shortcut for initialising an array of
    > chars right in its definition, that would not make sense in any other
    > context; we could not write for example:
    > char amessage[];
    > amessage = "now is the time"
    > Am I right?


    Spot on. Well put.
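
    Since an array cannot be assigned to after its definition, its
    characters have to be copied in instead. A minimal sketch, with
    set_message as a hypothetical helper name:

```c
#include <string.h>

/* Stores text in an existing array of n bytes. Plain assignment
   (amessage = "...") is not valid C, so the characters are copied.
   Returns 1 on success, or 0 (leaving dst untouched) if text is too long. */
int set_message(char *dst, size_t n, const char *text)
{
    size_t len = strlen(text);

    if (len + 1 > n)
        return 0;                    /* no room for the text plus '\0' */
    memcpy(dst, text, len + 1);      /* copy including the terminator */
    return 1;
}
```

    Usage would look like char amessage[16]; followed by
    set_message(amessage, sizeof amessage, "now is the time");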
     
    Netocrat, Jun 20, 2005
    #9
  10. Thanks a lot! I had heard many opinions on this topic, but either I
    misunderstood them, or they conflicted with each other. My mind's now clear
    on this issue.

    "Netocrat" <> wrote in message
    news:p...
    >
    > This can be a source of confusion with dynamically allocated (i.e. at
    > runtime) multidimensional arrays. These are usually implemented as
    > arrays of pointers to arrays, and are fundamentally different from
    > statically assigned multidimensional arrays, for - by extension - the same
    > reasons as above.


    Pointers are clear to me: they are variables by themselves, allocated at
    addresses which have nothing to do with the addresses they contain, these
    latter pointing to other variables, or constants, or functions, or
    dynamically allocated arrays (or sub-arrays of dynamically-allocated
    multidimensional arrays) ... The problem I had always had was with arrays
    declared as such, with the meaning of "amessage", and your explanation, and
    Lawrence Kirby's one retrospectively, enlightens me after a long while of
    confusion.

    Many thanks!

    André.
     
    André Brière, Jun 20, 2005
    #10