simple question regarding 5.5 of Ritchie & Kernighan

niclane · Jun 19, 2005

Hi,

I was reading section 5.5 of Ritchie and Kernighan and saw the
following:

"
.....

char amessage[] = "now is the time";
char *pmessage = "now is the time";

.....

pmessage is a pointer, initialized to point to a string constant; the
pointer may subsequently be modified to point elsewhere, but the result
is undefined if you try to modify the string contents.
"

Why would the result be undefined? Doesn't the initialization create an
array of chars in memory terminated with a NULL and this is pointed to
by pmessage? In this case why could one of these elements of the array
be altered? The book says that the declaration of amessage would allow
individual chars to be altered, which only makes me more confused:
aren't these two statements the same in respect of the rhs?

Thanks,

Nic

Michael Mair · Jun 19, 2005

niclane said:
Hi,

I was reading section 5.5 of Ritchie and Kernighan and saw the
following:

"
....

char amessage[] = "now is the time";
char *pmessage = "now is the time";

....

pmessage is a pointer, initialized to point to a string constant; the
pointer may subsequently be modified to point elsewhere, but the result
is undefined if you try to modify the string contents.
"

Why would the result be undefined?

Because the C standard says so.

Doesn't the initialization create an
array of chars in memory terminated with a NULL and this is pointed to
by pmessage?

No. NULL is a null pointer constant and as such not part of a C string.

In case you mean the string terminator '\0' or 0:
This is possible.
However, string literals have static storage duration (i.e. they exist
throughout the program's lifetime), and the implementation may do things like
- reusing a string literal which already exists. I.e. you have a
string literal "now is the time" and there is another one, maybe in
another translation unit (or even in a library you are linking with),
which also says "now is the time"; so pmessage may point at a string
literal "of its own" or it may point at a shared one.
- reusing the end of an already existing string literal.
Imagine you need the string literal "time" somewhere else; then the
implementation may point it at pmessage + strlen("now is the ").

In addition to that, string literals may be stored in a storage
area which cannot be modified by your program (maybe even burned
into some kind of ROM).

In this case why could one of these elements of the array
be altered?

As pmessage is not a pointer to const char but to char, the
string literal _could_ be modified using pmessage if this had not
been outlawed elsewhere.

The book says that the declaration of amessage would allow
individual chars to be altered, which only makes me more confused:
aren't these two statements the same in respect of the rhs?

No. amessage is an array of char containing a string, pmessage is
a pointer to a string literal.
The right hand side is the initializer. This initializer is treated
differently.

In the case of pmessage, we copy the start address of "now is the time"
into pmessage.
In the case of amessage, we create an array of char with size
strlen("now is the time")+1; then, in effect, we
strcpy(amessage, "now is the time").

This array's contents are yours to modify; the array may have automatic
storage duration.

Cheers
Michael

Lawrence Kirby · Jun 19, 2005

Hi,

I was reading section 5.5 of Ritchie and Kernighan and saw the
following:

"
....

char amessage[] = "now is the time";
char *pmessage = "now is the time";

....

pmessage is a pointer, initialized to point to a string constant; the
pointer may subsequently be modified to point elsewhere, but the result
is undefined if you try to modify the string contents.
"

Why would the result be undefined?

The standard specifies that string literals (i.e. "..." in the source
code) define static objects and any attempt to modify these objects
results in undefined behaviour. That means that an implementation could
put them in read-only memory, a read-only segment and so on. It also
explicitly permits string literal objects to be merged so, for example

char *p1 = "ABCD";
char *p2 = "BCD";

the compiler could simply arrange things so that p2 ends up as p1+1. So even if
you did manage to write to a string literal it may have unexpected effects
on the rest of the program.

Doesn't the initialization create an
array of chars in memory terminated with a NULL and this is pointed to
by pmessage?

Yes, that is correct. pmessage points at the actual non-modifiable static
object defined by the string literal.

In this case why could one of these elements of the array
be altered?

For example

pmessage[0] = 'X';

would attempt to modify that static object.

The book says that the declaration of amessage would allow
individual chars to be altered, which only makes me more confused:
aren't these two statements the same in respect of the rhs?

In the case of amessage you are defining a separate array which is
initialised in effect by copying in the string data from the string
literal object. amessage is a normal array, not the string literal object
and is modifiable. So

amessage[0] = 'X';

writes to the array amessage and not the string literal object, which is
fine.

Lawrence

niclane · Jun 19, 2005

Thanks Michael and Lawrence. You both cleared things up for me. I
think my confusion mainly stemmed from the fact that both variables
were being initialized from the same string literal, which brings in
the element of immutability. In one case, though, the literal is used
to initialize an array and is therefore copied (and the copy is
modifiable); in the other, it is just pointed to and hence is never
modifiable.

Cheers guys,

Nic

André Brière · Jun 19, 2005

Lawrence Kirby said:
Hi,

I was reading section 5.5 of Ritchie and Kernighan and saw the
following:

"
....

char amessage[] = "now is the time";
char *pmessage = "now is the time";

....

pmessage is a pointer, initialized to point to a string constant; the
pointer may subsequently be modified to point elsewhere, but the result
is undefined if you try to modify the string contents.
"

Why would the result be undefined?

The standard specifies that string literals (i.e. "..." in the source
code) define static objects and any attempt to modify these objects
results in undefined behaviour. That means that an implementation could
put them in read-only memory, a read-only segment and so on. It also
explicitly permits string literal objects to be merged so, for example

char *p1 = "ABCD";
char *p2 = "BCD";

the compiler could simply arrange things so that p2 ends up as p1+1. So even if
you did manage to write to a string literal it may have unexpected effects
on the rest of the program.

Doesn't the initialization create an
array of chars in memory terminated with a NULL and this is pointed to
by pmessage?

Yes, that is correct. pmessage points at the actual non-modifiable static
object defined by the string literal.

In this case why could one of these elements of the array
be altered?

For example

pmessage[0] = 'X';

would attempt to modify that static object.

The book says that the declaration of amessage would allow
individual chars to be altered, which only makes me more confused:
aren't these two statements the same in respect of the rhs?

In the case of amessage you are defining a separate array which is
initialised in effect by copying in the string data from the string
literal object. amessage is a normal array, not the string literal object
and is modifiable. So

amessage[0] = 'X';

writes to the array amessage and not the string literal object, which is
fine.

Lawrence

I'm still confused. K&R 5.5 states "Individual characters within the array
may be changed but amessage will always refer to the same storage" and
pictures amessage as pointing to a string constant (look at the picture in
the book! there is no mention of string copying), though amessage is an
array, not a pointer. But if the string constant is stored in a read-only
segment, this would keep us from modifying the string in the statement:

pmessage[0] = 'X';

When I run the following program in gcc/cygwin:

#include <stdio.h>
int main(void)
{
char * pmessage = "now is the time";
pmessage[0] = 'X';
return 0;
}

I get a segmentation fault.
So, when K&R states that characters within the array may be changed with
amessage, do I misunderstand, or is gcc buggy? If "may" means
implementation-possible, that means that the behaviour is undefined, just as
with pmessage.

André.

Netocrat · Jun 20, 2005

In the case of amessage you are defining a separate array which is
initialised in effect by copying in the string data from the string
literal object. amessage is a normal array, not the string literal
object and is modifiable. So

amessage[0] = 'X';

writes to the array amessage and not the string literal object, which is
fine.

I'm still confused. K&R 5.5 states "Individual characters within the
array may be changed but amessage will always refer to the same storage"

"Same storage" doesn't mean the same storage as pmessage. It means that
if you change the contents of amessage the place where they are stored
is not different to the place amessage's previous contents were stored.

and pictures amessage as pointing to a string constant

(look at the picture in the book!

I would if I had not left it behind in a move.

there is no mention of string copying)

There may be no mention of it in the book, nevertheless that is the effect
of the initialisation.

though amessage is an array, not a pointer.

Without the book in front of me I can't explain why you are
misinterpreting it as representing an array as a pointer, but you
must be, because K&R would not do so.

Regardless of what you think the book is trying to represent, the two
statements do the following:

char amessage[] = "now is the time";

causes storage for an array of characters large enough to hold the
string "now is the time" (including the terminating '\0') to be allocated,
and that string to be effectively copied into that storage. Since it is an
array and not declared const, the contents may be modified, but the
storage itself (i.e. where amessage is located and how large it is)
may not change.

char *pmessage = "now is the time";

causes the character pointer pmessage to point to the start of the string
literal "now is the time". Since a string literal must be treated as constant
(modifying it is undefined behaviour), you may not modify any part of it, and
you may (will?) get errors, such as a segmentation fault, if you try to do so.

Since pmessage itself is not declared const, there is nothing to stop you
from pointing it to another place at a later point in time. So in this
case you can modify pmessage, but not the contents of what it initially
points to.

But if the string constant is
stored in a read-only segment, this would keep us from modifying the
string in the statement:

pmessage[0] = 'X';

Correct.

When I run the following program in gcc/cygwin:

#include <stdio.h>
int main(void)
{
char * pmessage = "now is the time";
pmessage[0] = 'X';
return 0;
}
I get a segmentation fault.

As expected, and in accordance with what everyone in this thread has explained.

So, when K&R states that characters within the array may be changed with
amessage, do I misunderstand, or is gcc buggy?

How is your code in any way related to amessage? Neither it nor any other
array even appears in the code. If your code had used amessage
instead of pmessage it would not have crashed.

André Brière · Jun 20, 2005

Netocrat said:
When I run the following program in gcc/cygwin:

#include <stdio.h>
int main(void)
{
char * pmessage = "now is the time";
pmessage[0] = 'X';
return 0;
}
I get a segmentation fault.

As expected, and in accordance with what everyone in this thread has explained.

So, when K&R states that characters within the array may be changed with
amessage, do I misunderstand, or is gcc buggy?

How is your code in any way related to amessage? Neither it nor any other
array even appears in the code. If your code had used amessage
instead of pmessage it would not have crashed.

Deeply sorry! I really should read my own postings before sending them. The
piece of code that I produced showed no relation to what I was trying to
say, and replacing the pointer pmessage with the array amessage makes the
code work.
Now, if I understand correctly,
char * pmessage = "now is the time";
defines a pointer that points to a string constant: the pointer can point
elsewhere, but the string pointed to by it is unmodifiable, or rather,
trying to modify it leads to undefined behaviour, so what actually happens
depends on the implementation.
char amessage[] = "now is the time";
defines (and allocates) an array long enough to contain the string "now is
the time", but amessage itself, i.e. &amessage[0], does not "point" to the
string literal "now is the time" as it appears in the program code itself,
or as it may be stored in a special zone of memory, read-only or not. Hence
modifying the array contents does not affect the string literal at all.
The two statements seem to be different in nature. While
char * pmessage = "now is the time";
is a one-line way of expressing a definition followed by an assignment:
char * pmessage;
pmessage = "now is the time";
the statement
char amessage[] = "now is the time";
seems to be a shortcut, allowed by C syntax, for initialising an array of
chars right in its definition, which would not make sense in any other
context; for example, we could not write:
char amessage[];
amessage = "now is the time";
The statement char amessage[] = "now is the time"; looks like any other legal
one-line way of expressing two statements, like:
int i = 5;
for:
int i;
i = 5;
so it has always made me think of it as a one-line way of expressing a
definition and the pointing of amessage to a constant string literal, which
it is not.
Am I right?

André.

Netocrat · Jun 20, 2005

On Sun, 19 Jun 2005 22:54:52 -0400, André Brière wrote:

The two statements seem to be different in nature ...

Yes they are. Also consider this illustration of the difference between
pointers to char and arrays of char:

#include <stdio.h>

int main(int argc, char **argv)
{
char * pmessage = "now is the time";
char amessage[] = "now is the time";

printf("pmessage holds: %s\n", pmessage);
printf("pmessage: %p; &pmessage: %p\n", (void *)pmessage, (void *)&pmessage);
printf("amessage holds: %s\n", amessage);
printf("amessage: %p; &amessage: %p\n", (void *)amessage, (void *)&amessage);
return 0;
}

Output:

pmessage holds: now is the time
pmessage: 0x8048534; &pmessage: 0xbffff29c
amessage holds: now is the time
amessage: 0xbffff280; &amessage: 0xbffff280

Notice that amessage and &amessage print as the same value, whereas
pmessage and &pmessage differ. Since the pointer can be modified, it needs
storage of its own to hold the address it points to, and that storage is,
obviously, at a different address from the first character of the string
it points to.

An array, on the other hand, cannot be made to refer to different storage,
so there is no separate object holding a pointer to the start of the
string: taking the array's address simply gives back the same address as
that of its first character (though with a different type, pointer to the
whole array rather than pointer to char).

This can be a source of confusion with dynamically allocated (i.e. at
runtime) multidimensional arrays. These are usually implemented as
arrays of pointers to arrays, and are fundamentally different from
statically assigned multidimensional arrays, for - by extension - the same
reasons as above. But I won't continue with that unless you ask because
it can be confusing without a reasonable level of familiarity with
pointers and arrays.

the statement
char amessage[] = "now is the time";
seems to be a shortcut, allowed by C syntax, for initialising an array of
chars right in its definition, which would not make sense in any other
context; for example, we could not write:
char amessage[];
amessage = "now is the time";
Am I right?

Spot on. Well put.

André Brière · Jun 20, 2005

Thanks a lot! I had heard many opinions on this topic, but either I
misunderstood them, or they conflicted with each other. My mind's now clear
on this issue.

Netocrat said:
This can be a source of confusion with dynamically allocated (i.e. at
runtime) multidimensional arrays. These are usually implemented as
arrays of pointers to arrays, and are fundamentally different from
statically assigned multidimensional arrays, for - by extension - the same
reasons as above.

Pointers are clear to me: they are variables by themselves, allocated at
addresses which have nothing to do with the addresses they contain, these
latter pointing to other variables, or constants, or functions, or
dynamically allocated arrays (or sub-arrays of dynamically-allocated
multidimensional arrays) ... The problem I had always had was with arrays
declared as such, and with the meaning of "amessage"; your explanation, and
in retrospect Lawrence Kirby's, has cleared that up for me after a long
period of confusion.

Many thanks!

André.
