simple question regarding 5.5 of Ritchie & Kernighan

niclane

Hi,

I was reading section 5.5 of Ritchie and Kernighan and saw the
following:

"
.....

char amessage[] = "now is the time";
char *pmessage = "now is the time";

.....

pmessage is a pointer, initialized to point to a string constant; the
pointer may subsequently be modified to point elsewhere, but the result
is undefined if you try to modify the string contents.
"

Why would the result be undefined? Doesn't the initialization create an
array of chars in memory terminated with a NULL and this is pointed to
by pmessage? In this case why couldn't one of these elements of the array
be altered? The book says that the declaration of amessage would allow
individual chars to be altered, which only makes me more confused: aren't
the right-hand sides of these two statements the same?

Thanks,

Nic
 
Michael Mair

niclane said:
Why would the result be undefined?

Because the C standard says so.
Doesn't the initialization create an
array of chars in memory terminated with a NULL and this is pointed to
by pmessage?

No. NULL is a null pointer constant and as such not part of a C string.

In case you mean the string terminator '\0' or 0:
Yes, the string literal is stored as an array of char terminated by '\0',
and pmessage points at its first element.
However, string literals have static storage duration (i.e. they exist
throughout the program's lifetime) and the implementation may do things like
- reusing a string literal which already exists. I.e. you have a
string literal "now is the time" and there is another one, maybe in
another translation unit (or even in a library you are linking with),
which also says "now is the time"; so pmessage may point at a string
literal "of its own" or it may point at a shared one.
- reusing the end of an already existing string literal.
Imagine you need the string literal "time" somewhere else, then the
implementation may let it start at pmessage + strlen("now is the ").

In addition to that, string literals may be stored in a storage
area which cannot be modified by your program (maybe even burned
into some kind of ROM).
In this case why could one of these elements of the array
be altered?

As pmessage is not a pointer to const char but to char, the
string literal _could_ be modified using pmessage if this had not
been outlawed elsewhere.
The book says that the declaration of amessage would allow
individual chars to be altered. Which only makes me more confused since
aren't these two statements in the respect of the rhs the same?

No. amessage is an array of char containing a string, pmessage is
a pointer to a string literal.
The right hand side is the initializer. This initializer is treated
differently.

In the case of pmessage, we copy the start address of "now is the time"
into pmessage.
In the case of amessage, we create an array of char with size
strlen("now is the time")+1; then we
strcpy(amessage, "now is the time").

This array's contents are yours to modify, the array may have automatic
storage duration.

Cheers
Michael
 
Lawrence Kirby

Why would the result be undefined?

The standard specifies that string literals (i.e. "..." in the source
code) define static objects and any attempt to modify these objects
results in undefined behaviour. That means that an implementation could
put them in read-only memory, a read-only segment and so on. It also
explicitly permits string literal objects to be merged so, for example

char *p1 = "ABCD";
char *p2 = "BCD";

the compiler could just make things so that p2 ends up as p1+1. So even if
you did manage to write to a string literal it may have unexpected effects
on the rest of the program.
Doesn't the initialization create an
array of chars in memory terminated with a NULL and this is pointed to
by pmessage?

Yes, that is correct. pmessage points at the actual non-modifiable static
object defined by the string literal.
In this case why could one of these elements of the array
be altered?

For example

pmessage[0] = 'X';

would attempt to modify that static object.
The book says that the declaration of amessage would allow
individual chars to be altered. Which only makes me more confused since
aren't these two statements in the respect of the rhs the same?

In the case of amessage you are defining a separate array which is
initialised in effect by copying in the string data from the string
literal object. amessage is a normal array, not the string literal object
and is modifiable. So

amessage[0] = 'X';

writes to the array amessage and not the string literal object, which is
fine.

Lawrence
 
niclane

Thanks Michael and Lawrence. You both cleared things up for me. I
think my confusion mainly stemmed from the fact that both of these
variables were being initialized by a constant string literal, which
brings in the element of immutability. The key point is that in one case
the literal is used to initialize an array and hence is copied (with the
copy being modifiable), and in the other case it is just pointed to and
hence is never modifiable.

Cheers guys,

Nic
 
André Brière

Lawrence Kirby said:
In the case of amessage you are defining a separate array which is
initialised in effect by copying in the string data from the string
literal object. amessage is a normal array, not the string literal object
and is modifiable. So

amessage[0] = 'X';

writes to the array amessage and not the string literal object, which is
fine.

Lawrence

I'm still confused. K&R 5.5 states "Individual characters within the array
may be changed but amessage will always refer to the same storage" and
pictures amessage as pointing to a string constant (look at the picture in
the book! there is no mention of string copying), though amessage is an
array, not a pointer. But if the string constant is stored in a read-only
segment, this would keep us from modifying the string in the statement:

pmessage[0] = 'X';

When I run the following program in gcc/cygwin:

#include <stdio.h>

int main(void)
{
    char * pmessage = "now is the time";
    pmessage[0] = 'X';
    return 0;
}

I get a segmentation fault.
So, when K&R states that characters within the array may be changed with
amessage, do I misunderstand, or is gcc buggy? If "may" means that it
depends on the implementation, then the behaviour is undefined, just as
with pmessage.

André.
 
Netocrat

In the case of amessage you are defining a separate array which is
initialised in effect by copying in the string data from the string
literal object. amessage is a normal array, not the string literal
object and is modifiable. So

amessage[0] = 'X';

writes to the array amessage and not the string literal object, which is
fine.
I'm still confused. K&R 5.5 states "Individual characters within the
array may be changed but amessage will always refer to the same storage"

"Same storage" doesn't mean the same storage as pmessage. It means that
if you change the contents of amessage the place where they are stored
is not different to the place amessage's previous contents were stored.
and pictures amessage as pointing to a string constant
(look at the picture in the book!

I would if I had not left it behind in a move.
there is no mention of string copying)

There may be no mention of it in the book, nevertheless that is the effect
of the initialisation.
though amessage is an array, not a pointer.

Without the book in front of me I can't explain why you are reading it
as representing an array as a pointer, but you must be misreading it,
because K&R would not do so.

Regardless of what you think the book is trying to represent, the two
statements do the following:

char amessage[] = "now is the time";

causes storage for an array of characters with enough size to hold the
string "now is the time" (including terminating '\0') to be allocated and
for that string to be effectively copied into that storage. Since it is an
array and not declared const, the contents may be modified, but the
storage itself - i.e. where amessage lives and how large it is - may
not.

char *pmessage = "now is the time";

causes the character pointer pmessage to point to the start of the string
literal "now is the time". Since modifying a string literal is undefined
behaviour, you may not modify any part of it, and on many systems you will
get a crash if you try to do so.

Since pmessage itself is not declared const, there is nothing to stop you
from pointing it to another place at a later point in time. So in this
case you can modify pmessage, but not the contents of what it initially
points to.
But if the string constant is
stored in a read-only segment, this would keep us from modifying the
string in the statement:

pmessage[0] = 'X';
Correct.

When I run the following program in gcc/cygwin:

#include <stdio.h>
int main(void)
{
char * pmessage = "now is the time";
pmessage[0] = 'X';
return 0;
}
I get a segmentation fault.

As expected, and in accordance with what everyone in this thread has explained.
So, when K&R states that characters within the array may be changed with
amessage, do I misunderstand, or gcc is bugged?

How is your code in any way related to amessage? Neither it nor any other
array even appears in the code. If your code had used amessage
instead of pmessage it would not have crashed.
 
André Brière

Netocrat said:
How is your code in any way related to amessage? Neither it nor any other
array even appears in the code. If your code had used amessage
instead of pmessage it would not have crashed.

Deeply sorry! I really should read my own postings before sending them. The
piece of code that I produced showed no relation to what I was trying to
say, and replacing the pointer pmessage with the array amessage makes the
code work.
Now if I understand,
char * pmessage = "now is the time"
defines a pointer that points to a string constant: the pointer can point
elsewhere, but the string pointed to by it is unmodifiable, or at least
what happens when you try to modify it depends on the implementation,
because the behaviour is undefined.
char amessage[] = "now is the time"
defines (and allocates) an array long enough to contain the string "now is
the time", but amessage itself, i.e. &amessage[0], does not "point" to the
string literal "now is the time" as it appears in the program code itself,
or as it may be stored in a special zone of memory, read-only or not. Hence
modifying the array contents does not affect the string literal at all.
The two statements seem to be different in nature; while
char * pmessage = "now is the time"
is a one-line way of expressing a definition and an initialisation:
char * pmessage;
pmessage = "now is the time";
the statement
char amessage[] = "now is the time"
seems to be a C-syntax allowed shortcut for initialising an array of chars
right in its definition, that would not make sense in any other context; we
could not write for example:
char amessage[];
amessage = "now is the time"
The statement char amessage[] = "now is the time" looks like any other legal
one-line way of expressing two statements, like:
int i = 5;
for:
int i;
i = 5;
so I had always thought of it as a one-line way of expressing a definition
and the pointing of amessage to a constant string literal, which it is not.
Am I right?

André.
 
Netocrat

On Sun, 19 Jun 2005 22:54:52 -0400, André Brière wrote:

The two statements seem to be different in nature ...

Yes they are. Also consider this illustration of the difference between
pointers to char and arrays of char:

#include <stdio.h>

int main(void)
{
    char * pmessage = "now is the time";
    char amessage[] = "now is the time";

    /* %p expects a void *, so every pointer argument is cast */
    printf("pmessage holds: %s\n", pmessage);
    printf("pmessage: %p; &pmessage: %p\n", (void *)pmessage, (void *)&pmessage);
    printf("amessage holds: %s\n", amessage);
    printf("amessage: %p; &amessage: %p\n", (void *)amessage, (void *)&amessage);
    return 0;
}

Output:

pmessage holds: now is the time
pmessage: 0x8048534; &pmessage: 0xbffff29c
amessage holds: now is the time
amessage: 0xbffff280; &amessage: 0xbffff280

Notice that amessage and &amessage are the same value, whereas they are
different for pmessage. i.e. since we can modify the pointer, it needs
storage of its own to hold the address it points to, and that storage
is - obviously - at a different address from the first character of the
string that it points to.

Whereas an array is not a pointer object: there is no separate storage
holding a pointer to the start of the string, so taking its address
simply gives back the same address as the first character in the array
(though with a different type, pointer to array rather than pointer to
char).

This can be a source of confusion with dynamically allocated (i.e. at
runtime) multidimensional arrays. These are usually implemented as
arrays of pointers to arrays, and are fundamentally different from
statically assigned multidimensional arrays, for - by extension - the same
reasons as above. But I won't continue with that unless you ask because
it can be confusing without a reasonable level of familiarity with
pointers and arrays.
the statement
char amessage[] = "now is the time"
seems to be a C-syntax allowed shortcut for initialising an array of
chars right in its definition, that would not make sense in any other
context; we could not write for example:
char amessage[];
amessage = "now is the time"
Am I right?

Spot on. Well put.
 
André Brière

Thanks a lot! I had heard many opinions on this topic, but either I
misunderstood them, or they conflicted with each other. My mind's now clear
on this issue.

Netocrat said:
This can be a source of confusion with dynamically allocated (i.e. at
runtime) multidimensional arrays. These are usually implemented as
arrays of pointers to arrays, and are fundamentally different from
statically assigned multidimensional arrays, for - by extension - the same
reasons as above.

Pointers are clear to me: they are variables by themselves, allocated at
addresses which have nothing to do with the addresses they contain, these
latter pointing to other variables, or constants, or functions, or
dynamically allocated arrays (or sub-arrays of dynamically-allocated
multidimensional arrays) ... The problem I had always had was with arrays
declared as such, with the meaning of "amessage"; your explanation, and
Lawrence Kirby's in retrospect, has enlightened me after a long period of
confusion.

Many thanks!

André.
 
