Why can't I change the value a char* points to?

jack jack

I recently started learning about pointers in C and wrote the program below:

#include <stdio.h>

int main(void)
{
    char *s = "Hello world!";
    printf("%s\n", s);
    *s = 'h';
    printf("%s\n", s);

    return 0;
}

I expected it to print

Hello world!
hello world!

but it does not change the 'H' to 'h'; instead it fails with a runtime error.
Why can't the value be changed, even though the declaration does not use the const keyword?
 
Ian Collins

jack said:
char *s = "Hello world!";
[...]
*s = 'h';
[...]
Why can't the value be changed, even though the declaration does not use the const keyword?


You've asked question 1.32 in the c.l.c FAQ, see

http://c-faq.com/decl/strlitinit.html
 
Kaz Kylheku

recently i learn the pointer of c. wrote a programme below

#include <stdio.h>

int main(void)
{
    char *s = "Hello world!";

You are not creating a new string object here. The string literal "Hello
world!" is created at compile time and integrated into the program's
run-time image. It is effectively a piece of the program.
    printf("%s\n", s);
    *s = 'h';

Consequently, this assignment to s[0] constitutes self-modifying code:
your program is trying to change itself.
It is somewhat like trying to do 3 = 4.

This doesn't have any standard-defined behavior.

It cannot work for programs that are stored in ROM, including their string
literals. If you run this on some embedded systems, it might run without a
diagnostic, but the write to the ROM has no effect. On some other systems, you
might get a "bus error" from the invalid access.

Many modern operating systems mimic the situation of a program being placed
into ROM. They load programs into memory, and then use the virtual memory
hardware to mark the memory write-protected.

If you want a modifiable static character array, you have to define an array object.

static char a[] = "hello";
char *s = a; /* perhaps unnecessary; just use a in place of s */

Also note that different occurrences of "Hello world!" are not necessarily
distinct objects.

Given these definitions

char *s1 = "hello", *s2 = "hello", *s3 = "lo";

It's quite possible that s1 == s2, and that s3 == s1 + 3. If that is
the case and the run-time lets you successfully execute this:

s1[3] = 'f';

then s1 changes to "helfo", s2 changes to "helfo" and s3 changes to "fo".

This kind of thing can lead to surprising differences in behavior when someone
ports the program to another kind of computer, or just another kind of compiler,
or even just changes some code generation options on the same compiler. In
other words, it is highly non-portable.
 
jack jack

On Saturday, March 1, 2014 at 4:42:46 PM UTC+8, Kaz Kylheku wrote:
(snip)

I see, thank you!
 
Edward A. Falk

recently i learn the pointer of c. wrote a programme below

char *s = "Hello world!";
*s = 'h';

Any modern compiler/linker will treat the string "Hello world!" as a
constant and put it into read-only memory. Attempting to over-write the
first byte will probably cause the system to throw a wobbly.

Frankly, I don't even know why the compiler lets you assign this string
to a non-const variable, except perhaps that too much legacy code would
break if the compiler forbade it.
 
Eric Sosman

Any modern compiler/linker will treat the string "Hello world!" as a
constant and put it into read-only memory. Attempting to over-write the
first byte will probably cause the system to throw a wobbly.

Frankly, I don't even know why the compiler lets you assign this string
to a non-const variable, except perhaps that too much legacy code would
break if the compiler forbade it.

That's exactly the reason. Quoth the Rationale:

"A large body of C code exists of considerable commercial
value. Every attempt has been made to ensure that the bulk
of this code will be acceptable to any implementation
conforming to the Standard. The C89 Committee did not want
to force most programmers to modify their C programs just
to have them accepted by a conforming translator."

Since `const' was not part of the language prior to C89, existing
code hadn't used it and functions that worked with strings took
`char*' parameters -- even if they didn't intend to modify anything.
Adding `const' to string literals would have meant such functions
could not be called with string literal arguments, and would have
produced precisely the kind of forcing the Committee tried to avoid.
 
James Kuyper

Any modern compiler/linker will treat the string "Hello world!" as a
constant and put it into read-only memory. Attempting to over-write the
first byte will probably cause the system to throw a wobbly.

Frankly, I don't even know why the compiler lets you assign this string
to a non-const variable, except perhaps that too much legacy code would
break if the compiler forbade it.

A pointer to the string was used to initialize 's'; the only assignment
in that code was of a character, not a character string.

The fact that the compiler allowed the initialization might have to do
with the fact that it doesn't violate any of C's rules. The
assignment of 'h' to *s violates 6.4.5p7, but that only specifies that
the behavior is undefined; no diagnostic is required.

The reason the initialization doesn't violate any rules is because
"Hello world!" has the type char[13], whereas it should properly have
the type "const char[13]". That decision was indeed made because 'const'
was a relatively late addition to the language, and by the time it was
added, there was indeed too much legacy code that would break. However,
that decision was made by the C committee, not by the implementor of
that particular compiler.
 
John Bode

char *s = "Hello world!";
[...]
*s = 'h';
[...]
Why can't the value be changed, even though the declaration does not use the const keyword?

Quoting from the 2011 language standard (online draft
at http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf):
6.4.5 String literals
...
6 In translation phase 7, a byte or code of value zero is
appended to each multibyte character sequence that results
from a string literal or literals. The multibyte character
sequence is then used to initialize an array of static storage
duration and length just sufficient to contain the sequence. For
character string literals, the array elements have
type char, and are initialized with the individual bytes of the
multibyte character sequence. [snip remainder of paragraph]

7 It is unspecified whether these arrays are distinct provided
their elements have the appropriate values. If the program attempts
to modify such an array, the behavior is undefined.

Basically, string literals like "Hello world!" are stored as arrays of
char in such a way that they exist over the lifetime of the program; for
example, if you wrote something like

void foo()
{
    char *blah = "this is a test";
    ...
}

the string literal "this is a test" exists outside of the lifetime of
the function foo(), unlike the variable blah.

Implementations are free to place these arrays in a read-only memory
segment; attempting to modify the contents of a string literal on such
an implementation will lead to a runtime error. This behavior isn't
universal, though, and some implementations such as gcc allow you to
specify that string literals be stored in writable memory.

Since the behavior on modifying a string literal can vary, the language
definition leaves the result *undefined*; the implementation isn't required
to handle it in any particular way. In your case, you get a runtime
error.

If you change your code as follows:

#include <stdio.h>

int main(void)
{
    char s[] = "Hello world!";
    printf("%s\n", s);
    *s = 'h';
    printf("%s\n", s);

    return 0;
}

then your code will behave as expected. In this case, s is an array that
contains a *copy* of the contents of the string literal. The contents
of s are modifiable by your code. The expression *s is equivalent to
s[0].
 
Kenny McCormack

John Bode said:
an implementation will lead to a runtime error. This behavior isn't
universal, though, and some implementations such as gcc allow you to
specify that string literals be stored in writable memory.

Actually, it doesn't do so anymore.

....
If you change your code as follows:

#include <stdio.h>

int main(void)
{
    char s[] = "Hello world!";
    printf("%s\n", s);
    *s = 'h';
    printf("%s\n", s);

    return 0;
}

then your code will behave as expected. In this case, s is an array that
contains a *copy* of the contents of the string literal. The contents
of s are modifiable by your code. The expression *s is equivalent to
s[0].

Is it really a "*copy*" of the contents of the string literal?

I.e., do both actually exist - the array and the string literal?

Or, is it just an array, that is initialized as if it had been written out
as:

s[0] = 'H';
s[1] = 'e';
s[2] = 'l';
s[3] = 'l';
s[4] = 'o';
s[5] = ' ';
s[6] = 'w';
s[7] = 'o';
s[8] = 'r';
s[9] = 'l';
s[10] = 'd';
s[11] = '!';
s[12] = '\0';

I.e., the string literal doesn't actually exist...
 
Ken Brody

recently i learn the pointer of c. wrote a programme below [...]
char *s = "Hello world!";
printf("%s\n", s);
*s = 'h'; [...]
but it could not change the 'H' to 'h', and it run error!.
why it could not change the value although without the key const

Quoting from the 2011 language standard (online draft
at http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf): [...]
Basically, string literals like "Hello world!" are stored as arrays of
char in such a way that they exist over the lifetime of the program; for
example, if you wrote something like

void foo ()
{
char *blah = "this is a test";
...
}

the string literal "this is a test" exists outside of the lifetime of
the function foo(), unlike the variable blah.

Implementations are free to place these arrays in a read-only memory
[...]

Consider, too, that, as immutable constants, the compiler is free to merge
identical strings, as in:

void foo()
{
    char *blah = "this is a test";
    ...
}

void bar()
{
    char *plugh = "this is a test";
    ...
}

Both blah and plugh could point to the same (possibly read-only) memory
address. Imagine the fun (FSVO) if this were allowed in foo():

blah[0] = 'T';

Magically, a totally unrelated variable in a totally unrelated function is
no longer pointing to "this is a test", even on the very first call to bar().

I seem to recall some obfuscated FORTRAN code that relied on
pass-by-reference, even for constants. One subroutine would modify the
parameter passed to it, thereby causing a subsequently-called subroutine to
get passed the modified "constant". Imagine debugging a call "foo(3)" only
to discover that foo() was passed a 4.


BTW, I ran into this years ago with the construct:

char *filename = "prefix.XXXXXXXX";
....
strcpy(filename+7,suffix);

This had "worked" for years until a system came along which actually put the
string into read-only memory.
 
Kenny McCormack

Ken Brody said:
Consider, too, that, as immutable constants, the compiler is free to merge
identical strings, as in:

void foo()
{
char *blah = "this is a test";
...
}

void bar()
{
char *plugh = "this is a test";
...
}

Both blah and plugh could point to the same (possibly read-only) memory
address. Imagine the fun (FSVO) if this were allowed in foo():

blah[0] = 'T';

Magically, a totally unrelated variable in a totally unrelated function is
no longer pointing to "this is a test", even on the very first call to bar().

FWIW, I don't think a compiler would permit itself to "merge strings"
unless it could verify that this sort of thing can't happen at runtime.

I assume that a side effect of gcc's old (no longer available) writable
strings option would be that string merging is disabled. I can't test this
ATM because I don't have a machine set up with an old enough version of gcc.

 
Ken Brody

If you change your code as follows:

#include <stdio.h>

int main(void)
{
    char s[] = "Hello world!";
    printf("%s\n", s);
    *s = 'h';
    printf("%s\n", s);

    return 0;
}

then your code will behave as expected. In this case, s is an array that
contains a *copy* of the contents of the string literal. The contents
of s are modifiable by your code. The expression *s is equivalent to
s[0].

Is it really a "*copy*" of the contents of the string literal?

I.e., do both actually exist - the array and the string literal?

Or, is it just an array, that is initialized as if it had been written out
as:

s[0] = 'H';
s[1] = 'e'; [...]
s[11] = '!';
s[12] = '\0';

I.e., the string literal doesn't actually exist...

It's probably up to the compiler, but a quick test with MSVC shows that it
creates the string, and does the equivalent of an inline memcpy(). A quick
test with gcc shows it actually calls memcpy().
 
glen herrmannsfeldt

Kenny McCormack said:
Actually, it doesn't do so anymore.

One more lost feature from the C past.

(snip)
char s[] = "Hello world!";
(snip)
then your code will behave as expected. In this case, s is an array that
contains a *copy* of the contents of the string literal. The contents
of s are modifiable by your code. The expression *s is equivalent to
s[0].
Is it really a "*copy*" of the contents of the string literal?
I.e., do both actually exist - the array and the string literal?
Or, is it just an array, that is initialized as if it had been written out
as:
s[0] = 'H';
s[1] = 'e';
s[2] = 'l';
s[3] = 'l';
s[4] = 'o';
s[5] = ' ';
s[6] = 'w';
s[7] = 'o';
s[8] = 'r';
s[9] = 'l';
s[10] = 'd';
s[11] = '!';
s[12] = '\0';
I.e., the string literal doesn't actually exist...

How would you find out? Consider the case:

printf("hi there!\n");

Is that a string literal that actually exists? How do you know?

Following the "as if" rule, if it does what C says, that is all
that is required.

There are many cases where programs print out strings that don't
exist as consecutive characters in the program. For one, they might
be stored compressed for efficiency. The only thing you need to know
is that it works "as if" there was a string literal copied into the
array.

(In the case of static arrays, most systems will read in the array
data along with the executable program. The storage format of the
object program and load module are system dependent.)

-- glen
 
James Kuyper

Kenny McCormack said:
char s[] = "Hello world!";
(snip)
then your code will behave as expected. In this case, s is an array that
contains a *copy* of the contents of the string literal. The contents
of s are modifiable by your code. The expression *s is equivalent to
s[0].
Is it really a "*copy*" of the contents of the string literal?
I.e., do both actually exist - the array and the string literal?
Or, is it just an array, that is initialized as if it had been written out
as:
s[0] = 'H';
s[1] = 'e';
s[2] = 'l';
s[3] = 'l';
s[4] = 'o';
s[5] = ' ';
s[6] = 'w';
s[7] = 'o';
s[8] = 'r';
s[9] = 'l';
s[10] = 'd';
s[11] = '!';
s[12] = '\0';
I.e., the string literal doesn't actually exist...

How would you find out? ...

In principle, there's no way for a strictly conforming program to tell.
... Consider the case:

printf("hi there!\n");

Is that a string literal that actually exists? How do you know?

Following the "as if" rule, if it does what C says, that is all
that is required.

Correct. However, in practice, if the named array has static storage
duration, the unnamed array is no more than a convenient fiction for
simplifying the standard's description of the required behavior. I'd
expect it to be dropped by any decent compiler.

However, if the array has automatic storage duration, and there is any
chance that the block it's defined in might be entered a second time
after the contents of the array have been modified, then the array is
supposed to be re-initialized. The easiest way to arrange this is for
the nameless array associated with the string literal to become a real
thing, that can be copied from, not just a fiction. Short strings might
be stored in the code itself, rather than a separate data segment, but
the information needed to perform the re-initialization has to be stored
somewhere.
 
glen herrmannsfeldt

(snip, regarding)

(snip, someone wrote)
In principle, there's no way for a strictly conforming program to tell.
(snip)

Correct. However, in practice, if the named array has static storage
duration, the unnamed array is no more than a convenient fiction for
simplifying the standard's description of the required behavior. I'd
expect it to be dropped by any decent compiler.

The systems I know initialize static data by loading from disk,
along with the rest of the program, but that would be system
dependent.

However, if the array has automatic storage duration, and there is any
chance that the block it's defined in might be entered a second time
after the contents of the array have been modified, then the array is
supposed to be re-initialized. The easiest way to arrange this is for
the nameless array associated with the string literal to become a real
thing, that can be copied from, not just a fiction. Short strings might
be stored in the code itself, rather than a separate data segment, but
the information needed to perform the re-initialization has to be stored
somewhere.

Yes. A convenient way is to copy from a string literal, but a given
system might find another way to do it. If the initial value had
repeats, a system might store a single copy of the repeat and copy
it multiple times.

And that could be done anywhere a string literal is used, but no
copies of a pointer to the actual stored string literal are made.

Consider:

strcpy(a,"abcabcabc");

Does the standard require that a string literal constant containing
those 10 bytes actually exist anywhere?

-- glen
 
Kaz Kylheku

Kenny McCormack said:
One more lost feature from the C past.

I think that if you want this badly enough, you can probably coax it out of the
GNU toolchain with a custom linker script. You would need some way to tell
apart the string literal symbols from the other things in the text section, and
then reassign them to a data section in the output file. That's the general
gist. The devil is in the details, but if I absolutely had to have this, that
would be one of the avenues I might explore first.

A source-to-source translator can translate literals into static arrays
(say, file-scope ones with machine-generated names like s__lit0042).
Occurrences of the literals are then replaced with these symbols:

if (foo == "abcd") ...

turns into

if (foo == s__lit0042) ...

with a definition of s__lit0042 deposited into a section of the file scope
somewhere near the top of the output file:

char s__lit0042[] = "abcd";
 
anish kumar

I think that if you want this badly enough, you can probably coax it out of the
GNU toolchain with a custom linker script. You would need some way to tell
apart the string literal symbols from the other things in the text section, and
then reassign them to a data section in the output file. That's the general
gist. The devil is in the details, but if I absolutely had to have this, that
would be one of the avenues I might explore first.
[...]

Can you explain more about this idea? I have seen that the Linux kernel makes use of
this feature quite a lot, such as creating one more section and doing all sorts
of magic which I never understood.
 
Ian Collins

Can you explain more about this idea? I have seen that the Linux kernel makes use of
this feature quite a lot, such as creating one more section and doing all sorts
of magic which I never understood.

Please fix your quoting to remove all of the blank lines that awful
google interface adds to your quotes!

You should try asking on a Linux or gcc group, if you can find one. A
lot of kernel code exploits non-standard compiler specific options to
improve performance (often, as you have found, at the expense of
readability!).
 
