pointer and storage

Jack Klein

Ark said:
Richard Heathfield wrote: [...]
The behaviour resulting from modifying a constant string is
undefined. A segmentation fault is one possible result. The absence
of a segmentation fault is another possible result. And the
destruction of Rome by fire is another possible result.

It would be clearer to use the term "string literal" rather than
"constant string".
I am confused profoundly.
I always thought that where the string literals are stored (RO vs. RW)
is implementation-defined
Yes.

(and decent compilers would allow me to
choose my way with a command-line switch).

That's debatable. I don't see much advantage in allowing string
literals to be modifiable (except *maybe* to handle old and broken
code).
However, the /type/ of (a pointer to) a string literal is char *,
regardless of the switch, or so I read the standard a while ago.
Yes.

So the statement
*a='a';
must compile OK *without diagnostics* and then cause or not cause
undefined behavior depending on implementation-defined behavior.

The following:
char *a = "hello";
*a = 'a';
(assuming it appears in an appropriate context) is legal (it violates
no syntax rules or constraints), and a conforming compiler must accept
it. But, as always, a compiler is free to issue any diagnostics it
likes. The standard requires diagnostics in certain cases; it never
forbids them.

No, a conforming compiler is not required to accept it, although I
don't know of any that will not. If the compiler can determine, at
compile time, that a statement or expression producing undefined
behavior will be executed by all possible paths through the program,
it is free to do anything at all at compile time.

For example:

#include <stdlib.h>
#include <time.h>

int main(void)
{
    char *a = "I'm a string literal";
    srand(time(0));
    if (rand() > (RAND_MAX / 2))
    {
        *a = 'a';
    }
    return 0;
}

A compiler must translate the program above.

However:

int main(void)
{
    char *a = "I'm a string literal";
    *a = 'a';
    return 0;
}

...a compiler is not required to translate the second form.

The really interesting question is if the call to srand() is omitted
from the first example. Is a compiler allowed to "know" that its
version of rand() will return a value greater than RAND_MAX / 2 with
default initialization, equivalent to srand(1)?
If the initialization is executed, it invokes undefined behavior. The
       ^^^^^^^^^^^^^^
ITYM assignment.
 
Jack Klein

Keith said:
Ark said:
Richard Heathfield wrote: [...]
The behaviour resulting from modifying a constant string is
undefined. A segmentation fault is one possible result. The absence
of a segmentation fault is another possible result. And the
destruction of Rome by fire is another possible result.

It would be clearer to use the term "string literal" rather than
"constant string".
I am confused profoundly.
I always thought that where the string literals are stored (RO vs. RW)
is implementation-defined
Yes.

(and decent compilers would allow me to
choose my way with a command-line switch).

That's debatable. I don't see much advantage in allowing string
literals to be modifiable (except *maybe* to handle old and broken
code).
However, the /type/ of (a pointer to) a string literal is char *,
regardless of the switch, or so I read the standard a while ago.
Yes.

So the statement
*a='a';
must compile OK *without diagnostics* and then cause or not cause
undefined behavior depending on implementation-defined behavior.

The following:
char *a = "hello";
*a = 'a';
(assuming it appears in an appropriate context) is legal (it violates
no syntax rules or constraints), and a conforming compiler must accept
it. But, as always, a compiler is free to issue any diagnostics it
likes. The standard requires diagnostics in certain cases; it never
forbids them.

If the initialization is executed, it invokes undefined behavior. The
undefined behavior is unconditional, though the effects of the
undefined behavior can be literally anything. There is no
implementation-defined behavior involved (implementation-defined
behavior must be documented by the implementation, and there is no
documentation requirement here).
That's exactly where my comprehension fails me.
After
char *a = "hello";
the pointer /is/ initialized, and if, as Keith writes,
*a = 'a';
produces the UB unconditionally, it means that the initialization of the
pointer is unconditionally bad (for the type), isn't it? There must be a
reason (like "old broken code"? or something else?) why the type of
"hello" is not const char *.

The simple fact is that string literals existed in the early C
language long before the const keyword appeared. So sufficiently old
code that assigned the address of a string literal to a plain old
ordinary pointer to char is not necessarily "broken", it was the only
character pointer type available at the time.

Having the const keyword available officially now for almost 17 years
does make it easier to avoid accidental errors, if it is used
properly. Attempting to write through a "pointer to const type" is a
constraint violation requiring a diagnostic.
OK, I can drill this case down my brain, but this leaves the following
question:
What are (all) legal initializations of char *a such that assigning to
*a is UB-free?

I'm too lazy to think hard about it right now, but assigning the
address of a modifiable array and using dynamic allocation come to
mind, without getting into type punning.

char ok [] = "hello";
char *a = ok;

...results in a pointing to characters that can be modified.
 
Jack Klein

(e-mail address removed) posted:



This is a definition of a non-const pointer to a non-const char. It also
initialises the pointer to the address of a string literal (which is
ill-advised).




The following two programs are equivalent:

No they are not. The type of a string literal in C is "array of
char", and most specifically not "array of const char".
/* Program 1 */

int main(void)
{
    char const *p = "Hello";
    return 0;
}

/* Program 2 */

char const str_literal1[] = {'H','e','l','l','o',0};

#define LITERAL1 (*(char(*)[sizeof str_literal1])&str_literal1)

int main(void)
{
    char const *p = LITERAL1;
    return 0;
}
 
Keith Thompson

Ark said:
Keith Thompson wrote: [...]
The following:
char *a = "hello";
*a = 'a';
(assuming it appears in an appropriate context) is legal (it violates
no syntax rules or constraints), and a conforming compiler must accept
it. But, as always, a compiler is free to issue any diagnostics it
likes. The standard requires diagnostics in certain cases; it never
forbids them.
If the initialization is executed, it invokes undefined behavior.
The undefined behavior is unconditional, though the effects of the
undefined behavior can be literally anything. There is no
implementation-defined behavior involved (implementation-defined
behavior must be documented by the implementation, and there is no
documentation requirement here).
That's exactly where my comprehension fails me.
After
char *a = "hello";
the pointer /is/ initialized, and if, as Keith writes,
*a = 'a';
produces the UB unconditionally, it means that the initialization of
the pointer is unconditionally bad (for the type), isn't it?

No, it isn't, but it's a bad idea.

Initializing a char* object ("a" in this case) to point to the first
character of a string literal is perfectly legal. For example,
reading the elements of the array through the pointer will work just
fine. Undefined behavior occurs only if you try to *modify* elements
of the array.
There must be a reason (like "old broken code"? or something else?) why
the type of "hello" is not const char *.

It's to avoid breaking old code that may have been written before
"const" was introduced to the language (a *long* time ago). For example:

#include <stdio.h>

void print_string(char *s)
{
    printf("print_string(\"%s\")\n", s);
}

int main(void)
{
    char *message = "hello";
    print_string(message);
    return 0;
}

In old versions of the C language, before "const" was introduced, this
kind of thing was common. The language didn't provide a way to have
the compiler warn you if you tried to modify something that shouldn't
be modified.

Once "const" was introduced, it might have made sense to make string
literals const, but it would have broken existing code, which was
considered unacceptable. The alternative would have required all the
existing code to be modified by adding "const" qualifiers -- which
would have meant it would fail to compile under old compilers. It was
considered too high a price to pay.
OK, I can drill this case down my brain, but this leaves the following
question:
What are (all) legal initializations of char *a such that assigning to
*a is UB-free?

There are infinitely many such initializations. As long as a points
to modifiable memory, you can modify it.

Here's one example:

char str[] = "hello";
char *s = str;

The first line creates str as a non-const array. The second
initializes s to point to the first character of the array.
 
Keith Thompson

Jack Klein said:
in comp.lang.c: [...]
The following:
char *a = "hello";
*a = 'a';
(assuming it appears in an appropriate context) is legal (it violates
no syntax rules or constraints), and a conforming compiler must accept
it. But, as always, a compiler is free to issue any diagnostics it
likes. The standard requires diagnostics in certain cases; it never
forbids them.

No, a conforming compiler is not required to accept it, although I
don't know of any that will not. If the compiler can determine, at
compile time, that a statement or expression producing undefined
behavior will be executed by all possible paths through the program,
it is free to do anything at all at compile time.

You're right.

[snip]
^^^^^^^^^^^^^^
ITYM assignment.

Yes, thanks.
 
Frederick Gotham

Jack Klein posted:
No they are not. The type of a string literal in C is "array of
char", and most specifically not "array of const char".


Hence the macro which casts away the constness. (The actual underlying array
is defined as const to reflect that the behaviour is undefined to modify a
string literal.)
 
