... "the [contents of a] string [literal] may be stored in read-only
memory" ... [so] where is the area? Who decides the area?
I mean, does the compiler tell someone (the OS?) something like
"please let this program use this memory area as
const"? Or does the OS decide, like "Oh, you, the
program, I'll keep the string in a safe area
so that you can't modify it later"?
The answer is (not surprisingly, if you think about it)
implementation-dependent.
What happens if there is no operating system at all? In this
case, the compiler is the *only* entity involved, so it must be
the compiler that decides.
On the other hand, suppose there is a strict operating system,
in which programs -- including compilers -- must beg and plead,
as it were, for every resource? In this case, *only* the OS
can create read-only regions containing "precooked" data (such
as the characters in the string). The compiler can ask, but
the OS decides.
One thing is clear enough, though: the compiler has to at least
ask, in some fashion or another. Suppose the OS (assuming one
exists) is simply presented with "here is a bunch of data", e.g.,
the contents of both of these arrays:
    char modifiable[] = "hello";
    const char unmodifiable[] = "world";
so that the OS sees an undifferentiated sequence of data:
    hello\0world\0
How will this OS determine which of these is supposed to be read-only?
In other words, does the compiler know where the
"read-only memory" is, or does only the OS know where it is?
(Is only the OS able to decide where it is?)
Again, this is implementation-dependent.
And where is the area actually? Is this area the
so-called "heap"?
The term "heap" is used for (at least) two incompatible purposes:
a data structure (see, e.g., <http://c2.com/cgi/wiki?HeapDataStructure>),
and what the C99 standard refers to as "allocated storage" -- memory
managed via malloc() and free(). (The C++ standard has a different,
and I think better, term for the latter.)
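To make the distinction concrete, here is a minimal sketch of "allocated storage" in the malloc()/free() sense. The helper name copy_to_allocated is made up for this illustration. It copies the characters of a string (possibly a literal) into storage obtained from malloc(), and that copy, unlike the literal itself, may be modified freely.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from the original post): copy a string into
   allocated storage.  Returns NULL if malloc() fails; otherwise the
   caller owns the result and must free() it. */
char *copy_to_allocated(const char *src)
{
    size_t n = strlen(src) + 1;   /* count the terminating '\0' too */
    char *copy = malloc(n);

    if (copy != NULL)
        memcpy(copy, src, n);
    return copy;
}
```

A caller might write char *s = copy_to_allocated("hello"); and then change s[0], which is fine because s points into allocated storage, not into the literal.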
There are at least three (or more, depending on how you count)
different ways that C strings are commonly implemented, depending
on OS (if any) and compiler and object-file format. None of them
are called "heap", at least, not unless you want to confuse other
people.
One method is to have, in the object file (".o" or ".OBJ", in many
cases) format, a section or region-type-marker called a "read-only
data area" or "read-only data segment" or something along those
lines. All read-only data is marked this way, including the contents
of string literals that are not used to initialize read/write data.
(A short name for this is "rodata" or "the rodata section".)
Another method is to have a special "strings" section. String
literals are placed in a strings section, and identical string
literals in separate files can then be coalesced. (If string
literal contents are in ordinary rodata sections, it becomes more
difficult to merge them across separate object files -- "translation
units", in C-Standard-ese. In particular, by having a separate
"strings" section, there is no longer any need to mark particular
objects as "must be unique". [C requires that &a != &b, even if
a and b are both const char arrays containing the same text.])
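The uniqueness rule can be checked with a small sketch. The function names here are invented for the example. Two named const arrays are distinct objects, so their addresses must differ; two identical string literals, on the other hand, may or may not share storage, so the second function's result depends on the implementation.

```c
static const char a[] = "same text";
static const char b[] = "same text";

/* Always returns 1: a and b are distinct objects, so the addresses of
   their first elements must compare unequal. */
int arrays_are_distinct(void)
{
    return &a[0] != &b[0];
}

/* Returns 0 or 1: whether identical string literals occupy the same
   storage is up to the implementation. */
int literals_are_shared(void)
{
    const char *p = "same text";
    const char *q = "same text";

    return p == q;
}
```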
A third method is to put strings into the "text" (read-only,
code-only) section, and rely on the fact that code happens to be
readable as data on the system in question.
A fourth method is simply to allow string literals to be write-able.
In some cases, the object file format might allow for separate
read-only data and/or string sections, but the executable file
format might not. In this case, a compiler could move the rodata
back into either the text or the data (as desired).
Similarly, for OS-less systems, the final executable may be loaded
into some kind of ROM (PROM, EEPROM, flash memory, etc.). Typically
*all* text *and* data segments must be stored in some sort of
nonvolatile memory, with initialized-data copied to RAM by some
startup code. Here rodata can be left in the ROM, rather than
copied to (possibly precious) RAM (although RAM has gotten awfully
cheap -- the days of shaving a few bucks off the price of a TRS-80
by leaving out one 21L02 chip are long gone...).
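As a rough sketch of that startup copy, simulated with ordinary arrays so it can run on a hosted system: the names rom_data_image, ram_data, copy_initialized_data and ram_data_start are all hypothetical. A real embedded toolchain supplies the region addresses through its own linker symbols, which differ from one toolchain to the next.

```c
#include <string.h>

/* Hypothetical stand-in for the initial values of read/write data, as
   stored in nonvolatile memory. */
static const unsigned char rom_data_image[] = { 'h', 'e', 'l', 'l', 'o', '\0' };

/* Hypothetical stand-in for the RAM region that holds that data at run
   time. */
static unsigned char ram_data[sizeof rom_data_image];

/* What startup code does, in miniature, before main() runs: copy the
   initialized data from ROM to RAM.  Rodata needs no such copy. */
void copy_initialized_data(void)
{
    memcpy(ram_data, rom_data_image, sizeof rom_data_image);
}

/* Accessor so that callers can inspect or modify the RAM copy. */
unsigned char *ram_data_start(void)
{
    return ram_data;
}
```

After the copy, the program works only with the RAM copy; the image in ROM is never written.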
Note that if string literals and other rodata are in a ROM, and
the OS-less or tiny-OS system is run on a device without memory
protection, attempts to overwrite this data simply fail silently:
    char *p = "hello";
    strcpy(p, "world"); /* ERROR */
    printf("result: %s\n", p);
prints "result: hello", because each attempt to overwrite the
contents of the ROM was completely ignored in hardware.
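For contrast, here is a sketch of the well-defined way to get modifiable text: initialize an array from the literal instead of pointing at the literal. The function name overwrite_copy and its parameters are made up for this example.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: formats "result: <text>" into the caller's
   buffer after overwriting a modifiable copy of "hello". */
void overwrite_copy(char *out, size_t outsize)
{
    char buf[] = "hello";    /* an array, initialized from the literal */

    strcpy(buf, "world");    /* OK: buf is ordinary read/write storage,
                                and "world" fits in its 6 bytes */
    snprintf(out, outsize, "result: %s", buf);
}
```

Here the array buf is an object of its own, so overwriting it is fine on every implementation, ROM or no ROM, and the formatted text is "result: world".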
All that the C standard says is that attempts to overwrite string
literals produce undefined behavior. Actual behavior varies, but
tends to be one of these three: "segmentation fault - core dumped"
(or local equivalent), "attempt ignored", or "literal overwritten".