Why do I get a segmentation fault when copying a character?

Cocy

Hi,
This might be a sort of FAQ, but I don't see why it happens,
so could someone help me understand what's wrong?

I've just written the following code, which should trim
the whitespace from a (given) string.
But it results in a segmentation fault, and the same happens
when running it in gdb (which says "Program received signal
SIGSEGV, Segmentation fault" at *p++ = *st++).
The platform is Linux kernel 2.4.27, gcc version
2.95.4 20011002.

/*-------------------------------------*/
#include <stdio.h>
#include <ctype.h>

int main(int argc, char ** argv)
{
    char *st = "Hey, how are you?";
    char *p, *s;

    p = s = st;

    while( *st )
    {
        if ( isspace( (int)*st ) )
            st++;
        else
            *p++ = *st++;
    }
    *p = '\0';

    printf("whitespace trimmed : %s\n", s);
    return 0;
}

The compiler itself doesn't complain about anything, and I don't
see what's wrong with copying a character in the
string to another place (in memory), which should never
go past the end ('\0') of the original
string.
Could someone help me understand what's wrong with
this code?

Thanks and Best Regards,
Cocy
 
Christopher Benson-Manica

Cocy said:
char *st = "Hey, how are you?";
char *p, *s;
p = s = st;

p, s, and st all point at the same string literal. You may not
attempt to modify a string literal; this is most likely the source of
your troubles.

<ot>You can pass an option to gcc to make the code work as is.</ot>
 
Mark A. Odell

(e-mail address removed) (Cocy) wrote in

char *st = "Hey, how are you?";

If you defined this as it really is:

const char *st = "Hey, I cannot be modified!";

the reason should jump out at you (hint: note the 'const' keyword).
 
Cocy

Dear Christopher, Mark, and Brian.

Thank you very much for your comments; I really appreciate them.
I got it, and I have a lot to learn.

Thank you again!
Cocy
 
Richard Bos

Christopher Benson-Manica said:
p, s, and st all point at the same string literal.

<ot>You can pass an option to gcc to make the code work as is.</ot>

But this is a very bad idea, because it doesn't get rid of the bug.

Richard
 
Jack Klein

(e-mail address removed) (Cocy) wrote in



If you defined this as it really is:

const char *st = "Hey, I cannot be modified!";

Bad example. In C, a pointer to string literal has the type 'pointer
to char', and specifically does not have the type 'pointer to const
char'. String literals may or may not be const. Attempting to modify
a string literal in C produces undefined behavior because the standard
specifically says so, not because the type of the string literal is
'array of const char'.

the reason should jump out at you (hint: note the 'const' keyword).

Nope, just plain wrong. There is absolutely no const keyword implied
in any way for string literals in C. You should not modify them
because the standard says you should not, not because they are const
qualified.
 
CBFalconer

Jack said:
.... snip ...


Nope, just plain wrong. There is absolutely no const keyword
implied in any way for string literals in C. You should not modify
them because the standard says you should not, not because they are
const qualified.

However you are well advised to declare pointers to those strings
as const anyhow, because then you will probably get a diagnostic
from the compiler if you misuse them. gcc has -Wwrite-strings
available to do it automatically.
 
Cocy

Dear folks,

Thank you for your comments.

OK, I understand that I can't safely modify the
string after I've initialized it, because it's
"const".
The page
http://www.eskimo.com/~scs/C-faq/q1.32.html
says something like "the string may be stored in read-only
memory", and I suppose the expression
"read-only memory" means that the string
may be stored in "a memory area which is
designated as read-only".

If that is correct (I don't care whether it's correct
or not), where is the area? Who decides the area?
I mean, does the compiler tell someone (the OS?) something like
"please let this program use this memory area as
const"? Or does the OS decide, like "Oh, you, the
program, I'll keep the string in a safe area
so that you can't modify it later"?
In other words, does the compiler know where the
"read-only memory" is, or does only the OS know
(is only the OS able to decide where it is)?

And where is the area actually? Is this area the
so-called "heap"?
I guess I have a lot to learn to fully understand
my own question. Is there any source to learn about
these things? (Please don't say "take a class in
school" :) Does the assembly code tell me these
things?

Thanks and Best Regards,
Cocy
 
Jonathan Adams

Cocy said:
... "the string may be stored in read-only memory" ...
where is the area? Who decides the area?
I mean, does the compiler tell someone (the OS?) something like
"please let this program use this memory area as
const"? Or does the OS decide, like "Oh, you, the
program, I'll keep the string in a safe area
so that you can't modify it later"?
In other words, does the compiler know where the
"read-only memory" is, or does only the OS know
(is only the OS able to decide where it is)?

Generally, the compiler and linker have almost complete control over
this, and merely leave instructions for the OS and dynamic linker to
follow.

And where is the area actually? Is this area the
so-called "heap"?

On UNIX, it's typically the "read-only data section", or ".rodata".
During linking, this will usually be merged (along with ".init",
".fini", and any number of other read-only sections) with the ".text"
section into a single loadable segment, described by a "program
header", covering all of the read-only loadable sections. You can
view this by using (on Linux):

% objdump -x /path/to/binary

or, (on Solaris):

% elfdump /path/to/binary

Look for "Program Header" and "Section", and start lining things up.

I guess I have a lot to learn to fully understand
my own question. Is there any source to learn about
these things? (Please don't say "take a class in
school" :) Does the assembly code tell me these
things?

It's more the linker -- _Linkers and Loaders_ by John R. Levine (ISBN
1558604960) is the best book I've run across for these sorts of details.
You could also google for documentation on ELF (Executable and Linkable
Format), which is the UNIX standard format.

Cheers,
- jonathan
 
Chris Torek

... "the [contents of a] string [literal] may be stored in read-only
memory" ... [so] where is the area? Who decides the area?
I mean, does the compiler tell someone (the OS?) something like
"please let this program use this memory area as
const"? Or does the OS decide, like "Oh, you, the
program, I'll keep the string in a safe area
so that you can't modify it later"?

The answer is (not surprisingly, if you think about it) implementation
dependent.

What happens if there is no operating system at all? In this
case, the compiler is the *only* entity involved, so it must be
the compiler that decides.

On the other hand, suppose there is a strict operating system,
in which programs -- including compilers -- must beg and plead,
as it were, for every resource? In this case, *only* the OS
can create read-only regions containing "precooked" data (such
as the characters in the string). The compiler can ask, but
the OS decides.

One thing is clear enough, though: the compiler has to at least
ask, in some fashion or another. Suppose the OS (assuming one
exists) is simply presented with "here is a bunch of data", e.g.,
the contents of both of these arrays:

char modifiable[] = "hello";
const char unmodifiable[] = "world";

so that the OS sees an undifferentiated sequence of data:

hello\0world\0

How will this OS determine which of these is supposed to be read-only?

In other words, does the compiler know where the
"read-only memory" is, or does only the OS know
(is only the OS able to decide where it is)?

Again, this is implementation-dependent.

And where is the area actually? Is this area the
so-called "heap"?

The term "heap" is used for (at least) two incompatible purposes:
a data structure (see, e.g., <http://c2.com/cgi/wiki?HeapDataStructure>),
and what the C99 standard refers to as "allocated storage" -- memory
managed via malloc() and free(). (The C++ standard has a different,
and I think better, term for the latter.)

There are at least three (or more, depending on how you count)
different ways that C strings are commonly implemented, depending
on OS (if any) and compiler and object-file format. None of them
are called "heap", at least, not unless you want to confuse other
people :) .

One method is to have, in the object file (".o" or ".OBJ", in many
cases) format, a section or region-type-marker called a "read-only
data area" or "read-only data segment" or something along those
lines. All read-only data is marked this way, including the contents
of string literals that are not used to initialize read/write data.
(A short name for this is "rodata" or "the rodata section".)

Another method is to have a special "strings" section. String
literals are placed in a strings section, and identical string
literals in separate files can then be coalesced. (If string
literal contents are in ordinary rodata sections, it becomes more
difficult to merge them across separate object files -- "translation
units", in C-Standard-ese. In particular, by having a separate
"strings" section, there is no longer any need to mark particular
objects as "must be unique". [C requires that &a != &b, even if
a and b are both const char arrays containing the same text.])

A third method is to put strings into the "text" (read-only,
code-only) section, and rely on the fact that code happens to be
readable as data on the system in question.

A fourth method is simply to allow string literals to be write-able.

In some cases, the object file format might allow for separate
read-only data and/or string sections, but the executable file
format might not. In this case, a compiler could move the rodata
back into either the text or the data (as desired).

Similarly, for OS-less systems, the final executable may be loaded
into some kind of ROM (PROM, EEPROM, flash memory, etc.). Typically
*all* text *and* data segments must be stored in some sort of
nonvolatile memory, with initialized-data copied to RAM by some
startup code. Here rodata can be left in the ROM, rather than
copied to (possibly precious) RAM (although RAM has gotten awfully
cheap -- the days of shaving a few bucks off the price of a TRS-80
by leaving out one 21L02 chip are long gone...).

Note that if string literals and other rodata are in a ROM, and
the OS-less or tiny-OS system is run on a device without memory
protection, attempts to overwrite this data simply fail silently:

char *p = "hello";
strcpy(p, "world"); /* ERROR */
printf("result: %s\n", p);

prints "result: hello", because each attempt to overwrite the
contents of the ROM was completely ignored in hardware.

All that the C standard says is that attempts to overwrite string
literals produce undefined behavior. Actual behavior varies, but
tends to be one of these three: "segmentation fault - core dumped"
(or local equivalent), "attempt ignored", or "literal overwritten".
 
