Crazy stuff

Chumbo

If I have this(please bear with me - this is a case of a former java
guy going back to C ;-)

int main () {
char *tokconvert(char*);
char str[] = "dot.delimited.str";
char *result;

result = tokconvert(str);
return 0;
}

char *tokconvert(char *strToConvert) {

char *token;
char *tokDelim =",.";

token=strtok(strToConvert, tokDelim);

while(token != NULL) {
printf("Token -> %s \n", token);
token=strtok(NULL, tokDelim);
}
}

the thing compiles and runs fine, however if I declare char *str =
"dot.delimited.str" (instead of char str[] = "dot.delimited.str") the
whole thing falls on its ass real bad (bus error) - strtok is defined
as char *strtok(char *, const char *) - what's going on?

more craziness: if I declare/initialise

char *str;
str = strdup("dot.delimited.str");

inside tokconvert (instead of main) and pass that to strtok - it runs
fine!!

any thoughts are most welcome..

chumbo.

P.S.
btw this is on FreeBSD 5.1.2 (using gcc 3.3.3)
 
Jason Whitehurst

Chumbo said:
the thing compiles and runs fine, however if I declare char *str =
"dot.delimited.str" (instead of char str[] = "dot.delimited.str") the
whole thing falls on its ass real bad (bus error) - strtok is defined
as char *strtok(char *, const char *) - what's going on?

char *str = "blah";

is allowed to be placed in read-only memory. The data cannot be modified.

char str[], however, must be placed in memory modifiable by you (i.e. the
stack).
 
Jason Whitehurst

Jason said:
is allowed to be placed in read-only memory. The data cannot be
modified.

Err, and in case you don't know, strtok(3) modifies the input string. Thus,
your problem.
 
Dave Vandervies

If I have this(please bear with me - this is a case of a former java
guy going back to C ;-)

int main () {
char *tokconvert(char*);

It will confuse people less if you put function prototypes at file scope
instead of inside functions. (In this particular case, you can avoid
needing the prototype altogether by putting tokconvert before main(),
but that's not always possible or reasonable.)
char str[] = "dot.delimited.str";

This allocates an array of char and populates it with the characters
in the string "dot.delimited.str" (including the terminating '\0').
The array is in the automatic-allocation space (we can call this "the
stack", but we prefer not to, because the function-invocation stack that
it exists in is not the same stack as the "processor's stack segment"
stack (which need not even exist) that people typically assume we're
talking about), so we can do pretty much whatever we want with it (the
relevant bit here is that we can write to it) until the function returns.

char *result;

result = tokconvert(str);

When you pass str (an array) to a function (or do most other things with
it, notable exceptions being applying & or sizeof to it), the array name
decays to a pointer to the array's first element. In this case, this
is exactly what you want - a pointer to the first character of the string.
return 0;
}

char *tokconvert(char *strToConvert) {

char *token;
char *tokDelim =",.";

token=strtok(strToConvert, tokDelim);

while(token != NULL) {
printf("Token -> %s \n", token);
token=strtok(NULL, tokDelim);
}

You're not returning anything here. Your compiler should have warned
you about that.
(I'm not sure what you'd've wanted to return; possibly this is a leftover
from doing something with the tokens other than just printing them?)
}

the thing compiles and runs fine, however if I declare char *str =
"dot.delimited.str" (instead of char str[] = "dot.delimited.str") the
whole thing falls on its ass real bad (bus error) - strtok is defined
as char *strtok(char *, const char *) - what's going on?

When you say `char *str="a string literal"', the string literal (like
any other string literal in the program[1]) refers to an anonymous
not-const-but-not-writeable array of characters containing the string.
In English, that means you're allowed to point a pointer that you're
allowed to write through at it (remember, arrays decay to pointers),
but you're not actually allowed to write to it.

So, you have a pointer pointing at a string literal that you're not
allowed to write to... and then you try to write to it (indirectly,
by passing it to strtok, which writes to its first argument). That's
what's causing your problem. The solution is to Don't Do That, Then:
Allocate writeable space for the string you give strtok as its first
argument, either as an automatic variable (like in your code above)
or as dynamically allocated memory (f'rexample, memory from strdup as
below), and put the string into that.

more craziness: if I declare/initialise

char *str;
str = strdup("dot.delimited.str");

inside tokconvert (instead of main) and pass that to strtok - it runs
fine!!

Note that strdup is a unixism and not part of the C language.[2]

What strdup does is allocate (with malloc) enough memory to hold the
string you give it, copy the string into that memory, and return a
pointer to the copy. This memory is writeable, so giving strtok a
pointer to it isn't a problem, for the same reason that giving it a
pointer to the automatically allocated array isn't a problem.

P.S.
btw this is on FreeBSD 5.1.2 (using gcc 3.3.3)

If I'd needed to know that, then your question would have been
inappropriate for comp.lang.c and would have been better off asked in
a FreeBSD or GCC newsgroup.

But given that you're using GCC: If you'd compiled with:
gcc -W -Wall -ansi -pedantic -O myprog.c
then GCC would have warned you about failing to return a value from
tokconvert, along with a bunch of other warnings about things that you
typically don't want to do. This makes debugging some problems a lot
easier - they go away when the compiler stops warning you about them.


dave

[1] In language-lawyer-ese, the string in the array initialization
`char buf[]="a string"' is an initializer, not a string literal;
this is, as far as I know, the only place you can have a string in
the source code that doesn't represent an anonymous array of char.

[2] It's in the implementation's namespace, though, which means you're
not allowed to define it yourself. The solution in comp.lang.c is
to use my_strdup, which can portably be defined and have the same
behavior as the unix strdup; the usual solution outside comp.lang.c
is to use the library's strdup if it exists, and otherwise to apply
knowledge beyond that found in the language definition to establish
that the programmer is allowed to define a function by that name on
that implementation and do so.
 
Malcolm

Chumbo said:
token=strtok(strToConvert, tokDelim);


the thing compiles and runs fine, however if I declare char *str =
"dot.delimited.str" (instead of char str[] = "dot.delimited.str") the
whole thing falls on its ass real bad (bus error) - strtok is defined
as char *strtok(char *, const char *) - what's going on?

strtok()'s first argument is a char *, and is overwritten to produce the
token (yes, this is a terrible way of doing things; as an ex-Java man you
probably expect something like the Java string tokeniser).
char *str = "My string"; creates a constant string in read-only memory.
char str[] = "My string"; creates a string in read-write memory.

The result of passing a constant string to strtok is undefined.
 
Siddharth Taneja

so I am a little confused...how does the compiler actually go about creating
read-only memory and read-write memory? Hows the distinction based?

Malcolm said:
Chumbo said:
token=strtok(strToConvert, tokDelim);


the thing compiles and runs fine, however if I declare char *str =
"dot.delimited.str" (instead of char str[] = "dot.delimited.str") the
whole thing falls on its ass real bad (bus error) - strtok is defined
as char *strtok(char *, const char *) - what's going on?
strtok()s first argument is a char *, and is overwritten to produce the
token (yes, this is a terrible way of doing things, as an ex-Java man you
probably expect something like the Java string tokeniser).
char *str = "My string"; creates a constant string in read-only memory.
char str[] = "My string"; creates a string in read-write memory.

The result of passing a constant string to strtok is undefined.
 
Mike Wahler

Siddharth Taneja said:
so I am a little confused...how does the compiler actually go about creating
read-only memory and read-write memory?

However it wants.
Hows the distinction based?

Depends upon the compiler. Each one does things its own
way. If you want to know how yours does it, consult
its support resources.

BTW please don't top-post. Thanks.

-Mike
 
Malcolm

Siddharth Taneja said:
so I am a little confused...how does the compiler actually go about creating
read-only memory and read-write memory? Hows the distinction based?

This is one of the problems of C.

On a desktop system, typically the whole program will be loaded into
physical RAM. On many systems, pages may be marked as "read only" or
"executable only", so attempts to write to those pages cause runtime faults.
However on other systems, such as older microcomputers, there was no
mechanism for doing this, so a write to a constant area of memory would
change the contents and cause mysterious malfunctions.

On embedded systems, it is quite common for the executable code and the
constant data to be held in physical ROM. Obviously, no attempt to write to
this can alter the contents.
 
Barry Schwarz

If I have this(please bear with me - this is a case of a former java
guy going back to C ;-)

int main () {
char *tokconvert(char*);
char str[] = "dot.delimited.str";
char *result;

result = tokconvert(str);
return 0;
}

char *tokconvert(char *strToConvert) {

char *token;
char *tokDelim =",.";

This is an example of where using static might have an impact on
performance if the function were called often.

token=strtok(strToConvert, tokDelim);

while(token != NULL) {
printf("Token -> %s \n", token);
token=strtok(NULL, tokDelim);
}
}

How can this function compile fine? It is required to return a
pointer to char yet there is no return statement in the function at
all.
the thing compiles and runs fine, however if I declare char *str =
"dot.delimited.str" (instead of char str[] = "dot.delimited.str") the
whole thing falls on its ass real bad (bus error) - strtok is defined
as char *strtok(char *, const char *) - what's going on?

more craziness: if I declare/initialise

char *str;
str = strdup("dot.delimited.str");

This is not a standard function.
inside tokconvert (instead of main) and pass that to strtok - it runs
fine!!

Others have explained why.


<<Remove the del for email>>
 
Jack Klein

Chumbo said:
token=strtok(strToConvert, tokDelim);


the thing compiles and runs fine, however if I declare char *str =
"dot.delimited.str" (instead of char str[] = "dot.delimited.str") the
whole thing falls on its ass real bad (bus error) - strtok is defined
as char *strtok(char *, const char *) - what's going on?
strtok()s first argument is a char *, and is overwritten to produce the
token (yes, this is a terrible way of doing things, as an ex-Java man you
probably expect something like the Java string tokeniser).

So far, so good.
char *str = "My string"; creates a constant string in read-only memory.

The line above is completely wrong. "My string" is a string literal,
and the C standard states that it has the type array of char.
Specifically NOT array of const char. So it is not a constant string.
Furthermore, C does not define any such thing as "read-only memory".
char str[] = "My string"; creates a string in read-write memory.

The line above creates an array of chars that may be modified,
provided that its bounds are not exceeded. C does not define any such
thing as "read-write memory".
The result of passing a constant string to strtok is undefined.

That is true, but does not apply here. The result of attempting to
modify a string literal is undefined behavior. This is true not
because the type of the string literal is array of const char, but
because the C standard specifically states that it is so.
 
Gordon Burditt

strtok()s first argument is a char *, and is overwritten to produce the
So far, so good.


The line above is completely wrong. "My string" is a string literal,
and the C standard states that it has the type array of char.
Specifically NOT array of const char. So it is not a constant string.
Furthermore, C does not define any such thing as "read-only memory".

C *DOES* define certain memory areas, such as string literals, which
cannot be written on without invoking undefined behavior. This
is something the implementation is free to put in "read-only memory".
(Or it might not, on implementations with no memory protection.)
char str[] = "My string"; creates a string in read-write memory.

The line above creates an array of chars that may be modified,
provided that its bounds are not exceeded. C does not define any such
thing as "read-write memory".

C *DOES* define certain memory areas, such as non-const variables,
that can be written on without invoking undefined behavior.
It is reasonable to conclude that the implementation must put
this in memory that can be read and written, hence "read-write memory".

C does not define a (human) name, address, social security number,
or quantity of currency, but it is still possible to talk about
programs that handle data described like this.

Gordon L. Burditt
 
Endymion Ponsonby-Withermoor III

Chumbo said:
the thing compiles and runs fine, however if I declare char *str =
"dot.delimited.str" (instead of char str[] = "dot.delimited.str") the
whole thing falls on its ass real bad (bus error) - strtok is defined
as char *strtok(char *, const char *) - what's going on?

This is because doing the declaration as
char *str = "dot.delimited.str";
embeds a CONSTANT string somewhere, and then sets the pointer to
point to it. You can not WRITE to constant strings. (Well, you shouldn't
and shouldn't be able to, but some old C implementations allowed this).

Doing it as
char str[]="dot.delimited.str";
creates a character array on the stack, of exactly the right
length, and initialises it with the given string (by something
similar to strcpy()). There are thus TWO copies of "dot.delimited.str":
one in the CONSTANT STRING space, and a second in the data space. You
can freely write the latter, but not the former.

Pointers and arrays are not interchangeable.

Richard [in PE12]
 
Flash Gordon

Chumbo said:
the thing compiles and runs fine, however if I declare char *str =
"dot.delimited.str" (instead of char str[] = "dot.delimited.str")
the whole thing falls on its ass real bad (bus error) - strtok is
defined as char *strtok(char *, const char *) - what's going on?

This is because doing the declaration as
char *str = "dot.delimited.str";
embeds a CONSTANT string somewhere, and then sets the pointer to
point to it. You can not WRITE to constant strings. (Well, you
shouldn't and shouldn't be able to, but some old C implementations
allowed this).

The C standard says it is not const. The reason you should not write to
it is that the C standard says that writing to a string literal invokes
undefined behaviour. This means that the compiler is allowed to let you
write to it, and one common effect of doing this would be to also change
the string literal "blah blah dot.delimited.str".

Doing it as
char str[]="dot.delimited.str";
creates a character array on the stack,

1) C does not require a stack.
2) that line could be placed outside of any function definitions in
which case the string is unlikely to be stored on the stack if the
machine has a stack.
of exactly the right
length, and initialises it with the given string
True.

(by something
similar to strcpy()). There are thus TWO copies of
"dot.delimited.str": one in the CONSTANT STRING space, and a second in
the data space. You can freely write the latter, but not the former.

Possibly true if it is an automatic array, definitely not always true
if it is not an automatic array.

All you know is that the object gets initialised, this could be done
with code equivalent to
str[0]='d';
str[1]='o';
...

For a non-automatic array, other options are available.
Pointers and arrays are not interchangeable.

True.
 
Old Wolf

Barry Schwarz said:
How can this function compile fine? It is required to return a
pointer to char yet there is no return statement in the function at
all.

It must compile fine. There is only undefined behaviour if
control actually reaches the end of the function at runtime.
 
Jack Klein

C *DOES* define certain memory areas, such as string literals, which
cannot be written on without invoking undefined behavior. This
is something the implementation is free to put in "read-only memory".
(Or it might not, on implementations with no memory protection.)

C does not define "memory areas" at all. Modifying a string literal,
or modifying any object defined with the const qualifier, is undefined
behavior. There is not even a requirement that such an operation
fails, merely that the behavior is undefined if the attempt is made.
char str[] = "My string"; creates a string in read-write memory.

The line above creates an array of chars that may be modified,
provided that its bounds are not exceeded. C does not define any such
thing as "read-write memory".

C *DOES* define certain memory areas, such as non-const variables,
that can be written on without invoking undefined behavior.
It is reasonable to conclude that the implementation must put
this in memory that can be read and written, hence "read-write memory".

C still does not define "memory areas" at all. It defines objects
that may be freely modified by a program. Almost all of them, in
fact, other than those defined const and string literals. Nowhere
does the standard mention that these must be stored in a special
"memory area". It also doesn't specify whether any particular
modifiable object is in SRAM, DRAM, or virtual memory that might be
swapped out to a page file at any particular time.
C does not define a (human) name, address, social security number,
or quantity of currency, but it is still possible to talk about
programs that handle data described like this.

It is indeed quite possible for a C program to use data objects as
representations of "real world" concepts.

But it is indeed quite impossible to force a C implementation to
provide "read-only memory" just because you define a string literal.
Gordon L. Burditt

The poster to whom I replied wrote a sentence that contained two very
specific factual errors, directly in contradiction to the C language
standard:
char *str = "My string"; creates a constant string in read-only memory.

The two errors are:

1. The type of "My string" in the snippet is 'array of char', most
specifically not 'array of constant char'. See 6.4.5 P5.

2. The standard states that an implementation may (note MAY) place
certain const objects and string literals in "a read-only region of
storage" (footnote 112). Note that footnotes are not normative, and
the term "read-only" is not specifically defined in the standard.
Most certainly, there is no requirement or guarantee that "My string"
in the snippet above will be placed in any sort of specially qualified
memory.

If you think either of my corrections is inaccurate, kindly quote
chapter and verse from the standard to contradict them.
 
Endymion Ponsonby-Withermoor III

Flash Gordon said:
The C standard says it is not const. The reason you should not write to
it is that the C standard says that writing to a string literal invokes
undefined behaviour.

OK. I sit corrected.
write to it, and one common effect of doing this would be to also change
the string literal "blah blah dot.delimited.str"
Yes. That is because "dot.delimited.str" can be duplicate-merged with any
quoted-string that ends with it.


Richard [in PE12]
 
Malcolm

Jack Klein said:
Furthermore, C does not define any such thing as "read-only memory".
C does not define any such thing as "read-write memory".

That is true, but does not apply here. The result of attempting to
modify a string literal is undefined behavior. This is true not
because the type of the string literal is array of const char, but
because the C standard specifically states that it is so.
So C defines various terms, but how does it define them? In the Standard,
which is an English-language document. So sometimes C will also define the
words used to define these terms.
See the problem?
Ultimately we have to use terms like "read-only memory", which are not
defined by the C standard, but which have meanings given to them by usage.

If the OP doesn't understand why an attempt to modify a string literal
causes a crash, then it's unlikely that he will be familiar with the term
"string literal". He may also be shaky on "undefined behaviour". So a
technically more accurate explanation is in fact more confusing.
 
Barry Schwarz

It must compile fine. There is only undefined behaviour if
control actually reaches the end of the function at runtime.

I couldn't find in the standard where the missing return requires a
diagnostic. All the compilers I have used did provide one so I guess
I've been spoiled.


<<Remove the del for email>>
 
Lawrence Kirby

On 8 Nov 2004 16:13:46 -0800, (e-mail address removed) (Old Wolf) wrote:
....


I couldn't find in the standard where the missing return requires a
diagnostic. All the compilers I have used did provide one so I guess
I've been spoiled.

It doesn't. I guess this goes back to the days when C didn't have a void
type.

Also just reaching the end of the function doesn't invoke undefined
behaviour, in the words of C99 6.9.1:

"If the } that terminates a function is reached, and the value of the
function call is used by the caller, the behavior is undefined."

In C90 you can also use return; i.e. with no expression in a function with
a non-void return type. C99 makes this a constraint violation but still
allows falling off the end of the function.
 
Dave Thompson


<snip: modification, namely strtok'enizing, of string literal value vs
array initialized to contain string>

FAQ 1.32, 16.6 at the usual places and
http://www.eskimo.com/~scs/C-faq/top.html
char str[] = "dot.delimited.str";

This allocates an array of char and populates it with the characters
in the string "dot.delimited.str" (including the terminating '\0').
The array is in the automatic-allocation space (we can call this "the
stack", but we prefer not to, because the function-invocation stack that
it exists in is not the same stack as the "processor's stack segment"
stack (which need not even exist) that people typically assume we're
talking about), and therefore that we can do pretty much whatever we

Huh? Unless I completely misunderstand what you are saying:
while the C standard does not specify implementation techniques, and
so can't topically rely on "the stack", on every machine I know of
that has a "processor stack segment", or indeed just a "processor
(memory) stack" anywhere, it was designed to be and in fact is used
for C (and other HLL) function-invocation frames in a stack fashion.

<snip rest>

- David.Thompson1 at worldnet.att.net
 
