Constant strings

BartC

As far as I can gather from my experiment below, a string constant in source
code has a 'char*' type, not 'const char*'. Why is that?

Here, I can't get a compiler to complain about passing a string constant as
a char* parameter where it is clearly going to be modified. But it does
complain about the q=p line, which does the same thing. The q="Bart" line
shows up the issue more simply:

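char* change_initial(char* s,char c){
    *s=c;
    return s;
}

int main (void) {
    const char *p;
    char* q;

    change_initial("Bart",'C');  /* no diagnostic by default, yet this modifies a string literal */

    q=p;       /* diagnosed: assignment discards the 'const' qualifier */
    q="Bart";  /* no diagnostic by default */
    *q='C';    /* undefined behavior: modifies a string literal */
}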
 
G G

As far as I can gather from my experiment below, a string constant in source
char* change_initial(char* s,char c)
{
*s=c;
return s;
}

int main (void)
{
const char *p;
char* q;

change_initial("Bart",'C');

q=p;

q="Bart";

*q='C';
}
I'm learning, but I have a question about the code. (That's the alert: I'll be no real help, sorry.)

The program calls a function,

change_initial("Bart",'C');

change_initial() is declared to return a char *, but the return value is not assigned to anything.

So, not looking at intent, I guess: q is first asked to point to the same place as p, which is a const char *.
Then q is asked to point to a character string. q is not declared const, so shouldn't it be allowed to change and point to another char *?

Another question, please: in practice, when a const char * is declared, should it not also be initialized?
Or does it really not make a difference, because when it is later assigned, it can't be changed?

With my compiler, it does issue a warning.

Thanks, everyone, for taking the time to teach a bit.

g.
 
James Kuyper

As far as I can gather from my experiment below, a string constant in source
code has a 'char*' type, not 'const char*'. Why is that?

That's not quite correct. A string literal actually has the type "array
of char of length n", where n is the number of characters in the string
literal plus 1 for the terminating null character. However, in most
contexts, lvalue expressions of array type get automatically converted
into a pointer to the first element of the array, giving the impression
that string literals have the type "char*". The only two contexts where
that is not the case are sizeof("string"), which is equivalent to
sizeof(char[7]), and

char array[] = "array_initializer";

which would be a constraint violation if the string literal actually had
the type "char*".

The fact that the type is not "array of const char of length n", like
many other poorly designed features of C, was the result of the fact
that it was not all designed at the same time, combined with the need
for backwards compatibility. 'const' was not added to C until long after
string literals were, and giving string literals that type would have
broken an unacceptably large amount of existing code. I don't think
there's anyone who'd recommend copying this feature in a new language
that didn't require backwards compatibility with C.

Even C++, for which compatibility with C was a major design goal,
corrected this one error. There really was no choice about that: with
the correct overloaded function being automatically chosen based upon
the argument's type, C++ couldn't afford to allow string literals to
have the wrong type.
 
Keith Thompson

James Kuyper said:
As far as I can gather from my experiment below, a string constant in source
code has a 'char*' type, not 'const char*'. Why is that?

That's not quite correct. A string literal actually has the type "array
of char of length n", where n is the number of characters in the string
literal plus 1 for the terminating null character. However, in most
contexts, lvalue expressions of array type get automatically converted
into a pointer to the first element of the array, giving the impression
that string literals have the type "char*". The only two contexts where
that is not the case are sizeof("string"), which is equivalent to
sizeof(char[7]), and

char array[] = "array_initializer";

which would be a constraint violation if the string literal actually had
the type "char*".
[...]

There's a third context: &"hello" is a pointer value of type char(*)[6]
(pointer to array 6 of char).
 
BartC

The fact that the type is not "array of const char of length n", like
many other poorly designed features of C, was the result of the fact
that it was not all designed at the same time, combined with the need
for backwards compatibility. 'const' was not added to C until long after
string literals were, and giving string literals that type would have
broken an unacceptably large amount of existing code. I don't think
there's anyone who'd recommend copying this feature in a new language
that didn't require backwards compatibility with C.

OK, but I would have expected some warning at least to have been added over
the last few decades, considering the number of minor matters that compilers
do pounce on.

I've used gcc -Wall -Wpedantic -Wextra, and not a peep out of it!
(Apparently, -Wwrite-strings is needed to enable the warning; I only
discovered that by running g++ which does warn.)

(I'm not that concerned, just wondering how seriously compilers take the
issue of const qualifiers. Because if I can write:

char* q;

q="ABC";
*q='X'; /* Crashes in Windows and Linux */

with nothing at all emitted by the compiler unless I go considerably out of
my way, then the answer seems to be not very. It just seems a gaping
loop-hole in the const-qualifier system.)
 
Keith Thompson

BartC said:
As far as I can gather from my experiment below, a string constant in source
code has a 'char*' type, not 'const char*'. Why is that?

Here, I can't get a compiler to complain about passing a string constant as
a char* parameter where it is clearly going to be modified.

If you happen to be using gcc, the "-Wwrite-strings" option causes
string literals to be treated as if they were const. This causes gcc to
be non-conforming if you use it along with "-pedantic-errors", but it's
useful for detecting certain potential errors.

Why is it non-conforming? Because this program:

#include <stdio.h>

void print(char *s) { /* "const char *s" would be better */
    puts(s);
}

int main(void) {
    print("hello");
}

is perfectly legal because it doesn't actually attempt to modify the
string literal, but it would be illegal if string literals were const
(as they are in C++).
 
Kaz Kylheku

As far as I can gather from my experiment below, a string constant in source
code has a 'char*' type

I conducted an experiment today that shows you can stick two expressions
together with a comma, and the value that comes out is evidently the right one.

With a bit more time, I will have this whole mysterious C thing
reverse-engineered, and then I will document it for everyone.
 
BartC

G G said:
the program calls a function,

change_initial("Bart",'C');

change_initial("Bart",'C'); is to return a char *, but the return value
is not assigned to anything.

It did do. But the code I posted was simplified to remove any
processing/display of the result. (It had been tested with a writable
string first; once I passed the string constant, it didn't return anyway,
because it had crashed.)

The return value of char* allows it to be used like this:

printf("New string = %s\n",change_initial("bart etc",'C'));

Not using it for some calls doesn't matter (if it were never used, the
return value could perhaps be eliminated).
 
Keith Thompson

BartC said:
OK, but I would have expected some warning at least to have been added over
the last few decades, considering the number of minor matters that compilers
do pounce on.

The standard does not specify warnings. It requires *diagnostics* in
many cases, but in all those cases the diagnostics are permitted to be
fatal error messages.

Individual compilers may issue whatever additional warnings they like.
The gcc documentation explains the rationale for not warning about
non-const pointers to string literals by default:

These warnings will help you find at compile time code that can try
to write into a string constant, but only if you have been very
careful about using `const' in declarations and prototypes.
Otherwise, it will just be a nuisance. This is why we did not make
`-Wall' request these warnings.

(Personally, I think programmers *should* be very careful about using
"const", but that doesn't mean a compiler should enforce it by default.)

I've used gcc -Wall -Wpedantic -Wextra, and not a peep out of it!
(Apparently, -Wwrite-strings is needed to enable the warning; I only
discovered that by running g++ which does warn.)

(I'm not that concerned, just wondering how seriously compilers take the
issue of const qualifiers. Because if I can write:

char* q;

q="ABC";
*q='X'; /* Crashes in Windows and Linux */

with nothing at all emitted by the compiler unless I go considerably out of
my way, then the answer seems to be not very. It just seems a gaping
loop-hole in the const-qualifier system.)

Yes, it's a gaping loophole in the const-qualifier system.
It's unfortunate that it wasn't practical to close it when "const"
was added to the language by the 1989 ANSI C standard, but we're
stuck with it.

The strchr() and memchr() functions can also be used to violate
const-correctness, since they can quietly return a non-const pointer
into a const array. That could have been fixed by splitting both
functions into a const version and a non-const version. (C++
uses overloading to do that.)

I'm not aware of any other such loopholes, but I could be missing
something.

There's not much point in arguing that this is a flaw in the
language; I think everyone here agrees with you. We're not defending
the rule, we're merely explaining why it (unfortunately) exists.
 
BartC

Kaz Kylheku said:
I conducted an experiment today that shows you can stick two expressions
together with a comma, and the value that comes out is evidently the right
one.

With a bit more time, I will have this whole mysterious C thing
reverse-engineered, and then I will document it for everyone.

If you want me to stop posting in this group, just say the word.

I had been in a slow process of migrating away from using C to implement my
projects, but thanks to your piss-taking, that is now a priority.
 
Keith Thompson

BartC said:
If you want me to stop posting in this group, just say the word.

I had been in a slow process of migrating away from using C to implement my
projects, but thanks to your piss-taking, that is now a priority.

If one particular person is annoying you, a killfile might be a better
solution than leaving the group. (Kaz happens to be in mine.)
 
Keith Thompson

BartC said:
The return value of char* allows it to be used like this:

printf("New string = %s\n",change_initial("bart etc",'C'));

Not using it for some calls doesn't matter (not using it ever, then perhaps
the return value could be eliminated).

And the int value returned by printf is discarded.

Ignoring the value returned by a function is quite common. Many
functions return a value that's likely to be ignored more often than not
(memset, for example).
 
Kaz Kylheku

If you want me to stop posting in this group, just say the word.

To hell with the newsgroup. Look around, it's mostly insipid twits who
can't program their way out of a paper bag.

What I'd like to see you do is (I mean, for pete's sake!): stop reverse
engineering things *that are documented*, and even done the same way by
multiple implementations.

Yes, we need experimentation desperately in our daily work: to make progress in
uncharted regions, like uncovering the root causes of bugs. No piece of
documentation will tell me why this USB driver I'm trying to fix is locking up
the device.

But we don't need to experiment to find out the type of a literal constant;
that's a waste of time.

Moreover, when you experiment, you're only discovering facts about one
dialect, and sometimes not even that.

Experiment shows that gcc will accept ({int x = 3; x;}) as an expression,
which returns 3. Yet that's not in standard C, which is useful to know.

Experimenting could also convince you that i = i++ has a stable behavior.

String literals being char * could just be a bug in your compiler,
for all you know, or a feature of the default dialect.

I had been in a slow process of migrating away from using C to implement my
projects, but thanks to your piss-taking, that is now a priority.

Why? I am not C.

Kiki has me in his killfile because I told him to go **** himself.

And look, he still uses C!

Sheesh ...
 
G G

It did do. But the code I posted was simplified to remove any
processing/display of the result (because it was tested with a writeable
string before passing the const string, when it didn't return anyway because
it had crashed.)

The return value of char* allows it to be used like this:

printf("New string = %s\n",change_initial("bart etc",'C'));

Not using it for some calls doesn't matter (not using it ever, then perhaps
the return value could be eliminated).

ok. thanks Bartc.
 
Ian Collins

BartC said:
OK, but I would have expected some warning at least to have been added over
the last few decades, considering the number of minor matters that compilers
do pounce on.

I've used gcc -Wall -Wpedantic -Wextra, and not a peep out of it!
(Apparently, -Wwrite-strings is needed to enable the warning; I only
discovered that by running g++ which does warn.)

(I'm not that concerned, just wondering how seriously compilers take the
issue of const qualifiers. Because if I can write:

char* q;

q="ABC";
*q='X'; /* Crashes in Windows and Linux */

with nothing at all emitted by the compiler unless I go considerably out of
my way, then the answer seems to be not very. It just seems a gaping
loop-hole in the const-qualifier system.)

Maybe you should start compiling your C with a C++ compiler? The const
rules in C++ are much closer to what you are expecting.

cat /tmp/x.c

int main()
{
  char* q="ABC";
}

g++ /tmp/x.c
/tmp/x.c: In function ‘int main()’:
/tmp/x.c:3:11: warning: deprecated conversion from string constant to ‘char*’
 
Malcolm McLean

Yes, we need experimentation desperately in our daily work: to make progress
in uncharted regions, like uncovering the root causes of bugs. No piece of
documentation will tell me why this USB driver I'm trying to fix is locking up
the device.

But we don't need to experiment to find out the type of a literal constant;
that's a waste of time.

Awful documentation is a fact of life in programming. Yesterday I was trying
to warp the pointer on an Apple computer, for example. They have lots of
co-ordinate systems going, plus two types of 2D point, and what should be a
simple process is in fact very involved. You need to warp the pointer
experimentally to see where it goes.

It is a waste of time. There's endless timewasting involved in learning the
intricacies of usually proprietary systems, most of which do largely the
same thing, just with slightly different syntax, identifiers, and so on.
 
BartC

Ian Collins said:
BartC wrote:

Maybe you should start compiling your C with a C++ compiler? The const
rules in C++ are much closer to what you are expecting.

This particular issue isn't bothering me at the minute. I was just intrigued
at the more rigorous enforcement of const types on one hand, compared with
the lax approach used with string constants. It's never really come up
before because I refuse to use 'const' anywhere.

(And I have tried using g++ to compile my code (with a view to simplifying
using libraries only having a C++ interface), but it's even more of a
nightmare getting it to compile my C code, most of which is auto-generated
in various ways.)
 
Kaz Kylheku

This particular issue isn't bothering me at the minute. I was just intrigued
at the more rigorous enforcement of const types on one hand, compared with
the lax approach used with string constants. It's never really come up
before because I refuse to use 'const' anywhere.

Although C++ has "const char *" string literals, that was a relatively late
decision in the history of C++.

C++ has the function overloading tools to make this nicer.

In C, the safety benefits go out the window as soon as you use a function
like:

char *strchr(const char *str, int c);

The function returns a pointer with the const qualifier stripped.

In C++, the #include <cstring> compatibility library provides overloads:

const char *strchr(const char *str, int c);
char *strchr(char *str, int c);

A const char * argument selects the first overload; a char * argument
selects the second overload.

You can benefit from these overloads if you write in "Clean C": the hybrid
dialect which compiles as C or C++. Just put this somewhere:

#ifdef __cplusplus
#include <cstring>
#else
#include <string.h>
#endif

Suddenly your strchr calls and whatnot are more type safe, with situations
like:

char *t = strchr("abc", x);

being nicely diagnosed by your C++ compiler. (Here, the const char * returning
overload is selected because of "abc", and so the initialization of t strips
qualifiers, which requires a diagnostic.)

(And I have tried using g++ to compile my code (with a view to simplifying
using libraries only having a C++ interface), but it's even more of a
nightmare getting it to compile my C code, most of which is auto-generated
in various ways.)

That wouldn't be a problem if your auto-generator spits out "Clean C".

"Clean C" isn't formalized anywhere: if you know C and C++ well (or at
least the C-like subset of C++ well), you can write in it.

The name "Clean C" was coined in _C: A Reference Manual_ by Harbison and
Steele.
 
Malcolm McLean

Although C++ has "const char *" string literals, that was a relatively late
decision in the history of C++.

In C, the safety benefits go out the window as soon as you use a function
like:

char *strchr(const char *str, int c);

And you very rarely want to call strchr with an explicit string literal, but
you quite often want to parse a string literal passed from above.
The poor rules for string literals are seldom much of a practical problem,
because if in a real program a function modifies a string passed to it, the
caller is going to want to examine the result. So he can't pass a string
literal.
const rules mean that you have to have two versions of strchr, which isn't
too bad. But they also mean that anyone writing a strchr-like function to
parse a string needs to write two versions. That's unacceptable.
 
Ian Collins

Malcolm said:
And you very rarely want to call strchr with an explicit string literal, but
you quite often want to parse a string literal passed from above.
The poor rules for string literals are seldom much of a practical problem,
because if in a real program a function modifies a string passed to it, the
caller is going to want to examine the result. So he can't pass a string
literal.
const rules mean that you have to have two versions of strchr, which isn't
too bad. But they also mean that anyone writing a strchr-like function to
parse a string needs to write two versions. That's unacceptable.

No, they don't.

They would only need two versions if one were to modify the input. If
that were the case, two functions with different names would be in order.
 
