Merging of string literals guaranteed by C std?

Johannes Bauer

Hi group,

I have a question about string literals and the address that they point
to. Does the standard *guarantee* that two identical string literals
actually point to the same address? I.e., can we safely assert:

assert("foo" == "foo");

Or can it maybe only be asserted if the literal occurs in one
compilation unit (i.e. not across compilation units)?

My gut feeling tells me that I cannot rely on the addresses being
identical, but I cannot find it in N1124. It would make things much
easier/cooler if the standard would assert that in my situation, but I
don't want to rely on compiler behavior alone (gcc merges the string
literals into one address even with -O0).

Best regards,
Johannes


--
At least not publicly!
Ah, the latest and, to this day, most brilliant stunt of our great
cosmologists: the secret prediction.
- Karl Kaos on Rüdiger Thomas in dsa <[email protected]>
 
Noob

Johannes said:
I have a question about string literals and the address that they point
to. Does the standard *guarantee* that two identical string literals
actually point to the same address? I.e., can we safely assert:

assert("foo" == "foo");

No, this cannot be asserted. AFAIU, it is a QoI issue.
A "dumb" implementation is allowed to store every string
literal in a separate location.

C89 states: (3.1.4 String literals)
"Identical string literals of either form [wide or regular]
need not be distinct."

"need not be distinct" thus they may be distinct.

Regards.
 
James Kuyper

Hi group,

I have a question about string literals and the address that they point
to. Does the standard *guarantee* that two identical string literals
actually point to the same address? I.e., can we safely assert:

assert("foo" == "foo");

No, the standard neither mandates nor forbids that. Note: the same is
true of

"watergate" + 5 == "gate"
 
Eric Sosman

Hi group,

I have a question about string literals and the address that they point
to. Does the standard *guarantee* that two identical string literals
actually point to the same address? I.e., can we safely assert:

assert("foo" == "foo");

No.

Or can it maybe only be asserted if the literal occurs in one
compilation unit (i.e. not across compilation units)?

No.

My gut feeling tells me that I cannot rely on the addresses being
identical, but I cannot find it in N1124. It would make things much
easier/cooler if the standard would assert that in my situation, but I
don't want to rely on compiler behavior alone (gcc merges the string
literals into one address even with -O0).

Your gut is right: The two "foo" may resolve to a single
nameless array, or to two. One or both or neither of them
may also share storage with the tail end of "barfoo". It's
the compiler's choice, and I don't even think the compiler is
required to document it (except in the sense that you can
compare the pointers at run time).

Some compilers have a mode in which each appearance of a
literal is guaranteed *not* to overlap others, usually to allow
the program to change the contents of the literal's nameless
array. In old gcc versions the "-fwritable-strings" flag did
this; I think the option has been discontinued.
 
Johannes Bauer


Thank you and the other two posters for your clarification. Going to
think of something else then :)

Best regards,
Joe

Johannes Bauer

/*
** What are you trying to do?
*/
char *foo = "foo";

assert(foo == foo);

I'm writing a large application with a debugging facility. I want to
enable or disable certain (debugging) outputs at *compile* time (since
some of them are in inner loops), so that if they're disabled there's no
residue in the code anywhere that there even was an output.

Moreover, I'd like to avoid defines in these loops (they really hinder
the ability to read the code IMO). So the usual approach to something
like this (and which would fulfill almost all requirements):

#define FACILITY_FOO (1 << 0)
#define FACILITY_BAR (1 << 1)
#define FACILITY_KOO (1 << 2)
...

#define ENABLED_FACILITIES (FACILITY_FOO | FACILITY_KOO)

and then in the code

#define debug(fcl, msg, ...) if ((fcl) & ENABLED_FACILITIES) dump(msg)

This is then resolved by the compiler and optimized out completely (i.e.
(FACILITY_BAR & (FACILITY_FOO | FACILITY_KOO)) == 0).

Now the problem is: I have very fine granularity of "facilities". More
than 32 to be sure (hundreds to be exact). I'd like to have a solution
with an arbitrary amount of facilities.

Therefore I was thinking of some check like

#define FACILITY_FOO "foo"
#define FACILITY_BAR "bar"
#define FACILITY_KOO "koo"

and a debug implementation like this

#define debug(fcl, msg, ...) \
if ((fcl == FACILITY_FOO) || (fcl == FACILITY_BAR)) dump(msg);

Seems like this is not the way to go, though. If there was something
like "constexpr" in C, this could easily be done. Now I'm a bit puzzled
but will figure something out (and if nothing else works, I'll have
Python generate some C code which does the right switching on/off of
debugging instructions).

Best regards,
Joe


Eric Sosman

I'm writing a large application with a debugging facility. I want to
enable or disable certain (debugging) outputs at *compile* time (since
some of them are in inner loops), so that if they're disabled there's no
residue in the code anywhere that there even was an output.

Moreover, I'd like to avoid defines in these loops (they really hinder
the ability to read the code IMO). So the usual approach to something
like this (and which would fulfill almost all requirements):

#define FACILITY_FOO (1 << 0)
#define FACILITY_BAR (1 << 1)
#define FACILITY_KOO (1 << 2)
...

#define ENABLED_FACILITIES (FACILITY_FOO | FACILITY_KOO)

and then in the code

#define debug(fcl, msg, ...) if ((fcl) & ENABLED_FACILITIES) dump(msg)

This is then resolved by the compiler and optimized out completely (i.e.
(FACILITY_BAR & (FACILITY_FOO | FACILITY_KOO)) == 0).

Now the problem is: I have very fine granularity of "facilities". More
than 32 to be sure (hundreds to be exact). I'd like to have a solution
with an arbitrary amount of facilities.

Therefore I was thinking of some check like

#define FACILITY_FOO "foo"
#define FACILITY_BAR "bar"
#define FACILITY_KOO "koo"

and a debug implementation like this

#define debug(fcl, msg, ...) \
if ((fcl == FACILITY_FOO) || (fcl == FACILITY_BAR)) dump(msg);

Seems like this is not the way to go, though. If there was something
like "constexpr" in C, this could easily be done. Now I'm a bit puzzled
but will figure something out (and if nothing else works, I'll have
Python generate some C code which does the right switching on/off of
debugging instructions).

Why not use numeric constants instead of strings?

#define FACILITY_FOO 1
#define FACILITY_BAR 2
#define FACILITY_KOO 42
// ... or use enum constants

#define debug(fcl, msg) \
    if ((fcl) == FACILITY_FOO || (fcl) == FACILITY_BAR) \
        dump(msg)
// see also "the do-while hack" for a better

Alternatively,

#define FACILITY_FOO 1 // enable FOO debugging
#define FACILITY_BAR 0 // suppress BAR debugging
#define FACILITY_KOO 1 // enable KOO debugging

#define debug(fcl, msg) if (fcl) dump(msg)

... leading to a much briefer macro that you needn't change when
changing the state of "hundreds" of facilities.
 
Johannes Bauer

Why not use numeric constants instead of strings?

#define FACILITY_FOO 1
#define FACILITY_BAR 2
#define FACILITY_KOO 42
// ... or use enum constants

#define debug(fcl, msg) \
    if ((fcl) == FACILITY_FOO || (fcl) == FACILITY_BAR) \
        dump(msg)
// see also "the do-while hack" for a better

You mean do { } while(0)? That's in the original definition, I just
posted the shortcut from memory :)

Alternatively,

#define FACILITY_FOO 1 // enable FOO debugging
#define FACILITY_BAR 0 // suppress BAR debugging
#define FACILITY_KOO 1 // enable KOO debugging

#define debug(fcl, msg) if (fcl) dump(msg)

... leading to a much briefer macro that you needn't change when
changing the state of "hundreds" of facilities.

Yes, I think I'll take that approach, which is much more sensible. The
reason I tried to use strings is because the facility (unlike in the
abbreviated example) is also passed to the debugging command for proper
redirection of logging (i.e. separate things in separate files). For
display, having the name is nice.

I tried to kill two birds with one stone: essentially making the
facility's name its variable's value.

Thanks for the hints!
Best regards,
Joe

Eric Sosman

[...]
Yes, I think I'll take that approach, which is much more sensible. The
reason I tried to use strings is because the facility (unlike in the
abbreviated example) is also passed to the debugging command for proper
redirection of logging (i.e. separate things in separate files). For
display, having the name is nice.

Stringize the FACILITY_xxx piece in the debug() macro, and
pass it to the message-writing function, along with __FILE__ and
__LINE__ and whatever else suits your fancy.
 
Johannes Bauer

[...]
Yes, I think I'll take that approach, which is much more sensible. The
reason I tried to use strings is because the facility (unlike in the
abbreviated example) is also passed to the debugging command for proper
redirection of logging (i.e. separate things in separate files). For
display, having the name is nice.

Stringize the FACILITY_xxx piece in the debug() macro, and
pass it to the message-writing function, along with __FILE__ and
__LINE__ and whatever else suits your fancy.

Ah, stringifying that piece is a smart idea. Thanks for the pointer!

Best regards,
Joe

BGB

You mean do { } while(0)? That's in the original definition, I just
posted the shortcut from memory :)


Yes, I think I'll take that approach, which is much more sensible. The
reason I tried to use strings is because the facility (unlike in the
abbreviated example) is also passed to the debugging command for proper
redirection of logging (i.e. separate things in separate files). For
display, having the name is nice.

I tried to kill two birds with one stone: essentially making the
facility's name its variable's value.


this can be done, actually, just it requires a little more work, namely
"interning" the strings (basically, using a function which merges the
strings using a table or similar).

something like:
static char *foo_somevalue;
...
foo_somevalue = FOO_InternName("somevalue");

...
if (value == foo_somevalue)
    ...

or, one can use "!strcmp()" if feeling too lazy to intern the strings.



this basic strategy ended up being done for a major subsystem in my case
(the dynamic type-system facilities).

originally, it started out using magic numbers, and later precomputed
string hashes, but there was a problem in that the system needed to be
expanded in a decentralized manner, and both magic numbers and hashes
are weak in this case (magic numbers generally need some sort of
centralized assignment, and the hashes fail about as soon as there is a
collision, which was hardly uncommon with a 12 bit value).

so, in this case, the strings became the canonical values.
 
Eric Sosman

[...]
this can be done, actually, just it requires a little more work, namely
"interning" the strings (basically, using a function which merges the
strings using a table or similar).

From the O.P.'s explanation of his aims:
I want to
enable or disable certain (debugging) outputs at *compile* time

(emphasis his). There is no certainty that every if() of an
integer constant expression will be optimized away, but it's
quite unlikely that tests of a NON-constant expression will
disappear.
 
BGB

[...]
this can be done, actually, just it requires a little more work, namely
"interning" the strings (basically, using a function which merges the
strings using a table or similar).

From the O.P.'s explanation of his aims:
I want to
enable or disable certain (debugging) outputs at *compile* time

(emphasis his). There is no certainty that every if() of an
integer constant expression will be optimized away, but it's
quite unlikely that tests of a NON-constant expression will
disappear.

ok, probably missed / forgot this while reading the thread.


but, yes, ok then.

at least interned strings work fairly well for run-time checks (albeit,
sadly, they don't work in "switch()").
 
