Merging of string literals guaranteed by C std?

Discussion in 'C Programming' started by Johannes Bauer, May 25, 2012.

  1. Hi group,

    I have a question about string literals and the address that they point
    to. Does the standard *guarantee* that two identical string literals
    actually point to the same address? I.e. can we safely assert:

    assert("foo" == "foo");

    Or can it maybe only be asserted if the literal occurs in one
    compilation unit (i.e. not across compilation units)?

    My gut feeling tells me that I cannot rely on the addresses being
    identical, but I cannot find it in N1124. It would make things much
    easier/cooler if the standard would assert that in my situation, but I
    don't want to rely on compiler behavior alone (gcc merges the string
    literals into one address even with -O0).

    Best regards,
    Johannes


    --
    >> Where EXACTLY did you predict the quake, again?

    > At least not publicly!

    Ah, the latest and to this day most ingenious stroke of our great
    cosmologists: the secret prediction.
    - Karl Kaos on Rüdiger Thomas in dsa <hidbv3$om2$>
     
    Johannes Bauer, May 25, 2012
    #1

  2. Johannes Bauer

    Noob Guest

    Johannes Bauer wrote:

    > I have a question about string literals and the address that they point
    > to. Does the standard *guarantee* that two identical string literals
    > actually point to the same address. I.e. can we safely assert:
    >
    > assert("foo" == "foo");


    No, this cannot be asserted. AFAIU, it is a QoI issue.
    A "dumb" implementation is allowed to store every string
    literal in a separate location.
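    Since the addresses are not guaranteed to match, the portable check compares contents instead. A minimal sketch, assuming a hypothetical helper named same_name (not something from this thread):

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true when two names have equal contents. */
static bool same_name(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}

/* Succeeds on every conforming implementation, because strcmp
   looks at the characters and never at the addresses. */
static void check_names(void)
{
    assert(same_name("foo", "foo"));
    assert(!same_name("foo", "bar"));
}
```

    Note that this is a run-time comparison, so it does not give the compiler a constant expression to fold.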

    C89 states: (3.1.4 String literals)
    "Identical string literals of either form [wide or regular]
    need not be distinct."

    "need not be distinct" thus they may be distinct.

    Regards.
     
    Noob, May 25, 2012
    #2

  3. Johannes Bauer

    James Kuyper Guest

    On 05/25/2012 07:33 AM, Johannes Bauer wrote:
    > Hi group,
    >
    > I have a question about string literals and the address that they point
    > to. Does the standard *guarantee* that two identical string literals
    > actually point to the same address. I.e. can we safely assert:
    >
    > assert("foo" == "foo");


    No, the standard neither mandates nor forbids that. Note: the same is
    true of

    "watergate" + 5 == "gate"
    --
    James Kuyper
     
    James Kuyper, May 25, 2012
    #3
  4. Johannes Bauer

    Eric Sosman Guest

    On 5/25/2012 7:33 AM, Johannes Bauer wrote:
    > Hi group,
    >
    > I have a question about string literals and the address that they point
    > to. Does the standard *guarantee* that two identical string literals
    > actually point to the same address. I.e. can we safely assert:
    >
    > assert("foo" == "foo");


    No.

    > Or can it maybe only be asserted if the literal occurs in one
    > compilation unit (i.e. not across compilation units)?


    No.

    > My gut feeling tells me that I cannot rely on the addresses being
    > identical, but I cannot find it in N1124. It would make things much
    > easier/cooler if the standard would assert that in my situation, but I
    > don't want to rely on compiler behavior alone (gcc merges the string
    > literals into one address even with -O0).


    Your gut is right: The two "foo" may resolve to a single
    nameless array, or to two. One or both or neither of them
    may also share storage with the tail end of "barfoo". It's
    the compiler's choice, and I don't even think the compiler is
    required to document it (except in the sense that you can
    compare the pointers at run time).
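    To see what a particular compiler does, the pointers can be compared at run time. A small sketch (function names are made up for illustration); either result from the first two functions is conforming, only the third is guaranteed:

```c
#include <stdbool.h>
#include <string.h>

/* True if this implementation merged two identical literals.
   Both true and false are allowed by the standard. */
static bool literals_merged(void)
{
    const char *a = "foo";
    const char *b = "foo";
    return a == b;
}

/* True if "foo" shares storage with the tail end of "barfoo".
   Again, both answers are allowed. */
static bool tail_shared(void)
{
    const char *whole = "barfoo";
    const char *tail = "foo";
    return whole + 3 == tail;
}

/* What is guaranteed: the contents compare equal. */
static bool tail_contents_equal(void)
{
    const char *whole = "barfoo";
    return strcmp(whole + 3, "foo") == 0;
}
```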

    Some compilers have a mode in which each appearance of a
    literal is guaranteed *not* to overlap others, usually to allow
    the program to change the contents of the literal's nameless
    array. In old gcc versions the "-fwritable-strings" flag did
    this; I think the option has been discontinued.

    --
    Eric Sosman
     
    Eric Sosman, May 25, 2012
    #4
  5. On 25.05.2012 14:38, Eric Sosman wrote:

    >> assert("foo" == "foo");

    >
    > No.
    >
    >> Or can it maybe only be asserted if the literal occurs in one
    >> compilation unit (i.e. not across compilation units)?

    >
    > No.


    Thank you and the other two posters for your clarification. Going to
    think of something else then :)

    Best regards,
    Joe

     
    Johannes Bauer, May 26, 2012
    #5
  6. On 26.05.2012 13:37, pete wrote:

    > /*
    > ** What are you trying to do?
    > */
    > char *foo = "foo";
    >
    > assert(foo == foo);


    I'm writing a large application with a debugging facility. I want to
    enable or disable certain (debugging) outputs at *compile* time (since
    some of them are in inner loops), so that if they're disabled there's no
    residue in the code anywhere that there even was an output.

    Moreover, I'd like to avoid defines in these loops (they really hinder
    the ability to read the code IMO). So the usual approach to something
    like this (and which would fulfill almost all requirements):

    #define FACILITY_FOO (1 << 0)
    #define FACILITY_BAR (1 << 1)
    #define FACILITY_KOO (1 << 2)
    ...

    #define ENABLED_FACILITIES (FACILITY_FOO | FACILITY_KOO)

    and then in the code

    #define debug(fcl, msg, ...) if (fcl & ENABLED_FACILITIES) dump(msg);

    This is then resolved by the compiler and optimized out completely (i.e.
    (FACILITY_BAR & (FACILITY_FOO | FACILITY_KOO)) == 0).
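    For reference, a self-contained sketch of this bitmask approach. The dump() body, the dump_calls counter and demo() are assumptions added for illustration, and the macro is wrapped in do { } while (0) so that it behaves like a single statement:

```c
#include <stdio.h>

#define FACILITY_FOO (1u << 0)
#define FACILITY_BAR (1u << 1)
#define FACILITY_KOO (1u << 2)

#define ENABLED_FACILITIES (FACILITY_FOO | FACILITY_KOO)

static int dump_calls; /* illustration only: counts emitted messages */

/* Assumed output routine; the real one would do the actual logging. */
static void dump(const char *msg)
{
    ++dump_calls;
    fprintf(stderr, "%s\n", msg);
}

/* The condition is an integer constant expression, so a compiler
   can typically drop the whole statement for disabled facilities. */
#define debug(fcl, msg)                      \
    do {                                     \
        if ((fcl) & ENABLED_FACILITIES)      \
            dump(msg);                       \
    } while (0)

static int demo(void)
{
    dump_calls = 0;
    debug(FACILITY_FOO, "foo is enabled");
    debug(FACILITY_BAR, "bar is disabled");
    debug(FACILITY_KOO, "koo is enabled");
    return dump_calls;
}
```

    With the mask above, demo() emits the FOO and KOO messages and skips BAR.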

    Now the problem is: I have very fine granularity of "facilities". More
    than 32 to be sure (hundreds to be exact). I'd like to have a solution
    with an arbitrary amount of facilities.

    Therefore I was thinking of some check like

    #define FACILITY_FOO "foo"
    #define FACILITY_BAR "bar"
    #define FACILITY_KOO "koo"

    and a debug implementation like this

    #define debug(fcl, msg, ...) \
    if ((fcl == FACILITY_FOO) || (fcl == FACILITY_BAR)) dump(msg);

    Seems like this is not the way to go, though. If there was something
    like "constexpr" in C, this could easily be done. Now I'm a bit puzzled
    but will figure something out (and if nothing else works, I'll have
    Python generate some C code which does the right switching on/off of
    debugging instructions).

    Best regards,
    Joe


     
    Johannes Bauer, May 26, 2012
    #6
  7. Johannes Bauer

    Eric Sosman Guest

    On 5/26/2012 7:47 AM, Johannes Bauer wrote:
    > On 26.05.2012 13:37, pete wrote:
    >
    >> /*
    >> ** What are you trying to do?
    >> */
    >> char *foo = "foo";
    >>
    >> assert(foo == foo);

    >
    > I'm writing a large application with a debugging facility. I want to
    > enable or disable certain (debugging) outputs at *compile* time (since
    > some of them are in inner loops), so that if they're disabled there's no
    > residue in the code anywhere that there even was a output.
    >
    > Moreover, I'd like to avoid defines in these loops (they really hinder
    > the ability to read the code IMO). So the usual approach to something
    > like this (and which would fulfill almost all requirements):
    >
    > #define FACILITY_FOO (1 << 0)
    > #define FACILITY_BAR (1 << 1)
    > #define FACILITY_KOO (1 << 2)
    > ...
    >
    > #define ENABLED_FACILITIES (FACILITY_FOO | FACILITY_KOO)
    >
    > and then in the code
    >
    > #define debug(fcl, msg, ...) if (fcl & ENABLED_FACILITIES) dump(msg);
    >
    > This is then resolved by the compiler and optimized out completely (i.e.
    > FACILITY_BAR & (FACILITY_FOO | FACILITY_KOO) == 0).
    >
    > Now the problem is: I have very fine granularity of "facilities". More
    > than 32 to be sure (hundreds to be exact). I'd like to have a solution
    > with an arbitrary amount of facilities.
    >
    > Therefore I was thinking of some check like
    >
    > #define FACILITY_FOO "foo"
    > #define FACILITY_BAR "bar"
    > #define FACILITY_KOO "koo"
    >
    > and a debug implementation like this
    >
    > #define debug(fcl, msg, ...)
    > if ((fcl == FACILITY_FOO) || (fcl == FACILITY_BAR)) dump(msg);
    >
    > Seems like this is not the way to go, though. If there was something
    > like "constexpr" in C, this could easily be done. Now I'm a bit puzzled
    > but will figure something out (and if nothing else works, I'll have
    > Python generate some C code which does the right switching on/off of
    > debugging instructions).


    Why not use numeric constants instead of strings?

    #define FACILITY_FOO 1
    #define FACILITY_BAR 2
    #define FACILITY_KOO 42
    // ... or use enum constants

    #define debug(fcl, msg) \
    if ((fcl) == FACILITY_FOO || (fcl) == FACILITY_BAR) \
    dump(msg)
    // see also "the do-while hack" for a better

    Alternatively,

    #define FACILITY_FOO 1 // enable FOO debugging
    #define FACILITY_BAR 0 // suppress BAR debugging
    #define FACILITY_KOO 1 // enable KOO debugging

    #define debug(fcl, msg) if (fcl) dump(msg)

    ... leading to a much briefer macro that you needn't change when
    changing the state of "hundreds" of facilities.
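    A sketch of that second variant, wrapped in the do-while form so it stays safe inside if/else. The names format_state, format_calls and demo are hypothetical and only there to show that the message argument of a disabled facility is never evaluated:

```c
#include <stdio.h>

#define FACILITY_FOO 1 /* enable FOO debugging */
#define FACILITY_BAR 0 /* suppress BAR debugging */

static int format_calls; /* illustration only */

/* Stand-in for an expensive message builder. */
static const char *format_state(void)
{
    ++format_calls;
    return "state";
}

/* Assumed output routine. */
static void dump(const char *msg)
{
    fprintf(stderr, "%s\n", msg);
}

#define debug(fcl, msg) do { if (fcl) dump(msg); } while (0)

static int demo(void)
{
    format_calls = 0;
    if (format_calls == 0)
        debug(FACILITY_FOO, format_state()); /* enabled: evaluated */
    else
        debug(FACILITY_BAR, format_state());
    debug(FACILITY_BAR, format_state());     /* disabled: skipped */
    return format_calls;
}
```

    Because the condition is the literal 0 or 1, toggling a facility means editing one #define and nothing else.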

    --
    Eric Sosman
     
    Eric Sosman, May 26, 2012
    #7
  8. On 26.05.2012 14:11, Eric Sosman wrote:

    > Why not use numeric constants instead of strings?
    >
    > #define FACILITY_FOO 1
    > #define FACILITY_BAR 2
    > #define FACILITY_KOO 42
    > // ... or use enum constants
    >
    > #define debug(fcl, msg) \
    > if ((fcl) == FACILITY_FOO || (fcl) == FACILITY_BAR) \
    > dump(msg)
    > // see also "the do-while hack" for a better


    You mean do { } while(0)? That's in the original definition, I just
    posted the shortcut from memory :)

    > Alternatively,
    >
    > #define FACILITY_FOO 1 // enable FOO debugging
    > #define FACILITY_BAR 0 // suppress BAR debugging
    > #define FACILITY_KOO 1 // enable KOO debugging
    >
    > #define debug(fcl, msg) if (fcl) dump(msg)
    >
    > ... leading to a much briefer macro that you needn't change when
    > changing the state of "hundreds" of facilities.


    Yes, I think I'll take that approach, which is much more sensible. The
    reason I tried to use strings is because the facility (unlike in the
    abbreviated example) is also passed to the debugging command for proper
    redirection of logging (i.e. separate things in separate files). For
    display, having the name is nice.

    I tried to kill two birds with one stone: essentially making the
    facility's name its variable's value.

    Thanks for the hints!
    Best regards,
    Joe

     
    Johannes Bauer, May 26, 2012
    #8
  9. Johannes Bauer

    Eric Sosman Guest

    On 5/26/2012 8:24 AM, Johannes Bauer wrote:
    > [...]
    > Yes, I think I'll take that approach, which is much more sensible. The
    > reason I tried to use strings is because the facility (unlike in the
    > abbreviated example) is also passed to the debugging command for proper
    > redirection of logging (i.e. separate things in separate files). For
    > display, having the name is nice.


    Stringize the FACILITY_xxx piece in the debug() macro, and
    pass it to the message-writing function, along with __FILE__ and
    __LINE__ and whatever else suits your fancy.
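    A rough sketch of the idea. The dump() signature and the last_line buffer are assumptions for illustration; the point is that #fcl turns the macro argument into its spelling before it is expanded, so the name is available as a string while the if still sees 0 or 1:

```c
#include <stdio.h>
#include <string.h>

#define FACILITY_FOO 1 /* enabled */
#define FACILITY_BAR 0 /* disabled */

static char last_line[256]; /* illustration only: last formatted message */

/* Assumed message writer taking the facility name and source location. */
static void dump(const char *facility, const char *file, int line,
                 const char *msg)
{
    snprintf(last_line, sizeof last_line, "[%s] %s:%d: %s",
             facility, file, line, msg);
    fprintf(stderr, "%s\n", last_line);
}

/* #fcl yields "FACILITY_FOO"; the plain fcl in the condition
   expands to the facility's 0/1 value. */
#define debug(fcl, msg)                                \
    do {                                               \
        if (fcl)                                       \
            dump(#fcl, __FILE__, __LINE__, (msg));     \
    } while (0)
```

    A call such as debug(FACILITY_FOO, "hello") then produces a line starting with "[FACILITY_FOO]", which is enough to route output per facility.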

    --
    Eric Sosman
     
    Eric Sosman, May 26, 2012
    #9
  10. On 26.05.2012 14:41, Eric Sosman wrote:
    > On 5/26/2012 8:24 AM, Johannes Bauer wrote:
    >> [...]
    >> Yes, I think I'll take that approach, which is much more sensible. The
    >> reason I tried to use strings is because the facility (unlike in the
    >> abbreviated example) is also passed to the debugging command for proper
    >> redirection of logging (i.e. separate things in separate files). For
    >> display, having the name is nice.

    >
    > Stringize the FACILITY_xxx piece in the debug() macro, and
    > pass it to the message-writing function, along with __FILE__ and
    > __LINE__ and whatever else suits your fancy.


    Ah, stringifying that piece is a smart idea. Thanks for the pointer!

    Best regards,
    Joe

     
    Johannes Bauer, May 26, 2012
    #10
  11. Johannes Bauer

    BGB Guest

    On 5/26/2012 7:24 AM, Johannes Bauer wrote:
    > On 26.05.2012 14:11, Eric Sosman wrote:
    >
    >> Why not use numeric constants instead of strings?
    >>
    >> #define FACILITY_FOO 1
    >> #define FACILITY_BAR 2
    >> #define FACILITY_KOO 42
    >> // ... or use enum constants
    >>
    >> #define debug(fcl, msg) \
    >> if ((fcl) == FACILITY_FOO || (fcl) == FACILITY_BAR) \
    >> dump(msg)
    >> // see also "the do-while hack" for a better

    >
    > You mean do { } while(0)? That's in the original definition, I just
    > posted the shortcut from memory :)
    >
    >> Alternatively,
    >>
    >> #define FACILITY_FOO 1 // enable FOO debugging
    >> #define FACILITY_BAR 0 // suppress BAR debugging
    >> #define FACILITY_KOO 1 // enable KOO debugging
    >>
    >> #define debug(fcl, msg) if (fcl) dump(msg)
    >>
    >> ... leading to a much briefer macro that you needn't change when
    >> changing the state of "hundreds" of facilities.

    >
    > Yes, I think I'll take that approach, which is much more sensible. The
    > reason I tried to use strings is because the facility (unlike in the
    > abbreviated example) is also passed to the debugging command for proper
    > redirection of logging (i.e. separate things in separate files). For
    > display, having the name is nice.
    >
    > I tried to kill two birds with one stone: essentially making the
    > facility's name its variable's value.
    >



    this can be done, actually, just it requires a little more work, namely
    "interning" the strings (basically, using a function which merges the
    strings using a table or similar).

    something like:
    static char *foo_somevalue;
    ...
    foo_somevalue=FOO_InternName("somevalue");

    ...
    if(value==foo_somevalue)
    ...

    or, one can use "!strcmp()" if feeling too lazy to intern the strings.
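    A minimal interning sketch, assuming a small fixed-size table with linear search. This is not the actual FOO_InternName implementation; intern_name, MAX_NAMES and the table layout are made up for illustration:

```c
#include <stdlib.h>
#include <string.h>

#define MAX_NAMES 256 /* arbitrary capacity for this sketch */

static char *names[MAX_NAMES];
static size_t name_count;

/* Returns the canonical pointer for s, adding a copy on first use.
   Returns NULL if the table is full or allocation fails. */
static const char *intern_name(const char *s)
{
    for (size_t i = 0; i < name_count; ++i) {
        if (strcmp(names[i], s) == 0)
            return names[i];
    }
    if (name_count == MAX_NAMES)
        return NULL;

    size_t len = strlen(s) + 1;
    char *copy = malloc(len);
    if (copy == NULL)
        return NULL;
    memcpy(copy, s, len);
    names[name_count++] = copy;
    return copy;
}
```

    After interning, equal contents always yield the same pointer, so == is a valid equality test. A hash table would be the usual replacement for the linear search once the number of names grows.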



    this basic strategy ended up being done for a major subsystem in my case
    (the dynamic type-system facilities).

    originally, it started out using magic numbers, and later precomputed
    string hashes, but there was a problem in that the system needed to be
    expanded in a decentralized manner, and both magic numbers and hashes
    are weak in this case (magic numbers generally need some sort of
    centralized assignment, and the hashes fail about as soon as there is a
    collision, which was hardly uncommon with a 12 bit value).

    so, in this case, the strings became the canonical values.
     
    BGB, May 26, 2012
    #11
  12. Johannes Bauer

    Eric Sosman Guest

    On 5/26/2012 2:38 PM, BGB wrote:
    > [...]
    > this can be done, actually, just it requires a little more work, namely
    > "interning" the strings (basically, using a function which merges the
    > strings using a table or similar).


    From the O.P.'s explanation of his aims:

    > I want to
    > enable or disable certain (debugging) outputs at *compile* time


    (emphasis his). There is no certainty that every if() of an
    integer constant expression will be optimized away, but it's
    quite unlikely that tests of a NON-constant expression will
    disappear.

    --
    Eric Sosman
     
    Eric Sosman, May 26, 2012
    #12
  13. Johannes Bauer

    BGB Guest

    On 5/26/2012 2:17 PM, Eric Sosman wrote:
    > On 5/26/2012 2:38 PM, BGB wrote:
    >> [...]
    >> this can be done, actually, just it requires a little more work, namely
    >> "interning" the strings (basically, using a function which merges the
    >> strings using a table or similar).

    >
    > From the O.P.'s explanation of his aims:
    >
    > > I want to
    > > enable or disable certain (debugging) outputs at *compile* time

    >
    > (emphasis his). There is no certainty that every if() of an
    > integer constant expression will be optimized away, but it's
    > quite unlikely that tests of a NON-constant expression will
    > disappear.
    >


    ok, probably missed / forgot this while reading the thread.


    but, yes, ok then.

    at least interned strings work fairly well for run-time checks (albeit,
    sadly, they don't work in "switch()").
     
    BGB, May 26, 2012
    #13