Pointer and string literal question

Discussion in 'C Programming' started by Tagore, Dec 10, 2009.

  1. Tagore

    Tagore Guest

    hi,

    #include <stdio.h>
    int main(void){
    char *s="LET";
    char *t="LET";
    if(s==t)
    printf("same");
    else
    printf("different");
    return 0;
    }

    In the above code, the output is "same",
    but I expected the output to be "different". I thought that s and t point
    to string literals stored at different addresses.
    Can anyone please help me understand this output?

    Regards,
    Tagore, Dec 10, 2009
    #1

  2. "bartc" <> writes:
    > "Tagore" <> wrote in message
    > news:...
    >> #include <stdio.h>
    >> int main(void){
    >> char *s="LET";
    >> char *t="LET";
    >> if(s==t)
    >> printf("same");
    >> else
    >> printf("different");
    >> return 0;
    >> }
    >>
    >> In above code, output is "same".
    >> but I expected output to be "different". I think that s and t points
    >> to string literals present at different addresses.
    >> Can any one please help me in understanding its output.

    >
    > Because the literals are identical, perhaps only a single copy is used.


    Right. Compilers are explicitly permitted, but not required, to do
    this. C99 6.4.5p6:

    It is unspecified whether these arrays are distinct provided their
    elements have the appropriate values. If the program attempts to
    modify such an array, the behavior is undefined.

    Your program shouldn't assume either that they're the same, or that
    they aren't.

    For example, for this program:

    #include <stdio.h>
    int main(void)
    {
    char *s0 = "abcde";
    char *s1 = "abcde";
    char *s2 = "Xabcde";
    if (s0 == s1) {
    puts("s0 == s1");
    }
    else {
    puts("s0 != s1");
    }
    if (s0 == s2+1) {
    puts("s0 == s2+1");
    }
    else {
    puts("s0 != s2+1");
    }
    return 0;
    }

    all 4 possible results are valid. (The compiler I'm using prints
    s0 == s1, s0 != s2+1
    without optimization,
    s0 == s1, s0 == s2+1
    with optimization.)
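    If what you actually care about is whether two strings have the same
    contents, compare the characters rather than the addresses. A minimal
    sketch; the helper name same_text is made up for illustration:

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true when both strings hold the same characters,
   whether or not the compiler merged the literals. */
static bool same_text(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}
```

    same_text("abcde", "abcde") is true on every conforming compiler,
    whereas comparing the two pointers with == may go either way.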

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Nokia
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
    Keith Thompson, Dec 10, 2009
    #2

  3. Tagore <> wrote:
    > #include <stdio.h>
    > int main(void){
    > char *s="LET";
    > char *t="LET";
    > if(s==t)
    > printf("same");
    > else
    > printf("different");
    > return 0;
    > }


    > In above code, output is "same".
    > but I expected output to be "different". I think that s and t points
    > to string literals present at different addresses.


    Why do you think so? It's correct that both 's' and 't' point to
    string literals - but since the strings they point to are identical
    it's one of the most simple (memory-related) optimizations for the
    compiler to make them point to the same location. Actually, that's
    the very reason why you aren't allowed to change string literals -
    i.e. if you would do e.g.

    s[1] = 'x'; /* not allowed by the C standard! */

    then this would also change the content of what 't' is pointing
    to. The guys writing the C standard had two alternatives:
    allow changes to string literals - in which case 's' couldn't
    point to the same place as 't', thus making a certain kind of
    optimization impossible - or allow for optimization like the
    one you are seeing here and thus forbid changing string literals.
    They went with the second one, which to me seems to be
    in the spirit of C, i.e. go for compact, fast and least
    resource-hungry compiled programs.

    But if you don't like it your compiler may have a flag to make
    it less standard-compliant and force it to produce code where
    's' is pointing to a different location than 't' (and where you
    thus may change string literals).
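
    If all you want is a string you are allowed to change, the portable
    route is to initialize an array from the literal instead of pointing
    at it. A small sketch; the function name patched_copy and its
    parameters are my own invention, not anything standard:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: builds a modified copy of "LET" in caller-provided
   storage. The literal itself is never written to. */
static void patched_copy(char *out, size_t out_size)
{
    char s[] = "LET";   /* an array: a private, writable copy of the literal */

    s[1] = 'x';         /* fine, this modifies the array, not the literal */

    if (out_size >= sizeof s) {
        memcpy(out, s, sizeof s);   /* sizeof s includes the terminating '\0' */
    } else if (out_size > 0) {
        out[0] = '\0';              /* not enough room: hand back an empty string */
    }
}
```

    With a buffer of at least four bytes the result is "LxT", and no
    other "LET" in the program is affected.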
    Regards, Jens
    --
    \ Jens Thoms Toerring ___
    \__________________________ http://toerring.de
    Jens Thoms Toerring, Dec 10, 2009
    #3
  4. Kaz Kylheku

    Kaz Kylheku Guest

    On 2009-12-10, Jens Thoms Toerring <> wrote:
    > Tagore <> wrote:
    >> #include <stdio.h>
    >> int main(void){
    >> char *s="LET";
    >> char *t="LET";
    >> if(s==t)
    >> printf("same");
    >> else
    >> printf("different");
    >> return 0;
    >> }

    >
    >> In above code, output is "same".
    >> but I expected output to be "different". I think that s and t points
    >> to string literals present at different addresses.

    >
    > Why do you think so? It's correct that both 's' and 't' point to
    > string literals - but since the strings they point to are identical
    > it's one of the most simple (memory-related) optimizations for the
    > compiler to make them point to the same location. Actually, that's
    > the very reason why you aren't allowed to change string literals -
    > i.e. if you would do e.g.


    It's not the only reason.

    Literals are effectively pieces of the program text made available to itself as
    data, so that modifying a literal de facto constitutes self-modifying code.
    Self-modifying code can't be placed into read-only storage, such as a ROM, or
    write-protected virtual pages.

    > s[1] = 'x'; /* not allowed by the C standard! */


    This undefinedness also means that once you perform s[1] = 'x', a subsequent
    statement of the form

    if (s[1] == 'x') ...

    could go either way (if it ever gets to execute at all). It's not just about
    other copies of the literal being affected by the change.

    The translated program is also simply not required to be aware of
    self-modifications like this.

    Not only can another instance of the literal share the same space as s, but the
    expression s[1] can be optimized to a constant which does not respond to
    changes to s.
    Kaz Kylheku, Dec 10, 2009
    #4
  5. Kaz Kylheku <> writes:
    > On 2009-12-10, Jens Thoms Toerring <> wrote:
    >> Tagore <> wrote:
    >>> char *s="LET";
    >>> char *t="LET";

    [...]
    >> Why do you think so? It's correct that both 's' and 't' point to
    >> string literals - but since the strings they point to are identical
    >> it's one of the most simple (memory-related) optimizations for the
    >> compiler to make them point to the same location. Actually, that's
    >> the very reason why you aren't allowed to change string literals -
    >> i.e. if you would do e.g.

    >
    > It's not the only reason.
    >
    > Literals are effectively pieces of the program text made available
    > to itself as data, so that modifying a literal de facto constitutes
    > self-modifying code. Self-modifying code can't be placed into
    > read-only storage, such as a ROM, or write-protected virtual pages.
    >
    >> s[1] = 'x'; /* not allowed by the C standard! */

    >
    > This undefinedness also means that once you perform s[1] = 'x', a subsequent
    > statement of the form
    >
    > if (s[1] == 'x') ...
    >
    > could go either way (if it ever gets to execute at all). It's not just about
    > other copies of the literal being affected by the change.
    >
    > The translated program is also simply not required to be aware of
    > self-modifications like this.
    >
    > Not only can another instance of the literal share the same space as
    > s, but the expression s[1] can be optimized to a constant which does
    > not respond to changes to s.


    Agreed.

    In addition, it's also likely (but not required) that attempting:

    s[1] = 'x';

    will cause your program to crash. (In fact, this is the *best*
    outcome, since it shows you where the error is.)
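
    One way to have the mistake reported even earlier, at compile time,
    is to declare pointers to string literals as const char *. A sketch,
    with second_char as an invented helper name:

```c
/* Hypothetical helper: reads through a pointer-to-const, which is all
   that can safely be done with a string literal. The string must hold
   at least one character so that s[1] is inside the array. */
static char second_char(const char *s)
{
    /* s[1] = 'x'; */   /* would now be a constraint violation: the compiler
                           must issue a diagnostic instead of leaving the
                           problem to run time */
    return s[1];
}
```

    Uncommenting the assignment makes the compiler complain, which is
    better than relying on a crash that may or may not happen.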

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Nokia
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
    Keith Thompson, Dec 10, 2009
    #5
  6. Eric Sosman

    Eric Sosman Guest

    On 12/10/2009 5:51 PM, Tagore wrote:
    > hi,
    >
    > #include<stdio.h>
    > int main(void){
    > char *s="LET";
    > char *t="LET";
    > if(s==t)
    > printf("same");
    > else
    > printf("different");
    > return 0;
    > }
    >
    > In above code, output is "same".
    > but I expected output to be "different". I think that s and t points
    > to string literals present at different addresses.
    > Can any one please help me in understanding its output.


    As others have explained, the compiler might choose
    to create only one nameless "LET" string, and aim both
    pointers at that single instance.

    A compiler might go further still:

    char *s = "LET";
    char *t = "NUMBER ONE WITH A BULLET";
    if (s == t)
    ... obviously false ...
    if (s == t+21)
    ... ??? ...
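
    Spelled out as compilable code, the point is that only the contents
    are guaranteed; whether the addresses coincide is up to the
    implementation. A sketch (the two function names are mine):

```c
#include <stdbool.h>
#include <string.h>

static const char *s = "LET";
static const char *t = "NUMBER ONE WITH A BULLET";

/* The last three characters of t spell "LET", so this is always true. */
static bool same_contents(void)
{
    return strcmp(s, t + 21) == 0;
}

/* Unspecified: true if the implementation stored "LET" as the tail of
   the longer literal, false if it kept a separate copy. */
static bool same_address(void)
{
    return s == t + 21;
}
```

    A program may print the result of same_address() out of curiosity,
    but must not depend on either answer.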

    --
    Eric Sosman
    Eric Sosman, Dec 11, 2009
    #6
  7. Eric Sosman <> wrote:
    <snip>
    >      As others have explained, the compiler might choose
    > to create only one nameless "LET" string, and aim both
    > pointers at that single instance.
    >
    >      A compiler might go further still:
    >
    >         char *s = "LET";
    >         char *t = "NUMBER ONE WITH A BULLET";
    >         if (s == t)
    >             ... obviously false ...
    >         if (s == t+21)
    >             ... ??? ...


    Are you sure that's allowed in C89/90? I thought the
    string literals had to be 'identical' before they could
    share the same address.

    --
    Peter
    Peter Nilsson, Dec 11, 2009
    #7
  8. Peter Nilsson <> writes:
    > Eric Sosman <> wrote:
    > <snip>
    >>      As others have explained, the compiler might choose
    >> to create only one nameless "LET" string, and aim both
    >> pointers at that single instance.
    >>
    >>      A compiler might go further still:
    >>
    >>         char *s = "LET";
    >>         char *t = "NUMBER ONE WITH A BULLET";
    >>         if (s == t)
    >>             ... obviously false ...
    >>         if (s == t+21)
    >>             ... ??? ...

    >
    > Are you sure that's allowed in C89/90? I thought the
    > string literals had to be 'identical' before they could
    > share the same address.


    The wording did change between C90 and C99.

    C90 6.1.4:

    Identical string literals of either form need not be distinct. If
    the program attempts to modify a string literal of either form,
    the behavior is undefined.

    where "either form" refers to character string literals and wide
    string literals.

    C99 6.4.5p6:

    It is unspecified whether these arrays are distinct provided their
    elements have the appropriate values. If the program attempts to
    modify such an array, the behavior is undefined.

    But the C90 standard didn't say that string literals that aren't
    identical *can't* overlap (and I can't think of any good reason to
    assume that they can't). I think C99 mostly just improved the
    wording.

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Nokia
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
    Keith Thompson, Dec 11, 2009
    #8
  9. Eric Sosman

    Eric Sosman Guest

    On 12/10/2009 9:23 PM, Peter Nilsson wrote:
    > Eric Sosman<> wrote:
    > <snip>
    >> As others have explained, the compiler might choose
    >> to create only one nameless "LET" string, and aim both
    >> pointers at that single instance.
    >>
    >> A compiler might go further still:
    >>
    >> char *s = "LET";
    >> char *t = "NUMBER ONE WITH A BULLET";
    >> if (s == t)
    >> ... obviously false ...
    >> if (s == t+21)
    >> ... ??? ...

    >
    > Are you sure that's allowed in C89/90? I thought the
    > string literals had to be 'identical' before they could
    > share the same address.


    The word "identical" doesn't seem to appear in any part
    of the C99 Standard that's relevant. But perhaps I've missed
    something; can you cite which "identical" you're thinking of?

    In C99, 6.4.5p6 says "It is unspecified whether these arrays
    are distinct provided their elements have the appropriate values."
    The word "appropriate" does not seem to me to imply "identical."

    C89/ANSI 3.1.4 says "Identical string literals of either form
    need not be distinct," but doesn't seem to say anything at all
    about non-identical literals. (It doesn't even say that "X"
    and "FOOBAR" are distinct.)

    I don't have a copy of C90 to consult, but others have said
    it's the same as C89 except for section and paragraph numbers.

    --
    Eric Sosman
    Eric Sosman, Dec 11, 2009
    #9
  10. Nick

    Nick Guest

    Keith Thompson <> writes:

    > Peter Nilsson <> writes:
    >> Eric Sosman <> wrote:
    >> <snip>
    >>>      As others have explained, the compiler might choose
    >>> to create only one nameless "LET" string, and aim both
    >>> pointers at that single instance.
    >>>
    >>>      A compiler might go further still:
    >>>
    >>>         char *s = "LET";
    >>>         char *t = "NUMBER ONE WITH A BULLET";
    >>>         if (s == t)
    >>>             ... obviously false ...
    >>>         if (s == t+21)
    >>>             ... ??? ...

    >>
    >> Are you sure that's allowed in C89/90? I thought the
    >> string literals had to be 'identical' before they could
    >> share the same address.

    >
    > The wording did change between C90 and C99.
    >
    > C90 6.1.4:
    >
    > Identical string literals of either form need not be distinct. If
    > the program attempts to modify a string literal of either form,
    > the behavior is undefined.
    >
    > where "either form" refers to character string literals and wide
    > string literals.
    >
    > C99 6.4.5p6:
    >
    > It is unspecified whether these arrays are distinct provided their
    > elements have the appropriate values. If the program attempts to
    > modify such an array, the behavior is undefined.
    >
    > But the C90 standard didn't say that string literals that aren't
    > identical *can't* overlap (and I can't think of any good reason to
    > assume that they can't). I think C99 mostly just improved the
    > wording.


    There is, presumably, nothing to stop the compiler pointing s at four
    bytes of machine code that happen to make up part of the body of your
    program and which constitute codes for L,E and T followed by a 0 byte.
    If they should so happen to appear, of course.
    --
    Online waterways route planner: http://canalplan.org.uk
    development version: http://canalplan.eu
    Nick, Dec 11, 2009
    #10
  11. On Dec 11, 6:38 am, Kaz Kylheku <> wrote:
    > Literals are effectively pieces of the program text made available to itself as
    > data, so that modifying a literal de facto constitutes self-modifying code.
    > Self-modifying code can't be placed into read-only storage, such as a ROM, or
    > write-protected virtual pages.


    A related reason for "read-only when possible" concerns text-sharing.

    One might have dozens of copies of the same program (e.g. interpreter)
    running on one machine; the interpreter's data might include hundreds
    of messages; there's a very big savings if the messages can be moved
    to a read-only, sharable memory section. (There used to be a
    complicated pre-processor that accomplished this, also looking for
    string matches; it became obsolete when compilers started treating
    string literals as read-only by default.)

    James Dow Allen
    James Dow Allen, Dec 11, 2009
    #11
  12. On 10 Dec, 23:20, (Jens Thoms Toerring) wrote:
    > Tagore <> wrote:


    > > #include <stdio.h>
    > > int main(void){
    > >         char *s="LET";
    > >         char *t="LET";
    > >         if(s==t)
    > >               printf("same");
    > >         else
    > >               printf("different");
    > >         return 0;
    > > }

    >
    > > In above code, output is "same".
    > > but I expected output to be "different". I think that s and t points
    > > to string literals present at different addresses.

    >
    > Why do you think so? It's correct that both 's' and 't' point to
    > string literals - but since the strings they point to are identical
    > it's one of the most simple (memory-related) optimizations for the
    > compiler to make them point to the same location.


    <snip>

    > But if you don't like it your compiler may have a flag to make
    > it less standard-compliant and force it to produce code where
    > 's' is pointing to a different location than 't' (and where you
    > thus may change string literals).


    Why is this non-compliant?
    Nick Keighley, Dec 11, 2009
    #12
  13. Richard Bos

    Richard Bos Guest

    Keith Thompson <> wrote:

    > Kaz Kylheku <> writes:
    > > On 2009-12-10, Jens Thoms Toerring <> wrote:
    > >> Tagore <> wrote:
    > >>> char *s="LET";
    > >>> char *t="LET";

    > [...]
    > >> Why do you think so? It's correct that both 's' and 't' point to
    > >> string literals - but since the strings they point to are identical
    > >> it's one of the most simple (memory-related) optimizations for the
    > >> compiler to make them point to the same location. Actually, that's
    > >> the very reason why you aren't allowed to change string literals -
    > >> i.e. if you would do e.g.

    > >
    > > It's not the only reason.
    > >
    > > Literals are effectively pieces of the program text made available
    > > to itself as data, so that modifying a literal de facto constitutes
    > > self-modifying code. Self-modifying code can't be placed into
    > > read-only storage, such as a ROM, or write-protected virtual pages.
    > >
    > >> s[1] = 'x'; /* not allowed by the C standard! */

    > >
    > > This undefinedness also means that once you perform s[1] = 'x', a subsequent
    > > statement of the form
    > >
    > > if (s[1] == 'x') ...
    > >
    > > could go either way (if it ever gets to execute at all). It's not just about
    > > other copies of the literal being affected by the change.
    > >
    > > The translated program is also simply not required to be aware of
    > > self-modifications like this.
    > >
    > > Not only can another instance of the literal share the same space as
    > > s, but the expression s[1] can be optimized to a constant which does
    > > not respond to changes to s.

    >
    > Agreed.
    >
    > In addition, it's also likely (but not required) that attempting:
    >
    > s[1] = 'x';
    >
    > will cause your program to crash. (In fact, this is the *best*
    > outcome, since it shows you where the error is.)


    It's even possible that a later

    if (ch == 'L')

    is compiled to compare to the first character of your string literal,
    instead of to a literal 'L', on systems where this is faster. It is even
    allowed that, if you do try to change the string, that comparison fails
    when ch is 'L', at a point which _appears_ to have nothing whatsoever to
    do with the original string literal.
    I have never seen an implementation which goes that far in its
    optimisations (in fact, I've never seen one where it would make sense),
    but I would not be very surprised to find one. It would certainly be
    perfectly legal.

    Richard
    Richard Bos, Dec 12, 2009
    #13
  14. Nick Keighley <> wrote:
    > On 10 Dec, 23:20, (Jens Thoms Toerring) wrote:
    > > Tagore <> wrote:


    > > > #include <stdio.h>
    > > > int main(void){
    > > >         char *s="LET";
    > > >         char *t="LET";
    > > >         if(s==t)
    > > >               printf("same");
    > > >         else
    > > >               printf("different");
    > > >         return 0;
    > > > }

    > >
    > > > In above code, output is "same".
    > > > but I expected output to be "different". I think that s and t points
    > > > to string literals present at different addresses.

    > >
    > > Why do you think so? It's correct that both 's' and 't' point to
    > > string literals - but since the strings they point to are identical
    > > it's one of the most simple (memory-related) optimizations for the
    > > compiler to make them point to the same location.


    > <snip>


    > > But if you don't like it your compiler may have a flag to make
    > > it less standard-compliant and force it to produce code where
    > > 's' is pointing to a different location than 't' (and where you
    > > thus may change string literals).


    > why is this not-compliant?


    Sorry, that was badly expressed. What I meant was that there might
    be a flag that gets the compiler to emit a working program (in the
    sense of "as maybe expected by the user") for non-compliant code
    (i.e. that allows for changing of string literals, which otherwise
    results in undefined behaviour). But on thinking about it a bit
    more, even that doesn't guarantee that 's' and 't' will point to
    different locations; what one would need for that is a flag that
    suppresses the kind of optimization that merges identical (parts of)
    string literals.
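
    If distinct addresses are what you really need, no compiler flag is
    required: two arrays initialized from the same literal are separate
    objects, so their addresses must differ. A sketch with made-up names:

```c
#include <stdbool.h>

/* Two separate array objects, each initialized with its own copy of "LET". */
static char s_buf[] = "LET";
static char t_buf[] = "LET";

/* Hypothetical helper: always false, because distinct objects cannot
   share an address, unlike two string literals. */
static bool buffers_share_address(void)
{
    const char *p = s_buf;
    const char *q = t_buf;

    return p == q;
}
```

    Unlike the pointer version in the original question, this comparison
    has only one permitted outcome.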
    Regards, Jens
    --
    \ Jens Thoms Toerring ___
    \__________________________ http://toerring.de
    Jens Thoms Toerring, Dec 12, 2009
    #14
  15. Tagore wrote:

    > hi,
    >
    > #include <stdio.h>
    > int main(void){
    > char *s="LET";
    > char *t="LET";
    > if(s==t)
    > printf("same");
    > else
    > printf("different");
    > return 0;
    > }
    >


    As string literals are really "const" char *, they are read-only and the
    compiler is free to place them at the same or at different addresses.
    Michael Tsang, Dec 20, 2009
    #15
  16. Michael Tsang <> writes:

    > Tagore wrote:

    <snip>
    >> #include <stdio.h>
    >> int main(void){
    >> char *s="LET";
    >> char *t="LET";
    >> if(s==t)
    >> printf("same");
    >> else
    >> printf("different");
    >> return 0;
    >> }

    >
    > As string literals are really "const" char *, they are read-only and the
    > compiler is free to place them at the same or at different
    > addresses.


    It's worth pointing out (as I think you know from the quotes you used
    round "const") that string literals are not actually const objects in
    C. They are not modifiable (in that the effect of doing so is
    undefined) but if they were really const, you'd get a compiler
    diagnostic from the initialisations in the program above.

    Also (and this is very much a small point) a literal like "same" is
    really of type char[5] since sizeof will report the array object's
    size not the size of a char *.
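
    Both points can be checked with a few lines of code. A sketch (the
    helper names are mine):

```c
#include <stddef.h>

/* Size of the array object behind the literal "same":
   four characters plus the terminating '\0'. */
static size_t literal_size(void)
{
    return sizeof "same";
}

/* Size of a pointer to char: whatever the platform uses, and unrelated
   to the length of any string. */
static size_t pointer_size(void)
{
    return sizeof(char *);
}

/* The literal is not const in C: this initialization needs no cast and
   requires no diagnostic, even though writing through p would be
   undefined behaviour. Reading through it is fine. */
static char first_char(void)
{
    char *p = "same";

    return p[0];
}
```

    literal_size() is 5 everywhere, while pointer_size() depends on the
    platform.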

    --
    Ben.
    Ben Bacarisse, Dec 20, 2009
    #16