String constants

Discussion in 'C Programming' started by MQ, Jul 31, 2006.

  1. MQ

    MQ Guest

    wrote:
    > Hi,
    >
    > I have a question about string constants. I compile the following program:
    >
    > #include <stdio.h>
    > #include <string.h>
    >
    > int main(void)
    > {
    > char str1[] = "\007";
    > char str2[] = "\0" "07";
    > char str3[] = { '\0', '0', '7', '\0' };
    >
    > printf("str1 = %s\n" "str2 = %s\n" "str3 = %s\n", str1, str2, str3);
    > printf("sizeof(str1) = %zu\n" "sizeof(str2) = %zu\n"
    > "sizeof(str3) = %zu\n", sizeof(str1), sizeof(str2),
    > sizeof(str3));
    > printf("strlen(str1) = %zu\n" "strlen(str2) = %zu\n"
    > "strlen(str3) = %zu\n", strlen(str1), strlen(str2),
    > strlen(str3));
    >
    > return 0;
    > }
    >
    > Here is the output:
    >
    > str1 =
    > str2 =
    > str3 =
    > sizeof(str1) = 2
    > sizeof(str2) = 4
    > sizeof(str3) = 4
    > strlen(str1) = 1
    > strlen(str2) = 0
    > strlen(str3) = 0
    >
    > I understand that yet another obscure C feature is the octal character
    > specification so that \ddd is one character. However, should not str1 and str2
    > be the same?


    No. str1 contains a single character with value 7 (ASCII BEL), followed
    by the null terminator, which gives an array size of two and a string
    length of one. str2 holds three characters from the literals: '\0' (the
    null character), then '0', then '7'. With the null terminator added at
    the end of the string, the array has four characters.

    str1 appears invisible because ASCII 7 is a non-printable character.
    str2 and str3 both start with a null character, so each string appears
    to be empty (which is why strlen returns 0 in both of these cases).
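
    To see the difference between the arrays, it can help to look at every
    byte instead of relying on %s, which stops at the first null character.
    A minimal sketch, assuming an ASCII execution character set; hex_dump is
    a hypothetical helper written for this illustration, not part of the
    original program:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* hex_dump is a hypothetical helper for illustration.
   It writes all n bytes of an array, including any '\0' bytes, as
   two-digit hex values separated by single spaces.
   out must have room for at least 3 * n + 1 chars. Returns out. */
char *hex_dump(const char *bytes, size_t n, char *out)
{
    size_t i;

    assert(bytes != NULL && out != NULL);
    out[0] = '\0';
    for (i = 0; i < n; i++)
        sprintf(out + 3 * i, "%02x ", (unsigned)(unsigned char)bytes[i]);
    if (n > 0)
        out[3 * n - 1] = '\0';   /* drop the trailing space */
    return out;
}
```

    With the arrays from the question and a large enough buffer,
    hex_dump(str1, sizeof str1, buf) should give "07 00", while
    hex_dump(str2, sizeof str2, buf) should give "00 30 37 00" on an
    ASCII system.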
     
    MQ, Jul 31, 2006
    #1

  2. In article <eak2k3$nde$>, <> wrote:

    > char str1[] = "\007";
    > char str2[] = "\0" "07";
    > char str3[] = { '\0', '0', '7', '\0' };


    >I understand that yet another obscure C feature is the octal character
    >specification so that \ddd is one character.


    True (provided the d are all in the range 0 through 7.)

    >However, should not str1 and str2
    >be the same? Obscure feature conflict (\ddd vs string concatenation)?


    Concatenation of adjacent string literals is not done until a
    later point than tokenization of the strings.

    In str3, there is no concatenation taking place: you have
    specified, char by char, exactly what should be put into adjacent
    locations in the array.

    Going back to your second string: would you expect that "\" "007"
    would compile the same as "\007" ? It doesn't of course -- the
    backslash escapes the double-quote, rather than being held in
    suspension in case something is going to show up later.

    The behaviour is well specified in C89: an octal escape sequence takes
    at most three octal digits and stops early at the first non-octal
    character.

    Consider a problem in the hex escape sequences: "\xABCD".
    A hex escape has no length limit, so all four hex digits belong to one
    escape naming a single character, and the compiler must diagnose it
    if the value does not fit in an unsigned char.
    Suppose, though, that you wanted to stop after the \xAB and you
    wanted literal C and literal D: how would you do it?
    The solution from the standard is that you can use "\xAB" "CD"
    because the sequence ends at the first non-hex character
    (the second double-quote.) But suppose it were otherwise, that
    concatenation took place first and then the result was maximally
    tokenized: then in order to get the C to be a C, you would have to
    put in the hex value corresponding to "C", and then you'd have to
    put in the hex value corresponding to "D", and you'd have to
    keep on encoding until finally your text happened to include something
    that wasn't interpretable as hex.
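
    The "\xAB" "CD" trick can be checked directly. This is only a sketch:
    the names split_literal, split_literal_size and split_byte are made up
    for the illustration, and bytes are read back through unsigned char so
    the result does not depend on whether plain char is signed.

```c
#include <assert.h>
#include <stddef.h>

/* The hex escape ends at the closing quote of the first literal, so
   after concatenation the array holds 0xAB, 'C', 'D' and the final '\0'. */
static const char split_literal[] = "\xAB" "CD";

/* Size of the whole array, terminator included. */
size_t split_literal_size(void)
{
    return sizeof split_literal;
}

/* Byte i of the array, read as an unsigned value. */
unsigned split_byte(size_t i)
{
    assert(i < sizeof split_literal);
    return (unsigned char)split_literal[i];
}
```

    Under these assumptions split_literal_size() is 4 and split_byte(0) is
    0xAB, followed by 'C', 'D' and 0.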
    --
    "It is important to remember that when it comes to law, computers
    never make copies, only human beings make copies. Computers are given
    commands, not permission. Only people can be given permission."
    -- Brad Templeton
     
    Walter Roberson, Jul 31, 2006
    #2

  3. MQ

    Guest

    Hi,

    I have a question about string constants. I compile the following program:

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        char str1[] = "\007";
        char str2[] = "\0" "07";
        char str3[] = { '\0', '0', '7', '\0' };

        printf("str1 = %s\n" "str2 = %s\n" "str3 = %s\n", str1, str2, str3);
        /* sizeof and strlen yield size_t, printed with %zu (C99) */
        printf("sizeof(str1) = %zu\n" "sizeof(str2) = %zu\n"
               "sizeof(str3) = %zu\n", sizeof(str1), sizeof(str2),
               sizeof(str3));
        printf("strlen(str1) = %zu\n" "strlen(str2) = %zu\n"
               "strlen(str3) = %zu\n", strlen(str1), strlen(str2),
               strlen(str3));

        return 0;
    }

    Here is the output:

    str1 =
    str2 =
    str3 =
    sizeof(str1) = 2
    sizeof(str2) = 4
    sizeof(str3) = 4
    strlen(str1) = 1
    strlen(str2) = 0
    strlen(str3) = 0

    I understand that yet another obscure C feature is the octal character
    specification so that \ddd is one character. However, should not str1 and str2
    be the same? Obscure feature conflict (\ddd vs string concatenation)?
     
    , Jul 31, 2006
    #3
  4. MQ

    MQ Guest

    wrote:
    >
    > Also, if you wanted to, for example, use a string containing '\01' '0', how
    > would you do this unambiguously? As in str2? How about '\0' "01" or '\0' '0'???


    I'm not sure what you are trying to achieve, but it seems you are not
    understanding how strings work. '\0' is ASCII 0. You cannot just
    append this to a string of numbers and get a single character out of
    it. You will get a string with ASCII 0 at the start (the null
    character) plus the string of numbers. Can you explain what you are
    trying to do so we can suggest a better way...

    MQ
     
    MQ, Jul 31, 2006
    #4
  5. MQ

    Guest

    wrote:

    > Also, if you wanted to, for example, use a string containing '\01' '0', how
    > would you do this unambiguously?


    Since '\01' has value 1 I assume that what you want is a
    string whose first byte has the value 1 and second byte the
    value '0'. You do that with char str[] = {1, '0', '\0'};

    > How about '\0' "01" or '\0' '0'???


    It would make your thoughts more clear for us if you
    wrote the complete statement you have in mind.

    Spiros Bousbouras
     
    , Jul 31, 2006
    #5
  6. MQ

    Guest

    wrote:
    > Hi,


    > I have a question about string constants. I compile the following program:


    > #include <stdio.h>
    > #include <string.h>


    > int main(void)
    > {
    > char str1[] = "\007";
    > char str2[] = "\0" "07";
    > char str3[] = { '\0', '0', '7', '\0' };


    > printf("str1 = %s\n" "str2 = %s\n" "str3 = %s\n", str1, str2, str3);
    > printf("sizeof(str1) = %zu\n" "sizeof(str2) = %zu\n"
    > "sizeof(str3) = %zu\n", sizeof(str1), sizeof(str2),
    > sizeof(str3));
    > printf("strlen(str1) = %zu\n" "strlen(str2) = %zu\n"
    > "strlen(str3) = %zu\n", strlen(str1), strlen(str2),
    > strlen(str3));


    > return 0;
    > }


    > Here is the output:


    > str1 =
    > str2 =
    > str3 =
    > sizeof(str1) = 2
    > sizeof(str2) = 4
    > sizeof(str3) = 4
    > strlen(str1) = 1
    > strlen(str2) = 0
    > strlen(str3) = 0


    > I understand that yet another obscure C feature is the octal character
    > specification so that \ddd is one character. However, should not str1 and str2
    > be the same? Obscure feature conflict (\ddd vs string concatenation)?


    Also, if you wanted to, for example, use a string containing '\01' '0', how
    would you do this unambiguously? As in str2? How about '\0' "01" or '\0' '0'???
     
    , Jul 31, 2006
    #6
  7. MQ

    Chris Torek Guest

    In article <eak2k3$nde$> <> wrote:
    >I have a question about string constants.


    There are a number of tricks you need to "get straight in your head"
    in order to deal with this.

    First, a C string is actually a data structure, namely, an array
    of "char"s in which the first zero-byte is considered the end of the
    string.

    Second, escapes like '\007' are interpreted by the compiler, and
    the lexical rules for the octal version are:

    From the backslash, consume up to (but no more than) three
    octal digits, stopping when you run out of digits or when
    the first "invalid" character occurs.

    Hence, if you encounter

    \1\29\00345

    this "means" \1, then \2, then 9, then \003, then 4, then 5.
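
    That reading can be confirmed by putting the sequence in a string
    literal and inspecting the resulting array. A small sketch, assuming
    nothing beyond standard C; the names mixed, mixed_size and mixed_byte
    are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Six characters from the literal: \1, \2, '9', \003, '4', '5',
   followed by the terminating '\0' the compiler appends. */
static const char mixed[] = "\1\29\00345";

/* Size of the whole array, terminator included. */
size_t mixed_size(void)
{
    return sizeof mixed;
}

/* Byte i of the array, read as an unsigned value. */
unsigned mixed_byte(size_t i)
{
    assert(i < sizeof mixed);
    return (unsigned char)mixed[i];
}
```

    Here mixed_size() comes out as 7: the digit 9 ends the \2 escape
    because it is not octal, and \003 ends after its third digit.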

    Third, string literals usually -- but not always[%] -- mean "generate
    an anonymous array containing the characters given in the literal,
    with a \0 character appended".

    Last, adjacent string literals are concatenated after escape sequence
    interpretation, but before adding the final \0.

    > char str1[] = "\007";


    This string literal has one \7 character inside, so generates an
    array containing two characters, namely \7 and \0.

    > char str2[] = "\0" "07";


    Here there are two adjacent string literals. The first has one
    \0 character inside, and the second has two characters inside,
    '0' and '7'. These are concatenated -- giving '\0' '0' '7'
    in that order -- and a final \0 is added. The result is the same
    as if you wrote either:

    char str2[] = "\00007";

    or the initializer you gave for str3:

    > char str3[] = { '\0', '0', '7', '\0' };


    Both of these create an array of size 4, containing the four
    specified "char"s. Since str2 and str3 both begin with a zero
    byte, their strlen()s are zero, even though both arrays continue
    (always) to hold four "char"s.
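
    The claimed equivalence of the three spellings is easy to test with
    memcmp, which compares all bytes rather than stopping at a '\0'. This
    is a sketch only; same_bytes, all_three_agree and the array names are
    invented for the example.

```c
#include <assert.h>
#include <string.h>

static const char concatenated[] = "\0" "07";   /* two adjacent literals */
static const char one_literal[]  = "\00007";    /* \000 then '0', '7'    */
static const char braces[]       = { '\0', '0', '7', '\0' };

/* Nonzero if both arrays have the same size and the same bytes. */
int same_bytes(const void *a, size_t na, const void *b, size_t nb)
{
    assert(a != NULL && b != NULL);
    return na == nb && memcmp(a, b, na) == 0;
}

/* Nonzero if all three spellings produce identical arrays. */
int all_three_agree(void)
{
    return same_bytes(concatenated, sizeof concatenated,
                      one_literal, sizeof one_literal)
        && same_bytes(concatenated, sizeof concatenated,
                      braces, sizeof braces);
}

/* strlen stops at the leading '\0'; sizeof counts every element. */
size_t concatenated_strlen(void) { return strlen(concatenated); }
size_t concatenated_size(void)   { return sizeof concatenated; }
```

    all_three_agree() should return nonzero, with concatenated_strlen()
    equal to 0 and concatenated_size() equal to 4.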

    >I understand that yet another obscure C feature is the octal character
    >specification so that \ddd is one character.


    Right -- but only if the digits are uninterrupted, and all octal.
    (The situation is quite different for \x escapes, as someone else
    noted elsethread.)

    >However, should not str1 and str2 be the same?


    No; the order in which the escape-interpretation and
    string-literal-concatenation occurs forbids this.

    [% The two exceptions are: when the literal is not the last in an
    adjacent sequence, so that concatenation occurs before adding the
    \0, or when the literal is used as an initializer for an array
    whose size was specified, and whose specified size is exactly large
    enough to hold the characters in the literal without adding the
    \0. Making use of this second exception is particularly annoying;
    it reminds me of the Bad Old Days of Hollerith constants in Fortran.]
    --
    In-Real-Life: Chris Torek, Wind River Systems
    Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
    email: forget about it http://web.torek.net/torek/index.html
    Reading email is like searching for food in the garbage, thanks to spammers.
     
    Chris Torek, Jul 31, 2006
    #7
  8. writes:
    [...]
    > Also, if you wanted to, for example, use a string containing '\01'
    > '0', how would you do this unambiguously? As in str2? How about '\0'
    > "01" or '\0' '0'???


    If you want a string containing the characters '\1' and '0', you can
    use "\0010", since an octal escape has at most 3 digits. Or you
    can split it into two string literals: "\1" "0".

    Your second example, '\0' "01" is ill-formed; adjacent string literals
    are concatenated, but character constants are not. Assuming you want
    { '\0', '0', '1' }, you can write "\00001", or, more clearly,
    "\0" "01".

    Similarly, for your third example, you can write "\0000" or "\0" "0".
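
    As a rough check of the first suggestion, the two spellings can be
    compared byte for byte, alongside the tempting but different "\010".
    The names below (three_digits, split_form, pitfall and the functions)
    are hypothetical, chosen only for this sketch.

```c
#include <assert.h>
#include <string.h>

static const char three_digits[] = "\0010";    /* \001 then '0'         */
static const char split_form[]   = "\1" "0";   /* \1 then '0'           */
static const char pitfall[]      = "\010";     /* one char with value 8 */

/* Nonzero if the padded and the split spelling give the same array. */
int octal_forms_agree(void)
{
    return sizeof three_digits == sizeof split_form
        && memcmp(three_digits, split_form, sizeof split_form) == 0;
}

/* Array sizes, terminator included. */
size_t split_form_size(void) { return sizeof split_form; }
size_t pitfall_size(void)    { return sizeof pitfall; }

/* Value of the single character in the unpadded spelling. */
unsigned pitfall_first_byte(void)
{
    assert(sizeof pitfall >= 1);
    return (unsigned char)pitfall[0];
}
```

    The padded and split forms both yield a 3-element array, whereas
    "\010" yields a 2-element array whose first element is 8, which is why
    the padding to three octal digits matters.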

    In each case, of course, there's an implicit trailing '\0' at the end
    of each string literal (after concatenation), even if the last
    character is an explicit '\0' -- but this is suppressed if the string
    literal is an initializer for a character array of exactly the right
    size. For example:
    const char x[3] = "abc";
    initializes x to { 'a', 'b', 'c' }, but
    const char y[] = "abc";
    initializes y to { 'a', 'b', 'c', '\0' }.
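
    The difference between x and y shows up in sizeof and in whether a
    '\0' can be found inside the array at all. A minimal sketch in C (C++
    rejects the x initializer); the function names are made up for the
    example:

```c
#include <assert.h>
#include <string.h>

static const char x[3] = "abc";   /* exact fit: no room for the '\0' */
static const char y[]  = "abc";   /* size deduced: '\0' is included  */

/* Array sizes in bytes. */
size_t exact_fit_size(void) { return sizeof x; }
size_t deduced_size(void)   { return sizeof y; }

/* Nonzero if a '\0' occurs anywhere within the n bytes at p. */
int has_terminator(const char *p, size_t n)
{
    assert(p != NULL);
    return memchr(p, '\0', n) != NULL;
}

int x_has_terminator(void) { return has_terminator(x, sizeof x); }
int y_has_terminator(void) { return has_terminator(y, sizeof y); }
```

    Since x holds no '\0', it is not a string, and passing it to strlen or
    printing it with %s would read past the end of the array.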


    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
    We must do something. This is something. Therefore, we must do this.
     
    Keith Thompson, Jul 31, 2006
    #8
  9. On 2006-07-31, <> wrote:
    > Hi,
    >
    > I have a question about string constants. I compile the following program:
    >
    > #include <stdio.h>
    > #include <string.h>
    >
    > int main(void)
    > {
    > char str1[] = "\007";

    If you're using ASCII, \007 is an unprintable character. Hence the
    string appears empty.

    > char str2[] = "\0" "07";

    Here you begin a string with a 0, which in C terminates a string. Hence,
    the string /is/ empty.

    > char str3[] = { '\0', '0', '7', '\0' };

    And you have the same problem here: '\0' signifies the end of a string.

    <legal string-printing code snipped>
    > Here is the output:
    >
    > str1 =
    > str2 =
    > str3 =
    > sizeof(str1) = 2
    > sizeof(str2) = 4
    > sizeof(str3) = 4
    > strlen(str1) = 1
    > strlen(str2) = 0
    > strlen(str3) = 0
    >
    > I understand that yet another obscure C feature is the octal character
    > specification so that \ddd is one character. However, should not str1 and
    > str2 be the same? Obscure feature conflict (\ddd vs string concatenation)?


    Concatenation occurs after escape sequences are replaced, hexadecimal
    and octal ones included. By the time "\0" and "07" are joined, the \0
    is already a complete character, so it cannot absorb the digits that
    follow.

    --
    Andrew Poelstra <website down>
    To reach my email, use <email also down>
    New server ETA: 42
     
    Andrew Poelstra, Jul 31, 2006
    #9