sizeof( <string literal> )

Don Starr

When applied to a string literal, is the sizeof operator supposed to return the size of the string
(including nul), or the size of a pointer?

For example, assuming a char is 1 byte and a char * is 4 bytes, should the following yield 4, 5, or
something else? (And, if something else, what determines the result?)

char x[] = "abcd";
printf( "%d\n", sizeof( x ) );

-Don
 
Eric Sosman

Don said:
When applied to a string literal, is the sizeof operator supposed to return the size of the string
(including nul), or the size of a pointer?

For example, assuming a char is 1 byte and a char * is 4 bytes, should the following yield 4, 5, or
something else? (And, if something else, what determines the result?)

char x[] = "abcd";
printf( "%d\n", sizeof( x ) );

You'll get 5. (Actually you might get undefined behavior
because the `sizeof' operator yields a result of type `size_t',
and "%d" isn't necessarily the right format specifier. I've
used machines on which the above would print 0!)

Why 5? Because a string literal generates an array of
characters. When you apply the `sizeof' operator to an array
(any array), you get the number of bytes in the array. This
is one of the very few cases where an array reference does
*not* turn into a pointer to the zero'th element; the `sizeof'
operator applies to the array as a whole.

See also Questions 6.4 and 6.8 -- in fact, read all of
Section 6 -- in the comp.lang.c Frequently Asked Questions
(FAQ) list

http://www.eskimo.com/~scs/C-faq/top.html
 
Artie Gold

Don said:
When applied to a string literal, is the sizeof operator supposed to return the size of the string
(including nul), or the size of a pointer?

Smells like homework.

For example, assuming a char is 1 byte and a char * is 4 bytes, should the following yield 4, 5, or
something else? (And, if something else, what determines the result?)

Yup. Really smells like homework.

char x[] = "abcd";
printf( "%d\n", sizeof( x ) );

Well, what do you think?
[hint: you're not taking the `sizeof' of a string literal here, but,
rather, an array of `char']

HTH,
--ag
 
Ben Pfaff

Don Starr said:
When applied to a string literal, is the sizeof operator
supposed to return the size of the string (including nul), or
the size of a pointer?

The sizeof operator yields the size of its operand's type. If
its operand has a pointer type, then it yields the size of the
pointer type. If its operand has an array type, then it yields
the number of bytes in the array. There is no special case for a
string.

For example, assuming a char is 1 byte

A char is always 1 byte.

and a char * is 4 bytes, should the following yield 4, 5, or
something else? (And, if something else, what determines the
result?)

char x[] = "abcd";
printf( "%d\n", sizeof( x ) );

You need to cast the result of sizeof to int here, because sizeof
yields a result of type size_t, which is an unsigned type.

With that correction, this will always print 5, because the array
has five elements of type char.
 
Artie Gold

Don said:
No, not homework. Haven't had any of that for about 17 years. It's an attempt to settle a dispute
regarding the output of one compiler vs. several others.

My mistake. No offense meant. ;-)
Well, what do you think?
[hint: you're not taking the `sizeof' of a string literal here, but,
rather, an array of `char']


*I* think 5, and so do all (3) of the compilers I've tried.

Then we're all in agreement!

--ag
 
Don Starr

char x[] = "abcd";
printf( "%d\n", sizeof( x ) );

After reading the various responses, I realize that I goofed in my example. _Of course_ I'm asking
for the size of an array of char.

I should have posted originally:
#define _x "abcd"
size_t y = sizeof( _x );

I then should've asked about the resulting value of <y>.

I presume that the answer is still 5, as the string literal is treated here as a nul-terminated
array of char?

-Don
 
Simon Biber

Don Starr said:
I presume that the answer is still 5, as the string
literal is treated here as a nul-terminated array of
char?

Yes. A string literal is defined as a null-terminated,
non-modifiable but non-const array of char.
 
madhukar_bm

Eric Sosman said:
Don Starr wrote:

This is one of the very few cases where an array reference does
*not* turn into a pointer to the zero'th element; the `sizeof'
operator applies to the array as a whole.

Can you please let me know the other cases where the array reference
does not turn into a pointer
 
Dan Pop

Simon Biber said:
Yes. A string literal is defined as a null-terminated,
non-modifiable but non-const array of char.

Nope, it ain't. A string literal is a purely syntactical construct. The
way it is translated depends on the context where it is used. Compare:

char a[] = "abcd";
char b[4] = "abcd";

The same string literal initialises a and b differently. Furthermore,
there is nothing preventing these arrays from being declared as arrays of
const char.

Dan
 
Dan Pop

madhukar_bm said:
Can you please let me know the other cases where the array reference
does not turn into a pointer

Except when it is the operand of the sizeof operator or the unary &
operator, or is a character string literal used to initialize an array
of character type, or is a wide string literal used to initialize an
array with element type compatible with wchar_t, an lvalue that has
type ``array of type '' is converted to an expression that has type
``pointer to type '' that points to the initial member of the array
object and is not an lvalue.

Dan
 
Arthur J. O'Dwyer

Simon said:
Yes. A string literal is defined as a null-terminated,
non-modifiable but non-const array of char.

Nope, it ain't. A string literal is a purely syntactical construct. The
way it is translated depends on the context where it is used. Compare:

char a[] = "abcd";
char b[4] = "abcd";

The same string literal initialises a and b differently. Furthermore,
there is nothing preventing these arrays from being declared as arrays of
const char.

While I've no doubt you have a good point, doesn't Simon's answer more
closely hew to the wording of the Standard that you quoted elsethread?

Except when it is the operand of the sizeof operator or the unary &
operator, or is a character string literal used to initialize an array
                  ^^^^^^^^^^^^^^^^^^^^^^^^
of character type, or is a wide string literal used to initialize an
array with element type compatible with wchar_t, an lvalue that has
type ``array of type '' is converted to an expression that has type
     ^^^^^^^^^^^^^^^^^^
``pointer to type '' that points to the initial member of the array
object and is not an lvalue.

Doesn't this imply that the standard considers a "string literal" to
be an lvalue with type "array of type" (presumably, "array of char")?
It's just that the language does something magical and weird in order
to initialize arrays of (qualified) char with string literals, that
doesn't involve their conversion to pointers. :)

-Arthur
 
Dan Pop

Simon said:
I presume that the answer is still 5, as the string
literal is treated here as a nul-terminated array of
char?

Yes. A string literal is defined as a null-terminated,
non-modifiable but non-const array of char.

Nope, it ain't. A string literal is a purely syntactical construct. The
way it is translated depends on the context where it is used. Compare:

char a[] = "abcd";
char b[4] = "abcd";

The same string literal initialises a and b differently. Furthermore,
there is nothing preventing these arrays from being declared as arrays of
const char.

While I've no doubt you have a good point, doesn't Simon's answer more
closely hew to the wording of the Standard that you quoted elsethread?

Except when it is the operand of the sizeof operator or the unary &
operator, or is a character string literal used to initialize an array
                  ^^^^^^^^^^^^^^^^^^^^^^^^============================
of character type, or is a wide string literal used to initialize an
=================
array with element type compatible with wchar_t, an lvalue that has
type ``array of type '' is converted to an expression that has type
     ^^^^^^^^^^^^^^^^^^
``pointer to type '' that points to the initial member of the array
object and is not an lvalue.

Doesn't this imply that the standard considers a "string literal" to
be an lvalue with type "array of type" (presumably, "array of char")?

So what? The concept of lvalue is so broken that 123 is an lvalue, too
(in C99).

If a string literal were an array *in any context*, it couldn't be used
as an array initialiser, because arrays cannot be used as array
initialisers.

Unfortunately, the standard has chosen a completely broken way of
defining the semantics of string literals, so it had to introduce the
exception underlined above. The exception really belongs to the
specification of string literal semantics: a string literal used as
character array initialiser is NOT an lvalue.

Dan
 
Arthur J. O'Dwyer

Arthur J. O'Dwyer said:
Simon Biber writes:

Yes. A string literal is defined as a null-terminated,
non-modifiable but non-const array of char.

Nope, it ain't. A string literal is a purely syntactical construct. The
way it is translated depends on the context where it is used. Compare:

char a[] = "abcd";
char b[4] = "abcd";

The same string literal initialises a and b differently. Furthermore,
there is nothing preventing these arrays from being declared as arrays of
const char.

Except when it is the operand of the sizeof operator or the unary &
operator, or is a character string literal used to initialize an array
                  ^^^^^^^^^^^^^^^^^^^^^^^^============================
of character type, or is a wide string literal used to initialize an
=================
array with element type compatible with wchar_t, an lvalue that has
type ``array of type '' is converted to an expression that has type
     ^^^^^^^^^^^^^^^^^^
``pointer to type '' that points to the initial member of the array
object and is not an lvalue.

Doesn't this imply that the standard considers a "string literal" to
be an lvalue with type "array of type" (presumably, "array of char")?

So what? The concept of lvalue is so broken that 123 is an lvalue, too
(in C99).

If a string literal were an array *in any context*, it couldn't be used
as an array initialiser, because arrays cannot be used as array
initialisers.

Sure they can. Lvalues of type "array of type" are converted to
non-lvalue expressions of type "pointer to type"; except when they
are string literals used to initialize character arrays, or in some
other cases. This directly implies that string literals used to
initialize character arrays *are* lvalues of type "array of type", but
are *not* converted to expressions of type "pointer to type".
Unfortunately, the standard has chosen a completely broken way of
defining the semantics of string literals, so it had to introduce the
exception underlined above. The exception really belongs to the
specification of string literal semantics: a string literal used as
character array initialiser is NOT an lvalue.

Nobody claimed it was, except the passage from the Standard. Simon
claimed that a string literal was an array of char; you said that
a string literal used to initialize an array was *not* an array of
char; and the Standard said that a string literal used to initialize
an array *is* an array of char. The Standard also says that such a
string literal is an lvalue, but that's irrelevant to Simon's claim.

String literals are arrays of char.
They are strings; hence they are null-terminated arrays of char.
They are non-modifiable.
They are not const-qualified.

Hence, string literals are null-terminated, non-modifiable but
non-const arrays of char.

-Arthur
 
Dan Pop

Arthur J. O'Dwyer said:
Yes. A string literal is defined as a null-terminated,
non-modifiable but non-const array of char.

Nope, it ain't. A string literal is a purely syntactical construct. The
way it is translated depends on the context where it is used. Compare:

char a[] = "abcd";
char b[4] = "abcd";

The same string literal initialises a and b differently. Furthermore,
there is nothing preventing these arrays from being declared as arrays of
const char.

Except when it is the operand of the sizeof operator or the unary &
operator, or is a character string literal used to initialize an array
                  ^^^^^^^^^^^^^^^^^^^^^^^^============================
of character type, or is a wide string literal used to initialize an
=================
array with element type compatible with wchar_t, an lvalue that has
type ``array of type '' is converted to an expression that has type
     ^^^^^^^^^^^^^^^^^^
``pointer to type '' that points to the initial member of the array
object and is not an lvalue.

Doesn't this imply that the standard considers a "string literal" to
be an lvalue with type "array of type" (presumably, "array of char")?

So what? The concept of lvalue is so broken that 123 is an lvalue, too
(in C99).

If a string literal were an array *in any context*, it couldn't be used
as an array initialiser, because arrays cannot be used as array
initialisers.

Sure they can. Lvalues of type "array of type" are converted to
non-lvalue expressions of type "pointer to type"; except when they
are string literals used to initialize character arrays, or in some
other cases. This directly implies that string literals used to
initialize character arrays *are* lvalues of type "array of type", but
are *not* converted to expressions of type "pointer to type".

This is nothing more than a very contorted way of patching a bug
occurring elsewhere in the standard. Without that bug, all this silliness
would not have been necessary in the first place.

How can I take the address of these lvalues, BTW?
Nobody claimed it was, except the passage from the Standard. Simon
claimed that a string literal was an array of char; you said that
a string literal used to initialize an array was *not* an array of
char; and the Standard said that a string literal used to initialize
an array *is* an array of char. The Standard also says that such a
string literal is an lvalue, but that's irrelevant to Simon's claim.

String literals are arrays of char.
They are strings; hence they are null-terminated arrays of char.

The string literal "abc\0def\0ghi" doesn't look like a string to me. Am
I missing something? How about char a[2] = "ab"; ?
They are non-modifiable.

Then, how come the following code is legal:

char array[] = "abcde";
array[0] = 'A';
They are not const-qualified.

Then, how come the following code is illegal:

const char array[] = "abcde";
array[0] = 'A';
Hence, string literals are null-terminated, non-modifiable but
non-const arrays of char.

ONLY when they are not used as initialisers for character arrays.
Because the standard fails to mention this, we have the current idiotic
situation, where other parts of the standard need to be patched as above.

Dan
 
Arthur J. O'Dwyer

Arthur J. O'Dwyer said:
Yes. A string literal is defined as a null-terminated,
non-modifiable but non-const array of char.

Nope, it ain't. [...]
If a string literal were an array *in any context*, it couldn't be used
as an array initialiser, because arrays cannot be used as array
initialisers.

Sure they can. Lvalues of type "array of type" are converted to
non-lvalue expressions of type "pointer to type"; except when they
are string literals used to initialize character arrays, or in some
other cases. This directly implies that string literals used to
initialize character arrays *are* lvalues of type "array of type", but
are *not* converted to expressions of type "pointer to type".

This is nothing more than a very contorted way of patching a bug
occurring elsewhere in the standard. Without that bug, all this silliness
would not have been necessary in the first place.

That may be. Still, I don't see anything wrong with the patch.
How can I take the address of these lvalues, BTW?

Duh.

char (*bar)[4];
bar = &"foo";

works for me. Or did you have something else in mind? (Note that
while you can take the address of the lvalue "foo", you can't take
the address of otherly-typed constants, like 5 or NULL. This has
never caused me discomfort.)
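
A compilable version of that snippet might look like the sketch below; pointee_size and first_char are names invented here, and the literal is only read, never written:

```c
#include <assert.h>
#include <stdio.h>

/* &"foo" has type char (*)[4]: a pointer to the whole 4-element array. */
static size_t pointee_size(void)
{
    char (*bar)[4] = &"foo";
    return sizeof *bar;
}

/* Reading through the pointer is fine; writing through it would be undefined. */
static char first_char(void)
{
    char (*bar)[4] = &"foo";
    return (*bar)[0];
}

int main(void)
{
    assert(pointee_size() == 4);    /* 'f' 'o' 'o' '\0' */
    assert(first_char() == 'f');
    printf("%lu %c\n", (unsigned long)pointee_size(), first_char());
    return 0;
}
```
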
The string literal "abc\0def\0ghi" doesn't look like a string to me. Am
I missing something?

Apparently so. It's a null-terminated array of char, which is what
most C programmers call a "string."
How about char a[2] = "ab"; ?

That's a variable definition and initialization, not an object.
There are *two* objects referenced by that definition; the variable
'a', which has type char[2], and the string literal "ab", which has
type char[3] but might decay into something else - who knows? - but
not into a "pointer to char", because the Standard says so.

They are non-modifiable.

Then, how come the following code is legal:

char array[] = "abcde";
array[0] = 'A';

The integer constant 42 is non-modifiable. Then how come
the following code is legal:

int boo = 42;
boo = 1;

Take it up with Fortran.

They are not const-qualified.

Then, how come the following code is illegal:

const char array[] = "abcde";
array[0] = 'A';

Objects declared non-const are not const-qualified.
Then, how come the following code is illegal:

int boo = 42;
const int bar = boo;
bar = 1;

ONLY when they are not used as initialisers for character arrays.

Chapter and verse, please.
Because the standard fails to mention this,

....never mind.
we have the current idiotic situation,

Life is what you make it.
where other parts of the standard need to be patched as above.

I don't see any "patch." I see contorted wording, probably
driven by the need to give everything a type, but it looks
like perfectly consistent and reasonable wording to me.

-Arthur
 
Simon Biber

Dan Pop said:
The string literal "abc\0def\0ghi" doesn't look like a string
to me. Am I missing something?

No, I missed that. C99 footnote 65 "A character string literal
need not be a string (see 7.1.1), because a null character may
be embedded in it by a \0 escape sequence."
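
The footnote's point shows up when sizeof and strlen are compared on the literal from this exchange. A minimal sketch, with made-up function names:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* The array behind the literal: 3 + 1 + 3 + 1 + 3 characters, plus the final '\0'. */
static size_t literal_array_size(void)
{
    return sizeof "abc\0def\0ghi";
}

/* strlen stops at the first null character, so it only sees "abc". */
static size_t literal_string_length(void)
{
    return strlen("abc\0def\0ghi");
}

int main(void)
{
    assert(literal_array_size() == 12);
    assert(literal_string_length() == 3);
    printf("%lu %lu\n",
           (unsigned long)literal_array_size(),
           (unsigned long)literal_string_length());
    return 0;
}
```
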
How about char a[2] = "ab"; ?

The initialiser is a string literal, it is a string, and does have
a null terminator, if it ever exists as an object in memory.

The variable defined is not a string literal. When the value is
copied from the initialiser to the defined variable, the null
terminator is left out.
They are non-modifiable.

Then, how come the following code is legal:

char array[] = "abcde";
array[0] = 'A';

It does not modify a string literal!
They are not const-qualified.

Then, how come the following code is illegal:

const char array[] = "abcde";
array[0] = 'A';

The const-qualified object you attempt to modify is not a string
literal. It only has its value initialised from a string literal.
ONLY when they are not used as initialisers for character arrays.

No. Just because you use a string literal as the initialiser for a
character array does not mean that the properties of that string
literal are propagated onto the defined object.
 
Dan Pop

Arthur J. O'Dwyer said:
Simon Biber writes:

Yes. A string literal is defined as a null-terminated,
non-modifiable but non-const array of char.

Nope, it ain't. [...]
If a string literal were an array *in any context*, it couldn't be used
as an array initialiser, because arrays cannot be used as array
initialisers.

Sure they can. Lvalues of type "array of type" are converted to
non-lvalue expressions of type "pointer to type"; except when they
are string literals used to initialize character arrays, or in some
other cases. This directly implies that string literals used to
initialize character arrays *are* lvalues of type "array of type", but
are *not* converted to expressions of type "pointer to type".

This is nothing more than a very contorted way of patching a bug
occurring elsewhere in the standard. Without that bug, all this silliness
would not have been necessary in the first place.

That may be. Still, I don't see anything wrong with the patch.
How can I take the address of these lvalues, BTW?

Duh.

char (*bar)[4];
bar = &"foo";

works for me.

I can't see "foo" being used as an initialiser for a character array.
So, this is certainly not the answer to my question.
Apparently so. It's a null-terminated array of char, which is what
most C programmers call a "string."

By this logic, "abc\0def\0ghi" and "abc" are the same string literal.
The C standard itself claims otherwise:

65) A character string literal need not be a string (see 7.1.1),
because a null character may be embedded in it by a \0
escape sequence.
How about char a[2] = "ab"; ?

That's a variable definition and initialization, not an object.
There are *two* objects referenced by that definition; the variable
'a', which has type char[2], and the string literal "ab", which has
type char[3] but might decay into something else - who knows? - but
not into a "pointer to char", because the Standard says so.

It is precisely this "object" that I was talking about. How can I
initialise an object of type char[2] with an object of type char[3]?
object, I must be
They are non-modifiable.

Then, how come the following code is legal:

char array[] = "abcde";
array[0] = 'A';

The integer constant 42 is non-modifiable. Then how come
the following code is legal:

int boo = 42;
boo = 1;

Take it up with Fortran.

Bad analogy. Where does the ``object'' "abcde" reside? Inside array!
They are not const-qualified.

Then, how come the following code is illegal:

const char array[] = "abcde";
array[0] = 'A';

Objects declared non-const are not const-qualified.
Then, how come the following code is illegal:

int boo = 42;
const int bar = boo;
bar = 1;

Another bad analogy. For the same reason.
Chapter and verse, please.

How do I get the address of such an array of characters? How can a
program check the existence of such an array of characters? I mean, of
course, one that it used as initialiser for an array of char.

The only possible answer would be that it resides inside the array being
initialised. But this leads to the above mentioned "paradoxes".
...never mind.


Life is what you make it.

In an ideal world, maybe. In the real one, there are plenty of things
beyond your control. Some of them, not particularly pleasant.
I don't see any "patch." I see contorted wording, probably
driven by the need to give everything a type, but it looks
like perfectly consistent and reasonable wording to me.

Feel free to call the patch "contorted wording". There is no need to
give every syntactical element of the language a type. What is the type
of the {1, "foo", 3.0} initialiser?

With a proper definition of the string literal semantics, there would be
no need for such "contorted wording".

Dan
 