Definition of expression and statement.

dspfun

Hi!

The words "expression" and "statement" are often used in C99 and C
textbooks; however, I am not sure of the clear definition of these
words with respect to C.

Can somebody provide a sharp definition of "expression" and
"statement"? What is the difference between an expression and a
statement?

This is what I have found (textbooks and own conclusions), please
correct if/where wrong.

-------------------------------------------------
An expression is:
An expression contains data or no data.
Every expression has a type and, if the type is not void, a value.
An expression can contain zero or more operands, and zero or more
operators.
The simplest expressions consist of a single constant, a variable or
a function call.
An expression can contain an assignment.
An expression never contains a semicolon.
Expressions can be joined with other expressions to form more complex
expressions.
Expressions can serve as operands.
A statement will become an expression if the semicolon is removed
(not true for block statements though).
The values of expressions that start immediately after a semicolon
and end immediately before the next semicolon are always discarded.

Examples:
4 * 512 //Type: int. Value: 2048.
printf("An example!\n") //Type: int. Value: Whatever is returned from
printf.
1.0 + sin(x) //Type: double Value: Whatever is the result of the
expression.
srand((unsigned)time(NULL)) //Type: void. Value: None.
(int*)malloc(sizeof(int)) //Type: int*. Value: The address returned
by malloc.
1++ //Type: int. Value: 2, right?
a++ //Type: Depends on a. Value: One more than a.
x = 5 //Type: depends on the type of variable x, right? Value: 5.
2 * 32767 //Type: depends on INT_MAX, right? Value: 65534
Question: what is the type of the expression above?
a //Type: Depends on a. Value: Depends on a.
1 //Type: int. Value: 1
f() //Type: depends on return type of f(). Value: Depends on what
f() returns.

Right?

In the expressions above the values of the expressions are "thrown
away", right?

Any more examples of expressions which are not the same/variants of
above examples?

-------------------------------------------------

A statement is:
Anything separated by semicolons, unless it's a declaration or an
expression in a for statement.
Statements specify an action to be performed, such as an operation or
function call.
Statements are program constructs followed by a semicolon.
An expression that is executed is a statement, right?
Statements do not have a value or a type.
A statement specifies an action to be performed, such as an
arithmetic operation or a function call.
Every statement that is not a block is terminated by a semicolon.
A statement is always "atomic", i.e., a statement cannot be broken
down into "sub" statements.
The following are statements:
Assignment(=)
Compound ({...})
break
continue
goto
label
if
do, while and for
return
switch

Examples of statements:
All the above expressions will become statements when a semicolon is
added to the expression.

Question: Is it possible to have a statement with a semicolon, which
will not become an expression
when the semicolon is removed?

-------------------------------------------------
Also,

What is the definition of an expression statement, and how is it
different from a statement and an expression?
Is it just an expression followed by a semicolon?

What is the definition of a block statement?
Is it just one or more statements within curly braces?

BRs!
 
osmium

dspfun said:
The words "expression" and "statement" are often used in C99 and C
textbooks; however, I am not sure of the clear definition of these
words with respect to C.

Can somebody provide a sharp definition of "expression" and
"statement"? What is the difference between an expression and a
statement?

I think the only really clear definition comes from a study of the BNF of
the language. (BNF: Backus Normal Form / Backus-Naur Form.) Have you
tried Wikipedia?
 
James Kuyper

dspfun said:
Hi!

The words "expression" and "statement" are often used in C99 and C
textbooks; however, I am not sure of the clear definition of these
words with respect to C.

Can somebody provide a sharp definition of "expression" and
"statement"? What is the difference between an expression and a
statement?

Section 6.5p1 says:
"An _expression_ is a sequence of operators and operands that specifies
computation of a value, or that designates an object or a function, or
that generates side effects, or that performs a combination thereof."


Section 6.8p2 says:
"A _statement_ specifies an action to be performed. ..."

The '_' characters around a word indicate that it was italicized in the
original text. That is the standard's way of indicating that these
clauses count as definitions of those terms.
This is what I have found (textbooks and own conclusions), please
correct if/where wrong.

Note: I've only corrected you where wrong; I've cut out everything you
wrote in which I found no error (which is not to say that there were no
errors, only that I didn't find them).

....
An expression never contains a semicolon.

Technically incorrect: c = ';' is an expression. However, expressions
will never contain a semicolon as a token. In that expression, ';' is a
token, but the semicolon character itself is not.

....
A statement will become an expression if the semicolon is removed
(not true for block statements though).

This is true for expression statements, but not necessarily for other
kinds. Example:

return;

....
1++ //Type: int. Value: 2, right?

The operand of ++ must be a modifiable lvalue. It cannot be an
integer literal.

a++ //Type: Depends on a. Value: One more than a.

The value of that expression is the value of a before it was
incremented. Note that if 'a' is already at its maximum, the behavior
of that expression is undefined unless a has an unsigned type.
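
A minimal sketch of the difference, assuming an int operand. The helper names value_of_postfix and value_of_prefix are made up for this illustration:

```c
/* Returns the value of the expression a++ (hypothetical helper). */
int value_of_postfix(int a)
{
    int value = a++;    /* value receives the old a; a is then one larger */
    return value;
}

/* Returns the value of the expression ++a, for comparison. */
int value_of_prefix(int a)
{
    return ++a;         /* the value is a after the increment */
}
```

With these definitions, value_of_postfix(5) yields 5 and value_of_prefix(5) yields 6.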

....
A statement is:
Anything separated by semicolons, unless it's a declaration or an
expression in a for statement.

Statements are not separated by semicolons. Statements include the
semicolon. Also, note that a compound statement is terminated by a '}',
not a semicolon. Finally, note that declarations are also terminated by
semicolons.

....
Statements are program constructs followed by a semicolon.

Not in the case of compound statements.

....
An expression that is executed is a statement, right?

No. The three expressions in a for(a; b; c) construct are executed, but
none of them are statements in themselves.
....
A statement is always "atomic", i.e., a statement cannot be broken
down into "sub" statements.

Not true for compound, selection, or iteration statements. Each of those
contain sub-statements.
Question: Is it possible to have a statement with a semicolon, which
will not become an expression
when the semicolon is removed?
return;

What is the defintion of an expression statement, and how is it
different from a statement and an expression?

An expression statement is a particular kind of statement. There are
many other kinds. An expression statement contains an expression; it is
not itself an expression.
Is it just an expression followed by a semicolon.
Yes.

What is the definition of a block statement?
Is it just one or more statements within curly braces?

Yes.
 
Harald van Dijk

Technically incorrect: c = ';' is an expression. However, expressions
will never contain a semicolon as a token. In that expression, ';' is a
token, but the semicolon character itself is not.

Semicolons can occur in declarations nested within expressions.

(struct S { int member; }) { 0 }

The above is a perfectly valid expression of type struct S.
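
A small sketch of such an expression in use, assuming a C99 compiler (this is not valid C++). The function name member_of_literal and the initializer 42 are hypothetical, chosen only so that the result is visible:

```c
/* The return expression declares struct S inside a compound literal,
   so it contains a semicolon token, and then selects the member. */
int member_of_literal(void)
{
    return ((struct S { int member; }) { 42 }).member;
}
```

Calling member_of_literal() should give back the 42 stored in the unnamed struct S object.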
 
manisha

Hello,
An expression is a combination of one or more operators, operands and
constants, arranged according to the precedence of the operators
and the rules of the language. An expression always
produces a result. Expressions are in general of several types, such
as constant, integral, float, logical, relational, boolean
and bitwise, depending upon the value which is produced by the
expression. On the other hand, a statement may be any instruction
given to the computer; it is followed by a semicolon and may contain
keywords, variables, functions, etc. Statements are also of different
types, e.g. control statements, looping statements, branching
statements, i/o statements, type declarations, etc. When an
expression is followed by a semicolon, the statement may be called
an expression statement, e.g. c=a*b;
A block statement is nothing but a group of statements enclosed within
curly braces. It is sometimes also called a compound statement, and it
always has to be placed within two braces. Most of the time it is
used in loops and function definitions.
 
Gordon Burditt

The words "expression" and "statement" are often used in C99 and C
textbooks; however, I am not sure of the clear definition of these
words with respect to C.

Can somebody provide a sharp definition of "expression" and
"statement"? What is the difference between an expression and a
statement?

An expression followed by a semicolon is one type of statement.
It is NOT the only type of statement; there are many others.
An expression is:
An expression contains data or no data.

I'm not sure what you mean by this, but the expression:
""
might be considered to be an exception.

Ok.
Every expression has a type and, if the type is not a void, a value.
An expression can contain zero or more operands, and zero or more
operators. Ok.
The simplest expressions consists of a single constant, a variable or
a function call.

I don't think I'd call a function call "simple", especially since the
arguments can get very complicated.
An expression can contain an assignment. Ok.
An expression never contains a semicolon.

c = ';'
is a valid expression. So is:
message = "H;e;l;l;o;;;W;o;r;l;d;\n"
Expressions can be joined with other expressions to form more complex
expressions.

Ok, but not to an unlimited extent, as there are type rules.
Expressions can serve as operands.
Ok.

A statement will become an expression if the semicolon is removed
(not true for block statements though).

This is only true for expression statements. The following are not
expressions:
return 5
break
int i
continue
and if, for, do-while, while, switch, etc. statements aren't expressions either.
The values of expressions that start immediately after a semicolon
and end immediately before the next semicolon are always discarded.

This is an expression statement you are describing, and yes, the value
is discarded.
Examples:
4 * 512 //Type: int. Value: 2048.
printf("An example!\n") //Type: int. Value: Whatever is returned from
printf.
1.0 + sin(x) //Type: double Value: Whatever is the result of the
expression.
srand((unsigned)time(NULL)) //Type: void. Value: None.
(int*)malloc(sizeof(int)) //Type: int*. Value: The address returned
by malloc.
1++ //Type: int. Value: 2, right?

Error. 1 is not an lvalue. This should not compile.
a++ //Type: Depends on a. Value: One more than a.

Incorrect. The value returned by a++ is the original value of a.
x = 5 //Type: depends on the type of variable x, right? Value: 5.
2 * 32767 //Type: depends on INT_MAX, right? Value: 65534

This is signed int multiplied by signed int, so the result is signed int.
The value might be 65534 if it is representable in signed int, which is
not guaranteed (and won't be if int is 16 bits).
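
As a hedged sketch of how one might check this before multiplying, assuming a non-negative a and a positive b (the function name product_fits_in_int is invented for this example):

```c
#include <limits.h>

/* Returns 1 if a * b is representable in int, 0 otherwise.
   Assumes a >= 0 and b > 0, so the division below is safe. */
int product_fits_in_int(int a, int b)
{
    return a <= INT_MAX / b;
}
```

On an implementation where int is 16 bits, product_fits_in_int(2, 32767) would be 0; where INT_MAX is at least 65534, it is 1.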
Question: what is the type of the expression above?
a //Type: Depends on a. Value: Depends on a.
1 //Type: int. Value: 1
f() //Type: depends on return type of f(). Value: Depends on what
f() returns.

Right?

In the expressions above the values of the expressions are "thrown
away", right?

Yes, if they are used as expression statements. No, if they are used
as function arguments or part of a larger expression.
Any more examples of expressions which are not the same/variants of
above examples?

-------------------------------------------------

A statement is:
Anything separated by semicolons, unless it's a declaration or an
expression in a for statement.

This is way too simple and does not account for semicolons in character
constants or quoted string constants or comments. It also doesn't account
for things like:

while(borg(foo++) > 0) { }

Statements specify an action to be performed, such as an operation or
function call.

It is debatable whether a null statement (lone semicolon) can be considered
to specify an action. Also, a constant used as an expression statement doesn't
call for any action:
42;
Statements are program constructs followed by a semicolon.

Some statements don't have their own semicolon but use one in a
statement that's a part of it, for example:

if (foo) printf("Thou hast committed a foo!\n");
An expression that is executed is a statement, right?

An expression that is a part of a larger expression is not a statement.
An expression that is never executed is still an expression:

if (0) {
    a++;
} else {
    b++;
}
a++ and b++ above are both expression statements. The fact that a++ will
never be executed is irrelevant.
Statements do not have a value or a type.
A statement specifies an action to be performed, such as an
arithmetic operation or a function call.

This depends a little on how loose you are with the definition of "action".
Every statement that is not a block is terminated by a semicolon.

while (1) { 42; }
is not a block (but contains one) and does not end in a semicolon.
A statement is always "atomic", i.e., a statement cannot be broken
down into "sub" statements.

That gets iffy if you consider that a left brace followed by zero or more
statements followed by a right brace is a statement.
The following are statements:
Assignment(=)
I think you're looking for "expression statement" here.
An assignment need not be an expression statement or in an expression statement:

for (; foo(a = 3, b = 4, c = 5); ) { bar(); }
Compound ({...})
break
continue
goto
label
if
do, while and for
return
switch

Examples of statements:
All the above expressions will become statements when a semicolon is
added to the expression.

Which above expressions? Immediately above I see a list of statements,
not expressions.

An expression followed by a semicolon is an expression statement.
Question: Is it possible to have a statement with a semicolon, which
will not become an expression
when the semicolon is removed?

Yes, and you listed some of them above.
break, continue, goto, if, do, while, for, return, and switch.
What is the definition of an expression statement, and how is it
different from a statement and an expression?
Is it just an expression followed by a semicolon?

Yes. A sub-expression of an expression is an expression but it is
not an expression statement.
What is the definition of a block statement?
Is it just one or more statements within curly braces?
Yes.
 
Keith Thompson

James Kuyper said:
dspfun said:
The words "expression" and "statement" are often used in C99 and C
textbooks; however, I am not sure of the clear definition of these
words with respect to C.

Can somebody provide a sharp definition of "expression" and
"statement"? What is the difference between an expression and a
statement?
[...]
Note: I've only corrected you where wrong; I've cut out everything you
wrote in which I found no error (which is not to say that there were
no errors, only that I didn't find them). [...]
...
An expression never contains a semicolon.

Technically incorrect: c = ';' is an expression. However, expressions
will never contain a semicolon as a token. In that expression, ';' is
a token, but the semicolon character itself is not.

Harald showed an example of an expression containing a semicolon
token. (I probably wouldn't have thought of that one myself.)

[...]
The operand of ++ must be a modifiable lvalue. It cannot be an
integer literal.

I think a lot of newbie C programmers are so fascinated by the "++"
and "--" operators that they forget that the way to add one to an
expression is simply "... + 1".

[...]
An expression statement is a particular kind of statement. There are
many other kinds. An expression statement contains an expression; it
is not itself an expression.


Yes.

According to the grammar, the expression in an expression statement
is optional; thus a null statement
;
is a special case of an expression statement.

I don't know why it was defined this way. I think it would have been
simpler to define the null statement as a separate kind of statement.
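
A minimal sketch of a null statement doing real duty as a loop body. The function length_of is a made-up stand-in for strlen, used only to show the construct:

```c
#include <stddef.h>

/* Counts the characters before the terminating '\0'. All the work
   happens in the controlling expression, so the body is a null statement. */
size_t length_of(const char *s)
{
    const char *p = s;

    while (*p++ != '\0')
        ;                           /* null statement */
    return (size_t)(p - s) - 1;     /* p stopped one past the '\0' */
}
```

For example, length_of("abc") is 3 and length_of("") is 0.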

Correction: zero or more statements. Actually, zero or more
"block-items", where a block-item is either a declaration or a
statement. (In C90, all the declarations must precede all the
statements; in C99, they can be mixed.)
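
To illustrate block-items, here is a small sketch assuming a C99 (or later) compiler. The function name mixed_block is hypothetical:

```c
int mixed_block(int n)
{
    int total = 0;      /* block-item: declaration */
    total += n;         /* block-item: statement */
    int half = n / 2;   /* declaration after a statement: fine in C99, not in C90 */
    { }                 /* an empty block is a valid compound statement */
    return total + half;
}
```

A C90 compiler would be expected to reject the declaration of half, since it follows a statement in the same block.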
 
Keith Thompson

manisha said:
an expression is a combination of one or more operators, operands and
constants which is arranged according to the precedences of operators
and rules of the corresponding languages, an expression every time
produces a result,

An expression of type void produces no result.
expressions are in general of several types such
as..constant expression, integral, float, logical, relational, boolean
and bitwise depending upon the value which is produced by an
expression.

Expressions can be classified in a number of ways, e.g., by the type
of the expression (int, void, double*, etc.) or by the *kind* of
expression, determined by the top-most operator. Your list mixes
these two kinds of classification.
On the other hand, a statement may be any instruction
given to the computer it is followed by a semicolon, it may contain
keywords, variables, functions etc. statements are also of different
types for eg. control statements, looping statements, branching
statements, i/o statements, type declaration and etc.

C has no i/o statements; i/o is done by function calls, which
typically appear in expression statements.
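
A short sketch of that point, assuming standard output accepts the text. The function name print_and_count is invented for the example:

```c
#include <stdio.h>

int print_and_count(void)
{
    int count;

    printf("hello\n");          /* expression statement: the int result is discarded */
    count = printf("hello\n");  /* the same call, but now the result is kept */
    return count;               /* number of characters written by the second call */
}
```

Both lines that print are expression statements. The only difference is whether the value of the call is used as an operand of = before the statement ends.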

Declarations are not statements.
When an
expression is followed by a semicolon then such stmt. may be called as
a expression stmt. eg. c=a*b;
A block statement is nothing but a group of statements enclosed within
curly braces sometimes it is also called as compound statement and it
has to be every time placed within two braces, most of the times it is
used in loops and function definitions.

A block statement can also contain declarations, or it can be empty.
 
Army1987

dspfun said:
Hi!

The words "expression" and "statement" are often used in C99 and C
textbooks; however, I am not sure of the clear definition of these
words with respect to C.

Can somebody provide a sharp definition of "expression" and
"statement"? What is the difference between an expression and a
statement?

This is what I have found (textbooks and own conclusions), please
correct if/where wrong.

-------------------------------------------------
An expression is:
An expression contains data or no data.
Every expression has a type and, if the type is not a void, a value.
An expression can contain zero or more operands, and zero or more
operators.
The simplest expressions consists of a single constant, a variable or
a function call.
An expression can contain an assignment.
An expression never contains a semicolon.
putchar(';') is an expression...
Expressions can be joined with other expressions to form more complex
expressions.
Expressions can serve as operands.
A statement will become an expression if the semicolon is removed
(not true for block statements though).
Not true for return statements, either. Or break statements.
The other way round (an expression becomes a statement when a semicolon is
added) is correct.
The values of expressions that start immediately after a semicolon
and end immediately before the next semicolon are always discarded.
True, yet a very complicated way to state that.
Simpler and more accurate: "A statement of the form expression; evaluates
the expression for side effects, and discards its value."
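
A minimal sketch of that rule. The names calls, next_id and demo are hypothetical, introduced only for this illustration:

```c
int calls = 0;              /* how many times next_id() has run */

int next_id(void)
{
    return ++calls;         /* has a side effect and produces a value */
}

int demo(void)
{
    int kept;

    calls = 0;
    next_id();              /* expression statement: the value 1 is discarded */
    kept = next_id();       /* here the value 2 is used as an operand of = */
    return kept;
}
```

After demo() returns 2, calls is also 2: the side effect of the first call survived even though its value was thrown away.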
Examples:
4 * 512 //Type: int. Value: 2048.
printf("An example!\n") //Type: int. Value: Whatever is returned from
printf.
1.0 + sin(x) //Type: double Value: Whatever is the result of the
expression.
srand((unsigned)time(NULL)) //Type: void. Value: None.
(int*)malloc(sizeof(int)) //Type: int*. Value: The address returned
by malloc.
1++ //Type: int. Value: 2, right?
No. You can't modify a constant. (You meant 1+1, right?)
a++ //Type: Depends on a. Value: One more than a.
x = 5 //Type: depends on the type of variable x, right? Value: 5.
2 * 32767 //Type: depends on INT_MAX, right? Value: 65534
The type is int. Whether it works depends on INT_MAX.
Question: what is the type of the expression above?
a //Type: Depends on a. Value: Depends on a.
1 //Type: int. Value: 1
f() //Type: depends on return type of f(). Value: Depends on what
f() returns.

Right?
Yeah.
In the expressions above the values of the expressions are "thrown
away", right?
It depends on where they are.
Any more examples of expressions which are not the same/variants of
above examples?
&& ||
A statement is:
Anything separated by semicolons, unless it's a declaration or an
expression in a for statement.
{} is a statement.
Statements specify an action to be performed, such as an operation or
function call.
Not necessarily. ((void)0); is a statement.
Statements are program constructs followed by a semicolon.
An expression that is executed is a statement, right?
Statements do not have a value or a type.
A statement specifies an action to be performed, such as an
arithmetic operation or a function call.
Every statement that is not a block is terminated by a semicolon.
A statement is always "atomic", i.e., a statement cannot be broken
down into "sub" statements.
Wrong.
if (foo) { bar(); baz(); } is a statement, but even bar(); and baz(); are
themselves statements, and so is { bar(); baz(); }.
The following are statements:
Assignment(=)
Assignments are expressions (though they become statements with a ;)
Compound ({...})
break
continue
goto
label
if
do, while and for
return
switch

Examples of statements:
All the above expressions will become statements when a semicolon is
added to the expression.

Question: Is it possible to have a statement with a semicolon, which
will not become an expression
when the semicolon is removed?
return 0;
break;
goto lab;
-------------------------------------------------
Also,

What is the definition of an expression statement, and how is it
different from a statement and an expression?
Is it just an expression followed by a semicolon?
Yes.

What is the definition of a block statement?
Is it just one or more statements within curly braces?
Yes, but C99 complicates the rules.
enum {a, b};
int different(void)
{
    if (sizeof(enum {b, a}) != sizeof(int))
        return a; // a == 1
    return b; // which b?
}
In C99 the first two lines after the { form a block, so, unlike in C89,
the b in return b; is 1.
 
Chris Torek

The words "expression" and "statement" are often used in C99 and C
textbooks; however, I am not sure of the clear definition of these
words with respect to C.

Others have gone through a lot of examples and given various
corrections. I would just like to emphasize a few details.
Can somebody provide a sharp definition of "expression" and
"statement"? What is the difference between an expression and a
statement?

As at least one person noted, the real heart of the difference is
actually syntactic. An "expression" is that which is permitted
syntactically by the grammar in the C Standard (whichever standard
you use -- C89 or C99).

In any case, *every* C expression can be turned into a statement
simply by adding a semicolon at the end, but the reverse is not
true. This is because the grammar (C89 or C99, either one) has
various additional things recognized as "statement" that, even if
they end with a semicolon, are not recognized as an "expression"
without that semicolon. For instance, a while loop:

while (expr) statement;

is itself a statement (specifically, an "iteration-statement"),
but removing the semicolon does not turn it into an expression.

The C99 grammar includes the following fragments:

statement:
        labeled-statement
        compound-statement
        expression-statement
        selection-statement
        iteration-statement
        jump-statement

expression-statement:
        expression-opt ;

This last (the expression-statement part of the grammar) is why
any expression can be turned into a statement.

The fact that a while loop (like a do-while or for loop) is recognized
only by the "iteration-statement" part of the grammar is why it
does not become an expression when the semicolon is removed.

Last, although this is not relevant to the distinction between
"expression" and "statement", there is a key item here that I
think many people miss as well:
1++ //Type: int. Value: 2, right?
a++ //Type: Depends on a. Value: One more than a.

(As others noted, "1++" is a constraint violation and thus requires
a diagnostic. "a++" is OK -- that is, is not a constraint violation
as long as "a" is a "modifiable lvalue". It may have undefined
behavior, e.g., if a is an "int" variable and is initially set to
INT_MAX, but no diagnostic is required for this, and programmers
should not expect one. The value is not "one more than a", but
rather, "the value a had before the increment took place".)

In C, expressions produce values (with one possible exception:
expressions of type "void" produce no value, or produce "a value
of type void", depending on who you ask; even the C Standard appears
to be a bit confused on this issue :) ). However, expressions
also have "side effects". (A "side effect" is, loosely speaking,
a change in a variable. Things like printing output are also
"side effects" in computing theory, although in C this is simply
done with function calls, e.g., printf(). Side effects are quite
important in computing theory because operations *without* side
effects are always completely reversible. This means that "debugging"
is, at its heart, simply the process of tracking all side effects
-- all other operations can be trivially backed-up-over.)

The various modifier operators, including the prefix and postfix
increment and decrement, have TWO uses: they (a) produce a value,
and (b) have a side effect. Sometimes, in programming in C, we
want a value; sometimes we want a side effect; sometimes we even
want both. We can use these modifier operators for their side
effects, or for both their values *and* their side effects. For
instance, in a loop like:

for (i = 0; i < N; i++)

we have two modifier-operators: initially we set i to 0, and each
time at the end of the loop, we increment i. Here, the "=" operator
is used purely for its side effect: it sets i to 0. The value of
the entire operation is 0, but this value is discarded. Similarly,
the "++" operator produces a value -- in this case, the previous
value of i -- but we throw that value away, as the only thing we
want is the side effect, of increasing i by 1.

Because we only want the side effect, we could use any other operation
that *also* increases i by 1:

for (i = 0; i < N; ++i)

and:

for (i = 0; i < N; i = i + 1)

are all equally valid ways to write the loop.

Examples of places where we want *both* the value *and* the side
effect are not quite as common, but do occur. For instance, if p
points into a string that contains some 'x' characters, and *p is
currently one of the 'x' characters, the following loop skips over
that x and any subsequent 'x'. Because the increment also happens
on the comparison that fails, p ends up one past the first character
that is not an 'x'. E.g., if p points at the first 'x' in
"hexxllo world", p points at the second 'l' after the loop ends, so
*p is 'l'; if it points at the 'x' in "magix", p ends up one past
the terminating '\0' and must not be dereferenced:

    while (*p++ == 'x')
        continue;

Here, the "++" operator is used both for the value it produces --
i.e., "give me the value p had before an increment occurs" -- and
for its side effect -- i.e., "and also please increment p before
the next sequence point". (The old value of p is then given to
the unary "*" operator, which fetches the character to which p
pointed before the increment happened. The compiler is free to
arrange for p to be incremented first or last or anywhere in between,
as long as it manages to fetch *(whatever_p_used_to_be). On some
machines, it may make sense to increment p first, then fetch p[-1];
on some, it may make sense to increment p last; on some, it may be
possible to increment p while simultaneously fetching, e.g., using
the auto-increment addressing mode on a PDP-11, or the writeback
feature of the ARM.)

Something some C programmers do, but I claim is dodgy at best, is
use modifier operators purely for their value. For instance,
consider the following rather silly function, and an example of
its use:

    int three_more(int x) {
        return x += 3;
    }

    #include <stdio.h>

    int main(void) {
        printf("%d\n", three_more(39));
        return 0;
    }

which prints 42. The three_more() function uses the "+=" operator
to modify x (a side effect) *and* produce a value (the value x will
have after the increment-by-3), but -- by returning, in this case
returning the value-after-increment -- immediately throws away the
incremented variable "x". This is valid, "legal" C code, but to
me it "makes more sense" to write:

    int better_three_more(int x) {
        return x + 3;
    }

For some reason, beginning C programmers often seem to be fascinated
by the "double effect" of modifier operators -- especially the
prefix and postfix increment and decrement operators -- that have
both a side effect *and* a value, and wind up "overusing" them (as
in three_more() above). This seems to lead to the desire to write
things like "1++" or "++41", which are not only pointless (a la
the modification to x in three_more()), but invalid (draw a
diagnostic, and usually fail to compile at all).
 
Golden California Girls

Chris said:
Others have gone through a lot of examples and given various
corrections. I would just like to emphasize a few details.


As at least one person noted, the real heart of the difference is
actually syntactic. An "expression" is that which is permitted
syntactically by the grammar in the C Standard (whichever standard
you use -- C89 or C99).

In any case, *every* C expression can be turned into a statement
simply by adding a semicolon at the end, but the reverse is not
true. This is because the grammar (C89 or C99, either one) has
various additional things recognized as "statement" that, even if
they end with a semicolon, are not recognized as an "expression"
without that semicolon. For instance, a while loop:

while (expr) statement;

is itself a statement (specifically, an "iteration-statement"),
but removing the semicolon does not turn it into an expression.

The C99 grammar includes the following fragments:

statement:
labeled-statement
compound-statement
expression-statement
selection-statement
iteration-statement
jump-statement

expression-statement:
expression-opt ;

This last (the expression-statement part of the grammar) is why
any expression can be turned into a statement.

int main(int argc, char *argv[])
{
    int a = 0;
    a+1;
    return(0);
}

Legal program. Doesn't do much though. And your compiler may emit a warning
message. And it shows that an expression can be turned into a statement by
putting a semicolon after it.

As Chris points out you can't take a semicolon off a statement and always get an
expression. An expression has a value.

However "goto mess;" is a statement. You can't write "x = goto mess;" because
"goto mess" isn't an expression. It doesn't have a value.
 
dspfun

In C, expressions produce values (with one possible exception:
expressions of type "void" produce no value, or produce "a value
of type void", depending on who you ask; even the C Standard appears
to be a bit confused on this issue :) ).  However, expressions
also have "side effects".  (A "side effect" is, loosely speaking,
a change in a variable.   Things like printing output are also
"side effects" in computing theory, although in C this is simply
done with function calls, e.g., printf().  Side effects are quite
important in computing theory because operations *without* side
effects are always completely reversible.  This means that "debugging"
is, at its heart, simply the process of tracking all side effects
-- all other operations can be trivially backed-up-over.)

The various modifier operators, including the prefix and postfix
increment and decrement, have TWO uses: they (a) produce a value,
and (b) have a side effect.  Sometimes, in programming in C, we
want a value; sometimes we want a side effect; sometimes we even
want both.  We can use these modifier operators for their side
effects, or for both their values *and* their side effects.

Thank you Chris and others for great answers!

Because of the *double effect* of modifier operators, is it a good
idea to always convert expressions to void expressions when the
value is not used but only the side effect is used? This way the
discarding of the value is made explicit.

For example:
(void) a++

Instead of:
a++
 
James Kuyper

dspfun said:
Because of the *double effect* of modifier operators, is it a good
idea to always convert expressions to void expressions when the
value is not used but only the side effect is used? This way the
discarding of the value is made explicit.

For example:
(void) a++

Instead of:
a++

No, because the value of the expression in an expression-statement is
always discarded, so you'd be putting (void) at the start of every
expression-statement. You should consider that the discarding is
implicit in the ';' at the end of the statement, and therefore doesn't
require a (void) at the beginning.
 
Army1987

dspfun said:
Because of the *double effect* of modifier operators, is it a good
idea to always convert expressions to void expressions when the
value is not used but only the side effect is used? This way the
discarding of the value is made explicit.

For example:
(void) a++

Instead of:
a++

It's a matter of style. I have even seen a program with
#define V (void)
and many instances of expressions (especially function calls) whose value
was discarded were written as
V printf("foo");
The only thing that achieves is silencing lint and similar programs. I
don't usually use (void), except when one would naturally think of an
expression as "throwing away" something, e.g. (void)getchar(); throws
away a character, or (void)rand(); (should I ever use it, I haven't so far)
throws away a number from a pseudorandom sequence. On the other hand,
fprintf(stderr, "Cannot open '%s' for reading: %s\n", argv[1],
strerror(errno));
simply prints an error message, and the fact that it does return a value
which is discarded is somewhat irrelevant. So in this case I spare the
(void).
 
somenath

Could you please explain why you consider the second function
better? Let me explain why I ask. I was reading a C textbook that is
well known in our country, and it says the following.


"These instructions increase directly specify the required information
so help in faster execution. 'C' makes
efficient use of this feature by providing compound statements for
which translation can be done directly to
its corresponding machine instruction. For example:
a=a+10;
may be converted to,
MOV AX,_a
ADD 10
MOV _a, AX
Whereas a+=10; may be converted directly to,
INC _a, 10
in some machine."

So according to this logic, the first function, "int three_more(int x)",
may be faster than "int better_three_more(int x)". Is that not
correct?
 
Richard Heathfield

somenath said:

So according to this
(broken)

logic first function "int three_more(int x)"
may be faster then the "int better_three_more(int x)". Is it not
correct ?

The formal answer is that the C Standard doesn't say either way.

In practice:

(a) the difference is likely to be minimal and not worth chasing;
(b) if either one is going to be faster, it is more likely to be the one
that doesn't pointlessly update an object that's about to be destroyed;
(c) a good compiler will in any case optimise any difference away;
(d) you should aim for clear code as a primary goal - write code that best
expresses your algorithmic intent, rather than the code you think will run
fastest, unless to do so would be grossly inefficient (e.g. recursive Fib,
strlen in a loop condition, etc).
 
Chris Torek

[and that I think that "better_three_more" is a better-written
function].

I would like to request you to explain why you are indicating second
function as better.
I would like to clarify my self why I requested so. I was reading one
C text book which is famous in our country it says as mentioned.

"These instructions increase directly specify the required information
so help in faster execution. 'C' makes
efficient use of this feature by providing compound statements for
which translation can be done directly to
its corresponding machine instruction. For example:
a=a+10;
may be converted to,
MOV AX,_a
ADD 10
MOV _a, AX
Whereas a+=10; may be converted directly to,
INC _a, 10
in some machine."

So according to this logic first function "int three_more(int x)"
may be faster then the "int better_three_more(int x)". Is it not
correct ?

Well, putting aside the fact that there are no guarantees about
what comes out of any given compiler, except that -- if it is a
correct C compiler at all -- it must implement those things the C
Standard requires ... there are, in essence, two "kinds" of compilers
here: the "stupid", directly-literal kind, and the "smart" or
"optimizing" compiler.

The claims above are true only of the "stupid" compiler. It uses
the syntax, rather than [%] the semantics, to pick out instructions.
Since "a = a + 10" syntactically says "get me a, get me 10, do an
add, and store that as the result", a stupid compiler does exactly
that, in exactly that order. Since "a += 10" syntactically says
"get me 10, add that to a", the stupid compiler can use an "add 10"
instruction, if one exists.

[% Actually "in addition to" -- it uses semantics attached to things
like types of variables and constants in order to choose between
"integer add" and "floating point add", for instance. But the first
part of selection is syntax-driven.]

(Note, the above *also* assumes that the machine *has* an "add 10"
instruction. On a load/store machine, the line "a += 10" has to be
compiled into:

    load a
    add #10
    store a

anyway, which is the same thing the stupid compiler produces for the
"a = a + 10" line. So on some machines, even the stupid compiler gets
no benefit from the more-syntactically-compact "a += 10" line.)

A "smart" compiler, by contrast, reads entire blocks of code --
possibly as large as entire functions, source-files, or programs
-- and does a lot of work to figure out the "best" machine code to
implement that. A smart compiler will generally produce the same
machine code for either line (in this case, because "alias analysis"
is terrifically easy for ordinary variables, and the compiler can
see that "a = a + 10" and "a += 10" have exactly the same required
semantics). In other words, a "smart" compiler gets no help from
the += operator.

But let us look at the whole thing in context: we do not have a
simple "a += 10" (or in this case x += 3), but rather:

    return x += 3;

So in this case, in the "stupid" compiler (on the machine for which
we got the "INC _x, 3" above), we have to:

  - add 3 to the local variable x
  - put that value into the return register
  - tear down the stack frame
  - return to caller

which we do thus:

    INC [BP-8], 3   # x += 3
    MOV AX, [BP-8]  # return reg = x
    LEAVE           # remove stack frame
    RET

Now compare that to the code the stupid compiler emits for "return
x + 3":

    MOV AX, [BP-8]
    ADD 3
    LEAVE
    RET

Although this is still four instructions, it actually runs in
fewer clock cycles on the old versions of the CPU for which the
stupid compiler was written. That is, even with the "stupid"
compiler, the code is either faster, or no slower.

Most compilers these days are reasonably smart, at least when run
with optimization turned on. There *are* reasons to use "stupid"
compilers, or the non-optimizing mode in an otherwise smart compiler:
compilations run faster, for instance, and it is very difficult to
trigger bugs in code that is not there, or not being used. :)
Still, for most purposes, one mostly wants to turn optimization
on. (For instance, in gcc, a number of very useful warnings are
only enabled when optimization is on -- because it is the process
of optimization itself that finds the bugs that the warnings point
out).
 
somenath

Many thanks. From the above article I understood that
1) We should not use compound assignment expressions (i.e. +=, -=,
etc.) just to write faster code, as the optimizer emits suitable code
for faster execution.

So where is the real use of this kind of expression?
Should we use it only when the expression is long?
I.e., in x += 10; does the use of += only come into the picture if x
is complex to type?
 
Chris Torek

somenath said:
... So where is the real use of such kind of expression ?

Use them when they make the code clearer to human readers, mainly.

Only we should use when the expression is long ?
i.e in x+=10 ; if x is complex to type then only the use of x+=10;
comes into picture?

If the "true purpose" of the computation is to add 10 to x, use x += 10.
If the "true purpose" is simply to calculate 10 more than x, use x + 10.
For instance:

    result = (x += 10);
    x = 0; /* don't want the 10-greater x anymore */

is "misusing" the += operator, but:

    x += 10;
    ... use the augmented x for a while ...
    x++;
    ... use the augmented x some more ...

is "properly using" the += and ++ operators, because the "..."
sections of the code "want" the incremented variable.

This rule relies on deciding the "true purpose" of code, which is
quite a difficult thing to do -- a human reader, or a computer,
can follow the rules of the language to figure out what happens,
step-by-step, but *why* it happens might be a mystery. (This is,
in part, what comments are for -- the human writing the code can
explain, in a comment, *why* the code is taking some series of
steps.)

This separation of "how" (the various individual steps needed) from
"why" (the ultimate goal of any particular series of steps) is the
heart of abstraction, which in turn is the real essence of computer
programming. One immediate goal might be "tie shoelaces" and the
steps involve manipulating finger-like appendages. Backing up
another step reveals a higher-level goal, "put on shoes", of which
"tie shoelaces" is simply one step. We must go up another level,
though, in order to find out that "put on shoes" is just a step in
the goal of going outside, which in turn is just a step in the goal
of going to the market, and so on.

In the same way, it is obvious enough what "x += 10" does -- it
adds 10 to x and produces, as its value, the x+10 sum -- but we
need to move up a level in order to see *why* someone is adding 10
to x. If the reason is, or at least includes, "we need x to be 10
bigger", then the += operator is quite appropriate.
 
