---3

Michael Press

int main(void)
{
---3;
return 0;
}

error: invalid lvalue in decrement

???
Why does this have to be an error?
The value could be -4 or -3 or -2.
If somebody [shuffles feet] wants
to use this construction, then who
would stop him?
 
Lew Pitcher

int main(void)
{
---3;
return 0;
}

error: invalid lvalue in decrement

???
Why does this have to be an error?

Because you can't pre-decrement a constant?

Read
---3
as either
- --3
or
-- -3
and remember what pre-decrement does.
The value could be -4 or -3 or -2.

Not really.

The semantics of pre-decrement require that the decremented value be stored
somewhere; where should the compiler/runtime store /this/ pre-decremented
value?
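To make Lew's point concrete, here is a minimal sketch (the function name predecrement_minus_three is hypothetical) of the legal way to get -4 by pre-decrementing -3: put the value in an object first, so that -- has somewhere to store its result.

```c
/* Pre-decrement writes its result back into an object, so it needs one.
   Copying -3 into a variable gives it that storage. */
int predecrement_minus_three(void)
{
    int v = -3;   /* v designates storage: a modifiable lvalue */
    int r = --v;  /* stores -4 into v; the expression's value is also -4 */

    /* int bad = --(-3);   would be rejected: -3 is just a value,
                           so there is nothing to store -4 into. */
    return r;
}
```

Here predecrement_minus_three() returns -4.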
If somebody [shuffles feet] wants
to use this construction, then who
would stop him?

Apparently, the compiler (or the compiler-writer). Presumably, with support
from the ISO C committee (by way of the ISO C standard).
 
Sjouke Burry

Michael said:
int main(void)
{
---3;
return 0;
}

error: invalid lvalue in decrement

???
Why does this have to be an error?
The value could be -4 or -3 or -2.
If somebody [shuffles feet] wants
to use this construction, then who
would stop him?
Any sensibly designed compiler?

Or a boss firing a programmer who likes to put stupid things
in his software?
 
Keith Thompson

Lew Pitcher said:
Because you can't pre-decrement a constant?

Read
---3
as either
- --3
or
-- -3
and remember what pre-decrement does.

The correct reading is
-- -3
due to the "maximal munch" rule. This is the same reason that
x-----x
is an error (it tokenizes as x -- -- - x, which violates a constraint),
rather than being tokenized as
x -- - -- x
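A small sketch of the flip side of maximal munch, assuming only standard C: whitespace ends a token, so spacing can force a split that maximal munch on its own would never choose. The function name is hypothetical, and two variables are used because modifying the same object twice in one expression, as in x-- - --x, would be undefined behavior.

```c
/* "x-- - --y" is lexed as x -- - -- y, i.e. (x--) - (--y).
   Written as x-----y, the same characters would lex as x -- -- - y
   and be rejected, because (x--) is not an lvalue. */
int spaced_decrements(void)
{
    int x = 5, y = 5;
    return x-- - --y;   /* (x--) yields 5, (--y) yields 4, so 5 - 4 */
}
```

Here spaced_decrements() returns 1.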
 
Keith Thompson

Michael Press said:
int main(void)
{
---3;
return 0;
}

error: invalid lvalue in decrement

???
Why does this have to be an error?
The value could be -4 or -3 or -2.
If somebody [shuffles feet] wants
to use this construction, then who
would stop him?

Apart from the fact that "---3;" is a constraint violation, and
that every conforming compiler must issue a diagnostic for it, and
in practice I suspect that every existing compiler will reject it:

Whatever you intended "---3" to mean, there is certainly a
clearer way to express that meaning. For the most nearly sensible
interpretation, just write either "-3 - 1" or "-4". Or, since
you're discarding the result, just delete that line of code.
 
Ben Bacarisse

Michael Press said:
A warning is well deserved.
---x can be parsed into
something meaningful in C.

Did you mean "can't" rather than "can"? The rules of C permit only one
parse (which is --(-x)). A compiler that came up with any other parse
is not a C compiler.
 
Michael Press

Ben Bacarisse said:
Did you mean "can't" rather than "can"? The rules of C permit only one
parse (which is --(-x)). A compiler that came up with any other parse
is not a C compiler.

Yes, I do not speak ex cathedra.

The question remains why

---x;

precipitates an error.
 
Jens Thoms Toerring

Yes, I do not speak ex cathedra.
The question remains why

---x;

precipitates an error.

Because, as has been pointed out in this thread (e.g. by
Keith Thompson), it's parsed as

-- ( -x );

and '-x' is the *number* one gets when taking the negative of
what's stored in the variable 'x'. But a value can't be decremented,
only a variable can. So it's akin to '---3'. If you want

- ( --x );

(i.e. decrement what's stored in 'x' and, as the result of the
expression, get the negative of the resulting value, i.e. of what's
stored in 'x' after the decrement) then you have to write it that
way. Why you would want that is a different question, of course,
since the resulting value is thrown away anyway.
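Jens's - ( --x ) can be tried directly. This is a minimal sketch with a hypothetical helper that takes a pointer, so the caller can see that the object really was decremented before the negation.

```c
/* -(--*x): decrement the object *x, then negate its new value.
   Valid, unlike --(-x), because -- is applied to an lvalue. */
int negate_after_decrement(int *x)
{
    return -(--*x);   /* --*x is lexed as -- * x, i.e. --(*x) */
}
```

For example, with int x = 5, negate_after_decrement(&x) returns -4 and leaves x equal to 4.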

If you're just assembling random tokens to see what happens, you
shouldn't be too surprised that some combinations don't make any
sense. There's no language (computer or human) where all possible
combinations of tokens are meaningful; normally only a small
subset of all possible combinations is syntactically correct:

* Is brown dog the.
The dog is brown.
* Brown dog the is.
* The brown dog is.
Brown is the dog.
* Is brown the dog.
....

Some of those sentences make sense, some don't (at least for a
"standard English speaker"). The same goes for C: some combinations of
tokens don't make sense. For example, neither '---3' nor '---x' is
correct. The difference from a human language is that the rules
of what's correct are clearly defined and can be looked up in
the standard for the C language. That's what C compiler writers
have to care about, not random assemblies of tokens.

Regards, Jens
 
Ben Bacarisse

Michael Press said:
Yes, I do not speak ex cathedra.

The question remains why

---x;

precipitates an error.

I thought this had come up already. The input is broken up into tokens
using the rule that the next preprocessing token is "the longest
sequence of characters that could constitute a preprocessing token" (6.4
p4). Thus your statement is broken up into four tokens:

<--> <-> <x> <;>

The parser must then "fit" this sequence to the grammar rules. It will
conclude that we have a statement. In particular an expression followed
by a semicolon. The expression is a unary expression whose operand is
another unary expression. The operand of that inner unary expression is
an identifier, 'x'.

The result (--(-x)) is a constraint violation for the same reason ---3
is one: -x is not a "modifiable lvalue". In fact it is not an lvalue at
all, let alone a modifiable one.

This may be too much or too little explanation. If so, I'm sorry. I am
not 100% clear what part was not already covered.
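To illustrate what Ben means by a modifiable lvalue, here is a rough sketch (hypothetical function, standard C only) contrasting operands that -- accepts with ones it rejects. The rejected forms are left in a comment, since any one of them would stop the file from compiling.

```c
/* Every operand of -- below designates an object, so each is allowed. */
int decrement_lvalues(void)
{
    int x = 10;
    int a[2] = {20, 30};
    int *p = &a[1];

    --x;      /* a named, non-const variable */
    --a[0];   /* an array element */
    --*p;     /* an object reached through a pointer */

    /* --(-x);  --(x + 1);  --3;
       Each violates the constraint in 6.5.3.1: the operand of prefix --
       must be a modifiable lvalue, and these are mere values. */
    return x + a[0] + a[1];   /* 9 + 19 + 29 */
}
```

So decrement_lvalues() returns 57.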
 
Michael Press

Ben Bacarisse said:
I thought this had come up already. The input is broken up into tokens
using the rule that the next preprocessing token is "the longest
sequence of characters that could constitute a preprocessing token" (6.4
p4). Thus your statement is broken up into four tokens:

<--> <-> <x> <;>

Unary minus and pre-decrement associate right to left,
so a maximum valid string length tokenizer would give

(-(--(x)))

No?

If ---x could be parsed meaningfully, then it should.
 
Michael Press

Ben Bacarisse said:
I thought this had come up already. The input is broken up into tokens
using the rule that the next preprocessing token is "the longest
sequence of characters that could constitute a preprocessing token" (6.4
p4). Thus your statement is broken up into four tokens:

<--> <-> <x> <;>

The parser must then "fit" this sequence to the grammar rules. It will
conclude that we have a statement. In particular an expression followed
by a semicolon. The expression is a unary expression whose operand is
another unary expression. The operand of that inner unary expression is
an identifier, 'x'.

The result (--(-x)) is a constraint violation for the same reason ---3
is one: -x is not a "modifiable lvalue". In fact it is not an lvalue at
all, let alone a modifiable one.

This may be too much or too little explanation. If so, I'm sorry. I am
not 100% clear what part was not already covered.

All clear. See my other reply.
 
Ben Pfaff

[about ---x]
Unary minus and pre-decrement associate right to left,
so a maximum valid string length tokenizer would give

(-(--(x)))

No?

No. - is shorter than --, so - is not the longest valid token
starting ---x.
 
lawrence.jones

Michael Press said:
The question remains why

---x;

precipitates an error.

Because simple rules are easier for everyone involved than complex rules
that try to guess what the programmer might have meant. If you want
triple negation for some obscure purpose, then write it:

- - -x;
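Lawrence's spelling can be checked with a tiny sketch (the function name is hypothetical). The spaces turn the three minus signs into three separate '-' tokens, so no -- operator appears at all.

```c
/* "- - -x" is three unary minus operators: -(-(-x)), which equals -x.
   As with any negation, x must not be INT_MIN, or the result overflows. */
int triple_negation(int x)
{
    return - - -x;
}
```

For example, triple_negation(3) returns -3.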
 
puppi

int main(void)
{
    ---3;
    return 0;

}

error: invalid lvalue in decrement

???
Why does this have to be an error?
The value could be -4 or -3 or -2.  
If somebody [shuffles feet] wants
to use this construction, then who
would stop him?

What did you intend, really? That the constant -3 had its value
modified?
---3;
assert(-3 == -4);
hahaha
 
lawrence.jones

Kenneth Brody said:
I believe the term is colloquially called "maximum munch", though that's not
a term I find anywhere in the Standard. (I'm sure someone else can quote
C&V on this.)

"Maximal munch". It's in 6.4p4 (maybe I should add it to the index):

If the input stream has been parsed into preprocessing tokens
up to a given character, the next preprocessing token is the
longest sequence of characters that could constitute a
preprocessing token.
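The 6.4p4 rule is simple enough to sketch. The toy lexer below is hypothetical and recognizes only identifiers, decimal digit strings and a handful of punctuators (the real list in 6.4.6 is much longer, and real preprocessing uses pp-numbers rather than plain digit strings), but at each step it does what 6.4p4 says: it takes the longest punctuator that matches, with no regard for whether the result will parse.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* A few punctuators from C's much larger list; enough for this thread. */
static const char *const punctuators[] = { "--", "-=", "->", "-", ";", "(", ")" };

/* Length of the longest punctuator that starts at s, or 0 if none does. */
static size_t longest_punctuator(const char *s)
{
    size_t best = 0;
    for (size_t i = 0; i < sizeof punctuators / sizeof punctuators[0]; i++) {
        size_t len = strlen(punctuators[i]);
        if (len > best && strncmp(s, punctuators[i], len) == 0)
            best = len;
    }
    return best;
}

/* Splits src into at most max tokens of at most 15 characters each.
   Returns the token count, or -1 on an unrecognized character or
   when a limit is exceeded. */
int tokenize(const char *src, char tokens[][16], int max)
{
    int n = 0;
    const char *p = src;

    while (*p != '\0') {
        size_t len;

        if (isspace((unsigned char)*p)) {   /* whitespace only separates */
            p++;
            continue;
        }
        if (isalpha((unsigned char)*p) || *p == '_') {
            len = 1;                        /* identifier */
            while (isalnum((unsigned char)p[len]) || p[len] == '_')
                len++;
        } else if (isdigit((unsigned char)*p)) {
            len = 1;                        /* digit string */
            while (isdigit((unsigned char)p[len]))
                len++;
        } else {
            len = longest_punctuator(p);    /* maximal munch */
            if (len == 0)
                return -1;
        }
        if (n == max || len >= 16)
            return -1;
        memcpy(tokens[n], p, len);
        tokens[n][len] = '\0';
        n++;
        p += len;
    }
    return n;
}
```

Run on "---x;", it produces the four tokens "--", "-", "x" and ";"; on "x-----x" it produces "x", "--", "--", "-", "x".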
 
Michael Press

Kenneth Brody said:
No, because this requires that you have already parsed it into tokens to
determine that you are using "unary minus" and "pre-decrement".


And what would you say "x---y" means, given that there is more than one way
to "meaningfully parse" that sequence? The rules of the C language make it
clear that there is only one way to parse it, even if there _could_be_ more
than one "meaningful" way to do so.

Unless, that is, you think the rule should be "take the longest sequence of
characters that could constitute a preprocessing token, and continue on
parsing, but if you get some sort of 'error' later on, then start backing up
and re-parsing previous tokens to see if they could be shortened to some
other preprocessing token, and continue from there, until you either come up
with a valid parse, or you have shortened everything to the shortest
possible preprocessing token, in which case it's an error".
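Kenneth's x---y is a useful companion case because, unlike ---x, it is valid C: maximal munch lexes it as x -- - y, which parses as (x--) - y, even though x - (--y) would also have been meaningful. A minimal sketch with a hypothetical function name:

```c
/* Returns the value of x---y and reports x's value afterwards.
   x---y is lexed as x -- - y, i.e. (x--) - y, never x - (--y). */
int eval_x_dec_minus_y(int x, int y, int *x_after)
{
    int r = x---y;   /* uses the old x, then decrements it */
    *x_after = x;
    return r;
}
```

For example, eval_x_dec_minus_y(10, 3, &xa) returns 7 and sets xa to 9; under the other reading it would have returned 8.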

Thank you all for explaining this to me. I heard people
speak of a tokenizer and a semantic parser so I thought
that identifying tokens and finding meaning are two
entirely separate processes in the formal model used to
generate a machine executable. So by my (incorrect)
picture we first identify tokens. ---x produces the
list ("-", "-", "-", "x", ";"). Then the grammar finds
meaning. Since that list could be parsed to a
meaningful C construct, it would be. To my way of
thinking the token identifying phase and the phase that
finds meaning are confused when the token identifier
decides that the meaning of ---x is (--)-x.

Disclaimer: The previous paragraph is for informational
purposes only and must not be construed as a demand
that things be done my way or even frustration that
they are not. Again, thanks.
 
Ben Bacarisse

Michael Press said:
Thank you all for explaining this to me. I heard people
speak of a tokenizer and a semantic parser so I thought
that identifying tokens and finding meaning are two
entirely separate processes in the formal model used to
generate a machine executable. So by my (incorrect)
picture we first identify tokens. ---x produces the
list ("-", "-", "-", "x", ";"). Then the grammar finds
meaning.

That description is not wrong. The only part that is wrong is the
actual list of tokens. The process *is* a two-phase one[1]: first find
the tokens and then use the grammar to find the structure (I'd reserve
the word "meaning" for something else, but that really is an unimportant
detail).

It's not clear from your example if you thought that a token was just a
synonym for a character (it would have been clear if you'd used a
multi-character variable rather than 'x') but one way or another all you
got wrong was the details of the rule used for finding the tokens.
Since that list could be parsed to a
meaningful C construct, it would be. To my way of
thinking the token identifying phase and the phase that
finds meaning are confused when the token identifier
decides that the meaning of ---x is (--)-x.

I don't see why you describe this as confusing the phases. The tokens
can be found without any reference to the syntax (what you call the
meaning). A set of rules or patterns for what constitutes a token is
given and the tokeniser proceeds by finding, at each step, the longest
sequence of characters that fit any of the patterns. Both '-' and '--'
are valid tokens so '--abc;' tokenises to '--', '-', 'abc', ';'.

<snip>

[1] The standard describes 8 phases because there are more things to do
than simply tokenise and parse, but the details of all of these phases
wouldn't add anything to this discussion.
 
Michael Press

Ben Bacarisse said:
Michael Press said:
Thank you all for explaining this to me. I heard people
speak of a tokenizer and a semantic parser so I thought
that identifying tokens and finding meaning are two
entirely separate processes in the formal model used to
generate a machine executable. So by my (incorrect)
picture we first identify tokens. ---x produces the
list ("-", "-", "-", "x", ";"). Then the grammar finds
meaning.

That description is not wrong. The only part that is wrong is the
actual list of tokens. The process *is* a two-phase one[1]: first find
the tokens and then use the grammar to find the structure (I'd reserve
the word "meaning" for something else, but that really is an unimportant
detail).

It's not clear from your example if you thought that a token was just a
synonym for a character (it would have been clear if you'd used a
multi-character variable rather than 'x')

I know that abc is a token.
---abc -> ("-", "-", "-", "abc").
but one way or another all you
got wrong was the details of the rule used for finding the tokens.

Yes. The reason I got it wrong is that I considered
going from the string "--" to the operator pre-decrement
to be a two step affair.
I don't see why you describe this as confusing the phases.

Because making "--" a token is looking ahead to the phase
where meaning for the C programming language is found.
The tokens
can be found without any reference to the syntax (what you call the
meaning).

I disagree. The token "--" can only be picked out by
defining it to be a token a priori; and we do that only
because we know in a later phase it will be found to have meaning.
A set of rules or patterns for what constitutes a token is
given and the tokeniser proceeds by finding, at each step, the longest
sequence of characters that fit any of the patterns. Both '-' and '--'
are valid tokens so '--abc;' tokenises to '--', '-', 'abc', ';'.

Yes.

[1] The standard describes 8 phases because there are more things to do
than simply tokenise and parse, but the details of all of these phases
wouldn't add anything to this discussion.

Yes, I know there are other phases, both before and after.
 
Ike Naar

I don't see why you describe this as confusing the phases. The tokens
can be found without any reference to the syntax (what you call the
meaning). A set of rules or patterns for what constitutes a token is
given and the tokeniser proceeds by finding, at each step, the longest
sequence of characters that fit any of the patterns. Both '-' and '--'
are valid tokens so '--abc;' tokenises to '--', '-', 'abc', ';'.

More confusion. Probably you meant:

... so '---abc;' tokenises to ...
 
