tokens

mdh

Hi all,
Page 125 of K&R2 gives rise to this question for me.

Is it true that a "token" in C (philosophically) is the least amount
of digits/chars/underscores/*s (and other non-blank characters that I
have not thought of) that the compiler uses to derive usable
information? So, this would be a token

" ( ) "

but this

"( "

by itself would not?

Thanks as usual.
 
Harald van Dijk

Hi all,
Page 125 of K&R2 gives rise to this question for me.

Is it true that a "token" in C (philosophically) is the least amount
of digits/chars/underscores/*s (and other non-blank characters that I have
not thought of) that the compiler uses to derive usable information?

In a way.

So, this would be a token

" ( ) "

If you mean including the quotation marks, then yes. Otherwise, no.

but this

"( "

by itself would not?

So, no. A token is a word. A token is the largest string of characters
that cannot have whitespace inserted without breaking it into smaller
tokens, changing its meaning. () is not a token. It is two tokens, which
you can tell from the fact that you can separate the two by writing ( ).
If it were a single token, this would not be allowed. "" is a token,
because you cannot separate the quotation marks by writing " " -- at least
not without changing its meaning.
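
The whitespace test described above can be tried directly. This is only a minimal sketch: the helper forty_two and the four wrapper functions are made-up names for illustration, not anything from K&R or the thread.

```c
#include <string.h>

/* Hypothetical helper, present only so there is something to call. */
static int forty_two(void) { return 42; }

/* "(" and ")" are separate tokens, so whitespace between them is harmless. */
int call_compact(void) { return forty_two(); }
int call_spaced(void)  { return forty_two ( ); }

/* A string literal is a single token: a space inside it changes its value. */
size_t empty_len(void)  { return strlen(""); }   /* the empty string */
size_t spaced_len(void) { return strlen(" "); }  /* a one-character string */
```

Both call functions compile and return the same value, while the two string literals have lengths 0 and 1.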
 
mdh

...... A token is a word. A token is the largest string of characters..........
changing its meaning.

So,

(

is a token because it has some meaning to the compiler: "get some more
characters, see if a ) follows, and if not, there is an error", or
"found the closing ), so the comma-delimited list is the arguments",
etc.?


Is that why my example of ( ) is not a single token as the first "("
could be the start of a lot of different things?
 
Harald van Dijk

Is that why my example of ( ) is not a single token as the first "("
could be the start of a lot of different things?

No, your example of ( ) is not a single token because the two can be
separated.
 
mdh

No, your example of ( ) is not a single token because the two can be
separated.

Sorry...then I did not make myself clear. I am agreeing with you. ()
are 2 tokens, as each of ( and ) has a meaning to the compiler.

The reason this is somewhat confusing is that on p125 of K&R2, they
define "tokens" ( and the quotation marks are theirs, not mine) as a
pair of parentheses, a pair of brackets perhaps including a number. I
assume that their definition is for the sake of the example, then.
 
Harald van Dijk

Sorry...then I did not make myself clear. I am agreeing with you. () are
2 tokens, as each of ( and ) has a meaning to the compiler.

I understood that you agreed that ( ) are two tokens, but I did not agree
with my understanding of your reasoning. It is possible I misunderstood
you, so I will give a different example. - can also be the start of a lot
of different things, such as -= or -- or ->, but those three are three
single tokens, each consisting of two characters. You cannot decrement a
variable using a- - or a- =b. You cannot dereference a pointer to a
structure using a- >b. In other words, you cannot separate the - from the
second character. At the same time, - in 3-2 is a token by itself.
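
The point about - can be checked with a small sketch. The struct and function names here are invented for illustration; the comments note which spellings would be rejected.

```c
/* Hypothetical struct, only needed to have something to point at. */
struct point { int x; };

int minus_tokens(void)
{
    struct point pt = { 5 };
    struct point *p = &pt;
    int a = 10;

    a -= 3;         /* "-=" is one token; "a - = 3" would not compile */
    a--;            /* "--" is one token; "a- -" would not compile    */
    a = a - p->x;   /* "->" is one token; "p- >x" would not compile   */

    /* Here each "-" is a token by itself: binary minus, then unary minus. */
    return a - -1;
}
```

Starting from 10, the three compound tokens leave a at 1, and the final expression returns 2.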
The reason this is somewhat confusing is that on p125 of K&R2, they
define "tokens" ( and the quotation marks are theirs, not mine) as a
pair of parentheses, a pair of brackets perhaps including a number.

This seems strange, but...

I assume that their definition is for the sake of the example, then.

Without knowing the context, I cannot be sure. I don't have K&R, so
hopefully someone else will comment.
 
mdh

I understood that you agreed that ( ) are two tokens, but I did not agree
with my understanding of your reasoning. It is possible I misunderstood
you, so I will give a different example.

ok... I see what you mean.

This seems strange, but...


Without knowing the context, I cannot be sure. I don't have K&R, so
hopefully someone else will comment.

Thank you Harald.
 
Barry Schwarz

Sorry...then I did not make myself clear. I am agreeing with you. ()
are 2 tokens, as each of ( and ) has a meaning to the compiler.

The reason this is somewhat confusing is that on p125 of K&R2, they
define "tokens" ( and the quotation marks are theirs, not mine) as a
pair of parentheses, a pair of brackets perhaps including a number. I
assume that their definition is for the sake of the example, then.

On page 125 K&R use the word token to describe the unique processing
of a function they have provided. They enclosed it in quotes to
indicate the word is being used with a meaning other than its normal
one. In fact, it has at least two normal meanings within C.

One is its use to describe the processing of strtok(). In
this context, any string between the specified delimiters is a token.

The other is the meaning compiler writers use when describing
parse algorithms. In C, the expression a+++b is guaranteed to be
parsed as
   a++ + b
and not
   a + ++b
because C uses a maximum munch rule for identifying tokens.
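
A small sketch makes the rule (often called "maximal munch") observable, because the two readings give different results. The function name munch and its out-parameter are assumptions made for the example.

```c
/* Evaluates a+++b and reports the value a holds afterwards. */
int munch(int a, int b, int *a_after)
{
    /* The lexer takes the longest token it can at each step, so this is
       a ++ + b, that is (a++) + b. By the same rule a+++++b becomes
       a ++ ++ + b, which does not compile. */
    int result = a+++b;

    *a_after = a;
    return result;
}
```

With a = 1 and b = 10 the result is 11 and a ends up as 2. The other reading, a + ++b, would have given 12 and left a at 1.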
 
Bartc

Harald van Dijk said:
No, your example of ( ) is not a single token because the two can be
separated.

+= is a single token, but it can be separated into two tokens + =
 
Andrew Poelstra

+= is a single token, but it can be separated into two tokens + =

Yes, but that changes its meaning from a single "add and assign" to the
individual tokens "plus or positive" and "assignment", which is what
Harald wrote earlier in this thread.
 
mdh

On page 125 K&R use the word token to describe the unique processing
of a function they have provided.  They enclosed it in quotes to
indicate the word is being used with a meaning other than its normal
one.  In fact, it has at least two normal meanings within C.

        One is its use to describe the processing of strtok().  In
this context, any string between the specified delimiters is a token.

        The other is the meaning compiler writers use............



So you are saying that in the example on p125, K&R are defining their
meaning of a token (in their function "gettoken"), in the same way
one can apparently decide how to define a token (via the
delimiters) with strtok in <string.h>?

And...

what some posters are using to mean "token" is the usual way compiler
writers use the word "token"?
 
Bartc

So you are saying that in the example on p125, K&R are defining their
meaning of a token (in their function "gettoken"), in the same way
one can apparently decide how to define a token (via the
delimiters) with strtok in <string.h>?

The K&R function applies its own meaning to 'token' (I'm not familiar with
strtok()).

When you write token-parsing code, you can make up your own rules. In this
function, () /is/ a single token.
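
To illustrate "make up your own rules", here is a rough sketch in the spirit of the gettoken function being discussed. It is not the book's code: the name next_token, the token codes and the string-based input are assumptions chosen so the example is self-contained.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical token codes; single characters are returned as themselves. */
enum { NAME = 256, PARENS, BRACKETS };

/* Copies the next token from *src into tok, advances *src past it and
   returns the token type, or 0 at the end of the input. */
int next_token(const char **src, char *tok, size_t cap)
{
    const char *s = *src;
    size_t n = 0;
    int type;

    assert(tok != NULL && cap >= 3);      /* room for "()" plus terminator */

    while (*s == ' ' || *s == '\t')       /* skip blanks and tabs */
        s++;

    if (*s == '\0') {
        type = 0;
    } else if (s[0] == '(' && s[1] == ')') {
        memcpy(tok, "()", 2);             /* by this rule "()" is ONE token */
        n = 2;
        s += 2;
        type = PARENS;
    } else if (*s == '[') {
        while (*s != '\0' && n + 1 < cap) {   /* copy up to and including ']' */
            char c = *s++;
            tok[n++] = c;
            if (c == ']')
                break;
        }
        type = BRACKETS;
    } else if (isalpha((unsigned char)*s)) {
        while (isalnum((unsigned char)*s) && n + 1 < cap)
            tok[n++] = *s++;
        type = NAME;
    } else {
        type = (unsigned char)*s;         /* any other character stands alone */
        tok[n++] = *s++;
    }
    tok[n] = '\0';
    *src = s;
    return type;
}
```

Fed the string "f() [13] (x", this sketch yields the tokens f, (), [13], ( and x. Under these rules () and [13] are single tokens even though the C compiler itself would split them up.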
what some posters are using to mean "token" is the usual way compiler
writers use the word "token"?

The C language has its own set of symbols considered tokens (and its own
rules for forming them and deciding where one starts and another ends).
Other languages will have their own symbols and rules.

strtok() seems designed to provide a crude way of parsing text, for
example to separate out numbers and words from user input. Compiler parsers
are more sophisticated and targeted at a specific language.
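
For comparison, a minimal sketch of strtok() from <string.h>, where the caller decides what a token is by choosing the delimiters. The wrapper name split and its parameters are made up for the example.

```c
#include <string.h>

/* Splits line in place on any character in delims and stores up to max
   token pointers in out. Returns the number of tokens stored.
   Note that strtok() overwrites delimiters in line with '\0'. */
size_t split(char *line, const char *delims, char *out[], size_t max)
{
    size_t n = 0;

    for (char *t = strtok(line, delims);
         t != NULL && n < max;
         t = strtok(NULL, delims))
        out[n++] = t;

    return n;
}
```

Splitting "add 12, 30" on " ," gives the three tokens add, 12 and 30. Splitting the same text on "," alone gives two tokens, "add 12" and " 30", so the token boundaries depend entirely on the delimiter set.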
 
mdh

The K&R function applies its own meaning to 'token' (I'm not familiar with
strtok()).

When you write token-parsing code, you can make up your own rules. In this
function, () /is/ a single token.
.....

snip

......


The C language has its own set of symbols considered tokens (and its own
rules for forming them and deciding where one starts and another ends).
Other languages will have their own symbols and rules.


Thanks...that's what I finally realized, viz. a token depends upon many
things, but the bottom line is that a token is a string with a
meaning; its length, meaning, etc. depend on whoever defines it for
that particular system.
Thanks again for your help.
 
Nick Keighley

On page 125 K&R use the word token to describe the unique processing
of a function they have provided.  They enclosed it in quotes to
indicate the word is being used with a meaning other than its normal
one.  In fact, it has at least two normal meanings within C.

        One is its use to describe the processing of strtok().  In
this context, any string between the specified delimiters is a token.

        The other is the meaning compiler writers use when describing
parse algorithms.  In C, the expression a+++b is guaranteed to be
parsed as
   a++ + b
and not
   a + ++b
because C uses a maximum munch rule for identifying tokens.

The C programming language definition also specifies
"preprocessor tokens" which (I think) are subtly different
from "tokens".
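
One place where the difference shows, offered as a hedged illustration rather than a full account: a "preprocessing number" is allowed to swallow a sign that follows e, E, p or P, so a hexadecimal constant ending in E behaves differently with and without spaces. The function name is invented for the example.

```c
int hex_plus_one(void)
{
    /* With the spaces this is three tokens: 0x1E, + and 1.
       Written without them, 0x1E+1 forms a single preprocessing number
       that is not a valid constant, so it fails to compile. */
    return 0x1E + 1;
}
```

The spaced form returns 31 (0x1E is 30).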
 
