Lexing the ' char

Ole Nielsby

I'm writing a lexer for VHDL and I don't know how to treat the ' char. It can be used both for character literals and as an operator similar to :: or . in C++, if I understand correctly. How would a lexer decide?
 
kennheinrich

This is where you need a few characters of lookahead in your lex
buffer. If you match (TICK, char, TICK) you have a character literal.
Otherwise it's the TICK token (attribute or type qualifier).

- Kenn
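A minimal sketch of the (TICK, char, TICK) check in C, written as a standalone function. The names `classify_tick_naive`, `TickKind`, `TOK_CHAR_LIT` and `TOK_TICK` are hypothetical, and "char" is taken here to mean any single character other than end of input.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token kinds for this sketch. */
typedef enum { TOK_CHAR_LIT, TOK_TICK } TickKind;

/* Called with src[pos] == '\''. Peeks two characters ahead: if
   another ' follows a single character, the three characters form a
   character literal; otherwise the ' is a lone TICK. The '\0' test
   keeps src[pos + 2] inside the string. */
static TickKind classify_tick_naive(const char *src, size_t pos)
{
    assert(src != NULL && src[pos] == '\'');
    if (src[pos + 1] != '\0' && src[pos + 2] == '\'')
        return TOK_CHAR_LIT;   /* (TICK, char, TICK) */
    return TOK_TICK;           /* attribute or type qualifier */
}
```

With this rule alone, `'a'` in `x <= 'a';` becomes a character literal and the tick in `s'length` becomes TICK. However, the tick in `std_logic_vector'('a')` is also misread as the character literal `'('`, so a few characters of lookahead are not sufficient by themselves; that is the case raised in IR1045.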
 
Ole Nielsby

Thanks. That's what I already implemented, but I wasn't sure...
 
diogratia

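    case '\'':  /* IR1045 check */
        /* After one of these tokens, ' can only be the apostrophe
           delimiter. The buff_ptr test presumably keeps
           line_buff[buff_ptr+2] inside the buffer; near the end of
           the buffer it falls back to the delimiter. */
        if (last_token == DELIM_RIGHT_PAREN   ||
            last_token == DELIM_RIGHT_BRACKET ||
            last_token == KEYWD_ALL           ||
            last_token == IDENTIFIER_TOKEN    ||
            last_token == STR_LIT_TOKEN       ||
            last_token == CHAR_LIT_TOKEN      ||
            !(buff_ptr < BUFSIZ-2))
            token_flag = DELIM_APOSTROPHE;
        /* Otherwise ' graphic_char ' is a character literal. */
        else if (is_graphic_char(NEXT_CHAR) &&
                 line_buff[buff_ptr+2] == '\'') {
    CHARACTER_LITERAL:
            buff_ptr += 3;  /* leading ', the char, trailing ' */
            last_token = CHAR_LIT_TOKEN;
            token_strlen = 3;
            return (last_token);
        }
        else
            token_flag = DELIM_APOSTROPHE;
        break;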

See Issue Report IR1045:
http://www.eda-stds.org/isac/IRs-VHDL-93/IR1045.txt

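As you can see from the code fragment above, the last token can be captured and used to disambiguate something like:

foo <= std_logic_vector'('a','b','c');

without a large lookahead or backtracking. Here the first ' follows an identifier, so it must be the apostrophe of a qualified expression, even though it begins the three-character sequence '(' that on its own looks like a character literal.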
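A self-contained sketch of the previous-token rule in C, tokenizing the example above. It is a toy under stated assumptions, not a VHDL lexer: the token names (`Tok`, `T_IDENT`, `T_APOSTROPHE`, ...) and the functions `forces_apostrophe` and `lex` are hypothetical, keywords such as ALL are lexed as identifiers (which gives the same result for this check), string literals are not handled, and `isprint` stands in for `is_graphic_char` (ASCII only).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token kinds for this sketch. */
typedef enum {
    T_NONE, T_IDENT, T_CHAR_LIT, T_APOSTROPHE,
    T_LPAREN, T_RPAREN, T_RBRACKET, T_COMMA, T_LE, T_SEMI, T_OTHER
} Tok;

/* Nonzero if a ' after this token must be the apostrophe delimiter
   (attribute name or qualified expression), mirroring the
   previous-token test in the IR1045 fragment. ALL is lexed as
   T_IDENT here; string literals are not modelled. */
static int forces_apostrophe(Tok last)
{
    return last == T_RPAREN || last == T_RBRACKET ||
           last == T_IDENT  || last == T_CHAR_LIT;
}

/* Lexes src into out[] (at most max tokens) and returns the number
   of tokens written. Whitespace is skipped. */
static size_t lex(const char *src, Tok *out, size_t max)
{
    size_t n = 0, i = 0;
    Tok last = T_NONE;   /* no previous token at start of input */
    assert(src != NULL && out != NULL);
    while (src[i] != '\0' && n < max) {
        unsigned char c = (unsigned char)src[i];
        Tok t;
        if (isspace(c)) { i++; continue; }
        if (isalpha(c)) {
            while (isalnum((unsigned char)src[i]) || src[i] == '_')
                i++;
            t = T_IDENT;
        } else if (c == '\'') {
            /* isprint('\0') is 0, so src[i + 2] stays in bounds. */
            if (!forces_apostrophe(last) &&
                isprint((unsigned char)src[i + 1]) && src[i + 2] == '\'') {
                i += 3; t = T_CHAR_LIT;      /* ' char ' */
            } else {
                i += 1; t = T_APOSTROPHE;    /* delimiter */
            }
        } else if (c == '<' && src[i + 1] == '=') {
            i += 2; t = T_LE;
        } else {
            switch (c) {
            case '(': t = T_LPAREN;   break;
            case ')': t = T_RPAREN;   break;
            case ']': t = T_RBRACKET; break;
            case ',': t = T_COMMA;    break;
            case ';': t = T_SEMI;     break;
            default:  t = T_OTHER;    break;
            }
            i++;
        }
        out[n++] = t;
        last = t;   /* remembered for the next ' decision */
    }
    return n;
}
```

Lexing `foo <= std_logic_vector'('a','b','c');` gives IDENT, LE, IDENT, APOSTROPHE, LPAREN, then three CHAR_LIT tokens separated by COMMA, then RPAREN and SEMI: the tick after `std_logic_vector` becomes the delimiter, and `'a'`, `'b'` and `'c'` become character literals, all with at most two characters of lookahead.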

Mind you, you could try to argue that LRM 13.2:

...

"In some cases an explicit separator is required to separate adjacent
lexical elements (namely when, without separation, interpretation as a
single lexical element is possible). A separator is either a space
character (SPACE or NBSP), a format effector, or the end of a line. A
space character (SPACE or NBSP) is a separator except within a
comment, a string literal, or a space character literal."

could simply require the inclusion of disambiguating whitespace. The
accepted practice would be against you, however.