Any help with PLY?

mark.green

Hi folks,

I've been trying to write a PLY parser and have run into a bit of
bother.

At the moment, I have a RESERVEDWORD token which matches all reserved
words and then alters the token type to match the reserved word that
was detected. I also have an IDENTIFIER token which matches
identifiers that are not reserved words.

The problem is, if I put RESERVEDWORD before IDENTIFIER, then
identifiers that happen to begin with reserved words are wrongly lexed
as the reserved word followed by an identifier. For example, because
"if" is a RESERVEDWORD, the string "ifollowyou" is wrongly lexed as the
RESERVEDWORD "if" followed by IDENTIFIER "ollowyou", rather than just
as the IDENTIFIER "ifollowyou".

If I put IDENTIFIER first, though, every single reserved word in the
input is lexed as an IDENTIFIER.

Is there any way I can tell PLY that it should only return a
RESERVEDWORD in the correct circumstances? If PLY can't do this, can
any of the other Python parser generators? (It seems that Lex can.)

Thanks!
 
Paul McGuire

Pyparsing uses the Keyword class for just this purpose. Before Keyword was
added to pyparsing, one had to solve this problem using the Or operator,
which performs a longest-string or "greedy" match, as in:

any_ = Literal("any")
boolean_ = Literal("boolean")
char_ = Literal("char")
double_ = Literal("double")
...

identifier = Word( alphas, alphanums + "_" ).setName("identifier")

real = Combine( Word(nums+"+-", nums) + dot + Optional( Word(nums) )
                + Optional( CaselessLiteral("E") +
                            Word(nums+"+-",nums) ) )
integer = ( Combine( CaselessLiteral("0x") +
                     Word( nums+"abcdefABCDEF" ) ) |
            Word( nums+"+-", nums ) ).setName("int")

udTypeName = delimitedList( identifier, "::",
                            combine=True ).setName("udType")

# have to use longest match for type, in case a user-defined
# type name starts with a keyword type, like "stringSeq" or
# "longArray"
typeName = ( any_ ^ boolean_ ^ char_ ^ double_ ^ fixed_ ^
             float_ ^ long_ ^ octet_ ^ short_ ^ string_ ^
             wchar_ ^ wstring_ ^ udTypeName )

This way, if a user-defined type were named "stringSequence", the longest
matching expression (udTypeName rather than string_) would be returned.

Pyparsing also has a MatchFirst alternative matcher, using the '|' operator,
which returns the first matching expression regardless of length.
Predictably, MatchFirst is faster at parsing, since it does not need to
evaluate every path - it can just return the first matching expression. Now
with Keyword, I can define:

any_ = Keyword("any")
boolean_ = Keyword("boolean")
char_ = Keyword("char")
double_ = Keyword("double")
...
typeName = ( any_ | boolean_ | char_ | double_ | fixed_ |
             float_ | long_ | octet_ | short_ | string_ |
             wchar_ | wstring_ | udTypeName )
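To make the difference concrete, here is a small sketch using the "if" / "ifollowyou" case from the original question. It assumes pyparsing is installed; the variable names are only illustrative and are not taken from the grammar above.

```python
from pyparsing import Keyword, Literal, Word, alphas, alphanums

identifier = Word(alphas, alphanums + "_")

# Literal matches the bare characters, even at the start of a longer word.
literal_if = Literal("if")
# Keyword matches only when the next character cannot continue an identifier.
keyword_if = Keyword("if")

# Both alternations use '|' (MatchFirst), so the first match wins.
with_literal = literal_if | identifier
with_keyword = keyword_if | identifier

literal_result = with_literal.parseString("ifollowyou").asList()
keyword_result = with_keyword.parseString("ifollowyou").asList()
plain_keyword = with_keyword.parseString("if x").asList()

print(literal_result)   # the Literal grabs just the leading "if"
print(keyword_result)   # the Keyword declines, so the identifier matches
print(plain_keyword)    # a standalone "if" is still matched as the keyword
```

With Literal the first alternative wins on the leading "if"; with Keyword the whole word falls through to the identifier, so the faster '|' operator is safe to use.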


Does PLY support greedy matching?

-- Paul
(Download pyparsing at http://pyparsing.sourceforge.net .)
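For the PLY side of the question, a common way around the ordering problem is to drop the separate RESERVEDWORD rule: match every word with the IDENTIFIER pattern, then look the matched text up in a table of reserved words and change the token type if it is found. As far as I recall, the PLY documentation recommends doing this lookup inside a single token function. The sketch below shows the idea with only the standard `re` module, so it is not PLY code; the reserved-word table, the type names and `lex_words` are hypothetical.

```python
import re

# Hypothetical reserved-word table: maps the word to its token type.
RESERVED = {"if": "IF", "then": "THEN", "else": "ELSE", "while": "WHILE"}

IDENT_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
SPACE_RE = re.compile(r"\s+")

def lex_words(text):
    """Yield (type, value) pairs for identifiers and reserved words."""
    pos = 0
    while pos < len(text):
        ws = SPACE_RE.match(text, pos)
        if ws:
            pos = ws.end()
            continue
        m = IDENT_RE.match(text, pos)
        if not m:
            raise ValueError(f"illegal character {text[pos]!r} at {pos}")
        word = m.group()
        # The whole word is matched first; only an exact hit in the table
        # turns it into a reserved word, so "ifollowyou" stays an identifier.
        yield RESERVED.get(word, "IDENTIFIER"), word
        pos = m.end()

tokens = list(lex_words("if ifollowyou then x"))
print(tokens)
```

Because the regular expression always consumes the full word before the lookup, rule order no longer matters, and adding a reserved word only means adding an entry to the table.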
 
