Any help with PLY?

mark.green · Nov 17, 2005

Hi folks,

I've been trying to write a PLY parser and have run into a bit of
bother.

At the moment, I have a RESERVEDWORD token which matches all reserved
words and then alters the token type to match the reserved word that
was detected. I also have an IDENTIFIER token which matches
identifiers that are not reserved words.

The problem is, if I put RESERVEDWORD before IDENTIFIER, then
identifiers that happen to begin with reserved words are wrongly lexed
as the reserved word followed by an identifier. For example, because
"if" is a RESERVEDWORD, the string "ifollowyou" is wrongly lexed as the
RESERVEDWORD "if" followed by IDENTIFIER "ollowyou", rather than just
as the IDENTIFIER "ifollowyou".

If I put IDENTIFIER first, though, every single reserved word in the
input is lexed as an IDENTIFIER.
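The ordering problem can be reproduced with Python's standard `re` module. Its alternation tries alternatives from left to right and accepts the first one that matches, not the longest. The sketch below uses a hypothetical three-word reserved list and a conventional identifier pattern. It only illustrates the symptom and is not PLY itself.

```python
import re

# Hypothetical token patterns for illustration: a reserved-word rule
# and a general identifier rule, combined into one alternation.
RESERVED = r"(?P<RESERVEDWORD>if|else|while)"
IDENT = r"(?P<IDENTIFIER>[A-Za-z_][A-Za-z_0-9]*)"


def tokenize(text, patterns):
    """Tokenize with a first-match alternation of the given patterns."""
    master = re.compile("|".join(patterns))
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        m = master.match(text, pos)
        if m is None:
            raise SyntaxError(f"bad character {text[pos]!r} at {pos}")
        # lastgroup names the alternative that actually matched
        tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


src = "if ifollowyou"
reserved_first = tokenize(src, [RESERVED, IDENT])
ident_first = tokenize(src, [IDENT, RESERVED])
print(reserved_first)  # "ifollowyou" is split into "if" + "ollowyou"
print(ident_first)     # "if" itself comes out as an IDENTIFIER
```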

Is there any way I can tell PLY that it should only return a
RESERVEDWORD in the correct circumstances? If PLY can't do this, can
any of the other Python parser generators? (It seems that Lex can.)

Thanks!
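The approach suggested in the PLY documentation avoids the ordering problem altogether. It uses a single rule that matches every identifier-shaped word and then looks the matched text up in a dictionary of reserved words. In PLY this is a `t_ID` function that sets `t.type = reserved.get(t.value, 'ID')`. The following is a minimal sketch of the same idea using only `re`, so it runs without PLY installed. The reserved words and token names are hypothetical.

```python
import re

# Reserved-word table: maps lexeme -> token type (hypothetical names).
reserved = {"if": "IF", "else": "ELSE", "while": "WHILE"}

# One rule matches every identifier-shaped word, reserved or not.
IDENT_RE = re.compile(r"[A-Za-z_][A-Za-z_0-9]*")


def lex_words(text):
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        m = IDENT_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"bad character {text[pos]!r} at {pos}")
        word = m.group()
        # The whole word is matched first and then classified, so
        # "ifollowyou" can never be split into "if" + "ollowyou".
        tokens.append((reserved.get(word, "IDENTIFIER"), word))
        pos = m.end()
    return tokens


print(lex_words("if ifollowyou else x"))
```

Because the lookup happens after the whole word has been matched, the lexer no longer depends on how a reserved-word rule is ordered relative to an identifier rule.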

Paul McGuire · Nov 17, 2005


Pyparsing uses the Keyword class for just this purpose. Before Keyword was
added to pyparsing, one had to solve this problem using the Or operator,
which performs a longest-string or "greedy" match, as in:

from pyparsing import (CaselessLiteral, Combine, Literal, Optional, Word,
                       alphanums, alphas, delimitedList, nums)

any_ = Literal("any")
boolean_ = Literal("boolean")
char_ = Literal("char")
double_ = Literal("double")
...

identifier = Word( alphas, alphanums + "_" ).setName("identifier")

dot = Literal(".")  # assumed definition; not shown in the original snippet
real = Combine( Word(nums+"+-", nums) + dot + Optional( Word(nums) ) +
                Optional( CaselessLiteral("E") + Word(nums+"+-", nums) ) )
integer = ( Combine( CaselessLiteral("0x") + Word( nums+"abcdefABCDEF" ) ) |
            Word( nums+"+-", nums ) ).setName("int")

udTypeName = delimitedList( identifier, "::", combine=True ).setName("udType")

# have to use longest match for type, in case a user-defined
# type name starts with a keyword type, like "stringSeq" or "longArray"
typeName = ( any_ ^ boolean_ ^ char_ ^ double_ ^ fixed_ ^
             float_ ^ long_ ^ octet_ ^ short_ ^ string_ ^
             wchar_ ^ wstring_ ^ udTypeName )

This way, if a user-defined type were named "stringSequence", the longest
matching expression (udTypeName, rather than string_) would be returned.
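The difference between `^` (Or) and `|` (MatchFirst) can be emulated in plain Python. This is a rough sketch, not pyparsing's implementation. Each alternative is a regular expression standing in for `Literal("string")` or `Word(alphas, alphanums + "_")`, and ties go to the earlier alternative, a choice made for this sketch.

```python
import re

# Regexes standing in for Literal("string") and
# Word(alphas, alphanums + "_"); an emulation, not pyparsing itself.
alternatives = [
    ("string_", re.compile(r"string")),
    ("identifier", re.compile(r"[A-Za-z][A-Za-z0-9_]*")),
]


def match_first(text, alts):
    """Return the first alternative that matches, like '|' (MatchFirst)."""
    for name, rx in alts:
        m = rx.match(text)
        if m:
            return name, m.group()
    return None


def match_longest(text, alts):
    """Try every alternative and keep the longest match, like '^' (Or)."""
    best = None
    for name, rx in alts:
        m = rx.match(text)
        # strict '>' keeps the earlier alternative on a tie
        if m and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group())
    return best


print(match_first("stringSequence", alternatives))
print(match_longest("stringSequence", alternatives))
print(match_longest("string", alternatives))
```

match_longest has to try every alternative before it can answer, while match_first can stop at the first success. That is why MatchFirst is the cheaper of the two.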

Pyparsing also has a MatchFirst alternative matcher, using the '|' operator,
which returns the first matching expression regardless of length.
Predictably, MatchFirst is faster at parsing, since it does not need to
evaluate every alternative; it can stop at the first matching expression.
Now with Keyword, I can define:

any_ = Keyword("any")
boolean_ = Keyword("boolean")
char_ = Keyword("char")
double_ = Keyword("double")
...
typeName = ( any_ | boolean_ | char_ | double_ | fixed_ |
             float_ | long_ | octet_ | short_ | string_ |
             wchar_ | wstring_ | udTypeName )
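Keyword lets a first-match alternation give the right answer by refusing to match when the keyword text is immediately followed by another identifier character. A rough emulation with a regular-expression negative lookahead follows. The identifier-character set `[A-Za-z0-9_]` is an assumption made for this sketch. pyparsing's Keyword uses its own definition of identifier characters.

```python
import re


def keyword(word):
    # Match `word` only when it is not immediately followed by another
    # identifier character (a rough emulation of pyparsing's Keyword).
    return re.compile(re.escape(word) + r"(?![A-Za-z0-9_])")


alternatives = [
    ("string_", keyword("string")),
    ("long_", keyword("long")),
    ("identifier", re.compile(r"[A-Za-z][A-Za-z0-9_]*")),
]


def match_first(text, alts):
    """First matching alternative wins, as with '|' (MatchFirst)."""
    for name, rx in alts:
        m = rx.match(text)
        if m:
            return name, m.group()
    return None


for text in ["string", "stringSeq", "longArray", "long"]:
    print(text, "->", match_first(text, alternatives))
```

Here "stringSeq" and "longArray" fall through the keyword alternatives to the identifier, even though the keywords are listed first.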

Does PLY support greedy matching?

-- Paul
(Download pyparsing at http://pyparsing.sourceforge.net .)
