Mike Samuel said:

> I maintain the syntax highlighter for code.google.com and perl support
> is rather lacking.

Surprise, surprise. Considering what Google has done to Usenet I wonder
why so many people couldn't care less.

> I know perl has a complex grammar, but can someone point me at a
> simple lexical grammar for perl 5 that will allow me to at least
> identify comment, string, and regex boundaries?

The old saying goes "only perl can parse Perl".
Comments are easy: anything following a # sign on the same line, or
anything enclosed in POD.
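
As a rough illustration of that rule, here is a minimal sketch in Python (chosen only for brevity; the name comment_spans is made up for this example). It assumes POD starts at a line beginning with "=" plus a letter and ends at a line beginning with "=cut". It is deliberately naive: a # inside a string or regex, or in $#array, is wrongly reported as a comment.

```python
import re

# Assumption: POD starts at a line beginning with "=" plus a letter
# and runs through the next line beginning with "=cut".
POD_START = re.compile(r"^=[A-Za-z]")
POD_END = re.compile(r"^=cut\b")


def comment_spans(source):
    """Return (line_number, start_col, end_col) spans that look like comments.

    Naive on purpose: a '#' inside a string or regex, or in '$#array',
    is wrongly reported as the start of a comment.
    """
    spans = []
    in_pod = False
    for lineno, line in enumerate(source.splitlines(), start=1):
        if in_pod:
            # Every line inside a POD block counts, including the =cut line.
            spans.append((lineno, 0, len(line)))
            if POD_END.match(line):
                in_pod = False
        elif POD_START.match(line):
            in_pod = True
            spans.append((lineno, 0, len(line)))
        else:
            pos = line.find("#")
            if pos != -1:
                spans.append((lineno, pos, len(line)))
    return spans
```

Calling comment_spans('print "a#b";') returns a span even though that # sits inside a string, which is exactly why the string and regex boundaries have to be known first.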
Strings are a different story, because there is no single set of
characters (like single or double quotes) that identifies a string.
Instead there are numerous operators and functions that turn their
argument into a string, notably the quote and quote-like operators:
    Customary  Generic      Meaning          Interpolates
        ''       q{}        Literal          no
        ""      qq{}        Literal          yes
        ``      qx{}        Command          yes (unless '' is delimiter)
                qw{}        Word list        no
        //       m{}        Pattern match    yes (unless '' is delimiter)
                qr{}        Pattern          yes (unless '' is delimiter)
                 s{}{}      Substitution     yes (unless '' is delimiter)
                tr{}{}      Transliteration  no (but see below)
For these you can use any of a large number of delimiters, e.g. in
m -foo*bar- the text 'foo*bar' is a string (and an RE). This cannot be
parsed on the lexical level.
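
To show what "any delimiter" means for a scanner, here is a minimal Python sketch (find_closing is a hypothetical helper, not part of any existing highlighter). It assumes the caller already knows, from context, that a quote-like operator starts here and has skipped any whitespace to reach the opening delimiter. It also assumes the four bracket pairs nest and that a backslash escapes the next character. Real perl has more special cases than this.

```python
# Assumption: the four bracketing pairs nest; any other delimiter
# is closed by the same character that opened it.
BRACKETS = {"(": ")", "[": "]", "{": "}", "<": ">"}


def find_closing(source, open_idx):
    """Return the index of the delimiter that closes source[open_idx].

    open_idx must point at the opening delimiter. Returns -1 if the
    construct is unterminated.
    """
    opener = source[open_idx]
    closer = BRACKETS.get(opener, opener)
    depth = 1
    i = open_idx + 1
    while i < len(source):
        ch = source[i]
        if ch == "\\":
            i += 2  # skip the escaped character, whatever it is
            continue
        if ch == closer:
            depth -= 1
            if depth == 0:
                return i
        elif ch == opener:
            # Only reachable for bracket pairs, where opener != closer.
            depth += 1
        i += 1
    return -1
```

For the example above, find_closing("m -foo*bar-", 2) lands on the final "-", so the body is 'foo*bar'. The hard part, deciding that the "-" at index 2 is a delimiter at all, is left to the caller.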
The same goes for regex boundaries. There isn't a given set of
characters like /.../, but a regexp is identified by its position as an
argument to a specific operation. The first arguments of s/// and m//
are regular expressions, no matter whether you use the slash or some
other delimiter, whereas the first argument of tr/// is not an RE, even
though I wrote it with slashes. Again, this cannot be determined on the
lexical level.
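
The point that the operator, not the delimiter, decides what is a regex can be written down as a small lookup table. This is only a sketch in Python; the names QUOTE_LIKE, part_roles and first_part_is_regex and the role labels are invented for the example. It does not cover a bare /.../, where telling a match from a division needs the surrounding context.

```python
# operator -> role of each delimited part that follows it.
# The role labels are made up for this sketch.
QUOTE_LIKE = {
    "q": ("string",),
    "qq": ("string",),
    "qx": ("command",),
    "qw": ("word list",),
    "m": ("regex",),
    "qr": ("regex",),
    "s": ("regex", "replacement"),
    "tr": ("search list", "replacement list"),
    "y": ("search list", "replacement list"),  # y is a synonym for tr
}


def part_roles(operator):
    """Return the roles of the delimited parts that follow `operator`."""
    return QUOTE_LIKE[operator]


def first_part_is_regex(operator):
    """True if the first delimited part after `operator` is a regex."""
    return QUOTE_LIKE[operator][0] == "regex"
```

So first_part_is_regex("s") is true and first_part_is_regex("tr") is false, whichever delimiter was used.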
jue