Caleb Clausen
rubylexer version 0.7.1 has been released!
* <http://rubyforge.org/projects/rubylexer/>
* <http://rubylexer.rubyforge.org/>
RubyLexer is a lexer library for Ruby, written in Ruby. It is meant to be
a complete and correct lexer for Ruby: any legal Ruby code should be lexed
correctly by RubyLexer. Just enough parsing capability is included to give
RubyLexer the context it needs to tokenize correctly in all cases. (This
turned out to be more parsing than I had thought or wanted to take on at
first.) RubyLexer handles the hard things, like complicated strings, the
ambiguous nature of some punctuation characters and keywords in Ruby, and
distinguishing methods from local variables.
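As a rough illustration of why a Ruby lexer needs that context, here is a small sketch in plain Ruby. It does not use RubyLexer itself, and the method names are made up for the example. The same characters, `value [1]`, are a method call with an array argument when `value` is a method, but an index operation once `value` is a local variable:

```ruby
# Hypothetical helper: returns its first argument, or :no_args if none given.
def value(*args)
  args.empty? ? :no_args : args.first
end

def call_as_method
  # No local variable named value is in scope here, so this is a call:
  # value([1])
  value [1]
end

def index_local
  value = [10, 20]
  # value is now a local variable, so this is an index: value[1]
  value [1]
end

p call_as_method   # prints [1]
p index_local      # prints 20
```

A tokenizer that did not track local variables could not tell which kind of `[` it was looking at.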
Users of rubygems can update an existing install with this command:
  gem update rubylexer
Or download it here:
* http://rubyforge.org/frs/download.php/42434/rubylexer-0.7.1.gem
* http://rubyforge.org/frs/download.php/42433/rubylexer-0.7.1.tgz
Changes:
### 0.7.1/10-29-2008
* 6 Major Enhancements:
  * empty string fragments now more like ruby's; this resolves many warnings
  * yet more hacks in aid of string inclusions
  * backslashes in strings are no longer interpreted automatically when lexed
  * here documents completely rewritten in a tricky way that better mimics MRI
  * many more flags for tokens to tell apart the various cases:
    * the various different local variable types have to be detected.
    * colons which operate like semicolons or thens are marked as such
    * { } used in block now flagged as parsing like do and end
    * commas now are marked with different types depending on how they're used
    * @variables in methods are flagged, so parsetree can come out different
    * clearly mark backquoted strings
  * further refinement of local var detection/implicit paren placement near:
    * when ws between method name and parenthesis
    * break/return/next
    * ? : << / rescue do
* 5 Minor Enhancements
  * colon or star in assignment make it a multi assignment
  * presence of unary * or & in param list forces it to be a multi-param list
  * errors in string inclusions should now be handled better
  * string and similar now record exact chars that open and close the string
  * more cases where return/break/next parse differently than a method (yuck!)
* 26 Bugfixes
  * ~ operator can be followed with an @, like + and -
  * ~ is overridable, however :: is not
  * raise is not a keyword
  * in addition to 0x00, 0x04 and 0x1a are also eof in ruby. why? I dunno.
  * set PROGRESS env var to print input file pos to stderr periodically
  * defined? isn't a funclike keyword... more of a unary operator
  * $- is a legitimate global variable.
  * better parsing of lvalue list following for keyword.
  * rescue ctx defines var only right after => and before then.
  * better placement of implicit parens around def param list
  * (global) variable aliasing now supported
  * local vars in END block are NOT scoped to the block!
  * lvars in def headers aren't vars til after their initializers
  * end of def header is treated like ; even if none is present
  * never put here document right after class keyword
  * look for start of line directives at end of here document
  * oops, mac newlines don't have to be supported
  * dos newlines better tolerated around here documents
  * less line number/offset confusion around here documents
  * nl after (non-op) rescue is hard (but not after INNERBOUNDINGWORDS)
  * handling eof in more strange places
  * always expect unary op after for
  * unary ops should know about the before-but-not-after rule!
  * newlines after = should be escaped
  * \c? and \C-? are not interpreted the same as other ctrl chars
  * \n\r and \r are not recognized as nl sequences
* 18 Internal Changes (not user visible)
  * commas cause a :comma event on the parsestack
  * some lists of keywords are now arrays of strings instead of regexps
  * single and double quote now have separate implementations again
  * know whether implicit open or close paren has just been emitted
  * put ws around << to keep slickedit happy
  * the eof characters are also considered whitespace.
  * identifier lexer now uses regexps more heavily
  * method formal params are not considered an lvalue context for commas
  * class and def now have their own parse contexts
  * unary star causes a :splat event on the parsestack
  * is_var_name finds var tokens just from token type, not lvars table
  * a faster regexp-based implementation of string scanning
  * moved yucky side effect out of quote_expected?
  * class module def for defined? keywords don't make op context
  * a new context for BEGIN/END keywords
  * a new context for param list of return/next/break
  * new escape sequence processors for regexp and %W list
  * numbers now scanned with a regexp
* 15 Enhancements and bug fixes to tests:
  * just warn on errors which are also syntax errors for ruby
  * a little cleanup of temp files
  * rubylexervsruby and tokentest can take input from stdin
  * unlexer improvements
  * dumptokens now has a --silent cmdline option
  * locatetest.rb is significantly enhanced
  * --unified option to diff seems to work better than -u
  * tokentest better verifies exact token contents...
  * tokentest now verifies open and close fields of strings
  * CRLF in a string is always treated like just a LF. (CR is elided.)
  * allow_ooo hacky flag for token offset errors to be ignored.
  * all other offset errors have been downgraded to warnings.
  * most of the offset problems I had been seeing have been fixed, though
  * bad offsets in here head/body, symbol and fal always ignored (a hack)
  * tokentest has a --loop option, for load testing
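A few of the Ruby behaviors named in the change list can be seen directly in plain Ruby. The sketch below does not call RubyLexer; the class, method and variable names (`Mask`, `Quiet`, `joined_heredocs`, `$original`, `$copy`) are made up for illustration. It only shows the language rules that a lexer has to reproduce:

```ruby
# Here documents: the body starts on the line after the opener, so the rest
# of the opening line (a second here document and a method call) must be
# lexed before either body is read.
def joined_heredocs
  [<<FIRST, <<SECOND].join("|")
one
FIRST
two
SECOND
end

# "~ operator can be followed with an @": unary ~ can be defined as ~@,
# just as unary + and - are defined as +@ and -@.
class Mask
  attr_reader :bits

  def initialize(bits)
    @bits = bits
  end

  def ~@
    Mask.new(~bits & 0xFF)
  end
end

# "raise is not a keyword": it is an ordinary method, so a class can
# define its own version.
class Quiet
  def raise(*_args)
    :swallowed
  end

  def try
    raise "boom" # calls Quiet#raise, not Kernel#raise
  end
end

# "(global) variable aliasing": both names refer to the same variable.
$original = 1
alias $copy $original
$copy = 2

inverted = ~Mask.new(0b0000_1111)

p joined_heredocs   # prints "one\n|two\n"
p inverted.bits     # prints 240
p Quiet.new.try     # prints :swallowed
p $original         # prints 2

# "\c? and \C-? are not interpreted the same as other ctrl chars":
# both give DEL (127), while e.g. \ca gives 1.
p "\c?".ord         # prints 127
p "\C-?".ord        # prints 127
p "\ca".ord         # prints 1
```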