rex: how to use the lexer class?

fdelente · Jun 22, 2008

Hello.

I'm trying to get rex to parse my inputs. After reading some of the sample
files provided with rex, I created this simple(?) file:

file: test.rex------------------------------------------------------------

# -*- ruby -*-
##########################################################################

class Lexer
macro
BLANKS \s+
DIGITS \d+
LETTERS [a-zA-Z]+
rule
{BLANKS}
{LETTERS} { puts "ID: '@{text}'"; [ :ID, text ] }
{DIGITS} { puts "NUMBER: '@{text}'"; [ :NUMBER, text.to_f ] }
.|\n { puts "text: '@{text}'"; [ text, text ] }
inner
end

##########################################################################
lexer=Lexer.new
while 1
str=$stdin.gets.strip
puts "str=@{str}"
lexer.scan_str(str)
puts "--------------------------------------------------------------------------"
end

end of file: test.rex-----------------------------------------------------

After running 'rex test.rex' and then 'ruby -Ku test.rex.rb', I always get errors like

test.rex.rb:60:in `scan_evaluate': can not match: '2' (Lexer::ScanError)

when I type input.

Can anybody tell me why? Thanks.

nicholasmabry · Jun 24, 2008

Hey Fabrice,

The problem you're seeing is due to rex's assumption that you are
generating a parser in tandem with your lexer. The generated method
Lexer#scan_str looks like this:

def scan_str( str )
scan_evaluate str
do_parse
end

While scan_evaluate(str) is the method generated by your token
definitions, do_parse() depends on a racc grammar having been defined
and initialized. The bad news is that the default scan_str() won't
work for your purposes. The good news is that scan_evaluate() will. If
you examine your generated test.rex.rb file, you'll see that
scan_evaluate() identifies your tokens and pushes them one by one into
a queue named @rex_tokens. To pull them out of the queue, simply call
next_token(). Here's a quick replacement for the bottom of your token
definition file:

lexer=Lexer.new
while 1
  str=$stdin.gets.strip
  puts "str=#{str}"

  # Here we're scanning the string for tokens
  lexer.scan_evaluate(str)

  # And then printing each one out to stdout
  while token = lexer.next_token
    p token
  end
  puts "--------------------------------------------------------------------------"
end
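For anyone without rex at hand, the queue behaviour described above can be imitated in plain Ruby. This is a hypothetical sketch, not rex's generated code: `MiniLexer` is an invented class that only mimics the `scan_evaluate`/`next_token` pattern with the standard library's `StringScanner`, using the same BLANKS/LETTERS/DIGITS patterns as test.rex. It assumes an action-less rule such as `{BLANKS}` simply produces no token.

```ruby
require "strscan"

# Hypothetical stand-in for a rex-generated Lexer (not rex's real output).
# scan_evaluate pushes tokens onto a queue; next_token shifts them off,
# returning nil once the queue is empty.
class MiniLexer
  def initialize
    @rex_tokens = []
  end

  def scan_evaluate(str)
    ss = StringScanner.new(str)
    until ss.eos?
      # Patterns are tried in order; the first one that matches wins.
      if ss.scan(/\s+/)
        # BLANKS: assumed to be skipped, producing no token
      elsif (text = ss.scan(/[a-zA-Z]+/))
        @rex_tokens.push([:ID, text])          # LETTERS
      elsif (text = ss.scan(/\d+/))
        @rex_tokens.push([:NUMBER, text.to_f]) # DIGITS
      else
        text = ss.getch                        # any other single character
        @rex_tokens.push([text, text])
      end
    end
  end

  def next_token
    @rex_tokens.shift
  end
end

lexer = MiniLexer.new
lexer.scan_evaluate("abc 42 +")
while (token = lexer.next_token)
  p token
end
# [:ID, "abc"]
# [:NUMBER, 42.0]
# ["+", "+"]
```

For an interactive loop, `while (line = $stdin.gets)` stops cleanly at end of input, whereas `$stdin.gets.strip` raises NoMethodError there because `gets` returns nil.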

The only other minor change I made was to "@{str}": Ruby's string
interpolation syntax is actually "#{ }". Let us know if you
have more questions. Happy lexing!
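A quick way to see the interpolation difference in plain Ruby: inside a double-quoted string, `#{...}` is evaluated, while `@{...}` is just literal text. The variable name here is only for illustration.

```ruby
str = "abc"

# "@{...}" is not special in Ruby, so it is printed as-is
puts "str=@{str}"   # prints str=@{str}

# "#{...}" evaluates the expression and inserts the result
puts "str=#{str}"   # prints str=abc
```

The same applies to the `puts` calls inside the rule actions of test.rex, since those actions are ordinary Ruby code.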

-Nick

rex: multiline comment	0	Mar 4, 2007
Best lexer/parser for Ruby language itself	0	Dec 21, 2005
questions of idiom	3	Jun 7, 2010
ANN: 'rex', a module for easy creation and use of regular expressions	0	Jun 10, 2004
HOWTO: Parsing email using Python part2	1	Jul 15, 2011
ANN: 'rex' 0.5, a module for easier creation and use of regular expressions.	0	Jun 27, 2004
[ANN] rubylexer 0.7.1 Released	1	Sep 2, 2008
[SUMMARY] The Turing Machine (#162)	4	May 15, 2008