problems with racc: $end token


Luke A. Kanies

Hello,

I'm trying to write a simple parser using racc, and I'm clearly missing
something obvious. I've looked at all of the example code I can find, including
rdtool, and I cannot seem to resolve this problem. I am convinced it is
something small, because it's apparently small enough that I can't see the
difference between my grammar/parser and everyone else's.

I always get the following error when I try to run my parser:

parse error on token '$end' => 'false'

The wording of that message is mine, but the error itself comes from racc.
For some reason, racc is treating the $end token as a parse error rather
than as the end of parsing. I've tried to understand all the code involved,
including racc's parser.rb, the racc script itself, the generated parser.rb
file, and a good bit more. I just can't get it.

Here are the pertinent portions of my grammar file:

class Cricket::Parser

token DEFINE NAME STRING PARAM LCURLY RCURLY VALUE

rule
file: objects
;

objects: object { [val[0]] }
| objects object { [val[0], val[1]].flatten }
;

object: DEFINE NAME LCURLY vars RCURLY {
Cricket::Object.create(val[1],val[3]) }
;

vars: var
| vars var
;

var: PARAM VALUE { [val[0],val[1]] }
;

end

----inner

def parse(src)
  #puts "src is " + BLUE + src + RESET
  @src = src

  $invar = false
  $inobject = false
  $done = false

  begin
    do_parse
  rescue SyntaxError
    $stderr.print "Got a syntax error: #{$!}\n"
    exit
  end
end

def next_token
  ....
  if @src.length == 0
    puts "returning end"
    #return [false, 0]
    #return [false, '$']
    return [false, false]
  end
  ....
end

As you can see, I've tried returning different types of things, to no effect
(although the value of the $end token changes, the error still gets raised).
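
For what it's worth, my understanding of the contract is that next_token
hands racc one [token, value] pair per call and then a false (or nil)
token once the source is used up. Roughly this shape, sketched with a toy
scanner rather than my real one (the token names come from my grammar,
but the matching logic here is only illustrative):

def next_token
  return [false, false] if @src.empty?   # tell racc the input is finished
  case @src
  when /\A\s+/          # skip whitespace and look again
    @src = $'
    next_token
  when /\Adefine\b/
    @src = $'
    [:DEFINE, $&]
  when /\A\{/
    @src = $'
    [:LCURLY, $&]
  when /\A\}/
    @src = $'
    [:RCURLY, $&]
  else                  # treat any other run of non-space characters as a NAME
    @src =~ /\A\S+/
    @src = $'
    [:NAME, $&]
  end
end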

The really strange thing is that the parser I wrote earlier had the same
problem _unless_ I passed a file (not a string) to yylex. This seems to
imply that the EOF from the file somehow avoids this error. I'm not using
yylex in this case (as my tokens are quite easy), and I'd really like to
just understand what the problem is.

Any pointers would be greatly appreciated, but apparently pointing me to
further example code is not helpful, unless you can point out how this
sample code avoids this error. Again, I expect it's something small, but
it's small enough that I've missed it twice now, both times having written
the grammar from scratch.

Thanks,
Luke

--
First they came for the hackers. But I never did anything
illegal with my computer, so I didn't speak up.
Then they came for the pornographers. But I thought there was
too much smut on the Internet anyway, so I didn't speak up.
Then they came for the anonymous remailers. But a lot of nasty
stuff gets sent from anon.penet.fi, so I didn't speak up.
Then they came for the encryption users. But I could never
figure out how to work PGP anyway, so I didn't speak up.
Then they came for me. And by that time there was no one left
to speak up.
-- Alara Rogers, Aleph Press
 

Jim Freeze

Hello,

def next_token
  ....
  if @src.length == 0
    puts "returning end"
    #return [false, 0]
    #return [false, '$']
    return [false, false]
  end
  ....
end

I apologize for not being able to dig more into this, but for
my next_token method I have:

def next_token
  @q.shift
end

Are there two different ways to set up the tokenizing (i.e. the next_token
and parse methods)? I vaguely recall that there might be, but I can't
look it up right now.
 

Luke A. Kanies

Hello,

def next_token
  ....
  if @src.length == 0
    puts "returning end"
    #return [false, 0]
    #return [false, '$']
    return [false, false]
  end
  ....
end

I apologize for not being able to dig more into this, but for
my next_token method I have:

def next_token
  @q.shift
end

Are there two different ways to set up the tokenizing (i.e. the next_token
and parse methods)? I vaguely recall that there might be, but I can't
look it up right now.

Well, kind of, but we're both doing the same thing. Most of the examples
I've seen preparse the entire source string into an array, and then just
pop tokens off the stack. I'm using the 'next_token' routine to actually
collect the tokens and return them.

So what I'm doing is (theoretically) functionally equivalent, I'm just
doing the split-into-tokens inside next_token instead of inside parse,
which seems to make a bit more sense to me.

However, I'll try converting to pre-tokenizing the text into an array of tokens
and see what happens. I don't understand how that could solve the
problem, but that doesn't mean it won't.

Luke
 

Luke A. Kanies

What if you do: return ["", ""] ?

I still get a syntax error, but this time the apparently-magical token
'$end' is not used.

I know I'm supposed to return false as the token, and that somehow racc
converts that into the $end token. I just don't know how it does that,
nor do I know why racc doesn't then gracefully cease trying to parse,
rather than continuing on and hitting a syntax error.

I've tried looking through the source code, and I'm extremely confused. I
can find what I am pretty sure is all of the parsing stuff, but I've tried
adding puts statements to see what the heck is going on, and they never
get called. Or at least, I don't see their output. I even tried
deleting all references to external modules to make sure that I wasn't
loading an unmodified library, and that didn't seem to work.

So, I guess I'll continue trying to understand the code without knowing
how to turn on debugging (even though it appears to be there) and without
being able to add my own debug statements. Yay.

Thanks.

Luke
 

Jim Freeze

What if you do: return ["", ""] ?

I still get a syntax error, but this time the apparently-magical token
'$end' is not used.

I know I'm supposed to return false as the token, and that somehow racc
converts that into the $end token. I just don't know how it does that,
nor do I know why racc doesn't then gracefully cease trying to parse,
rather than continuing on and hitting a syntax error.

Well, in the sample code I see:

@q.push [false, '$'] # optional from 1.3.7

I have successfully left that out of my parsers.
Essentially, the samples just tokenize the file
(and store the tokens in @q) then call do_parse.

do_parse apparently gets tokens from the stack by
calling next_token, which returns the tokens one
at a time by calling @q.shift. This would suggest
that when you are done all you need to do is return
the same value as [].shift #=> nil.
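
In other words, the whole pattern in the samples is roughly this (a sketch
from memory with a toy one-token-kind scanner, not anything from your
grammar):

def parse(str)
  @q = []
  str.scan(/\S+/) do |word|     # toy tokenizer: every word becomes a WORD token
    @q.push [:WORD, word]
  end
  # @q.push [false, '$']        # optional end marker (from 1.3.7 on)
  do_parse
end

def next_token
  @q.shift                      # nil once the queue is empty, which ends the parse
end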
 

Luke A. Kanies

Okay, I may have actually tracked this down to my apparent ignorance of
ruby's regexes.

The following code does not behave as I expect at all:

string = "\nalias Jamie Dowdy\n"
string.sub!(/^./,"")

print "[#{string}]\n"

This code strips out the 'a' in 'alias'. In other words, the anchor '^'
is anchoring against the beginning of a line, rather than the beginning of
the string.

Not surprisingly, this, um, really screws up my pattern matching.

How do I specifically anchor against the beginning of a string in ruby,
_not_ the beginning of a line in a string?

Getting that fixed may solve my problem here (and with my other parser,
since I clearly expected '^' to anchor to the string and likely made the
same mistake there).

Thanks,
Luke
 

Ceri Storey

This code strips out the 'a' in 'alias'. In other words, the anchor '^'
is anchoring against the beginning of a line, rather than the beginning of
the string.

I had that problem a day or two ago...
How do I specifically anchor against the beginning of a string in ruby,
_not_ the beginning of a line in a string?

Use \A for the beginning, and \Z for the end. I believe.
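
If I remember right, a quick check shows the difference:

"foo\nbar" =~ /^bar/    # => 4    '^' matches at the start of any line
"foo\nbar" =~ /\Abar/   # => nil  '\A' only matches at the very start of the string
"foo\nbar" =~ /\Afoo/   # => 0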
 

Luke A. Kanies

Well, in the sample code I see:

@q.push [false, '$'] # optional from 1.3.7

Yep, I've tried that, along with about nine other variations of having an
end token.
I have successfully left that out of my parsers.
Essentially, the samples just tokenize the file
(and store the tokens in @q) then call do_parse.

do_parse apparently gets tokens from the stack by
calling next_token, which returns the tokens one
at a time by calling @q.shift. This would suggest
that when you are done all you need to do is return
the same value as [].shift #=> nil.

Yeah, all the code does it that way, but there shouldn't be a functional
difference between collecting the tokens in parse() and handing them out
from next_token(), versus collecting and returning them directly in
next_token(). Either way, I've switched to the approach the examples use,
I've corrected my regex problems, and I still get an error.

At this point I think it's a problem with my grammar, that I'm somehow not
correctly specifying the end of the parsing. Obviously, though, I don't
know how to say "hey, the file is over, stop looking" or whatever the
magic words are. I _know_ that the false token is supposed to do that,
but for some reason racc thinks it shouldn't be expecting that token yet.

In case anyone feels like pointing out my idiocy, here's my grammar as it
stands now:

token DEFINE NAME STRING PARAM LCURLY RCURLY VALUE RETURN COMMENT
INLINECOMMENT EOF

rule
file: objects EOF
;

objects: object { [val[0]] }
| objects object { [val[0], val[1]].flatten }
;

object: DEFINE NAME LCURLY RETURN vars RCURLY returns {
Cricket::Object.create(val[1],val[3]) }
;

vars: var { [val[0]] }
| vars var { [val[0], val[1]].flatten }
;

var: PARAM VALUE returns { [val[0],val[1]] }
;

returns: return
| returns return
;

return: comment RETURN
;

comment: # nothing
| COMMENT
| INLINECOMMENT
;

end

It's for parsing text like this:

# a comment
define contact {
contact_name vwf1607 ; inline comment
alias Lawrence Hubenak
host_notification_period none
host_notification_commands host-notify-by-email
service_notification_period none
service_notification_commands notify-by-email
email (e-mail address removed)
pager (e-mail address removed)
}

I.e., nagios configs.

Well, I guess I'll figure it out eventually; I was just hoping not to get
much past the 8 or so hours I've already wasted on it.

Thanks,
Luke
 

Jim Freeze

Well, in the sample code I see:

@q.push [false, '$'] # optional from 1.3.7

Yep, I've tried that, along with about nine other variations of having an
end token.
I have successfully left that out of my parsers.
Essentially, the samples just tokenize the file
(and store the tokens in @q) then call do_parse.

do_parse apparently gets tokens from the stack by
calling next_token, which returns the tokens one
at a time by calling @q.shift. This would suggest
that when you are done all you need to do is return
the same value as [].shift #=> nil.

Yeah, all the code does it that way, but there shouldn't be a functional
difference between collecting the tokens in parse() and handing them out
from next_token(), versus collecting and returning them directly in
next_token(). Either way, I've switched to the approach the examples use,
I've corrected my regex problems, and I still get an error.

I agree. Could be a grammar/file syntax mismatch.
At this point I think it's a problem with my grammar, that I'm somehow not
correctly specifying the end of the parsing. Obviously, though, I don't
know how to say "hey, the file is over, stop looking" or whatever the
magic words are. I _know_ that the false token is supposed to do that,
but for some reason racc thinks it shouldn't be expecting that token yet.

In case anyone feels like pointing out my idiocy, here's my grammar as it
stands now:

Ok, I'll take a look. At your grammar that is, not your idiocy. :)
 

Simon Strandgaard

I had that problem a day or two ago...


Use \A for the beginning, and \Z for the end. I believe.


Use '\z' (lowercase) if you want to match the very end of the string;
'\Z' will also match just before a trailing newline.

server> irb
irb(main):001:0> /x\Z/.match("ax\n").to_a
=> ["x"]
irb(main):002:0> /x\z/.match("ax\n").to_a
=> []
irb(main):003:0> /x\z/.match("ax").to_a
=> ["x"]
irb(main):004:0>
 

Jim Freeze


On Friday, 12 December 2003 at 3:55:31 +0900, Luke A. Kanies wrote:

Hi Luke

I took your code and reproduced the $end problem.
I got rid of it by allowing an optional return
after your define { } block.

See code attached:


--
Jim Freeze
----------
Bubble Memory, n.:
A derogatory term, usually referring to a person's
intelligence. See also "vacuum tube".

[attachment: gram.y]


class CricketParser
token DEFINE NAME STRING PARAM VALUE RETURN COMMENT WORD
#INLINECOMMENT EOF

rule

start : objects
  ;

objects: object { [val[0]] }
  | objects object { [val[0], val[1]].flatten }
  ;

#object: DEFINE NAME '{' RETURN
object: DEFINE WORD '{' RETURN
    vars
    '}' optreturns {
      Cricket.create(val[1], val[3])
    }
  ;

vars: var { [val[0]] }
  | vars var { [val[0], val[1]].flatten }
  ;

#var: PARAM VALUE returns { [val[0],val[1]] }
var: WORD WORD returns { [val[0],val[1]] }
  ;

optreturns : /*none*/
  | returns
  ;

returns: return
  | returns return
  ;

return: comment RETURN
  ;

comment: # nothing
  | COMMENT
  ;
# | INLINECOMMENT

end

---- inner


def parse(str)
  str.strip!
  str.gsub!(/[\r\f]/, "")
  @orig_lines = str.dup.split("\n")
  @line_no = 1

  @q = []
  until str.empty? do
    case str

    #
    # Remove comments. Leaves \n
    #
    when /\A[ \t]*[#;].*$/
      str = $'

    #
    # Remove white space
    #
    when /\A[ \t]+/o
      #@q.push [:SPACE, " "]
      str = $'

    #
    # Tokenize new lines. Consolidate multiple NL into one.
    #
    when /\A\n+/o
      @line_no += $&.count("\n")
      unless @q.empty?
        @q.push [:RETURN, "\n"] unless [:RETURN].include?(@q.last[0])
      end
      str = $'

    # keywords
    when /\A(define)\b/o
      id = $&.upcase.intern
      @q << [id, id]
      str = $'

    # WORD
    when /\A[a-zA-Z0-9_.@\-]+/o
      @q.push [:WORD, $&]
      str = $'

    # One character tokens
    else
      c = str[0,1]
      @q.push [c, c]
      str = str[1..-1]
    end#case

    puts "@q: #{@q.inspect}"

  end#until

  puts "@q: #{@q.inspect}" if $DEBUG
  do_parse
end

def next_token
  @q.shift
end


---- footer

if $0 == __FILE__ then
  file = <<-EOT
# a comment
define contact {
contact_name vwf1607 ; inline comment
alias Lawrence-Hubenak
host_notification_period none
host_notification_commands host-notify-by-email
service_notification_period none
service_notification_commands notify-by-email
email (e-mail address removed)
pager (e-mail address removed)
}

  EOT

  class Cricket
    def initialize(a,b)
      @a,@b = a,b
    end
    def self.create(a,b)
      Cricket.new(a,b)
    end
  end#class Cricket

  puts 'parsing:'
  print file
  puts
  puts 'result:'
  p CricketParser.new.parse( file )
end



 

Luke A. Kanies

On Friday, 12 December 2003 at 3:55:31 +0900, Luke A. Kanies wrote:

Hi Luke

I took your code and reproduced the $end problem.
I got rid of it by allowing an optional return
after your define { } block.

Um, wow, thank you!

I finally figured out how to turn on debugging in racc (yep, a hack: I
had to use -E and then edit the generated parser.rb file manually, setting
@yydebug = true), and that let me see that my grammar didn't do anything
like I expected.

So, I ended up basically rewriting the grammar itself. In doing so, I was
finally able to avoid the $end error. In other words, it was definitely a
grammar problem, and I probably would have caught it much sooner if I had
figured out earlier how to turn debugging on. The silly thing is, I know
there's an API for turning it on, but I haven't been able to extract it
yet.

The problem here is that I have used perl's Parse::Yapp, which behaves
quite differently in many ways, and most especially in how it deals with
syntax errors. It is totally my fault, because I was unconsciously
expecting racc to behave a certain way, and when it didn't I got very
confused. With the advent of debugging, I figured it out relatively
quickly.

As a side note, the grammar actions must set the value of 'result'. For
some reason, racc does not use the action block's return value; instead
you assign to 'result' and that becomes the value of the rule. This also
caused a bunch of problems for me.

To summarize:

There is debugging in racc, and using racc -E to embed the parser and then
manually setting @yydebug = true can turn it on. I'm sure there's a
better way.

Also, you must assign to 'result' manually, although it can hold any type
of value.
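
For example, with the default setup an action ends up looking something
like this (illustrative only, based on my first grammar, not my exact
current one):

object: DEFINE NAME LCURLY vars RCURLY {
          # whatever is assigned to 'result' becomes the value of this
          # rule; the block's own return value is ignored by default
          result = Cricket::Object.create(val[1], val[3])
        }
;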

Thanks for all your help, Jim.

Luke
 

Minero Aoki

Hi,

In mail "Re: problems with racc: $end token"
Luke A. Kanies said:
There is debugging in racc, and using racc -E to embed the parser and then
manually setting @yydebug = true can turn it on. I'm sure there's a
better way.

Set @yydebug=true in your "inner" and use racc -g.

% cat t.y
class MyParser
  options no_result_var
rule
  program: list
  list : { [] }
       | list ITEM { val[0].push val[1]; val[0] }

---- inner
  def parse
    @tokens = [
      [:ITEM, '1'],
      [:ITEM, '2'],
      [:ITEM, '3']
    ]
    @yydebug = true #####
    do_parse
  end
  def next_token
    @tokens.shift
  end

---- footer
p MyParser.new.parse

~/tmp % racc -ot.rb t.y
~/tmp % ruby t.rb
["1", "2", "3"]

~/tmp % racc -g -ot.rb t.y
~/tmp % ruby t.rb
reduce <none> --> list
[ (list []) ]

goto 2
[ 0 2 ]

read :ITEM(ITEM) "1"

shift ITEM

(snip)

Also, you must assign to 'result' manually, although it can hold any type
of value.

Try this:

class MyParser
  options no_result_var #### this line
rule
  ....


Regards,
Minero Aoki
 
