Writing parsers?

Discussion in 'Ruby' started by Paatsch, Bernd, Feb 13, 2006.

  1. ------_=_NextPart_001_01C630F9.78EE12D1
    Content-Type: text/plain

    Hello,

    I got this great task assigned to write a parser and looking at the files to
    parse is not very trivial. Does anybody know where to find a website that
    would explain steps and pitfalls to avoid writing a parser?
    Any suggestion/help in is appreciated.

    Thanks,
    Bernd

    ------_=_NextPart_001_01C630F9.78EE12D1--
    Paatsch, Bernd, Feb 13, 2006
    #1
    1. Advertising

  2. D=C5=88a Utorok 14 Febru=C3=A1r 2006 00:59 Paatsch, Bernd nap=C3=ADsal:
    > Hello,
    >
    > I got this great task assigned to write a parser and looking at the files
    > to parse is not very trivial. Does anybody know where to find a website
    > that would explain steps and pitfalls to avoid writing a parser?
    > Any suggestion/help in is appreciated.
    >
    > Thanks,
    > Bernd


    http://epaperpress.com/lexandyacc/ seems to be a useful resource, provided =
    you=20
    already know some theory behind formal grammars and such. The two tools and=
    =20
    their derivatives are pretty much the open source standard for writing=20
    parsers. I believe there are Ruby bindings / variants of both.

    ANTLR is also somewhat used, but you're probably looking at Java there.

    David Vallner
    David Vallner, Feb 14, 2006
    #2
    1. Advertising

  3. Paatsch, Bernd

    Doug H Guest

    Doug H, Feb 14, 2006
    #3
  4. I just whipped this up in a bit of free time. It may be a decent
    starting point for a pure ruby parser. Note that there is no lookahead
    ability.

    class ParseError < StandardError; end

    class Parser

    @@reductions = {}
    @@reduction_procs = {}
    @@tokens = {}
    @@token_values = {}

    # Parse either a string or an IO object (read all at once) using the
    rules defined for this parser.
    def parse(input)
    stack = []
    value_stack = []
    text = input.is_a?(IO) ? input.read : input.dup
    loop do
    token, value = retrieve_token(text)
    stack << token
    value_stack << value
    reduce_stack(stack, value_stack)
    if text.length == 0
    if stack.length == 1
    return stack[0], value_stack[0]
    else
    raise ParseError, 'Stack failed to reduce'
    end
    end
    end
    end
    protected

    # Retrieve a single token from the input text and return an array of
    it and its value.
    def retrieve_token(text)
    @@tokens.each do |regexp, token|
    if md = text.match(regexp)
    text.gsub!(regexp, '')
    return [token, @@token_values[token] ?
    @@token_values[token].call(md.to_s) : nil]
    end
    end
    raise ParseError, "Invalid token in input near #{text}"
    end

    # Compare the stack to reduction rules to reduce any matches found
    def reduce_stack(stack, value_stack)
    loop do
    matched = false
    @@reductions.each do |tokens, result|
    if tokens == stack[stack.length - tokens.length, tokens.length]
    start_pos = stack.length - tokens.length
    stack[start_pos, tokens.length] = result
    value_stack[start_pos, tokens.length] =
    @@reduction_procs[tokens] ?
    @@reduction_procs[tokens].call(value_stack[start_pos, tokens.length]) :
    nil
    matched = true
    break
    end
    end
    return unless matched
    end
    end

    def self.token(regexp, token, &block)
    @@tokens[Regexp.new('\A' + regexp.to_s)] = token
    @@token_values[token] = block
    end

    def self.rule(*tokens, &block)
    final = tokens.pop
    tokens += final.keys
    result = final.values.first
    @@reductions[tokens] = result
    @@reduction_procs[tokens] = block
    end
    end

    class TestParser < Parser
    token /foo/i, :foo do |s|
    s.upcase
    end
    token /bar/i, :bar do |s|
    s.downcase
    end
    token /mega/i, :mega do |s|
    3
    end
    rule :foo, :bar => :foobar do |foo, bar|
    foo + bar
    end
    rule :mega, :foobar => :megafoobar do |mega, foobar|
    foobar * mega
    end
    end
    Timothy Goddard, Feb 14, 2006
    #4
  5. Robert Klemme, Feb 14, 2006
    #5
  6. Paatsch, Bernd

    Phil Tomson Guest

    In article <>,
    Timothy Goddard <> wrote:
    >I just whipped this up in a bit of free time. It may be a decent
    >starting point for a pure ruby parser. Note that there is no lookahead
    >ability.
    >
    >class ParseError < StandardError; end
    >
    >class Parser
    >
    > @@reductions = {}
    > @@reduction_procs = {}
    > @@tokens = {}
    > @@token_values = {}
    >
    > # Parse either a string or an IO object (read all at once) using the
    >rules defined for this parser.
    > def parse(input)
    > stack = []
    > value_stack = []
    > text = input.is_a?(IO) ? input.read : input.dup
    > loop do
    > token, value = retrieve_token(text)
    > stack << token
    > value_stack << value
    > reduce_stack(stack, value_stack)
    > if text.length == 0
    > if stack.length == 1
    > return stack[0], value_stack[0]
    > else
    > raise ParseError, 'Stack failed to reduce'
    > end
    > end
    > end
    > end
    > protected
    >
    > # Retrieve a single token from the input text and return an array of
    >it and its value.
    > def retrieve_token(text)
    > @@tokens.each do |regexp, token|
    > if md = text.match(regexp)
    > text.gsub!(regexp, '')
    > return [token, @@token_values[token] ?
    >@@token_values[token].call(md.to_s) : nil]
    > end
    > end
    > raise ParseError, "Invalid token in input near #{text}"
    > end
    >
    > # Compare the stack to reduction rules to reduce any matches found
    > def reduce_stack(stack, value_stack)
    > loop do
    > matched = false
    > @@reductions.each do |tokens, result|
    > if tokens == stack[stack.length - tokens.length, tokens.length]
    > start_pos = stack.length - tokens.length
    > stack[start_pos, tokens.length] = result
    > value_stack[start_pos, tokens.length] =
    >@@reduction_procs[tokens] ?
    >@@reduction_procs[tokens].call(value_stack[start_pos, tokens.length]) :
    >nil
    > matched = true
    > break
    > end
    > end
    > return unless matched
    > end
    > end
    >
    > def self.token(regexp, token, &block)
    > @@tokens[Regexp.new('\A' + regexp.to_s)] = token
    > @@token_values[token] = block
    > end
    >
    > def self.rule(*tokens, &block)
    > final = tokens.pop
    > tokens += final.keys
    > result = final.values.first
    > @@reductions[tokens] = result
    > @@reduction_procs[tokens] = block
    > end
    >end
    >
    >class TestParser < Parser
    > token /foo/i, :foo do |s|
    > s.upcase
    > end
    > token /bar/i, :bar do |s|
    > s.downcase
    > end
    > token /mega/i, :mega do |s|
    > 3
    > end
    > rule :foo, :bar => :foobar do |foo, bar|
    > foo + bar
    > end
    > rule :mega, :foobar => :megafoobar do |mega, foobar|
    > foobar * mega
    > end
    >end
    >


    This is a bit like Grammar:
    http://grammar.rubyforge.org/0.5/

    Phil
    Phil Tomson, Feb 15, 2006
    #6
  7. Grammar looks much more similar to Spirit, a C++ parser which looks
    really simple to use. It uses a very simple domain-specific language
    for writing grammars in C++ code. It's part of the boost libraries. It
    would be my first choice for a medium-speed parser that could be used
    quite easily from Ruby with just a few joining bits of C. Parsers in
    the style of YACC or Bison are much faster again, but the added
    complexity of defiing grammar probably makes using it a premature
    optimisation for most tasks.
    Timothy Goddard, Feb 15, 2006
    #7
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. dede
    Replies:
    0
    Views:
    396
  2. Marcos

    Locator2 and SAX parsers

    Marcos, Jul 15, 2003, in forum: Java
    Replies:
    0
    Views:
    355
    Marcos
    Jul 15, 2003
  3. Nathaniel Hughes

    Cloning Pull Parsers? Particularly XPP3

    Nathaniel Hughes, Feb 2, 2004, in forum: Java
    Replies:
    0
    Views:
    331
    Nathaniel Hughes
    Feb 2, 2004
  4. Gibakou

    javax.xml.Parsers location

    Gibakou, Aug 16, 2004, in forum: Java
    Replies:
    4
    Views:
    909
    gizzy truong
    Nov 23, 2004
  5. Clint Olsen

    Writing fast(er) performing parsers in Perl

    Clint Olsen, Apr 19, 2004, in forum: Perl Misc
    Replies:
    8
    Views:
    216
    Clint Olsen
    Apr 21, 2004
Loading...

Share This Page