[SUMMARY] Parsing JSON (#155)

Discussion in 'Ruby' started by Ruby Quiz, Feb 7, 2008.

  1. Ruby Quiz

    Ruby Quiz Guest

    We saw a large variety of solutions for this week's problem. Many of them used
    a parser generator to construct their parser. You do that by defining a grammar
    that describes the syntax you need to read. The parser generator then
    translates your grammar into parsing code that will match the described syntax.
    Of the generators used, Treetop was definitely the most popular and is surely
    worth a look if you want to do some grammar based parsing.

    I'm not going to show a grammar based solution here, not because I don't like
    them, but because I want to show a few of the ideas behind simple parsing. To
    do that, we will need to examine a hand-rolled solution. Just to be clear
    though, using grammar based parsers can often be a more robust choice if you can
    find an official grammar for the content you need to parse.

    Eric Mahurin sent in a hand-rolled solution that has quite a few advantages.
    First, it is trivial to adapt so that the entire content to be parsed doesn't
    need to be read into memory. It's also some very efficient code.
    Unfortunately, that makes it a touch less obvious if you aren't already familiar
    with parsing techniques.

    Given that, I'm going to show my own hand-rolled recursive descent parser. It's
    not as cool as Eric's but it's a little easier to take in. We will say it's a
    good introduction to Eric's code, which you should be able to follow after I
    break this down.

    Here's the beginning of my parser:

    #!/usr/bin/env ruby -wKU

    require "strscan"

    class JSONParser
      AST = Struct.new(:value)

      def parse(input)
        @input = StringScanner.new(input)
        parse_value.value
      ensure
        @input.eos? or error("Unexpected data")
      end

      private

      # ...

    One of the first concerns when parsing is the need to manage where you currently
    are in the input. If you treat the input as an IO object, you can read input
    piece by piece and the IO object itself naturally keeps track of where you are.
    For String input though, it's often easier to use Ruby's standard StringScanner
    class. It wraps the input and allows you to test regular expression matches
    against the head of that input. It tells you whether each match succeeds,
    and when one does, your position automatically advances past the match.
    You can see that I set this up in the code above.
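
    If you haven't used StringScanner before, here's a quick illustration of
    that behavior (a throwaway example, not part of the parser):

      require "strscan"

      scanner = StringScanner.new('{"key": 42}')
      scanner.scan(/\{/)    # => "{"   (a match advances the position)
      scanner.scan(/\d+/)   # => nil   (no digits at the current position)
      scanner.scan(/"key"/) # => "\"key\""
      scanner.pos           # => 6     (just past the matched text)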

    The only public method for my class is parse(). It prepares the StringScanner
    as I just described, tries to parse a JSON value out of the input, then makes
    sure we consumed all of the input. Note that my use of ensure here isn't very
    standard. I'm just using it to run some code at the end of the method without
    changing the return value of the method.
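
    Just to illustrate that (a throwaway example, not part of the parser):

      def demo
        :return_value
      ensure
        :ignored # the value of an ensure clause never replaces the return value
      end

      demo # => :return_value

    If the ensure clause raises, though, the exception does take over, which is
    exactly what lets the eos? check above abort the parse.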

    The AST definition also merits a bit of discussion. It would have been nice to
    just have each method return the Ruby objects for the JSON it parsed. However,
    false and nil (null in JSON) are legal JSON parses in some places. If I return
    those, I won't be able to tell if my parse succeeded or failed. To get around
    that, all parsed JSON values are wrapped in a trivial abstract syntax tree
    object. The returned object is always true and, after I've verified that the
    parse worked, it's just one more method call to retrieve the actual value it
    parsed.
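
    Given the AST struct above, the idea looks like this (illustrative values):

      ast = AST.new(nil)          # wrapping a parsed JSON null
      ast ? "parsed" : "no parse" # => "parsed" (the wrapper is always true)
      ast.value                   # => nil (the actual parsed value)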

    It's worth mentioning that this parser is based on the not-quite-correct
    definition of JSON I put forth in the quiz tests. Under the official spec,
    only objects and arrays are allowed as top-level JSON values. An easy fix is
    to replace this line

    # ...

    parse_value.value

    # ...

    with:

    # ...

    if top_level = parse_object || parse_array
      top_level.value
    else
      error("Illegal top-level JSON object")
    end

    # ...

    Now let's look at the main parser:

    # ...

    def parse_value
      trim_space
      parse_object or
        parse_array or
        parse_string or
        parse_number or
        parse_keyword or
        error("Illegal JSON value")
    ensure
      trim_space
    end

    # ...

    This method really sums up the basic strategy of recursive descent parsing. At
    each point, try to read out one of the legal values that can occur there. You
    can do that just by drilling down into more specialized methods that know how to
    read that one thing. If at any time you can't read a legal value, you have an
    error.
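
    To see the descent in action, here's roughly how the calls play out for the
    input [1, "two"] (an illustrative trace, not program output):

      parse_value
        parse_object -> false (no "{" at the front)
        parse_array  -> sees "[", then loops:
          parse_value -> parse_number -> AST(1)
          parse_value -> parse_string -> AST("two")
        -> AST([1, "two"])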

    Let's dig into the specialized parsers a bit more to see how this works:

    # ...

    def parse_object
      if @input.scan(/\{\s*/)
        object = Hash.new
        more_pairs = false
        while key = parse_string
          @input.scan(/\s*:\s*/) or error("Expecting object separator")
          object[key.value] = parse_value.value
          more_pairs = @input.scan(/\s*,\s*/) or break
        end
        error("Missing object pair") if more_pairs
        @input.scan(/\s*\}/) or error("Unclosed object")
        AST.new(object)
      else
        false
      end
    end

    # ...

    This code reads JSON objects. It's pretty linear, so let's digest it in order.
    First, we have to have an opening brace or we don't have an object at all. We
    can see here that I try a regular expression on the StringScanner to see if that
    is indeed what's up next. If it is, scan() returns the matched text (a true
    value) and @input advances past that brace for our future matches. If not,
    scan() returns nil, we can't read an object, and the whole attempt is a bust.

    When we know we're inside an object, we create the Ruby equivalent (a Hash),
    fill it with all of the string/value pairs we can read, then make sure we find a
    closing brace. Reading the pairs is the trickiest part because we have to match
    a string, followed by a colon, and finally a value. Then, if we find a comma,
    we know another pair is expected. If not, we've read the whole object. Note
    that I verify these assumptions at every step and toss error()s if any of them
    fail. For parsing strings and values, we just reuse the parse_string() method
    we first saw called in parse_value() and parse_value() itself.

    You can see that I'm constantly trimming space around the JSON syntax. I could
    have also done that with repeated calls to the trim_space() helper we saw used
    in parse_value(), but that fattens up the code a bit and slows things down with
    more tests. For these simple cases, I opted for the shortcut.
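
    To make that concrete, here's what the pair-reading loop might look like
    with explicit trim_space() calls instead of the inlined \s* (a hypothetical
    rewrite, not the code I used):

      while key = parse_string
        trim_space
        @input.scan(/:/) or error("Expecting object separator")
        trim_space
        object[key.value] = parse_value.value
        more_pairs = @input.scan(/,/) or break
        trim_space
      end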

    Having deciphered parse_object(), parse_array() is trivial:

    # ...

    def parse_array
      if @input.scan(/\[\s*/)
        array = Array.new
        more_values = false
        while contents = parse_value rescue nil
          array << contents.value
          more_values = @input.scan(/\s*,\s*/) or break
        end
        error("Missing value") if more_values
        @input.scan(/\s*\]/) or error("Unclosed array")
        AST.new(array)
      else
        false
      end
    end

    # ...

    This is identical to the process we just examined save that it's pulling
    individual values in the middle instead of string/value pairs. This simplifies
    the code a bit, as you can see. We also throw these objects into a Ruby Array
    instead of a Hash.

    Now we are ready for parse_string() and it has a couple of helpers:

    # ...

    def parse_string
      if @input.scan(/"/)
        string = String.new
        while contents = parse_string_content || parse_string_escape
          string << contents.value
        end
        @input.scan(/"/) or error("Unclosed string")
        AST.new(string)
      else
        false
      end
    end

    def parse_string_content
      @input.scan(/[^\\"]+/) and AST.new(@input.matched)
    end

    def parse_string_escape
      if @input.scan(%r{\\["\\/]})
        AST.new(@input.matched[-1])
      elsif @input.scan(/\\[bfnrt]/)
        AST.new(eval(%Q{"#{@input.matched}"}))
      elsif @input.scan(/\\u[0-9a-fA-F]{4}/)
        AST.new([Integer("0x#{@input.matched[2..-1]}")].pack("U"))
      else
        false
      end
    end

    # ...

    Whenever a structure you need to read gets more complicated, you want to break
    it down into smaller parsers that read more specialized pieces. Some would
    probably have broken the string/value pairs out of parse_object() (into a
    parse_object_pair()), but you don't gain much from that and the code was
    simple enough that I opted for the easier approach. Here, though, we need to
    handle content and escapes differently, and there may be any combination of
    them in any order inside the string. Now the gain is worth it.

    Content is easy enough to handle, since we can pass it through unaltered. It's
    already content in a Ruby String object. Escapes we have to work on a little
    more. Some we just unescape, but others need to be converted. I used pack() to
    handle Unicode, but you can see that I was lazy and used eval() on the special
    string escapes. All of these have the same meaning in Ruby and thanks to the
    match I know it's safe to eval() the contents without worrying about embedded
    Ruby nastiness.
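
    For instance (illustrative values):

      [Integer("0x266b")].pack("U") # => "♫" (the character for code point U+266B)
      eval(%Q{"\\n"})               # => "\n" (the escape means the same in Ruby)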

    With those defined, parse_string() is similar to parse_array(). Find the start
    of a JSON string, pull content and escapes as long as we can, then find the end
    of the string.

    The last two parsers are the easiest of all:

    # ...

    def parse_number
      @input.scan(/-?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+-]?\d+)?\b/) and
        AST.new(eval(@input.matched))
    end

    def parse_keyword
      @input.scan(/\b(?:true|false|null)\b/) and
        AST.new(eval(@input.matched.sub("null", "nil")))
    end

    # ...

    These are just match and eval(), as you can plainly see. Again, the eval()s
    are safe because the match ensures we aren't facing any dangerous content.

    Some feel that using regular expressions like this isn't true parsing. We
    certainly could have chopped the number rule down into a bunch of smaller rules.
    However, the number definition is squarely in the domain of what regular
    expressions do well and I'm more of a practical kind of guy. I have access to
    regular expressions with this setup, the needed expression isn't really all that
    complex, and I like easy jobs. Thus I use them.
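
    If you would rather avoid eval() even so, the same two parsers can be
    written with explicit conversions. A sketch of that variant (my alternative,
    not the solution shown above):

      def parse_number
        @input.scan(/-?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+-]?\d+)?\b/) and
          AST.new(@input.matched =~ /[.eE]/ ? Float(@input.matched) :
                                              Integer(@input.matched))
      end

      def parse_keyword
        @input.scan(/\b(?:true|false|null)\b/) and
          AST.new({ "true" => true, "false" => false, "null" => nil }[@input.matched])
      end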

    All that is left are the two helpers I used, though the implementations won't be
    any surprise:

    # ...

      def trim_space
        @input.scan(/\s+/)
      end

      def error(message)
        if @input.eos?
          raise "Unexpected end of input."
        else
          raise "#{message}: #{@input.peek(@input.string.length)}"
        end
      end
    end # class JSONParser

    First, trim_space() just tries a match to advance us past any whitespace. It
    may fail, because there wasn't any whitespace to skip, but that doesn't
    affect us any. Either way, we know we aren't about to read whitespace after
    it has been called.

    My error() wrapper just raise()s exceptions. It adds the content left to parse
    so you can see where you had trouble or replaces the message altogether to warn
    you that all of the content was exhausted.

    That's all it takes to build a simple JSON parser. Take some time to go look
    through the other hand-rolled solutions now and I bet you'll be surprised by
    how similarly they work. Then you can look into grammars and how they
    simplify the process of defining new parsers.
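
    To see the parser in action (a hypothetical session):

      parser = JSONParser.new
      parser.parse('{"name": "Ruby Quiz", "id": 155, "ok": [true, null]}')
      # => {"name"=>"Ruby Quiz", "id"=>155, "ok"=>[true, nil]}

      parser.parse('{"unclosed": true') # raises "Unexpected end of input."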

    The final Ruby Quiz will take us back into the world of finance...
    Ruby Quiz, Feb 7, 2008
    #1

  2. Ruby Quiz

    Eric Mahurin Guest


    On Feb 7, 2008 8:08 AM, Ruby Quiz <> wrote:

    > We saw a large variety of solutions for this week's problem. Many of them
    > used
    > a parser generator to construct their parser. You do that by defining a
    > grammar
    > that describes the syntax you need to read. The parser generator then
    > translates your grammar into parsing code that will match the described
    > syntax.
    > Of the generators used, Treetop was definitely the most popular and is
    > surely
    > worth a look if you want to do some grammar based parsing.



    But not nearly the fastest... My very old Grammar class already generates
    parsers 5-10X faster than treetop and my next release will give another
    5-10X plus using ruby2cext will give another 5-10X (generates code that
    optimizes well with ruby2cext). Only a pure-C parser would beat it. I've
    had this stuff sitting around locally for too long and need to release.

    ... i'll get off my (biased) soapbox now ...

    <snip>

    > Eric Mahurin sent in a hand-rolled solution that has quite a few advantages.
    > First, it is trivial to adapt so that the entire content to be parsed
    > doesn't
    > need to be read into memory. It's also some very efficient code.



    Computationally, I believe this is the simplest (most efficient) type -
    LL(1). It parses "L"eft-to-right and derives the "L"eftmost part of a
    grammar using (1) token/character. When the token is a character, it becomes
    very easy to parse a file (you just need a variable to hold the next
    character in the file, previously read with IO.getc).
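
    For example, a minimal sketch of that one-character lookahead (a
    hypothetical helper, not my Grammar code):

      class CharStream
        def initialize(io)
          @io = io
          @lookahead = @io.getc # always holds the next character
        end

        attr_reader :lookahead # peek without consuming

        def next_char
          current = @lookahead
          @lookahead = @io.getc # refill from the file
          current
        end
      end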


    > Unfortunately, that makes it a touch less obvious if you aren't already
    > familiar
    > with parsing techniques.



    Yep. When rolling by hand, Regexp makes it a bit easier to do recursive
    descent parsers since you can easily do more than just a character at a
    time. If no Regexp has unlimited match length, you'll have an LL(k) parser
    where k is that max match length. You could adapt to reading from a file by
    keeping a string buffer of the next (k) characters in the file. If you have
    a Regexp that can have unlimited match length, I think you might call that
    an LL(*) parser. You'll have more problems parsing from a file in this
    case, unless possibly you can keep all matches contained in a line
    (IO.gets would act as a mini-parser looking for newlines).

    Look here if you want to see more about LL parsers:

    http://en.wikipedia.org/wiki/LL_parser


    > The AST definition also merits a bit of discussion. It would have been
    > nice to
    > just have each method return the Ruby objects for the JSON it parsed.
    > However,
    > false and nil (null in JSON) are legal JSON parses in some places. If I
    > return
    > those, I won't be able to tell if my parse succeeded or failed. To get
    > around
    > that, all parsed JSON values are wrapped in a trivial abstract syntax tree
    > object. Returning this object is always true and, after I've verified
    > that the
    > parse worked, it's just one more method call to retrieve the actual value
    > it
    > parsed.



    The approach I like to take is to have the caller pass the AST to be modified.
    I just use a sequence (Array or even String) to represent the sequence of
    branches at that level. Then each grammar method can put none to an
    arbitrary number of branches in the AST. There is much more flexibility.
    The grammar method just returns true or false for whether it matches
    independent of the AST. Some parsers don't even need an AST (or a sparse
    one) since things might be done on the fly as parsing occurs. Passing the
    AST separately works in all these cases.
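
    A toy sketch of that style (illustrative only, not my actual Grammar API):

      require "strscan"

      # Each rule returns true/false and appends zero or more
      # branches to the sequence (Array) it is given.
      def digits(scanner, ast)
        scanner.scan(/\d+/) or return false
        ast << scanner.matched.to_i
        true
      end

      def digit_list(scanner, ast)
        branches = []
        return false unless digits(scanner, branches)
        while scanner.scan(/,/)
          return false unless digits(scanner, branches)
        end
        ast << branches # this rule decides what lands in the caller's AST
        true
      end

      ast = []
      digit_list(StringScanner.new("1,2,3"), ast) # => true
      ast                                         # => [[1, 2, 3]]
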
    Eric Mahurin, Feb 8, 2008
    #2

  3. Ruby Quiz

    James Gray Guest

    On Feb 8, 2008, at 12:27 PM, Eric Mahurin wrote:

    > On Feb 7, 2008 8:08 AM, Ruby Quiz <> wrote:
    >
    >> We saw a large variety of solutions for this week's problem. Many
    >> of them
    >> used
    >> a parser generator to construct their parser. You do that by
    >> defining a
    >> grammar
    >> that describes the syntax you need to read. The parser generator
    >> then
    >> translates your grammar into parsing code that will match the
    >> described
    >> syntax.
    >> Of the generators used, Treetop was definitely the most popular and
    >> is
    >> surely
    >> worth a look if you want to do some grammar based parsing.

    >
    >
    > But not nearly the fastest...


    That's true, but I think the need for raw speed in parsers is seldom
    the top priority. Large XML can often cause speed issues, but outside
    of that I don't personally need many super fast parsers. Maybe I just
    don't run into those problems a lot though.

    James Edward Gray II
    James Gray, Feb 8, 2008
    #3
  4. James Gray wrote:
    > On Feb 8, 2008, at 12:27 PM, Eric Mahurin wrote:
    >>> Treetop was definitely the most popular

    >> But not nearly the fastest...

    > That's true, but I think the need for raw speed in parsers is seldom
    > the top priority.


    I was frankly amazed how much of the discussion was about speed.
    Besides, there's an awful lot of useful languages that you can't
    do with LL(1). Including Ruby and C...

    That said, Treetop is very slow, and we need to improve that.
    Fortunately, it's not very hard to.
    Clifford Heath, Feb 8, 2008
    #4
  5. Ruby Quiz

    Eric Mahurin Guest


    On Feb 8, 2008 5:59 PM, Clifford Heath <> wrote:

    > James Gray wrote:
    > > On Feb 8, 2008, at 12:27 PM, Eric Mahurin wrote:
    > >>> Treetop was definitely the most popular
    > >> But not nearly the fastest...

    > > That's true, but I think the need for raw speed in parsers is seldom
    > > the top priority.

    >
    > I was frankly amazed how much of the discussion was about speed.
    > Besides, there's an awful lot of useful languages that you can't
    > do with LL(1). Including Ruby and C...



    There are techniques to convert LL(k) and LR(k) grammars to LL(1) and LR(1)
    (factoring, left-recursion removal), but they can make the grammar grow large
    and can make the actions difficult. C shouldn't be an issue. Ruby is
    definitely a beast to parse. Starting with the YACC spec, (almost) any
    LL/recursive-descent/packrat parser will have a hard time dealing with the
    left-recursion. Not to mention all of the lexer states. For many
    LL/recursive-descent/packrat tools, starting from scratch is probably a
    better choice than trying to handle LR grammars (with left-recursion)
    directly. That being said, I have a way of handling left-recursion directly
    in my development LL(1/*) parser generator. I haven't seen any other LL
    parsers do it.
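
    For reference, the textbook left-recursion removal mentioned above turns a
    rule like this (standard grammar notation, not my Grammar DSL):

      expr -> expr "+" term | term

    into the equivalent iterative form:

      expr -> term { "+" term }

    which an LL parser can handle, at the cost of restructuring the actions.
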
    Eric Mahurin, Feb 9, 2008
    #5
  6. Eric Mahurin wrote:
    > There are techniques to convert LL(k) and LR(k) grammars to LL(1) and LR(1)
    > (factoring, left-recursion removal), but they can make the grammar get large


    That's an understatement! They must magnify the number of rules
    by something like M^(k-1), where M is the number of terminals...
    or something like that. It's not reasonable to consider, and still
    doesn't do LL(*) like a packrat does.

    C labels require 2 levels of lookahead.
    Clifford Heath, Feb 9, 2008
    #6
  7. Ruby Quiz

    James Gray Guest

    On Feb 8, 2008, at 5:59 PM, Clifford Heath wrote:

    > James Gray wrote:
    >> On Feb 8, 2008, at 12:27 PM, Eric Mahurin wrote:
    >>>> Treetop was definitely the most popular
    >>> But not nearly the fastest...

    >> That's true, but I think the need for raw speed in parsers is
    >> seldom the top priority.

    >
    > I was frankly amazed how much of the discussion was about speed.


    Don't be. It's a common pastime for little challenges like this.
    It's the easiest metric to quantify, so it makes for fun
    comparisons. :)

    James Edward Gray II
    James Gray, Feb 9, 2008
    #7
  8. Ruby Quiz

    ThoML Guest

    Re: Parsing JSON (#155)

    > I was frankly amazed how much of the discussion was about speed.

    I personally found it quite interesting to see how well a hand-crafted
    parser could perform. I initially assumed the hackish RE-based
    solution would be fastest.

    Another aspect would of course be how long it takes to create such a
    parser in comparison to the other solutions. Unfortunately, we don't
    have timings for that.

    > That said, Treetop is very slow, and we need to improve that.


    My main concern with treetop isn't so much speed but rather the polyglot
    approach. While the idea per se is pretty cool, it seems to preclude
    programmatic generation/extension of a grammar definition. E.g. store the
    rules as an array of lambdas, programmatically add a new rule on demand,
    recreate the parser, etc. If I understand it right, this isn't easily
    possible with treetop. I scanned the source code a little bit and it seems
    treetop requires the parser definition to be saved on disk. I assume you're
    now going to tell me I'm wrong? Please do so. I'd be glad to hear it's
    otherwise.

    Regards,
    Thomas.
    ThoML, Feb 9, 2008
    #8
  9. Re: Parsing JSON (#155)

    ThoML wrote:
    >> I was frankly amazed how much of the discussion was about speed.

    > I personally found it quite interesting to see how well a hand-crafted
    > parser could perform. I initially assumed the hackish RE-based
    > solution would be fastest.


    Yes, it was interesting. I just thought that a few other things
    were interesting too ;-)

    > My main concern with treetop isn't so much speed but rather that the
    > polygot approach.


    Well, I'm the author of the Polyglot gem, and that doesn't preclude
    the idea of executing Ruby blocks during the parse. I'd be much in
    favor of that actually, but it's done Nathan's way as he created
    Treetop. I like the way ANTLR and grammar let you build the parse
    tree you want. Or not. But remember, packrat needs parse tree nodes
    that contain enough to enable the backtracking.

    Since Treetop has the ability to request a custom node (using the
    <my_syntax_node_class> annotation) perhaps you can do what you want
    by adding code to the constructor in that class? Just remember though,
    the node might get discarded during backtracking, so don't do anything
    too permanent :).

    > Eg store the rules as an array of lambdas, programmatically add a new
    > rule on demand, recreate the parser etc. If I understand it right,
    > this isn't easily possible with treetop.


    I think it would be a sin against the gods of good taste to do that,
    especially when the thing that did it might get discarded!

    > I scanned the source code a
    > little bit and it seems treetop requires the parser definition to be
    > saved on disk. I assume you're now going to tell me I'm wrong? Please
    > do so. I'd be glad to here it's otherwise.


    It's otherwise. The compiler uses a Builder pattern, and then either
    saves that or eval's it. When Polyglot requires a grammar, it first
    passes through the normal (or rubygems+normal) require, which will
    load a .rb file if one exists in the path. Otherwise it compiles
    the grammar file and eval's it.

    I think, without testing it, that multiple require's in the 2nd
    situation will compile the grammar multiple times, which would be
    a bug... I'll test for that tomorrow... bed now ;-).

    Clifford Heath.
    Clifford Heath, Feb 9, 2008
    #9
  10. Ruby Quiz

    Eric Mahurin Guest

    Re: Parsing JSON (#155)


    On Feb 9, 2008 12:59 AM, ThoML <> wrote:

    > > I was frankly amazed how much of the discussion was about speed.

    >
    > I personally found it quite interesting to see how well a hand-crafted
    > parser could perform. I initially assumed the hackish RE-based
    > solution would be fastest.
    >


    I would have thought that too. I knew my LL(1) hand solution would be near
    the top but was surprised that it came out on top (at least on my
    machine/benchmark) especially considering the RE+eval (especially yours)
    solutions. I think object creation was a big factor. My LL(1) parser
    created basically nothing but the AST. The match objects likely slowed down
    the RE solutions (StringScanner was probably best off since the match object
    is just a string). When you turn off GC, some of the RE solutions beat
    mine.

    > Another aspect would of course be how long it takes to create such a
    > parser in comparison to the other solutions. Unfortunately, we don't
    > have timings for that.
    >


    I'm guessing that you could crank out an initial RE solution the fastest.
    But, those solutions were also the ones that failed as more tests were
    added, since they were hacky. The solutions that took no shortcuts and
    followed the spec to a T from the beginning may have taken longer initially,
    but they had fewer issues.


    >
    > > That said, Treetop is very slow, and we need to improve that.

    >
    > My main concern with treetop isn't so much speed but rather that the
    > polygot approach. While the idea per se is pretty cool, it seems to
    > preclude a programmatic generation/extension of a grammar definition.
    > Eg store the rules as an array of lambdas, programmatically add a new
    > rule on demand, recreate the parser etc.



    My Grammar classes definitely give you that ability. One of the many powers
    of using Ruby as the DSL.

    For example, if you deal with a ","-separated list terminated by ";" for a
    variety of item types, you could do something like this:

    comma_list = lambda { |item| item.list1(E(?;), E(?,)) }

    then use it later like this

    string_list = comma_list[string] # string defined earlier
    number_list = comma_list[number] # number defined earlier

    One of the powers is creating your own generic grammar constructs on the
    fly.
    Eric Mahurin, Feb 9, 2008
    #10
  11. Clifford Heath wrote:
    > James Gray wrote:
    >> On Feb 8, 2008, at 12:27 PM, Eric Mahurin wrote:
    >>>> Treetop was definitely the most popular
    >>> But not nearly the fastest...

    >> That's true, but I think the need for raw speed in parsers is seldom
    >> the top priority.

    >
    > I was frankly amazed how much of the discussion was about speed.
    > Besides, there's an awful lot of useful languages that you can't
    > do with LL(1). Including Ruby and C...
    >
    > That said, Treetop is very slow, and we need to improve that.
    > Fortunately, it's not very hard to.


    I was reading about PEG parsers on the Wikipedia page [1], and wandered
    along to a site about packrat parsers [2].

    One article there [3] seems to imply that while a packrat parser's
    memoization gives linear theoretical time, for almost all real grammars
    (and inputs, I presume) it's actually not necessary.

    They suggest that for the cases where there is an exponential-time
    problem, it's sensible to rely on user-supplied memoization directives
    in the grammar, or to add memoization selectively and automatically,
    based on profiling information.
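
    The core memoization trick itself is small. Roughly, in Ruby (a sketch
    assuming a StringScanner-style @input and rule methods on the parser, not
    code from the paper):

      def apply(rule, pos)
        @memo ||= {}
        key = [rule, pos]
        unless @memo.key?(key)
          @input.pos = pos
          @memo[key] = [send(rule), @input.pos] # cache result and end position
        end
        result, end_pos = @memo[key]
        @input.pos = end_pos # replay the cached outcome when backtracking
        result
      end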

    I'm not sure if that's any help, or if you've already come across these
    articles. Hopefully they're of some interest though.

    Cheers,
    Benjohn


    [1]http://en.wikipedia.org/wiki/Parsing_expression_grammar
    [2]http://pdos.csail.mit.edu/~baford/packrat/
    [3]http://www.mercury.csse.unimelb.edu.au/information/papers/packrat.pdf
    Benjohn Barnes, Feb 19, 2008
    #11
  12. Benjohn Barnes wrote:
    > One article there [3] seems to imply that while a packrat parser's
    > memoization gives linear theoretical time, for almost all real grammars
    > (and inputs, I presume) it's actually not necessary.


    There's a bit of a chicken/egg problem here.

    Existing computer languages have been created to be easy to parse,
    which means they require minimal lookahead: one or two tokens.

    In part, that's because such languages are easier for humans to
    parse as well, and in part because parser generators couldn't
    produce efficient parsers for them... until now. Now that packrat
    is on the scene, we're free to create more natural looking grammars.

    > They suggest that for the cases where there is an exponential-time
    > problem, it's sensible to rely on user-supplied memoization directives
    > in the grammar,


    That requires a lot more user understanding (of where to memoize),
    or a whole lot more analysis. ANTLR takes the 2nd approach, but it's
    a difficult task that still causes many surprises. Treetop is slow,
    but it's easily accessible to Joe Average without special skills and
    without major surprises. I think it fills an important niche.

    > or to add memoization selectively and automatically,
    > based on profiling information.


    That would be cool. Write an exponential-time parser, run it over
    your test cases in learning mode, and then regenerate it to get a
    fast parser.

    Clifford Heath.
    Clifford Heath, Feb 20, 2008
    #12
  13. Clifford Heath wrote:
    > Benjohn Barnes wrote:
    >> One article there [3] seems to imply that while a packrat parser's
    >> memoization gives linear theoretical time, for almost all real
    >> grammars (and inputs, I presume) it's actually not necessary.

    >
    > There's a bit of a chicken/egg problem here.
    >
    > Existing computer languages have been created to be easy to parse,
    > which means they require minimal lookahead: one or two tokens.
    >
    > In part, that's because such languages are easier for humans to
    > parse as well, and in part because parser generators couldn't
    > produce efficient parsers for them... until now. Now that packrat
    > is on the scene, we're free to create more natural looking grammars.


    That's an interesting suggestion - that languages are as they are to a
    great extent because of the difficulty of implementing their parsers, so
    existing languages don't need memoized parsing. Certainly a compelling
    point. Do you see newer parsing techniques opening up language design, then?

    I bumped into a few suggestions of pluggable language syntax while
    reading about PEG, and I definitely like that idea.

    Cheers,
    B
    Benjohn Barnes, Mar 2, 2008
    #13
