With this artifact, everyone can easily invent new languages


Simeon Chaos

How is peasy (https://github/chaosim/peasy) simpler and more powerful than other parser tools?

Simpler and more powerful? Maybe it is a bit contrary to common sense. Trueor false? To affirm, please give this project (https://github.com/chaosim/peasy) a glimpse at first. Because of being simple , you can comprehend it in a little while; because of being powerful, it will deserve your every minute.

All along, the design of programming languages ​​is a complex task. Because we need to understand the esoteric compiler theory and technology, and one of the most critical and very difficult part is to define the rules of the new language and to parse with them.To solve this problem,there have been many theories , techniques and tools . These tools can be roughly divided into two categories: one is parser generator, another is towrite the parser by hand with or without a parser library.

The parser generator generally based on some type of formal languages ​​, by which the generator make the lexer and parser from somerules. Strong aspect of these tools is that they can produce the most efficient parser support for their own form of language types , according to the characteristics of formal language , generally ensures linear time complexity. Commonly divided into three types : LR technology , LL technology , PEG technology. LR technology language has shift/reduction , from the bottomup , the most right derived and other technical features , they can only support LR languages ​​, more accurately, mainly LALR variants , plus some special treatment for the priority of operator. One of the most famous was undoubtedly the lex / yacc and its numerous subsequent variants ( such as bison, jison (javascript language ), ply (python lex / yacc) , etc. ) . LL technology has a top-down form of the language , recursive descent , the most left-dereived and other technical features , although the concept is easier to understand than the LR language , but because LL covers less languages, so not used as universal as lex / yacc. Representative products are antlr. PEG technology refers to parsing expression grammar parser generator. peg is a kind of formal grammar particularly suited to be parsed, currently there are already many tools appear. The most common method is packrat algorithm based implementations. Such as rats, ometa (there are many versions , such as ometa / squeak, ometa-js, pyMeta etc. ), pyPEG under python, peg.js under javascript, treetop and citrus under ruby, and so on.

No matter what type of parser generator , the method to use them is to design rules for these tools , and then the tools generate parser. This is the difficult aspects. First, the object program is generated based on the state transfer and stack process, it is difficult to understand, debug and modify. Second, we must understand parser theory and technology of these tools is based on, the rules used by these tools is actually a domain-specific language , their expressivenes is very limited, while we must understand and become familiar with the new language in order to design rules. Third, we must embed certain processing ( abstract syntax tree structure , semantics, error reporting and handling , etc) in the form of object parser language into lex/parser rules. Difficulties in these areas prevent the most programmers to easily use this type of tool.
Meanwhile, people have also been pursuing a more relaxed and flexible approach to write parsers. Most of these methods produce tools can be classifiedas combinational parser library , or peg grammar-based peg library . Use of such libraries, programmers can use their own language in daily use to design a new universal language , parsing text, so they are used more freguently. Representative works is parsec under haskell language which maybe is the most mature and powerful. However, because haskell is too academic, andparsec is not universally popular. c + + language has boost phoenix library, probably because it depends on the c++ template, an advanced language features, it has not been widely used too. Python has pyparsing, which is used by many users. For specific questions , there are many applications do not use any tools or libraries, but manually write the entire parser , for example: cpython implementation, esprima for javascript. However, unfortunately, in their grammar and parsing of these tools, there are two obvious difficulties not been solved: the first is left recursive grammar, because left-recursive function will lead to an unconditional infinite recursion. The second is the parsing efficiency. In order to obtain a linear time complexity , we should only parse same syntax component only once at any position. To achieve the latter, the result of parsing is required. Meanwhile , the parser also need to be considered in conjunction to produce abstract syntax trees , semantic processing , error handling , etc., and these libraries maybe try to keep away from some of the problems (in the combined libraries left recursion are generally fobidden), or in order to solve these problems the program becomes very complex , difficult to understand , use, modify and extend.

Can be seen from the above description , parsing is a very common programming needs , but also has a very high technical difficulty , resulting in a large number of theories, technique, and tools you have seen above, in this regard a variety of technical papers, code and tools can be used simply to describe the voluminous , but still did not achieve the desired results .

with peasy, a complete and elegant solution to all these problems emerge for the first time. On one hand you can say peasy the most simple and elegant, but also has the most powerful and adaptability , and does not need to sacrifice speed and efficiency.
Its characteristics are as follows:

* for the first time, peasy provides a simple and complete solution to the problem of left recursion in the hand-written recursive descent parser manner, no matter direct or indirect left-recursive rule . To handle left recursive in peasy, only one function is needed (only 20 lines of code in coffeescript).
* peasy allows programmers to write the parser with common programming language, and the parser and the input to the grammar rules are similar to the parser generator , both concise and readable , easy to modify and extend.
* the simple and flexible caching mechanism in peasy can improve the time efficiency of parsing, we can maintain a linear time for liner time complexity grammar by caching. For complex grammar and data , we can also improve time efficiency by caching mechanism, and avoide the exponential, cube or square time complexity like some other common parsing algorithm. peasy's cache implementation is very simple, only one function (only ten lines of code under coffeescript ) .
* the entire peasy implementation is very simple. only two hundred lines ofcode under cofeescript. The concept is simple. Only two kind of components: some match functions which operate on the parsed data and parsing pointerand some combinational functions which generate match function. all of these functions are very simple , usually only a few lines of code , And they exists in the code of peasy for demonstration objective, can be deleted, modified and replaced, according to the specific needs.

In fact, instead of to be considered as a library or a tool, maybe it's better to view peasy as a method, a way or some example to write parsers mannually. This will allow for a greater degree of power.

peasy provide specific examples to support the above conclusion. The most obvious example is the common programming languages ​​in order to adapt and modify their parser tool to prevent left- recursive grammar. For example python grammar (http://docs.python.org/2/reference/grammar.html), javascript grammar (http://www-archive.mozilla.org/js/language/grammar14.html), while by using peasy, all binary arithmetic expression can be condensed into a simple left -recursive rules , left recursion elimination isnot needed any more. For expression parsing, in order to further improve time efficiency , peasy also provides an example , has many different priorities for computing the call stack example can skip directly to the highest priority constant direct expression resolves to the final rather than, as is currently common practice, the rules of writing as follows : or -> (or | | and) | and; ...; add -> add + mul | mul; mul -> mul * atom | atom. Rare is , peasy while achieving enhanced capabilities of these expressions can also ensure that the linear time complexity of the parsing process. Want to know how peasy achieve these results? Please give a glimpse to peasy on github(https://github.com/chaosim/peasy).

the first two of following links is the messages I posted when I started toget the inspiration to start writing peasy. Because there are some of other tasks to finish, intermediate interrupted for some time. Not long ago, I use peasy to parse something in another projects , and I further optimize the api of peasy, and which is now very simple and elegant, and I am satisfied :) As for the most critical left recursive , and caching algorithms , withstood the subsequent examples of the test , there is no any change. Together, these examples also continues to demonstrate the advantages of peasy .

Finally, I need to mention the impact of language on thought: Without coffeescript language, maybe it's difficult for me to have these ideas. in coffeescript, the -> function notation and "assignment is an expression" makingspecial concise readable grammar. In python similar results can be achieved, need to go through some twists and turns, and that is not so natural as in coffeescript.There should exists no big obstacle in other dynamic languages, but they is not as elegant as coffeescript tool. This point is obviouswith the difference between the javascript code of peasy and its coffeescript source.

messages I posted in google groups before:

lr grammar and lex/yacc, bison, ply

LL grammar

some one asked for parser to handle left recursion on SO:

peg and packrat


Alessandro Warth describe a method to handle left recursion in peg grammar in his PhD paper firstly.
After I have my idea about Peasy and algorith to handle left recursion, I searched and found his paper, and found my algorithm is actully similar to his method.
Memoization in Top-Down Parsing


parsing under python

parser for javascript or in javascript



Chris Angelico

All along, the design of programming languages is a complex task. Becausewe need to understand the esoteric compiler theory and technology, and oneof the most critical and very difficult part is to define the rules of thenew language and to parse with them.To solve this problem, there have beenmany theories , techniques and tools . These tools can be roughly divided into two categories: one is parser generator, another is to write the parser by hand with or without a parser library.

No tool will change the fact that the *design* of a language is a
complex task. Long before you get to implementing anything, you have
to decide what your language will look like, what its special features
are, how it distinguishes one thing from another, etc etc etc. That
work won't change based on the tool you use - or rather, if it DOES
change, it's because the tool is too restrictive. First write some
code in your new language, then and only then work out how to
implement an interpreter/compiler. The first part of the job is pretty
complex (and extremely important), and tools don't come into play till
the second.


Simeon Chaos

Yes, it's complex to design a new language. So don't let the tool stand in the way. There is a saying: Grinding a chopper will not hold up thework of cutting firewood.

在 2014å¹´1月11日星期六UTC+8上åˆ10æ—¶17分33秒,Chris Angelico写é“:

James Harris

Simeon Chaos said:
? 2014?1?11????UTC+8??10?17?33?,Chris Angelico??:
Yes, it's complex to design a new language. So don't let the tool stand in
the way. There is a saying: Grinding a chopper will not hold up the work
of cutting firewood.

To the OP: this is a suitable topic for comp.lang.misc which is used for
discussions about programming language design and implementation such as
parse mechanisms.


Simeon Chaos

Thank you, James. I didn't know this group before. I'll post this message there.

在 2014å¹´1月11日星期六UTC+8下åˆ3æ—¶47分33秒,James Harris写é“:



James Harris

Simeon Chaos said:
Thank you, James. I didn't know this group before. I'll post this message

You're welcome. It can be hard to find apt groups on Usenet because there
are so many. I don't think there was ever a group for programming language
design (or invention) so that one which was about miscellaneous languages
picked up the topic.

By the way, you could use comp.lang.misc simply for an announcement such as
you posted here. If you want to start a discussion about an aspect of your
design, however, the best option is to post a specific topic.



Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question