implementation for Parsing Expression Grammar?


X

xahlee

In the past weeks i've been thinking over the problem on the practical
problems of regex in its matching power. For example, often it can't
be used to match anything of nested nature, even the most simple
nesting. It can't be used to match any simple grammar expressed by
BNF. Some rather very regular and simple languages such as XML, or
even url, email address, are not specified as a regex. (there exist
regex that are pages long that tried to match email address though)

I wrote out a more elaborate account of my thoughts here:
http://xahlee.org/cmaci/notation/pattern_matching_vs_pattern_spec.html

----------------

After days of researching this problem, looking into parsers and its
theories etc, today i found the answer!!

What i was looking for is called Parsing Expression Grammar (PEG).

See
http://en.wikipedia.org/wiki/Parsing_expression_grammar

It seems to me it's already in Perl6, and there's also a
implementation in Haskell. Is the perl6 PEG is in a usable state?

Thanks.

Xah
(e-mail address removed)
∑ http://xahlee.org/

☄
 
Ad

Advertisements

K

Kay Schluehr

In the past weeks i've been thinking over the problem on the practical
problems of regex in its matching power. For example, often it can't
be used to match anything of nested nature, even the most simple
nesting. It can't be used to match any simple grammar expressed by
BNF. Some rather very regular and simple languages such as XML, or
even url, email address, are not specified as a regex.

Well formed XML cannot be fully specified within BNF as well because
it is context sensitive: in order to recognize a tag/endtag pair one
has to maintain a stack. That's not a big deal in practice if one
wants to write an XML parser but one can't use an arbitrary LL or LR
parser generator to produce a parse tree representing the XML.
 
Ad

Advertisements

B

Barb Knox

Lars Rune Nøstdal said:
Hi,
Finite automata works for "nested things".

Only in the special case when the depth of nesting is bounded ahead of
time. If it's unbounded then there is an unbounded amount of "stack"
information that the automaton needs to remember, therefore a finite
automaton cannot do it.

<pedantry>
That should be "Finite automata WORK ...", since "automata" is plural.

--
---------------------------
| BBB b \ Barbara at LivingHistory stop co stop uk
| B B aa rrr b |
| BBB a a r bbb | Quidquid latine dictum sit,
| B B a a r b b | altum viditur.
| BBB aa a r bbb |
-----------------------------
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top