[ann] regexp-engine 0.4


Simon Strandgaard

download:
http://rubyforge.org/download.php/219/regexp-engine-0.4.tar.gz

homepage:
http://raa.ruby-lang.org/list.rhtml?name=regexp


Try it out; tell me your opinion.

--
Simon Strandgaard



Changes
=======

Non-greedy matching has been implemented. You can now do:
/a(.*?)a/.match("0a1a2a3").to_a #=> ["a1a", "1"]
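
For comparison, here is how the greedy and lazy forms differ on the same input. This is only a small sketch that uses Ruby's built-in Regexp as the reference behaviour, on the assumption that regexp-engine aims to give the same results; it does not call regexp-engine itself.

```ruby
# Reference behaviour from Ruby's built-in Regexp (not regexp-engine).
text = "0a1a2a3"

# Greedy: .* grabs as much as possible, then backtracks to the last "a".
greedy = /a(.*)a/.match(text).to_a   #=> ["a1a2a", "1a2"]

# Lazy: .*? grabs as little as possible and stops at the next "a".
lazy = /a(.*?)a/.match(text).to_a    #=> ["a1a", "1"]

p greedy
p lazy
```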

Now using iterators internally; the way has been paved
for i18n, so that the engine can operate on Unicode, JIS, etc.


Status
======

The data structure has stabilized and the fundamental operations
are working quite well (they were difficult to implement).
The engine uses iterators, which should make it easy to operate on
many different kinds of input streams (Unicode, UTF-8), but right
now the iterator only works on ASCII.
Performance is not impressive.
What is left is all the easy stuff (character classes, Unicode, optimization).

* features of the scanner so far:
  a|b|c           alternation
  * + ? {n,m}     repeat(min..max) greedy/lazy
  ( ... )         grouping -> register; nested repeats also work
  .               match anything except newline
  \1 .. \9        backreferences

* features of the parser so far:
  a|b|c           alternation
  *     *?        repeat(0..infinity) greedy/lazy
  +     +?        repeat(1..infinity) greedy/lazy
  {n,}  {n,}?     repeat(n..infinity) greedy/lazy
  ?     ??        repeat(0..1) greedy/lazy
  {n,m} {n,m}?    repeat(n..m) greedy/lazy
  {n}   {n}?      repeat(n..n) greedy/lazy (does lazy make sense here?)
  ( ... )         group -> register
  .               match anything except newline
  \1 .. \9        backreferences
  \               escape
  special case: illegal ranges are treated as if they were
  ordinary literals.
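
To make the greedy/lazy pairs above concrete, the sketch below runs several of them against one subject string. It uses Ruby's built-in Regexp as a stand-in, on the assumption that regexp-engine is meant to agree with it; the names `subject`, `results` and `backref` are just for illustration.

```ruby
# Reference behaviour from Ruby's built-in Regexp (not regexp-engine).
subject = "aaaa"

# String#[] with a regexp returns the text of the first match.
results = {
  "a*"      => subject[/a*/],       # greedy, takes all four
  "a*?"     => subject[/a*?/],      # lazy, matches the empty string
  "a+"      => subject[/a+/],
  "a+?"     => subject[/a+?/],      # lazy, a single "a"
  "a?"      => subject[/a?/],
  "a??"     => subject[/a??/],      # lazy, matches the empty string
  "a{2,}"   => subject[/a{2,}/],
  "a{2,}?"  => subject[/a{2,}?/],   # lazy, stops at the minimum of 2
  "a{2,3}"  => subject[/a{2,3}/],
  "a{2,3}?" => subject[/a{2,3}?/],
}

# Backreference: \1 must repeat whatever group 1 captured.
backref = "abba"[/(a|b)\1/]

results.each { |pattern, matched| puts "#{pattern.ljust(8)} -> #{matched.inspect}" }
puts "(a|b)\\1 -> #{backref.inspect}"
```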


License
=======

Ruby's license.


About
=====

AEditor needs a regexp engine. You probably think: why not
rely on an existing engine (for instance Ruby's regexp engine)?
Existing engines are not flexible enough. The iterator pattern
provides the needed flexibility, so it should not matter
whether the engine operates on UCS-4, UTF-8 or ASCII.
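
A rough sketch of the idea, not taken from regexp-engine: the class `SymbolIterator`, its methods, and the helper `starts_with?` are all hypothetical names. The matching code only asks the iterator for the current symbol and advances it, so the same code runs over bytes or over decoded code points. For brevity this sketch decodes the whole input up front; a real iterator would more likely decode lazily.

```ruby
# Hypothetical sketch; none of these names come from regexp-engine.
class SymbolIterator
  def initialize(symbols)
    @symbols = symbols   # Array of Integer symbols
    @pos = 0
  end

  # One symbol per byte (what an ASCII-only iterator would see).
  def self.ascii(str)
    new(str.bytes)
  end

  # One symbol per code point, decoded from UTF-8.
  def self.utf8(str)
    new(str.unpack("U*"))
  end

  def done?
    @pos >= @symbols.size
  end

  def current
    @symbols[@pos]
  end

  def advance
    @pos += 1
  end
end

# Matcher code that knows nothing about the underlying encoding:
# it checks whether the input begins with the given symbols.
def starts_with?(iter, pattern_symbols)
  pattern_symbols.each do |sym|
    return false if iter.done? || iter.current != sym
    iter.advance
  end
  true
end

input   = "\u00E9t\u00E9"   # "été" as UTF-8
pattern = [0xE9, 0x74]      # code points for "ét"

utf8_result  = starts_with?(SymbolIterator.utf8(input), pattern)
ascii_result = starts_with?(SymbolIterator.ascii(input), pattern)

p utf8_result    # code points line up with the pattern
p ascii_result   # raw bytes do not, since "é" is two bytes in UTF-8
```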

The goal is to build an engine that is fully compatible with Ruby's
regexp syntax and that works with iterators.

Eventually, extend the regexp syntax with some editor features.
For instance: mark the point where the cursor should be placed,
match text which is legal Ruby code, execute a regexp within a
rectangular selection, etc. I am open to other suggestions.

Eventually, re-implement in C++ to gain performance.
 
