P~(ptilde) 0.9 released, new scripting language with novel regex

ptilderegex · Mar 14, 2008

P~ (pronounced "ptilde") is a new Java-friendly scripting language.
The principal reason for creating it was to offer a new and more
powerful approach to creating regular expressions. Unlike all other
regex engines, P~ does not use the Perl-compatible metacharacter
syntax, instead using algebraic syntax for regex composition. This
decision opens the door to more powerful side-effects than even
possible in Perl, while preserving the readability and maintainability
of P~ regexes. In other regex engines, your regular expressions become
hard to read as the difficulty of the problem increases. Not so in P~.

While P~ makes it easy to grapple with matching and transformation
problems that are hard for even Perl programmers, its basic grammar is
Java-like, more so than even Groovy. This means that Java programmers
can quickly learn the basic grammar forms.

P~ is also Java-friendly because you can import Java classes within
your scripts, and use their public APIs just like in your Java code.
All you have to do is make sure that when you launch the Ptilde
scripting application, you include the appropriate Java libraries (jar
files) in the classpath.

Finally, P~ is Java-friendly because its engine is a Java library.
Thus, if you have a tough matching or transformation problem, you can
solve it first with a P~ script, using the standalone application
shell and the novel P~ regex grammars; then make this script available
to your Java application as either a file or a resource, and invoke it
from your Java class. You can pass arguments to a scriptlet and get a
result back from it.

If this sounds interesting, take a look at the home page for the
documentation, which is found at http://www.ptilde.com. Start with the
Tutorial, which will guide you first through the basic grammar of
Ptilde and then through the regex grammar forms.

Michele Dondi · Mar 14, 2008

> Unlike all other
> regex engines, P~ does not use the Perl-compatible metacharacter
                         ^^^
> syntax, instead using algebraic syntax for regex composition. [...]
> problems that are hard for even Perl programmers, its basic grammar is
> Java-like, more so than even Groovy. This means that Java programmers
  ^^^^^^^^^

So why do you think it may be of interest *here*?!?

Michele

brian d foy · Mar 14, 2008

ptilderegex said:
> This
> decision opens the door to more powerful side-effects than even
> possible in Perl

That's a pretty bold statement considering that Perl can execute
arbitrary code from within a regular expression.

> While P~ makes it easy to grapple with matching and transformation
> problems that are hard for even Perl programmers,

Perhaps you can list a couple of examples. People in this newsgroup
love showing others how easy it is to get things done with regular
expressions.

Most of the examples I saw on your website have problems. For instance,
you have a transformation to strip C++ style comments:

function Pattern Comment1 ()
{
    return strip("//" + *'[^\r\n]') + eoleof;
}

You fail to distinguish between code and string literals though.
There's an answer in perlfaq6 that addresses this. Your C style comment
stripper is similarly flawed.
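
For readers following along in Java, here is a minimal sketch of a literal-aware stripper: string and char literals are matched first and copied through, so a "//" inside a literal is never treated as a comment. This is plain java.util.regex, not P~ and not the perlfaq6 code; the class and method names are hypothetical, and it handles only "//" comments, not "/* */".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CommentStripper {
    // Alternation order matters: literals are tried before comments,
    // so "//" inside a string or char literal is consumed as part of it.
    private static final Pattern TOKEN = Pattern.compile(
        "(\"(?:\\\\.|[^\"\\\\\\r\\n])*\""      // group 1: string literal
        + "|'(?:\\\\.|[^'\\\\\\r\\n])*')"      //          or char literal
        + "|//[^\\r\\n]*");                    // otherwise a line comment

    /** Removes // comments from source, leaving literals untouched. */
    public static String stripLineComments(String source) {
        Matcher m = TOKEN.matcher(source);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String literal = m.group(1);   // null when a comment matched
            // Keep literals verbatim; replace comments with nothing.
            m.appendReplacement(out,
                literal != null ? Matcher.quoteReplacement(literal) : "");
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

The end-of-line characters are left in place, so line numbers in the stripped output still match the input.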

In your examples page, you talk about stripping HTML tags, and say that
there isn't an equivalent Perl solution. I guess you missed the one in
perlfaq9.

Although your new language might be nice, don't ignorantly compare it
to any language you don't know.

ptilderegex · Mar 15, 2008

> That's a pretty bold statement considering that Perl can execute
> arbitrary code from within a regular expression.

Correct me if I'm wrong, but don't Perl code assertions fire as they
are encountered by the NFA engine? This means that they could fire even
if the regex was ultimately not found to match. This was discussed on
the documentation site years ago when I looked, but I'm not up to
date. Anyway, this fire-as-you-go approach is useful for debugging,
but not as powerful as a true "side-effect".

> Perhaps you can list a couple of examples. People in this newsgroup
> love showing others how easy it is to get things done with regular
> expressions.

A simple example: let's say that in Perl you have a regex but you don't
know what it is. It's held in a string passed by some function and
needs to be a parameter. Now, you want to strip everything but what
matches each time. Or better yet, output what does match to one
stream, and output what doesn't match to another (in one pass).
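
To make the task concrete, here is a rough sketch of it in plain Java (java.util.regex, not P~): the regex arrives as a string parameter, and one scan over the input sends every match to one buffer and everything else to another. The class name MatchSplitter and the method split are made up for illustration, and StringBuilder stands in for the two output streams.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchSplitter {
    /**
     * Appends every match of regex to matched and all remaining
     * text to unmatched, scanning the input once from left to right.
     */
    public static void split(String regex, CharSequence input,
                             StringBuilder matched, StringBuilder unmatched) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int last = 0;                                  // end of the previous match
        while (m.find()) {
            unmatched.append(input, last, m.start());  // text between matches
            matched.append(m.group());                 // the match itself
            last = m.end();
        }
        unmatched.append(input, last, input.length()); // tail after the last match
    }
}
```

For example, splitting "ab12cd345e" on the regex \d+ leaves "12345" in the matched buffer and "abcde" in the unmatched one.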

> Most of the examples I saw on your website have problems. For instance,
> you have a transformation to strip C++ style comments:
>
> function Pattern Comment1 ()
> {
>     return strip("//" + *'[^\r\n]') + eoleof;
> }
>
> You fail to distinguish between code and string literals though.
> There's an answer in perlfaq6 that addresses this. Your C style comment
> stripper is similarly flawed.

The overall example does handle this. The base class parses string
literals and char literals (Java targets have no code literals), and
the base class includes these in its recognition units. Thus in the
subclass, you only need to polymorphically change the behavior of
match-a-comment, to match-and-strip-a-comment.

> In your examples page, you talk about stripping HTML tags, and say that
> there isn't an equivalent Perl solution. I guess you missed the one in
> perlfaq9.

I'll check to see if this Perl solution is a single-pass regex, and if
so, amend the comment! Thanks.

brian d foy · Mar 15, 2008

> A simple example: let's say that in Perl you have a regex but you don't
> know what it is. It's held in a string passed by some function and
> needs to be a parameter. Now, you want to strip everything but what
> matches each time. Or better yet, output what does match to one
> stream, and output what doesn't match to another (in one pass).

It sounds like most of your problem has little to do with regular
expressions and more to do with I/O management.

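while( <$fh> )
{
    # Match: the unmatched text ($` and $') goes to $out, the match ($&) to $out2.
    if( m/$regex/ ) { print $out "$`$'"; print $out2 $& }
    # No match: pass the whole line to $out. With a lexical handle,
    # $_ has to be named explicitly.
    else { print {$out} $_ }
}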

ptilderegex · Mar 16, 2008

> It sounds like most of your problem has little to do with regular
> expressions and more to do with I/O management.
>
> while( <$fh> )
> {
>     if( m/$regex/ ) { print $out "$`$'"; print $out2 $& }
>     else { print {$out} $_ }
> }

The point of Ptilde is that you can do these complex stream
transformations of any kind at all in one regex pass. What you've got
above is a while loop, not a single regex pass.

ptilderegex · Mar 18, 2008

> > The point of Ptilde is that you can do these complex stream
>
> You say that as if it's a bad thing.

Not that the Perl solution given above is a bad thing. We're just
saying that certain complex transformations can be done in P~ in a
single-pass regex, which can turn transformation output on or off,
redirect it, or insert output at any nesting level of a
subcomposition. That feature seems to add value to P~ relative to
other NFA regex engines that lack the equivalent of the Ptilde
do-pattern.
