Utility to locate errors in regular expressions

M

Malte Forkel

Finding out why a regular expression does not match a given string can
very tedious. I would like to write a utility that identifies the
sub-expression causing the non-match. My idea is to use a parser to
create a tree representing the complete regular expression. Then I could
simplify the expression by dropping sub-expressions one by one from
right to left and from bottom to top until the remaining regex matches.
The last sub-expression dropped should be (part of) the problem.

As a first step, I am looking for a parser for Python regular
expressions, or a Python regex grammar to create a parser from.

But may be my idea is flawed? Or a similar (or better) tools already
exists? Any advice will be highly appreciated!

Malte
 
R

Roy Smith

Malte Forkel said:
Finding out why a regular expression does not match a given string can
very tedious. I would like to write a utility that identifies the
sub-expression causing the non-match. My idea is to use a parser to
create a tree representing the complete regular expression. Then I could
simplify the expression by dropping sub-expressions one by one from
right to left and from bottom to top until the remaining regex matches.
The last sub-expression dropped should be (part of) the problem.

As a first step, I am looking for a parser for Python regular
expressions, or a Python regex grammar to create a parser from.

But may be my idea is flawed? Or a similar (or better) tools already
exists? Any advice will be highly appreciated!

I think this would be a really cool tool. The debugging process I've
always used is essentially what you describe. I start try progressively
shorter sub-patterns until I get a match, then try to incrementally add
back little bits of the original pattern until it no longer matches.
With luck, the problem will become obvious at that point.

Having a tool which automated this would be really useful.

Of course, most of Python user community are wimps and shy away from big
hairy regexes [ducking and running].
 
N

Neil Cerutti

Of course, most of Python user community are wimps and shy away
from big hairy regexes [ducking and running].

I prefer the simple, lumbering regular expressions like those in
the original Night of the Regular Expressions. The fast, powerful
ones from programs like the remake of Dawn of the GREP, just
aren't as scary.
 
R

rusi

Finding out why a regular expression does not match a given string can
very tedious. I would like to write a utility that identifies the
sub-expression causing the non-match. My idea is to use a parser to
create a tree representing the complete regular expression. Then I could
simplify the expression by dropping sub-expressions one by one from
right to left and from bottom to top until the remaining regex matches.
The last sub-expression dropped should be (part of) the problem.

As a first step, I am looking for a parser for Python regular
expressions, or a Python regex grammar to create a parser from.

But may be my idea is flawed? Or a similar (or better) tools already
exists? Any advice will be highly appreciated!

Malte



python-specific: http://kodos.sourceforge.net/
Online: http://gskinner.com/RegExr/
emacs-specific: re-builder and regex-tool http://bc.tech.coop/blog/071103.html
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,755
Messages
2,569,536
Members
45,013
Latest member
KatriceSwa

Latest Threads

Top