Debugging regexes


Roedy Green

Regexes are a black box. If they fail, you have no idea where.

It would be nice if Matcher had some debugging methods to tell you the
offset and length of the best/longest match it was able to make, even
if it did not completely match.
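
One way to approximate this with the existing Matcher API, as a rough sketch: supply the regex as an ordered list of fragments, compile ever longer prefixes of it, and record how far `lookingAt()` gets before a fragment fails. The names `RegexProbe` and `probe` are made up for illustration, and the sketch assumes every cumulative prefix of the fragments is a well-formed regex on its own (no group opened in one fragment and closed in another).

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegexProbe {
    /** How many leading fragments matched, and where that partial match ended. */
    static final class Probe {
        final int fragmentsMatched;
        final int endOffset;

        Probe(int fragmentsMatched, int endOffset) {
            this.fragmentsMatched = fragmentsMatched;
            this.endOffset = endOffset;
        }
    }

    /** Tries fragment 1, then fragments 1+2, and so on, stopping at the first failure. */
    static Probe probe(List<String> fragments, String input) {
        StringBuilder regexSoFar = new StringBuilder();
        int matched = 0;
        int end = 0;
        for (String fragment : fragments) {
            regexSoFar.append(fragment);
            Matcher m = Pattern.compile(regexSoFar.toString()).matcher(input);
            if (!m.lookingAt()) {
                break; // this fragment is the first one that cannot match
            }
            matched++;
            end = m.end(); // offset just past the text matched so far
        }
        return new Probe(matched, end);
    }
}
```

With the fragments `^\s*`, a 1950 to 2099 year group, `-` plus a month group, and `:`, the input `2013-13:foo` reports two fragments matched and an end offset of 4, which points straight at the bad month.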
 
Roedy Green

Roedy Green said:
Regexes are a black box. If they fail, you have no idea where.

It would be nice if Matcher had some debugging methods to tell you the
offset and length of the best/longest match it was able to make, even
if it did not completely match.

If your regex is so complicated that you can't find the error after a
few minutes' search with the Mark I eyeball, you should break it up
into multiple passes or rewrite your parsing code to not use a regex.

For long regexes, it can be a good idea to format them across multiple
lines with comments:

Pattern exampleRegex =
    Pattern.compile( "^\\s*"             // Start of string, optional whitespace
        + "(19[5-9][0-9]|20[0-9][0-9])"  // G1: four digit year, from 1950 to 2099
        + "-(0[1-9]|1[012])"             // '-' divisor, G2: month, 01 to 12
        + ":((\\w+\\s?)*)"               // ':' divisor, G3: zero or more words, each optionally
                                         // followed by a single whitespace (G4: last word)
        + "(:(.+))?$"                    // Optional (G5): ':' divisor, G6: remaining contents of line
    );

Those are good tips. I often define named constants and build my
regexes out of trusted bits.

My two most common errors are:
1. forgetting to quote something like - . ( ) $
2. using \\ when I should have used \ and vice versa. Sometimes I build
regexes in files, sometimes in Java source.
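
A small sketch of both points, assuming a year-month prefix like the one discussed earlier in the thread. The class and constant names are hypothetical. Each constant is short enough to check by eye, and the comment on `OPTIONAL_SPACE` spells out the backslash difference between Java source and a file.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DateFragments {
    // Trusted bits: each one is small enough to verify on its own.
    static final String YEAR  = "(19[5-9][0-9]|20[0-9][0-9])"; // 1950 to 2099
    static final String MONTH = "(0[1-9]|1[012])";             // 01 to 12

    // In Java source the backslash is doubled. This literal holds the
    // three-character regex \s* which is exactly what you would type in a file.
    static final String OPTIONAL_SPACE = "\\s*";

    static final Pattern YEAR_MONTH =
            Pattern.compile("^" + OPTIONAL_SPACE + YEAR + "-" + MONTH);

    /** Returns the month digits if the line starts with a valid year-month, else null. */
    static String monthOf(String line) {
        Matcher m = YEAR_MONTH.matcher(line);
        return m.lookingAt() ? m.group(2) : null;
    }
}
```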

Maybe I should get on with writing the various proofreading tools I
imagined.

http://mindprod.com/project/regexcomposer.html
http://mindprod.com/project/regexdebugger.html
http://mindprod.com/project/regexproofreader.html
http://mindprod.com/project/regexutility.html
 
markspace

Roedy Green said:
Those are good tips. I often define named constants and build my
regexes out of trusted bits.

My two most common errors are:
1. forgetting to quote something like - . ( ) $
2. using \\ when I should have used \ and vice versa. Sometimes I build
regexes in files, sometimes in Java source.


A new annotation checks regex syntax. It comes from the Checker
Framework and relies on the type annotations added in Java 8:

@Regex – Provides compile-time verification that a String intended to be
used as a regular expression is a properly formatted regular expression


https://blogs.oracle.com/java-platform-group/entry/java_8_s_new_type
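
Without such a compile-time check, a malformed regex only shows up at run time, when `Pattern.compile` throws `PatternSyntaxException`. A minimal sketch that turns that exception into a short report (the class and method names are made up for illustration):

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

final class RegexSyntaxCheck {
    /** Returns null if the regex compiles, otherwise a short description of the problem. */
    static String describeSyntaxError(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            // getIndex() is the approximate position of the error, or -1 if unknown.
            return e.getDescription() + " at index " + e.getIndex();
        }
    }
}
```

This only catches syntax errors. A regex that compiles but matches the wrong thing still needs test inputs.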
 
Gene Wirchenko

Roedy Green said:
Regexes are a black box. If they fail, you have no idea where.

True. For that reason, I often use a finite state automaton. A
regex either succeeds or fails. I prefer to pick up more information.

When I use regexes, I keep them simple, because debugging one can
be a real bear. I will use short regexes with glue code rather than
one monster.
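
A rough sketch of the short-regexes-plus-glue-code approach. It assumes a line of the form year-month, then ':' and a word list, then an optional ':' and remainder. That format, and the names `StepwiseCheck` and `check`, are chosen only for illustration. Because each field is tested separately, a failure names the field instead of just returning false.

```java
import java.util.regex.Pattern;

final class StepwiseCheck {
    private static final Pattern YEAR_MONTH =
            Pattern.compile("(19[5-9][0-9]|20[0-9][0-9])-(0[1-9]|1[012])");
    // One or more words separated by single whitespace characters.
    private static final Pattern WORDS = Pattern.compile("\\w+(\\s\\w+)*");

    /** Returns "ok", or a message naming the first field that failed. */
    static String check(String line) {
        // Glue code: split into at most three fields on ':'.
        String[] fields = line.trim().split(":", 3);
        if (fields.length < 2) {
            return "missing ':' after the date";
        }
        if (!YEAR_MONTH.matcher(fields[0]).matches()) {
            return "bad year-month: " + fields[0];
        }
        if (!WORDS.matcher(fields[1]).matches()) {
            return "bad word list: " + fields[1];
        }
        return "ok"; // fields[2], if present, is free-form remainder
    }
}
```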

Roedy Green said:
It would be nice if Matcher had some debugging methods to tell you the
offset and length of the best/longest match it was able to make, even
if it did not completely match.

I wish that there were not so many varieties of regex languages.

Sincerely,

Gene Wirchenko
 
Roedy Green

Gene Wirchenko said:
I wish that there were not so many varieties of regex languages.

Amen brother.

I use three different schemes every day:
1. Java
2. Funduc Search and Replace
3. Visual SlickEdit Unix

It is so hard to remember six different sets of reserved characters
(search and replace for each scheme), just to get started. I wrote a
little tool to help:
http://mindprod.com/applet/quoter.html

It takes any string, treats it as data, and quotes it by the three
different schemes, for search or for replace. It shows the result raw,
in a Java String, and in a CSV file.
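
For the Java scheme alone, the standard library already has the two quoting calls: `Pattern.quote` for the search side and `Matcher.quoteReplacement` for the replace side, where `$` and `\` are the reserved characters. A minimal sketch (the class and method names are hypothetical, and this is not how the quoter tool itself works):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LiteralReplace {
    /** Replaces every occurrence of 'from' with 'to', treating both purely as data. */
    static String replaceLiteral(String text, String from, String to) {
        String search = Pattern.quote(from);               // - . ( ) $ lose their meaning
        String replacement = Matcher.quoteReplacement(to); // $ and \ lose their meaning
        return text.replaceAll(search, replacement);
    }
}
```

For plain literal replacement `String.replace` does the same job. The two quoting calls matter when a literal has to be embedded inside a larger regex or replacement string.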

I was browsing http://regexbuddy.com

Their products support many different schemes. I asked about the
three I use. They did not respond.

I have written a specialized CSV-based regex search/replace. I like
working with Java regexes best since they are rich in features, and
well documented.

It takes lines of the form

fromString, toString, file1, file2, file3...

At some point I will teach it wildcards and negatives.
 