Prefixes of regular expressions

S J Kissane · Jun 29, 2008

Hi all

I was thinking about regular expressions, in the context of
syntax checking in user interfaces.

Example use case: I have a form field, with a regex to
determine if its contents are valid. The user starts typing it
in... if it's valid, the field goes green. If it's invalid, but by
typing more they could make it valid, the field goes yellow.
If it's invalid, and they cannot make it valid by typing more,
only by removing characters they've already typed, it
goes red.

Suppose I have a regular expression defined like this: [0-9A-F]{8}
Now, suppose the user has typed: "09AB"
We can see that, although that string does not match the regular
expression, it could match if the user added to it appropriately.

By contrast, suppose they typed: "09AG"
We can see that, no matter what they add,
it can never be made to match the regular expression;
the only way of making it match is to remove characters.

We might say that, although the string does not match the
regular expression, it is a "valid prefix" of the regular expression.

Now, the question is, given a regular expression and a string,
how in Java can I determine if the string is a valid prefix of
the regular expression? I have looked at the java.util.regex.Matcher
API in Java SE 6, and I can't see a way of doing this.

I suppose, if I wrote my own regular expression library
(or even just started with Sun's one and hacked it),
I could make this work... but I don't want to do that.

Thanks
Simon

Arved Sandstrom · Jun 29, 2008

Define two regular expressions: one for valid (green), and one for valid
prefix (yellow). Match the supplied string against the first; if matched,
display green. If no match, match against the second. If matched, display
yellow; if no match, display red. You lose nothing in responsiveness with
two regular expressions - this is a user-typed form field after all. And the
logic becomes more clear.
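
A minimal sketch of the two-pattern approach described above, using the [0-9A-F]{8} example from the original post. The class name, the State enum and the hand-written prefix pattern [0-9A-F]{0,7} are assumptions made for illustration; they are not from the thread.

```java
import java.util.regex.Pattern;

final class TwoPatternCheck {
    enum State { GREEN, YELLOW, RED }

    // Complete value: exactly eight hex digits (pattern from the original post).
    private static final Pattern FULL = Pattern.compile("[0-9A-F]{8}");

    // Hand-written prefix pattern (an assumption): zero to seven hex digits.
    private static final Pattern PREFIX = Pattern.compile("[0-9A-F]{0,7}");

    static State classify(String input) {
        if (FULL.matcher(input).matches()) {
            return State.GREEN;   // already a complete, valid value
        }
        if (PREFIX.matcher(input).matches()) {
            return State.YELLOW;  // not complete yet, but could still become valid
        }
        return State.RED;         // only deleting characters can fix it
    }
}
```

With this sketch, classify("09AB") gives YELLOW and classify("09AG") gives RED. The prefix pattern has to be kept in step with the full pattern by hand.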

For some situations you could probably use groupCount() on the Matcher
object, with capturing groups, to distinguish between a valid prefix and a
valid complete string.

AHS

S J Kissane · Jun 29, 2008

Indeed, such an approach would work. But, logically speaking, I only need
one regular expression to do this, not two. And by using two, I need to
manually construct the prefix regex based on the whole-string regex, when
logically the former can be derived from the latter.
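
To illustrate how a prefix regex can be derived mechanically, here is a rough sketch for a very restricted case only: a pattern that is a fixed sequence of single-character tokens. The class and method names are hypothetical. It does not parse general regular expressions, which is the hard part of the problem.

```java
import java.util.List;
import java.util.regex.Pattern;

final class PrefixRegexBuilder {
    // Each token is assumed to be a regex fragment matching exactly one
    // character, e.g. "[0-9A-F]" or "-". This is an assumption of the sketch.
    static String fullRegex(List<String> tokens) {
        return String.join("", tokens);
    }

    // Builds nested optional groups: t1 t2 t3 becomes (?:t1(?:t2(?:t3)?)?)?
    // The result accepts every prefix of the full pattern, including the
    // empty string and the complete string itself.
    static String prefixRegex(List<String> tokens) {
        StringBuilder sb = new StringBuilder();
        for (String token : tokens) {
            sb.append("(?:").append(token);
        }
        for (int i = 0; i < tokens.size(); i++) {
            sb.append(")?");
        }
        return sb.toString();
    }

    static boolean isPrefix(List<String> tokens, String input) {
        return Pattern.matches(prefixRegex(tokens), input);
    }
}
```

For eight copies of the token [0-9A-F], isPrefix accepts "09AB" and rejects "09AG". Because the derived pattern also accepts the complete string, the full pattern still has to be tested first to tell green from yellow.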

Maybe it's time for a trip to bugs.sun.com... Who knows, I might see
the functionality I'm after in J2SE 12.0

Simon

Roedy Green · Jun 30, 2008

Now, the question is, given a regular expression and a string,
how in Java can I determine if the string is a valid prefix of
the regular expression? I have looked at the java.util.regex.Matcher
API in Java SE 6, and I can't see a way of doing this.

The brute-force approach is to have a different regex for each length
of string.

Back in the days of Java 1.0 I invented a FormattedTextField that
handled a variety of patterns, where you described each slot with a
character code,
e.g. 9 = numeric, A = caps, a = lower case ...
I had "humps" where you can have decorative punctuation appear, e.g.
(604) 871-1166, that you don't key, can't change and is not part of the
final data field.
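
A rough sketch of the slot-code idea, not the original FormattedTextField. The class name and the exact meaning of the codes (9 = digit, A = upper case, a = lower case) are assumptions based on the description above, and "humps" are left out.

```java
final class SlotMask {
    enum State { GREEN, YELLOW, RED }

    // Hypothetical slot codes: '9' = digit, 'A' = upper-case letter,
    // 'a' = lower-case letter. Any other code is rejected.
    static boolean fits(char code, char c) {
        switch (code) {
            case '9': return c >= '0' && c <= '9';
            case 'A': return c >= 'A' && c <= 'Z';
            case 'a': return c >= 'a' && c <= 'z';
            default:
                throw new IllegalArgumentException("unknown slot code: " + code);
        }
    }

    // Checks the typed characters slot by slot against the mask.
    static State classify(String mask, String input) {
        if (input.length() > mask.length()) {
            return State.RED;                 // too long: only deletion helps
        }
        for (int i = 0; i < input.length(); i++) {
            if (!fits(mask.charAt(i), input.charAt(i))) {
                return State.RED;             // a typed character is in the wrong slot
            }
        }
        return input.length() == mask.length() ? State.GREEN : State.YELLOW;
    }
}
```

With the mask "A9A9A9", the input "V6" is classified YELLOW and "66" is classified RED. Because each slot is checked on its own, the prefix test comes for free, at the price of a far less expressive pattern language than regular expressions.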

Roedy Green · Jun 30, 2008

the brute force approach is to have a different regex for each length
of string.

If you look at those regexes, you may be able to create a single
regex that will work for more than one length, e.g. one that ended with
[0-9]*, to reduce the total number of them you require.

David Segall · Jun 30, 2008

S J Kissane said:
Suppose I have a regular expression defined like this: [0-9A-F]{8}
Now, the question is, given a regular expression and a string,
how in Java can I determine if the string is a valid prefix of
the regular expression? I have looked at the java.util.regex.Matcher
API in Java SE 6, and I can't see a way of doing this.

I can see that your putative changes to Matcher provide an elegant
solution to your problem, but they would require changing some return
values from boolean to something containing more information. Rather
than altering Java's method(s) or having multiple regular expressions
to test for your three return values, perhaps you could append a valid
string of the appropriate length to the input as a second test. In
your example, this approach is worse than using a second regular
expression to check for a valid prefix, but it may provide an easier
general solution.
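
A minimal sketch of the padding idea, assuming a fixed-length pattern in which every position can be checked independently, plus a sample string that is known to match. The class name and the sample parameter are hypothetical.

```java
import java.util.regex.Pattern;

final class PaddingCheck {
    enum State { GREEN, YELLOW, RED }

    // 'sample' is assumed to be a complete string that matches 'full',
    // e.g. "00000000" for [0-9A-F]{8}.
    static State classify(Pattern full, String sample, String input) {
        if (full.matcher(input).matches()) {
            return State.GREEN;
        }
        if (input.length() >= sample.length()) {
            return State.RED;   // nothing left to pad, so typing more cannot help
        }
        // Second test: complete the input with the tail of the known-good sample.
        String padded = input + sample.substring(input.length());
        return full.matcher(padded).matches() ? State.YELLOW : State.RED;
    }
}
```

With the pattern [0-9A-F]{8} and the sample "00000000", the input "09AB" is padded to "09AB0000" and classified YELLOW, while "09AG" is classified RED. For patterns where later characters depend on earlier ones, a single fixed sample may not be enough.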

Arved Sandstrom · Jul 1, 2008

I really don't see you avoiding some non-RE conditional logic at some point.
If you're not so keen on 2 separate regular expressions, there is always:

    Pattern p = Pattern.compile("([0-9A-F]{1,8})");
    Matcher m = p.matcher(stringToMatch);

    if (m.matches()) {
        int matchLen = m.group(1).length();
        if (matchLen < 8) {
            // do "yellow" stuff
        } else if (matchLen == 8) {
            // do "green" stuff
        }
    } else {
        // do "red" stuff
    }

AHS

Lasse Reichstein Nielsen · Jul 1, 2008

S J Kissane said:
We might say that, although the string does not match the
regular expression, it is a "valid prefix" of the regular expression.

Now, the question is, given a regular expression and a string,
how in Java can I determine if the string is a valid prefix of
the regular expression?

You can't. The Java RegExp library doesn't provide support for
what you need.

I suppose, if I wrote my own regular expression library
(or even just started with Sun's one and hacked it),
I could make this work... but I don't want to do that.

You could start out with an existing alternative RegExp
library. Perhaps <URL:http://www.brics.dk/automaton/>, which
is not a traditional RegExp library, but is more closely related
to the Comp.Sci. notions of regular languages and finite
automata.
It does, however, have a prefix operation on automata:
<URL:http://www.brics.dk/automaton/doc/d...tml#prefixClose(dk.brics.automaton.Automaton)>

Good luck.
/L
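
One option not mentioned in the thread is Matcher.hitEnd(), which has been part of java.util.regex since Java 5. It reports whether the last match operation ran into the end of the input, in which case more input might change the result. The sketch below assumes that reading is good enough for the green/yellow/red use case; the class name and enum are hypothetical, and behaviour with more complex patterns (lookahead, for instance) should be tested rather than taken on trust.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HitEndCheck {
    enum State { GREEN, YELLOW, RED }

    static State classify(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        if (m.matches()) {
            return State.GREEN;   // the whole input matches
        }
        // matches() failed. If the engine ran out of input while trying,
        // more characters might still produce a match.
        return m.hitEnd() ? State.YELLOW : State.RED;
    }
}
```

With Pattern.compile("[0-9A-F]{8}"), the input "09AB" is classified YELLOW and "09AG" is classified RED, using only the one regular expression.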
