Prefixes of regular expressions

S J Kissane

Hi all

I was thinking about regular expressions, in the context of
syntax checking in user interfaces.

Example use case: I have a form field, with a regex to
determine if its contents are valid. The user starts typing it
in... if it's valid, the field goes green. If it's invalid, but by
typing more they could make it valid, the field goes yellow.
If it's invalid, and they cannot make it valid by typing more,
only by removing characters they've already typed, it
goes red.

Suppose I have a regular expression defined like this: [0-9A-F]{8}
Now, suppose the user has typed: "09AB"
We can see that, although that string does not match the regular
expression, it could match if the user added to it appropriately.

Comparatively, suppose they typed: "09AG"
We can see that, no matter what they possibly add,
it can never be made to match the regular expression;
the only way of making it match is to remove characters.

We might say that, although the string does not match the
regular expression, it is a "valid prefix" of the regular expression.

Now, the question is, given a regular expression and a string,
how in Java can I determine if the string is a valid prefix of
the regular expression? I have looked at the java.util.regex.Matcher
API in Java SE 6, and I can't see a way of doing this.

I suppose, if I wrote my own regular expression library
(or even just started with Sun's one and hacked it),
I could make this work... but I don't want to do that.

Thanks
Simon
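One option that may fit this question, though it is not something the thread itself proposes, is Matcher.hitEnd(), which has been in java.util.regex since Java 5. After matches() fails, hitEnd() reports whether the engine ran out of input while trying, which suggests that more typing might still produce a match. The sketch below assumes that reading; the class and enum names are hypothetical. The Javadoc only promises that more input "might" have changed the result, so for some patterns (particularly ones using lookaround) it may report yellow for input that can never be completed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HitEndPrefixCheck {

    // Hypothetical traffic-light states for the form field.
    enum FieldState { GREEN, YELLOW, RED }

    static FieldState classify(Pattern pattern, CharSequence input) {
        Matcher m = pattern.matcher(input);
        if (m.matches()) {
            return FieldState.GREEN;   // the whole input is a valid value
        }
        // matches() failed. hitEnd() is true if the engine reached the end of
        // the input during the attempt, i.e. more characters might still match.
        return m.hitEnd() ? FieldState.YELLOW : FieldState.RED;
    }

    public static void main(String[] args) {
        Pattern hex8 = Pattern.compile("[0-9A-F]{8}");
        System.out.println(classify(hex8, "09AB"));     // YELLOW
        System.out.println(classify(hex8, "09AG"));     // RED
        System.out.println(classify(hex8, "09ABCDEF")); // GREEN
    }
}
```

For "09AG" the engine rejects the G before reaching the end of the input, so hitEnd() stays false; for "09AB" it runs out of characters while still needing four more hex digits.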
 
Arved Sandstrom

Define two regular expressions: one for valid (green), and one for valid
prefix (yellow). Match the supplied string against the first; if matched,
display green. If no match, match against the second. If matched, display
yellow; if no match, display red. You lose nothing in responsiveness with
two regular expressions - this is a user-typed form field after all. And the
logic becomes more clear.

For some situations you could probably use capturing groups on the Matcher
object (e.g. checking the length of what group(1) matched) to distinguish
between a valid prefix and a valid complete string.

AHS
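Arved's two-regex suggestion might look like this for the [0-9A-F]{8} example. The prefix regex [0-9A-F]{0,7} is written by hand here (an assumption: nothing derives it from the full regex automatically), and the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class TwoRegexCheck {

    enum FieldState { GREEN, YELLOW, RED }

    // The complete-value regex from the thread.
    static final Pattern VALID = Pattern.compile("[0-9A-F]{8}");
    // Hand-written "valid prefix" regex; must be kept in sync with VALID by hand.
    // It is only consulted after VALID has failed.
    static final Pattern VALID_PREFIX = Pattern.compile("[0-9A-F]{0,7}");

    static FieldState classify(String input) {
        if (VALID.matcher(input).matches()) {
            return FieldState.GREEN;
        }
        if (VALID_PREFIX.matcher(input).matches()) {
            return FieldState.YELLOW;
        }
        return FieldState.RED;
    }

    public static void main(String[] args) {
        System.out.println(classify("09AB"));     // YELLOW
        System.out.println(classify("09AG"));     // RED
        System.out.println(classify("09ABCDEF")); // GREEN
    }
}
```

The cost of this design is exactly the one Simon raises next: the two regexes describe the same format and have to be maintained together.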
 
S J Kissane

Indeed, such an approach would work. But, logically speaking, I only
need one regular expression to do this, not two. And by using two, I
need to manually construct the prefix regex based on the whole-string
regex, when logically the former can be derived from the latter.

Maybe it's time for a trip to bugs.sun.com... Who knows, I might see
the functionality I'm after in J2SE 12.0 :)

Simon
 
Roedy Green

S J Kissane said:
Now, the question is, given a regular expression and a string,
how in Java can I determine if the string is a valid prefix of
the regular expression? I have looked at the java.util.regex.Matcher
API in Java SE 6, and I can't see a way of doing this.

The brute-force approach is to have a different regex for each length
of string.
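To make the brute-force idea concrete, here is a sketch for a hypothetical field format (two capital letters followed by three digits, not taken from the thread): one regex per input length, selected by the length of what has been typed so far. All names are illustrative.

```java
import java.util.regex.Pattern;

class PerLengthCheck {

    enum FieldState { GREEN, YELLOW, RED }

    // Hypothetical format: two capital letters, then three digits.
    // BY_LENGTH[n] accepts exactly the valid inputs of length n;
    // the last entry is the complete format.
    static final Pattern[] BY_LENGTH = {
        Pattern.compile(""),
        Pattern.compile("[A-Z]"),
        Pattern.compile("[A-Z]{2}"),
        Pattern.compile("[A-Z]{2}[0-9]"),
        Pattern.compile("[A-Z]{2}[0-9]{2}"),
        Pattern.compile("[A-Z]{2}[0-9]{3}"),
    };

    static FieldState classify(String input) {
        int n = input.length();
        if (n >= BY_LENGTH.length || !BY_LENGTH[n].matcher(input).matches()) {
            return FieldState.RED;   // too long, or wrong for its length
        }
        return n == BY_LENGTH.length - 1 ? FieldState.GREEN : FieldState.YELLOW;
    }

    public static void main(String[] args) {
        System.out.println(classify("AB1"));   // YELLOW
        System.out.println(classify("AB123")); // GREEN
        System.out.println(classify("A1"));    // RED
    }
}
```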

Back in the days of Java 1.0 I invented a FormattedTextField that
handled a variety of patterns, where you described each slot with a
character code,
e.g. 9 = numeric, A = upper case, a = lower case, ...
I had "humps" where decorative punctuation can appear, e.g. the
punctuation in (604) 871-1166, which you don't key, can't change, and
which is not part of the final data field.
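Roedy's FormattedTextField is not shown in the thread, so the following is only a hypothetical reconstruction of the slot-code idea: a mask in which 9, A and a are slots and every other character is a hump. Checking a prefix then becomes a position-by-position test of the keyed characters against the slots, with no regex involved. The class and method names are assumptions.

```java
class SlotMask {

    enum FieldState { GREEN, YELLOW, RED }

    // Slot codes as described in the post: 9 = digit, A = upper case, a = lower case.
    // Any other mask character is a "hump": displayed, never keyed.
    private final String mask;

    SlotMask(String mask) { this.mask = mask; }

    private static boolean isSlot(char c) { return c == '9' || c == 'A' || c == 'a'; }

    private static boolean fits(char code, char ch) {
        switch (code) {
            case '9': return ch >= '0' && ch <= '9';
            case 'A': return ch >= 'A' && ch <= 'Z';
            case 'a': return ch >= 'a' && ch <= 'z';
            default:  return false;
        }
    }

    /** Classifies the keyed characters (humps are not typed by the user). */
    FieldState classify(String keyed) {
        int k = 0;                                  // index into keyed
        for (int i = 0; i < mask.length() && k < keyed.length(); i++) {
            char code = mask.charAt(i);
            if (!isSlot(code)) continue;            // skip humps
            if (!fits(code, keyed.charAt(k))) return FieldState.RED;
            k++;
        }
        if (k < keyed.length()) return FieldState.RED;   // more keys than slots
        long slots = mask.chars().filter(c -> isSlot((char) c)).count();
        return k == slots ? FieldState.GREEN : FieldState.YELLOW;
    }

    /** Renders the keyed characters with the humps filled in, for display. */
    String display(String keyed) {
        StringBuilder sb = new StringBuilder();
        int k = 0;
        for (int i = 0; i < mask.length() && k < keyed.length(); i++) {
            char code = mask.charAt(i);
            sb.append(isSlot(code) ? keyed.charAt(k++) : code);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        SlotMask phone = new SlotMask("(999) 999-9999");
        System.out.println(phone.display("604871"));     // (604) 871
        System.out.println(phone.classify("604"));        // YELLOW
        System.out.println(phone.classify("6048711166")); // GREEN
        System.out.println(phone.classify("60x"));        // RED
    }
}
```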
 
Roedy Green

If you look at those regexes, you may be able to create a single
regex that works for more than one length, e.g. one that ends with
[0-9]*, to reduce the total number of them you require.
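Taking that suggestion further for formats that are a plain sequence of fixed-count character classes (a restriction assumed here, not stated in the thread), the per-length regexes collapse into one alternation, and that alternation can be generated from the same description as the full regex, which speaks to Simon's wish to derive one from the other. The Segment class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class PrefixRegexBuilder {

    /** One fixed-count run of a character class, e.g. ("[A-Z]", 2) for [A-Z]{2}. */
    static final class Segment {
        final String charClass;
        final int count;
        Segment(String charClass, int count) { this.charClass = charClass; this.count = count; }
    }

    /** Regex for the complete value: c1{n1}c2{n2}... */
    static String fullRegex(List<Segment> segs) {
        StringBuilder sb = new StringBuilder();
        for (Segment s : segs) sb.append(s.charClass).append('{').append(s.count).append('}');
        return sb.toString();
    }

    /**
     * Regex accepting every prefix of a complete value, including the empty
     * string and the complete value itself:
     *   c1{0,n1} | c1{n1}c2{0,n2} | c1{n1}c2{n2}c3{0,n3} | ...
     */
    static String prefixRegex(List<Segment> segs) {
        StringBuilder alternatives = new StringBuilder();
        StringBuilder fixedSoFar = new StringBuilder();   // earlier segments, fully typed
        for (Segment s : segs) {
            if (alternatives.length() > 0) alternatives.append('|');
            alternatives.append("(?:").append(fixedSoFar)
                        .append(s.charClass).append("{0,").append(s.count).append("})");
            fixedSoFar.append(s.charClass).append('{').append(s.count).append('}');
        }
        return alternatives.toString();
    }

    public static void main(String[] args) {
        List<Segment> segs = Arrays.asList(new Segment("[A-Z]", 2), new Segment("[0-9]", 3));
        String full = fullRegex(segs);
        String prefix = prefixRegex(segs);
        System.out.println(full);   // [A-Z]{2}[0-9]{3}
        System.out.println(prefix); // (?:[A-Z]{0,2})|(?:[A-Z]{2}[0-9]{0,3})
        for (String s : new String[] {"AB1", "AB123", "A1"}) {
            String state = Pattern.matches(full, s) ? "GREEN"
                         : Pattern.matches(prefix, s) ? "YELLOW" : "RED";
            System.out.println(s + " -> " + state);
        }
    }
}
```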
 
David Segall

S J Kissane said:
Suppose I have a regular expression defined like this: [0-9A-F]{8}
Now, the question is, given a regular expression and a string,
how in Java can I determine if the string is a valid prefix of
the regular expression? I have looked at the java.util.regex.Matcher
API in Java SE 6, and I can't see a way of doing this.

I can see that your putative changes to Matcher would provide an elegant
solution to your problem, but they would require changing some return
values from boolean to something containing more information. Rather
than altering Java's method(s) or having multiple regular expressions
to test for your three return values, perhaps you could append a valid
string of the appropriate length to the input as a second test. In
your example, this approach is worse than using a second regular
expression to check for a valid prefix, but it may provide an easier
general solution.
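A sketch of the padding idea for the [0-9A-F]{8} example, assuming a fixed sample of a complete valid value (here 00000000, chosen only for illustration) whose tail is appended to the input. It works for this pattern because every position accepts the same characters independently; for formats where an earlier character constrains later ones, a single fixed filler can misreport. Names are hypothetical.

```java
import java.util.regex.Pattern;

class PaddingCheck {

    enum FieldState { GREEN, YELLOW, RED }

    static final Pattern VALID = Pattern.compile("[0-9A-F]{8}");
    // Assumed: a known complete valid value; its tail is used as padding.
    static final String SAMPLE = "00000000";

    static FieldState classify(String input) {
        if (VALID.matcher(input).matches()) {
            return FieldState.GREEN;
        }
        if (input.length() < SAMPLE.length()) {
            // Pad the input out to full length with the rest of the sample.
            String padded = input + SAMPLE.substring(input.length());
            if (VALID.matcher(padded).matches()) {
                return FieldState.YELLOW;
            }
        }
        return FieldState.RED;
    }

    public static void main(String[] args) {
        System.out.println(classify("09AB")); // YELLOW (tests "09AB0000")
        System.out.println(classify("09AG")); // RED    (tests "09AG0000")
    }
}
```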
 
Arved Sandstrom

S J Kissane said:
Indeed, such an approach would work. But, logically speaking, I only
need one regular expression to do this, not two. And by using two, I
need to manually construct the prefix regex based on the whole-string
regex, when logically the former can be derived from the latter.
[ SNIP ]

I really don't see you avoiding some non-RE conditional logic at some point.
If you're not so keen on 2 separate regular expressions, there is always:

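import java.util.regex.Matcher;
import java.util.regex.Pattern;

// stringToMatch is the current contents of the form field
Pattern p = Pattern.compile("([0-9A-F]{1,8})");
Matcher m = p.matcher(stringToMatch);

if (m.matches()) {
    // matches() must consume the whole input, so group(1) is the entire string
    int matchLen = m.group(1).length();
    if (matchLen < 8) {
        // do "yellow" stuff
    } else if (matchLen == 8) {
        // do "green" stuff
    }
} else {
    // do "red" stuff (an empty field also lands here, because of {1,8})
}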

AHS
 
Lasse Reichstein Nielsen

S J Kissane said:
We might say that, although the string does not match the
regular expression, it is a "valid prefix" of the regular expression.

Now, the question is, given a regular expression and a string,
how in Java can I determine if the string is a valid prefix of
the regular expression?

You can't. The Java RegExp library doesn't provide support for
what you need.

I suppose, if I wrote my own regular expression library
(or even just started with Sun's one and hacked it),
I could make this work... but I don't want to do that.

You could start out with an existing alternative RegExp
library. Perhaps <URL:http://www.brics.dk/automaton/>, which
is not a traditional RegExp library, but is more closely related
to the computer-science notions of regular languages and finite
automata.
However, it does have a prefix operation on automata:
<URL:http://www.brics.dk/automaton/doc/d...tml#prefixClose(dk.brics.automaton.Automaton)>

Good luck.
/L
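The idea behind a prefix-close operation can be illustrated without the library. The sketch below does not use the dk.brics.automaton API; it hand-builds a deterministic automaton for [0-9A-F]{8}, marks every state from which an accepting state is still reachable as live, and treats input that ends in a live but non-accepting state as a valid prefix. Class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class PrefixClosedDfa {

    enum FieldState { GREEN, YELLOW, RED }

    // transitions.get(state) maps an input char to the next state;
    // a missing entry means "no transition" (an implicit dead state).
    private final List<Map<Character, Integer>> transitions = new ArrayList<>();
    private final Set<Integer> accepting = new HashSet<>();
    private Set<Integer> live;   // states that can still reach an accepting state

    int addState(boolean accept) {
        transitions.add(new HashMap<>());
        int id = transitions.size() - 1;
        if (accept) accepting.add(id);
        return id;
    }

    void addTransition(int from, char c, int to) { transitions.get(from).put(c, to); }

    /** Backward reachability to acceptance, computed as a simple fixpoint. */
    private void computeLive() {
        live = new HashSet<>(accepting);
        boolean changed = true;
        while (changed) {
            changed = false;
            for (int s = 0; s < transitions.size(); s++) {
                if (live.contains(s)) continue;
                for (int t : transitions.get(s).values()) {
                    if (live.contains(t)) { live.add(s); changed = true; break; }
                }
            }
        }
    }

    FieldState classify(String input) {
        if (live == null) computeLive();
        int state = 0;                                  // state 0 is the start state
        for (char c : input.toCharArray()) {
            Integer next = transitions.get(state).get(c);
            if (next == null) return FieldState.RED;    // fell into the dead state
            state = next;
        }
        if (accepting.contains(state)) return FieldState.GREEN;
        return live.contains(state) ? FieldState.YELLOW : FieldState.RED;
    }

    /** Hand-built DFA for [0-9A-F]{8}: states 0..8, only state 8 accepting. */
    static PrefixClosedDfa hex8() {
        PrefixClosedDfa dfa = new PrefixClosedDfa();
        for (int i = 0; i <= 8; i++) dfa.addState(i == 8);
        for (int i = 0; i < 8; i++) {
            for (char c : "0123456789ABCDEF".toCharArray()) dfa.addTransition(i, c, i + 1);
        }
        return dfa;
    }

    public static void main(String[] args) {
        PrefixClosedDfa dfa = hex8();
        System.out.println(dfa.classify("09AB"));     // YELLOW
        System.out.println(dfa.classify("09AG"));     // RED
        System.out.println(dfa.classify("09ABCDEF")); // GREEN
    }
}
```

In this small automaton every state is live, so the fixpoint is trivial; it matters for automata containing states that can be entered but never lead to acceptance.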
 
