Is this efficient way to validate phrase against collection of regular expressions

O

Ossie

Note the output below. Is this a reasonable way to determine if a
phrase is
found within a collection of patterns? Is there a better way to manage
the
validation of the phrase against a collection of patterns? My fear is
this
is pretty intense way of validating the pattern of the phrase.


[omoore@skipper tmp]$ java MyExample "STK PIPE" "*PART" "this should
not qualify for exlcusion" "AA1234567890"
============================================
phrase: STK PIPE
exlclude? true
============================================
phrase: *PART
exlclude? true
============================================
phrase: this should not qualify for exlcusion
exlclude? false
============================================
phrase: AA1234567890
exlclude? true
============================================
[omoore@skipper tmp]$ cat MyExample.java
import java.util.*;
import java.util.regex.*;

public class MyExample
{
public static void main(String [] args)
{
System.out.println("============================================");
for ( int i = 0 ; i < args.length ; i++ )
{
System.out.println(" phrase: " + args );
System.out.println("exlclude? " + isPhraseInExcludeList(args) );
System.out.println("============================================");
}
}

public static void initPatternVector()
{
m_ExcludedPhraseList = new Vector(0);

//starts with 'STK '
m_ExcludedPhraseList.add("STK .*");
//starts with '*'
m_ExcludedPhraseList.add("[*].*");
//containing 'FRT'
m_ExcludedPhraseList.add(".*FRT.*");
//starts 2 upper & 6 nbrs
m_ExcludedPhraseList.add("[A-Z]{2}[0-9]{6}.*");
}

public static boolean isPhraseInExcludeList(String phrase)
{
if ( m_ExcludedPhraseList == null)
{
initPatternVector();
}

for ( int i = 0 ; i < m_ExcludedPhraseList.capacity() ; i++ )
{
String pattern = (String) m_ExcludedPhraseList.get(i);
if( Pattern.matches( pattern , phrase ) )
{
return true;
}
}

return false;
}

public static Vector m_ExcludedPhraseList;
}
[omoore@skipper tmp]$
 
E

Eric Bodden

Once upon a time <[email protected]>,
Ossie said:
Is there a better way to manage the
validation of the phrase against a collection of patterns? My fear is
this is pretty intense way of validating the pattern of the phrase.
No, I think that's pretty much optimal. You could use an arraylist instead
of a vector though. Vector is pretty slow. The matching itself however is
the most expensive part and that overhead cannot be overcome anyway.

Eric

--
 
C

Chris Uppal

Ossie said:
Note the output below. Is this a reasonable way to determine if a
phrase is
found within a collection of patterns?

I'm afraid not. Two reasons:


First, this line:

if( Pattern.matches( pattern , phrase ) )


creates a new pattern matcher object for every string that you want to check.
Creating a pattern matcher is a relatively expensive operation (much more
expensive than using it to check a short string), so you should create
instances of java.util.Matcher as part of your settup, and then re-use them for
each input string you want to check.

Secondly, in theory (i.e. I have not intention of testing it), the time taken
to check whether a string matches a regexp is proportional to the length of the
string, independently of the complexity of the regexp[*]. So I'd expect that
it would be quicker to generate single regexp that was an alternation of all
your exclusion cases. That *should* execute as fast as any one of the separate
tests in your code.

I've never looked at Java's regexp implementation, but if the syntax is similar
to what I'm used to in other fields, then a single exclusion pattern like:

"(STK .*)|([*].*)|(.*FRT.*)|([A-Z]{2}[0-9]{6}.*)"

would do the trick.

-- chris

[*] That's to say that the theory of how to do it is well-known. However there
are tradeoffs of time vs. space and any particular regexp implementation may,
in order to save space, use an implementation that isn't constant-time in the
complexity of the regexp. I don't know what -- say -- Sun's implementation
does, but I'd *hope* that it is constant-time.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,755
Messages
2,569,536
Members
45,020
Latest member
GenesisGai

Latest Threads

Top