Convert wildcards -> regex

Chris

I'd like to do string matching using wildcards (*, ?). I'm thinking the
easiest way to do it is to convert a pattern with wildcards to regex syntax,
and then use the new regex methods in 1.4 to do the match.

Does anyone know of some code to do the conversion?
 
Jarkko Viinamäki

Try this:

/**
 * Converts wildcard expression to regular expression. In wildcard format,
 * "*" = 0-N characters and "?" = any one character. Wildcards can be used
 * easily with JDK 1.4 by converting them to regexps:
 * <p>
 * <code>
 * import java.util.regex.Pattern;
 *
 * Pattern p = Pattern.compile(wildcardToRegexp("*.jpg"));
 * </code>
 *
 * @param wildcard wildcard expression string
 * @return given wildcard expression as regular expression
 */
public String wildcardToRegexp(String wildcard)
{
    StringBuffer s = new StringBuffer(wildcard.length());
    s.append('^');
    for (int i = 0, is = wildcard.length(); i < is; i++)
    {
        char c = wildcard.charAt(i);
        switch (c)
        {
            case '*':
                s.append('.');
                s.append('*');
                break;
            case '?':
                s.append('.');
                break;
            // escape special regexp-characters
            case '(': case ')': case '[': case ']': case '$':
            case '^': case '.': case '{': case '}': case '|':
            case '\\':
                s.append('\\');
                s.append(c);
                break;
            default:
                s.append(c);
                break;
        }
    }
    s.append('$');
    return s.toString();
}

HTH,
Jarkko
 
hiwa

Chris said:
I'd like to do string matching using wildcards (*, ?). I'm thinking the
easiest way to do it is to convert a pattern with wildcards to regex syntax,
and then use the new regex methods in 1.4 to do the match.

Does anyone know of some code to do the conversion?

My humble routine:

/* Handles * and ? only; of the regex metacharacters, only '.' is escaped. */
public String wild2regx(String wstr) {
    StringBuffer sb = new StringBuffer();

    for (int i = 0; i < wstr.length(); ++i) {
        char c = wstr.charAt(i);
        if (c == '*') {
            sb.append(".*");
        } else if (c == '?') {
            sb.append('.');
        } else if (c == '.') {
            sb.append("\\.");
        } else {
            sb.append(c);
        }
    }
    return sb.toString();
}
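
Both routines above depend on listing the regex metacharacters by hand, and it is easy to miss one. As a rough sketch of another option (not something posted in this thread), the literal runs between wildcards can be passed through Pattern.quote, so nothing has to be enumerated. Note the assumptions: Pattern.quote only exists from Java 5 onwards, so this does not work on 1.4, and the class and method names (WildcardQuote, toRegex) are made up for illustration.

```java
import java.util.regex.Pattern;

final class WildcardQuote {
    /** Hypothetical helper: converts a wildcard expression to a regex string. */
    static String toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder(); // pending run of ordinary characters
        for (int i = 0; i < wildcard.length(); i++) {
            char c = wildcard.charAt(i);
            if (c == '*' || c == '?') {
                // Flush the literal run collected so far, quoted as a whole.
                if (literal.length() > 0) {
                    regex.append(Pattern.quote(literal.toString()));
                    literal.setLength(0);
                }
                regex.append(c == '*' ? ".*" : ".");
            } else {
                literal.append(c);
            }
        }
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
        }
        return regex.toString();
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile(toRegex("a+b?.txt"));
        System.out.println(p.matcher("a+bc.txt").matches()); // true: '+' is treated literally
        System.out.println(p.matcher("aab.txt").matches());  // false: "a+b" is not a quantifier here
    }
}
```

The resulting regex is harder to read because of the \Q...\E quoting, so the hand-written switch may still be preferable when the pattern has to be shown to a person.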
 
Thomas Schodt

Ike said:
Does anyone have it going the other way, i.e. regex to wildcards too? -Ike

A wildcard pattern can always be expressed as a regex.
Only a very limited set of regexes can be expressed as wildcard patterns.
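
To make the "very limited set" concrete, here is a minimal sketch of the reverse direction. It is an illustration only, under these assumptions: the regex is meant for whole-string matching (so a leading ^ and trailing $ are redundant), only literals, ".", ".*" and backslash-escaped punctuation are accepted, and anything else yields null. The class and method names (RegexToWildcard, convert) are hypothetical. It also ignores the fact that "." in a regex does not match line terminators by default.

```java
final class RegexToWildcard {
    // Unescaped characters that make a regex inexpressible as a plain wildcard.
    private static final String META = "[](){}+|^$*?";

    /** Returns the equivalent wildcard, or null if the regex goes beyond the tiny subset. */
    static String convert(String regex) {
        StringBuilder out = new StringBuilder();
        int n = regex.length();
        for (int i = 0; i < n; i++) {
            char c = regex.charAt(i);
            if (c == '\\') {
                if (i + 1 >= n) return null;          // dangling backslash
                char next = regex.charAt(++i);
                // \d, \w, \Q and friends are constructs, not literals; an escaped
                // * or ? is a literal that a wildcard without escapes cannot spell.
                if (Character.isLetterOrDigit(next) || next == '*' || next == '?') return null;
                out.append(next);
            } else if (c == '.') {
                if (i + 1 < n && regex.charAt(i + 1) == '*') {
                    out.append('*');
                    i++;
                } else {
                    out.append('?');
                }
            } else if (c == '^' && i == 0) {
                // leading anchor: redundant for whole-string matching
            } else if (c == '$' && i == n - 1) {
                // trailing anchor: redundant for whole-string matching
            } else if (META.indexOf(c) >= 0) {
                return null;                           // quantifier, class, group, alternation...
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(convert("^.*\\.jpg$")); // *.jpg
        System.out.println(convert("a.b"));        // a?b
        System.out.println(convert("[0-9]+"));     // null
    }
}
```

The sketch is deliberately conservative: some regexes it rejects (for example ".*?" under whole-string matching) do have a wildcard equivalent.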
 
skeptic

Chris said:
I'd like to do string matching using wildcards (*, ?). I'm thinking the
easiest way to do it is to convert a pattern with wildcards to regex syntax,
and then use the new regex methods in 1.4 to do the match.

Does anyone know of some code to do the conversion?

Here is an excerpt from the source of
http://jregex.sourceforge.net/api/jregex/WildcardPattern.html:

public class WildcardPattern extends Pattern{

    //a wildcard class, see WildcardPattern(String,String,int)
    public static final String WORD_CHAR="\\w";

    //a wildcard class, see WildcardPattern(String,String,int)
    public static final String ANY_CHAR=".";

    private static final String defaultSpecials="[]().{}+|^$\\";
    private static final String defaultWcClass=ANY_CHAR;

    protected static String convertSpecials(String s,String wcClass,String specials){
        int len=s.length();
        StringBuffer sb=new StringBuffer();
        for(int i=0;i<len;i++){
            char c=s.charAt(i);
            switch(c){
                case '*':
                    sb.append("(");
                    sb.append(wcClass);
                    sb.append("*)");
                    break;
                case '?':
                    sb.append("(");
                    sb.append(wcClass);
                    sb.append(")");
                    break;
                default:
                    if(specials.indexOf(c)>=0) sb.append('\\');
                    sb.append(c);
            }
        }
        return sb.toString();
    }

    private String str;

    /**
     * @param wc The pattern
     */
    public WildcardPattern(String wc){
        this(wc,true);
    }

    /**
     * @param wc The pattern
     * @param icase If true, the pattern is case-insensitive.
     */
    public WildcardPattern(String wc,boolean icase){
        this(wc,icase? DEFAULT|IGNORE_CASE: DEFAULT);
    }

    /**
     * @param wc The pattern
     * @param flags The bitwise OR of any of REFlags.* . The only meaningful
     * flags are REFlags.IGNORE_CASE and REFlags.DOTALL (the latter allows
     * the wildcards to match the EOL characters).
     */
    public WildcardPattern(String wc,int flags){
        compile(wc,defaultWcClass,defaultSpecials,flags);
    }

    /**
     * @param wc The pattern
     * @param wcClass The wildcard class, could be any of WORD_CHAR or ANY_CHAR
     * @param flags The bitwise OR of any of REFlags.* . The only meaningful
     * flags are REFlags.IGNORE_CASE and REFlags.DOTALL (the latter allows
     * the wildcards to match the EOL characters).
     */
    public WildcardPattern(String wc,String wcClass,int flags){
        compile(wc,wcClass,defaultSpecials,flags);
    }

    protected void compile(String wc,String wcClass,String specials,int flags){
        String converted=convertSpecials(wc,wcClass,specials);
        try{
            compile(converted,flags);
        }
        catch(PatternSyntaxException e){
            //something unexpected
            throw new Error(e.getMessage()+"; original expr: "+wc+", converted: "+converted);
        }
        str=wc;
    }
.....

Adapting it to java.util.regex is a piece of cake.
Hope this helps.

Regards
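
For anyone wanting to see what such an adaptation might look like, here is a minimal sketch. It assumes the same conversion rules as the excerpt above with the wildcard class fixed to "." and is not taken from jregex itself. Since java.util.regex.Pattern is final and cannot be subclassed, the sketch uses a static factory; the names WildcardPatterns and compile are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class WildcardPatterns {
    // Same set of characters the jregex excerpt escapes.
    private static final String SPECIALS = "[]().{}+|^$\\";

    /** Hypothetical factory: builds a java.util.regex.Pattern from a wildcard. */
    static Pattern compile(String wildcard, boolean ignoreCase) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < wildcard.length(); i++) {
            char c = wildcard.charAt(i);
            switch (c) {
                case '*':
                    sb.append("(.*)");   // capturing group: text matched by '*'
                    break;
                case '?':
                    sb.append("(.)");    // capturing group: character matched by '?'
                    break;
                default:
                    if (SPECIALS.indexOf(c) >= 0) sb.append('\\');
                    sb.append(c);
            }
        }
        return Pattern.compile(sb.toString(), ignoreCase ? Pattern.CASE_INSENSITIVE : 0);
    }

    public static void main(String[] args) {
        Matcher m = compile("*.J?G", true).matcher("holiday.jpg");
        if (m.matches()) {
            // Each wildcard is a group, so the matched pieces can be read back.
            System.out.println(m.group(1) + " / " + m.group(2)); // prints "holiday / p"
        }
    }
}
```

Pattern.DOTALL can be OR-ed into the flags if the wildcards should also match line terminators, which mirrors the REFlags.DOTALL note in the excerpt.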
 
Chris

Jarkko Viinamäki said:
Try this:

/**
 * Converts wildcard expression to regular expression. In wildcard format,
 * "*" = 0-N characters and "?" = any one character.

< snip >

Thanks. I think this is the most elegant solution.

I made a couple of small mods:

-- added "case '+':" as a special char
-- made the opening and closing ^ and $ optional, in case you're not
matching a whole line.
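
The modified code was not posted, so the following is only a guess at what the two changes might look like when applied to Jarkko's routine. The boolean parameter name (anchored) and the wrapper class (WildcardMods) are assumptions made for illustration.

```java
import java.util.regex.Pattern;

final class WildcardMods {
    /**
     * Variant of wildcardToRegexp with '+' escaped and with the
     * surrounding ^ and $ emitted only when anchored is true.
     */
    static String wildcardToRegexp(String wildcard, boolean anchored) {
        StringBuilder s = new StringBuilder(wildcard.length() + 2);
        if (anchored) s.append('^');
        for (int i = 0, is = wildcard.length(); i < is; i++) {
            char c = wildcard.charAt(i);
            switch (c) {
                case '*':
                    s.append(".*");
                    break;
                case '?':
                    s.append('.');
                    break;
                // escape special regexp-characters, now including '+'
                case '(': case ')': case '[': case ']': case '$':
                case '^': case '.': case '{': case '}': case '|':
                case '\\': case '+':
                    s.append('\\');
                    s.append(c);
                    break;
                default:
                    s.append(c);
                    break;
            }
        }
        if (anchored) s.append('$');
        return s.toString();
    }

    public static void main(String[] args) {
        // Anchored: the pattern has to cover the whole input.
        Pattern whole = Pattern.compile(wildcardToRegexp("c++?", true));
        System.out.println(whole.matcher("c++x").find());           // true
        // Unanchored: find() may match a fragment of a longer line.
        Pattern part = Pattern.compile(wildcardToRegexp("*.jpg", false));
        System.out.println(part.matcher("see a.jpg here").find());  // true
    }
}
```

Note that the anchors only matter with Matcher.find() or lookingAt(); Matcher.matches() always requires the whole input to match, with or without them.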
 
