multiple pattern replacement using regular expressions


Jarkko Viinamäki

Hello fellow Java-coders,

I needed to write an efficient pattern replacement class that could use
regular expressions, and since I managed to get a working version done, I
thought I'd share it. This kind of tool is useful in many applications. For
instance, web-based bulletin board systems usually need to do lots of
replacements to remove obscene words and to convert custom tags to
HTML code.

If you have improvement ideas or notice a bug, please reply to this thread.
Thank you!

PS. If anyone knows an efficient way to do the same thing with streams,
I'd be eager to hear your ideas. The main problem with streams is that you
basically cannot read the stream N times (N being the number of patterns);
you have to do everything in one pass. The example code below uses the
N-pass technique, since Strings aren't usually that long (< 65 KB).

Jarkko

----


import java.util.regex.Pattern;
import java.util.regex.Matcher;
import java.util.*;

/**
* Replaces multiple patterns within given string based on regular expressions!
* <p>
* Enhanced memory management - doesn't create temporary objects and uses only two StringBuffers
* which can be reused in subsequent calls.
* <p>
* This class is thread safe.
*
* @see <a href="http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Matcher.html">java.util.regex.Matcher</a>
* @see <a href="http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Pattern.html">java.util.regex.Pattern</a>
*
* @since JDK 1.4
* @version $Id: PatternReplacer.java,v 1.3 2004/02/20 15:44:03 viinajar Exp $
* @author Jarkko Viinamäki [jviinama at cc.hut.fi]
*/
public class PatternReplacer
{
private final StringBuffer _inputBuffer = new StringBuffer();
private final StringBuffer _workBuffer = new StringBuffer();
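// LinkedHashMap keeps insertion order, so patterns are applied in the order they were added
// (a plain HashMap iterates in an unspecified order)
private final Map _replaceMap = new LinkedHashMap();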
private boolean _caseSensitive = true;

/**
* Constructor
*/
public PatternReplacer()
{
}

/**
* @param is if false, matching is case-insensitive
*/
public void setCaseSensitive(boolean is)
{
_caseSensitive = is;
}

/**
* @param pattern a simple string or a regular expression
* @param replacement String to be inserted into matching position. Note that you can use $0 to
* refer to the matching text.
*/
public synchronized void addPattern(String pattern, String replacement)
{
Pattern p;
// precompile the pattern
if( _caseSensitive == false )
p = Pattern.compile(pattern, Pattern.CASE_INSENSITIVE);
else
p = Pattern.compile(pattern);
_replaceMap.put(p, replacement);
}

/**
* Processes given string and replaces all matching patterns.
*
* @param input string to be transformed
* @return resulting string
*/
public synchronized String replace(String input)
{
Iterator it = _replaceMap.keySet().iterator();

_inputBuffer.setLength(0);
_inputBuffer.append(input);

_workBuffer.setLength(0);
_workBuffer.ensureCapacity(input.length());

while(it.hasNext())
{
Pattern pattern = (Pattern)it.next();
replaceInto(pattern, (String)_replaceMap.get(pattern), _inputBuffer, _workBuffer);
_inputBuffer.setLength(0);
_inputBuffer.append(_workBuffer);
}
return(_inputBuffer.toString());
}

/**
* Performs replacement for given pattern
*
* @param pattern pattern to look for
* @param replacement string to use instead
* @param input current status of the input string
* @param work temporary buffer for writing the results
*/
private void replaceInto(Pattern pattern, String replacement, StringBuffer input, StringBuffer work)
{
work.ensureCapacity(input.length());
work.setLength(0);
Matcher m = pattern.matcher(input);

int end = 0;
while (m.find())
{
m.appendReplacement(work, replacement);
end = m.end();
}
// we could call substring(int) but that would create a new String object
// and we want to avoid memory allocation. Looping here is very fast anyway
for(int i = end, is = input.length(); i < is; i++)
work.append(input.charAt(i));
}

public static void main(String args[])
{
String s = "The quick brown fox jumped over the www.hut.fi lazy dog's back.";

PatternReplacer ms = new PatternReplacer();
ms.addPattern("quick", "slow");
ms.addPattern("jump", "walk");
ms.addPattern("lazy", "hard working");
ms.addPattern("brown", "red");
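// escape the dot so it matches a literal '.', and close the anchor tag with the URL as link text
ms.addPattern("www\\.[^ ]+", "<a href=\"http://$0\">$0</a>");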
System.out.println(ms.replace(s));
}
}
 

Daniel Sjöblom

Jarkko said:

PS. If anyone knows an efficient way to do this same thing with streams,
I'll be eager to learn your ideas. The main problem with streams is that you
basically cannot read the stream N times (N being the number of patterns)
but rather you should do it with one pass. For instance this example code
uses N pass technique since Strings aren't usually that long (< 65kb).

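This is actually the source of a big bug in the class. You must replace
everything in one pass, because with N passes each pattern is applied to the
output of the previous ones, so text inserted by one replacement can be
matched and replaced again by another. For example, this results in the
wrong behaviour: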

String s = "The quick brown fox jumped over the www.hut.fi lazy dog's back.";

PatternReplacer ms = new PatternReplacer();
ms.addPattern("fox", "dog");
ms.addPattern("dog", "fox");
System.out.println(ms.replace(s));
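
One possible way to get single-pass behaviour is to join all patterns into one alternation, wrap each in its own capturing group, and check which group matched for every hit. The following is only a minimal sketch under stated assumptions, not the code from the original post: the class name SinglePassReplacer is made up for illustration, the individual patterns are assumed to contain no capturing groups of their own (they would shift the group numbers), and replacements are treated as literal text, so $0 is not expanded here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical single-pass variant; assumes patterns contain no capturing groups.
class SinglePassReplacer {
    private final List<String> patterns = new ArrayList<>();
    private final List<String> replacements = new ArrayList<>();

    public void addPattern(String pattern, String replacement) {
        patterns.add(pattern);
        replacements.add(replacement);
    }

    public String replace(String input) {
        if (patterns.isEmpty()) {
            return input;
        }
        // Build (p1)|(p2)|... so that group i+1 corresponds to pattern i.
        StringBuilder combined = new StringBuilder();
        for (int i = 0; i < patterns.size(); i++) {
            if (i > 0) {
                combined.append('|');
            }
            combined.append('(').append(patterns.get(i)).append(')');
        }
        Matcher m = Pattern.compile(combined.toString()).matcher(input);
        StringBuffer out = new StringBuffer(input.length());
        while (m.find()) {
            for (int i = 0; i < patterns.size(); i++) {
                // A group that did not take part in the match returns null.
                if (m.group(i + 1) != null) {
                    // quoteReplacement makes the replacement literal ($ and \ are not special).
                    m.appendReplacement(out, Matcher.quoteReplacement(replacements.get(i)));
                    break;
                }
            }
        }
        // Copy whatever follows the last match.
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        SinglePassReplacer r = new SinglePassReplacer();
        r.addPattern("fox", "dog");
        r.addPattern("dog", "fox");
        // prints: The quick brown dog jumped over the www.hut.fi lazy fox's back.
        System.out.println(r.replace(
            "The quick brown fox jumped over the www.hut.fi lazy dog's back."));
    }
}
```

Because the input is scanned only once, replaced text is never rescanned, so "fox" and "dog" are swapped rather than both ending up as the same word. When two patterns could match at the same position, the one added first wins, since alternation tries the alternatives from left to right.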
 
