multiple pattern replacement using regular expressions

Discussion in 'Java' started by Jarkko Viinamäki, Feb 20, 2004.

  1. Hello fellow Java-coders,

    I needed to write an efficient pattern replacement class which could use
    regular expressions and since I managed to get some kind of version done, I
    thought I'd share it. This tool is very useful in many applications. For
    instance web-based bulletin board systems usually need to do lots of
    replacements to remove obscene words and to convert different custom tags to
    HTML code.

    If you have improvement ideas or notice a bug, please reply to this thread.
    Thank you!

    PS. If anyone knows an efficient way to do the same thing with streams,
    I'd be eager to hear your ideas. The main problem with streams is that you
    basically cannot read the stream N times (N being the number of patterns);
    you have to do it in one pass. The example code below makes N passes over
    the input, which is fine for Strings since they usually aren't that long
    (< 65 KB).

    Jarkko

    ----


    import java.util.Iterator;
    import java.util.LinkedHashMap;
    import java.util.Map;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    /**
     * Replaces multiple patterns within a given string based on regular expressions.
     * <p>
     * Memory-conscious: reuses two StringBuffers across calls instead of
     * allocating new buffers for every replacement.
     * <p>
     * All public methods are synchronized, so one instance can be shared between threads.
     *
     * @see <a href="http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Matcher.html">java.util.regex.Matcher</a>
     * @see <a href="http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Pattern.html">java.util.regex.Pattern</a>
     *
     * @since JDK 1.4
     * @version $Id: PatternReplacer.java,v 1.3 2004/02/20 15:44:03 viinajar Exp $
     * @author Jarkko Viinamäki [jviinama at cc.hut.fi]
     */
    public class PatternReplacer
    {
        private final StringBuffer _inputBuffer = new StringBuffer();
        private final StringBuffer _workBuffer = new StringBuffer();
        // LinkedHashMap keeps insertion order, so patterns are applied in the
        // order they were added (a plain HashMap gives no ordering guarantee).
        private final Map _replaceMap = new LinkedHashMap();
        private boolean _caseSensitive = true;

        /**
         * Constructor
         */
        public PatternReplacer()
        {
        }

        /**
         * Only affects patterns added after this call.
         *
         * @param is if false, matching is case-insensitive
         */
        public synchronized void setCaseSensitive(boolean is)
        {
            _caseSensitive = is;
        }

        /**
         * @param pattern a simple string or a regular expression
         * @param replacement String to be inserted into the matching position.
         *        Note that you can use $0 to refer to the matching text; a
         *        literal $ or \ must be escaped with a backslash.
         */
        public synchronized void addPattern(String pattern, String replacement)
        {
            // precompile the pattern
            int flags = _caseSensitive ? 0 : Pattern.CASE_INSENSITIVE;
            _replaceMap.put(Pattern.compile(pattern, flags), replacement);
        }

        /**
         * Processes the given string and replaces all matching patterns,
         * one pattern per pass.
         *
         * @param input string to be transformed
         * @return resulting string
         */
        public synchronized String replace(String input)
        {
            _inputBuffer.setLength(0);
            _inputBuffer.append(input);

            Iterator it = _replaceMap.entrySet().iterator();
            while (it.hasNext())
            {
                Map.Entry entry = (Map.Entry) it.next();
                replaceInto((Pattern) entry.getKey(), (String) entry.getValue(),
                            _inputBuffer, _workBuffer);
                // the output of this pass becomes the input of the next one
                _inputBuffer.setLength(0);
                _inputBuffer.append(_workBuffer);
            }
            return _inputBuffer.toString();
        }

        /**
         * Performs replacement for the given pattern
         *
         * @param pattern pattern to look for
         * @param replacement string to use instead
         * @param input current status of the input string
         * @param work temporary buffer for writing the results
         */
        private void replaceInto(Pattern pattern, String replacement,
                                 StringBuffer input, StringBuffer work)
        {
            work.setLength(0);
            work.ensureCapacity(input.length());
            Matcher m = pattern.matcher(input);

            int end = 0;
            while (m.find())
            {
                m.appendReplacement(work, replacement);
                end = m.end();
            }
            // Copy the text after the last match. We could call substring(int),
            // but that would create a new String object and we want to avoid
            // memory allocation. Looping here is very fast anyway.
            // (Matcher.appendTail(work) would produce the same result.)
            for (int i = end, is = input.length(); i < is; i++)
                work.append(input.charAt(i));
        }

        public static void main(String args[])
        {
            String s = "The quick brown fox jumped over the www.hut.fi lazy dog's back.";

            PatternReplacer ms = new PatternReplacer();
            ms.addPattern("quick", "slow");
            ms.addPattern("jump", "walk");
            ms.addPattern("lazy", "hard working");
            ms.addPattern("brown", "red");
            // dot escaped so it matches a literal '.', and the anchor is closed
            ms.addPattern("www\\.[^ ]+", "<a href=\"http://$0\">$0</a>");
            System.out.println(ms.replace(s));
        }
    }
     
    Jarkko Viinamäki, Feb 20, 2004

  2. Jarkko Viinamäki wrote:
    > PS. If anyone knows an efficient way to do this same thing with streams,
    > I'll be eager to learn your ideas. The main problem with streams is that you
    > basically cannot read the stream N times (N being the number of patterns)
    > but rather you should do it with one pass. For instance this example code
    > uses N pass technique since Strings aren't usually that long (< 65kb).


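    This is actually the source of a big bug in the class. You must replace
    everything in one pass. With N passes, each pattern is applied to the output
    of the previous one, so text produced by one replacement can be rewritten
    by a later pattern. For example, trying to swap two words gives the wrong
    result, because both words end up the same instead of being exchanged:

    String s = "The quick brown fox jumped over the www.hut.fi lazy dog's back.";

    PatternReplacer ms = new PatternReplacer();
    ms.addPattern("fox", "dog");
    ms.addPattern("dog", "fox");
    System.out.println(ms.replace(s));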
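
    A minimal sketch of one way to do the replacement in a single pass: join
    all patterns into one alternation, wrap each in a named group, and pick the
    replacement by checking which group matched. The class name
    SinglePassReplacer and the group names p0, p1, ... are made up for this
    illustration and are not part of the posted class. It assumes Java 7 or
    later (named groups), treats replacement strings literally ($0 is not
    expanded), and does not handle patterns that contain numbered
    backreferences, since wrapping shifts the group numbers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical single-pass variant; not the PatternReplacer posted above.
class SinglePassReplacer {
    private final List<String> patterns = new ArrayList<String>();
    private final List<String> replacements = new ArrayList<String>();

    public void addPattern(String regex, String replacement) {
        patterns.add(regex);
        replacements.add(replacement);
    }

    public String replace(String input) {
        if (patterns.isEmpty()) {
            return input;
        }
        // Build (?<p0>first)|(?<p1>second)|... so one scan tries every pattern.
        StringBuilder combined = new StringBuilder();
        for (int i = 0; i < patterns.size(); i++) {
            if (i > 0) {
                combined.append('|');
            }
            combined.append("(?<p").append(i).append('>')
                    .append(patterns.get(i)).append(')');
        }
        Matcher m = Pattern.compile(combined.toString()).matcher(input);
        StringBuffer out = new StringBuffer(input.length());
        while (m.find()) {
            // Exactly one alternative matched; find it and use its replacement.
            for (int i = 0; i < patterns.size(); i++) {
                if (m.group("p" + i) != null) {
                    // quoteReplacement makes the replacement text literal.
                    m.appendReplacement(out,
                            Matcher.quoteReplacement(replacements.get(i)));
                    break;
                }
            }
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        SinglePassReplacer r = new SinglePassReplacer();
        r.addPattern("fox", "dog");
        r.addPattern("dog", "fox");
        System.out.println(r.replace("The quick brown fox jumped over the lazy dog's back."));
    }
}
```

    Because the replaced text is written straight to the output and never
    scanned again, "fox" and "dog" are exchanged rather than collapsed into
    one word. When two patterns could match at the same position, the one
    added first wins.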

    --
    Daniel Sjöblom
     
    Daniel Sjöblom, Feb 22, 2004
