Replace (remove) multiple chars from string.

Discussion in 'Java' started by Fredrik, Oct 13, 2004.

  1. Fredrik

    Fredrik Guest

    I'm trying to remove unwanted characters from a string in order to
    make it XML-compatible. The characters that should be avoided are:

    [#x7F-#x84], [#x86-#x9F], [#xFDD0-#xFDDF],
    [#x1FFFE-#x1FFFF], [#x2FFFE-#x2FFFF], [#x3FFFE-#x3FFFF],
    [#x4FFFE-#x4FFFF], [#x5FFFE-#x5FFFF], [#x6FFFE-#x6FFFF],
    [#x7FFFE-#x7FFFF], [#x8FFFE-#x8FFFF], [#x9FFFE-#x9FFFF],
    [#xAFFFE-#xAFFFF], [#xBFFFE-#xBFFFF], [#xCFFFE-#xCFFFF],
    [#xDFFFE-#xDFFFF], [#xEFFFE-#xEFFFF], [#xFFFFE-#xFFFFF],
    [#x10FFFE-#x10FFFF].

    (http://www.w3.org/TR/REC-xml/#charsets)

    I've been looking at replaceAll, a charAt loop, regex, etc. Since I'm
    going to process rather large strings, I'm looking for the most
    efficient way to do the replacement.
    All suggestions are much appreciated.

    (A perhaps related question: how can I use loop-generated \u escape
    codes? ("\u00" + i) naturally results in a compiler error, since the
    escape code is incomplete.)
     
    Fredrik, Oct 13, 2004
    #1

  2. A loop with a carefully crafted set of tests, lookup tables etc. to
    verify each character, and a StringBuilder or pre-allocated char[] to
    place the valid characters into.
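
    A minimal sketch of such a loop, assuming the ranges listed in the
    question (they should be checked against the edition of the XML
    recommendation being targeted). The class and method names
    XmlCharFilter, isDiscouraged and strip are made up for illustration.
    It walks the string by code point, because the ranges above #xFFFF
    are stored as surrogate pairs in a Java String.

```java
final class XmlCharFilter {

    // True for code points in the ranges listed in the question.
    static boolean isDiscouraged(int cp) {
        if (cp >= 0x7F && cp <= 0x84) return true;
        if (cp >= 0x86 && cp <= 0x9F) return true;
        if (cp >= 0xFDD0 && cp <= 0xFDDF) return true;
        // Last two code points of planes 1..16: the low 16 bits are
        // 0xFFFE or 0xFFFF, so one mask test covers all 16 ranges.
        return cp >= 0x10000 && (cp & 0xFFFE) == 0xFFFE;
    }

    // Returns a copy of 'in' without the discouraged code points.
    static String strip(String in) {
        StringBuilder out = new StringBuilder(in.length());
        for (int i = 0; i < in.length(); ) {
            int cp = in.codePointAt(i);
            if (!isDiscouraged(cp)) {
                out.appendCodePoint(cp);
            }
            // Advance by 1 char, or 2 for a surrogate pair.
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}
```

    Sizing the StringBuilder to the input length up front avoids
    regrowing the buffer, since the output can never be longer than the
    input.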

    For the tests I would give the binary representation of the char codes
    in the illegal ranges a good look. Maybe you can find bit patterns to
    test for one or more ranges with simple binary operations.

    As for the loop-generated \u escape codes: why do you want to? Just
    cast the int to a char.
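
    The cast suggested for the loop-generated \u escape question might
    look like the sketch below. EscapeLoop and charsInRange are
    hypothetical names, and the cast only makes sense for values up to
    0xFFFF; for larger code points Character.toChars(int) would be
    needed instead.

```java
final class EscapeLoop {

    // Builds a string containing every char from 'from' to 'to' inclusive.
    // Assumes 0 <= from <= to <= 0xFFFF.
    static String charsInRange(int from, int to) {
        StringBuilder sb = new StringBuilder(to - from + 1);
        for (int i = from; i <= to; i++) {
            // (char) i is the character a literal \u escape with the
            // same hex value would denote; no escape syntax is needed.
            sb.append((char) i);
        }
        return sb.toString();
    }
}
```

    For example, charsInRange(0x41, 0x43) yields "ABC".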

    /Thomas
     
    Thomas Weidenfeller, Oct 13, 2004
    #2

  3. Fredrik

    Fredrik Guest

    Char cast, why didn't I think of that... :)

    I tried a set of nested if-else tests, ordered with character
    frequency in mind, and it seems to work OK.
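
    The actual code was not posted, so the following is only a guess at
    the shape of such a frequency-ordered test. It assumes that plain
    ASCII is by far the most common input, so that case is decided with a
    single comparison. OrderedCheck is a hypothetical name, and the
    ranges are the ones from the first post.

```java
final class OrderedCheck {

    // Same ranges as in the question, but tested in ascending order so
    // the (assumed) most frequent characters exit first.
    static boolean isDiscouraged(int cp) {
        if (cp < 0x7F) {
            return false;                    // ASCII: one comparison
        } else if (cp <= 0x9F) {
            return cp != 0x85;               // 0x7F-0x9F except 0x85
        } else if (cp < 0xFDD0) {
            return false;                    // bulk of the BMP
        } else if (cp <= 0xFDDF) {
            return true;                     // 0xFDD0-0xFDDF
        } else if (cp < 0x10000) {
            return false;                    // rest of the BMP
        } else {
            return (cp & 0xFFFE) == 0xFFFE;  // last two of each plane
        }
    }
}
```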

    Thanks!
     
    Fredrik, Oct 14, 2004
    #3
