Replace (remove) multiple chars from string.

Discussion in 'Java' started by Fredrik, Oct 13, 2004.

  1. Fredrik wrote:

    I'm trying to remove unwanted chars from a string in order to make it
    XML-compatible. The chars that should be avoided are:

    [#x7F-#x84], [#x86-#x9F], [#xFDD0-#xFDDF],
    [#x1FFFE-#x1FFFF], [#x2FFFE-#x2FFFF], [#x3FFFE-#x3FFFF],
    [#x4FFFE-#x4FFFF], [#x5FFFE-#x5FFFF], [#x6FFFE-#x6FFFF],
    [#x7FFFE-#x7FFFF], [#x8FFFE-#x8FFFF], [#x9FFFE-#x9FFFF],
    [#xAFFFE-#xAFFFF], [#xBFFFE-#xBFFFF], [#xCFFFE-#xCFFFF],
    [#xDFFFE-#xDFFFF], [#xEFFFE-#xEFFFF], [#xFFFFE-#xFFFFF],
    [#x10FFFE-#x10FFFF].

    (http://www.w3.org/TR/REC-xml/#charsets)

    I've been looking at replaceAll, a charAt loop, regex etc. Since I'm
    going to process rather large strings, I'm looking for the most
    efficient replace algorithm.
    All suggestions are much appreciated.

    (a perhaps related question: How can I use loop-generated \u escape
    codes? ("\u00" + i) naturally results in a compiler error message,
    since the escape code is incomplete)
     
    Fredrik, Oct 13, 2004
    #1

  2. Fredrik wrote:
    > I've been looking at replaceAll, a charAt loop, regex etc. Since I'm
    > going to process rather large strings, I'm looking for the most
    > efficient replace algorithm.


    A loop with a carefully crafted set of tests, lookup tables etc. to
    verify each character, plus a StringBuilder or pre-allocated char[] to
    place the valid characters into.

    For the tests I would give the binary representation of the char codes
    in the illegal ranges a good look. Maybe you can find bit patterns to
    test for one or more ranges with simple binary operations.
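
    As a rough illustration of the loop and the bit-pattern idea above, here
    is a minimal sketch. The class and method names (XmlCharFilter,
    isDiscouraged, strip) are made up for this example, and it assumes the
    string is walked by code point so that the ranges above #xFFFF can be
    tested directly. The ranges from #x1FFFE upwards all share one pattern:
    the low 16 bits are FFFE or FFFF, which a single mask can test.

```java
final class XmlCharFilter {

    // Returns true for the code points listed in the original question.
    // Hypothetical helper, not part of any library.
    static boolean isDiscouraged(int cp) {
        if (cp < 0x7F) return false;                    // plain ASCII, the common case
        if (cp <= 0x9F) return cp != 0x85;              // #x7F-#x84 and #x86-#x9F (#x85 is kept)
        if (cp >= 0xFDD0 && cp <= 0xFDDF) return true;  // #xFDD0-#xFDDF
        // #x1FFFE-#x1FFFF up to #x10FFFE-#x10FFFF: low 16 bits are FFFE or FFFF
        return cp > 0xFFFF && (cp & 0xFFFE) == 0xFFFE;
    }

    // Copies every code point that passes the test into a StringBuilder.
    static String strip(String in) {
        StringBuilder out = new StringBuilder(in.length());
        for (int i = 0; i < in.length(); ) {
            int cp = in.codePointAt(i);
            if (!isDiscouraged(cp)) {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);  // 1 for BMP chars, 2 for surrogate pairs
        }
        return out.toString();
    }
}
```

    Whether the mask is actually faster than a chain of comparisons would
    need measuring on real input; the sketch only shows that one test can
    cover all of the upper ranges.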

    > (a perhaps related question: How can I use loop-generated \u escape
    > codes? ("\u00" + i) naturally results in a compiler error message,
    > since the escape code is incomplete)


    Why do you want to? Just cast the int to a char.
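
    A small sketch of the cast, for illustration. The class and method names
    (EscapeLoop, range) are hypothetical. A \u escape is resolved by the
    compiler before the program runs, so it cannot be assembled from string
    pieces at runtime; casting the loop counter yields the same character.

```java
final class EscapeLoop {

    // Builds a string holding every char from first to last (inclusive).
    static String range(int first, int last) {
        StringBuilder sb = new StringBuilder();
        for (int i = first; i <= last; i++) {
            sb.append((char) i);  // same character a literal \u00XX escape would denote
        }
        return sb.toString();
    }
}
```

    The cast only covers values up to 0xFFFF. For code points above that,
    StringBuilder.appendCodePoint(int) or Character.toChars(int) produces
    the surrogate pair.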

    /Thomas
     
    Thomas Weidenfeller, Oct 13, 2004
    #2

  3. Fredrik wrote:

    Char cast, why didn't I think of that... :)

    I tried a set of nested if-else tests, ordered by how frequently each
    kind of char occurs, and it seems to work OK.
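
    The actual code was not posted. Purely as a guess at what a
    frequency-ordered if-else chain could look like, here is a sketch with
    hypothetical names (FrequencyOrderedFilter, strip). It assumes plain
    ASCII is by far the most common input, so that test comes first, and it
    allocates a StringBuilder only once something has to be removed.

```java
final class FrequencyOrderedFilter {

    static String strip(String in) {
        final int n = in.length();
        StringBuilder out = null;  // created lazily, on the first removal
        int start = 0;             // start of the pending run of kept chars
        int i = 0;
        while (i < n) {
            char c = in.charAt(i);
            int drop = 0;          // number of chars to remove at position i
            if (c < 0x7F) {
                // most frequent case: plain ASCII, nothing to do
            } else if (c <= 0x9F) {
                if (c != 0x85) drop = 1;               // #x7F-#x84, #x86-#x9F
            } else if (c >= 0xFDD0 && c <= 0xFDDF) {
                drop = 1;                              // #xFDD0-#xFDDF
            } else if (Character.isHighSurrogate(c) && i + 1 < n
                    && Character.isLowSurrogate(in.charAt(i + 1))) {
                int cp = Character.toCodePoint(c, in.charAt(i + 1));
                if ((cp & 0xFFFE) == 0xFFFE) drop = 2; // #xNFFFE-#xNFFFF above the BMP
            }
            if (drop == 0) {
                i++;
                continue;
            }
            if (out == null) out = new StringBuilder(n);
            out.append(in, start, i);  // flush the kept run before the removed chars
            i += drop;
            start = i;
        }
        if (out == null) return in;    // nothing removed: return the input as-is
        out.append(in, start, n);
        return out.toString();
    }
}
```

    Returning the original string when nothing is removed avoids copying
    large inputs that are already clean.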

    Thanks!
     
    Fredrik, Oct 14, 2004
    #3
