Replace (remove) multiple chars from string.

Fredrik

I'm trying to remove unwanted chars from a string in order to make it
XML-compatible. The chars that should be avoided are:

[#x7F-#x84], [#x86-#x9F], [#xFDD0-#xFDDF],
[#x1FFFE-#x1FFFF], [#x2FFFE-#x2FFFF], [#x3FFFE-#x3FFFF],
[#x4FFFE-#x4FFFF], [#x5FFFE-#x5FFFF], [#x6FFFE-#x6FFFF],
[#x7FFFE-#x7FFFF], [#x8FFFE-#x8FFFF], [#x9FFFE-#x9FFFF],
[#xAFFFE-#xAFFFF], [#xBFFFE-#xBFFFF], [#xCFFFE-#xCFFFF],
[#xDFFFE-#xDFFFF], [#xEFFFE-#xEFFFF], [#xFFFFE-#xFFFFF],
[#x10FFFE-#x10FFFF].

(http://www.w3.org/TR/REC-xml/#charsets)

I've been looking at replaceAll, a charAt loop, regex etc. Since I'm
going to process rather large strings, I'm looking for the most
efficient replace algorithm.
All suggestions are much appreciated.

(a perhaps related question: How can I use loop-generated \u escape
codes? ("\u00" + i) naturally results in a compiler error message,
since the escape code is incomplete)
 
Thomas Weidenfeller

Fredrik said:
I've been looking at replaceAll, a charAt loop, regex etc. Since I'm
going to process rather large strings, I'm looking for the most
efficient replace algorithm.

Use a loop with a carefully crafted set of tests, lookup tables etc. to
verify each character, and a StringBuilder or pre-allocated char[] to
place the valid characters into.

For the tests I would give the binary representation of the char codes
in the illegal ranges a good look. Maybe you can find bit patterns to
test for one or more ranges with simple binary operations.
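The loop described above could be sketched roughly as follows. This is only one possible reading of the advice, not tested for speed; the class and method names (XmlCharFilter, isUnwanted, strip) are made up for illustration. It assumes Java 5 or later, since it walks the string by code point so that the ranges above #xFFFF (stored as surrogate pairs) can be checked at all.

```java
final class XmlCharFilter {

    // True if cp falls in one of the ranges listed in the question.
    static boolean isUnwanted(int cp) {
        // Most text sits below 0x7F, so rule that out first.
        if (cp < 0x7F) {
            return false;
        }
        if (cp <= 0x9F) {
            return cp != 0x85;      // [#x7F-#x84] and [#x86-#x9F]
        }
        if (cp < 0xFDD0) {
            return false;
        }
        if (cp <= 0xFDDF) {
            return true;            // [#xFDD0-#xFDDF]
        }
        // The last two code points of planes 1 to 16 all end in FFFE or
        // FFFF, so one mask covers [#x1FFFE-#x1FFFF] up to
        // [#x10FFFE-#x10FFFF].
        return cp >= 0x10000 && (cp & 0xFFFE) == 0xFFFE;
    }

    // Copies every code point that is not unwanted into a new string.
    static String strip(String in) {
        StringBuilder out = new StringBuilder(in.length());
        for (int i = 0; i < in.length(); ) {
            int cp = in.codePointAt(i);
            if (!isUnwanted(cp)) {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);   // 2 for a surrogate pair
        }
        return out.toString();
    }
}
```

The bit mask is an example of the kind of pattern mentioned above: seventeen ranges collapse into a single test. Whether the order of the earlier tests is the best one depends on the actual input.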
Fredrik said:
(a perhaps related question: How can I use loop-generated \u escape
codes? ("\u00" + i) naturally results in a compiler error message,
since the escape code is incomplete)

Why do you want to? Just cast the int to a char.
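A small sketch of what the cast looks like in a loop. The names CharCastDemo and charsInRange are invented for this example. Note that a plain cast only covers values up to 0xFFFF; for higher code points Character.toChars would be needed instead.

```java
final class CharCastDemo {

    // Builds a string holding every char from first to last, inclusive.
    static String charsInRange(int first, int last) {
        StringBuilder sb = new StringBuilder(last - first + 1);
        for (int i = first; i <= last; i++) {
            // The cast yields the char that a \u escape with the same
            // hex value would name, but it is evaluated at run time.
            sb.append((char) i);
        }
        return sb.toString();
    }
}
```

For example, charsInRange(0x7F, 0x84) gives a six-char string covering the first range in the list.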

/Thomas
 
Fredrik

Char cast, why didn't I think of that... :)

I tried a set of nested if-else tests, ordered by how frequently each
kind of char occurs, and it seems to work OK.

Thanks!
 
