Fredrik
I'm trying to remove unwanted chars from a string in order to make it
XML-compatible. The chars that should be avoided are:
[#x7F-#x84], [#x86-#x9F], [#xFDD0-#xFDDF],
[#x1FFFE-#x1FFFF], [#x2FFFE-#x2FFFF], [#x3FFFE-#x3FFFF],
[#x4FFFE-#x4FFFF], [#x5FFFE-#x5FFFF], [#x6FFFE-#x6FFFF],
[#x7FFFE-#x7FFFF], [#x8FFFE-#x8FFFF], [#x9FFFE-#x9FFFF],
[#xAFFFE-#xAFFFF], [#xBFFFE-#xBFFFF], [#xCFFFE-#xCFFFF],
[#xDFFFE-#xDFFFF], [#xEFFFE-#xEFFFF], [#xFFFFE-#xFFFFF],
[#x10FFFE-#x10FFFF].
(http://www.w3.org/TR/REC-xml/#charsets)
I've been looking at replaceAll, a charAt loop, regex etc. Since I'm
going to process rather large strings, I'm looking for the most
efficient replacement algorithm.
All suggestions are much appreciated.
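
For reference, here is a rough sketch of the loop variant I have in mind. It assumes Java 5 or later (for codePointAt and StringBuilder); the names XmlCharFilter, isDiscouraged and strip are just placeholders of mine, and I have not measured it against replaceAll or a regex. It walks the string by code point rather than by char, because the ranges above #xFFFF are stored as surrogate pairs.

```java
class XmlCharFilter {
    // True for the code points in the ranges listed above.
    static boolean isDiscouraged(int cp) {
        if (cp >= 0x7F && cp <= 0x84) return true;
        if (cp >= 0x86 && cp <= 0x9F) return true;
        if (cp >= 0xFDD0 && cp <= 0xFDDF) return true;
        // The last two code points of planes 1 to 16:
        // 0x1FFFE, 0x1FFFF, 0x2FFFE, ..., 0x10FFFE, 0x10FFFF.
        return cp >= 0x10000 && (cp & 0xFFFE) == 0xFFFE;
    }

    // Returns a copy of s with the code points above dropped, in one pass.
    static String strip(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!isDiscouraged(cp)) {
                out.appendCodePoint(cp);
            }
            // Advance by 1 char for BMP code points, 2 for surrogate pairs.
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}
```

Usage would be something like String clean = XmlCharFilter.strip(raw); it allocates one buffer per call and leaves everything outside the listed ranges untouched.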
(A perhaps related question: how can I use loop-generated \u escape
codes? ("\u00" + i) naturally results in a compiler error message,
since the escape code is incomplete.)
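
One workaround I can think of for the escape-code part: \u escapes are translated by the compiler before the rest of the source is parsed, so they cannot be assembled at run time, but an int can be cast to char inside the loop instead. A minimal sketch, assuming only values up to #xFFFF are needed (the names EscapeLoop and range are made up for illustration):

```java
class EscapeLoop {
    // Builds a string holding the chars from..to (inclusive).
    // Assumes both bounds fit in a single char (0x0000 to 0xFFFF).
    static String range(int from, int to) {
        StringBuilder sb = new StringBuilder();
        for (int i = from; i <= to; i++) {
            sb.append((char) i); // same char that a \u escape with this value would denote
        }
        return sb.toString();
    }
}
```

For example, EscapeLoop.range(0x7F, 0x84) gives a six-char string covering the first range in the list. For values above #xFFFF the cast would truncate, so something like StringBuilder.appendCodePoint would be needed there.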