Checking whether a string contains only ISO-8859-1 chars

Jonck

Hi,
I need to send strings to someone else's servlet. These strings may
contain only ISO-8859-1 characters, so I need to verify that the user of
my app has not entered any non-ISO-8859-1 characters before I pass the
input on to the servlet. Does anyone know of an easy way to check
whether a string contains only ISO-8859-1 characters?

The only solution I could think of was a regular expression that lists
every ISO-8859-1 character in its character class, but this is rather
clunky and error-prone.

Thanks for any help, Jonck
 
Thomas Fritsch

Another solution would be to convert the String into bytes and the bytes
back into a String, and then compare the two Strings:

    String s = ...;
    // getBytes takes only the charset name; both calls can throw
    // UnsupportedEncodingException
    byte[] bytes = s.getBytes("ISO-8859-1");
    String s2 = new String(bytes, "ISO-8859-1");
    if (s2.equals(s))
        ... // String s is OK

See also the javadoc of String.
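
For what it's worth, here is a minimal, self-contained sketch of the round-trip idea above. It assumes Java 7 or later so that it can use StandardCharsets.ISO_8859_1, which avoids the checked UnsupportedEncodingException; the class and method names (Latin1RoundTrip, isLatin1) are made up for illustration and are not from the thread. It relies on the documented behavior of String.getBytes(Charset), which replaces unmappable characters with the charset's default replacement bytes, so a string containing such characters no longer equals its decoded copy.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; the names are illustrative only.
class Latin1RoundTrip {

    // Returns true if encoding s to ISO-8859-1 and decoding it again
    // yields the same string, i.e. nothing was replaced on the way.
    static boolean isLatin1(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.ISO_8859_1);
        String decoded = new String(bytes, StandardCharsets.ISO_8859_1);
        return decoded.equals(s);
    }

    public static void main(String[] args) {
        // U+00E9 (e with acute accent) is part of ISO-8859-1.
        System.out.println(isLatin1("caf\u00e9"));
        // U+20AC (euro sign) is not part of ISO-8859-1.
        System.out.println(isLatin1("5 \u20ac"));
    }
}
```

The non-ASCII characters are written as \u escapes so the example does not depend on the encoding of the source file.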
 
Chris Smith

The easiest way to do this is with the java.nio.charset package:

CharsetEncoder enc = Charset.forName("ISO-8859-1").newEncoder();
if (enc.canEncode(str)) ...;
else ...;

For this particular encoding, you could also take advantage of the fact
that ISO-8859-1 contains exactly the set of Unicode characters with
code points below 256 (U+0000 through U+00FF). So you can write this
instead:

    boolean canEncode = true;
    for (int i = 0; i < str.length(); i++)
    {
        if (str.charAt(i) >= 256)
        {
            canEncode = false;
            break;
        }
    }

The only advantage of the second approach is that this would work with
any version of the Java API; even obsolete versions like 1.1. For other
encodings, of course, this doesn't work so well.
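
To make the comparison concrete, here is a rough sketch that wraps both approaches described above as methods and runs them over a few sample strings. It assumes Java 7 or later for StandardCharsets; the class name Latin1Check and the method names viaEncoder and viaCharRange are hypothetical, chosen only for this example. A fresh encoder is created per call because CharsetEncoder instances are documented as not safe for use by multiple concurrent threads.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; the names are illustrative only.
class Latin1Check {

    // First approach: ask a CharsetEncoder whether it can encode the string.
    static boolean viaEncoder(String s) {
        CharsetEncoder enc = StandardCharsets.ISO_8859_1.newEncoder();
        return enc.canEncode(s);
    }

    // Second approach: every char must have a value below 256.
    static boolean viaCharRange(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) >= 256) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Samples written with \u escapes so the source file stays ASCII.
        String[] samples = {"plain ASCII", "caf\u00e9", "\u00ff", "\u0100", "5 \u20ac"};
        for (int i = 0; i < samples.length; i++) {
            String s = samples[i];
            // Print the index and the verdict of each approach.
            System.out.println(i + ": " + viaEncoder(s) + " " + viaCharRange(s));
        }
    }
}
```

For ISO-8859-1 the two methods should give the same answer for each sample; for other encodings only the encoder-based check carries over.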

--
www.designacourse.com
The Easiest Way to Train Anyone... Anywhere.

Chris Smith - Lead Software Developer/Technical Trainer
MindIQ Corporation
 
Jonck

Thomas and Chris, thank you both for your suggestions. Both of your
solutions work perfectly.
 
