Extra 'invisible' characters in SOAP packet

R. K. Wijayaratne

We are using .NET 2.0 and WSE 3.0 to call a Java web service. It sets
character limits on certain fields (e.g. max 100 chars), and if a field
has more than the allowed number, it throws an error. So we retrieve
the data from the MSSQL database, truncate it to 100 characters if it
is over the limit, and then call the web service.

The problem is that sometimes extra 'invisible' characters get inserted
into the field data and take the field over the character limit.
These characters are there, but they are not visible when I open the XML
log files in Notepad, Visual Studio or Altova XMLSpy. They are only
visible when I open the files in the free Context Editor.

For example note the extra 'Â' char in the field below, which takes
the character count to 101 and thus over the limit by 1.

<Neighbourhood>Elizabeth North is one of the older suburbs, with
development dating from the 1950s and 1960s, as mu</Neighbourhood>

It seems that these characters are 'invisible' to the .NET String
manipulation methods, which do not seem to count them when counting
characters for truncation.

Any ideas what is happening here???
 
Chris Mullins [MVP - C#]

Welcome to the Brave New Unicode World.

What's going on is that you have combining characters, "A" and "^", which
are actually in the string as two separate code points. When you "view" the
string, the display infrastructure turns them into a single grapheme and
shows it as a single character. This is by design.
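
To see the difference between code units and what gets displayed, here is a minimal sketch. It assumes the string really does hold a base letter followed by a combining circumflex (U+0302); the class and method names are made up for illustration.

```csharp
using System;
using System.Globalization;

static class Counts
{
    // Hypothetical helper: number of text elements (roughly, displayed characters).
    public static int TextElements(string value)
    {
        return new StringInfo(value).LengthInTextElements;
    }

    // Number of UTF-16 code units, which is what String.Length reports.
    public static int CodeUnits(string value)
    {
        return value.Length;
    }
}
```

For "A\u0302", CodeUnits returns 2 while TextElements returns 1, so the two counts can disagree about the same string.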

The easiest thing to do is to stop counting characters and start counting
bytes. To get an accurate byte count, you need to know what encoding you're
using. Then you can ask the encoding for the byte count with GetByteCount()
(or take the length of what GetBytes() returns). Be careful that you don't
just start chopping bytes, though, as you may end up cutting a surrogate pair
in half and destroying your string.

The .Net classes that deal with this stuff start with the StringInfo class.
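
Roughly, the byte-counting idea could look like the sketch below. It assumes the service's limit is measured in bytes of a known encoding (the thread does not confirm this), and TextLimit and TruncateToByteLimit are hypothetical names. It walks the string one text element at a time with StringInfo, so a surrogate pair or a base character plus its combining marks is never split.

```csharp
using System;
using System.Globalization;
using System.Text;

static class TextLimit
{
    // Hypothetical helper: keeps whole text elements while the encoded
    // size stays within maxBytes. Assumes a stateless encoding such as
    // UTF-8, where per-element byte counts add up to the total.
    public static string TruncateToByteLimit(string value, Encoding encoding, int maxBytes)
    {
        if (encoding.GetByteCount(value) <= maxBytes)
            return value;

        StringBuilder result = new StringBuilder();
        int used = 0;
        TextElementEnumerator elements = StringInfo.GetTextElementEnumerator(value);
        while (elements.MoveNext())
        {
            string element = elements.GetTextElement();
            int size = encoding.GetByteCount(element);
            if (used + size > maxBytes)
                break; // the next element would not fit, so stop here
            result.Append(element);
            used += size;
        }
        return result.ToString();
    }
}
```

A call might look like TextLimit.TruncateToByteLimit(myString, Encoding.UTF8, 100). Whether UTF-8 is the right encoding depends on what the receiving service actually counts.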

The best place to start reading is Jon Skeet's primer on this stuff for
.NET developers:
http://www.yoda.arachsys.com/csharp/unicode.html


--
Chris Mullins


R. K. Wijayaratne

Hello,

Thanks for your helpful reply. Can I ask how we do what you have
suggested below?

"To get an accurate byte count, you need to know what encoding
you're using."

Do we target UTF-8? Or do we need to find out what encoding the Java
web service uses and accommodate that (I think they are using ASCII)?

RKW.
 
R. K. Wijayaratne

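Converting the string to ASCII before truncating did the trick:

// Round-trip through ASCII so every character is a single byte.
// Characters outside the ASCII range come back as '?'.
Encoding asciiEnc = Encoding.ASCII;
byte[] buffer = asciiEnc.GetBytes(myString);
myString = asciiEnc.GetString(buffer);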
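
For anyone reusing this, a small sketch of what the round trip does to a character outside ASCII. The non-breaking space (U+00A0) is picked purely for illustration, since the thread never identifies the actual character, and AsciiRoundTrip is a hypothetical wrapper name.

```csharp
using System;
using System.Text;

static class AsciiRoundTrip
{
    // Hypothetical wrapper around the ASCII round trip.
    // With the default Encoding.ASCII, a character outside the
    // ASCII range is replaced by '?' rather than dropped.
    public static string ToAscii(string value)
    {
        Encoding asciiEnc = Encoding.ASCII;
        byte[] buffer = asciiEnc.GetBytes(value);
        return asciiEnc.GetString(buffer);
    }
}
```

AsciiRoundTrip.ToAscii("a\u00A0b") returns "a?b", so the text stays the same length but the original character is lost. That is fine if the service only accepts ASCII, but worth knowing before applying it to names or addresses with accented letters.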
 
