Extra 'invisible' characters in SOAP packet

Discussion in 'ASP .Net Web Services' started by R. K. Wijayaratne, Nov 21, 2007.

  1. We are using .NET 2.0 and WSE 3.0 to call a Java web service. It sets
    character limits on certain fields (e.g. max 100 chars) and if there
    are more than the expected number, it throws an error. So what we do
    is retrieve the data from the MSSQL database, truncate it to 100
    characters if it is over the limit, and then call the web service.

    The problem is sometimes extra 'invisible' characters get inserted
    into the field data that take the field over the character limit.
    These characters are there, but are not visible when I open the XML
    log files in Notepad, Visual Studio and Altova XMLSpy. They are
    visible when I open them in the free Context Editor.

    For example note the extra 'Â' char in the field below, which takes
    the character count to 101 and thus over the limit by 1.

    <Neighbourhood>Elizabeth North is one of the older suburbs, with
    development dating from the 1950s and 1960s, as mu</Neighbourhood>

    It seems that these characters are 'invisible' to the .NET String
    manipulation methods, which do not seem to count them when counting
    characters for truncation.

    Any ideas what is happening here???
    R. K. Wijayaratne, Nov 21, 2007
    #1

  2. Welcome to the Brave New Unicode World.

    What's going on is that you have combining characters, "A" and "^", which
    are actually in the string as two separate code points. When you "view" the
    string, the display infrastructure turns that into a single grapheme, and
    shows it as a single character. This is by design.
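
    To see which code points are really in a field, one option is to dump
    the string one UTF-16 code unit at a time. The sketch below is only an
    illustration; the class and method names (CodeUnitDump, Describe) are
    made up and not part of any library. Note that String.Length counts
    UTF-16 code units, so a base letter plus a combining mark has Length 2.

    ```csharp
    using System;
    using System.Text;

    static class CodeUnitDump
    {
        // Returns one "U+XXXX" entry per UTF-16 code unit in the string,
        // separated by spaces, so invisible characters become visible.
        public static string Describe(string s)
        {
            StringBuilder sb = new StringBuilder();
            foreach (char c in s)
            {
                if (sb.Length > 0) sb.Append(' ');
                sb.Append("U+").Append(((int)c).ToString("X4"));
            }
            return sb.ToString();
        }
    }
    ```

    For example, CodeUnitDump.Describe("A\u0302") returns "U+0041 U+0302",
    which shows the letter and the combining circumflex as two entries.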

    The easiest thing to do is to stop counting characters and start counting
    bytes. To get an accurate byte count, you need to know what encoding you're
    using. Then you can ask the encoding for the byte count with
    ".GetByteCount()" (or call ".GetBytes()" and take the length of the array
    it returns). Be careful that you don't just start chopping bytes though,
    as you may end up cutting a multi-byte character or a surrogate pair in
    half, and corrupting your string.
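
    A minimal sketch of truncating to a byte budget without cutting a
    character in half, assuming the limit is really a byte limit in a known
    encoding (the thread does not confirm how the Java service counts). The
    helper name TruncateToBytes is hypothetical. It walks the string one text
    element at a time and stops before the element that would exceed the
    budget.

    ```csharp
    using System;
    using System.Globalization;
    using System.Text;

    static class ByteTruncation
    {
        // Keeps whole text elements (base character plus any combining
        // marks, or a surrogate pair) while the encoded size fits maxBytes.
        public static string TruncateToBytes(string value, Encoding encoding, int maxBytes)
        {
            StringBuilder result = new StringBuilder();
            int used = 0;
            TextElementEnumerator e = StringInfo.GetTextElementEnumerator(value);
            while (e.MoveNext())
            {
                string element = e.GetTextElement();
                int size = encoding.GetByteCount(element);
                if (used + size > maxBytes) break; // next element would not fit
                result.Append(element);
                used += size;
            }
            return result.ToString();
        }
    }
    ```

    With UTF-8, a no-break space (U+00A0) takes two bytes, so
    ByteTruncation.TruncateToBytes("ab\u00A0cd", Encoding.UTF8, 3) returns
    "ab" rather than a string ending in half a character.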

    The .NET classes that deal with this stuff start with the StringInfo class.
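
    As a rough illustration of what StringInfo offers, the sketch below
    counts and truncates by text element instead of by char. The wrapper
    class and method names are hypothetical; LengthInTextElements and
    SubstringByTextElements are the StringInfo members doing the work.

    ```csharp
    using System;
    using System.Globalization;

    static class TextElementTruncation
    {
        // Number of user-visible text elements, not UTF-16 code units.
        public static int CountTextElements(string value)
        {
            return new StringInfo(value).LengthInTextElements;
        }

        // Keeps at most maxElements text elements from the start of value.
        public static string TruncateToTextElements(string value, int maxElements)
        {
            StringInfo info = new StringInfo(value);
            if (info.LengthInTextElements <= maxElements) return value;
            return info.SubstringByTextElements(0, maxElements);
        }
    }
    ```

    For "A\u0302bc", String.Length is 4 but CountTextElements returns 3, and
    TruncateToTextElements("A\u0302bc", 2) keeps the combining mark attached
    to its base letter, returning "A\u0302b".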

    The best place to start reading is Jon Skeet's primer on this stuff for
    .NET developers:
    http://www.yoda.arachsys.com/csharp/unicode.html


    --
    Chris Mullins

    Chris Mullins [MVP - C#], Nov 22, 2007
    #2

  3. Hello,

    Thanks for your helpful reply. Can I ask how we do what you have
    suggested below?

    "To get an accurate byte count, you need to know what encoding
    you're using."

    Do we target UTF8? Or do we need to find out what encoding the Java
    web service uses and accommodate that (I think they are using ASCII)?

    RKW.


    R. K. Wijayaratne, Nov 22, 2007
    #3
  4. Converting the string to ASCII before truncating did the trick:

    // Requires "using System.Text;". Characters that ASCII cannot
    // represent come back as '?'.
    Encoding asciiEnc = Encoding.ASCII;
    byte[] buffer = asciiEnc.GetBytes(myString);
    myString = asciiEnc.GetString(buffer);
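
    One thing to keep in mind with this round trip is that it is lossy:
    with the default Encoding.ASCII settings, anything outside ASCII is
    replaced by '?'. The sketch below wraps the same three lines in a
    hypothetical helper (AsciiRoundTrip.ToAscii) to make that effect easy
    to see.

    ```csharp
    using System;
    using System.Text;

    static class AsciiRoundTrip
    {
        // Encodes to ASCII bytes and decodes again; characters that ASCII
        // cannot represent are replaced by '?' on the way through.
        public static string ToAscii(string value)
        {
            Encoding ascii = Encoding.ASCII;
            byte[] buffer = ascii.GetBytes(value);
            return ascii.GetString(buffer);
        }
    }
    ```

    For example, AsciiRoundTrip.ToAscii("1950s\u00A0and") returns
    "1950s?and": the no-break space is gone, and every remaining character
    is one byte, so the character count and the byte count agree.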


    R. K. Wijayaratne, Nov 23, 2007
    #4