CDONTS or CDOSYS UTF-8 Email

Jed

I have a form that needs to handle international characters within the UTF-8
character set. I have tried all the recommended strategies for getting UTF-8
characters from form input into an email message, and I cannot get it to work. I
need to stay with classic ASP for this.

Here are some things I tried:

'CDONTS
Call msg.SetLocaleIDs(65001)

'CDOSYS
msg.HTMLBodyPart.Charset = "utf-8"

I included the following meta tag in the email HTML:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

I also tried changing the CharSet and CodePage settings on every Request and
Response involved.

I was able to Response.Write the form content to the screen on postback, and
it rendered properly. However, none of my efforts can get the email to
render with the correct character encoding. I have tried opening the email in Outlook
and Thunderbird. Neither one picks up on the UTF-8 charset meta tag.

Any help, or a link to a tutorial, would be much appreciated.

Thanks.
 
Anthony Jones

Mixing charsets in a message is a real minefield. Try this before writing
any content to the message:-

oMsg.BodyPart.charset = "UTF-8"

Where oMsg is a CDOSYS message object (CDONTS is deprecated; don't write new
code against it).

That will make all text parts use UTF-8 encoding.

Anthony.
 
Jed

Hey, Anthony,

Thanks for the suggestion. I was optimistic about its potential, but it
doesn't seem to make a difference.

Here is my code:
msg.BodyFormat = 0 'Set body text to HTML=0 TEXT=1
msg.MailFormat = 0 'Set format to MIME=0 TEXT=1
'Call msg.SetLocaleIDs(65001)
msg.Body = Message
msg.Send
'Try writing the contents to the browser to see if the string is bad
Response.Clear
'Response.CodePage = 65001
Response.CharSet = "utf-8"
Response.Write Message
Response.End

Basically, if I post a character like ú [u with an acute accent], it
renders fine in the browser, but in the email it appears as Ãº [A with a tilde
over it, followed by a superscript o]. I think that is the ANSI equivalent
or something.
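The symptom can be reproduced outside ASP. The Python snippet below is only an illustration of the byte-level behaviour, not ASP or CDOSYS code. It assumes Windows-1252 as the ANSI code page. The two UTF-8 bytes of ú, read back one byte per character, display as Ã followed by º.

```python
# Illustration only: a UTF-8 "ú" turns into "Ãº" when its bytes are
# interpreted with an ANSI code page (Windows-1252 assumed here).
utf8_bytes = "ú".encode("utf-8")          # two bytes: 0xC3 0xBA
print(utf8_bytes.hex(" "))                # c3 ba

misread = utf8_bytes.decode("cp1252")     # each byte becomes its own character
print(misread)                            # Ãº
```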

I have read "The Absolute Minimum Every Software Developer Absolutely,
Positively Must Know About Unicode and Character Sets (No Excuses!)"
[http://www.joelonsoftware.com/articles/Unicode.html], but it doesn't seem
to shed any light on why this isn't working.

Hmm..
 
Jed

Actually, this is the CDOSYS code I tried.

msg.BodyPart.Charset = "utf-8"
msg.HTMLBody = Message
msg.HTMLBodyPart.Charset = "utf-8"
msg.Send

I accidentally copied the CDONTS code in the last post.


 
Luke Zhang [MSFT]

Hello,

Is the CDOSYS code executed in an ASP application? You could try sending a
plain-text email instead of an HTML email, like this:

msg.BodyPart.Charset = "UTF-8"
msg.TextBody = Message
msg.TextBodyPart.Charset = "UTF-8"
msg.Send

Do you receive the correct characters in the email in plain-text format?

Sincerely,

Luke Zhang

Microsoft Online Community Support
 
Anthony Jones

Jed said:
Actually, this is the CDOSYS code I tried.

msg.BodyPart.Charset = "utf-8"
msg.HTMLBody = Message
msg.HTMLBodyPart.Charset = "utf-8"
msg.Send

I accidentally copied the CDONTS code in the last post.

Try this in a VBScript file:-

Option Explicit

' Each Const must stay on one line (VBScript needs " _" to continue a line)
Const cdoSendUsingMethod = "http://schemas.microsoft.com/cdo/configuration/sendusing"
Const cdoFlushBuffersOnWrite = "http://schemas.microsoft.com/cdo/configuration/flushbuffersonwrite"
Const cdoSMTPServerPickupDirectory = "http://schemas.microsoft.com/cdo/configuration/smtpserverpickupdirectory"
Const cdoSendUsingPickup = 1

Dim oMsg : Set oMsg = CreateObject("CDO.Message")

Set oMsg.Configuration = CreateObject("CDO.Configuration")

With oMsg.Configuration.Fields
.Item(cdoSendUsingMethod) = cdoSendUsingPickup
.Item(cdoFlushBuffersOnWrite) = True
.Item(cdoSMTPServerPickupDirectory) = "G:\temp\pickup" '*** change this
.Update
End With

oMsg.BodyPart.charset = "UTF-8"

oMsg.From = "(e-mail address removed)"
oMsg.To = "(e-mail address removed)"
oMsg.Subject = "Testing"
oMsg.HTMLBody = "<html><body>£</body></html>"

oMsg.Send

MsgBox "Done"


Change the pickup folder to a temp folder on your machine.

When it has run, open the resulting .eml file in Outlook Express (double-click
it). Does the £ appear correctly, without other strange characters?

Open the .eml file in Notepad; you should see something like:-

X-Receiver: (e-mail address removed)
X-Sender: (e-mail address removed)
From: <[email protected]>
To: <[email protected]>
Subject: Testing
Date: Sun, 12 Nov 2006 19:46:27 -0000
MIME-Version: 1.0
Content-Type: multipart/alternative;
    boundary="----=_NextPart_000_0001_01C70693.3DE9F350"
Content-Class: urn:content-classes:message

This is a multi-part message in MIME format.

------=_NextPart_000_0001_01C70693.3DE9F350
Content-Type: text/plain;
    charset="UTF-8"
Content-Transfer-Encoding: base64

wqPigqzFkg0K

------=_NextPart_000_0001_01C70693.3DE9F350
Content-Type: text/html;
    charset="UTF-8"
Content-Transfer-Encoding: 8bit

<html><body>£</body></html>
------=_NextPart_000_0001_01C70693.3DE9F350--

I deleted some headers for clarity. However, you can see that specifying
UTF-8 on the main message body part, before writing anything to the message,
has caused the UTF-8 encoding to cascade to the alternative parts.
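The text/plain part in that sample is base64-encoded, so its bytes can be checked directly. The Python sketch below is illustration only; the standard `base64` module stands in for inspecting the file by hand. It decodes the sample body and confirms that it is valid UTF-8:

- C2 A3 is £.
- E2 82 AC is €.
- C5 92 is Œ.
- The part ends with CRLF.

The HTML line in the archived sample shows only the £.

```python
import base64

# Body of the text/plain part from the sample .eml above.
encoded = "wqPigqzFkg0K"
raw = base64.b64decode(encoded)
print(raw.hex(" "))          # c2 a3 e2 82 ac c5 92 0d 0a

# Decoding succeeds only if the bytes are well-formed UTF-8.
text = raw.decode("utf-8")
print(repr(text))            # '£€Œ\r\n'
```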

What happens if you change the code so that the configuration sends using port
25 to your SMTP server, and you specify your real email address as the
receiver? Does the email look OK when it arrives in Outlook/Thunderbird?
 
Jed

Thanks for the input Anthony,

I wrote out the email as you indicated and indeed the headers are UTF-8 but
the text is wrong:

This:
msg.BodyPart.Charset = "UTF-8"
msg.TextBody = Message
msg.TextBodyPart.Charset = "UTF-8"
msg.HTMLBody = Message
msg.HTMLBodyPart.Charset = "UTF-8"

Yields this:

------=_NextPart_000_0001_01C70806.B7C0CB80
Content-Type: text/plain;
    charset="UTF-8"
Content-Transfer-Encoding: 8bit

------=_NextPart_000_0001_01C70806.B7C0CB80
Content-Type: text/html;
    charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

When I just set the BodyPart.Charset = "UTF-8" and set the HTMLBody =
Message then I get the following text in the text version of the email.

=C3=83=C2=BA

Using Notepad++
Start in ANSI
Convert the chars above from Hex to Text
(Plugins > TextFX Convert > Hex to Text)
Then switch to UTF-8
You get the chars in the email

Then cut the characters
Switch to ANSI mode
Paste the characters
Switch to UTF-8
And you get the character that is supposed to be there.
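The Notepad++ steps above can be reproduced in a few lines of Python. This is illustration only, and it assumes Windows-1252 as the ANSI code page. The steps are:

1. Decode the quoted-printable text.
2. Read the bytes as UTF-8 to get what the mail client shows.
3. Convert those characters back to ANSI bytes and read them as UTF-8 again.

```python
import quopri

qp = b"=C3=83=C2=BA"                      # from the text part of the email
raw = quopri.decodestring(qp)             # four bytes: c3 83 c2 ba

shown = raw.decode("utf-8")               # what the mail client displays
print(shown)                              # Ãº

# "Cut, switch to ANSI, paste, switch to UTF-8": turn the displayed
# characters back into ANSI bytes, then read those bytes as UTF-8.
recovered = shown.encode("cp1252").decode("utf-8")
print(recovered)                          # ú
```

Undoing one ANSI-to-UTF-8 step recovers the character. This suggests the text was UTF-8 encoded one time too many.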

I don't get it. Any ideas what I am doing wrong?
 
Anthony Jones

Jed said:
Thanks for the input Anthony,

I wrote out the email as you indicated and indeed the headers are UTF-8 but
the text is wrong:

Before we go any further, did you paste my code verbatim into a VBS file?
(Because what you posted below isn't what I posted.)
Did you then open it in Outlook Express, and did it look right?

This:
msg.BodyPart.Charset = "UTF-8"

Don't do this:-
msg.TextBody = Message
msg.TextBodyPart.Charset = "UTF-8"
msg.HTMLBody = Message

Don't do this either:-
msg.HTMLBodyPart.Charset = "UTF-8"

I don't get it. Any ideas what I am doing wrong?

It would help if I knew what character this is supposed to be. ú?
What ANSI code page are you using, and what are the character codes for these
characters in that code page?
Are you certain the character isn't already corrupted?
The fact that 4 octets have appeared in the output suggests to me that the
character is going through UTF-8 encoding twice.
Is this in ASP?
Are you posting from a UTF-8 encoded HTML form?
 
Jed

Hi Anthony,

I have a good feeling that you will be able to help me get to the bottom of
this.

Let me answer your questions.

Anthony Jones said:
Before we go any further did you paste my code verbatim into a VBS? (Cos
what you posted below isn't what I posted)

Yes. I tried it exactly as you recommended, then I tried some other things.

Did you then open it in Outlook Express and did it look right?

Yes. I opened the .eml file in Outlook and it did not look right.

It would help if I knew what character this is supposed to be? ú ?

Yes. You are correct about the character. I would have pasted it into
my message, but I was not confident that it would come out right in the post.

What ANSI codepage are you using and what are the char codes for these
characters in that code page?

I don't know what ANSI code page Notepad++ uses. I am guessing it is the default
for my localization settings in Windows.

Are you certain the character isn't already corrupted?

I don't know, but when I write the results out to the web page using
Response.Write(Message) I get the correct characters.

Response.Clear
'I have heard you need the following, but it seems to
' render fine in the browser without it
'Response.CodePage = 65001
Response.CharSet = "utf-8"
Response.Write Message
Response.End

The fact that 4 octets have appeared in the output suggests to me that the
character is going through the UTF-8 encoding twice?

This is possible, I guess. I don't know.

Is this in ASP?

Yes. This is a classic ASP page handling the request using the standard ASP
ISAPI DLL in IIS 6.

Are you posting from a UTF-8 encoded HTML form?

I believe so. I put the following in the HTML of the form page:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

Does that make any sense?
 
Anthony Jones

Jed said:
Hi Anthony,

I have a good feeling that you will be able to help me get to the bottom of
this.

Let me answer your questions.



Yes. I tried it exactly as you recommended then I tried some other things.

Yes. I opened the eml in outlook and it did not look right.

Was that after you 'tried some things' or before?

It didn't contain simply a British pound sign (£)?
How did the contents of the .eml file created by my original code differ from
the contents I posted along with the code?

Yes. You are correct about the character code. I would have pasted it in
my message but I was not confident that it would come out right in the post.

I don't know what ANSI code page Notepad++ uses. I am guessing the default
for my localization settings in windows.

Yes it uses the localization settings.

I don't know, but when I write the results out to the web page using
Response.Write(Message) I get the correct characters.

Response.Clear
'I have heard you need the following, but it seems to
' render fine in the browser without it
'Response.CodePage = 65001
Response.CharSet = "utf-8"
Response.Write Message
Response.End

Your problem, I believe, hinges on a couple of little-understood facts.
Response.CodePage affects the way posted characters received in the
Request are converted to Unicode. IOW, if the response code page is set to
a standard ANSI character set, then any characters received in a form post
will be assumed to be in that same ANSI character set.

Here's another fact. A browser will encode characters into a Form post
according to the charset for the page. Hence a content-type specifying a
charset of UTF-8 will cause characters in the form fields to be encoded to
UTF-8 when posted.

Combining these facts, we can see that if a UTF-8 page posts characters to an
ASP target that reads the form fields while Response.CodePage is set
to an ANSI code page, each byte of a multibyte UTF-8 character is treated
as an individual character.

The code above hides this problem. Response.Write encodes the string using the
same ANSI code page, but Response.CharSet tells the browser it is receiving
UTF-8, which reverses the corruption.
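The chain described above can be simulated byte for byte. The sketch below is illustration only:

- Python stands in for the browser, ASP and CDOSYS.
- Windows-1252 is assumed as the server's ANSI code page.
- The function `simulate` is hypothetical.

It shows two outcomes. Reading the form with an ANSI code page and then UTF-8 encoding the email produces four octets. Response.Write with the same ANSI code page undoes the damage, so the browser test looks fine. Reading the form as UTF-8 (code page 65001) gives the correct two octets in the email.

```python
def simulate(form_codepage: str, typed: str = "\u00fa") -> dict:
    """Hypothetical model of the ASP round trip; not real ASP/CDO code."""
    # 1. A page served as UTF-8 makes the browser post UTF-8 bytes.
    posted = typed.encode("utf-8")

    # 2. ASP converts the posted bytes to a Unicode string using
    #    Response.CodePage (modelled here as a Python codec name).
    #    Note: Python's cp1252 codec rejects a few undefined bytes (e.g. 0x81);
    #    the bytes of "ú" are all defined.
    form_value = posted.decode(form_codepage)

    # 3. CDOSYS, with BodyPart.Charset = "UTF-8", encodes the string as UTF-8.
    email_bytes = form_value.encode("utf-8")

    # 4. Response.Write encodes with the same code page, while
    #    Response.CharSet = "utf-8" tells the browser to read UTF-8.
    browser_sees = form_value.encode(form_codepage).decode("utf-8")

    return {"form_value": form_value,
            "email_bytes": email_bytes,
            "browser_sees": browser_sees}


broken = simulate("cp1252")   # Response.CodePage left at an ANSI code page
fixed = simulate("utf-8")     # Response.CodePage = 65001 before reading Request.Form

print(broken["email_bytes"].hex(" "), broken["browser_sees"])  # c3 83 c2 ba ú
print(fixed["email_bytes"].hex(" "), fixed["browser_sees"])    # c3 ba ú
```

The browser shows ú in both cases, which is why the Response.Write check could not reveal the problem.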

This is possible, I guess. I don't know.


Yes. This is a classic asp page handling the request using the standard asp
ISAPI dll in IIS 6.


I believe so. I put the following in the HTML of the form page:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

Yeah, don't do that. Use the CharSet and ContentType properties of the
Response object.

Does that make any sense?

Yes. When receiving a Form post from a UTF-8 page make sure your
Response.Codepage is set to 65001 before you attempt to read any form
fields.

Anthony.
 
Jed

You're a rock star Anthony! I assumed that the Response.Codepage only
affected the Response stream, but the fact that it also determines how the
Request items are read is good to know.

I run into encoding problems with XML too. One of these days I am going to
figure this out.

Thanks again.
 
