Characters aren't displayed correctly

Hussein B · Mar 1, 2009

Hey,
I'm retrieving records from a MySQL database that contains non-English
characters.
Then I build a string that contains HTML markup and the column values
from that result set.
+++++
markup = u'''<table>.....'''
for row in rows:
    markup = markup + '<tr><td>' + row['id']
markup = markup + '</table>'
+++++
Then I send the email following this tip:
http://code.activestate.com/recipes/473810/
However, the email shows ????? in place of every non-English character.
Any ideas?
Ubuntu 8.04
Python 2.5.2
Evolution Mail Client
Thanks.

Philip Semanchuk · Mar 1, 2009

There are so many places where this could go wrong, and you haven't
narrowed down the problem.

Are the characters stored in the database correctly?

Are they stored consistently (i.e. all using the same encoding, not
some using utf-8 and others using iso-8859-1)?

What are you getting out of the database? Is it being converted to
Unicode correctly, or at all?

Are you sure that the program you're using to view the email
understands the encoding?

Isolate those questions one at a time. Add some debugging breakpoints.
Ensure that you have what you think you have. You might not fix your
problem, but you will make it much smaller and more specific.

Good luck
Philip

J. Clifford Dyer · Mar 1, 2009

Let me add to that checklist:

Are you sure the email you are creating has the encoding declared
properly in the headers?
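One way to check this point, sketched in Python 3 rather than the thread's Python 2.5: parse the raw source of a received message and ask the email package which charset it declares. The raw message below is a hypothetical example; in practice you would paste the headers of a message you actually received.

```python
import email

# Hypothetical raw message source (headers, blank line, body).
raw = (
    'Content-Type: text/plain; charset="iso-8859-1"\n'
    "Subject: charset check\n"
    "\n"
    "body text\n"
)

msg = email.message_from_string(raw)
# get_content_charset() returns the declared charset in lower case,
# or None if the Content-Type header has no charset parameter.
print(msg.get_content_type())
print(msg.get_content_charset())
```

If the declared charset cannot represent Arabic, the mail client has no way to show the characters correctly, whatever the body contains.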

Cheers,
Cliff

Hussein B · Mar 2, 2009

Are the characters stored in the database correctly?

Yes, they are.

Are they stored consistently (i.e. all using the same encoding, not
some using utf-8 and others using iso-8859-1)? Yes.

What are you getting out of the database? Is it being converted to
Unicode correctly, or at all?

I don't know. How can I make sure of this?

Hussein B · Mar 2, 2009

My HTML markup contains only table tags (you know: table, tr and td).

J. Clifford Dyer · Mar 2, 2009

Ah. The issue is not with the HTML markup, but the email headers. For
example, the email you sent me has a header that says:

Content-type: text/plain; charset="iso-8859-1"

Guessing from the recipe you linked to, you probably need something
like:

msgRoot['Content-type'] = 'text/plain; charset="utf-16"'

replacing utf-16 with whatever encoding you actually used to encode
your email.

Or it may be that the header has to be attached to the individual MIME
parts. I'm not as familiar with MIME.
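If the charset does need to live on the individual parts, the email package can write it there for you. A minimal sketch, written for Python 3 (the Python 2.5 details differ), with a hypothetical Arabic sample string: passing a charset as the third argument of MIMEText puts a charset parameter on that part's own Content-Type header, while the multipart container carries none.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Hypothetical sample text (Arabic, written with escapes to keep the source ASCII).
arabic = "\u062f\u062e\u0648\u0644 \u0633\u0631\u064a\u0639"

msg_root = MIMEMultipart("alternative")
msg_root["Subject"] = "Charset test"

# The third argument sets the charset parameter on each part's own
# Content-Type header, e.g. text/html; charset="utf-8".
msg_root.attach(MIMEText(arabic, "plain", "utf-8"))
msg_root.attach(MIMEText("<table><tr><td>" + arabic + "</td></tr></table>",
                         "html", "utf-8"))

for part in msg_root.get_payload():
    print(part["Content-Type"])
```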

Cheers,
Cliff

Hussein B · Mar 2, 2009

Hey Cliff,
I tried your tip and I still get the same thing (?????).
I added a print statement that prints each value of the result set to the
console, and it also prints ???? instead of the real characters.
Maybe a conversion happens when the data is fetched from the
database?
(The values are stored correctly in the database.)

John Machin · Mar 2, 2009

Can you reveal which language???

Are the characters stored in the database correctly?

Yes they are.

How do you KNOW that they are stored correctly? What makes you so
sure?

Yes.

So what is the encoding used to store them?

I don't know. How can I make sure of this?

You could show us some of the output from the database query. As well as

    print the_output

you should

    print repr(the_output)

and show us both, and also tell us what you *expect* to see.

And let's get the database output sorted out before we worry about the
email message.
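In Python 3 terms, the same check might look like the sketch below: a small hypothetical helper, describe_row(), that reports each column's type together with its ascii() form (the 3.x counterpart of repr() for this purpose), so the output does not depend on the terminal's encoding. The three sample rows are made up to show what decoded text, raw UTF-8 bytes, and already-damaged data look like.

```python
def describe_row(row):
    """Return one line per column: name, Python type, ASCII-only repr."""
    lines = []
    for column, value in sorted(row.items()):
        lines.append("%s: %s %s" % (column, type(value).__name__, ascii(value)))
    return "\n".join(lines)

# Hypothetical rows illustrating three different situations.
decoded = {"name": "\u062f\u062e\u0648\u0644"}                    # proper text
raw_bytes = {"name": "\u062f\u062e\u0648\u0644".encode("utf-8")}  # undecoded UTF-8
damaged = {"name": "????"}                                        # data already lost

for row in (decoded, raw_bytes, damaged):
    print(describe_row(row))
```

Only the last case, where the representation shows literal question marks, means the characters were replaced before they reached your code.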

Hussein B · Mar 2, 2009

Can you reveal which language???

Arabic

How do you KNOW that they are stored correctly? What makes you so
sure?

Because MySQL Query Browser displays them correctly. In addition, I use
BIRT as the reporting system, and it shows them correctly too.

So what is the encoding used to store them?

The tables are created with the UTF-8 encoding option.

The result of print repr(row['name']) is '??? ??????'
The '?' characters are supposed to be Arabic characters.

Thanks, all, for your help.

Philip Semanchuk · Mar 2, 2009

Personally, I'd add a debug breakpoint just after extracting the
characters from the database, like so:

import pdb
pdb.set_trace()

When you're stopped at the breakpoint, examine the string you get
back. Is it what you expect? For instance, is it Unicode?

isinstance(my_string, unicode)

Or maybe you're expecting a utf-8 encoded string, so examine one of
the non-ASCII characters. Is it really utf-8 encoded?

>>> my_string = u"snö".encode("utf-8")
>>> my_string[0] 's'
>>> my_string[1] 'n'
>>> my_string[2] '\xc3'
>>> my_string[3]

Click to expand...

Click to expand...

'\xb6'
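The same two checks carry over to Python 3, where the unicode type became str and the encoded string type became bytes. The helper looks_like_utf8() below is hypothetical, and a clean decode is only a heuristic: plain ASCII, for instance, is also valid UTF-8.

```python
def looks_like_utf8(data):
    """Heuristic: True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

text = "sn\u00f6"                # "snö" as text (Python 2: unicode)
utf8 = text.encode("utf-8")      # the o-umlaut becomes two bytes, 0xC3 0xB6
latin1 = text.encode("latin-1")  # the o-umlaut becomes one byte, 0xF6

print(isinstance(text, str), isinstance(utf8, bytes))
print(list(utf8))                # byte values, one per element
print(looks_like_utf8(utf8), looks_like_utf8(latin1))
```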

Since you feel pretty confident that you're getting what you expect
out of the database, maybe you want to eliminate that from
consideration. As a test, construct "by hand" a string that represents
the email message you're trying to send. If you send that with the
proper content-type header and you still don't get the results you
want, then we can all stop discussing the database. Make sense?

Forget about the HTML markup, too. That's just a distraction. Start
with the simplest problem first, and then add pieces on.

See if you can successfully construct and send an email that says
"Hello world" in English/ASCII. If that works, change it to Arabic. If
that works, change the email format to HTML. If that works, start
pulling the content from the database. If that works, then you're
done. =)
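A Python 3 sketch of that plan, stopping short of actually sending mail: each stage builds a message with the email package, serialises it with as_string(), and checks that the body decodes back to the text that went in. The stage list, subject, and sample strings are hypothetical, and the database stage is left out because it depends on the OP's schema.

```python
from email.mime.text import MIMEText

# Hypothetical sample text; replace with real data at the final stage.
ARABIC = "\u062f\u062e\u0648\u0644 \u0633\u0631\u064a\u0639"

# (stage name, body, MIME subtype), from simplest to most complex.
STAGES = [
    ("ASCII plain text", "Hello world", "plain"),
    ("Arabic plain text", ARABIC, "plain"),
    ("Arabic HTML", "<p>" + ARABIC + "</p>", "html"),
]

def check_stage(body, subtype):
    """Build and serialise one test message; return True if the body survives."""
    msg = MIMEText(body, subtype, "utf-8")
    msg["Subject"] = "Encoding test"
    msg.as_string()  # serialisation step; raises if the message can't be flattened
    # Undo the transfer encoding and compare with what we put in.
    return msg.get_payload(decode=True).decode("utf-8") == body

results = {}
for name, body, subtype in STAGES:
    results[name] = check_stage(body, subtype)
    print(name, "OK" if results[name] else "MISMATCH")
```

The first stage that prints MISMATCH or raises is where to look; once all of them pass, the remaining suspect is the data coming from the database.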

bye
Philip

John Machin · Mar 2, 2009

The result of print repr(row['name']) is '??? ??????'
The '?' characters are supposed to be Arabic characters.

Are you expecting 3 Arabic characters, a space, and then 6 Arabic
characters?

We now have some interesting evidence: row['name'] is NOT a unicode
object -- otherwise the print would show u'??? ??????'; it's a str
object.

So: a UTF-8-encoded string is being decoded to unicode and then
re-encoded to some other encoding using the "replace" error-handling
method (which substitutes "?" for anything it can't encode). That
shouldn't be hard to spot! It's about time you showed us the code you
are using to extract the data from the database, including the print
statements you have put in.
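The mechanism described above is easy to reproduce in Python 3. In this sketch latin-1 is only an assumed stand-in for whatever non-Arabic encoding the connection was really using, which the thread has not established. The point is that the "replace" error handler turns every unencodable character into a literal '?', after which the original text cannot be recovered.

```python
def lossy_roundtrip(stored, target_encoding):
    """Decode bytes stored as UTF-8, then re-encode them with the 'replace'
    error handler, which writes '?' for any character the target lacks."""
    text = stored.decode("utf-8")
    return text.encode(target_encoding, "replace")

# Hypothetical stored value: an Arabic word as UTF-8 bytes.
stored = "\u062f\u062e\u0648\u0644".encode("utf-8")

damaged = lossy_roundtrip(stored, "latin-1")       # latin-1 is an assumed stand-in
print(damaged)                                     # four literal question marks
print(damaged.decode("latin-1") == "????")         # the letters are gone for good
print(lossy_roundtrip(stored, "utf-8") == stored)  # a UTF-8 target loses nothing
```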

John Machin · Mar 2, 2009

Yuk. You are asking him to write extra speculative code when he's
having extreme difficulty debugging the code he's already got! He's
already said he's getting ?????? soon after the database retrieval ---
you want him to work on the downstream problem when the upstream is
still very muddy???

Sheeesh.

Philip Semanchuk · Mar 2, 2009

First of all, I preceded that paragraph with a detailed example of how
to verify that he's getting what he expects out of the database. So
no, I am not asking the OP to write extra speculative code. I'm giving
him another tool with which to work at his problem.

He claims to have done what I asked him to do in the first place --
break the problem into steps and verify the database steps. He says
they're working OK. I chose to take him at his word.

If he's right, then we can move on to the next step of troubleshooting
the email. If he's wrong and the problem is indeed with the database
code, then we'll eventually discover that and he'll have learned a
valuable lesson. It will be time-consuming and therefore painful for
him, but then he'll be more likely to remember it.

There's more than one way to attack this problem/set of problems, yes?

This is all kind of OT since it is about general debugging and not
about Python. The only Python-specific aspect I see is that debugging
non-ASCII problems with print is a little tricky since it introduces
yet another variable -- the terminal's encoding settings. If, for
instance, the OP's terminal is set to ISO 8859-6 or some such (I don't
know anything about encodings to handle Arabic) and he's feeding it
UTF-8, then ??????? might indeed be the result.

John Machin · Mar 2, 2009

He claims to have done what I asked him to do in the first place --
break the problem into steps and verify the database steps. He says
they're working OK. I chose to take him at his word.

Rule number 1: Don't believe anything an OP says that is not
corroborated by output that looks like it was produced using the
repr() function (2.x) or ascii() function (3.x)

Rule number 2: Don't ignore rule number 1, especially when not
corroborated by any output at all.

Rule number 3: [added since the Great Renaming aka the Mad Hatter's
Tea Party] Ask the OP what version of Python they are using so that
they can be told to use ascii() instead of repr() if using 3.X
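For readers on 3.x, here is the difference this rule relies on: repr() leaves printable non-ASCII characters as they are, so what you see still depends on the terminal, while ascii() escapes them. A minimal sketch with a made-up Arabic sample:

```python
name = "\u062f\u062e\u0648\u0644"  # hypothetical sample value

# repr() keeps the Arabic letters (they count as printable), so the
# terminal's encoding still decides what appears on screen.
print(repr(name) == "'" + name + "'")

# ascii() escapes every non-ASCII character, so its output is pure ASCII.
print(ascii(name))
```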

If he's right, then we can move on to the next step of troubleshooting
the email. If he's wrong and the problem is indeed with the database
code, then we'll eventually discover that

He has *already* demonstrated, at my request, that there is a problem
with, or soon after, the database extraction:

"""
The result of print repr(row['name']) is '??? ??????'
The '?' characters are supposed to be Arabic characters.
"""

This is all kind of OT since it is about general debugging and not
about Python. The only Python-specific aspect I see is that debugging
non-ASCII problems with print is a little tricky since it introduces
yet another variable -- the terminal's encoding settings. If, for
instance, the OP's terminal is set to ISO 8859-6 or some such (I don't
know anything about encodings to handle Arabic) and he's feeding it
UTF-8, then ??????? might indeed be the result.

and that is the rationale for Rule #1

Philip Semanchuk · Mar 3, 2009

Rule number 1: Don't believe anything an OP says that is not
corroborated by output that looks like it was produced using the
repr() function (2.x) or ascii() function (3.x)

Saying "I don't believe you" has never worked well for me as a
conversation opener. Sometimes taking someone at his word is another
name for giving him enough rope to...make a mistake that he'll remember.

And for many people, trust breeds trust. I trust him, maybe he'll
trust me when I say (for the second time), "You need to break this
problem down into discrete, debuggable units."

I (mostly) agree with your rule. But as I said, there's more than one
way to solve this problem. Or perhaps I should say that there's more
than one way to lead the OP to a solution to this problem. We teach
differently, you and I. I believe there's room in the world for *both*
styles -- perhaps even a third or fourth! =)

Cheers
Philip

Hussein B · Mar 3, 2009

This is how I retrieve the data:

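import MySQLdb
import MySQLdb.cursors

db = MySQLdb.connect(host="127.0.0.1", port=3306, user="username",
                     passwd="passwd", db="reporting")
# DictCursor returns each row as a dict keyed by column name
cr = db.cursor(MySQLdb.cursors.DictCursor)
cr.execute(sql)  # sql holds the query text (not shown in the post)
rows = cr.fetchall()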

Thanks all for your nice help.

Hussein B · Mar 3, 2009

Hey,
I added the use_unicode and charset keyword parameters to the connect()
call and got the following:
u'\u062f\u062e\u0648\u0644 \u0633\u0631\u064a\u0639 \u0634\u0647\u0631'
So the characters are now being converted successfully.
Well, using the previous recipe for sending the mail:
http://code.activestate.com/recipes/473810/
I got the following error:

Traceback (most recent call last):
  File "HtmlMail.py", line 52, in <module>
    s.sendmail(sender, receiver , msg.as_string())
  File "/usr/lib/python2.5/email/message.py", line 131, in as_string
    g.flatten(self, unixfrom=unixfrom)
  File "/usr/lib/python2.5/email/generator.py", line 84, in flatten
    self._write(msg)
  File "/usr/lib/python2.5/email/generator.py", line 109, in _write
    self._dispatch(msg)
  File "/usr/lib/python2.5/email/generator.py", line 135, in _dispatch
    meth(msg)
  File "/usr/lib/python2.5/email/generator.py", line 201, in _handle_multipart
    g.flatten(part, unixfrom=False)
  File "/usr/lib/python2.5/email/generator.py", line 84, in flatten
    self._write(msg)
  File "/usr/lib/python2.5/email/generator.py", line 109, in _write
    self._dispatch(msg)
  File "/usr/lib/python2.5/email/generator.py", line 135, in _dispatch
    meth(msg)
  File "/usr/lib/python2.5/email/generator.py", line 201, in _handle_multipart
    g.flatten(part, unixfrom=False)
  File "/usr/lib/python2.5/email/generator.py", line 84, in flatten
    self._write(msg)
  File "/usr/lib/python2.5/email/generator.py", line 109, in _write
    self._dispatch(msg)
  File "/usr/lib/python2.5/email/generator.py", line 135, in _dispatch
    meth(msg)
  File "/usr/lib/python2.5/email/generator.py", line 178, in _handle_text
    self._fp.write(payload)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 115-118: ordinal not in range(128)

Again, any ideas guys?

Thanks to you all, you rock!

John Machin · Mar 3, 2009

Hey,
I added use_unicode and charset keyword params to the connect() method

Hey, that was a brilliant idea -- I was just about to ask you to try
use_unicode=True, charset="utf8" ... what were the actual values that
you used?

Let's suppose that you used charset="XXXX" ... as far as I can tell,
not being a mysqldb user myself, this means that your data tables and/or
your default connection don't use XXXX as an encoding. If so, this
might be an issue you'd like to take up with whoever created the
database that you are using.

and I got the following:
u'\u062f\u062e\u0648\u0644 \u0633\u0631\u064a\u0639 \u0634\u0647\u0631'
So characters are getting converted successfully.

I guess so -- U+06nn sure are Arabic characters

However as suggested above, "converted from what?" might be worth
pursuing if you like to understand what is going on instead of just
applying magic recipes ;-)
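
Why "converted from what?" matters, as a Python 3 sketch: the same bytes turn into different text depending on which encoding the reader assumes. The UTF-8 source matches what the poster says about the tables; the latin-1 reader is a hypothetical mismatch, not something the thread confirms.

```python
# Python 3 sketch: identical bytes, two different assumptions about them.
name = "\u062f\u062e\u0648\u0644 \u0633\u0631\u064a\u0639 \u0634\u0647\u0631"
raw = name.encode("utf-8")     # 11 Arabic letters x 2 bytes + 2 spaces

right = raw.decode("utf-8")    # reader assumes the encoding the data uses
wrong = raw.decode("latin-1")  # reader assumes latin-1 (hypothetical mismatch)

print(len(raw), len(right), len(wrong))  # 24 13 24
print(right == name)                     # True
print(wrong == name)                     # False
```

The wrong guess does not raise an error; it silently yields 24 characters of mojibake instead of 13 letters, which is why it pays to know what the driver was converting from.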

Well, using the previous recipe for sending the mail: http://code.activestate.com/recipes/473810/
I got the following error:

Traceback (most recent call last):
  File "HtmlMail.py", line 52, in <module>
    s.sendmail(sender, receiver , msg.as_string())

[big snip]

_handle_text
    self._fp.write(payload)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 115-118: ordinal not in range(128)

Again, any ideas guys?

That recipe appears to have been written by an ascii bigot for ascii
bigots :-(

Try reading the docs for email.charset (that's the charset module in
the email package).
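
As a rough Python 3 illustration of what email.charset provides: a `Charset` object knows which transfer encoding to use for a character set and can turn non-ASCII text into an ASCII-only body. The sample markup is made up for the example.

```python
import base64
from email.charset import Charset

# Made-up sample markup containing Arabic text.
markup = "<table><tr><td>\u062f\u062e\u0648\u0644</td></tr></table>"

cs = Charset("utf-8")
print(cs.get_body_encoding())  # base64

# body_encode() encodes the text to UTF-8 and then base64, so the result
# contains only ASCII characters that an ascii-only writer can handle.
body = cs.body_encode(markup)
print(body.isascii())                                    # True
print(base64.b64decode(body).decode("utf-8") == markup)  # True
```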

Cheers,
John

Hussein B · Mar 3, 2009

On Mar 1, 2009, at 8:31 AM, Hussein B wrote:
Hey,
I'm retrieving records from MySQL database that contains non english
characters.
Can you reveal which language???
Arabic
Then I create a String that contains HTML markup and column values
from the previous result set.
+++++
markup = u'''<table>.....'''
for row in rows:
    markup = markup + '<tr><td>' + row['id']
markup = markup + '</table>'
+++++
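
The quoted loop builds the HTML by repeated `+`, so the type of each piece depends on what the driver returns, and the `<tr>`/`<td>` tags are left open as quoted. A Python 3 sketch of the same idea that keeps everything as text, escapes the values, and joins once at the end; the `rows` data and the `name` column are hypothetical stand-ins for `cr.fetchall()`.

```python
from html import escape

# Hypothetical rows standing in for cr.fetchall() with use_unicode=True.
rows = [
    {"id": 1, "name": "\u062f\u062e\u0648\u0644 \u0633\u0631\u064a\u0639"},
    {"id": 2, "name": "\u0634\u0647\u0631"},
]


def build_table(rows):
    """Return an HTML table with one <tr> per row, keeping everything as str."""
    parts = ["<table>"]
    for row in rows:
        # str() handles integer ids; escape() protects against <, > and & in data.
        parts.append("<tr><td>%s</td><td>%s</td></tr>"
                     % (escape(str(row["id"])), escape(row["name"])))
    parts.append("</table>")
    return "".join(parts)


markup = build_table(rows)
```

Joining a list once avoids rebuilding the string on every iteration, and keeping all pieces as text means the only encoding step happens when the mail is built.
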
Then I'm sending the email according to this tip:
http://code.activestate.com/recipes/473810/
Well, the email contains ????? characters for each non english ones.
Any ideas?
There are so many places where this could go wrong and you haven't
narrowed down the problem.
Are the characters stored in the database correctly?
Yes they are.
How do you KNOW that they are stored correctly? What makes you so
sure?
Because MySQL Query Browser displays them correctly, in addition I use
BIRT as the reporting system and it shows them correctly.
Are they stored consistently (i.e. all using the same encoding, not
some using utf-8 and others using iso-8859-1)?
Yes.
So what is the encoding used to store them?
Tables are created with UTF-8 encoding option
What are you getting out of the database? Is it being converted to
Unicode correctly, or at all?
I don't know. How can I make sure of this point?
You could show us some of the output from the database query. As well as
    print the_output
you should
    print repr(the_output)
and show us both, and also tell us what you *expect* to see.
The result of print repr(row['name']) is '??? ??????'
The '?' characters are supposed to be Arabic characters.
Are you expecting 3 Arabic characters, a space, and then 6 Arabic
characters?
We now have some interesting evidence: row['name'] is NOT a unicode
object -- otherwise the print would show u'??? ??????'; it's a str
object.
So: A utf8-encoded string is being decoded to unicode, and then re-
encoded to some other encoding, using the "replace" (with "?") error-
handling method. That shouldn't be hard to spot! It's about time you
showed us the code you are using to extract the data from the
database, including the print statements you have put in.
This is how I retrieve the data:
import MySQLdb
import MySQLdb.cursors

db = MySQLdb.connect(host="127.0.0.1", port=3306, user="username",
                     passwd="passwd", db="reporting")
cr = db.cursor(MySQLdb.cursors.DictCursor)  # rows come back as dicts
cr.execute(sql)
rows = cr.fetchall()
Thanks all for your nice help.

Hey,
I added use_unicode and charset keyword params to the connect() method

Hey, that was a brilliant idea -- I was just about to ask you to try
use_unicode=True, charset="utf8" ... what were the actual values that
you used?

I didn't supply values for them the first time.

Let's suppose that you used charset="XXXX" ... as far as I can tell,
not being a mysqldb user myself, this means that your data tables and/or
your default connection don't use XXXX as an encoding. If so, this
might be an issue you'd like to take up with whoever created the
database that you are using.

and I got the following:
u'\u062f\u062e\u0648\u0644 \u0633\u0631\u064a\u0639 \u0634\u0647\u0631'
So characters are getting converted successfully.

I guess so -- U+06nn sure are Arabic characters

However as suggested above, "converted from what?" might be worth
pursuing if you like to understand what is going on instead of just
applying magic recipes ;-)

Well, using the previous recipe for sending the mail: http://code.activestate.com/recipes/473810/
I got the following error:

Traceback (most recent call last):
  File "HtmlMail.py", line 52, in <module>
    s.sendmail(sender, receiver , msg.as_string())

[big snip]

_handle_text
    self._fp.write(payload)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 115-118: ordinal not in range(128)

Again, any ideas guys?

That recipe appears to have been written by an ascii bigot for ascii
bigots :-(

Try reading the docs for email.charset (that's the charset module in
the email package).

Everything is working now. I did the following:
t = MIMEText(markup.encode('utf-8'), 'html', 'utf-8')
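
In Python 3 the `.encode('utf-8')` step is not needed, because `MIMEText` takes text (str) and does the encoding itself. A minimal sketch with made-up markup, checking that the serialized message is pure ASCII, so an ASCII-only writer like the one in the traceback can handle it:

```python
from email.mime.text import MIMEText

# Made-up markup containing Arabic text.
markup = "<table><tr><td>\u062f\u062e\u0648\u0644 \u0633\u0631\u064a\u0639</td></tr></table>"

# Pass str directly; with charset utf-8 the body is UTF-8 encoded and,
# by default, sent with base64 transfer encoding.
t = MIMEText(markup, "html", "utf-8")

print(t.get_content_charset())         # utf-8
print(t["Content-Transfer-Encoding"])  # base64
print(t.as_string().isascii())         # True
print(t.get_payload(decode=True).decode("utf-8") == markup)  # True
```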

Cheers,
John

Thank you all, guys, and especially you, John. I owe you a HUGE bottle of
beer.

John Machin · Mar 3, 2009

I didn't supply values for them the first time.

I guessed that! I was referring to the fact that you didn't tell us
what values you did eventually supply that made it generate seemingly
reasonable Arabic letters in unicode!! Was it charset="utf8" that did
the trick?

Let's suppose that you used charset="XXXX" ... as far as I can tell,
not being a mysqldb user myself, this means that your data tables and/or
your default connection don't use XXXX as an encoding. If so, this
might be an issue you'd like to take up with whoever created the
database that you are using.

I guess so -- U+06nn sure are Arabic characters

However as suggested above, "converted from what?" might be worth
pursuing if you like to understand what is going on instead of just
applying magic recipes ;-)

[big snip]

_handle_text
    self._fp.write(payload)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 115-118: ordinal not in range(128)
Again, any ideas guys?

That recipe appears to have been written by an ascii bigot for ascii
bigots :-(

Try reading the docs for email.charset (that's the charset module in
the email package).

Everything is working now. I did the following:
t = MIMEText(markup.encode('utf-8'), 'html', 'utf-8')

Thank you all, guys, and especially you, John. I owe you a HUGE bottle of
beer.

Thanks for the kind thought, but beer decreases grey-cell count and
increases girth ... I don't need any assistance with those matters

Cheers,
John
