Characters aren't displayed correctly


Hussein B

Hey,
I'm retrieving records from a MySQL database that contains non-English
characters.
Then I build a string containing HTML markup and the column values
from that result set.
+++++
markup = u'''<table>.....'''
for row in rows:
    markup = markup + '<tr><td>' + row['id']
markup = markup + '</table>'
+++++
Then I send the email following this recipe:
http://code.activestate.com/recipes/473810/
But the email shows ????? in place of every non-English character.
Any ideas?
Ubuntu 8.04
Python 2.5.2
Evolution Mail Client
Thanks.
 

Philip Semanchuk

[snip]

There are so many places where this could go wrong, and you haven't
narrowed down the problem.

Are the characters stored in the database correctly?

Are they stored consistently (i.e. all using the same encoding, not
some using utf-8 and others using iso-8859-1)?

What are you getting out of the database? Is it being converted to
Unicode correctly, or at all?

Are you sure that the program you're using to view the email
understands the encoding?

Isolate those questions one at a time. Add some debugging breakpoints.
Ensure that you have what you think you have. You might not fix your
problem, but you will make it much smaller and more specific.


Good luck
Philip
 

J. Clifford Dyer

[snip]

Let me add to that checklist:

Are you sure the email you are creating has the encoding declared
properly in the headers?
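One way to answer that question is to build the part with Python's email package and read the header back. This is a minimal Python 3 sketch, not the recipe's code; the helper name declared_charset and the Arabic sample text are illustrative.

```python
from email.mime.text import MIMEText


def declared_charset(body, subtype="plain"):
    """Build a text part with an explicit UTF-8 charset and report what it declares."""
    part = MIMEText(body, subtype, "utf-8")
    # get_content_charset() reads the charset parameter of the Content-Type header.
    return part["Content-Type"], part.get_content_charset()


if __name__ == "__main__":
    header, charset = declared_charset("\u0634\u0647\u0631")  # Arabic sample
    print(header)   # the full Content-Type header
    print(charset)  # just the charset parameter
```

A message whose header still says iso-8859-1 cannot carry Arabic text correctly, whatever the body contains.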

Cheers,
Cliff
 

Hussein B

[snip]

There's so many places where this could go wrong and you haven't  
narrowed down the problem.

Are the characters stored in the database correctly?
Yes, they are.
Are they stored consistently (i.e. all using the same encoding, not  
some using utf-8 and others using iso-8859-1)? Yes.

What are you getting out of the database? Is it being converted to  
Unicode correctly, or at all?
I don't know. How can I check that?
 

Hussein B

On Mar 1, 2009, at 8:31 AM, Hussein B wrote:
[snip]

Let me add to that checklist:

Are you sure the email you are creating has the encoding declared
properly in the headers?




My HTML markup contains only table tags (table, tr and td).
 

J. Clifford Dyer

[snip]

My HTML markup contains only table tags (you know, table, tr and td)

Ah. The issue is not with the HTML markup, but the email headers. For
example, the email you sent me has a header that says:

Content-type: text/plain; charset="iso-8859-1"

Guessing from the recipe you linked to, you probably need something
like:

msgRoot['Content-type'] = 'text/plain; charset="utf-16"'

replacing utf-16 with whatever encoding you have encoded your email
with.

Or it may be that the header has to be attached to the individual mime
parts. I'm not as familiar with MIME.
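If the charset does belong on the individual parts, Python's email package handles it when each text part is created with an explicit charset. A Python 3 sketch under that assumption; it is not the recipe's code, and the helper name build_alternative and the addresses are placeholders.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_alternative(text, html, sender, receiver, subject):
    """Return a multipart/alternative message with UTF-8 plain-text and HTML parts."""
    msg = MIMEMultipart("alternative")
    msg["Subject"] = subject  # kept ASCII here; non-ASCII headers need more care
    msg["From"] = sender
    msg["To"] = receiver
    # Each leaf part carries its own charset parameter; the container has none.
    msg.attach(MIMEText(text, "plain", "utf-8"))
    msg.attach(MIMEText(html, "html", "utf-8"))
    return msg


if __name__ == "__main__":
    name = "\u062f\u062e\u0648\u0644"  # Arabic sample
    msg = build_alternative(name, "<table><tr><td>%s</td></tr></table>" % name,
                            "sender@example.com", "receiver@example.com", "Report")
    print(msg.as_string())
```

Setting a text/plain Content-Type on the multipart root by hand would mislabel the container, so the per-part approach seems the safer guess.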


Cheers,
Cliff
 

Hussein B

[snip]

Hey Cliff,
I tried your tip and I still get the same thing (?????).
I added a print statement to print each value of the result set to the
console, and it also prints ???? instead of the real characters.
Maybe a conversion happens when the data is fetched from the
database?
(The values are stored correctly in the database.)
 

John Machin

Can you reveal which language???
[snip]
Are the characters stored in the database correctly?

Yes they are.

How do you KNOW that they are stored correctly? What makes you so
sure?

So what is the encoding used to store them?
I don't know, how to make sure of this point?

You could show us some of the output from the database query. As well as
    print the_output
you should
    print repr(the_output)
and show us both, and also tell us what you *expect* to see.

And let's get the database output sorted out before we worry about the
email message.
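The same check in Python 3, as a minimal sketch (the thread uses Python 2.5, where unicode and str correspond roughly to Python 3's str and bytes). The helper name describe is hypothetical; it reports whether a value is decoded text or raw bytes, and uses ascii() so the output does not depend on the terminal's encoding.

```python
def describe(value):
    """Say whether value is decoded text or raw bytes, using ASCII-only output."""
    if isinstance(value, str):
        # Decoded text: ascii() escapes every non-ASCII code point as \uXXXX.
        return "text " + ascii(value)
    if isinstance(value, bytes):
        try:
            decoded = value.decode("utf-8")
        except UnicodeDecodeError:
            return "bytes (not valid UTF-8) " + repr(value)
        return "bytes (valid UTF-8) " + repr(value) + " -> " + ascii(decoded)
    return "other " + type(value).__name__


if __name__ == "__main__":
    word = "\u062f\u062e\u0648\u0644"  # Arabic sample text
    print(describe(word))
    print(describe(word.encode("utf-8")))
    print(describe("??? ??????"))  # literal question marks, not undisplayable characters
```

If the repr shows actual '?' characters, the damage happened before the print, not in the terminal.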
 

Hussein B

Can you reveal which language???


Arabic
[snip]
Are the characters stored in the database correctly?
Yes they are.

How do you KNOW that they are stored correctly? What makes you so
sure?
Because MySQL Query Browser displays them correctly; in addition, I use
BIRT as the reporting system and it shows them correctly too.
So what is the encoding used to store them?
The tables are created with the UTF-8 encoding option.
You could show us some of the output from the database query. As well as
   print the_output
you should
   print repr(the_output)
and show us both, and also tell us what you *expect* to see.

The result of print repr(row['name']) is '??? ??????'
The '?' characters are supposed to be Arabic characters.
And let's get the database output sorted out before we worry about the
email message.

Thanks, all, for the help.
 

Philip Semanchuk

Personally, I'd add a debug breakpoint just after extracting the
characters from the database, like so:

import pdb
pdb.set_trace()

When you're stopped at the breakpoint, examine the string you get
back. Is it what you expect? For instance, is it Unicode?

isinstance(my_string, unicode)

Or maybe you're expecting a utf-8 encoded string, so examine one of
the non-ASCII characters. Is it really utf-8 encoded?
>>> my_string = u"snö".encode("utf-8")
>>> my_string[0]
's'
>>> my_string[1]
'n'
>>> my_string[2]
'\xc3'
>>> my_string[3]
'\xb6'


Since you feel pretty confident that you're getting what you expect
out of the database, maybe you want to eliminate that from
consideration. As a test, construct "by hand" a string that represents
the email message you're trying to send. If you send that with the
proper content-type header and you still don't get the results you
want, then we can all stop discussing the database. Make sense?

Forget about the HTML markup, too. That's just a distraction. Start
with the simplest problem first, and then add pieces on.

See if you can successfully construct and send an email that says
"Hello world" in English/ASCII. If that works, change it to Arabic. If
that works, change the email format to HTML. If that works, start
pulling the content from the database. If that works, then you're
done. =)
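The progression above can be scripted without touching the database or the mail server: build each stage's message and confirm that it serializes. A Python 3 sketch; the stage texts are made up, build_stage is a hypothetical helper, and actually sending (for example with smtplib) is left out.

```python
from email.mime.text import MIMEText

# Hypothetical test bodies, from simplest to closest to the real report.
STAGES = [
    ("plain", "Hello world"),
    ("plain", "\u0645\u0631\u062d\u0628\u0627"),  # Arabic "marhaba" (hello)
    ("html", "<table><tr><td>\u0645\u0631\u062d\u0628\u0627</td></tr></table>"),
]


def build_stage(subtype, body):
    """Build one test message; UTF-8 covers both the ASCII and the Arabic stages."""
    msg = MIMEText(body, subtype, "utf-8")
    msg["Subject"] = "encoding test (%s)" % subtype
    return msg


if __name__ == "__main__":
    for subtype, body in STAGES:
        msg = build_stage(subtype, body)
        data = msg.as_bytes()  # an encoding problem would raise here
        print(subtype, len(data), "bytes, charset", msg.get_content_charset())
```

Each stage that serializes and displays correctly removes one suspect before the database is brought back in.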

bye
Philip
 

John Machin

Can you reveal which language???
Arabic


[snip]

The result of print repr(row['name']) is '??? ??????'
The '?' characters are supposed to be Arabic characters.

Are you expecting 3 Arabic characters, a space, and then 6 Arabic
characters?

We now have some interesting evidence: row['name'] is NOT a unicode
object -- otherwise the print would show u'??? ??????'; it's a str
object.

So: A utf8-encoded string is being decoded to unicode, and then re-
encoded to some other encoding, using the "replace" (with "?") error-
handling method. That shouldn't be hard to spot! It's about time you
showed us the code you are using to extract the data from the
database, including the print statements you have put in.
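The mechanism described above (UTF-8 bytes decoded, then re-encoded with the "replace" error handler) can be reproduced in a few lines. A Python 3 sketch; mangle is a hypothetical helper, and latin-1 merely stands in for whatever target encoding is actually involved, which has not been identified yet.

```python
def mangle(utf8_bytes, target="latin-1"):
    """Decode UTF-8, then re-encode lossily: unencodable characters become '?'."""
    text = utf8_bytes.decode("utf-8")
    return text.encode(target, "replace")


if __name__ == "__main__":
    stored = "\u062f\u062e\u0648\u0644 \u0633\u0631\u064a\u0639".encode("utf-8")
    print(mangle(stored))  # b'???? ????': the Arabic letters are gone for good
```

One '?' per character, with spaces preserved, matches the shape of the output reported above.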
 

John Machin

[snip]

See if you can successfully construct and send an email that says  
"Hello world" in English/ASCII. If that works, change it to Arabic. If  
that works, change the email format to HTML. If that works, starts  
pulling the content from the database. If that works, then you're  
done. =)

Yuk. You are asking him to write extra speculative code when he's
having extreme difficulty debugging the code he's already got! He's
already said he's getting ?????? soon after the database retrieval ---
you want him to work on the downstream problem when the upstream is
still very muddy???

Sheeesh.
 

Philip Semanchuk

Yuk. You are asking him to write extra speculative code when he's
having extreme difficulty debugging the code he's already got! He's
already said he's getting ?????? soon after the database retrieval ---
you want him to work on the downstream problem when the upstream is
still very muddy???

First of all, I preceded that paragraph with a detailed example of how
to verify that he's getting what he expects out of the database. So
no, I am not asking the OP to write extra speculative code. I'm giving
him another tool with which to work at his problem.

He claims to have done what I asked him to do in the first place --
break the problem into steps and verify the database steps. He says
they're working OK. I chose to take him at his word.

If he's right, then we can move on to the next step of troubleshooting
the email. If he's wrong and the problem is indeed with the database
code, then we'll eventually discover that and he'll have learned a
valuable lesson. It will be time-consuming and therefore painful for
him, but then he'll be more likely to remember it.

There's more than one way to attack this problem/set of problems, yes?

This is all kind of OT since it is about general debugging and not
about Python. The only Python-specific aspect I see is that debugging
non-ASCII problems with print is a little tricky since it introduces
yet another variable -- the terminal's encoding settings. If, for
instance, the OP's terminal is set to ISO 8859-6 or some such (I don't
know anything about encodings to handle Arabic) and he's feeding it
UTF-8, then ??????? might indeed be the result.
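The terminal question can be checked directly: Python reports the encoding it assumes for standard output, and the same Arabic text becomes different bytes in ISO 8859-6 and in UTF-8, so a terminal expecting one cannot read the other. A Python 3 sketch; terminal_bytes is a hypothetical helper that uses the "replace" error handler so unencodable characters show up as '?' instead of raising.

```python
import sys


def terminal_bytes(text, encoding=None):
    """Encode text with the given encoding, or the one stdout reports."""
    # Fall back to ASCII when stdout reports no encoding at all.
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "ascii"
    # "replace" turns characters the encoding lacks into '?' instead of raising.
    return text.encode(encoding, "replace")


if __name__ == "__main__":
    word = "\u0634\u0647\u0631"  # Arabic sample text
    print("stdout encoding:", getattr(sys.stdout, "encoding", None))
    for enc in ("utf-8", "iso8859_6", "ascii"):
        print(enc, terminal_bytes(word, enc))
```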
 

John Machin

First of all, I preceded that paragraph with a detailed example of how  
to verify that he's getting what he expects out of the database. So  
no, I am not asking the OP to write extra speculative code. I'm giving  
him another tool with which to work at his problem.

He claims to have done what I asked him to do in the first place --  
break the problem into steps and verify the database steps. He says  
they're working OK. I chose to take him at his word.

Rule number 1: Don't believe anything an OP says that is not
corroborated by output that looks like it was produced using the repr()
function (2.x) or ascii() function (3.x).

Rule number 2: Don't ignore rule number 1, especially when not
corroborated by any output at all.

Rule number 3: [added since the Great Renaming aka the Mad Hatter's
Tea Party] Ask the OP what version of Python they are using so that
they can be told to use ascii() instead of repr() if using 3.x.

If he's right, then we can move on to the next step of troubleshooting  
the email. If he's wrong and the problem is indeed with the database  
code, then we'll eventually discover that

He has *already* demonstrated, at my request, that there is a problem
with, or soon after, the database extraction:

"""
The result of print repr(row['name']) is '??? ??????'
The '?' characters are supposed to be Arabic characters.
"""
and he'll have learned a  
valuable lesson. It will be time-consuming and therefore painful for  
him, but then he'll be more likely to remember it.

There's more than one way to attack this problem/set of problems, yes?

This is all kind of OT since it is about general debugging and not  
about Python. The only Python-specific aspect I see is that debugging  
non-ASCII problems with print is a little tricky since it introduces  
yet another variable -- the terminal's encoding settings. If, for  
instance, the OP's terminal is set to ISO 8859-6 or some such (I don't  
know anything about encodings to handle Arabic) and he's feeding it  
UTF-8, then ??????? might indeed be the result.

And that is the rationale for Rule #1.
 

Philip Semanchuk

Rule number 1: Don't believe anything an OP says that is not
corroborated by output that looks like it was produced using the repr()
function (2.x) or ascii() function (3.x).

Saying "I don't believe you" has never worked well for me as a
conversation opener. Sometimes taking someone at his word is another
name for giving him enough rope to...make a mistake that he'll remember.

And for many people, trust breeds trust. I trust him, maybe he'll
trust me when I say (for the second time), "You need to break this
problem down into discrete, debuggable units."

I (mostly) agree with your rule. But as I said, there's more than one
way to solve this problem. Or perhaps I should say that there's more
than one way to lead the OP to a solution to this problem. We teach
differently, you and I. I believe there's room in the world for *both*
styles -- perhaps even a third or fourth! =)


Cheers
Philip
 

Hussein B

[snip]

Are you expecting 3 Arabic characters, a space, and then 6 Arabic
characters?

We now have some interesting evidence: row['name'] is NOT a unicode
object -- otherwise the print would show u'??? ??????'; it's a str
object.

So: A utf8-encoded string is being decoded to unicode, and then re-
encoded to some other encoding, using the "replace" (with "?") error-
handling method. That shouldn't be hard to spot! It's about time you
showed us the code you are using to extract the data from the
database, including the print statements you have put in.

This is how I retrieve the data:

import MySQLdb.cursors  # also makes the MySQLdb package itself available

db = MySQLdb.connect(host="127.0.0.1", port=3306, user="username",
                     passwd="passwd", db="reporting")
cr = db.cursor(MySQLdb.cursors.DictCursor)
cr.execute(sql)
rows = cr.fetchall()

Thanks, all, for your help.
 

Hussein B

[snip]

Hey,
I added the use_unicode and charset keyword parameters to the connect()
call and got the following:
u'\u062f\u062e\u0648\u0644 \u0633\u0631\u064a\u0639 \u0634\u0647\u0631'
So the characters are now being converted successfully.
However, using the same recipe to send the mail:
http://code.activestate.com/recipes/473810/
I got the following error:

Traceback (most recent call last):
  File "HtmlMail.py", line 52, in <module>
    s.sendmail(sender, receiver , msg.as_string())
  File "/usr/lib/python2.5/email/message.py", line 131, in as_string
    g.flatten(self, unixfrom=unixfrom)
  File "/usr/lib/python2.5/email/generator.py", line 84, in flatten
    self._write(msg)
  File "/usr/lib/python2.5/email/generator.py", line 109, in _write
    self._dispatch(msg)
  File "/usr/lib/python2.5/email/generator.py", line 135, in _dispatch
    meth(msg)
  File "/usr/lib/python2.5/email/generator.py", line 201, in _handle_multipart
    g.flatten(part, unixfrom=False)
  File "/usr/lib/python2.5/email/generator.py", line 84, in flatten
    self._write(msg)
  File "/usr/lib/python2.5/email/generator.py", line 109, in _write
    self._dispatch(msg)
  File "/usr/lib/python2.5/email/generator.py", line 135, in _dispatch
    meth(msg)
  File "/usr/lib/python2.5/email/generator.py", line 201, in _handle_multipart
    g.flatten(part, unixfrom=False)
  File "/usr/lib/python2.5/email/generator.py", line 84, in flatten
    self._write(msg)
  File "/usr/lib/python2.5/email/generator.py", line 109, in _write
    self._dispatch(msg)
  File "/usr/lib/python2.5/email/generator.py", line 135, in _dispatch
    meth(msg)
  File "/usr/lib/python2.5/email/generator.py", line 178, in _handle_text
    self._fp.write(payload)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 115-118: ordinal not in range(128)


Again, any ideas, guys? :)
Thanks to you all, you rock!
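The last line of that traceback can be reproduced on its own: text containing Arabic cannot be encoded with the ascii codec, while an explicit UTF-8 encode works. A Python 3 sketch; try_ascii_then_utf8 is a hypothetical helper. It suggests, without proving it for the recipe, that unicode text is reaching a write that implicitly encodes as ASCII.

```python
def try_ascii_then_utf8(text):
    """Encode as ASCII if possible; otherwise report the failure and use UTF-8."""
    try:
        return text.encode("ascii")
    except UnicodeEncodeError as exc:
        # Same exception type as in the traceback; start/end bound the bad span.
        print("ascii codec failed at positions %d-%d" % (exc.start, exc.end - 1))
        return text.encode("utf-8")


if __name__ == "__main__":
    html = "<td>\u062f\u062e\u0648\u0644</td>"
    print(try_ascii_then_utf8(html))  # reports positions 4-7, then UTF-8 bytes
```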
 

John Machin

[snip]

Hey,
I added use_unicode and charset keyword params to the connect() method

Hey, that was a brilliant idea -- I was just about to ask you to try
use_unicode=True, charset="utf8" ... what were the actual values that
you used?
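For reference, a connection call with those two parameters might look like the sketch below. It assumes the MySQLdb package (mysqlclient) is installed and that the tables really are UTF-8; the connection values are copied from Hussein's post, and connect_utf8 is a hypothetical helper. According to the MySQLdb documentation, supplying charset also implies use_unicode=True, but stating both makes the intent explicit.

```python
# Connection settings from the post above, plus the two encoding parameters.
CONNECT_ARGS = {
    "host": "127.0.0.1",
    "port": 3306,
    "user": "username",
    "passwd": "passwd",
    "db": "reporting",
    "charset": "utf8",    # character set used on the connection
    "use_unicode": True,  # return text columns as decoded strings, not bytes
}


def connect_utf8(**overrides):
    """Open a MySQLdb connection with UTF-8 settings (requires MySQLdb installed)."""
    import MySQLdb  # imported lazily so this module loads without the driver
    return MySQLdb.connect(**dict(CONNECT_ARGS, **overrides))
```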

Let's suppose that you used charset="XXXX". As far as I can tell,
not being a MySQLdb user myself, this means that your data tables and/or
your default connection don't use XXXX as an encoding. If so, you
might like to take this up with whoever created the database that
you are using.
and I got the following:
u'\u062f\u062e\u0648\u0644 \u0633\u0631\u064a\u0639
\u0634\u0647\u0631'
So characters are getting converted successfully.

I guess so -- U+06nn sure are Arabic characters :)

However as suggested above, "converted from what?" might be worth
pursuing if you like to understand what is going on instead of just
applying magic recipes ;-)
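The quick check that U+06nn code points are Arabic can be automated with the standard unicodedata module. A Python 3 sketch; arabic_report is a hypothetical helper, and the sample is the repr posted above.

```python
import unicodedata


def arabic_report(text):
    """Return (character, code point, Unicode name) for each non-space character."""
    return [(ch, "U+%04X" % ord(ch), unicodedata.name(ch, "?"))
            for ch in text if not ch.isspace()]


if __name__ == "__main__":
    row_name = u'\u062f\u062e\u0648\u0644 \u0633\u0631\u064a\u0639 \u0634\u0647\u0631'
    for _, code, name in arabic_report(row_name):
        print(code, name)  # e.g. U+062F ARABIC LETTER DAL
```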

Well, using the previous recipe for sending the mail:
http://code.activestate.com/recipes/473810/
I got the following error:

Traceback (most recent call last):
  File "HtmlMail.py", line 52, in <module>
    s.sendmail(sender, receiver , msg.as_string())

[big snip]
_handle_text
    self._fp.write(payload)
UnicodeEncodeError: 'ascii' codec can't encode characters in position
115-118: ordinal not in range(128)

Again, any ideas guys? :)

That recipe appears to have been written by an ascii bigot for ascii
bigots :-(

Try reading the docs for email.charset (that's the charset module in
the email package).
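As a starting point with email.charset: a Charset object controls how a part's body is transfer-encoded. This Python 3 sketch asks for quoted-printable instead of the base64 that the email package uses for UTF-8 by default; html_part_qp is a hypothetical helper, and whether quoted-printable or base64 is preferable is a judgment call the thread does not settle.

```python
from email import charset as email_charset
from email.mime.text import MIMEText


def html_part_qp(html):
    """Build a UTF-8 text/html part whose body is quoted-printable encoded."""
    cs = email_charset.Charset("utf-8")
    cs.body_encoding = email_charset.QP  # the default for utf-8 is BASE64
    return MIMEText(html, "html", cs)


if __name__ == "__main__":
    part = html_part_qp("<table><tr><td>\u0634\u0647\u0631</td></tr></table>")
    print(part.as_string())
```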

Cheers,
John
 

Hussein B

On Mar 1, 2009, at 8:31 AM, Hussein B wrote:
Hey,
I'm retrieving records from MySQL database that contains non english
characters.
Can you reveal which language???
Arabic
Then I create a String that contains HTML markup and column values
from the previous result set.
+++++
markup = u'''<table>.....'''
for row in rows:
    markup = markup + u'<tr><td>' + unicode(row['id']) + u'</td></tr>'
markup = markup + u'</table>'
+++++
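A minimal sketch of the same loop in Python 3, where `str` is already Unicode text (in the thread's Python 2.5 the equivalent is keeping every piece a `unicode` object). The `build_table` helper and the sample rows are hypothetical, and the `html.escape` call is an addition not in the original code; it guards against `<` or `&` in column values.

```python
from html import escape


def build_table(rows):
    """Build an HTML table from dict-like rows, keeping every piece as text (str)."""
    parts = ["<table>"]
    for row in rows:
        # str() handles integer ids; escape() protects against markup in the data.
        parts.append("<tr><td>" + escape(str(row["id"])) + "</td></tr>")
    parts.append("</table>")
    # Joining a list avoids repeated string concatenation inside the loop.
    return "".join(parts)


# Hypothetical rows, shaped like what a DictCursor returns.
rows = [{"id": 7}, {"id": "\u062f\u062e\u0648\u0644"}]
html_body = build_table(rows)
```

Keeping the whole body as text until the very end leaves exactly one place, the MIME layer, where it gets encoded to bytes.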
Then I'm sending the email according to this tip:
http://code.activestate.com/recipes/473810/
Well, the email contains ????? characters for each non english ones.
Any ideas?
There are so many places where this could go wrong, and you haven't
narrowed down the problem.
Are the characters stored in the database correctly?
Yes, they are.
How do you KNOW that they are stored correctly? What makes you so
sure?
Because MySQL Query Browser displays them correctly. In addition, I use
BIRT as the reporting system, and it shows them correctly too.
Are they stored consistently (i.e. all using the same encoding, not  
some using utf-8 and others using iso-8859-1)?
Yes.
So what is the encoding used to store them?
The tables are created with the UTF-8 encoding option.
What are you getting out of the database? Is it being converted to  
Unicode correctly, or at all?
I don't know. How can I make sure of this?
You could show us some of the output from the database query. As well as
   print the_output
you should
   print repr(the_output)
and show us both, and also tell us what you *expect* to see.
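The same check in Python 3 terms (an assumption, since the thread runs Python 2.5, where the split is `str` versus `unicode`): the representation tells text (`str`) apart from raw bytes. `ascii()` is used instead of `repr()` because it escapes non-ASCII characters, so the output stays readable even on a terminal that cannot display Arabic. The sample value is hypothetical.

```python
# Hypothetical column value: the Arabic word "دخول" as text.
name = "\u062f\u062e\u0648\u0644"
# The same value as UTF-8 bytes, as a driver might receive it from the server.
raw = name.encode("utf-8")

# ascii() shows the type (plain quotes vs b'' prefix) and escapes every non-ASCII code point.
print(ascii(name))  # '\u062f\u062e\u0648\u0644'
print(ascii(raw))   # b'\xd8\xaf\xd8\xae\xd9\x88\xd9\x84'
print(type(name).__name__, type(raw).__name__)  # str bytes
```

A run of literal `?` characters in this output, rather than escapes, means the characters were already lost before the value reached your code.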
The result of print repr(row['name']) is '??? ??????'
The '?' characters are supposed to be Arabic characters.
Are you expecting 3 Arabic characters, a space, and then 6 Arabic
characters?
We now have some interesting evidence: row['name'] is NOT a unicode
object -- otherwise the print would show u'??? ??????'; it's a str
object.
So: A utf8-encoded string is being decoded to unicode, and then re-encoded
to some other encoding, using the "replace" (with "?") error-handling
method. That shouldn't be hard to spot! It's about time you showed us
the code you are using to extract the data from the database, including
the print statements you have put in.
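The chain John describes can be reproduced in a few lines of Python 3. The sample word is hypothetical, and latin-1 is only a guess at the "some other encoding"; any charset without Arabic letters behaves the same.

```python
# Hypothetical data: UTF-8 bytes for the Arabic word "دخول" (4 letters).
utf8_bytes = "\u062f\u062e\u0648\u0644".encode("utf-8")

# Step 1: decode the UTF-8 bytes to text; this step is correct.
text = utf8_bytes.decode("utf-8")

# Step 2: re-encode to a charset that has no Arabic letters, with errors="replace".
# latin-1 is an assumption standing in for whatever charset the real code used.
mangled = text.encode("latin-1", errors="replace")

print(mangled)  # b'????'
```

Each unencodable character becomes exactly one `?`, which is why the word lengths survive while the letters do not.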
This is how I retrieve the data:
import MySQLdb
import MySQLdb.cursors

db = MySQLdb.connect(host="127.0.0.1", port=3306, user="username",
                     passwd="passwd", db="reporting")
cr = db.cursor(MySQLdb.cursors.DictCursor)
cr.execute(sql)
rows = cr.fetchall()
Thanks all for your nice help.
Hey,
I added use_unicode and charset keyword params to the connect() method

Hey, that was a brilliant idea -- I was just about to ask you to try
 use_unicode=True, charset="utf8" ... what were the actual values that
you used?

I didn't supply values for them the first time.
Let's suppose that you used charset="XXXX" ... as far as I can tell,
not being a mysqldb user myself, this means that your data tables and/or
your default connection don't use XXXX as an encoding. If so, this
might be an issue you may like to take up with whoever created the
database that you are using.
and I got the following:
u'\u062f\u062e\u0648\u0644 \u0633\u0631\u064a\u0639 \u0634\u0647\u0631'
So characters are getting converted successfully.

I guess so -- U+06nn sure are Arabic characters :)

However as suggested above, "converted from what?" might be worth
pursuing if you like to understand what is going on instead of just
applying magic recipes ;-)
Well, using the previous recipe for sending the mail: http://code.activestate.com/recipes/473810/
I got the following error:
Traceback (most recent call last):
  File "HtmlMail.py", line 52, in <module>
    s.sendmail(sender, receiver , msg.as_string())

[big snip]
_handle_text
    self._fp.write(payload)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 115-118: ordinal not in range(128)
Again, any ideas guys? :)

That recipe appears to have been written by an ascii bigot for ascii
bigots :-(

Try reading the docs for email.charset (that's the charset module in
the email package).

Everything is working now. I did the following:
t = MIMEText(markup.encode('utf-8'), 'html', 'utf-8')
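For comparison, a Python 3 sketch of the same fix, assuming a newer setup than the thread's Python 2.5: Python 3's `MIMEText` takes the body as `str` together with a charset, so the manual `.encode('utf-8')` is not needed there. The markup is hypothetical, and the actual sending (the `sendmail` call from the recipe) is left out.

```python
from email.mime.text import MIMEText

# Hypothetical HTML body containing Arabic text.
markup = "<table><tr><td>\u062f\u062e\u0648\u0644</td></tr></table>"

# In Python 3 the body is passed as str; the charset argument tells MIMEText how to encode it.
t = MIMEText(markup, "html", "utf-8")

wire = t.as_string()
# The serialized message is pure ASCII because the body is base64-encoded,
# so it can be handed to smtplib without any UnicodeEncodeError.
print(all(ord(ch) < 128 for ch in wire))  # True
print(t.get_content_type(), t.get_content_charset(), t["Content-Transfer-Encoding"])  # text/html utf-8 base64
# Decoding the payload recovers the original markup.
print(t.get_payload(decode=True).decode("utf-8") == markup)  # True
```

The `charset="utf-8"` parameter in the Content-Type header is what lets a mail client such as Evolution display the Arabic instead of `?` characters.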
Cheers,
John

Thank you all, guys, and especially you, John. I owe you a HUGE bottle of
beer :D
 

John Machin

I didn't supply values for them the first time.

I guessed that! I was referring to the fact that you didn't tell us
what values you did eventually supply that made it generate seemingly
reasonable Arabic letters in unicode!! Was it charset="utf8" that did
the trick?
Let's suppose that you used charset="XXXX" ... as far as I can tell,
not being a mysqldb user myself, this means that your data tables and/or
your default connection don't use XXXX as an encoding. If so, this
might be an issue you may like to take up with whoever created the
database that you are using.
I guess so -- U+06nn sure are Arabic characters :)
However as suggested above, "converted from what?" might be worth
pursuing if you like to understand what is going on instead of just
applying magic recipes ;-)
[big snip]
_handle_text
    self._fp.write(payload)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 115-118: ordinal not in range(128)
Again, any ideas guys? :)
That recipe appears to have been written by an ascii bigot for ascii
bigots :-(
Try reading the docs for email.charset (that's the charset module in
the email package).

Everything is working now. I did the following:
t = MIMEText(markup.encode('utf-8'), 'html', 'utf-8')
Thank you all, guys, and especially you, John. I owe you a HUGE bottle of
beer :D

Thanks for the kind thought, but beer decreases grey-cell count and
increases girth ... I don't need any assistance with those matters :)

Cheers,
John
 
