Claudio Grondi wrote:
(e-mail address removed) wrote:
Here is my script:
from mechanize import *
from BeautifulSoup import *
import StringIO
b = Browser()
f = b.open("http://www.translate.ru/text.asp?lang=ru")
b.select_form(nr=0)
b["source"] = "hello python"
html = b.submit().get_data()
soup = BeautifulSoup(html)
print soup.find("span", id = "r_text").string
OUTPUT:
&#1087;&#1088;&#1080;&#1074;&#1077;&#1090; &#1087;&#1080;&#1090;&#1086;&#1085;
----------
In Russian it looks like:
"привет питон"
How can I translate this using standard Python libraries?
--
Pak Andrei,
http://paxoblog.blogspot.com, icq://97449800
Translate to what and with what purpose?
Assuming your intention is to get a Python Unicode string, what about:
strHTML = '&#1087;&#1088;&#1080;&#1074;&#1077;&#1090; &#1087;&#1080;&#1090;&#1086;&#1085;'
strUnicodeHexCode = strHTML.replace('&#','\u').replace(';','')
strUnicode = eval("u'%s'"%strUnicodeHexCode)
?
I am sure there is a more elegant and direct solution, but I just wanted
to provide a quick response here.
Claudio Grondi
Thank you, Claudio.
A really interesting solution, but it doesn't work...
In [19]: strHTML = '&#1087;&#1088;&#1080;&#1074;&#1077;&#1090; &#1087;&#1080;&#1090;&#1086;&#1085;'
In [20]: strUnicodeHexCode = strHTML.replace('&#','\u').replace(';','')
In [21]: strUnicode = eval("u'%s'"%strUnicodeHexCode)
In [22]: print strUnicode
---------------------------------------------------------------------------
exceptions.UnicodeEncodeError Traceback (most recent call last)
C:\Documents and Settings\dron\<ipython console>
C:\usr\lib\encodings\cp866.py in encode(self, input, errors)
16 def encode(self,input,errors='strict'):
17
---> 18 return codecs.charmap_encode(input,errors,encoding_map)
19
20 def decode(self,input,errors='strict'):
UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-5: character maps to <undefined>
In [23]: print strUnicode.encode("utf-8")
ÑВЗÑВИÑÐ’ÐÑБ┤ÑБ╖ÑÐ’Ð ÑВЗÑÐ’ÐÑÐ’Ð ÑВЖÑВЕ
<-- it's not my string "привет питон"
In [24]: strUnicode.encode("utf-8")
Out[24]:
'\xe1\x82\x87\xe1\x82\x88\xe1\x82\x80\xe1\x81\xb4\xe1\x81\xb7\xe1\x82\x90 \xe1\x82\x87\xe1\x82\x80\xe1\x82\x90\xe1\x82\x86\xe1\x82\x85'
<-- and too many chars
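
A possible explanation for the result above, offered as a sketch rather than a confirmed diagnosis: if the page returns decimal numeric character references such as &#1087; (the bytes in Out[24] are consistent with that, since '\xe1\x82\x87' is U+1087), then turning "&#1087;" into "\u1087" makes Python read the digits as hexadecimal, which gives Georgian letters instead of Cyrillic ones. The snippet below converts the digits as decimal numbers using only the re module. The names html_text and decode_decimal_refs are made up for illustration, and the input string is an assumed reconstruction of what the site returned.

```python
import re

try:
    unichr  # Python 2 has a separate unichr()
except NameError:
    unichr = chr  # Python 3: chr() already returns a Unicode character

# Assumed input: decimal numeric character references, as the page
# appears to have returned them.
html_text = (u'&#1087;&#1088;&#1080;&#1074;&#1077;&#1090; '
             u'&#1087;&#1080;&#1090;&#1086;&#1085;')


def decode_decimal_refs(text):
    """Replace each decimal reference &#NNNN; with the character it names."""
    return re.sub(u'&#([0-9]+);',
                  lambda m: unichr(int(m.group(1))),
                  text)


decoded = decode_decimal_refs(html_text)

# The same digits read two ways: base 16 (what "\u1087" does) and base 10
# (what the reference &#1087; means).
wrong = unichr(int('1087', 16))   # U+1087, a Georgian letter
right = unichr(int('1087', 10))   # U+043F, Cyrillic "п"
```

Here decoded should hold the Unicode string "привет питон". Whether print can display it still depends on the console encoding, which is a separate issue from the decoding itself. On Python 3.4 and later, html.unescape from the standard library performs the same conversion.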