trouble getting google through urllib

Dr. Locke Z2A

So I'm writing a bot in Python that will be able to do all kinds of
weird shit. One of those things is translating text from one language
to another, which I figured I'd use Google Translate for. Here is the
translation section that I'm having trouble with:

elif line[abuindex+1] == "translate":  # user asked for a translation
    text = ""
    # concatenate all the words to be translated, separated by %20
    for i in range(abuindex+2, len(line)):
        text = text + "%20" + line[i]

    t_url = "http://translate.google.com/translate_t?text='" + text + "'&hl=en&langpair=es|en&tbb=1"
    print "url: %s" % t_url  # debug msg
    urlfi = urllib.urlopen(t_url)  # make a file object from what Google sends
    t_html = urlfi.read()  # read from urlfi file
    print "html: %s" % t_html  # debug msg
    print "text: %s" % text  # debug msg

This uses urllib to open the URL. abuindex+2 is the index of the first
word of the text to be translated, and line is a list holding the words
of the message sent to the bot from the server. Later I'll add something
to parse the HTML and pull out the translated text. The problem is that
when I run this, the HTML output is the following (I asked it to
translate "como estas" here):

<html><head>
<meta http-equiv="content-type" content="text/html;charset=utf-8">
<title>403 Forbidden</title>
<style><!--
body {font-family: arial,sans-serif}
div.nav {margin-top: 1ex}
div.nav A {font-size: 10pt; font-family: arial,sans-serif}
span.nav {font-size: 10pt; font-family: arial,sans-serif; font-weight:
bold}
div.nav A,span.big {font-size: 12pt; color: #0000cc}
div.nav A {font-size: 10pt; color: black}
A.l:link {color: #6f6f6f}
A.u:link {color: green}
//--></style>
<script><!--
var rc=403;
//-->
</script>
</head>
<body text=#000000 bgcolor=#ffffff>
<table border=0 cellpadding=2 cellspacing=0 width=100%><tr><td
rowspan=3 width=1% nowrap>
<b><font face=times color=#0039b6 size=10>G</font><font face=times
color=#c41200 size=10>o</font><font face=times color=#f3c518
size=10>o</font><font face=times color=#0039b6 size=10>g</font><font
face=times color=#30a72f size=10>l</font><font face=times color=#c41200
size=10>e</font>&nbsp;&nbsp;</b>
<td>&nbsp;</td></tr>
<tr><td bgcolor=#3366cc><font face=arial,sans-serif
color=#ffffff><b>Error</b></td></tr>
<tr><td>&nbsp;</td></tr></table>
<blockquote>
<H1>Forbidden</H1>
Your client does not have permission to get URL
<code>/translate_t?text='%20como%20estas'&amp;hl=en&amp;langpair=es%7Cen&amp;tbb=1</code>
from this server.

<p>
</blockquote>
<table width=100% cellpadding=0 cellspacing=0><tr><td
bgcolor=#3366cc><img alt="" width=1 height=4></td></tr></table>
</body></html>

Does anyone know how I would get the bot to have permission to get the
URL? When I put the URL into Firefox it works fine. I noticed that in
the output HTML Google gave me, some of the characters in the URL were
replaced with things like "&amp;" and "%7C", so I'm thinking that's the
problem. Does anyone know how I would make it keep the URL as I
intended it to be?
 
Will McGugan

Dr. Locke Z2A said:
Does anyone know how I would get the bot to have permission to get the
URL? When I put the URL into Firefox it works fine. I noticed that in
the output HTML Google gave me, some of the characters in the URL were
replaced with things like "&amp;" and "%7C", so I'm thinking that's the
problem. Does anyone know how I would make it keep the URL as I
intended it to be?

Google doesn't like Python scripts. You will need to pretend to be a
browser by setting the user-agent string in the HTTP header.
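
As for the "&amp;" and "%7C" in the error page: "%7C" is just the percent-encoded form of "|", and "&amp;" is only how the page escapes "&" when it prints the URL back as HTML, so neither changes what the server was asked for. A minimal sketch of that, assuming Python 3's urllib.parse (in Python 2 the same helper is urllib.urlencode). build_translate_url is a hypothetical helper name, the parameters are the ones from the URL in the question, and the single quotes around the text are left out on the assumption that they are not needed:

```python
from urllib.parse import urlencode, parse_qs, urlsplit

def build_translate_url(words, langpair="es|en"):
    """Build the translate URL from a list of words (hypothetical helper)."""
    query = urlencode({
        "text": " ".join(words),   # urlencode escapes the spaces for us
        "hl": "en",
        "langpair": langpair,      # "|" is percent-encoded as %7C
        "tbb": "1",
    })
    return "http://translate.google.com/translate_t?" + query

url = build_translate_url(["como", "estas"])
print(url)

# Decoding the query string gives back exactly what was put in,
# so the encoding itself does not alter the request.
decoded = parse_qs(urlsplit(url).query)
print(decoded["langpair"][0])
```

Letting urlencode do the escaping also avoids gluing "%20" between words by hand.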

Will McGugan
 
Duncan Booth

Will McGugan said:
Google doesn't like Python scripts. You will need to pretend to be a
browser by setting the user-agent string in the HTTP header.

And possibly also run the risk of having your system blocked by Google if
they figure out you are lying to them?
 
Fredrik Lundh

Dr. Locke Z2A said:
<H1>Forbidden</H1>
Your client does not have permission to get URL
<code>/translate_t?text='%20como%20estas'&amp;hl=en&amp;langpair=es%7Cen&amp;tbb=1</code>
from this server.
Does anyone know how I would get the bot to have permission to get the
url?

http://www.google.com/terms_of_service.html

"You may not send automated queries of any sort to Google's
system without express permission in advance from Google."

Official APIs are available here:

http://code.google.com/

</F>
 
Will McGugan

Duncan said:
and possibly also run the risk of having your system blocked by Google if
they figure out you are lying to them?

It is possible. I wrote a 'googlewhack' (remember them?) script a while
ago, which pretty much downloaded as many Google pages as my ADSL could
handle, and they didn't punish me for it, although apparently they do
issue short-term bans on IPs that abuse their service.

It is best to play nice, of course. I would recommend using their
official APIs if possible!


Will McGugan
 
Dr. Locke Z2A

I looked at those APIs, and it would appear that SOAP isn't around
anymore and there are no APIs for Google Translate :( Can anyone tell
me how to set the user-agent string in the HTTP header?
 
Amit Khemka

Dr. Locke Z2A said:
I looked at those APIs, and it would appear that SOAP isn't around
anymore and there are no APIs for Google Translate :( Can anyone tell
me how to set the user-agent string in the HTTP header?

import urllib2
req = urllib2.Request('http://www.google.com')
# add 'some' user agent header
req.add_header('User-Agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; '
               'rv:1.7.8) Gecko/20050524 Fedora/1.5 Firefox/1.5')
up = urllib2.urlopen(req)
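
For anyone on Python 3, where urllib2 became urllib.request, here is a minimal sketch of the same idea. It only builds the request and checks the header, without sending anything. make_request is a hypothetical helper name, and the user-agent string is the one from the snippet above. Whether Google accepts such a request is not something this shows.

```python
import urllib.request

# Same browser-like string as in the urllib2 snippet above
USER_AGENT = ("Mozilla/5.0 (X11; U; Linux i686; en-US; "
              "rv:1.7.8) Gecko/20050524 Fedora/1.5 Firefox/1.5")

def make_request(url, user_agent=USER_AGENT):
    """Return a Request carrying a User-Agent header (hypothetical helper)."""
    req = urllib.request.Request(url)
    req.add_header("User-Agent", user_agent)
    return req

req = make_request("http://www.google.com")

# Request stores header names capitalized, i.e. as "User-agent"
print(req.get_header("User-agent"))

# To actually fetch the page (needs network access):
# html = urllib.request.urlopen(req).read()
```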

cheers,
amit
 
BJörn Lindqvist

Will McGugan said:
It is possible. I wrote a 'googlewhack' (remember them?) script a while
ago, which pretty much downloaded as many Google pages as my ADSL could
handle, and they didn't punish me for it, although apparently they do
issue short-term bans on IPs that abuse their service.

For Google, that load must be piss in the ocean. I bet for Google to
even notice the abuse, it must be something really, really severe.
 
