Making a search on another site, getting the data, and writing it as XML

altemurbugra

Hi,
Is it possible to run searches on, for example, Google without the API,
using a list of words?
1- There is a word list.
2- The script takes the words from the list one at a time.
3- It runs the search.
4- It gets the results.
5- It writes the results to an XML file.

I don't mean only Google, but other sites as well.

I hope we get a result.
 
Fredrik Lundh

altemurbugra said:
Is it possible to run searches on, for example, Google without the API,
using a list of words?
1- There is a word list.
2- The script takes the words from the list one at a time.
3- It runs the search.
4- It gets the results.
5- It writes the results to an XML file.

http://www.google.com/terms_of_service.html

"You may not send automated queries of any sort to Google's system without express
permission in advance from Google."

</F>
 
Adam Jones

altemurbugra said:
I don't mean only Google, but other sites as well.

Google expressly forbids any form of automated search outside of
their API. If you want to write a script that runs Google searches,
you have to use the API to do so. As far as I know, most of the other
search sites have the same requirement.

Yes, it is possible to query a bunch of search sites and dump the
results into an XML file. It is not even all that hard. In fact, I bet
running a search on the relevant terms will probably produce something
that almost does what you want.
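
As a rough illustration of the "dump the results into an XML file" step, here is a minimal sketch using the standard library's xml.etree.ElementTree. The function names results_to_xml and write_results, and the element names results and entry, are made up for this example; nothing in the thread prescribes an XML layout.

```python
import xml.etree.ElementTree as ET


def results_to_xml(results):
    """Build an XML string from a {word: result_text} mapping."""
    root = ET.Element("results")  # hypothetical root element name
    for word, text in results.items():
        # One <entry word="..."> element per word (hypothetical layout).
        entry = ET.SubElement(root, "entry", word=word)
        entry.text = text  # ElementTree escapes <, > and & when serialising
    return ET.tostring(root, encoding="unicode")


def write_results(results, path):
    """Write the same XML to a file, preceded by an XML declaration."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write('<?xml version="1.0" encoding="utf-8"?>\n')
        handle.write(results_to_xml(results))


if __name__ == "__main__":
    sample = {"apple": "a round fruit", "cheese": "made from milk & rennet"}
    print(results_to_xml(sample))
```

Building the tree with ElementTree rather than by string concatenation means special characters in the fetched text cannot break the XML.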

-Adam
 
altemurbugra

Thank you very much for your explanations. I don't mean a search engine;
I mean, for example, a dictionary site for looking up words.
 
altemurbugra

For example, here is a script that runs a search on one site and gets
the results:

#!/usr/bin/env python3
# -*- coding: windows-1254; -*-

import urllib.parse
import urllib.request

dictionary = {}  # wow, it's actually a dictionary
words = ['apple', 'banana', 'cheese']
for word in words:
    # Quote the word so it is safe to put in the query string.
    url = "http://www.example.com/look.php?w=" + urllib.parse.quote(word)
    dictionary[word] = urllib.request.urlopen(url).read()

print(dictionary)

What I don't know is how to read the words from a text file so that they
are searched one at a time.
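
One way to take the words from a text file in turn, sketched under the assumption that the file holds one word per line. The names read_words and look_up_all are hypothetical, and lookup stands in for whatever function fetches a result for a single word (for instance the urlopen call in the script above), so the example runs without contacting any site.

```python
import os
import tempfile


def read_words(path, encoding="utf-8"):
    """Return the non-empty lines of a text file, one word per line."""
    with open(path, encoding=encoding) as handle:
        # strip() drops the trailing newline and any surrounding spaces.
        return [line.strip() for line in handle if line.strip()]


def look_up_all(words, lookup):
    """Call lookup(word) for each word in turn and collect the results."""
    results = {}
    for word in words:
        results[word] = lookup(word)
    return results


if __name__ == "__main__":
    # Create a small word list so the example is self-contained.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                     encoding="utf-8") as tmp:
        tmp.write("apple\nbanana\n\ncheese\n")

    words = read_words(tmp.name)
    os.remove(tmp.name)

    # str.upper is only a stand-in for a real per-word lookup function.
    print(look_up_all(words, lookup=str.upper))
```

If the word list was saved in a Turkish Windows code page, passing encoding="cp1254" to read_words should decode it.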
 
Steven D'Aprano

http://www.google.com/terms_of_service.html

"You may not send automated queries of any sort to Google's system without express
permission in advance from Google."

I'm not just being a pedantic weasel here, but what's an automated query?
Google's ToS is a legal document (maybe), and if both parties don't agree
on the meanings of terms, well, then it is a lousy legal document and a
recipe for trouble.

Google don't define "automated query", and I don't think they can. In
fact, the closest they come to defining it is to list three things they
want to prevent, NONE of which have anything to do with the distinction
between automated and non-automated.

(What on earth is "meta-searching"? If you're going to use terms which
don't have a commonly understood meaning, define what they mean.)

If I want to search for "foo", and I type "foo" into the Firefox search
box, is that an automated query?

What if I type "gg: foo" into Konqueror's address bar, which expands to
"http://www.google.com/search?q=foo"? Is it okay if I type the URL by hand
myself?

Can I use the browser to save the search page to a local HTML file? If
Google says no, how can they possibly hope to stop me?

What if I type this command into my shell?

elinks --dump "http://www.google.com/search?q=foo" > output.html

What if I type

wget "http://www.google.com/search?q=foo"

into the shell? Surely that's no more automated than typing "foo"
into Google's search box. (wget doesn't in fact work, as Google recognises
its user-agent string and blocks it, EVEN in cases where I am using wget
manually. What, can't Google themselves tell the difference between
automatic and non-automatic searching?)

Where is the line I must not cross?

The thing is, Google doesn't want people "reselling" their services, and I
respect Google's intention. But trying to draw a distinction between
"automated" and "non-automated" requests is difficult if not impossible,
as can be seen by the heavy-handed way Google blocks the manual use of
wget. I don't condone the gross abuse of Google's service, but I don't
think an artificial distinction between automated and non-automated is a
useful way to go about it.

Of course, what I think isn't important. If Google wants to write legal
contracts that won't stand up in court (speaking as somebody who isn't a
lawyer and whose legal advice is worthless), they can. But the point is, I
see no ethical nor legal reason why a user can't create a script which is
called MANUALLY by the user and does what a browser does, namely send and
receive data from websites (which may or may not include Google).

And that, it seems to me, is what the Original Poster wanted.
 
Steve Holden

Steven said:
I'm not just being a pedantic weasel here, but what's an automated query?
Google's ToS is a legal document (maybe), and if both parties don't agree
on the meanings of terms, well, then it is a lousy legal document and a
recipe for trouble.

Google don't define "automated query", and I don't think they can. In
fact, the closest they come to defining it is to list three things they
want to prevent, NONE of which have anything to do with the distinction
between automated and non-automated.

The fact remains that Google can chop your searching ability off at the
knees if *they* determine that you have broken the terms of service, so
whether you agree or not becomes slightly academic.

regards
Steve
 
Fredrik Lundh

Steven said:
Google don't define "automated query", and I don't think they can.

the phrases they use are well understood in the SE business. that's
good enough for everyone involved (including courts; see below).

(What on earth is "meta-searching"? If you're going to use terms which
don't have a commonly understood meaning, define what they mean.)

http://en.wikipedia.org/wiki/Metasearch_engine

If I want to search for "foo", and I type "foo" into the Firefox search
box, is that an automated query?

nope. unless you're a robot.

What if I type "gg: foo" into Konqueror's address bar, which expands to
"http://www.google.com/search?q=foo"? Is it okay if I type the URL by hand
myself?

nope. unless you're a robot.

Can I use the browser to save the search page to a local HTML file? If
Google says no, how can they possibly hope to stop me?

what you do with the search results once you've gotten them is outside
the scope of that clause.

What if I type this command into my shell?

elinks --dump "http://www.google.com/search?q=foo" > output.html

What if I type

wget "http://www.google.com/search?q=foo"

into the shell? Surely that's no more automated than typing "foo"
into Google's search box.

neither is automated, unless you're a robot.

Where is the line I must not cross?

letting a program generate search requests based on something other than
"human wants to find something and types some keywords into a prompt
somewhere".

And that, it seems to me, is what the Original Poster wanted.

the OP wanted to read keywords from a text file generated in some
unknown fashion. that's bot behaviour, not human behaviour.

Of course, what I think isn't important. If Google wants to write legal
contracts that won't stand up in court (speaking as somebody who isn't a
lawyer and whose legal advice is worthless)

well, "here's some random guy who didn't understand the terms used in
the contract" isn't a valid defense in court; courts are more interested
in whether people with experience from the relevant field can reasonably
be expected to understand the contract. but this isn't about court
cases, of course; it's about getting banned by Google for abusing their
services.

</F>
 
Diez B. Roggisch

GOOGLE IS NOT OUR SUBJECT ANY MORE.

MY GOAL IS NOT MAKING SEARCH ON GOOGLE:
MY GOAL IS MAKING A SEARCH ON
www.onelook.com, for example


"""
Can you send me the list of words in the index? May I extract it from your
site?
No, sorry. If you're thinking about writing a script to systematically copy
OneLook.com's word list, please don't. It's not yours to copy, for one
thing. But also, it wastes tremendous bandwidth and slows things down for
other users. We have software in place to detect the abuse of our service
and we'll alert your ISP if you violate our trust in you. If you're looking
for a decent-sized downloadable word list, try WordNet, which offers that
and much more. If you're working on a project for school or academic
research, let us know and we might be able to help steer you in the right
direction.
"""

Consider this: if you offered your neighbours the courtesy of an occasional
lemonade, would that mean you'd like them stomping around in your kitchen?

Nearly all sites that offer a service like this will have policies of
that kind. So - get a grip, stop shouting, and start thinking about whether
what you are trying to do is legal and socially acceptable. If it isn't, and
you don't care - be my guest, but don't ask for help here!

Diez
 
Fredrik Lundh

GOOGLE IS NOT OUR SUBJECT ANY MORE.

MY GOAL IS NOT MAKING SEARCH ON GOOGLE:
MY GOAL IS MAKING A SEARCH ON
www.onelook.com, for example

this is usenet; you don't "own" the threads you start. if there's a
subthread that you don't find relevant to your original question, just
ignore it.

</F>
 
altemurbugra

I don't mean Google.
I don't mean onelook.com.

These are only examples.

I hope you understand what I mean.
 
George Sakkis

I don't mean Google.
I don't mean onelook.com.

These are only examples.

I hope you understand what I mean.

Apparently, *you* don't understand what they're trying to tell you. It
roughly boils down to the following:

- All sites (except perhaps the most trivial small ones) disallow in their
Terms of Service the unregulated harvesting of their content by
webbots, for both legal and technical reasons. It's not just Google or
OneLook that does this.
- Yes, it is technically possible to attempt to violate their ToS,
running the risk of being caught (with whatever consequences this
implies).
- Yes, you *might* be able to get away with it (at least for some time)
running in stealth mode.
- No, people here are not willing to help you go down this road, you're
on your own.

Hope this helps,
George
 
Lawrence D'Oliveiro

If Google wants to write legal
contracts that won't stand up in court (speaking as somebody who isn't a
lawyer and whose legal advice is worthless), they can.

What they define as their terms of service doesn't have to stand up in
court. They're not a public service, after all. If you do something that
they don't like, they are free to try to block you from their servers; they
don't need to appeal to any other authority.

wget --user-agent="I'm not Microsoft Internet Explorer, I'm Wget" -O - \
http://www.google.co.nz/search\?q=test
 
Lawrence D'Oliveiro

The fact remains that Google can chop your searching ability off at the
knees ...

No they can't. They can only chop off your ability to use Google.
 
Ben Finney

Steve Holden said:
Lawrence said:
No they can't. They can only chop off your ability to use Google.

[sigh]. Right, Lawrence, sorry I wasn't quite explicit enough for you.

Seems like a fairly important distinction. Google has the power to
"chop your searching ability off at the knees" only to the extent that
you grant them that power.
 
