internet searching program

K

KillSwitch

Is it possible to make a program to search a site on the internet,
then get certain information from the web pages that match and display
them? Like, you would put in keywords to be searched for on
youtube.com, then it would search youtube.com, get the names of the
videos, the links, and the embed information? Or something like that.
 
S

Steven D'Aprano

Is it possible to make a program to search a site on the internet, then
get certain information from the web pages that match and display them?
Like, you would put in keywords to be searched for on youtube.com, then
it would search youtube.com, get the names of the videos, the links, and
the embed information? Or something like that.

Search the Internet? Hmmm... I'm not sure, but I think Google does
something quite like that, but I don't know if they do it with a computer
program or an army of trained monkeys.
 
K

KillSwitch

No, I mean to search the internet really fast and display only REALLY
SPECIFIC information about certain web pages. Like, stuff Google
wouldn't have. For instance, in the youtube example, it would name the
names of the videos, the url's of them, and the EMBED information
without you having to view the site. And BTW, I am extremely new to
Python. I am going to buy a few books soon, and I can really start my
learning then. And I'll look up some tut's on the 'net in the
meantime.
 
M

Michael Tobis

I think you are talking about "screen scraping".

Your program can get the html for the page, and search for an
appropriate pattern.

Look at the source for a YouTube view page and you will see a string

var embedUrl = 'http://....

You can write code to search for that in the html text.

But embedUrl is not a standard javascript item; it is part of the
YouTube design. You will be relying on that, and if YouTube changes
how they provide such a string, you will be out of luck; at best you
will have to rewrite part of your code.

In other words, screen scraping relies on the site not doing a major
reworking and on you doing a little reverse engineering of the page
source.

So how to get the html into your program? In python the answer to that
is urllib.

http://docs.python.org/lib/module-urllib.html

mt
 
A

alex23

Is it possible to make a program to search a site on the internet,
then get certain information from the web pages that match and display
them? Like, you would put in keywords to be searched for on
youtube.com, then it would search youtube.com, get the names of the
videos, the links, and the embed information? Or something like that.

You might find the mechanize module handy:
http://wwwsearch.sourceforge.net/mechanize/

And possibly BeautifulSoup:
http://www.crummy.com/software/BeautifulSoup/
 
G

greg

Michael said:
I think you are talking about "screen scraping".

Your program can get the html for the page, and search for an
appropriate pattern.

However, it wouldn't be "really fast", because you
still have to fetch all the pages that might contain
data you're looking for.

Google searches are fast because they've already
fetched all the web pages in the world and indexed
them.

You might get somewhere using a program that does
a site-specific google search to find potentially
relevant pages, then goes and looks at those pages
for further information.

Another possibility might be to crawl the site and
build your own index based on the information you're
interested in.
 
K

KillSwitch

Thanks a lot for all of everyones help, I am really looking forward to
learning the ins and ous of python or my first programming language.
 
B

BAnderton

I was doing something very similar on my windows XP machine a year ago
(with python 2.4) and used Mayukh Bose's Internet Explorer controller
(see http://www.mayukhbose.com/python/IEC/index.php for details/
download). It worked very nicely for my needs and was rather
intuitive (generally much easier and required fewer brain cells than
using urllib) ... here's some clips from the project:

# this window will be for our initial data pull-in
ie = IEController()
ie.Navigate('http://<whateverYourSiteNameIs>')
ie.ClickButton(caption='Advanced')
...
ie.SetInputValue('search_string',strUserID)
ie.ClickButton(name='image')
...
strAllText = ie.GetDocumentText() # gets all html source code
from current page.
...
ie.CloseWindow()

I wish someone could make a similar one for Firefox.
 
S

Support Desk

Google does'nt allow use of their API's anymore, I belive Yahoo has one or
you could do something like below.

searchstring = 'stuff here'

x = os.popen('lynx -dump http://www.google.com/search?q=%s' %
searchstring).readlines()


-----Original Message-----
From: Steven D'Aprano [mailto:[email protected]]
Sent: Friday, August 08, 2008 11:22 PM
To: (e-mail address removed)
Subject: Re: internet searching program

Is it possible to make a program to search a site on the internet, then
get certain information from the web pages that match and display them?
Like, you would put in keywords to be searched for on youtube.com, then
it would search youtube.com, get the names of the videos, the links, and
the embed information? Or something like that.

Search the Internet? Hmmm... I'm not sure, but I think Google does
something quite like that, but I don't know if they do it with a computer
program or an army of trained monkeys.
 
A

alex23

Google does'nt allow use of their API's anymore, I belive Yahoo has one

Are you sure?

"Google Custom Search enables you to search over a website or a
collection of websites. You can harness the power of Google to create
a search engine tailored to your needs and interests, and you can
present the results in your website. Your custom search engine can
prioritize or restrict search results based on websites you specify."

http://code.google.com/apis/customsearch/
 
M

maoxw

S

Support Desk

Yes, I believe the custom search allows you to embed a google search into your website and customize it, but they no longer allow you to use a script to access search results unless you go about it in a roundabout way

http://googlesystem.blogspot.com/2006/12/googles-soap-search-api-no-longer.html


-----Original Message-----
From: (e-mail address removed) [mailto:[email protected]]
Sent: Tuesday, August 12, 2008 3:09 AM
To: (e-mail address removed)
Subject: Re: internet searching program

Are you sure?

"Google Custom Search enables you to search over a website or a
collection of websites. You can harness the power of Google to create
a search engine tailored to your needs and interests, and you can
present the results in your website. Your custom search engine can
prioritize or restrict search results based on websites you specify."

http://code.google.com/apis/customsearch/

http://www.muffler-silencer.com

V-4 type Series Muffler

B type Series Muffler
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
474,431
Messages
2,571,679
Members
48,796
Latest member
Greg L.

Latest Threads

Top