Problem in Web scraping

karthiga

Is it possible to fetch dynamic content from two websites and show it on
the same page?

E.g. on www.imdb.com we can search for a movie and fetch its rating,
content and director, and on www.watchfreemovies.ch we can search for
the same film name and fetch the download links and versions.
 
Bart Van der Donck

karthiga said:
Is it possible to fetch dynamic content from two websites and show it on
the same page?
E.g. on www.imdb.com we can search for a movie and fetch its rating,
content and director, and on www.watchfreemovies.ch we can search for
the same film name and fetch the download links and versions.

Yes, this can be achieved with so-called content grabbers, which are
typically written in a server-side language such as PHP or Perl. I once
wrote www.ajax-cross-domain.com, which offers a JavaScript solution,
although the core mechanism is (and must be) server-side.

The real problem with your plan is that you can never be sure it will
continue to work. At any moment, the websites in question can alter
their HTML, potentially causing your script to fail every time. Large
websites such as IMDb tend to modify their pages quite often.
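
To illustrate the fragility, here is a minimal sketch of a grabber that at least fails loudly when the markup changes. The markup it looks for (a span with class "rating") and the function names are assumptions made up for this example, not the real structure of any site.

```javascript
// Hypothetical markup: assumes the rating appears as <span class="rating">8.1</span>.
// Real sites use different (and changing) markup; adjust the pattern accordingly.
const RATING_PATTERN = /<span class="rating">\s*(\d+(?:\.\d+)?)\s*<\/span>/;

// Returns the rating as a number, or null when the expected markup is absent.
function extractRating(html) {
  const match = RATING_PATTERN.exec(html);
  return match ? Number(match[1]) : null;
}

// Wraps extraction so that a layout change is reported instead of silently
// producing empty data.
function scrapeRating(html, source) {
  const rating = extractRating(html);
  if (rating === null) {
    throw new Error(`Rating not found in ${source}; the page layout may have changed`);
  }
  return rating;
}
```

Throwing on a missing match will not stop the breakage, but it turns a silent failure into one you notice the first time the remote page is redesigned.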

Legal issues are also involved. Unless you have explicit approval, you
are usually not allowed to systematically parse their data for your own
purposes.

Hope this helps,
 
Scott Sauyet

karthiga said:
Is it possible to fetch dynamic content from two websites and show it on
the same page?

E.g. on www.imdb.com we can search for a movie and fetch its rating,
content and director, and on www.watchfreemovies.ch we can search for
the same film name and fetch the download links and versions.

In the browser this is generally not possible [1], unless the sites
cooperate by, for example, providing a JSONP API [2] or by supporting
CORS [3]. But it's very easy to write a server-side proxy that loads
the external sites as though they were local.
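
A server-side proxy of that kind might look roughly like the sketch below. It assumes Node 18 or later (for the built-in fetch); the /proxy path, the url query parameter, the allowlisted hosts and the function names are all placeholders chosen for this example.

```javascript
const http = require("node:http");

// Hypothetical allowlist: only these hosts may be fetched through the proxy.
const ALLOWED_HOSTS = new Set(["www.example.com", "www.example.org"]);

// Turns "/proxy?url=https%3A%2F%2Fwww.example.com%2Fpage" into a URL object,
// or returns null when the request is malformed or the host is not allowed.
function resolveTarget(requestUrl, allowedHosts = ALLOWED_HOSTS) {
  const parsed = new URL(requestUrl, "http://localhost");
  if (parsed.pathname !== "/proxy") return null;
  const raw = parsed.searchParams.get("url");
  if (!raw) return null;
  let target;
  try {
    target = new URL(raw);
  } catch {
    return null;
  }
  const isHttp = target.protocol === "http:" || target.protocol === "https:";
  return isHttp && allowedHosts.has(target.hostname) ? target : null;
}

// Request handler: fetches the remote page server-side and relays it.
async function handleRequest(req, res) {
  const target = resolveTarget(req.url);
  if (!target) {
    res.writeHead(400, { "Content-Type": "text/plain" });
    res.end("Bad or disallowed url parameter");
    return;
  }
  try {
    const upstream = await fetch(target);
    const body = await upstream.text();
    res.writeHead(upstream.status, {
      "Content-Type": upstream.headers.get("content-type") || "text/plain",
    });
    res.end(body);
  } catch (err) {
    res.writeHead(502, { "Content-Type": "text/plain" });
    res.end("Upstream request failed");
  }
}

// Creates the proxy and starts listening on the given port.
function startProxy(port) {
  return http.createServer(handleRequest).listen(port);
}

// Usage: startProxy(8080);
```

A page served from the same origin as the proxy can then request /proxy?url=... with ordinary XMLHttpRequest or fetch, because from the browser's point of view the content is local. The allowlist is there so the proxy cannot be used to fetch arbitrary URLs.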

-- Scott

[1] http://en.wikipedia.org/wiki/Same_origin_policy
[2] http://en.wikipedia.org/wiki/JSONP
[3] http://en.wikipedia.org/wiki/Cross-Origin_Resource_Sharing
 
Jeremy J Starcher

karthiga said:
Is it possible to fetch dynamic content from two websites at the same
page.

Scott Sauyet said:
In the browser this is generally not possible [1], unless the sites
cooperate by, for example, providing a JSONP API [2] or by supporting
CORS [3]. But it's very easy to write a server-side proxy that loads
the external sites as though they were local.

If this is for your own personal use (and not something you want to
share or distribute), then a UserScript [1] isn't limited by the
same-origin policy.

[1] I have a lot of experience doing similar things in IE and Firefox,
in both cases using a plugin to get UserScript abilities. Some other
browsers (like Chrome) have some UserScript abilities built in, but YMMV.
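
As a rough sketch of the UserScript route: the cross-site request has to go through the privileged GM_xmlhttpRequest function that userscript managers such as Tampermonkey provide, since an ordinary XMLHttpRequest from the page is still subject to the same-origin policy. The @match and @connect hosts, the search URL and the helper names below are placeholders, not real endpoints.

```javascript
// ==UserScript==
// @name        Cross-site lookup (sketch)
// @match       https://www.example.com/*
// @grant       GM_xmlhttpRequest
// @connect     www.example.org
// ==/UserScript==

// Hypothetical search endpoint on the second site; not a real URL.
const SEARCH_BASE = "https://www.example.org/search";

// Builds the search URL for a film title.
function buildSearchUrl(title) {
  return `${SEARCH_BASE}?q=${encodeURIComponent(title.trim())}`;
}

// Wraps the callback-style GM_xmlhttpRequest in a Promise.
function gmFetchText(url) {
  return new Promise((resolve, reject) => {
    GM_xmlhttpRequest({
      method: "GET",
      url,
      onload: (response) => resolve(response.responseText),
      onerror: () => reject(new Error(`Request to ${url} failed`)),
    });
  });
}

// Run only inside a userscript manager, where GM_xmlhttpRequest is defined.
if (typeof GM_xmlhttpRequest === "function") {
  const title = document.title; // assumes the page title holds the film name
  gmFetchText(buildSearchUrl(title))
    .then((html) => {
      console.log(`Fetched ${html.length} characters for "${title}"`);
    })
    .catch(console.error);
}
```

The fetched HTML still has to be parsed, and that part breaks whenever the remote site changes its markup.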
 
