Downloading full page source of a web page


helpmefinda

Hello everyone,

I am currently writing a web spider, and I have it mostly working using
the WinINet functions. However, I have found that the WinINet functions
do not come close to retrieving the full page source; all I get is the
basic HTML.

Main question:
What I really want is the entire page source, as you would see it with
View -> Page Source in Firefox. Can this be done using WinINet, or
would I need to use another connection method?

I looked through all the flags that can be set on the functions needed
to retrieve a file, but I did not find any that would do what I want.
The flags are listed here: http://msdn2.microsoft.com/en-us/library/aa385473.aspx

This is my current code for connecting (minus error checking):


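// Open a WinINet session using the preconfigured proxy settings.
hINet = InternetOpen("InetHTTP/1.0", INTERNET_OPEN_TYPE_PRECONFIG,
                     NULL, NULL, 0);

// Connect to the host on the default HTTP port.
hConnection = InternetConnect(hINet, tempsite.c_str(),
                              INTERNET_DEFAULT_HTTP_PORT, NULL, NULL,
                              INTERNET_SERVICE_HTTP, 0, 0);

// Build a GET request for the path portion of the URL.
hData = HttpOpenRequest(hConnection, "GET",
                        csite.site.substr(endposition, csite.site.length()).c_str(),
                        NULL, NULL, NULL,
                        INTERNET_FLAG_KEEP_CONNECTION, 0);

// Send the request with no extra headers and no request body.
httpSendRequestSucceeded = HttpSendRequest(hData, NULL, 0, 0, 0);

// Read up to BUFFSIZE-1 bytes of the response into buffer (a single call).
internetReadFileSucceeded = InternetReadFile(hData, (LPVOID)buffer,
                                             (ULONG)(BUFFSIZE-1),
                                             &dwBytesRead);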


Thanks a lot,
Rob
 

Victor Bazarov

> I am currently writing a web spider, and I have it mostly working using
> the WinINet functions. However, I have found that the WinINet functions
> do not come close to retrieving the full page source; all I get is the
> basic HTML.
>
> Main question:
> What I really want is the entire page source, as you would see it with
> View -> Page Source in Firefox. Can this be done using WinINet?
> [..]

Wrong newsgroup. Whatever "wininet" is, it's not part of the C++ language
or the Standard library. You should consider asking in the newsgroup
for your platform or your compiler (if the compiler includes it in its
package).

V
 

Erik Wikström

> Hello everyone,
>
> I am currently writing a web spider, and I have it mostly working using
> the WinINet functions. However, I have found that the WinINet functions
> do not come close to retrieving the full page source; all I get is the
> basic HTML.
>
> Main question:
> What I really want is the entire page source, as you would see it with
> View -> Page Source in Firefox. Can this be done using WinINet, or
> would I need to use another connection method?

It seems to me that your understanding of the HTTP protocol and of how
the internet works is a bit lacking. Try to read up on those things and
you'll find that your question will be answered.
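
As a concrete illustration of the HTTP point: a single GET returns one resource, the HTML document itself. Scripts, stylesheets, and images that the document references are separate resources, and a spider has to request each of them itself. The sketch below shows one naive way to list those references; attributeValues is a hypothetical helper, it only handles double-quoted, lowercase attributes, and it is not a real HTML parser.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Collects the values of every double-quoted `attr="..."` occurrence in
// `html`. Naive on purpose: plain substring search, case-sensitive, so it
// can also match inside a longer attribute name that ends with `attr`.
std::vector<std::string> attributeValues(const std::string& html,
                                         const std::string& attr) {
    std::vector<std::string> values;
    const std::string needle = attr + "=\"";
    std::size_t pos = 0;
    while ((pos = html.find(needle, pos)) != std::string::npos) {
        const std::size_t start = pos + needle.size();
        const std::size_t end = html.find('"', start);
        if (end == std::string::npos) {
            break;                     // unterminated value: stop scanning
        }
        values.push_back(html.substr(start, end - start));
        pos = end + 1;                 // continue after the closing quote
    }
    return values;
}
```

Calling attributeValues(page, "src") and attributeValues(page, "href") on a downloaded page gives the URLs a spider would then fetch with further GET requests, after resolving relative paths against the page's own URL.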
 
