Extracting html source from a web page...

Konrad Rotuski

If you want to see the HTML source code 'manually', I'd recommend attaching to the IE process using VS.NET.

As for getting the HTML source via code, I think you should look at the IObjectWithSite interface and related ones. Have a look at http://weblogs.asp.net/stevencohn/articles/60948.aspx for more information.

HTH

Konrad

I am trying to get at the source of a web page. Looking at the innerHTML property is only part of the story. In IE, right-clicking on different parts of the page gives me different results when I click View Source.

The source I need is inside IFRAME tags (which reference JSP pages). The HTML content isn't available when I look at the innerHTML of the document returned in the DocumentComplete event of the WebBrowser control. My question is basically: how do I get the HTML generated by the JSP page in the IFRAME? Better yet, how do I get the complete HTML as it is rendered by IE?

A snippet of VB.Net code would be much appreciated, if possible.

Many thanks in advance

-=NaJ=-
 
Joerg Jooss

I am trying to get at the source of a web page. Looking at the
innerHTML element is only part of the story. In IE, right-clicking
on various different parts of the page gives me different results
when I click on view_source.

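Because you're looking at multiple sources: the outer page and each IFRAME are separate documents, and View Source shows the one you right-clicked.
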
The source I need is contained inside IFRAME tags (which contain
references to jsp pages)... The html content isn't available when I
look at the innerHTML of the document returned in the
DocumentComplete event of the WebBrowser control. My question is
basically, how do I get the html generated by the jsp page in the
IFRAME?

Simply download the contents referenced by the IFRAME's SRC attribute using
System.Net.WebClient or System.Net.WebRequest.

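
As a rough illustration of that suggestion, here is a minimal C# sketch that fetches one frame's HTML with System.Net.WebClient. The URL is a hypothetical placeholder, not something from the original page; substitute the absolute URL taken from the IFRAME's SRC attribute. Note that this is a fresh request made outside the browser, so cookies and session state from IE are not sent, and a JSP page that depends on them may return different output.

```csharp
using System;
using System.Net;

class DownloadFrameSource
{
    static void Main(string[] args)
    {
        // Hypothetical URL: pass the real IFRAME SRC (as an absolute URL)
        // on the command line, or edit the fallback value below.
        string frameUrl = args.Length > 0
            ? args[0]
            : "http://localhost/Project/frame.jsp";

        using (WebClient client = new WebClient())
        {
            // Issues a GET request and returns the response body as a string
            string frameHtml = client.DownloadString(frameUrl);
            Console.WriteLine(frameHtml);
        }
    }
}
```
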
Better yet, how do I get the complete html as it is rendered
by IE?

There's no such thing. You're basically looking at two distinct HTML
documents at the same time.
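
Since each frame is its own document, one way to collect them all is to read the outer page's HTML, pick out every IFRAME SRC, and resolve it against the page URL before downloading it separately. The sketch below shows only the find-and-resolve step. The page URL, the sample markup, and the FindFrameUrls helper are all made up for illustration, and the regular expression is a rough stand-in for a real HTML parser (it only handles quoted SRC values).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

class FrameSourceFinder
{
    // Hypothetical helper: returns the absolute URL of every quoted
    // IFRAME SRC attribute found in outerHtml.
    static List<Uri> FindFrameUrls(string pageUrl, string outerHtml)
    {
        List<Uri> urls = new List<Uri>();
        Uri baseUri = new Uri(pageUrl);
        Regex srcPattern = new Regex(
            "<iframe\\b[^>]*?\\bsrc\\s*=\\s*[\"']([^\"']+)[\"']",
            RegexOptions.IgnoreCase);

        foreach (Match m in srcPattern.Matches(outerHtml))
        {
            // Relative SRC values are resolved against the outer page's URL
            urls.Add(new Uri(baseUri, m.Groups[1].Value));
        }
        return urls;
    }

    static void Main()
    {
        // Made-up page URL and markup, for illustration only
        string pageUrl = "http://localhost/Project/test.htm";
        string outerHtml =
            "<html><body><iframe name=\"a\" src=\"inner/report.jsp\"></iframe></body></html>";

        foreach (Uri frameUrl in FindFrameUrls(pageUrl, outerHtml))
        {
            Console.WriteLine(frameUrl);
        }
    }
}
```

Each URL printed this way can then be downloaded on its own, which gives one HTML document per frame rather than a single merged page.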

Cheers,
 
Raed Sawalha

I did that with one of my pages, like this:

Code:
using System;
using System.IO;
using System.Net;
using System.Text;

// Example file name; only .htm/.html URLs are fetched
string html = "http://localhost/Project/test.htm";

// Remove surrounding white space before checking the extension
html = html.Trim();

if (html.EndsWith(".htm") || html.EndsWith(".html"))
{
    StringBuilder sBuilder = new StringBuilder();
    string temp;

    try
    {
        // Request the page
        HttpWebRequest webrequest = (HttpWebRequest)WebRequest.Create(html);

        // Get the response and read it line by line until end of stream
        using (HttpWebResponse webresponse = (HttpWebResponse)webrequest.GetResponse())
        using (StreamReader webstream = new StreamReader(webresponse.GetResponseStream(), Encoding.Default))
        {
            while ((temp = webstream.ReadLine()) != null)
            {
                sBuilder.Append(temp + "\r\n");
            }
        }

        // Save the content in a temporary variable
        string HtmlContent = sBuilder.ToString();
    }
    catch (WebException ex)
    {
        // The request failed (for example, a 404 or no connection)
        Console.WriteLine(ex.Message);
    }
}

Hope that's what you need.

Regards

biterScripting cat command will show you the html source for any web page

If you want to view/see, store, edit, manipulate html source of a web page, just use the cat command in biterscripting as follows.

cat "http://www.somesite.com/somepage.html"

Above will show the html source on screen.

cat "http://www.somesite.com/somepage.html" > file1.html

Above will copy the html source to the local file file1.html.

You can then use any of the stream editor commands (char/word/line/string append, insert, extract, alter, etc.) to extract just the portion you need from the html source or even automatically edit the html source.

If you don't have biterscripting yet, you can download it free from their website, biterscripting.com.

Cheers.
Patrick.
 
