Using WebRequest to get the rendered HTML of a protected page returns the login page


Stephen Miller

I have an ASPX report and I want to capture the rendered HTML and
write to a file on the webserver. Several posts suggest using
WebRequest to make a second call to the page, and screen-scrape the
resulting HTML. The technique typically described is:

'-- Get the current URL and request the page
Dim url As String = System.Web.HttpContext.Current.Request.Url.AbsoluteUri
Dim req As System.Net.WebRequest = System.Net.WebRequest.Create(url)

Dim result As System.Net.WebResponse = req.GetResponse()
Dim ReceiveStream As Stream = result.GetResponseStream()

Dim read() As Byte = New Byte(512) {}
Dim bytes As Integer = ReceiveStream.Read(read, 0, 512)

'-- Read contents and append to StringBuilder
Dim sbPage As New System.Text.StringBuilder()
Dim encode As System.Text.Encoding = System.Text.Encoding.GetEncoding("utf-8")
While (bytes > 0)
    sbPage.Append(encode.GetString(read, 0, bytes))
    bytes = ReceiveStream.Read(read, 0, 512)
End While

I have two problems with this.
First, doesn't this require a second round trip to the server,
adding performance overhead?
Second, my report is password protected (authentication mode is
Forms), so the second request is redirected to the designated login form.

Is there another way to get a string representation of the rendered
HTML? I have been fooling around with the OutputStream without any
luck.


As a side note, writing the HTML to file is part of a dodgy workaround
that shells to a DOS program and converts the resulting HTML to PDF
format, prior to flushing the current response and sending the PDF
instead. I have looked at dozens of commercial products but haven't
found one that can convert the rendered ASPX page to PDF on the fly
(allowing me to provide all report layout in ASPX mark-up). Is anyone
aware of a commercial product that can do this?

I know SQL Server 2000 Reporting Services has just become available,
but I don't have VS2003.

Regards,

Stephen
 

Rick Strahl [MVP]

Hi Stephen,

Are the report and the ASPX page in the same application? If so, you might want to
look into calling Server.Execute() to execute the page. It lets you
run the page, pass in your own TextWriter (a StringWriter, for example) and then
retrieve the result.

Something like this:

// Requires: using System; using System.IO; using System.Web;
public static string AspTextMerge(string TemplatePageAndQueryString,
                                  ref string ErrorMessage)
{
    string MergedText = "";

    // *** Save the current request information
    HttpContext Context = HttpContext.Current;

    // *** Fix up the path to point at the templates directory
    TemplatePageAndQueryString = Context.Request.ApplicationPath +
        "/templates/" + TemplatePageAndQueryString;

    // *** Now call the other page and load into StringWriter
    StringWriter sw = new StringWriter();
    try
    {
        Context.Server.Execute(TemplatePageAndQueryString, sw);
        MergedText = sw.ToString();
    }
    catch (Exception ex)
    {
        // *** Pass the failure back to the caller instead of swallowing it
        ErrorMessage = ex.Message;
        MergedText = null;
    }

    return MergedText;
}

FWIW, using an HTTP request is not much slower in this situation - the above
code also requires a fair amount of overhead as ASP.Net has to perform some
fixup to 'fake' this request through Execute. I've used HTTP in a number of
situations with good results - your only concern will be not tying up the
ASP.Net thread for too long waiting for the report to finish - if that's the
case you may have to do this asynchronously...
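
If you stay with the HTTP route, the redirect to the login form is most likely because the second request carries no forms-authentication cookie. A minimal sketch of forwarding one cookie from the incoming request, assuming the ticket cookie alone is enough for the page to authorize the call. `AuthenticatedFetch` and `CreateWithCookie` are hypothetical names, not part of ASP.NET:

```csharp
using System;
using System.Net;

public static class AuthenticatedFetch
{
    // Hypothetical helper: builds a request for 'url' that carries one
    // cookie (for example the forms-authentication ticket).
    public static HttpWebRequest CreateWithCookie(Uri url, string cookieName, string cookieValue)
    {
        HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);
        req.CookieContainer = new CookieContainer();
        // Attach the cookie for the target URL, valid for the whole site ("/").
        req.CookieContainer.Add(url, new Cookie(cookieName, cookieValue, "/"));
        return req;
    }
}

// Inside the page (needs System.Web and System.Web.Security):
//   HttpCookie auth = Request.Cookies[FormsAuthentication.FormsCookieName];
//   if (auth != null)
//   {
//       HttpWebRequest req =
//           AuthenticatedFetch.CreateWithCookie(Request.Url, auth.Name, auth.Value);
//       using (WebResponse resp = req.GetResponse()) { /* read the HTML as before */ }
//   }
```

Only the authentication cookie is copied here, not the whole cookie collection, which keeps the second request as small as possible. Whether that is sufficient depends on how the report page checks access.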

+++ Rick ---

--

Rick Strahl
West Wind Technologies
http://www.west-wind.com/
http://www.west-wind.com/weblog/
 
Stephen robinson

Hi Stephen,

I have been trying to do exactly the same thing for about two weeks.
What I have found is that although there are lots of commercial products
out there, none really do screen scraping. I think the workaround is as
follows. If you create a response filter, you can take a copy of the
output buffer and write it to a file. Save this file in a
virtual directory (so that it picks up the stylesheet). Then, using .NET, pass the
HTML file into a third-party product such as ABCPDF or HTMLDraw (image).
Check out www.webgoo.com for these products; image products are much
cheaper than PDF ones. I have the first part working (the copy of the file),
but I now need to strip out the JavaScript. Then that should be it. If
you drop me a line by email I will give you more details. One person
mentioned that in version 2.0 of .NET you can create dynamic images,
which, seeing as we already have the output stream, may be the exact
solution.
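
The response filter idea above can be sketched roughly like this. It is only an illustration: `CaptureFilter` and `GetCapturedBytes` are made-up names, and the class simply passes every write through to the real response stream while keeping a copy in memory.

```csharp
using System;
using System.IO;

// Hypothetical filter: forwards output to the real response stream and
// keeps an in-memory copy of everything written.
public class CaptureFilter : Stream
{
    private readonly Stream inner;
    private readonly MemoryStream copy = new MemoryStream();

    public CaptureFilter(Stream inner) { this.inner = inner; }

    public override void Write(byte[] buffer, int offset, int count)
    {
        copy.Write(buffer, offset, count);   // keep a copy
        inner.Write(buffer, offset, count);  // still send to the browser
    }

    // Raw bytes captured so far, in the response encoding.
    public byte[] GetCapturedBytes() { return copy.ToArray(); }

    public override void Flush() { inner.Flush(); }
    public override bool CanRead { get { return false; } }
    public override bool CanSeek { get { return false; } }
    public override bool CanWrite { get { return true; } }
    public override long Length { get { throw new NotSupportedException(); } }
    public override long Position
    {
        get { throw new NotSupportedException(); }
        set { throw new NotSupportedException(); }
    }
    public override int Read(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }
    public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
    public override void SetLength(long value) { throw new NotSupportedException(); }
}

// In the page, before any output is written (for example in Page_Load):
//   CaptureFilter filter = new CaptureFilter(Response.Filter);
//   Response.Filter = filter;
```

The copy is only complete once ASP.NET has flushed the response through the filter, so the file has to be written after that point. Where exactly you do that (an overridden Close on the filter is one option) is a design choice this sketch leaves open.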

I hope this helps

Steve
 
I have a similar problem. My page takes a long time to process, so making a second call to generate the PDF adds a lot of overhead. I am using a third-party tool that generates the PDF from the URL of the page; it can also take an HTML stream. How do I get the HTML stream of the current web request and pass it to the tool? I tried caching too, but ran into memory problems. Any help is appreciated.

Don't use WebRequest to render your own HTML. Just override the Render subroutine

To get the HTML from your page without calling it a second time, you just need to override the Render subroutine. The example below is called when the page is rendered and should be added to your page's code-behind file.

Protected Overloads Overrides Sub Render(ByVal writer As HtmlTextWriter)
    '-- Render the page into an in-memory writer instead of the response
    Dim sbOut As New StringBuilder()
    Dim swOut As New IO.StringWriter(sbOut)
    Dim htwOut As New HtmlTextWriter(swOut)
    MyBase.Render(htwOut)

    '-- sOut now holds the complete rendered HTML
    Dim sOut As String = sbOut.ToString()

    '-- Send the same HTML on to the browser
    writer.Write(sOut)
End Sub

The 'sOut' variable holds your HTML string, which you can then write to a file. If the page contains any JavaScript, you should probably run the string through a few regular expressions to strip it out before writing the file. Hope this helps. Good luck.
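
As a rough illustration of the regular-expression step, here is a sketch in C# (like the other C# samples in this thread). `HtmlCleaner` and `StripScripts` are hypothetical names, and the pattern only removes whole script blocks; inline handlers such as onclick attributes are left alone.

```csharp
using System.Text.RegularExpressions;

public static class HtmlCleaner
{
    // Matches <script ...> ... </script>, ignoring case and spanning lines.
    private static readonly Regex ScriptBlock = new Regex(
        @"<script\b[^>]*>.*?</script\s*>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Returns the HTML with every script block removed.
    public static string StripScripts(string html)
    {
        return ScriptBlock.Replace(html, string.Empty);
    }
}
```

The cleaned string can then be saved with something like File.WriteAllText(path, HtmlCleaner.StripScripts(sOut)), where path is whatever location your converter reads from. A regex is a blunt tool for HTML, so treat this as good enough for feeding a PDF converter rather than as a general sanitizer.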
 
Exporting the exact page to HTML and then to PDF - stuck with a problem

Hi,
I am stuck with a similar problem. I have a web page that consists of many GridViews nested one inside another, which the user can drill down into (AJAX panel refresh).
I am using the function below to save the page as HTML. The problem with this code is that it takes System.Web.HttpContext.Current.Request.Url and renders the page again, so the drill-down state of the grids is not preserved in the saved HTML. For example, I have 5 items in the grid. I click the 2nd item to see the "full details", which expands the row and displays another grid below it. But the code below requests Request.Url afresh and saves that, so my drill-downs are not reflected in the HTML. Please help me solve this problem.


private void SaveWebPage_as_HTML()
{
    // Initialize the WebRequest for the current page's URL.
    string urlToConvert = System.Web.HttpContext.Current.Request.Url.ToString();
    WebRequest myRequest = WebRequest.Create(urlToConvert);

    Encoding encode = System.Text.Encoding.GetEncoding("utf-8");

    // Get the response; the using blocks close everything and free resources.
    using (WebResponse myResponse = myRequest.GetResponse())
    using (Stream ReceiveStream = myResponse.GetResponseStream())
    // Pipe the stream to a stream reader with the required encoding.
    using (StreamReader readStream = new StreamReader(ReceiveStream, encode, true, 256))
    using (StreamWriter sw = new StreamWriter("Invoice1.html"))
    {
        // Read up to 256 characters at a time and write them to the file.
        char[] read = new char[256];
        int count = readStream.Read(read, 0, read.Length);
        while (count > 0)
        {
            sw.Write(read, 0, count);
            count = readStream.Read(read, 0, read.Length);
        }
    }
}


One approach I thought of was to tap the current page's response stream. I coded something like this:
// Pipe the current response stream to a stream reader with the required encoding.
Stream ReceiveStream = System.Web.HttpContext.Current.Response.OutputStream;
StreamReader readStream = new StreamReader(ReceiveStream, encode, true, 255);

But when I run this I get a "Can not read stream" error when the StreamReader is created. Is there any way to resolve this? Am I doing the right thing in the first place? I appreciate any help.

Thanks - Rao
 