Keeping whitespace in responseText, etc.

e271828

I'm trying to access the source of an HTML page with as few alterations
from the actual source (as in, that seen from the View Source option)
as I can. The method document.documentElement.innerHTML returns the
HTML source, but adds HEAD and other elements if they are absent from
the source, and takes out whitespace (i.e., line feeds, carriage
returns and tabs) within tags and between tags. The following function:

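function fetchSource() {
  // Keep the request in a local variable. Assigning to "xhr" inside a
  // function that is itself named xhr would overwrite the function.
  var xhr = new XMLHttpRequest();
  xhr.open("GET", "test-page.html", true);
  xhr.onreadystatechange = function () {
    // readyState 4 means the response has been fully received
    if (xhr.readyState == 4) {
      alert(xhr.responseText);
    }
  };
  xhr.send(null);
}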

doesn't add or alter any tags that are absent in the source, and does
not take out line feeds within tags; it does, however, still take out
all non-line-feed whitespace within tags and all whitespace in general
between tags.

It seems that preserving whitespace is all that I need, but I haven't
found a way to do that through my searches. So is there any way to get
the unaltered HTML source of a page without innerHTML or applets, like
a better version of the XMLHttpRequest object's responseText property?

Thanks,
Eric
 
Martin Honnen

e271828 wrote:

alert(xhr.responseText);

doesn't add or alter any tags that are absent in the source, and does
not take out line feeds within tags; it does, however, still take out
all non-line-feed whitespace within tags and all whitespace in general
between tags.

responseText gives you the text as the browser decodes it from the HTTP
response body. responseText might have trouble decoding characters
properly, depending on the encoding of the response, but I don't think
the white space stripping that you describe above occurs.

I suspect rather that you are using Mozilla or Firefox and that the
white space issue you notice is simply the somewhat broken alert dialog
in Mozilla, where lots of white space is collapsed and not rendered.
For example, if you run
alert(['Line 1', ' Line 2', 'Line 3'].join('\r\n'))
with Mozilla then the alert dialog will not show the white space at the
beginning of Line 2 at all.

Are you using Mozilla? Then I think the issue you see is simply caused
by alerting the responseText, not by white space missing from
responseText.

If not, in which browser do you think white space gets lost when using
responseText?
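
One way to check this without relying on how the alert dialog renders white space is to turn the white space characters into visible markers first. This is only a rough sketch; showWhitespace is a hypothetical helper name, not part of XMLHttpRequest or any browser API, and the marker characters are an arbitrary choice.

```javascript
// Hypothetical helper: replace white space with visible markers so that a
// dialog which collapses white space cannot hide it.
function showWhitespace(text) {
  return text
    .replace(/\r/g, "\\r")      // carriage return -> the two characters \r
    .replace(/\n/g, "\\n\n")    // line feed -> the characters \n plus a real line break
    .replace(/\t/g, "\\t")      // tab -> the two characters \t
    .replace(/ /g, "\u00b7");   // space -> middle dot
}
```

Calling alert(showWhitespace(xhr.responseText)) instead of alert(xhr.responseText) should then show the leading space of a line such as ' Line 2' as a dot, even in a dialog that collapses white space.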
 
e271828

You were right, more or less. Unlike innerHTML, responseText doesn't
alter the HTML it gets, but when shown in an alert box it can seem as
though responseText mangles whitespace. If you call
responseText.split("\t") or responseText.split("\n") and count the
pieces that come back, however, you will see that the number of pieces
minus 1 equals the number of tabs or new lines in your actual source
(which is not the case with innerHTML).
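
The split check can be wrapped in a small function so that no alert loop is needed. This is a minimal sketch; countWhitespace is a made-up name, and it simply relies on split returning one more piece than there are separators.

```javascript
// Hypothetical helper: count white space characters by splitting on them.
// split() returns one more piece than there are separators in the string.
function countWhitespace(text) {
  return {
    tabs: text.split("\t").length - 1,
    lineFeeds: text.split("\n").length - 1,
    carriageReturns: text.split("\r").length - 1
  };
}
```

Running countWhitespace(xhr.responseText) and comparing the numbers with the file on disk should show whether anything was lost.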


--


But now I've encountered another problem I haven't been able to find an
answer to: how do I get XMLHttpRequest to load external URLs (say,
http://www.google.com), instead of just local files?


 
Martin Honnen

e271828 wrote:

But now I've encountered another problem I haven't been able to find an
answer to: how do I get external URL's (say, http://www.google.com) to
open in the XMLHttpRequest, instead of just local files?

Inside the normal browser context the same-origin policy applies to the
requests that XMLHttpRequest makes, so you can only successfully make
requests back to the server that your document with the script comes
from. With client-side script you would therefore need your own
server-side script to act as a proxy and fetch URLs from, e.g.,
www.google.com.

Outside of the browser sandbox (e.g. on Windows in an HTA (HTML
application), in a Windows Script Host script, in an ASP page, or in
the Mozilla browser if you write an extension) those restrictions do not
apply.

IE also has a zone model with different security zones that can be
configured separately; for the trusted zone, for instance, you could
change the settings to allow requests to other domains.
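
To illustrate the same-origin rule described above, a script can compare the origin (scheme, host and port) of the page with that of the URL it wants to request. This is only a sketch: sameOrigin is a hypothetical helper, the example.com URLs are placeholders, and it assumes an environment that provides the standard URL constructor, which older browsers do not have.

```javascript
// Hypothetical helper: would a request from pageUrl to requestUrl stay on the
// same origin? Relative URLs are resolved against the page, as a browser would.
function sameOrigin(pageUrl, requestUrl) {
  var page = new URL(pageUrl);
  var target = new URL(requestUrl, pageUrl);
  // origin combines scheme, host and port
  return page.origin === target.origin;
}

// A relative URL such as "test-page.html" stays on the page's own server.
var local = sameOrigin("http://example.com/dir/index.html", "test-page.html");
// A URL on another host does not, so it would need a server-side proxy.
var external = sameOrigin("http://example.com/dir/index.html", "http://www.google.com");
```

The browser enforces the policy itself, so a check like this is only useful for deciding in advance whether to send a request directly or through a proxy.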
 
