Alternative to documentElement.innerHTML?

Kyle

I am presently making use of documentElement.innerHTML to retrieve
page contents for manipulation, but I've noticed that the string value
returned is not identical to the actual page source. Specifically,
attribute assignments that look like:

height=100 width=100

in the real source, look like:

height="100" width="100"

in the returned value from documentElement.innerHTML.

Further complicating things, forms that begin inside a table in this
manner:

<table><form ...><tr><td...><input...></form></td>...

are returned as:

<table><form ...></form><tr><td...><input...

If I modify the returned value from documentElement.innerHTML, then
write it back to documentElement.innerHTML, many of the forms are
non-functional.

I am interested in any available alternatives that will function in
recent Mozilla releases. Thank you,

-Kyle
 
Randy Webb

Kyle said:
I am presently making use of documentElement.innerHTML to retrieve
page contents for manipulation, but I've noticed that the string value
returned is not identical to the actual page source. Specifically,
attribute assignments that look like:

height=100 width=100

in the real source, look like:

height="100" width="100"

in the returned value from documentElement.innerHTML.

Further complicating things, forms that begin inside a table in this
manner:

<table><form ...><tr><td...><input...></form></td>...

are returned as:

<table><form ...></form><tr><td...><input...

If I modify the returned value from documentElement.innerHTML, then
write it back to documentElement.innerHTML, many of the forms are
non-functional.

I am interested in any available alternatives that will function in
recent Mozilla releases. Thank you,

Validate your (X)HTML and you will solve a lot of those problems. Along
with dropping tables for layout.

Read the group FAQ, it discusses how to read a text file (2 methods),
which is what you are trying to do.
 
PeEmm

Kyle wrote, on 1/25/2004 6:51 AM:
I am presently making use of documentElement.innerHTML to retrieve
page contents for manipulation, but I've noticed that the string value
returned is not identical to the actual page source. Specifically,
attribute assignments that look like:

height=100 width=100

in the real source, look like:

height="100" width="100"

in the returned value from documentElement.innerHTML.

Further complicating things, forms that begin inside a table in this
manner:

<table><form ...><tr><td...><input...></form></td>...

are returned as:

<table><form ...></form><tr><td...><input...

If I modify the returned value from documentElement.innerHTML, then
write it back to documentElement.innerHTML, many of the forms are
non-functional.

I am interested in any available alternatives that will function in
recent Mozilla releases. Thank you,

-Kyle

The DOM naturally functions as expected only if the HTML source is as
expected, i.e. valid according to the standards. The examples you give
above are malformed HTML, so the DOM tries to do something about the
mishmash.
 
Kyle

Randy Webb said:
Validate your (X)HTML and you will solve a lot of those problems. Along
with dropping tables for layout.

This code is resident in a Mozilla extension, not a page that I've
written. It isn't my HTML that I need to parse so I have no control
over its validity.

Randy Webb said:
Read the group FAQ, it discusses how to read a text file (2 methods),
which is what you are trying to do.

I don't understand what you mean here. As far as I know, the "file"
does not exist anywhere in the filesystem so this is untrue. I assume
this content is somewhere in memory because "View Source" and Sherlock
plugins make use of the real source without accessing the page a 2nd
time.

Thanks for any input.

--Kyle
 
Kyle

PeEmm said:
Kyle wrote, on 1/25/2004 6:51 AM:

The DOM naturally functions as expected only if the HTML source is as
expected, i.e. valid according to the standards. The examples you give
above are malformed HTML, so the DOM tries to do something about the
mishmash.

I should have been more clear. This is a Mozilla Chrome extension, so
I assume that I should have access to the same methods that Mozilla
uses to display the source with "View Source" and retrieve the source
for parsing with Sherlock plugins. Thanks,

--Kyle
 
Randy Webb

Kyle said:
This code is resident in a Mozilla extension, not a page that I've
written. It isn't my HTML that I need to parse so I have no control
over its validity.

Ok.

Kyle said:
I don't understand what you mean here. As far as I know, the "file"
does not exist anywhere in the filesystem so this is untrue. I assume
this content is somewhere in memory because "View Source" and Sherlock
plugins make use of the real source without accessing the page a 2nd
time.

My response was based on the assumption (which I now see was incorrect)
that you were trying to read the HTML code of an HTML file, and that
you wanted the original code, not the rendered code (they are
different).

If you load a page, and then do
javascript:alert(document.documentElement.innerHTML);
in the address bar, and then view the source of the page, on very very
few occasions will they be the same code.

Example:
When I open IE, it opens to about:blank. (actually, all of my browsers
are set to open to about:blank)
View>Source gives this code:
<HTML></HTML>
And that's it.
javascript:alert(document.documentElement.innerHTML);
alerts this:
<HEAD></HEAD>
<BODY></BODY>

In Mozilla, about:blank view>Source gives this code:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head><title></title></head>
<body></body>
</html>

I added line breaks for readability.

javascript:alert(document.documentElement.innerHTML);
gives this code:

<head><title></title></head><body></body>

Note the missing DTD and HTML tags.

In order to get the original, written code of a webpage into a
variable that the page's JavaScript can use, you have to read the file
from the server. The only two ways I know of to do that are with an
XMLHttpRequest object or a Java applet, hence my suggestion to consult
the FAQ.
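
For what it's worth, reading the file back from the server with a request object could look roughly like the sketch below. The names fetchSource, createRequest and onDone are made up for this illustration; only open, send, onreadystatechange, readyState, status and responseText are the request object's own members. Note that this does fetch the page a second time.

```javascript
// Sketch only: re-request a URL and hand the raw response text to a callback.
// createRequest is a hypothetical factory so the function is not tied to one
// browser's way of constructing the request object.
function fetchSource(url, createRequest, onDone) {
  var req = createRequest();
  req.open("GET", url, true);            // true = asynchronous
  req.onreadystatechange = function () {
    if (req.readyState === 4) {          // 4 = request has completed
      onDone(req.status, req.responseText);
    }
  };
  req.send(null);
}

// In a page this might be called as (assumes XMLHttpRequest is available):
// fetchSource(location.href,
//             function () { return new XMLHttpRequest(); },
//             function (status, text) { alert(text); });
```

The text handed to onDone is whatever the server sent, so unquoted attributes and misplaced form tags arrive untouched.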

Whether any of that helps with you trying to read a Mozilla Skin plugin,
I don't know :(
 
Lasse Reichstein Nielsen

Randy Webb said:
If you load a page, and then do
javascript:alert(document.documentElement.innerHTML);
in the address bar, and then view the source of the page, on very very
few occasions will they be the same code.

Yes, browsers build the innerHTML structure from the current structure
of the document, whereas the view-source shows the original source code.
That means that innerHTML is "unparsing" the DOM tree structure, and
it would be surprising if it gave exactly the same formatting as the
original source, even if the structure was the same.
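
To illustrate the "unparsing" point, here is a toy serializer over a made-up node shape ({ tag, attrs, children }, with plain strings as text nodes). It is not how Mozilla actually does it, just a sketch of why the quoting of height=100 cannot survive: the tree keeps only attribute names and values, so the serializer has to pick a quoting style itself.

```javascript
// Toy "unparser": turns a tree back into markup. The node shape is
// hypothetical and text is emitted without any escaping.
function serialize(node) {
  if (typeof node === "string") {
    return node;                         // text node
  }
  var attrs = "";
  for (var name in node.attrs) {
    if (Object.prototype.hasOwnProperty.call(node.attrs, name)) {
      // Only name and value are stored; the original quoting is gone,
      // so this sketch always writes double quotes.
      attrs += " " + name + '="' + node.attrs[name] + '"';
    }
  }
  var inner = "";
  for (var i = 0; i < node.children.length; i++) {
    inner += serialize(node.children[i]);
  }
  return "<" + node.tag + attrs + ">" + inner + "</" + node.tag + ">";
}

// What a parser might build from the source text <td height=100 width=100>x</td>
var cell = { tag: "td", attrs: { height: "100", width: "100" }, children: ["x"] };
var html = serialize(cell);              // <td height="100" width="100">x</td>
```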

....
In Mozilla, about:blank view>Source gives this code:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html> ....

javascript:alert(document.documentElement.innerHTML);
gives this code:

<head><title></title></head><body></body>

Note the missing DTD and HTML tags.

Not surprising, since you ask for the *inner*HTML of the HTML element.
If Mozilla supported the "outerHTML" property, you could also show
the HTML tag. The document type node is even harder to find. It
is the first child of the document node (where the HTML element
is the second).
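
Where the DOM document.doctype property is available, the declaration can be rebuilt from the node's name, publicId and systemId. A rough sketch follows; doctypeToString is a made-up helper name, and the sample object merely stands in for document.doctype so the code does not need a browser.

```javascript
// Sketch: rebuild a DOCTYPE declaration from a DocumentType-like object.
// Only the name, publicId and systemId properties are read.
function doctypeToString(dt) {
  if (!dt) {
    return "";                           // document has no doctype
  }
  var s = "<!DOCTYPE " + dt.name;
  if (dt.publicId) {
    s += ' PUBLIC "' + dt.publicId + '"';
    if (dt.systemId) {
      s += ' "' + dt.systemId + '"';
    }
  } else if (dt.systemId) {
    s += ' SYSTEM "' + dt.systemId + '"';
  }
  return s + ">";
}

// Stand-in for document.doctype on the about:blank page quoted above.
var sample = {
  name: "html",
  publicId: "-//W3C//DTD HTML 4.01 Transitional//EN",
  systemId: ""
};
var declaration = doctypeToString(sample);
```

In a page, doctypeToString(document.doctype) would be the equivalent call. The result is a reconstruction, so its spacing and letter case need not match the original source.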

/L
 
