ASP.NET and Internet Explorer's DOM

Kevin Spencer

JavaScript? CSS?

What exactly are you asking here?

--
HTH,

Kevin Spencer
Microsoft MVP
.Net Developer
You can lead a fish to a bicycle,
but you can't make it stink.
 
Craig

Well, my friend is accessing the elements of an HTML page using the IE DOM
and JavaScript.

I know of a way to simulate a POST to a website using VB in my code behind
pages. It returns the entire HTML page into a variable. Then I would like to
be able to parse out the elements of the HTML page in my code and manipulate
them from there.

Can I do this in some way?
 
Kevin Spencer

> I know of a way to simulate a POST to a website using VB in my code behind
> pages. It returns the entire HTML page into a variable. Then I would like
> to be able to parse out the elements of the HTML page in my code and
> manipulate them from there.
>
> Can I do this in some way?

Hi Craig,

The short answer is "yes." However, the long answer is not going to be
something you want to hear. Browser developers have been working on this for
a dozen years now.

There is a darned good reason for the development of XML and XHTML. HTML was
never designed for extensibility, but it has been extended in many different
ways by many browser vendors, and as a result, the rules for HTML have
become monstrously complex, and differ from one browser to another. A visit
to the W3C web site at http://w3c.org may be a good starting point. Another
good reference on this subject is the Wikipedia entry for "Tag Soup" -
http://en.wikipedia.org/wiki/Tag_Soup. Be sure to follow the link to the
entry on "Quirks Mode."

The biggest problem here is that there were originally no standards for
HTML, and what standards there were (yes, I know, that's a contradiction)
have been adhered to horribly by HTML developers. Combine this with the
variety of proprietary elements and other HTML objects, JavaScript
functions, etc., produced by browser vendors, and you have a real mess on
your hands.

The standards that exist now for HTML 4.0 and HTML 4.01 (the latest
standard) are at best inconsistent and complex. Some tags require closing
tags. Some do not. Attribute values may or may not be quoted. Most attributes
are name=value pairs, but some do not require a value. HTML tags may overlap
one another in real pages, even though the standard forbids it. And so on.
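
To make those irregularities concrete, here is a minimal sketch in Python (not the VB used in the question, and not anything from the original project) that feeds a deliberately sloppy fragment to the standard library's tolerant html.parser. The TagLogger class and the SAMPLE string are made up for illustration. The fragment has unclosed <p> tags, an unquoted attribute value, an attribute with no value, and overlapping <b>/<i> tags.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Hypothetical helper: records start tags (with attributes) and end tags."""

    def __init__(self):
        super().__init__()
        self.starts = []  # list of (tag, {attribute: value})
        self.ends = []    # list of tag names, in the order they were closed

    def handle_starttag(self, tag, attrs):
        self.starts.append((tag, dict(attrs)))

    def handle_endtag(self, tag):
        self.ends.append(tag)


# Unclosed <p>, unquoted and valueless attributes, overlapping <b>/<i>.
SAMPLE = '<p>One<p>Two<input type=checkbox name="a" checked><b><i>x</b></i>'

logger = TagLogger()
logger.feed(SAMPLE)
logger.close()

for tag, attrs in logger.starts:
    print("start", tag, attrs)
print("ends ", logger.ends)
```

The parser reports every tag it sees without complaint, and the valueless "checked" attribute comes back with a value of None. Note that it does not decide where the unclosed <p> elements end or how to repair the overlap. That part is left to the caller, which is exactly the hard part.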

And HTML developers may or may not adhere to a given set of standards. HTML
is text, and many HTML documents are hand-typed, to at least some extent,
and contain errors of various types. How do you determine where the end of a
table is, if, for example, there is no </table> tag? Some browsers just give
up. Others try to make an educated guess.

The HTML DOM is a relative newcomer to the fray, and it has to model several
document types, defined in various DTDs hosted on the W3C web site. Which
document type do you want to work with? Is one specified? What if none is
specified in the HTML document? What if one is specified, but not adhered to?

Now add CSS and JavaScript to the mix. Not all of an HTML document may be
HTML. JavaScript may add HTML to the document, or change HTML in the
document. CSS may change the location, size, or many other characteristics of
HTML elements. Of course, CSS does not actually change the DOM, but only how
it looks. This may or may not be important to your needs.

At any rate, these are the major issues that browser vendors have been
wrestling with for over a decade. XHTML is a valiant attempt to bring some
sanity to the mix, by at the very least, defining a base set of rules that
constitute a "well-formed" document, using the rules of XML. For example,
all HTML elements must be terminated, either with a closing tag or with a
terminator "/" at the end of a single tag. All attributes must be pairs, and
must be quoted. HTML tags must not overlap. Every XHTML document must have a
DTD associated with it, and must conform to the rules of the DTD. As XHTML
supplants HTML in the WWW, things will improve tremendously. And there is a
wide commitment by browser vendors to build browsers that conform to the
standards recommended by the W3C, ECMA, and other international standards
organizations. Most browser vendors actively participate in the evolution of
these standards. Tomorrow will be a brighter day for all of us!
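
As a rough illustration of what "well-formed" buys you, the sketch below (Python, standard library only) runs an ordinary XML parser over two fragments. The is_well_formed helper and both sample strings are invented for this example. It checks well-formedness only, not validity against a DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if an XML parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Typical HTML habits: unclosed <p> and <br>, unquoted and valueless attributes.
TAG_SOUP = '<div><p>One<br><input type=checkbox checked></div>'

# The same content written to the XML rules described above.
XHTML_STYLE = (
    '<div><p>One<br /><input type="checkbox" checked="checked" /></p></div>'
)

# Overlapping tags are rejected as well.
OVERLAP = '<p><b><i>x</b></i></p>'

print(is_well_formed(TAG_SOUP))     # False
print(is_well_formed(XHTML_STYLE))  # True
print(is_well_formed(OVERLAP))      # False
```

Once a page passes this kind of check, any generic XML parser can walk it, and no HTML-specific guessing is needed.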

But for now, you have a real problem there. Recently, we had a similar
requirement. We had to POST to a number of HTML forms on the W3C and parse
the results returned. Of course, to POST to a form, we first had to be able
to parse the page that contains the form, change the contents of the form,
and then create an HTTP POST message containing the form values that we
wanted to send. My solution was to make the problem
smaller. We didn't care what other HTML was in the page, just the contents
of the form. So I spent several weeks studying up on HTML forms, the rules
of HTML for form elements, and wrote a number of classes for parsing HTML
forms, specifically, one for each form element, including the form itself.
The structure of these classes mirrored the DOM structure of the elements
themselves. I wrote quite a few Regular Expressions to identify such
elements and parse them in an HTML document. This was no small challenge. I
discovered quickly how badly HTML developers mangle the existing standards,
and as I had no choice regarding whether or not to parse the specific forms
we were working with, I had to accommodate them.
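
The "make the problem smaller" idea can be sketched roughly as follows. This is not the code described above: it is written in Python rather than .NET, and it swaps the hand-written Regular Expressions for the standard library's tolerant html.parser to keep the example short. The names FormField, Form, FormExtractor, and parse_forms are all hypothetical, and only input, select, and textarea are handled.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser


@dataclass
class FormField:
    tag: str         # "input", "select", or "textarea"
    name: str
    value: str = ""


@dataclass
class Form:
    action: str = ""
    method: str = "get"
    fields: list = field(default_factory=list)


class FormExtractor(HTMLParser):
    """Collects forms and their named fields; everything else is ignored."""

    def __init__(self):
        super().__init__()
        self.forms = []
        self._form = None      # form currently open, if any
        self._select = None    # select field currently open, if any
        self._textarea = None  # textarea field currently open, if any

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "form":
            method = (a.get("method") or "get").lower()
            self._form = Form(a.get("action") or "", method)
            self.forms.append(self._form)
        elif self._form is None:
            return  # not inside a form, so we do not care
        elif tag == "input" and a.get("name"):
            self._form.fields.append(
                FormField("input", a["name"], a.get("value") or ""))
        elif tag == "select" and a.get("name"):
            self._select = FormField("select", a["name"])
            self._form.fields.append(self._select)
        elif tag == "option" and self._select is not None and "selected" in a:
            self._select.value = a.get("value") or ""
        elif tag == "textarea" and a.get("name"):
            self._textarea = FormField("textarea", a["name"])
            self._form.fields.append(self._textarea)

    def handle_data(self, data):
        if self._textarea is not None:
            self._textarea.value += data

    def handle_endtag(self, tag):
        if tag == "form":
            self._form = None
        elif tag == "select":
            self._select = None
        elif tag == "textarea":
            self._textarea = None


def parse_forms(html: str) -> list:
    extractor = FormExtractor()
    extractor.feed(html)
    extractor.close()
    return extractor.forms


SAMPLE = """
<form action="/search" method=POST>
  <input type=hidden name=token value="abc123">
  <input type="text" name="q" value='old query'>
  <select name=lang><option value=en>English<option value=de selected>German</select>
  <textarea name="notes">hello</textarea>
  <input type=submit value=Go>
</form>
"""

for form in parse_forms(SAMPLE):
    print(form.method, form.action, [(f.name, f.value) for f in form.fields])
```

The sample deliberately mixes quoted, single-quoted, and unquoted attribute values and leaves the <option> tags unclosed. Real pages will be worse, so treat this as a starting shape, not a finished parser.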

Eventually, I came up with some rather nice classes that could parse and
modify the contents of most HTML forms, and POST them to the server to get a
response. Fortunately, the data we were looking for in the response was
easily identifiable in the HTML returned by the form handler, and a couple
of Regular Expressions easily extracted it.
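
For the last two steps, building the POST and pulling one value out of the response, a hedged Python sketch might look like this. The URL, the field names, and the idea that the answer sits in a span with id "result" are all assumptions made up for the example. Nothing is sent over the network here: the code only prepares the request object.

```python
import re
from urllib.parse import urlencode
from urllib.request import Request


def build_post(url: str, fields: dict, overrides: dict) -> Request:
    """Merge the parsed form values with our changes and prepare a POST."""
    merged = dict(fields)
    merged.update(overrides)
    body = urlencode(merged).encode("ascii")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# Hypothetical marker: the value we want is inside <span id="result">.
# Quotes around the id are optional and tag case is ignored, because
# real pages are not consistent about either.
RESULT_RE = re.compile(
    r'<span\s+id=["\']?result["\']?[^>]*>(.*?)</span>',
    re.IGNORECASE | re.DOTALL,
)


def extract_result(html: str):
    """Return the text inside the result span, or None if it is absent."""
    match = RESULT_RE.search(html)
    return match.group(1).strip() if match else None


request = build_post(
    "http://example.invalid/search",              # placeholder URL
    {"token": "abc123", "q": "old query"},        # values parsed from the form
    {"q": "new query"},                           # values we want to send
)
print(request.get_method(), request.data)
print(extract_result("<SPAN id=result> 42 </SPAN>"))
```

To actually send it, urllib.request.urlopen(request) would return the response, whose body can then be decoded and passed to extract_result.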

Would I say that this set of classes could parse any form on the Internet?
Hardly. I can only say that we were not able to find any that it could not
parse. But unlike Microsoft, we weren't building a browser, or distributing
a product used by millions of people all over the world. They have my
sympathy.

So, I would recommend to you that you adopt a similar approach. Identify the
specific needs of your app. Most probably, it doesn't need to parse
everything. Identify the specific things it needs to parse, and what it
needs to do with the data. Concentrate on that. With any luck and skill, it
shouldn't take more than a month or so to work something out.

--
HTH,

Kevin Spencer
Microsoft MVP
.Net Developer
You can lead a fish to a bicycle,
but you can't make it stink.
 
