HTML Agility Pack Terminology

eBob.com

I need to use the HTML Agility Pack for the first time and, so far at least,
don't find the documentation very helpful. It doesn't help that I am not an
HTML expert. My initial problem is that I don't understand how HAP uses
the terms "node" and "element". I thought that in HTML everything is an
element, or part of an element.

I am experimenting with some sample code I found which displays nodes, and
everything seems to be there; i.e. everything seems to be a node. I haven't
yet been able to figure out how to alter the sample code to show me
elements.

Any help would be greatly appreciated.

(PS: Also, what is the difference between this group and
comp.infosystems.www.authoring.html?)

Thanks, Bob

Jukka K. Korpela

10.10.2011 12:16 said:
> I need to use the HTML Agility Pack for the first time and, so far at
> least, don't find the documentation very helpful.

Which documentation? According to the page
http://htmlagilitypack.codeplex.com/documentation
"This project does not have documentation yet."

> It doesn't help that I am not an HTML expert.

Well, I am, and I still fail to see what HTML Agility Pack is for. Their
main page doesn't really say what the package is or how it is to be used.
But undoubtedly it is useful for _something_.

> My initial problem is that I don't understand how
> HAP uses the terms "node" and "element". I thought that in HTML
> everything is an element, or part of an element.

That's an easier question. But the answer is not that short.

In classic HTML, we have elements, but they are parts of the HTML
document. The correspondence between HTML and the DOM was defined
separately, in various specifications or just by implementations. In a
more modern view, being phased in with HTML5, an HTML document _is_ a
document tree, within a DOM framework, and what classic HTML calls HTML
documents are just serializations (linearizations) of that tree.
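
To make the tree/serialization distinction concrete, here is a minimal sketch using Python's standard-library DOM (xml.dom.minidom) rather than HAP itself, which is a .NET library. It assumes the input is well-formed enough to parse as XML. Parsing turns the linear string into a tree; serializing the tree's root element walks it and produces an equivalent string again.

```python
# Illustration with Python's standard-library DOM (xml.dom.minidom), not HAP.
# The input is chosen to be well-formed XML so that minidom can parse it.
from xml.dom import minidom

serialized = "<p>foo<b>bar</b></p>"

# Parsing: the linear string becomes a document tree.
doc = minidom.parseString(serialized)
tree_root = doc.documentElement  # the <p> element node

# Serializing: walking the tree produces a string again.
round_trip = tree_root.toxml()
print(tree_root.tagName)  # p
print(round_trip)         # <p>foo<b>bar</b></p>
```

The string and the tree carry the same information; the string is just one linearization of the tree.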

A DOM tree may contain nodes other than HTML element nodes. For example,
if a serialized HTML document contains <p>foo<b>bar</b></p>, then the
document tree contains, in addition to the element nodes for p and b, an
unnamed text node containing the string "foo" (and another containing
"bar"). Such an approach is needed for "mixed content" elements like p
(elements that may contain both text and inner elements): if you don't
construct nodes for the text strings, you cannot make the document tree
reflect the intended structure.
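
The same example shows the difference between "all nodes" and "elements". This is again a sketch with Python's xml.dom.minidom, not HAP: it walks every node in document order, then keeps only those whose node type is ELEMENT_NODE. The helper names walk and describe are hypothetical, made up for this illustration. In HAP the analogous filter would presumably test each node's NodeType against HtmlNodeType.Element; treat that name as an assumption to check against the HAP documentation.

```python
# Sketch using Python's xml.dom.minidom, not HAP: list every node, then only elements.
from xml.dom import Node, minidom

doc = minidom.parseString("<p>foo<b>bar</b></p>")

def walk(node, depth=0):
    """Hypothetical helper: yield (depth, node) for every descendant, in document order."""
    for child in node.childNodes:
        yield depth, child
        yield from walk(child, depth + 1)

def describe(node):
    """Hypothetical helper: short label showing what kind of node this is."""
    if node.nodeType == Node.ELEMENT_NODE:
        return "element <%s>" % node.tagName
    if node.nodeType == Node.TEXT_NODE:
        return "text %r" % node.data
    return "other node (type %d)" % node.nodeType

# Everything in the tree is a node ...
all_nodes = list(walk(doc))
for depth, node in all_nodes:
    print("  " * depth + describe(node))

# ... but only some nodes are elements.
element_names = [n.tagName for _, n in all_nodes if n.nodeType == Node.ELEMENT_NODE]
print(element_names)
```

Running it prints four nodes (element p, text 'foo', element b, text 'bar', indented by depth) and then `['p', 'b']`: the text strings are nodes, but not elements.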

eBob.com

Jukka K. Korpela said:
> Which documentation? According to page
> http://htmlagilitypack.codeplex.com/documentation
> "This project does not have documentation yet."

Nevertheless, there is a file named HtmlAgilityPack.Documentation.chm
available from this web page:
http://htmlagilitypack.codeplex.com/releases/view/44954

It contains some very helpful detail, but what I'd like to find, and have
not been able to, is a tutorial/overview.

Thanks for the discussion re "node".

Bob