HTML Agility Pack Terminology

Discussion in 'HTML' started by eBob.com, Oct 10, 2011.

  1. eBob.com

    eBob.com Guest

    I need to use the HTML Agility Pack for the first time and, so far at least,
    I don't find the documentation very helpful. It doesn't help that I am not an
    HTML expert. My initial problem is that I don't understand how HAP uses
    the terms "node" and "element". I thought that in HTML everything is an
    element, or part of an element.

    I am experimenting with some sample code I found that displays nodes, and
    everything seems to be there; i.e., everything seems to be a node. I haven't
    yet been able to figure out how to alter the sample code to show me
    elements.

    Any help would be greatly appreciated.

    (P.S. Also, could someone please explain the difference between this group
    and comp.infosystems.www.authoring.html?)

    Thanks, Bob
     
    eBob.com, Oct 10, 2011
    #1

  2. eBob.com

    dorayme Guest

    dorayme, Oct 10, 2011
    #2

  3. 10.10.2011 12:16, eBob.com wrote:

    > I need to use the HTML Agility Pack for the first time and, so far at
    > least, don't find the documentation very helpful.


    Which documentation? According to page
    http://htmlagilitypack.codeplex.com/documentation
    "This project does not have documentation yet."

    > It doesn't help that I am not an HTML expert.


    Well, I am, and I still fail to see what HTML Agility Pack is for. Their
    main page doesn't really say what the package is or how it is meant to be
    used. But undoubtedly it is useful for _something_.

    > My initial problem is that I don't understand how
    > HAP is using the terms "node" and "element"? I thought in HTML that
    > everything is an element, or part of an element.


    That's an easier question. But the answer is not that short.

    In classic HTML, we have elements, but they are parts of the HTML
    document. The correspondence between HTML and DOM was defined
    separately, in various specifications or just by implementations. In a
    more modern view, being phased in with HTML5, an HTML document _is_ a
    document tree, with a DOM framework, and what classic HTML calls HTML
    documents are just serializations (linearizations) of the tree.
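    To make the "serialization" view concrete, here is a minimal sketch that
    uses Python's standard-library xml.dom.minidom as a stand-in DOM
    implementation. This is an assumption for illustration only: minidom is an
    XML parser, not HTML Agility Pack, so it accepts only well-formed markup.
    The sketch parses a string of markup into a document tree and then
    serializes the tree back into markup.

```python
from xml.dom.minidom import parseString

# The serialized form: markup text as it would appear in a file.
source = "<p>foo<b>bar</b></p>"

# Parsing turns the serialization into a document tree (a Document node
# whose documentElement is the <p> element node).
tree = parseString(source)

# Serializing the root element walks the tree and writes markup again.
serialized = tree.documentElement.toxml()
print(serialized)
```

    Element.toxml() writes only that element and its descendants; calling
    toxml() on the Document itself would also emit an XML declaration, which
    is one small sign that the tree, not the text, is the primary object.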

    A DOM tree may contain nodes other than HTML element nodes. For example,
    if a serialized HTML document contains <p>foo<b>bar</b></p>, then the
    document tree contains, in addition to HTML element nodes, an unnamed
    text node containing the string "foo". Such an approach is needed for
    "mixed content" elements like p (elements that may contain both text and
    inner elements): if you don't construct nodes for the text strings, you
    cannot make the document tree reflect the intended structure.
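    The same stand-in (Python's xml.dom.minidom, not HAP) can show the point
    about text nodes directly. The sketch below walks the tree for
    <p>foo<b>bar</b></p>, reports each node's type, and then keeps only the
    element nodes, which is essentially the filtering the original question
    asks about. The helper name walk is hypothetical. In HAP itself, to the
    best of my knowledge, the analogous test is on HtmlNode.NodeType against
    values such as HtmlNodeType.Element and HtmlNodeType.Text; verify that
    against the HAP documentation before relying on it.

```python
from xml.dom import Node
from xml.dom.minidom import parseString


def walk(node, depth=0):
    """Hypothetical helper: yield (depth, node) for node and all descendants, in document order."""
    yield depth, node
    for child in node.childNodes:
        yield from walk(child, depth + 1)


tree = parseString("<p>foo<b>bar</b></p>")

# Every node: element nodes and text nodes alike.
for depth, node in walk(tree.documentElement):
    indent = "  " * depth
    if node.nodeType == Node.ELEMENT_NODE:
        print(f"{indent}element <{node.tagName}>")
    elif node.nodeType == Node.TEXT_NODE:
        print(f"{indent}text {node.data!r}")

# Only the elements: filter on the node type.
elements = [n.tagName for _, n in walk(tree.documentElement)
            if n.nodeType == Node.ELEMENT_NODE]
print(elements)
```

    Running it prints an indented listing in which the text nodes 'foo' and
    'bar' appear alongside the element nodes p and b, followed by ['p', 'b'].
    Dropping the text nodes loses the information that "foo" comes before the
    b element inside p, which is exactly why the tree needs them.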

    --
    Yucca, http://www.cs.tut.fi/~jkorpela/
     
    Jukka K. Korpela, Oct 10, 2011
    #3
  4. eBob.com

    eBob.com Guest

    "Jukka K. Korpela" wrote in message
    news:j6uehf$li3$...
    > 10.10.2011 12:16, eBob.com wrote:
    >
    >> I need to use the HTML Agility Pack for the first time and, so far at
    >> least, don't find the documentation very helpful.

    >
    > Which documentation? According to page
    > http://htmlagilitypack.codeplex.com/documentation
    > "This project does not have documentation yet."


    But nonetheless, there is a file named HtmlAgilityPack.Documentation.chm
    available from this web page:
    http://htmlagilitypack.codeplex.com/releases/view/44954

    It contains some very helpful detail, but what I'd like to find, and have
    not been able to, is a tutorial/overview.

    Thanks for the discussion re "node".

    Bob

     
    eBob.com, Oct 11, 2011
    #4
  5. eBob.com

    eBob.com Guest

    eBob.com, Oct 11, 2011
    #5
