Parsing xhtml with libxml

Discussion in 'Ruby' started by Jon Smirl, Dec 16, 2005.

  1. Jon Smirl

    Jon Smirl Guest

    If you get errors complaining of undefined entities like &nbsp; when
    parsing xhtml it means you need to install the DTD for xhtml 1.0 or
    1.1.

    Example of a doctype for xhtml 1.1:
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
    "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

    Install the DTDs locally, following the model in /etc/xml. If you
    don't, libxml will fetch the DTD from www.w3.org each time you parse
    a document. Needing to install these DTDs was not obvious to me and
    should be part of the documentation. There is an RPM for xhtml 1.0,
    "xhtml1-dtds-1.0-7". I couldn't find one for xhtml 1.1, so I
    downloaded it piecemeal from w3.org.
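
    The "model in /etc/xml" is an XML catalog that maps a DTD's public
    and system identifiers to a local file. As a rough sketch, the Ruby
    below just prints such a catalog for the xhtml 1.1 doctype shown
    above. The local path and the helper name catalog_for are
    hypothetical; substitute wherever you actually saved the DTD.

    ```ruby
    # Sketch only: generate an XML catalog that points the XHTML 1.1
    # identifiers at a local copy of the DTD.
    CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"

    # catalog_for is an illustrative helper, not part of any library.
    def catalog_for(public_id, system_id, local_path)
      uri = "file://#{local_path}"
      <<~XML
        <?xml version="1.0"?>
        <catalog xmlns="#{CATALOG_NS}">
          <public publicId="#{public_id}" uri="#{uri}"/>
          <system systemId="#{system_id}" uri="#{uri}"/>
        </catalog>
      XML
    end

    catalog = catalog_for(
      "-//W3C//DTD XHTML 1.1//EN",
      "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd",
      "/usr/local/share/xml/xhtml11/xhtml11.dtd" # hypothetical location
    )
    puts catalog
    ```

    Save the output to a file and either reference it from
    /etc/xml/catalog or list it in the XML_CATALOG_FILES environment
    variable, which libxml2 consults when resolving external
    identifiers.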

    Installing the DTD does not automatically turn on validation. If you
    want to validate you need to turn it on:
    XML::Parser::default_validity_checking = true

    XML::Parser::default_load_external_dtd controls the loading of the
    'external subset' (the definition for the character entities like
    &amp;). It defaults to true.

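    XML::Parser::default_load_external_dtd is broken in the binding: it
    tests the wrong libxml global, and its getter and setter are
    registered the wrong way round. This patch fixes it.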

    Index: ruby_xml_parser.c
    ===================================================================
    RCS file: /var/cvs/xml-tools/libxml-ruby/ruby_xml_parser.c,v
    retrieving revision 1.1.1.1
    diff -r1.1.1.1 ruby_xml_parser.c
    274c274
    < if (xmlSubstituteEntitiesDefaultValue)
    ---
    > if (xmlLoadExtDtdDefaultValue)
    916c916
    < ruby_xml_parser_default_load_external_dtd_set, 0);
    ---
    > ruby_xml_parser_default_load_external_dtd_get, 0);
    918c918
    < ruby_xml_parser_default_load_external_dtd_get, 1);
    ---
    > ruby_xml_parser_default_load_external_dtd_set, 1);


    Sam's patches for libxml are also needed:
    http://www.intertwingly.net/blog/2005/11/05/Patch-for-libxml2s-Ruby-binding
     
    Jon Smirl, Dec 16, 2005
    #1

  2. Jon Smirl wrote:
    > If you get errors complaining of undefined entities like &nbsp; when
    > parsing xhtml it means you need to install the DTD for xhtml 1.0 or
    > 1.1.
    >
    > Example of a doctype for xhtml 1.1:
    > <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
    > "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
    >
    > <snip explanation & code due to ruby-forum.com />
    >
    > Sam's patches for libxml are also needed:
    > http://www.intertwingly.net/blog/2005/11/05/Patch-for-libxml2s-Ruby-binding


    Thank you for this!



    E
    --
    This document is NOT valid XHTML 1.0!

    --
    Posted via http://www.ruby-forum.com/.
     
    Eero Saynatkari, Dec 16, 2005
    #2