Parsing generic XML

Discussion in 'Java' started by Roedy Green, Jun 11, 2008.

  1. Roedy Green

    Roedy Green Guest

    I have some XML, namely PAD files, for which I have no schema, though
    I probably could cook one up in a day or two.

    Similarly, I have some XHTML I want to screen-scrape, where I really
    only care about the <table>, <tr> and <td> elements.

    So what I am after is some sort of extremely relaxed schema that will
    eat pretty well anything so long as the tags balance.

    I tried parsing without any schema at all, and it choked on &nbsp;
    entities.
    --

    Roedy Green Canadian Mind Products
    The Java Glossary
    http://mindprod.com
     
    Roedy Green, Jun 11, 2008
    #1

  2. On Jun 11, 10:40 am, Roedy Green wrote:
    > I have some XML, namely PAD files, for which I have no schema, though
    > I probably could cook one up in a day or two.  
    >
    > Similarly, I have some XHTML I want to screen-scrape, where I really
    > only care about the <table>, <tr> and <td> elements.
    >
    > So what I am after is some sort of extremely relaxed schema that will
    > eat pretty well anything so long as the tags balance.
    >
    > I tried parsing without any schema at all, and it choked on &nbsp;
    > entities.


    Entity references (&nbsp; and friends) only have meaning with respect
    to a DTD that maps them to characters (e.g., &#160; in the case of
    &nbsp;). Apart from the five predefined entities (&lt;, &gt;, &amp;,
    &apos; and &quot;), an XML document that contains entity references
    MUST have a declaration for each of them somewhere; there's not really
    any avoiding it.

    Fortunately, for XHTML that's easy; there's a published DTD.
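
To illustrate the mechanism, here is a minimal sketch using the JDK's built-in JAXP parser. The published XHTML DTD declares nbsp as &#160;; the sketch makes the same declaration in an internal DTD subset so that it runs without fetching anything over the network. The class name DeclaredEntity and the sample markup are made up for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class DeclaredEntity {
    /** Parses the XML without validation and returns all text under the root element. */
    public static String rootText(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(false); // the DTD is read for entity declarations only
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Internal subset declaring the one entity the sample uses (an assumption
        // for this sketch; a real XHTML page would point at the published DTD).
        String xml = "<!DOCTYPE html [<!ENTITY nbsp \"&#160;\">]>"
                + "<html><body><table><tr><td>a&nbsp;b</td></tr></table></body></html>";
        String text = rootText(xml);
        // The reference has been expanded to U+00A0 (no-break space).
        System.out.println(text.equals("a\u00A0b"));
    }
}
```

With a real XHTML page the DOCTYPE names the external DTD instead, and the parser will try to load it unless you hand it a local copy through an EntityResolver.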

    In the case of PAD files you may have to replace the entity references
    with the characters they stand for (or with numeric character
    references) manually, if you can't find a DTD that defines them.
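
A rough sketch of doing that replacement in code before the text reaches the parser. The class name EntityFixer and the three-entry table are assumptions for the example; you would extend the table with whatever entities your PAD files actually use. Names not in the table (including the predefined &amp;, &lt; and so on) are left alone.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EntityFixer {
    // Hypothetical, deliberately small table: entity name -> numeric character reference.
    private static final Map<String, String> NAMED = new HashMap<>();
    static {
        NAMED.put("nbsp", "&#160;");
        NAMED.put("copy", "&#169;");
        NAMED.put("reg", "&#174;");
    }

    // Matches named references such as &nbsp; but not numeric ones such as &#160;.
    private static final Pattern ENTITY = Pattern.compile("&([A-Za-z][A-Za-z0-9]*);");

    /** Rewrites known named entity references as numeric character references. */
    public static String toNumericReferences(String xml) {
        Matcher m = ENTITY.matcher(xml);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // Unknown names fall back to the original text, unchanged.
            String replacement = NAMED.getOrDefault(m.group(1), m.group());
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toNumericReferences("<a>x&nbsp;y &amp; z</a>"));
    }
}
```

This is a plain text substitution, so it will also rewrite matches inside CDATA sections and comments; for typical PAD files that is probably harmless, but it is worth knowing.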

    Any basic XML parser (JDOM, dom4j, SAX, W3C DOM, et multiple cetera)
    should accept any well-formed document if you turn off validation.
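
For instance, a minimal sketch with the W3C DOM API that ships with the JDK, pulling out only the <td> cells and ignoring everything else. The class name RelaxedParse and the sample document are invented for the example, and the input is assumed to be well-formed with no undeclared entities.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class RelaxedParse {
    /** Parses well-formed XML without validation and returns the text of every td element. */
    public static List<String> cellTexts(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(false); // already the default; stated for clarity
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        // Any element names are accepted; we simply pick out the ones we care about.
        NodeList cells = doc.getElementsByTagName("td");
        List<String> texts = new ArrayList<>();
        for (int i = 0; i < cells.getLength(); i++) {
            texts.add(cells.item(i).getTextContent());
        }
        return texts;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<html><body><p>ignored</p>"
                + "<table><tr><td>a</td><td>b</td></tr></table></body></html>";
        System.out.println(cellTexts(xml)); // prints [a, b]
    }
}
```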

    -o
     
    Owen Jacobson, Jun 11, 2008
    #2
