SAXParser: ignore the <!DOCTYPE> line

Discussion in 'Java' started by SnooPac, Jan 5, 2005.

  1. SnooPac

    SnooPac Guest

    Hi.

    I'm trying to use SAXParser simply to extract some attribute values from an
    XML file. No validations or anything.
    So I have it all set up and working... mostly.

    The problem is that the XML file (which can't be changed) has the line at
    the start that says:
    <!DOCTYPE BlaCfg SYSTEM "../../../path/to/dtd/file/BlaCfg.dtd">

    And this dtd cannot be found.

    There are some workarounds I can do (like regenerating this XML file
    temporarily without this line, or making a dummy dtd file, or...?), but I'd
    rather just have my SAXParser ignore this line entirely. Is there any
    property I can set to do this?

    Here is some of my code, by the way. I think it's very standard:

    DefaultHandler handler = new MyTempXmlParser();
    SAXParserFactory factory = SAXParserFactory.newInstance();

    // Parse the input
    SAXParser saxParser = factory.newSAXParser();
    saxParser.parse( new File("path/to/xml/file/file.xml"), handler);

    I was hoping maybe for some sort of factory.setFeature() line that might do
    this?
    Or maybe within my handler?

    Any hints would be appreciated.

    Thanks,
    Aiman
     
    SnooPac, Jan 5, 2005
    #1

  2. /SnooPac/:

    > DefaultHandler handler = new MyTempXmlParser();
    > SAXParserFactory factory = SAXParserFactory.newInstance();


    Try setting, explicitly:

    factory.setValidating(false);

    It should be equivalent to setting the
    "http://xml.org/sax/features/validation" feature to false.

    > // Parse the input
    > SAXParser saxParser = factory.newSAXParser();
    > saxParser.parse( new File("path/to/xml/file/file.xml"), handler);
    >
    > I was hoping maybe for some sort of factory.setFeature() line that might do
    > this?


    http://www.saxproject.org/apidoc/org/xml/sax/package-summary.html#package_description

    --
    Stanimir
     
    Stanimir Stamenkov, Jan 5, 2005
    #2

  3. SnooPac

    SnooPac Guest

    > Try setting, explicitly:
    >
    > factory.setValidating(false);


    Although I haven't done exactly this, I did check factory.isValidating() and
    it already returned false.

    > http://www.saxproject.org/apidoc/org/xml/sax/package-summary.html#package_description

    Thanks for the list.
    From that page, I tried factory.setFeature("resolve-dtd-uris", false);
    but that gave me the exception: org.xml.sax.SAXNotRecognizedException:
    Feature: resolve-dtd-uris

    So it doesn't look like my JDK (Sun 1.4.2, I believe) matches what's on that
    web page you sent :)
    Anyway, I don't really know (or think) that setting "resolve-dtd-uris" to
    false would have helped, but it looked like the closest thing to what
    I wanted.

    Thanks though.
     
    SnooPac, Jan 5, 2005
    #3
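A note on the attempt in post #3: SAX feature names are full URIs, and the short IDs on the linked page are meant to carry the http://xml.org/sax/features/ prefix, which may be why "resolve-dtd-uris" on its own was not recognized. That feature would not skip the DTD in any case. The sketch below shows one possible way to skip the external DTD with a feature flag. It assumes a Xerces-based parser, such as the one bundled with current JDKs. The load-external-dtd URI is a Xerces feature, not part of SAX, so other parsers (possibly including the one in Sun JDK 1.4.2) may reject it with SAXNotRecognizedException. The class name, method name, and sample attribute are illustrative only.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class NoExternalDtd {
    // Xerces-specific feature URI; an assumption about the parser in use.
    static final String LOAD_EXTERNAL_DTD =
            "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    /** Collects every value of the given attribute without loading the external DTD. */
    static List<String> attributeValues(String xml, final String attrName) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(false);
        // Throws SAXNotRecognizedException if the parser does not know this feature.
        factory.setFeature(LOAD_EXTERNAL_DTD, false);

        final List<String> values = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                String value = atts.getValue(attrName);
                if (value != null) {
                    values.add(value);
                }
            }
        };

        SAXParser parser = factory.newSAXParser();
        parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return values;
    }
}
```

With this in place, a document whose DOCTYPE points at a missing DTD file should parse without the parser trying to open that file.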
  4. SnooPac <> wrote:
    > I was hoping maybe for some sort of factory.setFeature() line that might
    > do this?
    > Or maybe within my handler?



    Hi,
    I think you need to set your own EntityResolver instead:


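    // SAXBuilder is presumably JDOM's org.jdom.input.SAXBuilder.
    final SAXBuilder parser = new SAXBuilder();
    parser.setEntityResolver(new EntityResolver() {
        public InputSource resolveEntity(String publicId, String systemId) {
            // Placeholder for the "publicId is not right" check; replace as needed.
            boolean publicIdIsNotRight = true;
            if (publicIdIsNotRight) {
                // Hand the parser an in-memory stand-in instead of the real entity.
                return new InputSource(new ByteArrayInputStream(
                        "<?xml version='1.0' encoding='UTF-8'?>".getBytes()));
            }
            return null; // null lets the parser resolve the entity itself
        }
    });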


    BR.
    --
    Paweł Stobiński // SQ9NRY/5 // GG 0x4F9228
    "Windows [n.], A 32-bit extension and GUI shell to a 16-bit patch to
    an 8-bit operating system originally coded for a 4-bit microprocessor
    and produced by a 2-bit company without 1 bit of sense
     
    Paweł Stobiński, Jan 5, 2005
    #4
  6. SnooPac

    SnooPac Guest

    OK, I used a variation of this, and it seems to be working.
    I don't really understand it, though, so it feels rather kludgy.

    I didn't use SAXBuilder or anything. I just overrode the resolveEntity()
    method from DefaultHandler (in my handler class).
    The other problem I had was that my publicId was being reported as null, and
    systemId was reported as file://path/to/myfile.dtd.
    So I can't really do anything in my if statement concerning publicId,
    and now I have my resolveEntity() returning "<?xml ..." regardless. I guess I
    could do some kind of processing on systemId (like checking whether it starts
    with file:// and ends with .dtd, or something else that's simple).

    Another thing I don't understand is what exactly happens when I return that
    <?xml... stuff. Will the parser now continue parsing the document using the
    InputSource that's returned instead of the <!DOCTYPE> line from the XML file?

    Thanks again,
    Aiman
     
    SnooPac, Jan 5, 2005
    #6
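The variation described in post #6 might look roughly like the sketch below: a DefaultHandler subclass that overrides resolveEntity() and answers any system ID ending in .dtd with an empty input source. As far as the SAX contract goes, the returned InputSource is read in place of the external entity (here, the external DTD), not in place of the document, so parsing of the XML file itself continues as usual. The class name, the .dtd suffix check, and the "name" attribute are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SkipDtdHandler extends DefaultHandler {
    // Values of the (hypothetical) "name" attribute seen while parsing.
    final List<String> values = new ArrayList<>();

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        // publicId is null for a SYSTEM doctype, so only systemId is checked.
        if (systemId != null && systemId.endsWith(".dtd")) {
            // An empty source stands in for the DTD that cannot be found.
            return new InputSource(new StringReader(""));
        }
        return null; // anything else: let the parser resolve it as usual
    }

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes atts) {
        String value = atts.getValue("name");
        if (value != null) {
            values.add(value);
        }
    }

    /** Parses the XML text with this handler and returns the collected values. */
    static List<String> parse(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        SkipDtdHandler handler = new SkipDtdHandler();
        // SAXParser.parse() registers the handler as the EntityResolver too.
        parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.values;
    }
}
```

Unlike a parser-specific feature flag, this relies only on the standard SAX EntityResolver hook, so it should behave the same across parser implementations.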
  7. SnooPac <> wrote:
    > Another thing I don't understand, is what exactly happens when I return
    > that <?xml... stuff? Will the parser now continue parsing the document
    > using the InputSource thats returned instead of the <doctype> line from
    > the xml file?


    No; in fact that <?xml... return string only served as an example. In this
    case it should rather be <!DOCTYPE..., or any transformation of the input
    data that avoids referring to unknown locations.

    BR.
    --
    Paweł Stobiński // SQ9NRY/5 // GG 0x4F9228
     
    Paweł Stobiński, Jan 5, 2005
    #7
  8. /SnooPac/:

    >> Try setting, explicitly:
    >>
    >> factory.setValidating(false);

    >
    > Although i havent done this exactly, I did check factory.isValidating() and
    > it was already set to false.
    >
    > http://www.saxproject.org/apidoc/org/xml/sax/package-summary.html#package_description


    Note that on the above page the "validation" feature has an
    unspecified default value, so you probably need to call
    'setValidating(false)' anyway, although the Java API documents false as
    the default. You may even try calling:

    factory.setFeature("http://xml.org/sax/features/validation", false);

    You didn't say: did you actually try setting it explicitly? You
    stated you only checked 'isValidating()', but that could be misleading
    sometimes.

    > Thanks for the list.
    > From that page, I tried factory.setFeature("resolve-dtd-uris", false);
    > but from that i got the exception: org.xml.sax.SAXNotRecognizedException:
    > Feature: resolve-dtd-uris
    >
    > So it doesnt look like my jdk's (sun 1.4.2 i believe) matches whats in that
    > webpage you sent :)
    > Anyway, I dont really know if (or think that) setting "resolve-dtd-uris" to
    > false would have helped anyway, but it looked like the closest thing to what
    > I wanted.


    I don't think this feature is of any use to you.

    --
    Stanimir
     
    Stanimir Stamenkov, Jan 6, 2005
    #8
  9. /SnooPac/:

    >> factory.setValidating(false);

    >
    > Although i havent done this exactly, I did check factory.isValidating() and
    > it was already set to false.


    In my experience, the default behavior of parser implementations
    is to validate when a DOCTYPE declaration has been encountered and
    not to validate if no DOCTYPE has been specified, regardless of what
    the initial JAXP factory.isValidating() returns. So you should call
    factory.setValidating(true/false) explicitly if you want to trigger
    a specific behavior.

    --
    Stanimir
     
    Stanimir Stamenkov, Jan 6, 2005
    #9
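To check how the validation setting discussed in posts #8 and #9 relates to DTD loading, a small experiment along these lines may help. It is only a sketch: the class and method names are made up, and the behavior noted in the comments is what the parser bundled with current JDKs appears to do, which may differ on other parsers. With that parser, switching validation off does not by itself stop the external DTD from being opened, so a missing DTD file still makes the parse fail (typically with a FileNotFoundException).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

class ValidationVsDtdLoading {
    /** Returns null if the parse succeeded, otherwise the exception it raised. */
    static Exception tryParse(String xml, boolean validating) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            // Set the flag explicitly instead of relying on the default.
            factory.setValidating(validating);
            factory.newSAXParser().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                    new DefaultHandler());
            return null;
        } catch (Exception e) {
            return e;
        }
    }

    public static void main(String[] args) {
        String xml = "<!DOCTYPE BlaCfg SYSTEM \"no-such-dir-for-example/BlaCfg.dtd\">"
                + "<BlaCfg/>";
        // Even with validation off, the missing DTD is expected to cause an error here.
        System.out.println("validating=false -> " + tryParse(xml, false));
    }
}
```

If the non-validating parse still reports an error about the DTD file, the fix has to target DTD loading (an EntityResolver or a parser-specific feature) instead of the validation flag.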
  10. SnooPac

    Rashmi L

    Hi,

    In my XML I have the content <!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.1//EN\" \"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd\">.

    Hence, while parsing, I am getting SAXParseException: doctype not allowed in content.

    So I tried using factory.setFeature("<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.1//EN\" \"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd\">", false);

    But it gave org.xml.sax.SAXNotRecognizedException:

    So I overrode the resolveEntity method in my custom handler as shown below.
    @Override
    public InputSource resolveEntity(String publicId, String systemId)
    {
        if (systemId.contains("<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.1//EN\" \"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd\">")) {
            return new InputSource(new StringReader(""));
        } else {
            return null;
        }
    }

    But this method is not being called during parsing. When exactly does this method get called?

    Could anybody please answer?
     
    Rashmi L, Apr 21, 2016
    #10
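On the question in post #10: resolveEntity() is called when the parser is about to open an external entity, including the external DTD named in a DOCTYPE declaration. Its arguments are the public and system identifiers taken from that declaration, never the raw <!DOCTYPE ...> text, so a check against the whole declaration cannot match. Likewise, setFeature() expects a feature URI, not markup. The sketch below records what the resolver receives and matches on the public identifier. The class name is hypothetical. Note also that a "DOCTYPE is not allowed in content" error usually suggests the declaration appears after the root element has started, which a resolver cannot repair; if so, the resolver may never be reached for that document.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class RecordingResolver extends DefaultHandler {
    static final String XHTML11_PUBLIC_ID = "-//W3C//DTD XHTML 1.1//EN";

    // One entry per resolveEntity() call, formatted as "publicId | systemId".
    final List<String> calls = new ArrayList<>();

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        calls.add(publicId + " | " + systemId);
        if (XHTML11_PUBLIC_ID.equals(publicId)) {
            // Empty stand-in, so the DTD is not fetched from the network.
            return new InputSource(new StringReader(""));
        }
        return null; // default resolution for everything else
    }

    /** Parses the XML text and returns the recorded resolver calls. */
    static List<String> parse(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        RecordingResolver handler = new RecordingResolver();
        parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.calls;
    }
}
```

This only works when the DOCTYPE sits in the document prolog, before the root element; a declaration embedded inside element content would have to be removed or escaped before parsing.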