DOM parsing - Document root element is missing.

Discussion in 'Java' started by Rico, Oct 17, 2004.

  1. Rico

    Rico Guest

    The following piece of code :

    DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory
    .newInstance();
    DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
    Document doc = docBuilder.parse(filename);


    ends in "Document root element is missing" for the following XML:

    <?xml version="1.0" encoding="utf-8"?>
    <EmailSender>
    <db_name>master</db_name>
    <document_type>document_New</document_type>
    <emailID />
    <document_ID>23983</document_ID>
    </EmailSender>


    I don't really know how the XML is being produced but a space between the
    last double-quote and the last '?' seems to solve the problem.
    So does changing double-quotes to single-quotes.

    Is something wrong with the XML document, or am I missing something
    about the usage of the API?

    Thanks. Regards,
    Rico.
     
    Rico, Oct 17, 2004
    #1

  2. xarax

    xarax Guest

    "Rico" wrote in message:
    > The following piece of code :
    >
    > DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory
    > .newInstance();
    > DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
    > Document doc = docBuilder.parse(filename);
    >
    >
    > ends in "Document root element is missing" for the following XML:
    >
    > <?xml version="1.0" encoding="utf-8"?>
    > <EmailSender>
    > <db_name>master</db_name>
    > <document_type>document_New</document_type>
    > <emailID />
    > <document_ID>23983</document_ID>
    > </EmailSender>
    >
    >
    > I don't really know how the XML is being produced but a space between the
    > last double-quote and the last '?' seems to solve the problem.
    > So does changing double-quotes to single-quotes.
    >
    > Is it something wrong with the XML document or am I missing something
    > about the usage of the API ?


    The first line of the XML file (the XML declaration) is not
    ordinary XML syntax; it follows its own rules.

    <?xml version='1.0' encoding='UTF-8' ?>

    The line above is an example of a correct XML declaration.
    It is *not* regular XML, because its keywords must be
    specified in a fixed order (version before encoding),
    whereas attributes that appear within the XML body can be
    specified in any order. I use single quotes in preference
    to double quotes, but I believe the space appearing
    before the final ? is required.
     
    xarax, Oct 17, 2004
    #2

  3. Rico

    Rico Guest

    On Sun, 17 Oct 2004 15:39:17 +0000, xarax wrote:
    > "Rico" wrote in message:
    >> I don't really know how the XML is being produced but a space between the
    >> last double-quote and the last '?' seems to solve the problem.
    >> So does changing double-quotes to single-quotes.


    > The first line of the XML file is not XML syntax.
    > That's according to the rules of XML.
    >
    > <?xml version='1.0' encoding='UTF-8' ?>
    >
    > The first line above is an example of a correct
    > XML header. It is *not* XML, because the keywords
    > must be specified in the correct order. (Attribute
    > keywords that appear within the XML body can be
    > specified in any order.) I use single quotes in
    > preference to double quotes, but the space appearing
    > before the final ? is required.


    Thanks for the input, xarax. However, after checking, I don't think
    the space is required. The file is produced by a program written in
    VB.Net and I am reading it using the Java DOM package.
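
    For what it's worth, the XML 1.0 grammar makes the whitespace before the
    closing "?>" optional, and either quote style is allowed. A minimal
    sketch of how one might confirm that with the same DOM API, parsing from
    memory so the file itself is out of the picture (the class and method
    names here are made up for illustration):

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical demo class: parses the same document with and without
// a space before the closing "?>" of the XML declaration.
class XmlDeclSpaceCheck {

    // Parses the XML text from memory and returns the root element's name.
    static String rootName(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement().getNodeName();
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        String body = "<EmailSender><db_name>master</db_name><emailID /></EmailSender>";
        // Declaration exactly as in the problem file: double quotes, no space.
        String noSpace = "<?xml version=\"1.0\" encoding=\"utf-8\"?>" + body;
        // Declaration in the suggested style: single quotes, trailing space.
        String withSpace = "<?xml version='1.0' encoding='UTF-8' ?>" + body;
        System.out.println(rootName(noSpace));
        System.out.println(rootName(withSpace));
    }
}
```

    If both variants parse when built from a string like this, the
    declaration is not the culprit and the difference must be in the bytes
    of the file on disk.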

    For some reason, if I somehow modify and save the file before getting my
    program to read it, the parsing goes fine. No missing root element or
    anything. That's what was happening when I added the space, to match what
    worked when I had been testing my program using my own files.

    Any further pointers would be very much appreciated. Thanks.

    Rico.
     
    Rico, Oct 18, 2004
    #3
  4. Sudsy

    Sudsy Guest

    Rico wrote:
    <snip>
    > Any further pointers would be very much appreciated. Thanks.
    >
    > Rico.


    So as soon as you touch the file with an editor it parses
    correctly? So now you have enough information to start on
    the process of discovery!
    Edit the file, making no changes, save, and exit.
    Next use a binary comparator to check for differences.
    Perhaps it's as simple as the ^Z used to mark end-of-file
    in the M$ world.
    Possibly the line termination characters: \r\n in the M$
    world, \n in *NIX. Could be the cause, as the problem is
    manifesting itself in the first line of the file, no?
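
    If no binary comparator is at hand, a few lines of Java will do for a
    first look. This is only a rough sketch (the class name and the 16-byte
    window are arbitrary choices): it prints the leading and trailing bytes
    of a file in hex, so that stray bytes before "<?xml" (3C 3F 78 6D 6C), a
    trailing ^Z (1A), or \r\n (0D 0A) versus \n (0A) show up at a glance.
    Run it on the original file and on the re-saved copy and compare.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper: hex view of the first and last bytes of a file.
class ByteDump {

    // Formats data[from..to) as two-digit hex values separated by spaces,
    // clamping the range to the bounds of the array.
    static String hex(byte[] data, int from, int to) {
        StringBuilder sb = new StringBuilder();
        for (int i = Math.max(0, from); i < Math.min(to, data.length); i++) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", data[i] & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ByteDump <file>");
            return;
        }
        // Reading the whole file is fine for small XML documents.
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("head: " + hex(data, 0, 16));
        System.out.println("tail: " + hex(data, data.length - 16, data.length));
    }
}
```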

    --
    Java/J2EE/JSP/Struts/Tiles/C/UNIX consulting and remote development.
     
    Sudsy, Oct 18, 2004
    #4
  5. Rico

    Rico Guest

    On Sun, 17 Oct 2004 23:38:15 -0400, Sudsy wrote:
    > Rico wrote:
    >> Any further pointers would be very much appreciated. Thanks.


    > So as soon as you touch the file with an editor it parses
    > correctly? So now you have enough information to start on
    > the process of discovery!
    > Edit the file, making no changes, save, and exit.
    > Next use a binary comparator to check for differences.
    > Perhaps it's as simple as the ^Z used to mark end-of-file
    > in the M$ world.


    Thanks Sudsy. This sounds like a good line of reasoning. Both my Java
    program and the VB.NET program are running on Win2K Pro.
    Vim on Cygwin reports that I've got an "incomplete last line",
    so the above guess could be in the right direction...

    Appending "\n" to the file stopped Vim from complaining, but there are
    some rubbish characters before the header, which even Textpad displays in
    binary mode for the unmodified file coming from the VB.NET program.

    It turns out the machine on which the VB.NET program was compiled is
    running some Unicode settings that produced garbage on my PC. Textpad
    manages to get rid of that upon saving and that's why I could parse it
    afterwards.

    Rico.
     
    Rico, Oct 18, 2004
    #5
  6. Sudsy

    Sudsy Guest

    Rico wrote:
    <snip>
    > Appending "\n" to the file had Vim not complaining anymore but there's
    > some rubbish characters before the header, which even Textpad displays in
    > binary mode for the unmodified file coming from the VB.NET program.


    If you check the archives you'll find mention of a BOM, or Byte Order Mark.
    It sounds like you'll have to perform some pre-processing of this file
    before trying to parse it. But I think you already know this by now...
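
    As a rough idea of what that pre-processing could look like, here is a
    sketch that assumes the rubbish is a UTF-8 BOM (the bytes EF BB BF); the
    thread never confirms which bytes were actually present, so check with a
    hex dump first. The class and method names are invented for the example.
    It peeks at the first three bytes, drops them if they are the BOM, and
    pushes them back otherwise. UTF-16 BOMs are deliberately left alone,
    since the parser needs them to detect the encoding.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper: strips a leading UTF-8 BOM before DOM parsing.
class BomSkippingParse {

    // Returns a stream positioned after a UTF-8 BOM if one is present;
    // otherwise the bytes that were peeked at are pushed back unchanged.
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pb = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        while (n < 3) {
            int r = pb.read(head, n, 3 - n);
            if (r < 0) {
                break; // stream shorter than three bytes
            }
            n += r;
        }
        boolean bom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        if (!bom && n > 0) {
            pb.unread(head, 0, n);
        }
        return pb;
    }

    // Parses the given file content and returns the root element's name.
    static String rootNameOf(byte[] fileBytes) {
        try (InputStream in = skipUtf8Bom(new ByteArrayInputStream(fileBytes))) {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().parse(in);
            return doc.getDocumentElement().getNodeName();
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    // Test helper: encodes text as UTF-8 with a BOM in front.
    static byte[] utf8WithBom(String text) {
        byte[] body = text.getBytes(StandardCharsets.UTF_8);
        byte[] out = new byte[body.length + 3];
        out[0] = (byte) 0xEF;
        out[1] = (byte) 0xBB;
        out[2] = (byte) 0xBF;
        System.arraycopy(body, 0, out, 3, body.length);
        return out;
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"utf-8\"?>"
                + "<EmailSender><document_ID>23983</document_ID></EmailSender>";
        System.out.println(rootNameOf(utf8WithBom(xml)));
    }
}
```

    For a real file, the same skipUtf8Bom wrapper can be put around a
    FileInputStream and handed to DocumentBuilder.parse(InputStream). Some
    parser versions cope with a UTF-8 BOM on their own, in which case the
    wrapper is simply harmless.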

    --
    Java/J2EE/JSP/Struts/Tiles/C/UNIX consulting and remote development.
     
    Sudsy, Oct 18, 2004
    #6
