Problem parsing an XML document with Japanese text

Discussion in 'XML' started by Prakash, Jan 9, 2004.

  1. Prakash

    Hi all,

    I am trying to parse an XML document containing Japanese text by
    constructing a DOMBuilder object. The document created after parsing
    is empty. If the XML document does not contain Japanese characters,
    parsing goes through properly.

    Here is the sample document that is causing the problem.

    <root>
    <aolist>
    <ao>
    <attribute id="Identifier"
    dictName="Identifier"><value>3</value></attribute>
    <attribute id="EventTime" dictName="EventTime"><value>2003年06月26日
    13時</value></attribute>
    </ao>
    </aolist>
    </root>

    Here is the sample code written to parse the XML document. The above
    XML string is held in newformattedstr and is passed to the DOMBuilder
    parse method after being wrapped in an input source using
    MemBufInputSource/Wrapper4InputSource.

    DOMBuilder *parser_p = NULL;

    {
        // Wrap newformattedstr in an input source for the parser.
        MemBufInputSource memBuf_p(
            (const XMLByte*)newformattedstr.data()
            , newformattedstr.length()
            , "info"
            , false
        );
        // memBuf_p lives on the stack, so the wrapper must not adopt
        // (and later delete) it: pass adoptFlag = false.
        Wrapper4InputSource msg(&memBuf_p, false);

        // Set up the parser.
        static const XMLCh gLS[] = { chLatin_L, chLatin_S, chNull };
        DOMImplementation *impl_p =
            DOMImplementationRegistry::getDOMImplementation(gLS);
        assert(impl_p);

        parser_p = ((DOMImplementationLS*)impl_p)->createDOMBuilder(
            DOMImplementationLS::MODE_SYNCHRONOUS,
            0);
        assert(parser_p);

        parser_p->setFeature(XMLUni::fgDOMNamespaces, false);
        parser_p->setFeature(XMLUni::fgXercesSchema, false);
        parser_p->setFeature(XMLUni::fgXercesSchemaFullChecking, false);
        parser_p->setFeature(XMLUni::fgDOMValidateIfSchema, true);
        parser_p->setFeature(XMLUni::fgDOMDatatypeNormalization, true);

        // Pointer to the temporary XML structure.
        DOMDocument* tempDoc_p = NULL;

        try
        {
            parser_p->resetDocumentPool();
            tempDoc_p = parser_p->parse(msg);
        }
        catch (const XMLException& e)
        {
            ......
        }
        catch (...)
        {
            ...............
        }

        // The parse may have failed; check before dereferencing.
        assert(tempDoc_p);

        // Root node of the temporary document.
        DOMNode* tempRootNode_p =
            (DOMNode*)tempDoc_p->getDocumentElement();
        assert(tempRootNode_p);

        // theSerializer_p and outfile are set up elsewhere.
        XMLCh *tempDoc = theSerializer_p->writeToString(*tempDoc_p);
        assert(tempDoc);

        char *output_tempDoc = XMLString::transcode(tempDoc);
        outfile<<output_tempDoc<<endl;

        // Both strings were allocated by Xerces and must be released.
        XMLString::release(&output_tempDoc);
        XMLString::release(&tempDoc);

        parser_p->release();
    }
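
One thing that may be worth checking: the buffer handed to MemBufInputSource above carries no XML declaration. Without a declaration or byte-order mark, an XML parser has to treat the bytes as UTF-8, so input that is really in a legacy Japanese encoding would be rejected. The sketch below assumes (this is a guess, not something the post confirms) that newformattedstr holds such bytes, for example EUC-JP. withXmlDeclaration is a hypothetical helper, not a Xerces function, and whether the parser can actually decode the named encoding depends on the transcoder the Xerces build uses.

```cpp
#include <string>

// Hypothetical helper (not part of Xerces): returns `body` with an XML
// declaration naming `encoding` prepended, unless `body` already starts
// with a declaration.
std::string withXmlDeclaration(const std::string& body,
                               const std::string& encoding)
{
    // An XML declaration, if present, must be the very first thing
    // in the document, so checking the first five bytes is enough.
    if (body.compare(0, 5, "<?xml") == 0) {
        return body;
    }
    return "<?xml version=\"1.0\" encoding=\"" + encoding + "\"?>\n" + body;
}
```

The returned string, rather than newformattedstr itself, would then be the one whose data() and length() are passed to MemBufInputSource. The encoding name must match the real encoding of the bytes; declaring UTF-8 over bytes that are not UTF-8 does not help.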


    I am using Xerces 2.2.0. As I understand from the documentation, this
    version supports internationalization, but I am not sure why it is not
    able to parse this document. I tried both UTF-8 and UTF-16 encoding,
    but neither helps.
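
If naming UTF-8 as the encoding makes no difference, a quick diagnostic is to check whether the bytes in the buffer are well-formed UTF-8 at all. The function below is a minimal sketch of such a check using only the standard library; isValidUtf8 is a hypothetical name, and the check says nothing about which encoding the data really uses, only whether it could be UTF-8.

```cpp
#include <cstddef>
#include <string>

// Returns true if `bytes` is well-formed UTF-8: no stray continuation
// bytes, no truncated or overlong sequences, no surrogates, and no
// code points above U+10FFFF.
bool isValidUtf8(const std::string& bytes)
{
    const std::size_t n = bytes.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t extra = 0;      // number of continuation bytes expected
        unsigned long cp = 0;       // code point being decoded
        unsigned long minCp = 0;    // smallest code point for this length

        if (lead < 0x80) {          // plain ASCII
            ++i;
            continue;
        } else if (lead >= 0xC2 && lead <= 0xDF) {
            extra = 1; cp = lead & 0x1F; minCp = 0x80;
        } else if (lead >= 0xE0 && lead <= 0xEF) {
            extra = 2; cp = lead & 0x0F; minCp = 0x800;
        } else if (lead >= 0xF0 && lead <= 0xF4) {
            extra = 3; cp = lead & 0x07; minCp = 0x10000;
        } else {
            return false;           // continuation byte or invalid lead
        }

        if (n - i <= extra) {
            return false;           // sequence runs past the end
        }
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80) {
                return false;       // not a continuation byte
            }
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < minCp || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            return false;           // overlong, out of range, or surrogate
        }
        i += extra + 1;
    }
    return true;
}
```

Calling isValidUtf8(newformattedstr) just before the MemBufInputSource is built would show whether the Japanese text reached the parser as UTF-8. A false result suggests the string was produced in some other encoding and needs either conversion or a matching encoding declaration.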

    Any pointers on how to solve this problem will be of great help.

    Best Regards
    Prakash
     