XML: how to read in Chinese characters

Discussion in 'ASP General' started by msnews.microsoft.com, Jun 26, 2006.

  1. Hey there, I'm having trouble reading Simplified Chinese characters from an XML
    document in an ASP file. I want to update the database based on what is in
    the file, but every time I read in the characters they come out as ??.
    Here's a snippet. Also, here is my sample XML file:


    Response.Charset = "utf-8"

    ' objLst and objHdl are assigned from the DOM further down, so only the
    ' document object itself has to be created here.
    Set objXML = Server.CreateObject("Microsoft.XMLDOM")
    objXML.async = False

    ' VBScript strings do not escape backslashes, so "\" is the separator.
    objXML.Load Request.ServerVariables("APPL_PHYSICAL_PATH") & _
        strDirectoryUpload & "\" & actualName

    If objXML.parseError.errorCode <> 0 Then
        ' handle the error
        intErrorCode = 3
        strErrorMessage = "There was a parser error in the xml file."
    Else
        Set objLst = objXML.getElementsByTagName("item")
        noOfHeadlines = objLst.length  ' already numeric, no IsNumeric/CLng needed

        For i = 0 To noOfHeadlines - 1
            Set objHdl = objLst.item(i)

            strItemID = objHdl.childNodes(0).text
            strTitle = objHdl.childNodes(1).text
            strLastUpdate = objHdl.childNodes(2).text
            strEventCopy = objHdl.childNodes(3).text
            strSortOrder = objHdl.childNodes(4).text
            strContent = objHdl.childNodes(5).text
            ' This is where I am getting the Chinese characters:
            strLanguage1 = objHdl.childNodes(6).text

            ' ... database update (not shown in the original snippet) ...
        Next
    End If
    msnews.microsoft.com, Jun 26, 2006
  2. Assuming that the XML file is encoded as UTF-8 (i.e. it looks correct when you
    open it directly in a browser or a text editor), the line above shouldn't be
    a problem.

    What DB are you using, and does the data type of the field you are placing
    the text in accept Unicode characters?

    Anthony Jones, Jun 27, 2006
  3. msnews.microsoft.com

    surf_doggie, Jun 28, 2006
  4. msnews.microsoft.com

    For Chinese you have to use a different charset.

    For Simplified Chinese you may use GB2312, GBK, GB18030, HZ or
    ISO-2022-CN, and for Traditional Chinese you may use Big5, Big5-HKSCS or
    MFedatto, Jun 28, 2006
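A small, hedged illustration of the two substantive replies in this thread: Anthony Jones's question about whether the target field accepts Unicode, and MFedatto's point that charsets such as GB2312 differ from UTF-8. This sketch uses Python's standard codecs, not ASP or ADO, and Windows-1252 stands in only as an assumed example of a non-Unicode code page. It shows how characters that a code page cannot represent turn into "?", and how the same text produces different bytes in UTF-8 and GB2312.

```python
# Illustration only: Python's standard codecs stand in for ASP/ADO here.

text = "中文"  # two Simplified Chinese characters

# 1. Forcing Unicode text through a single-byte code page that has no Chinese
#    characters (Windows-1252, an assumed example) replaces each unencodable
#    character with "?", which matches the "??" symptom in the question.
lossy = text.encode("cp1252", errors="replace").decode("cp1252")

# 2. UTF-8 and GB2312 can both represent the text, but they produce
#    different bytes for the same characters.
utf8_bytes = text.encode("utf-8")
gb_bytes = text.encode("gb2312")

# 3. The text only round-trips when the reader decodes with the same charset
#    the writer used.
utf8_roundtrip = utf8_bytes.decode("utf-8")
gb_roundtrip = gb_bytes.decode("gb2312")

if __name__ == "__main__":
    print(lossy)              # ??
    print(utf8_bytes.hex())   # e4b8ade69687
    print(gb_bytes.hex())     # d6d0cec4
    print(utf8_roundtrip == text, gb_roundtrip == text)  # True True
```

On this reading, "??" suggests the text passed through a code page lacking Chinese characters somewhere between the XML parser and the database (for example a non-Unicode column), rather than a problem with reading the XML itself. That is an inference; the thread does not confirm where the loss happened.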
