MSXML: interpretation of encoded characters

Discussion in 'XML' started by DrewM, Oct 2, 2003.

  1. DrewM

    DrewM Guest

    I have a big XML doc full of Greek &encoded; characters, which I am
    attempting to split up with the MSXML DOM object, storing the chunks in
    a database. (Working with ASP.)

    The problem I'm currently having is that as soon as I load the doc into
    xmldom ("MSXML2.DOMDocument.3.0") the encoded characters get
    interpreted. When they get inserted into the database they have lost
    their original format.

    e.g. "&#x2642;" gets inserted as "â™‚" when I really need it to be
    preserved as "&#x2642;".

    How can I stop the MSXML dom object from interpreting the characters?

    Thanks

    Drew
     
    DrewM, Oct 2, 2003
    #1

  2. DrewM

    Martin Boehm Guest

    "DrewM" <> wrote in message
    news:3f7c2d7a$0$122$

    > e.g. "&#x2642;" gets inserted as "â™‚" when I really need it to be
    > preserved as "&#x2642;".
    >
    > How can I stop the MSXML dom object from interpreting the characters?


    You cannot. XML is the serialized form of a DOM tree according to a
    certain character encoding table. Entities get replaced with their
    respective meanings as the document gets loaded into memory.

    What you have is not the string '&#x2642;', but an entity saying "I am
    Unicode character 9794", and the in-memory representation will reflect
    exactly this fact.
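
A minimal sketch of the behaviour described above, using Python's standard `xml.etree.ElementTree` rather than MSXML (an assumption for illustration; the `<symbol>` documents are made up). A conforming XML parser resolves character references while loading, so the hex and decimal spellings both end up as the same single character in memory.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample documents: the same character written as a hex
# and as a decimal character reference.
hex_doc = "<symbol>&#x2642;</symbol>"
dec_doc = "<symbol>&#9794;</symbol>"

hex_text = ET.fromstring(hex_doc).text
dec_text = ET.fromstring(dec_doc).text

# Both references are resolved during parsing; only the character survives.
print(len(hex_text), ord(hex_text))
print(hex_text == dec_text)
```

Once the document is parsed, the original spelling of the reference is no longer available, which is why the DOM cannot hand it back.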

    The string '&#x2642;' would be represented as "&amp;#x2642;" in the XML
    file, but then it is not a Greek character anymore.
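
To illustrate the difference, here is a small sketch (again in Python, not MSXML; the `<symbol>` element is hypothetical) that escapes the literal eight-character string and parses it back. The parser then returns the text of the reference rather than the character it names.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

literal = "&#x2642;"        # eight ordinary characters, not a reference
escaped = escape(literal)   # the ampersand is written as &amp;
doc = "<symbol>" + escaped + "</symbol>"

parsed_text = ET.fromstring(doc).text
print(escaped)
print(parsed_text, len(parsed_text))
```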

    What may help you is using the unicode datatypes for your rows - nchar,
    nvarchar or, FWIW, ntext (you do not use them, so your character is
    displayed as two characters 'â™', as it indeed is two bytes long).
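
One plausible explanation for the 'â™' output (an assumption; the thread does not establish where the conversion happens) is that the character's UTF-8 bytes are being read as Windows-1252 somewhere between the DOM and the database. The sketch below reproduces that effect in Python. Note that U+2642 is two bytes in UTF-16 but three bytes in UTF-8; the third byte, 0x82, maps to a low quotation mark in Windows-1252 that is easy to overlook or mistake for a comma.

```python
male_sign = "\u2642"

utf8_bytes = male_sign.encode("utf-8")
utf16_bytes = male_sign.encode("utf-16-le")

# Misreading the UTF-8 bytes as Windows-1252 yields the junk characters.
mojibake = utf8_bytes.decode("cp1252")

print(utf8_bytes.hex(), utf16_bytes.hex())
print([hex(ord(c)) for c in mojibake])

# The damage is reversible as long as no byte was lost along the way.
repaired = mojibake.encode("cp1252").decode("utf-8")
print(repaired == male_sign)
```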

    If you then put data from your database back to XML, the correct entity
    will be used again.
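
As a rough sketch of that round trip (Python's `xml.etree.ElementTree` standing in for MSXML, with a hypothetical `<symbol>` element and a string standing in for the database value): when the output encoding cannot represent a character, the serializer writes a numeric character reference instead.

```python
import xml.etree.ElementTree as ET

stored_value = "\u2642"   # stands in for text read from a Unicode column

elem = ET.Element("symbol")   # hypothetical element name
elem.text = stored_value

# With ElementTree's default US-ASCII output encoding, non-ASCII
# characters are written as numeric character references.
ascii_xml = ET.tostring(elem)
print(ascii_xml)
```

The reference comes back in decimal form here rather than the original hex form; both name the same character, so a parser treats them identically.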

    Martin
     
    Martin Boehm, Oct 2, 2003
    #2

  3. DrewM

    DrewM Guest

    Martin Boehm wrote:

    > What may help you is using the unicode datatypes for your rows - nchar,
    > nvarchar or, FWIW, ntext (you do not use them, so your character is
    > displayed as two characters 'â™', as it indeed is two bytes long).
    >
    > If you then put data from your database back to XML, the correct entity
    > will be used again.


    Thanks for your reply, Martin.

    The column I'm inserting into is ntext. The characters get inserted like
    'â™,' all the same. Is that what I'd expect?

    When I pull the text back out of the database they stay as weird
    characters instead of going back to the correct entities - but that may
    be my error.

    I guess my core question is should the characters look like 'â™,' in my
    ntext database column?

    Thanks

    Drew
     
    DrewM, Oct 2, 2003
    #3
  4. DrewM

    Martin Boehm Guest

    "DrewM" <> wrote in message
    news:3f7c4b53$0$126$

    >> What may help you is using the unicode datatypes for your rows -
    >> nchar, nvarchar or, FWIW, ntext [...]

    >
    > [...]
    >
    > The column I'm inserting into is ntext. The characters get inserted
    > like 'â™,' all the same. Is that what I'd expect?


    Since I am not sure exactly what you are doing, could you post some
    small code snippets showing your XML and ASP? What version of SQL Server
    do you use?
    Q239530 might help you, but I guess you know that already.

    Martin

    P.S.: I am not online again until next Monday, so do not wait. ;-)
     
    Martin Boehm, Oct 2, 2003
    #4
  5. DrewM

    Phelim Guest

    Hi.
    I have some similar problems and was wondering who here could help...

    I have some large Greek and Russian encoded XML files, and when I try
    to display them in HTML, the encoding seems to stop halfway at certain
    spots. Here is an example of the Greek XML...

    <option number="1">Να
    προλάβει
    μήπως τα
    μηχανήματα
    σβήσουν
    λόγω
    υψηλών
    πιέσεως σε
    περιπτώσεις
    που η
    θερμοκρασία
    περιβάλλοντος
    είναι
    υψηλή και
    τα μηχανήματα
    ΀˜{%!</option><option number="2">Î?α ανακαλÏ?Ï?ει
    μηÏ?ανήμαÏ?α Ï?οÏ? Ï?Ï?Ï?Ï?ν έÏ?οÏ?ν κάÏ?οια
    διαÏ?Ï?οή αÏ?Ï? Ï?ην Ï?ελεÏ?Ï?αία Ï?οÏ?ά Ï?οÏ?
    έγινε έλεγÏ?οÏ? και Ï?α οÏ?οία μÏ?οÏ?εί να
    Ï?Ï?ειάζεÏ?αι να εÏ?ιÏ?Î</option>

    I don't want to display the proper characters here, I just want the XML
    to be formed properly with encoded characters so that it can be parsed
    later on...
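
One way to get that result, sketched in Python as an assumption about what is wanted (the thread does not say which tools produce the files): encode the text to ASCII with the `xmlcharrefreplace` error handler, so every non-ASCII character becomes a numeric reference and the file no longer depends on a byte encoding being guessed correctly.

```python
import xml.etree.ElementTree as ET

greek = "Να προλάβει"   # sample text taken from the post above

# xmlcharrefreplace turns every non-ASCII character into &#NNN;.
encoded = greek.encode("ascii", "xmlcharrefreplace").decode("ascii")
doc = '<option number="1">' + encoded + "</option>"

# The ASCII-only document still parses back to the original text.
round_trip = ET.fromstring(doc).text
print(encoded)
print(round_trip == greek)
```

If the text can contain `&` or `<`, it would need to be escaped (for example with `xml.sax.saxutils.escape`) before this step.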
     
    Phelim, Oct 6, 2003
    #5
