MSXML: interpretation of encoded characters

DrewM

I have a big xml doc full of greek &encoded; characters, which I am
attempting to split up with the MSXML dom object and store chunks in a
database. (Working with ASP)

The problem I'm currently having is that as soon as I load the doc into
xmldom ("MSXML2.DOMDocument.3.0") the encoded characters get
interpreted. When they get inserted into the database they have lost
their original format.

e.g. "&#9794;" gets inserted as "â™‚" when I really need it to be
preserved as "&#9794;".

How can I stop the MSXML dom object from interpreting the characters?

Thanks

Drew
 
Martin Boehm

e.g. "&#9794;" gets inserted as "â™‚" when I really need it to be
preserved as "&#9794;".

How can I stop the MSXML dom object from interpreting the characters?

You cannot. XML is the serialized form of a DOM tree according to a
certain character encoding table. Entities get replaced with their
respective meanings as the document gets loaded into memory.

What you have is not the string '&#9794;', but an entity saying "I am
Unicode character 9794", and the in-memory representation will reflect
exactly this fact.
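
A minimal sketch of this behaviour, using Python's standard xml.etree.ElementTree rather than MSXML (an assumption made only so the example is self-contained; the element name "symbol" is made up). A conforming XML parser resolves the character reference on load, and a serializer writing to an ASCII encoding emits it again:

```python
import xml.etree.ElementTree as ET

# The file contains a numeric character reference, not seven literal characters.
source = "<symbol>&#9794;</symbol>"

root = ET.fromstring(source)

# After parsing, the text node holds the single character U+2642.
text = root.text
print(len(text), ord(text))

# ASCII cannot store U+2642 directly, so the serializer writes it back
# as a character reference.
serialized = ET.tostring(root, encoding="us-ascii")
print(serialized)
```

So the reference is not lost by loading the document; it is a property of how the document is written out again.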

The string '&#9794;' would be represented as "&amp;#9794;" in the XML
file, but then it is not a greek character anymore.
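
To illustrate the difference, here is a sketch in Python's xml.etree.ElementTree (not MSXML). Once the ampersand itself is escaped, the parser hands back the seven-character string instead of the single character. The helper name protect_char_refs is hypothetical, and a blanket text replacement before loading is only an assumption about what one might try; it would also touch references inside comments or CDATA sections.

```python
import xml.etree.ElementTree as ET


def protect_char_refs(xml_text: str) -> str:
    """Escape the '&' of numeric character references (hypothetical helper)."""
    return xml_text.replace("&#", "&amp;#")


original = "<symbol>&#9794;</symbol>"
protected = protect_char_refs(original)

plain = ET.fromstring(original).text     # one character, U+2642
literal = ET.fromstring(protected).text  # the literal string '&#9794;'

print(len(plain), len(literal))
```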

What may help you is using the unicode datatypes for your rows - nchar,
nvarchar or, FWIW, ntext (you do not use them, so your character is
displayed as two characters 'â™', as it indeed is two bytes long).
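
The odd characters can be reproduced outside the database. This is a sketch in Python, assuming (a guess about the setup, not something confirmed in this thread) that the character was written out as UTF-8 and then read back as Windows-1252. Strictly, U+2642 takes three bytes in UTF-8; the third displays as '‚', which is easy to mistake for a comma.

```python
male_sign = "\u2642"  # the character behind &#9794;

# UTF-8 stores this character as three bytes.
utf8_bytes = male_sign.encode("utf-8")
print(utf8_bytes.hex())

# Reading those bytes with the wrong code page yields one character per byte.
garbled = utf8_bytes.decode("cp1252")
print(garbled)

# As long as no byte was dropped on the way, the mistake can be reversed.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired == male_sign)
```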

If you then put data from your database back to XML, the correct entity
will be used again.

Martin
 
DrewM

Martin said:
What may help you is using the unicode datatypes for your rows - nchar,
nvarchar or, FWIW, ntext (you do not use them, so your character is
displayed as two characters 'â™', as it indeed is two bytes long).

If you then put data from your database back to XML, the correct entity
will be used again.

Thanks for your reply, Martin.

The column I'm inserting into is ntext. The characters get inserted like
'â™,' all the same. Is that what I'd expect?

When I pull the text back out of the database they stay as weird
characters instead of going back to the correct entities - but that may
be my error.

I guess my core question is should the characters look like 'â™,' in my
ntext database column?

Thanks

Drew
 
Martin Boehm

What may help you is using the unicode datatypes for your rows -
nchar, nvarchar or, FWIW, ntext [...]

[...]

The column I'm inserting into is ntext. The characters get inserted
like 'â™,' all the same. Is that what I'd expect?

Since I am not sure what exactly you do, could you maybe post some small
code snippets showing your XML and ASP? What version of SQL Server do
you use?
Maybe Q239530 might help you, but I guess you know that already.

Martin

P.S.: I am not online again until next Monday, so do not wait. ;-)
 
Phelim

Hi.
I have some similar problems and was wondering who here could help...

I have some large Greek and Russian encoded xml files, and when I try to
display them in html, the encoding seems to stop halfway at certain spots.
Here is an example of the Greek xml...

<option number="1">Να
προλάβει
μήπως τα
μηχανήματα
σβήσουν
λόγω
υψηλών
πιέσεως σε
περιπτώσεις
που η
θερμοκρασία
περιβάλλοντος
είναι
υψηλή και
τα μηχανήματα
΀˜{%!</option><option number="2">Î?α ανακαλÏ?Ï?ει
μηÏ?ανήμαÏ?α Ï?οÏ? Ï?Ï?Ï?Ï?ν έÏ?οÏ?ν κάÏ?οια
διαÏ?Ï?οή αÏ?Ï? Ï?ην Ï?ελεÏ?Ï?αία Ï?οÏ?ά Ï?οÏ?
έγινε έλεγÏ?οÏ? και Ï?α οÏ?οία μÏ?οÏ?εί να
Ï?Ï?ειάζεÏ?αι να εÏ?ιÏ?Î</option>

I don't want to display the proper characters here, I just want the xml
to be formed properly with encoded characters so that it can be parsed
later on...
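
One way to get output that contains only ASCII plus character references is to let the encoder do the escaping. This is a sketch in Python rather than ASP, and it assumes the text has already been decoded correctly into a Unicode string; the sample is the first word of the Greek text above. Note that this error handler does not escape '&' or '<', so markup characters in the text still need normal XML escaping.

```python
import xml.etree.ElementTree as ET

greek = "\u039d\u03b1"  # the word "Να"

# Characters outside ASCII become decimal character references.
encoded = greek.encode("ascii", errors="xmlcharrefreplace")
print(encoded)

# The result is plain ASCII and parses back to the original text.
xml_text = '<option number="1">' + encoded.decode("ascii") + "</option>"
parsed = ET.fromstring(xml_text)
print(parsed.text == greek)
```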
 
