Mark
Hi...
I've been doing a lot of work both creating and consuming web services, and
I've noticed a mismatch between several of the components involved. It centers
on windows-1252 and the fact that it is not equivalent to iso-8859-1.
Looking in the registry under HKEY_CLASSES_ROOT\MIME\Database\Charset and
\Codepage, it seems that all variations on iso-8859-1 (latin1, etc) are
mapped to code page 1252, which I'm assuming is windows-1252 in execution
terms. So if I set the codepage=1252 and Response.Charset=iso-8859-1 in ASP,
it seems that I'm *really* going to get out windows-1252, not iso-8859-1.
This becomes somewhat noticeable in HTML, since a lot of commonly used
characters (like the free-floating bullet •), which *aren't* really 8859-1,
get displayed by browsers as though they were.
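
To make the difference concrete, here is a small Python sketch (Python is
just a convenient way to show the two mappings; it says nothing about how
ASP or IIS behave internally). It decodes byte 149 under both encodings and
then tries to encode a bullet as real iso-8859-1:

```python
# Byte 149 (0x95) is the bullet in windows-1252, but falls in the
# 128-159 control range when read as iso-8859-1.
raw = bytes([149])

as_cp1252 = raw.decode("cp1252")      # U+2022 BULLET
as_latin1 = raw.decode("iso-8859-1")  # U+0095, a control character

print(hex(ord(as_cp1252)))
print(hex(ord(as_latin1)))

# A bullet has no representation in real iso-8859-1 at all.
try:
    "\u2022".encode("iso-8859-1")
    encodable = True
except UnicodeEncodeError:
    encodable = False

print(encodable)
```

So a page that is labelled iso-8859-1 but contains byte 149 only shows a
bullet if the reader quietly treats the label as windows-1252.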
I occasionally run into problems, however, because MSXML doesn't appear to
be using the mime database to determine how to process the encoding
declaration (or at least it's got some different mapping hidden somewhere).
MSXML appears to treat the range 128-159 the way the ISO standard does: as
control codes rather than printable characters. As such, when you're
processing XML (either XML to XML or XML to HTML via XSL) declared as
iso-8859-1, if it contains what is *intended* to be a bullet (149), curly
quotes, or any of the other windows-1252 extensions, MSXML won't make the
association and won't translate the characters properly between character
sets. Unfortunately, a lot of web services don't accept or generate
"windows-1252" as an encoding declaration.
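
One possible workaround, sketched in Python under the assumption that the
text has already been decoded as strict iso-8859-1 and that any character
in the 128-159 range was really meant as windows-1252. The helper name
`repair_c1` is made up for illustration; it is not part of MSXML or any
library:

```python
def repair_c1(text: str) -> str:
    """Reinterpret characters U+0080..U+009F as windows-1252 bytes.

    Hypothetical helper: assumes the input was decoded as iso-8859-1
    although the producer actually wrote windows-1252.
    """
    out = []
    for ch in text:
        code = ord(ch)
        if 0x80 <= code <= 0x9F:
            try:
                ch = bytes([code]).decode("cp1252")
            except UnicodeDecodeError:
                # A few byte values in this range are unassigned in
                # windows-1252 too; leave those untouched.
                pass
        out.append(ch)
    return "".join(out)


# Curly quotes (147, 148) and a bullet (149) read as strict iso-8859-1.
mislabeled = b"\x93quoted\x94 \x95 item".decode("iso-8859-1")
fixed = repair_c1(mislabeled)
print(fixed)
```

This only papers over the mislabelling on the consuming side; it does not
change what the service on the other end declares or accepts.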
So...
1) Am I correct in assuming that MSXML is using different encoding routines
than IIS/ASP?
2) Is there a @CodePage value I can specify that will produce real Latin-1 in
ASP?
3) Will ASP.NET be more standards compliant? And/or does ASP.NET use the
MIME database under the covers too?
4) Just as an aside, does anybody have a clue why output generated via XSL
with encoding utf-8 doesn't display properly in IE?
Thanks
-Mark