Special character to &abc equivalents

Colin Peters · May 7, 2005

Hi,

I'm reading a file and writing it to the html output for a page.

I've come across two difficulties which I would like to solve.

The files contain special characters from European alphabets, namely
those which have the two little dots above the vowels called umlauts.

Normally, these are rendered in html using "&auml;", but in the file
they are just ä.

1. I'm using a StreamReader to read the file and I have found that if I
don't use System.Text.Encoding.UTF7 then the characters are lost
completely. Is this the correct way, or is there a way to automatically
get the StreamReader to select the correct encoding, or use other code
to determine which would be best?

2. Having read the character from the file, it is output literally to
the html, which I guess is to be expected. Is there a way to process a
string in order to change the ä to &auml; and so on?

Thanks in advance for any replies.

Yunus Emre ALPÖZEN [MCAD.NET] · May 7, 2005

My advice: set the underlying operating system encoding to whatever you
want, and use StreamReader and StreamWriter with
System.Text.Encoding.Default, which uses the underlying OS encoding.

I had the same problems with Turkish encoding, and this is the best
solution (IMHO).

Joerg Jooss · May 7, 2005

Colin said:
Hi,

I'm reading a file and writing it to the html output for a page.

I've come across two difficulties which I would like to solve.

The files contain special characters from European alphabets, namely
those which have the two little dots above the vowels called umlauts.

Normally, these are rendered in html using "&auml;", but in the file
they are just ä.

1. I'm using a StreamReader to read the file and I have found that if
I don't use System.Text.Encoding.UTF7 then the characters are lost
completely.

UTF-7 is hardly what you want. Did you try ISO-8859-1? Or Windows-1252?

Is this the correct way, or is there a way to
automatically get the StreamReader to select the correct encoding,
or use other code to determine which would be best?

In general, there's no way to guess a character encoding because
there's no universal metadata that could tell you what encoding is
being used.

To put it differently: You must know the encoding, or allow the user to
switch between possible encodings.

2. Having read the character from the file, it is output literally to
the html, which I guess is to be expected. Is there a way to process
a string in order to change the ä to &auml; and so on?

That's not necessary if the page is encoded correctly.

Cheers,

Colin Peters · May 7, 2005

Yunus said:
My advice: set the underlying operating system encoding to whatever you
want, and use StreamReader and StreamWriter with
System.Text.Encoding.Default, which uses the underlying OS encoding.

I had the same problems with Turkish encoding, and this is the best
solution (IMHO).

Unfortunately I'm using shared hosting. I have little influence over
operating system parameters.

Thanks anyway.

Colin Peters · May 7, 2005

Joerg said:
UTF-7 is hardly what you want. Did you try ISO-8859-1? Or Windows-1252?

I didn't see this as an option provided by Intellisense for the class:
System.Text.Encoding

Thanks anyway.

Juan T. Llibre · May 7, 2005

You can set the encoding as a Page directive.

<%@Page Language="VB" ResponseEncoding="UTF-8"%>

<%@Page Language="C#" ResponseEncoding="ISO-8859-1"%>
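Why the directive has to match what the browser is told can be seen from the bytes themselves. The following is only a sketch: the class name ResponseBytesDemo and the helper Hex are made up for illustration, and it runs as a plain console program rather than inside a page.

```csharp
using System;
using System.Text;

public static class ResponseBytesDemo
{
    // Hypothetical helper: encodes text with the named encoding and
    // returns the resulting bytes as dash-separated hex.
    public static string Hex(string text, string encodingName)
    {
        byte[] bytes = Encoding.GetEncoding(encodingName).GetBytes(text);
        return BitConverter.ToString(bytes);
    }

    public static void Main()
    {
        // U+00E4 is the letter a with umlaut.
        Console.WriteLine(Hex("\u00E4", "UTF-8"));      // C3-A4
        Console.WriteLine(Hex("\u00E4", "ISO-8859-1")); // E4
    }
}
```

The same character becomes two bytes in UTF-8 and one byte in ISO-8859-1, so a browser that assumes the wrong one of the two will show the wrong glyphs.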

Joerg Jooss · May 7, 2005

Colin said:
Windows-1252?

I didn't see this as an option provided by Intellisense for the class:
System.Text.Encoding

There are only a few default instances in Encoding. You can construct
all encodings by name using Encoding.GetEncoding(), e.g.

Encoding enc = Encoding.GetEncoding("ISO-8859-1");
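Plugged into the StreamReader from the original question, that could look roughly like the sketch below. The class name EncodingDemo, the method ReadAllText, and the temporary file are assumptions for illustration; ISO-8859-1 is only right if the file really was saved that way.

```csharp
using System;
using System.IO;
using System.Text;

public static class EncodingDemo
{
    // Hypothetical helper: reads a whole text file, decoding its bytes
    // with the encoding looked up by name.
    public static string ReadAllText(string path, string encodingName)
    {
        Encoding enc = Encoding.GetEncoding(encodingName);
        using (var reader = new StreamReader(path, enc))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        // "Mädchen" as ISO-8859-1 bytes: the umlaut is the single byte 0xE4.
        string path = Path.GetTempFileName();
        File.WriteAllBytes(path, new byte[] { 0x4D, 0xE4, 0x64, 0x63, 0x68, 0x65, 0x6E });

        string latin1 = ReadAllText(path, "ISO-8859-1");
        string utf8 = ReadAllText(path, "UTF-8");

        // Decoded with the right encoding, the umlaut survives.
        Console.WriteLine(latin1 == "M\u00E4dchen");
        // Decoded as UTF-8, the lone 0xE4 byte is invalid and is replaced by U+FFFD.
        Console.WriteLine(utf8.Contains("\uFFFD"));

        File.Delete(path);
    }
}
```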

Cheers,

Colin Peters · May 7, 2005

Aha! The penny has dropped. Or in this case, the Euro.

Many thanks to all.

Guest · May 9, 2005

Server.HtmlEncode(string) will convert "special chars" from a text file
to HTML character references, so the output side no longer depends on the
page's codepage... The file itself still has to be read with the right
encoding first. I use it in my chat application to prevent malicious code
being inserted into the database.

Regards,

Paul Parkinson (www.elysaria.com)
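For the named form asked about in the original question (ä to &auml;), a hand-rolled mapping is one option. This is only a sketch and not what Server.HtmlEncode does internally: the class EntityDemo, the method ToEntities, and the deliberately short table are all made up for illustration, and characters not in the table are passed through unchanged.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class EntityDemo
{
    // Hypothetical, deliberately small table: HTML markup characters
    // plus the German umlauts and sharp s. Extend as needed.
    private static readonly Dictionary<char, string> Named = new Dictionary<char, string>
    {
        { '&', "&amp;" }, { '<', "&lt;" }, { '>', "&gt;" }, { '"', "&quot;" },
        { '\u00E4', "&auml;" }, { '\u00F6', "&ouml;" }, { '\u00FC', "&uuml;" },
        { '\u00C4', "&Auml;" }, { '\u00D6', "&Ouml;" }, { '\u00DC', "&Uuml;" },
        { '\u00DF', "&szlig;" }
    };

    // Replaces every character found in the table with its named entity
    // and copies all other characters as they are.
    public static string ToEntities(string text)
    {
        var sb = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            string entity;
            if (Named.TryGetValue(c, out entity))
                sb.Append(entity);
            else
                sb.Append(c);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // Prints: &lt;b&gt;M&auml;dchen &amp; B&uuml;cher&lt;/b&gt;
        Console.WriteLine(ToEntities("<b>M\u00E4dchen & B\u00FCcher</b>"));
    }
}
```

Because the result is plain ASCII for the mapped characters, it displays the same whichever encoding the page is served in.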
