Special character to &abc equivalents

Discussion in 'ASP .Net' started by Colin Peters, May 7, 2005.

  1. Colin Peters

    Colin Peters Guest

    Hi,

    I'm reading a file and writing it to the html output for a page.

    I've come across two difficulties which I would like to solve.

    The files contain special characters from European alphabets, namely
    those which have the two little dots above the vowels called umlauts.

    Normally, these are rendered in html using "&auml;", but in the file
    they are just ä.

    1. I'm using a StreamReader to read the file and I have found that if I
    don't use System.Text.Encoding.UTF7 then the characters are lost
    completely. Is this the correct way, or is there a way to automatically
    get the Stream Reader to select the correct encoding, or use other code
    to determine which would be best?

    2. Having read the character from the file, it is output literally to
    the html, which I guess is to be expected. Is there a way to process a
    string in order to change the ä to &auml; and so on?

    Thanks in advance for any replies.
    Colin Peters, May 7, 2005
    #1

  2. My advice: set the underlying operating system's encoding to whatever you
    need, and use StreamReader and StreamWriter with
    System.Text.Encoding.Default, which uses the OS encoding.

    I had the same problems with Turkish encoding, and this is the best
    solution (IMHO).

    --

    Thanks,
    Yunus Emre ALPÖZEN
    BSc, MCAD.NET
    Yunus Emre ALPÖZEN [MCAD.NET], May 7, 2005
    #2

  3. Joerg Jooss

    Joerg Jooss Guest

    Colin Peters wrote:

    > 1. I'm using a StreamReader to read the file and I have found that if
    > I don't use System.Text.Encoding.UTF7 then the characters are lost
    > completely.


    UTF-7 is hardly what you want. Did you try ISO-8859-1? Or Windows-1252?

    > Is this the correct way, or is there a way to
    > automatically get the Stream Reader to select the correct encoding,
    > or use other code to determine which would be best?


    In general, there's no way to guess a character encoding because
    there's no universal metadata that could tell you what encoding is
    being used.

    To put it differently: You must know the encoding, or allow the user to
    switch between possible encodings.
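
    A minimal sketch of the one kind of detection StreamReader does offer:
    when its detectEncodingFromByteOrderMarks argument is true, it recognises
    a Unicode byte order mark at the start of the file, and otherwise decodes
    with the encoding you pass in. The class name EncodingAwareReader and the
    choice of fallback are assumptions made for illustration.

```csharp
using System;
using System.IO;
using System.Text;

public static class EncodingAwareReader
{
    // Reads the whole file. If the file starts with a Unicode byte order
    // mark, StreamReader switches to that encoding; otherwise it decodes
    // with the caller-supplied fallback. No other guessing takes place.
    public static string ReadAllText(string path, Encoding fallback)
    {
        // Third argument: detectEncodingFromByteOrderMarks.
        using (var reader = new StreamReader(path, fallback, true))
        {
            return reader.ReadToEnd();
        }
    }
}
```

    A call might look like
    EncodingAwareReader.ReadAllText("input.txt", Encoding.GetEncoding("ISO-8859-1")),
    where "input.txt" is a hypothetical file name. A file without a byte order
    mark still has to be read with an encoding you already know.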


    > 2. Having read the character from the file, it is output literally to
    > the html, which I guess is to be expected. Is there a way to process
    > a string in order to change the ä to &auml; and so on.


    That's not necessary if the page is encoded correctly.

    Cheers,
    --
    http://www.joergjooss.de
    Joerg Jooss, May 7, 2005
    #3
  4. Colin Peters

    Colin Peters Guest

    Yunus Emre ALPÖZEN [MCAD.NET] wrote:

    > My advice: set the underlying operating system's encoding to whatever you
    > need, and use StreamReader and StreamWriter with
    > System.Text.Encoding.Default, which uses the OS encoding.
    >
    > I had the same problems with Turkish encoding, and this is the best
    > solution (IMHO).
    >


    Unfortunately I'm using shared hosting. I have little influence over
    operating system parameters.

    Thanks anyway.
    Colin Peters, May 7, 2005
    #4
  5. Colin Peters

    Colin Peters Guest

    Joerg Jooss wrote:

    > UTF-7 is hardly what you want. Did you try ISO-8859-1? Or Windows-1252?



    I didn't see this as an option provided by Intellisense for the class:
    System.Text.Encoding

    Thanks anyway.

    Colin Peters, May 7, 2005
    #5
  6. You can set the response encoding with a Page directive, for example
    either of these:

    <%@Page Language="VB" ResponseEncoding="UTF-8"%>

    <%@Page Language="C#" ResponseEncoding="ISO-8859-1"%>





    Juan T. Llibre
    ASP.NET MVP
    http://asp.net.do/foros/
    ASP.NET forums in Spanish
    Come, and let's talk about ASP.NET...
    ======================
    Juan T. Llibre, May 7, 2005
    #6
  7. Joerg Jooss

    Joerg Jooss Guest

    Colin Peters wrote:

    > Joerg Jooss wrote:
    >
    > > UTF-7 is hardly what you want. Did you try ISO-8859-1? Or
    > > Windows-1252?
    >
    > I didn't see this as an option provided by Intellisense for the class:
    > System.Text.Encoding


    Encoding exposes only a few predefined instances as static properties. You
    can obtain any other supported encoding by name using
    Encoding.GetEncoding(), e.g.

    Encoding enc = Encoding.GetEncoding("ISO-8859-1");
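
    To make the difference concrete, here is a small sketch that decodes the
    same bytes with two encodings looked up by name. The class name DecodeDemo
    and the sample bytes are assumptions chosen for illustration.

```csharp
using System;
using System.Text;

public static class DecodeDemo
{
    // Decodes raw bytes with an encoding looked up by its registered name,
    // for example "ISO-8859-1" or "utf-8".
    public static string Decode(byte[] bytes, string encodingName)
    {
        Encoding enc = Encoding.GetEncoding(encodingName);
        return enc.GetString(bytes);
    }
}
```

    Given the bytes 4B E4 73 65 ("Käse" as stored in an ISO-8859-1 file),
    DecodeDemo.Decode(bytes, "ISO-8859-1") returns "Käse". Decoding the same
    bytes as "utf-8" on a current .NET runtime puts the replacement character
    U+FFFD where the ä should be, which may be the "lost" character effect
    described in the question, since StreamReader defaults to UTF-8.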

    Cheers,
    --
    http://www.joergjooss.de
    Joerg Jooss, May 7, 2005
    #7
  8. Colin Peters

    Colin Peters Guest

    Aha! The penny has dropped. Or in this case, the Euro.

    Many thanks to all.



    Colin Peters, May 7, 2005
    #8
  9. Server.HtmlEncode(string) will convert any "special chars" from a text file
    to the relevant &abc; equivalent, without having to worry about codepages.
    I use it in my chat application to prevent malicious code being inserted
    into the database.
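
    A rough sketch of the idea outside a page: Server.HtmlEncode needs an
    ASP.NET request context, so this swaps in System.Net.WebUtility.HtmlEncode
    and then rewrites anything still outside ASCII as a numeric character
    reference (&#228; is equivalent to &auml;). EntityEncoder and ToAsciiHtml
    are hypothetical names, and the text must already have been read from the
    file with the right encoding.

```csharp
using System;
using System.Globalization;
using System.Net;
using System.Text;

public static class EntityEncoder
{
    // Escapes HTML markup characters, then rewrites every remaining
    // non-ASCII character as a decimal numeric character reference,
    // so the result is plain ASCII.
    public static string ToAsciiHtml(string text)
    {
        if (string.IsNullOrEmpty(text))
        {
            return text;
        }

        string escaped = WebUtility.HtmlEncode(text);
        var sb = new StringBuilder(escaped.Length);

        for (int i = 0; i < escaped.Length; i++)
        {
            char c = escaped[i];
            if (c < 128)
            {
                sb.Append(c);
                continue;
            }

            int codePoint;
            if (char.IsHighSurrogate(c) && i + 1 < escaped.Length
                && char.IsLowSurrogate(escaped[i + 1]))
            {
                // Combine a surrogate pair into a single code point.
                codePoint = char.ConvertToUtf32(c, escaped[i + 1]);
                i++;
            }
            else
            {
                // An unpaired surrogate becomes U+FFFD (replacement character).
                codePoint = char.IsSurrogate(c) ? 0xFFFD : c;
            }

            sb.Append("&#")
              .Append(codePoint.ToString(CultureInfo.InvariantCulture))
              .Append(';');
        }

        return sb.ToString();
    }
}
```

    For example, EntityEncoder.ToAsciiHtml("<ä>") returns "&lt;&#228;&gt;".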

    Regards,

    Paul Parkinson (www.elysaria.com)
    Paul Parkinson, May 9, 2005
    #9