Diacritical marks in array don't translate

Discussion in 'Javascript' started by jiverbean, Nov 11, 2005.

  1. jiverbean

    jiverbean Guest

    Dear group members,

    I came across a glitch in JavaScript and I don't know how to solve it
    elegantly.

    I have an array with strings of German words:

    profile[1] = "Fröhliches Fräulein";

    Because HTML doesn't or didn't allow some of these characters, I wrote:

    profile[1] = "Fr&ouml;hliches Fr&auml;ulein";

    but when I use an alert(profile[1]); the dialog displays the escape
    codes instead of the diacritical marks. I then figured the unescape()
    function would solve the problem, but it did not. I don't want to write:

    profile[1] = "Fr%190hliches Fr%191ulein";
    alert(unescape(profile[1]));

    The numbers in the above example only serve to illustrate the idea. I
    don't know where to look the exact numbers up, unless they are the
    ASCII codes. I haven't tried the last technique yet, but I'm pondering
    the issue.

    Any suggestions,
    Jean Biver

    ________________________________________________
    Check out my home page at http://homepage.internet.lu/aibiver
    Please recommend my seti@home profile at
    http://setiathome2.ssl.berkeley.edu/fcgi-bin/fcgi?cmd=view_feedback&id=26539
     
    jiverbean, Nov 11, 2005
    #1

  2. jiverbean

    Safalra Guest

    On 11 Nov 2005 07:09:43 -0800, jiverbean wrote:
    > I came across a glitch in JavaScript and I don't know how to solve it
    > elegantly.
    >
    > I have an array with strings of German words:
    >
    > profile[1] = "Fröhliches Fräulein";
    >
    > Because HTML doesn't or didn't allow some of these characters


    Then you need to use a different character set for your document. Try
    ISO-8859-1, which allows the standard European accented characters.

    --
    Safalra (Stephen Morley)
    http://www.safalra.com/programming/javascript/
     
    Safalra, Nov 11, 2005
    #2

  3. jiverbean wrote:

    > I have an array with strings of German words:
    >
    > profile[1] = "Fröhliches Fräulein";
    >
    > Because HTML doesn't or didn't allow some of these characters,


    That's an urban legend that will probably never die. HTML allows these
    characters, HTTP is and has been 8-bit-safe. You just need to declare
    that with the Content-Type header and, for offline use,

    <head>
    <meta http-equiv="Content-Type" content="text/html; charset=...">
    ...
    </head>
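
    To make the HTTP side of this concrete, here is a minimal sketch of a
    server that sends the charset in the Content-Type header. It assumes
    Node.js and its built-in http module, neither of which is mentioned in
    this thread; contentTypeFor, handleRequest, the page text and port 8080
    are names and values made up for illustration.

```javascript
const http = require("http");

// Hypothetical page body. The umlauts are written as \u escapes so that
// the encoding of this source file does not matter.
const page = "<p>Fr\u00F6hliches Fr\u00E4ulein</p>";

// Build the header value that tells the browser how to decode the bytes.
function contentTypeFor(charset) {
  return "text/html; charset=" + charset;
}

// Encode the page as ISO-8859-1 (Node calls it "latin1") and declare
// that same encoding in the Content-Type response header.
function handleRequest(req, res) {
  const body = Buffer.from(page, "latin1");
  res.writeHead(200, {
    "Content-Type": contentTypeFor("ISO-8859-1"),
    "Content-Length": body.length
  });
  res.end(body);
}

// Uncomment to serve on a port chosen for this example:
// http.createServer(handleRequest).listen(8080);
```

    The point of the sketch is only that the bytes sent and the charset
    declared have to agree; any server setup that achieves that will do.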

    A good reason for escaping 8-bit characters _in HTML_ is editing on
    different platforms without having the knowledge or facility (due to
    keyboard layout) to type them there.

    <http://www.htmlhelp.com/faq/html/design.html#entity-or-number>

    > I wrote:
    >
    > profile[1] = "Fr&ouml;hliches Fr&auml;ulein";


    JS (programming language) is not HTML (markup language). This source code
    has to be interpreted by the JS engine, and it does not and is not supposed
    to "know" how to handle SGML character entity references like "&ouml;".

    There is no problem you have to work around.
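
    A small sketch of what the script engine actually sees may help here.
    The variable name withEntities is made up for illustration; the string
    is the first word of the original example.

```javascript
// The entity reference is stored as six ordinary characters; nothing in
// the language turns "&ouml;" into a single letter.
var withEntities = "Fr&ouml;hliches";

console.log(withEntities.length);            // 15, not the 10 letters of the word
console.log(withEntities.charAt(2));         // "&"
console.log(withEntities.indexOf("&ouml;")); // 2
```

    That is why alert() shows the reference verbatim: only an HTML parser
    resolves character entity references.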

    > but when I use an alert(profile[1]); the dialog displays the escape
    > codes instead of the diacritical marks. I then figured the unescape()
    > function would solve the problem, but not. I don't want to write:
    >
    > profile[1] = "Fr%190hliches Fr%191ulein";
    > alert(unescape(profile[1]));


    It is not supposed to work anyway. unescape(), which is proprietary,
    accepts only 8-bit escape sequences (in contrast to standardized
    decodeURI*()). The above results in

    Fr<EM>0hliches Fr<EM>1ulein

    where <EM> is the character at code point 0x19 (25).
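
    The behaviour can be checked with a short sketch. The variable names
    are made up for illustration, and the %F6/%E4 and %C3%B6/%C3%A4
    sequences are the Latin-1 and UTF-8 forms of the two umlauts rather
    than anything posted in the thread.

```javascript
// "%19" is read as one two-digit hex escape (code point 0x19, decimal 25);
// the "0" and "1" that follow stay behind as literal digits.
var broken = unescape("Fr%190hliches Fr%191ulein");
console.log(broken === "Fr\x190hliches Fr\x191ulein"); // true

// The Latin-1 code points of the umlauts are 0xF6 and 0xE4.
var working = unescape("Fr%F6hliches Fr%E4ulein");
console.log(working === "Fr\u00F6hliches Fr\u00E4ulein"); // true

// decodeURIComponent() expects percent-encoded UTF-8 bytes instead.
var viaUtf8 = decodeURIComponent("Fr%C3%B6hliches Fr%C3%A4ulein");
console.log(viaUtf8 === working); // true

// Feeding it the single-byte form is rejected with a URIError.
var rejected = false;
try {
  decodeURIComponent("Fr%F6hliches");
} catch (e) {
  rejected = e instanceof URIError;
}
console.log(rejected); // true
```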

    > ________________________________________________
    > [...]


    Signatures are to be delimited by a line containing only "--<SP><CR><LF>".


    HTH

    PointedEars (a German)
     
    Thomas 'PointedEars' Lahn, Nov 11, 2005
    #3
  4. jiverbean

    Robert Guest

    jiverbean wrote:
    > Dear group members,
    >
    > I came across a glitch in JavaScript and I don't know how to solve it
    > elegantly.
    >
    > I have an array with strings of German words:
    >
    > profile[1] = "Fröhliches Fräulein";


    The fact that these words are in an array doesn't matter.
    The problem that you are probably having is that your HTML and/or
    JavaScript file is saved in a different encoding than the one you
    specified in your HTML. Or maybe you forgot to specify the encoding
    and it is wrongly auto-detected.
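
    A mismatch of this kind can be reproduced in a few lines. This is a
    sketch that assumes Node.js and its Buffer class (not something the
    thread uses); "latin1" is Node's name for ISO-8859-1, and the variable
    names are made up for illustration.

```javascript
var text = "Fr\u00F6hliches Fr\u00E4ulein";

var utf8Bytes = Buffer.from(text, "utf8");     // each umlaut takes two bytes
var latin1Bytes = Buffer.from(text, "latin1"); // each umlaut takes one byte

console.log(utf8Bytes.length);   // 21
console.log(latin1Bytes.length); // 19

// File saved as UTF-8 but declared (or guessed) as ISO-8859-1:
// every umlaut turns into two unrelated characters.
var misread = utf8Bytes.toString("latin1");
console.log(misread === "Fr\u00C3\u00B6hliches Fr\u00C3\u00A4ulein"); // true

// File saved as ISO-8859-1 but declared as UTF-8: the lone bytes are
// invalid UTF-8 and decode to U+FFFD replacement characters.
var damaged = latin1Bytes.toString("utf8");
console.log(damaged.indexOf("\uFFFD") !== -1); // true
```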

    The most useful encoding in your case is probably UTF-8.
    So make sure you have this in your header:
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>

    Next, make sure that your HTML/JavaScript file is in UTF-8 format.
    I myself use EmEditor as a text editor, because it uses the correct
    Unicode terminology for saving files.


    >
    > Because HTML doesn't or didn't allow some of these characters, I wrote:
    >
    > profile[1] = "Fr&ouml;hliches Fr&auml;ulein";
    >
    > but when I use an alert(profile[1]); the dialog displays the escape
    > codes instead of the diacritical marks.


    That's because javascript is not html. Javascript has other mechanisms
    for escaping characters such as the \u for any unicode character.
    So you can write
    profile[1] = "Fr\u00F6hliches Fr\u00E4ulein";
    if you want to save your file in a different encoding than your output.
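
    For longer word lists, the escapes do not have to be typed by hand.
    The following is a minimal sketch; toUnicodeEscapes is a hypothetical
    helper, not a built-in function, and it escapes one UTF-16 code unit
    at a time.

```javascript
// Replace every non-ASCII character with its \uXXXX escape so that the
// resulting source text is pure ASCII and survives any file encoding.
function toUnicodeEscapes(text) {
  return text.replace(/[^\x00-\x7F]/g, function (ch) {
    var hex = ch.charCodeAt(0).toString(16).toUpperCase();
    return "\\u" + "0000".slice(hex.length) + hex; // pad to four hex digits
  });
}

var profile = [];
profile[1] = "Fr\u00F6hliches Fr\u00E4ulein";

console.log(profile[1].length);            // 19, each escape is one character
console.log(toUnicodeEscapes(profile[1])); // Fr\u00F6hliches Fr\u00E4ulein
```

    The output of the helper can be pasted into a string literal; at run
    time the engine turns each escape back into the accented letter.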

    Robert.
     
    Robert, Nov 11, 2005
    #4
  5. Robert wrote:

    > The problem that you are probably having is that the encoding that your
    > html and/or javascript is saved is in a different encoding than the
    > encoding you specified in your HTML. Or maybe you forgot to specify the
    > encoding and the encoding is wrongly auto-detected.
    >
    > The most useful encoding in your case is probably UTF-8.
    > So make sure you have this in your header:
    > <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>


    Utter nonsense.

    First, the above (which cannot be part of the [HTTP] header, but of
    the `head' element) will not suffice; the HTTP Content-Type header is
    important. Second, UTF-8, especially the German umlauts in it, is
    not compatible with ISO-8859-* (the encoding is different), and you do
    not know that he used a Unicode editor for this file.

    > Next make sure that your html/javascript file is in UTF-8 format.


    He does not need to and should not want to if not necessary.
    ISO-8859-1(5) will suffice and will be more widely supported.

    > So you can write
    > profile[1] = "Fr\u00F6hliches Fr\u00E4ulein";
    > if you want to save your file in a different encoding as your output.


    Provided that the script engine in use supports Unicode escape sequences.


    PointedEars
     
    Thomas 'PointedEars' Lahn, Nov 11, 2005
    #5
  6. Robert wrote:

    > <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>


    I forgot to mention that the above is not Valid HTML. It is subject to
    error-correction if SGML NET is ignored; if not, it is equivalent to

    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">&gt;

    XHTML != HTML.


    PointedEars
     
    Thomas 'PointedEars' Lahn, Nov 11, 2005
    #6
  7. jiverbean

    Robert Guest

    Thomas 'PointedEars' Lahn wrote:
    > Robert wrote:
    >
    >
    >>The problem that you are probably having is that the encoding that your
    >>html and/or javascript is saved is in a different encoding than the
    >>encoding you specified in your HTML. Or maybe you forgot to specify the
    >>encoding and the encoding is wrongly auto-detected.
    >>
    >>The most useful encoding in your case is probably UTF-8.
    >>So make sure you have this in your header:
    >><meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>

    >
    >
    > Utter nonsense.


    What part is utter nonsense?

    > First, the above (which cannot be part of the [HTTP] header, but of
    > the `head' element) will not suffice, the HTTP Content-Type header is
    > important.


    As can clearly be seen from the syntax, I was talking about HTML and not
    the header transmitted by a server.
    To my knowledge the content-type of the HTML file overrides the one
    given by the webserver. However I may be wrong about this and therefore
    made no comment about it before. It does not change the fact that as a
    good author you must provide the content-type in your webpage.


    > Second, UTF-8, especially the German umlauts in it, is
    > not compatible to ISO-8859-* (encoding is different),


    Where exactly did you see me write that these are compatible?

    > and you do not
    > know that he used a Unicode editor for this file.


    Where exactly did you see me write this?
    I actually made a suggestion for a good editor for him if he needed it.

    >>Next make sure that your html/javascript file is in UTF-8 format.

    >
    >
    > He does not need to and should not want to if not necessary.
    > ISO-8859-1(5) will suffice and will be more widely supported.


    There is huge support for Unicode.

    I cannot see his full needs in one word. Maybe he will need characters
    that are not in ISO-8859-1 soon. In any case ISO-8859-1 may suffice, but
    UTF-8 will suffice for sure and it's just as easy to use.

    >
    >>So you can write
    >>profile[1] = "Fr\u00F6hliches Fr\u00E4ulein";
    >>if you want to save your file in a different encoding as your output.

    >
    >
    > Provided that the used script engine supports Unicode escape sequences.


    Which it should in 2005.

    Don't make yourself look ridiculous by saying something is utter nonsense.
    Unicode is very important.
     
    Robert, Nov 11, 2005
    #7
  8. jiverbean

    Robert Guest

    Thomas 'PointedEars' Lahn wrote:
    > Robert wrote:
    >
    >
    >><meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>

    >
    >
    > I forgot to mention that the above is not Valid HTML.


    The original poster did not specify if he wanted HTML or XHTML.

    > It is subject to
    > error-correction if SGML NET is ignored; if not, it is equivalent to
    >
    > <meta http-equiv="Content-Type" content="text/html; charset=utf-8">&gt;
    >
    > XHTML != HTML.


    Most browsers do not use real SGML parsers and will not see the
    difference between those two.
     
    Robert, Nov 11, 2005
    #8
  9. Robert wrote:

    > Thomas 'PointedEars' Lahn wrote:
    >> Robert wrote:
    >>> <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>

    >>
    >> I forgot to mention that the above is not Valid HTML.

    >
    > The original poster did not specify if he wanted HTML or XHTML.


    However, he specified that he is using HTML right now. Why
    do you try to force XHTML (or HTML to be error-corrected,
    for that matter) on him, with all its ramifications?

    >> It is subject to
    >> error-correction if SGML NET is ignored; if not, it is equivalent to
    >>
    >> <meta http-equiv="Content-Type" content="text/html; charset=utf-8">&gt;
    >>
    >> XHTML != HTML.

    >
    > Most browsers do not use real SGML parsers and will not see the
    > difference between those two.


    Relying on error-correction is error-prone.


    PointedEars
     
    Thomas 'PointedEars' Lahn, Nov 11, 2005
    #9
  10. Robert wrote:

    > Thomas 'PointedEars' Lahn wrote:
    >> Robert wrote:
    >>> The problem that you are probably having is that the encoding that your
    >>> html and/or javascript is saved is in a different encoding than the
    >>> encoding you specified in your HTML. Or maybe you forgot to specify the
    >>> encoding and the encoding is wrongly auto-detected.
    >>>
    >>> The most useful encoding in your case is probably UTF-8.
    >>> So make sure you have this in your header:
    >>> <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>

    >>
    >> Utter nonsense.

    >
    > What part is utter nonsense?
    >
    >> First, the above (which cannot be part of the [HTTP] header, but of
    >> the `head' element) will not suffice, the HTTP Content-Type header is
    >> important.

    >
    > As can clearly seen by the syntax I was talking about HTML and not the
    > header transmitted by a server.


    There is no such thing as an HTML header. There is the HTML `head'
    element, which is a completely different thing. To call the latter
    a "header" is inappropriate.

    > To my knowledge the content-type of the HTML file overrides the one
    > given by the webserver. However I may be wrong about this and therefore
    > made no comment about it before.


    It MAY override the default (before serving), there is no MUST.
    HTML 4.01, section 7.4.4, clearly states that:

    ,-<http://www.w3.org/TR/html4/struct/global.html#h-7.4.4.2>
    |
    | The http-equiv attribute can be used in place of the name attribute and
    | has a special significance when documents are retrieved via the Hypertext
    | Transfer Protocol (HTTP). HTTP servers may use the property name specified
    | by the http-equiv attribute to create an [RFC822]-style header in the HTTP
    | response.

    Most notably, the HTML 4.01 Specification does _not_ state that user agents
    MUST or MAY allow the Content-Type header to be overridden by the `meta'
    element _after_ the document was served with a different header value.

    > It does not change the fact that as a good author you must provide the
    > content-type in your webpage.


    For possible future non-HTTP use. Yes, indeed.

    >> Second, UTF-8, especially the German umlauts in it, is
    >> not compatible to ISO-8859-* (encoding is different),

    >
    > Where exactly did you see me write that these are compatible?


    Your statement is written in a way that makes it look as if the
    OP does not have the choice. You have been proposing the more
    complicated way when there is a simpler and still compliant one,
    which I consider a Bad Thing, especially when addressing a newbie.

    > > > Next make sure that your html/javascript file is in UTF-8 format.

    > >
    > > He does not need to and should not want to if not necessary.
    > > ISO-8859-1(5) will suffice and will be more widely supported.

    >
    > There is huge support for Unicode.


    Especially on the Web, one has to consider being backwards compatible.
    There are UAs in use out there which do not support Unicode, so it is
    unwise to use or recommend it if not needed. And it is certainly
    not needed here.

    > Don't make yourself look ridiculous by saying something is utter
    > nonsense.


    I may have been a bit harsh, but proposing to declare UTF-8 and to use
    a Unicode-compatible editor where ISO-8859-* and any text editor
    sufficed seemed rather ridiculous to me.

    > Unicode is very important.


    Unicode is very important, I did not and do not doubt that. However,
    using and recommending it without thinking of the ramifications of its
    use only makes matters worse.


    Regards,
    PointedEars
     
    Thomas 'PointedEars' Lahn, Nov 11, 2005
    #10
  11. jiverbean

    Robert Guest

    Thomas 'PointedEars' Lahn wrote:
    >>>Utter nonsense.

    >>
    >>What part is utter nonsense?
    >>
    >>

    > There is no such thing as a HTML header. There is the HTML `head'
    > element, which is a completely different thing. To call the latter
    > a "header" is inappropriate.


    Maybe not 100% accurate, but not utter nonsense.

    >>It does not change the fact that as a good author you must provide the
    >>content-type in your webpage.

    >
    > For possible future non-HTTP use. Yes, indeed.


    So not utter nonsense either.

    >>>Second, UTF-8, especially the German umlauts in it, is
    >>>not compatible to ISO-8859-* (encoding is different),

    >>
    >>Where exactly did you see me write that these are compatible?

    >
    >
    > Your statement is written in a way that is looks like as if the
    > OP does not have the choice.


    I do not see it in that way. I clearly stated his problem and said UTF-8
    is probably best for him and how he could fix it.

    > You have been proposing the more
    > complicated way when there is a simpler and still compliant one
    > which I consider a Bad Thing, especially when addressing a newbie.


    I do not think it is the more complicated way. Even for a newbie Unicode
    awareness cannot come soon enough. Actually I do not know if this person
    is a newbie, because I have seen developers with years of experience,
    but no knowledge about character sets and encodings, and have the same
    problems that he is having.

    > Especially on the Web, one has to consider to be backwards compatible.
    > There are used UAs out there which does not support Unicode, so it is
    > unwise to use or recommend that if not needed. And it is certainly
    > not needed here.


    The sooner everyone adopts the Unicode standard, the faster these
    outdated user agents will be updated.

    > I may have been a bit harsh but proposing to declare UTF-8 and using
    > an Unicode-compatible editor where ISO-8859-* and any text editor
    > sufficed seemed rather quite ridiculous to me.


    Even notepad (windows xp) can save in UTF-8!

    Look, it is obvious that we have different views on using Unicode,
    and there is room for discussion. But to just dismiss it as utter
    nonsense is insulting.
    I just wish someone had made me aware of Unicode and encodings in my
    newbie days. Now that the original poster is aware, he can inform
    himself further and make a conscious decision. And if he decides to use
    ISO-8859-1 instead, I am sure he is capable of changing my suggestion to
    <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
     
    Robert, Nov 12, 2005
    #11
  12. jiverbean

    Robert Guest

    Thomas 'PointedEars' Lahn wrote:
    > Robert wrote:
    >
    >
    >>Thomas 'PointedEars' Lahn wrote:
    >>
    >>>Robert wrote:
    >>>
    >>>><meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
    >>>
    >>>I forgot to mention that the above is not Valid HTML.

    >>
    >>The original poster did not specify if he wanted HTML or XHTML.

    >
    >
    > However, he specified that he is using HTML right now.


    OK, I did not see that.
    I just copied it and did not think about removing the last slash.
     
    Robert, Nov 12, 2005
    #12
  13. Robert wrote:

    > Thomas 'PointedEars' Lahn wrote:
    >> Especially on the Web, one has to consider to be backwards compatible.
    >> There are used UAs out there which does not support Unicode, so it is
    >> unwise to use or recommend that if not needed. And it is certainly
    >> not needed here.

    >
    > The sooner everyone adopts the Unicode standard, the faster these
    > outdated user agents will be updated.


    Though I am all for standards compliance (you will find me advocating
    Valid markup, W3C DOM compliant scripting and the like here for the n-th time),
    especially but not exclusively people trying to make money on and from the
    Web can seldom afford this extreme attitude, sometimes in the literal
    sense.

    It has always been my opinion that a Web developer should try to get as
    much audience as possible if the odds for achieving this are acceptable.
    Since Unicode is not needed here and the alternative is easy to implement
    while having the advantage of broader support, I find them acceptable here.

    BTW, talking about adhering to standards, your From header is a violation
    of RFC2822, section 3.4.

    >> I may have been a bit harsh but proposing to declare UTF-8 and using
    >> an Unicode-compatible editor where ISO-8859-* and any text editor
    >> sufficed seemed rather quite ridiculous to me.

    >
    > Even notepad (windows xp) can save in UTF-8!


    Interesting, I did not know that. My work platforms are GNU/Linux
    and (seldom) Win2k (where I even more seldom use Notepad).


    PointedEars
     
    Thomas 'PointedEars' Lahn, Nov 12, 2005
    #13
  14. jiverbean

    Dag Sunde Guest

    "Thomas 'PointedEars' Lahn" <> wrote in message
    news:...
    > Robert wrote:
    >

    <snipped/>
    > Even notepad (windows xp) can save in UTF-8!
    >
    > Interesting, I did not know that. My work platforms are GNU/Linux
    > and (seldom) Win2k (where I even more seldom use Notepad).


    Yup...
    * ANSI
    * Unicode
    * Unicode Big Endian
    * UTF-8

    You can select it from a combobox in the save dialog.
    (ANSI is default).

    --
    Dag.
     
    Dag Sunde, Nov 12, 2005
    #14
  15. jiverbean

    Robert Guest

    Dag Sunde wrote:
    > "Thomas 'PointedEars' Lahn" <> wrote in message
    > news:...
    >
    >>Robert wrote:
    >>

    >
    > <snipped/>
    >
    >>Even notepad (windows xp) can save in UTF-8!
    >>

    > Yup...
    > * ANSI
    > * Unicode
    > * Unicode Big Endian
    > * UTF-8
    >
    > You can select it from a combobox in the save dialog.
    > (ANSI is default).


    Just wanted to comment that of course the "Unicode" selection is kinda
    ridiculous. What they meant was UTF-16 (Little Endian)

    Actually ANSI is kinda ridiculous too, because it has nothing to do with
    the American National Standards Institute.
     
    Robert, Nov 12, 2005
    #15
  16. jiverbean

    Dag Sunde Guest

    "Robert" <> wrote in message
    news:43760386$0$11063$4all.nl...
    > Dag Sunde wrote:
    >> "Thomas 'PointedEars' Lahn" <> wrote in message
    >> news:...
    >>
    >>>Robert wrote:
    >>>

    >>
    >> <snipped/>
    >>
    >>>Even notepad (windows xp) can save in UTF-8!
    >>>

    >> Yup...
    >> * ANSI
    >> * Unicode
    >> * Unicode Big Endian
    >> * UTF-8
    >>
    >> You can select it from a combobox in the save dialog.
    >> (ANSI is default).

    >
    > Just wanted to comment that of course the "Unicode" selection is kinda
    > ridiculous. What they meant was UTF-16 (Little Endian)
    >
    > Actually ANSI is kinda ridiculous too, because it has nothing to do with
    > the American National Standards Institute.


    Of course it is ridiculous...
    It's MS NotePad!

    ;-)

    --
    Dag.
     
    Dag Sunde, Nov 12, 2005
    #16