a language encoding issue

Discussion in 'HTML' started by JD, Mar 31, 2009.

  1. JD

    JD Guest

    Hi,

    I am a yahoo group user. It has a table (<table> .. </table>) where I can
    enter text which will show up in the group's home page. I want to enter
    text in a non-English language. So, I use the following pair to enclose my
    text.

    <SPAN LANG="UTF-8">
    ....
    </SPAN>

    But the text is not shown properly. I will have to set "charset" for
    browsers to encode properly (as shown below).

    <meta http-equiv="Content-Type" content="text/html; charset=...">

    The thing is that charset should be set in a page head section. I have no
    control over that during entering text into the yahoo group table. Is it
    possible to switch different language encoding inside a span (<SPAN> ..
    </SPAN>)?

    Any help would be much appreciated.

    JD
     
    JD, Mar 31, 2009
    #1

  2. JD wrote:

    > I am a yahoo group user. It has a table (<table> .. </table>) where I
    > can enter text which will show up in the group's home page. I want
    > to enter text in a non-English language.


    Are you typing into the cell of a table? A <td> ... </td> ?

    > So, I use the following pair to enclose my text.
    >
    > <SPAN LANG="UTF-8">


    utf-8 is not a LANGuage. English, or French are languages.

    > ...
    > </SPAN>
    >
    > But the text is not shown properly. I will have to set "charset" for
    > browsers to encode properly (as shown below).
    >
    > <meta http-equiv="Content-Type" content="text/html; charset=...">
    >
    > The thing is that charset should be set in a page head section. I
    > have no control over that during entering text into the yahoo group
    > table.


    That never works, anyway. What charset do the page's response headers
    show? It may already be utf-8.

    > Is it possible to switch different language encoding inside
    > a span (<SPAN> .. </SPAN>)?


    <span> will be canceled by the next block element. Use <div> .. </div>
    instead.

    --
    -bts
    -Friends don't let friends drive Windows
     
    Beauregard T. Shagnasty, Mar 31, 2009
    #2

  3. JD wrote:

    > You were right. I am typing into a table cell. Also, it's wrong to
    > use UTF-8 in the language tag. I already corrected it but the text is
    > still not properly shown. I checked the source of the page. All the
    > charset occurrences are utf-8 already. But none of them show up
    > between <head> .. </head>.


    Again, placing meta-charset lines doesn't do anything *unless* your
    server is sending as charset: none. In Firefox, while viewing the page,
    do Tools > Page Info and see what it says for encoding. You can also
    install the Web Developer Toolbar, and see Response Headers.
    http://chrispederick.com/work/web-developer/
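    For anyone who prefers a script to the toolbar, here is a minimal Python sketch of the same check, using only the standard library. `email.message.Message.get_content_charset()` pulls the `charset` parameter out of a Content-Type value (lower-cased, or None when there is none), and the response object from `urllib.request.urlopen()` exposes its headers through the same method. The function names are hypothetical, chosen for this example.

```python
import email.message
import urllib.request
from typing import Optional


def declared_charset(content_type: str) -> Optional[str]:
    """Return the charset parameter of a Content-Type header value, or None."""
    msg = email.message.Message()
    msg["Content-Type"] = content_type
    # get_content_charset() lower-cases the value and returns None if absent.
    return msg.get_content_charset()


def fetch_declared_charset(url: str) -> Optional[str]:
    """Fetch url and report the charset, if any, declared in the HTTP headers."""
    with urllib.request.urlopen(url) as resp:
        # resp.headers is an http.client.HTTPMessage, a subclass of
        # email.message.Message, so the same method applies.
        return resp.headers.get_content_charset()


print(declared_charset("text/html; charset=ISO-8859-1"))  # iso-8859-1
print(declared_charset("text/html"))                      # None
```

    A None result means the server declared nothing over HTTP, so the browser has to look for a meta tag and, failing that, guess.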

    > I also changed from the SPAN tag to a DIV tag, but it doesn't help.


    Maybe if you would give a link to the page, or one the masses can
    access, and tell exactly what language you _want_ to use (you know, like
    Greek or Chinese or Russian), maybe someone will have some more advice.

    Please don't top-post.

    --
    -bts
    -Friends don't let friends drive Windows
     
    Beauregard T. Shagnasty, Mar 31, 2009
    #3
  4. JD wrote:

    > "Beauregard T. Shagnasty" wrote:
    >> Maybe if you would give a link to the page, or one the masses can
    >> access, and tell exactly what language you _want_ to use (you know, like
    >> Greek or Chinese or Russian), maybe someone will have some more advice.

    >
    > I would appreciate it very much if you or someone could look into the
    > following link:
    >
    > http://groups.yahoo.com/group/EnyoungCCCTO/


    That page's server (a Linux server running YTS/1.17.9 software) is
    already sending: Encoding ISO-8859-1

    You will not be able to change it. Follow Ben C's advice about using
    numeric character entities; that would be your only recourse.

    --
    -bts
    -Friends don't let friends drive Windows
     
    Beauregard T. Shagnasty, Mar 31, 2009
    #4
  5. JD

    JD Guest

    "Ben C" <> wrote in message
    news:...
    > On 2009-03-31, JD <> wrote:
    >> Hi,
    >>
    >> I am a yahoo group user. It has a table (<table> .. </table>) where I
    >> can enter text which will show up in the group's home page. I want to
    >> enter text in a non-English language. So, I use the following pair to
    >> enclose my text.
    >>
    >><SPAN LANG="UTF-8">
    >> ...
    >></SPAN>
    >>
    >> But the text is not shown properly. I will have to set "charset" for
    >> browsers to encode properly (as shown below).
    >>
    >><meta http-equiv="Content-Type" content="text/html; charset=...">
    >>
    >> The thing is that charset should be set in a page head section. I have
    >> no control over that during entering text into the yahoo group table.
    >> Is it possible to switch different language encoding inside a span
    >> (<SPAN> .. </SPAN>)?
    >>
    >> Any help would be much appreciated.

    >
    > No, you can't change the encoding half-way through the page.
    >
    > If the original encoding is, say, Latin-1 and you want to insert some
    > Chinese (which is not representable in Latin-1), you can use numeric
    > entities.
    >
    > e.g. like this: &#20160;&#20040;&#26159; (displayed as 什么是)
    >
    > You could write your original Chinese (or whatever it is) in a text
    > editor, then use a program called "recode" which you can download to
    > turn it into those entities. Then paste that into the web page.
    >
    > http://www.gnu.org/software/recode/


    Thank you both, Beauregard and Ben. You both answered my questions. I was
    wondering why the yahoo server sometimes changed my text into those funny
    numeric entities. I always changed them back for easy maintenance. Now I
    will follow your instruction to recode first and then copy/paste. Thanks so
    much for the help.
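    The recode-then-paste step can also be done with Python's standard library instead of GNU recode: encoding a string to ASCII with the built-in `xmlcharrefreplace` error handler turns every non-ASCII character into a decimal numeric character reference. A sketch (the function name is made up for this example):

```python
def to_numeric_entities(text: str) -> str:
    """Replace every non-ASCII character with a decimal &#NNNN; reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


sample = "什么是"
try:
    sample.encode("latin-1")
except UnicodeEncodeError:
    # Latin-1 has no code points for Chinese characters.
    print("not representable in Latin-1")

print(to_numeric_entities(sample))  # &#20160;&#20040;&#26159;
```

    Because the output is plain ASCII, it reads the same under any ASCII-compatible encoding the page might be served in, such as ISO-8859-1, windows-1252 or UTF-8.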

    JD
     
    JD, Mar 31, 2009
    #5
  6. JD

    John Hosking Guest

    Beauregard T. Shagnasty wrote:
    > JD wrote:
    >
    >> ... Also, it's wrong to
    >> use UTF-8 in the language tag. I already corrected it but the text is
    >> still not properly shown. I checked the source of the page. All the
    >> charset occurrences are utf-8 already. But none of them show up
    >> between <head> .. </head>.

    >
    > Again, placing meta-charset lines doesn't do anything *unless* your
    > server is sending as charset: none.
    >


    This is a very interesting statement to me, Beauregard, as I just
    responded to somebody in another group on this same subject. I acted as
    if I knew what I was talking about, but your statement makes me suddenly
    unsure.

    The poster (<>) was using
    meta http-equiv in her <head> but the W3C validator didn't find any
    encoding and therefore failed to check the page.

    In my response (<49d14ce9$>), I told her that the
    http-equiv was moot, since her server was sending "charset=none". Now
    you make me think that was an incorrect analysis of her problem.

    Would you care to pop over to c.i.w.a.html and clear things up in that
    thread? Or perhaps post again here with further explication or a pointer
    to something about what "none" means (Googling didn't help me here)?

    --
    John
     
    John Hosking, Mar 31, 2009
    #6
  7. John Hosking wrote:

    > Beauregard T. Shagnasty wrote:
    >> Again, placing meta-charset lines doesn't do anything *unless* your
    >> server is sending as charset: none.

    >
    > This is a very interesting statement to me, Beauregard, as I just
    > responded to somebody in another group on this same subject. I acted
    > as if I knew what I was talking about, but your statement makes me
    > suddenly unsure.


    Actually, I based my above reply on your post in the other group, 'cause
    I surely thought you knew what you were talkin' about. ;-)

    It sounds logical. If the server already sends one (such as the OP's
    sample page) like ISO-8859-1, then no manner of <meta> HTML code will
    change it. Unless - possibly - there isn't one from the server.
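    The logic here follows the priority order in HTML 4.01 (section 5.2.2): a charset parameter in the HTTP Content-Type header comes first, a <meta http-equiv="Content-Type"> declaration second, and only after that heuristics or the user's default. A simplified Python sketch, assuming `windows-1252` as the default and leaving out the spec's third source (a charset attribute on an element pointing to an external resource); the function name is hypothetical:

```python
from typing import Optional


def effective_encoding(http_charset: Optional[str],
                       meta_charset: Optional[str],
                       user_default: str = "windows-1252") -> str:
    """Pick the encoding to use, following the HTML 4.01 priority order."""
    if http_charset:        # charset parameter from the HTTP header wins
        return http_charset
    if meta_charset:        # otherwise <meta http-equiv="Content-Type">
        return meta_charset
    return user_default     # otherwise heuristics / the user's setting


print(effective_encoding("ISO-8859-1", "utf-8"))  # ISO-8859-1 (meta ignored)
print(effective_encoding(None, "utf-8"))          # utf-8 (meta consulted)
```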

    --
    -bts
    -Friends don't let friends drive Windows
     
    Beauregard T. Shagnasty, Mar 31, 2009
    #7
  8. JD

    dorayme Guest

    In article <gqt2ak$987$>,
    "Beauregard T. Shagnasty" <> wrote:

    > JD wrote:
    >
    > > "Beauregard T. Shagnasty" wrote:
    > >> Maybe if you would give a link to the page, or one the masses can
    > >> access, and tell exactly what language you _want_ to use (you know, like
    > >> Greek or Chinese or Russian), maybe someone will have some more advice.

    > >
    > > I would appreciate it very much if you or someone could look into the
    > > following link:
    > >
    > > http://groups.yahoo.com/group/EnyoungCCCTO/

    >
    > That page's server (a Linux server running YTS/1.17.9 software) is
    > already sending: Encoding ISO-8859-1
    >


    Earlier you said "In Firefox, while viewing the page, do Tools > Page
    Info and see what it says for encoding". On my Mac FF, Tools > Page Info
    says UTF-8. Using Web Developer Tools/Validate HTML, No Character
    Encoding Found! Falling back to windows-1252.

    --
    dorayme
     
    dorayme, Mar 31, 2009
    #8
  9. dorayme wrote:

    > "Beauregard T. Shagnasty" wrote:
    >> JD wrote:
    >>> http://groups.yahoo.com/group/EnyoungCCCTO/

    >>
    >> That page's server (a Linux server running YTS/1.17.9 software) is
    >> already sending: Encoding ISO-8859-1

    >
    > Earlier you said "In Firefox, while viewing the page, do Tools > Page
    > Info and see what it says for encoding". On my Mac FF, Tools > Page
    > Info says UTF-8. Using Web Developer Tools/Validate HTML, No
    > Character Encoding Found! Falling back to windows-1252.


    Hmm, looking again (with Firefox 3.0.8 on Ubuntu), the Page Info says:

    Address: http://groups.yahoo.com/group/EnyoungCCCTO/
    Type: text/html
    Render Mode: Standards compliance mode
    Encoding: ISO-8859-1
    Size: 32.07 KB (32,842 bytes)
    Modified: Tue 31 Mar 2009 03:09:19 PM EDT

    Using WebDevTool Response Header:

    Date: Tue, 31 Mar 2009 19:12:55 GMT
    P3P: policyref="http://info.yahoo.com/w3c/p3p.xml", CP="CAO DSP COR CUR
    ADM DEV TAI PSA PSD IVAi IVDi CONi TELo OTPi OUR DELi SAMi OTRi UNRi
    PUBi IND PHY ONL UNI PUR FIN COM NAV INT DEM CNT STA POL HEA PRE LOC
    GOV"
    Set-Cookie: GP=v=2&a=l&t=1238526775; path=/; expires=Tuesday,
    07-Apr-2009 23:59:59 GMT; domain=groups.yahoo.com
    G=v=7&data=qHKuZCakakv2HJi4rBE4p7uFLeg6C6lPiW5OQjRA4d1rrHas-nworB9tXQyYTnZo8JPclS43Xz1g8JVcMUlyqsYe2PZXHEawGK9lcMlkZBQk4VIEOLiMSEAUVQKd2A3ib_kE0pPV1FaPXiTJ8qbq9mapSxvKJZruXtn-3jw3X-uvEcFdkGAwvr3eQ68EapHvP6uapXBCdDY4aby-CU6nsXwA5AlT9dttCcpqdezzs4NeiaWsro3g9XCQBQmXn697TXiWLcDtZ81phyyxb97f70X1_k53GYXM91lG0an2sopz4WCtp5tXW2UwpitaR78YaMgLZkIf8EyxWHmlj0djQ7tuNboPatg5OlrMlABPYP72CxsfR3RvDTuspb3ZJyiLRD_QWAfd5NDtblbTtIvh959s21pyVdfvBZNffg03YEVKHbYQa_lfTV7DAI92Nvvp4RlgP1F7niYdL9P3C5Ylzi1eprDwDLjTzw&n=12;
    path=/group/EnyoungCCCTO/; domain=groups.yahoo.com
    Pragma: no-cache
    Expires: Fri, 01 Jan 1999 00:00:00 GMT
    Cache-Control: no-cache, must-revalidate, no-cache="Set-Cookie", private
    Vary: Accept-Encoding
    Content-Type: text/html
    Content-Encoding: gzip
    Age: 0
    Transfer-Encoding: chunked
    Connection: keep-alive
    Server: YTS/1.17.9

    200 OK

    Don't know what else to tell ya.

    --
    -bts
    -Friends don't let friends drive Windows
     
    Beauregard T. Shagnasty, Mar 31, 2009
    #9
  10. JD

    dorayme Guest

    In article <gqtq4a$6ip$>,
    "Beauregard T. Shagnasty" <> wrote:

    > dorayme wrote:
    >
    > > "Beauregard T. Shagnasty" wrote:
    > >> JD wrote:
    > >>> http://groups.yahoo.com/group/EnyoungCCCTO/
    > >>
    > >> That page's server (a Linux server running YTS/1.17.9 software) is
    > >> already sending: Encoding ISO-8859-1

    > >
    > > Earlier you said "In Firefox, while viewing the page, do Tools > Page
    > > Info and see what it says for encoding". On my Mac FF, Tools > Page
    > > Info says UTF-8. Using Web Developer Tools/Validate HTML, No
    > > Character Encoding Found! Falling back to windows-1252.

    >
    > Hmm, looking again (with Firefox 3.0.8 on Ubuntu), the Page Info says:
    >
    > Address: http://groups.yahoo.com/group/EnyoungCCCTO/
    > Type: text/html
    > Render Mode: Standards compliance mode
    > Encoding: ISO-8859-1
    > Size: 32.07 KB (32,842 bytes)
    > Modified: Tue 31 Mar 2009 03:09:19 PM EDT
    >

    I get similar *except* for encoding, where I get UTF-8 and:

    Modified: Wed, 1 Apr 2009 5:44:25 AM (obviously nothing - just due to
    you not living in the little beautest bit of the world <g>)

    Could this be a web browser sensitive matter? Got me?

    ....
    >
    > Don't know what else to tell ya.


    You could say quite how you are getting WebDevTool Response Header. Not
    sure I am getting this one? There is stuff in the page that comes up
    with Validate HTML but you talking a different command?

    --
    dorayme
     
    dorayme, Mar 31, 2009
    #10
  11. dorayme wrote:

    > You could say quite how you are getting WebDevTool Response Header.
    > Not sure I am getting this one? There is stuff in the page that comes
    > up with Validate HTML but you talking a different command?


    On the Web Developer Toolbar > Information > View Response Headers

    --
    -bts
    -Friends don't let friends drive Windows
     
    Beauregard T. Shagnasty, Apr 1, 2009
    #11
  12. JD

    dorayme Guest

    In article <gquahg$mjl$>,
    "Beauregard T. Shagnasty" <> wrote:

    > dorayme wrote:
    >
    > > You could say quite how you are getting WebDevTool Response Header.
    > > Not sure I am getting this one? There is stuff in the page that comes
    > > up with Validate HTML but you talking a different command?

    >
    > On the Web Developer Toolbar > Information > View Response Headers


    Yes, thanks... on this one I get similar to yours, esp.

    Content-Encoding: gzip
    Age: 2
    Transfer-Encoding: chunked
    Connection: keep-alive
    Server: YTS/1.17.9

    --
    dorayme
     
    dorayme, Apr 1, 2009
    #12
  13. JD

    John Hosking Guest

    Beauregard T. Shagnasty wrote:
    > John Hosking wrote:
    >
    >> Beauregard T. Shagnasty wrote:
    >>> Again, placing meta-charset lines doesn't do anything *unless* your
    >>> server is sending as charset: none.

    >> This is a very interesting statement to me, Beauregard, as I just
    >> responded to somebody in another group on this same subject. I acted
    >> as if I knew what I was talking about, but your statement makes me
    >> suddenly unsure.

    >
    > Actually, I based my above reply on your post in the other group, 'cause
    > I surely thought you knew what you were talkin' about. ;-)


    You *fool*. ;-)

    >
    > It sounds logical. If the server already sends one (such as the OP's
    > sample page) like ISO-8859-1, then no manner of <meta> HTML code will
    > change it. Unless - possibly - there isn't one from the server.
    >


    Yes. My uncertainty seems to come down to the fine difference between
    {0} and the empty set, or IOW, between "charset=none" and no charset
    statement from the server at all (where ISTM you said "charset: none").
    I guess we're in agreement, then. I hope.
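    The difference between the two cases shows up when both header forms are parsed. In the sketch below (hypothetical function name, standard library only), `charset=none` still produces a parameter value, just one that Python's codec registry does not recognise as an encoding, whereas a bare `text/html` produces no parameter at all. How a particular browser handles the first case is not something this sketch can tell you.

```python
import codecs
import email.message


def describe_charset(content_type: str) -> str:
    """Classify the charset parameter of a Content-Type header value."""
    msg = email.message.Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset()
    if charset is None:
        return "no charset parameter"
    try:
        codecs.lookup(charset)  # raises LookupError for unknown names
    except LookupError:
        return f"charset={charset!r} names no known encoding"
    return f"charset={charset!r}"


print(describe_charset("text/html"))                # no charset parameter
print(describe_charset("text/html; charset=none"))  # charset='none' names no known encoding
```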


    --
    John
    Let's just charter a boat and have fun together on the ship of fools.
     
    John Hosking, Apr 1, 2009
    #13
  14. JD

    JWS Guest

    JD wrote:

    > But the text is not shown properly.


    The Chinese text is in Big-5 encoding, not Unicode. It is shown OK
    in Firefox when I select "view, character encoding, auto-detect,
    Chinese".

    Perhaps you could try to convert the text to Unicode first. Maybe
    the page will then be readable without extra user intervention.
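    If the text was saved by an editor as Big5, the conversion suggested here amounts to decoding the Big5 bytes into Unicode; turning the result into numeric character references then makes it independent of whatever encoding the page is served in. A sketch, assuming the raw Big5 bytes are available (the function name is hypothetical):

```python
def big5_to_entities(raw: bytes) -> str:
    """Decode Big5 bytes to Unicode, then emit ASCII-only numeric references."""
    text = raw.decode("big5")  # Big5 bytes -> Unicode str
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


raw = "中文".encode("big5")   # stand-in for a file saved by a Big5 editor
print(big5_to_entities(raw))  # &#20013;&#25991;
```

    With a real file, the bytes would come from something like `open(path, "rb").read()`.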
     
    JWS, Apr 1, 2009
    #14
  15. Ben C wrote:

    > I think the Page Info [in Firefox] may be telling you what encoding the
    > browser decided to guess.


    I'm pretty sure that's what it does.

    > This may depend on your locale or something or just what mood it's in.


    It may be POM-dependent, but more commonly it depends on the browsing
    history and on the user's actions (if any) with the command View/Encoding. If
    you visit a page that does not declare its encoding in HTTP headers or in a
    meta tag, e.g. my test page
    http://www.cs.tut.fi/~jkorpela/chars/test8.htm
    and then experiment with View/Encoding, you'll notice that Page Info changes
    accordingly.

    So Page Info tells what encoding the browser is using to interpret the page.
    This might be something declared in HTTP or in meta, or something else.
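    Since Page Info reports the encoding the browser is using, not necessarily one the server declared, it helps to remember that the same bytes give different text depending on the encoding chosen to read them. A small sketch (hypothetical helper; a Big5-encoded sample is assumed):

```python
from typing import Dict, List


def interpretations(data: bytes, encodings: List[str]) -> Dict[str, str]:
    """Decode the same bytes under each encoding; undecodable bytes become U+FFFD."""
    return {enc: data.decode(enc, errors="replace") for enc in encodings}


raw = "中文".encode("big5")
for enc, text in interpretations(raw, ["big5", "iso-8859-1", "utf-8"]).items():
    # ascii() keeps the output printable on any terminal.
    print(enc, ascii(text))
```

    Only the Big5 reading gives back the original characters; ISO-8859-1 turns each byte into an unrelated Latin-1 character, and UTF-8 rejects the bytes outright.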

    (POM = Phase Of the Moon.)

    >> Could this be a web browser sensitive matter? Got me?

    >
    > I tried the page, and the actual header says:
    >
    > Content-Type: text/html
    >
    > i.e. no charset.


    The actual HTTP header could depend on the browser. This would actually be
    quite OK in a situation where a document is available in different encodings
    and the server uses the Accept-Charset header sent by the browser to select
    the best encoding. In that scenario, the response should of course announce
    the negotiated encoding.
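    A toy version of that negotiation: the server parses the browser's Accept-Charset header (a comma-separated list with optional q weights) and picks the available encoding with the highest weight, then announces it in the response, e.g. `Content-Type: text/html; charset=UTF-8`. This is a simplified sketch with hypothetical names; it ignores some HTTP/1.1 details, such as the special default treatment of ISO-8859-1 when no `*` is listed.

```python
from typing import Dict, List, Optional


def parse_accept_charset(header: str) -> Dict[str, float]:
    """Map each listed charset (lower-cased) to its q value; q defaults to 1."""
    prefs: Dict[str, float] = {}
    for item in header.split(","):
        name, _, params = item.partition(";")
        name = name.strip().lower()
        if not name:
            continue
        q = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0
        prefs[name] = q
    return prefs


def negotiate_charset(header: str, available: List[str]) -> Optional[str]:
    """Return the available encoding the client weights highest, or None."""
    prefs = parse_accept_charset(header)
    wildcard = prefs.get("*", 0.0)
    best, best_q = None, 0.0
    for enc in available:
        q = prefs.get(enc.lower(), wildcard)
        if q > best_q:  # ties keep the earlier (server-preferred) entry
            best, best_q = enc, q
    return best


print(negotiate_charset("utf-8;q=0.9, iso-8859-1;q=0.5", ["ISO-8859-1", "UTF-8"]))  # UTF-8
```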

    > This means the browser would look for a meta tag. But there doesn't
    > seem to be one of those either.
    >
    > There is no specification for what happens next, so you get a
    > browser-specific guess.


    You get a guess, more or less, but the specification says _something_:

    "In addition to this list of priorities, the user agent may use heuristics
    and user settings. For example, many user agents use a heuristic to
    distinguish the various encodings used for Japanese text. Also, user agents
    typically have a user-definable, local default character encoding which they
    apply in the absence of other indicators."
    http://www.w3.org/TR/REC-html40/charset.html#h-5.2.2
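    The spec leaves the heuristics open. As an illustration only (this is not claimed to be what Firefox or the W3C validator actually does), one very simple heuristic is to accept the bytes as UTF-8 if they decode cleanly and otherwise fall back to a user default; `windows-1252` is assumed here because that is the fallback the validator reported earlier in the thread. The function name is hypothetical.

```python
def guess_encoding(data: bytes, user_default: str = "windows-1252") -> str:
    """Toy heuristic: UTF-8 if the bytes are valid UTF-8, else the user default."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return user_default
    return "utf-8"
```

    Plain ASCII is also valid UTF-8, so it is reported as `utf-8`, which is harmless because ASCII reads the same either way. Big5 text, on the other hand, typically fails the UTF-8 check and ends up read with the default, one way to get a garbled page unless the user picks an encoding by hand.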

    In my Firefox, the setting of default encoding is under
    Työkalut/Asetukset/Sisältö/Kirjasinlajit ja värit/Lisäasetukset. I guess
    that corresponds to something like Tools/Settings/Content/Fonts and
    colors/Additional settings. But this does not make any more sense.

    Who the [censored] got the idea of putting _default encoding_ setting under
    "Fonts and colors"?!

    --
    Yucca, http://www.cs.tut.fi/~jkorpela/
     
    Jukka K. Korpela, Apr 1, 2009
    #15
