a language encoding issue

JD

Hi,

I am a Yahoo group user. The group has a table (<table> .. </table>) where I can
enter text which will show up on the group's home page. I want to enter
text in a non-English language. So, I use the following pair to enclose my
text.

<SPAN LANG="UTF-8">
....
</SPAN>

But the text is not shown properly. I would have to set "charset" for
browsers to decode it properly (as shown below).

<meta http-equiv="Content-Type" content="text/html; charset=...">

The thing is that charset should be set in the page's head section. I have no
control over that while entering text into the Yahoo group table. Is it
possible to switch to a different language encoding inside a span (<SPAN> ..
</SPAN>)?

Any help would be much appreciated.

JD
 
Beauregard T. Shagnasty

JD said:
I am a Yahoo group user. The group has a table (<table> .. </table>) where I
can enter text which will show up on the group's home page. I want
to enter text in a non-English language.

Are you typing into the cell of a table?

So, I use the following pair to enclose my text.

<SPAN LANG="UTF-8">

utf-8 is not a LANGuage. English, or French are languages.
...
</SPAN>

But the text is not shown properly. I would have to set "charset" for
browsers to decode it properly (as shown below).

<meta http-equiv="Content-Type" content="text/html; charset=...">

The thing is that charset should be set in the page's head section. I
have no control over that while entering text into the Yahoo group
table.

That never works, anyway. What charset do the page's response headers
show? It may already be utf-8.

Is it possible to switch to a different language encoding inside
a span (<SPAN> .. </SPAN>)?

<span> will be canceled by the next block element. Use <div> .. </div>
instead.
 
Beauregard T. Shagnasty

JD said:
You were right. I am typing into a table cell. Also, it's wrong to
use UTF-8 in the language tag. I already corrected it but the text is
still not properly shown. I checked the source of the page. All the
charset occurrences are utf-8 already. But none of them show up
between <head> .. </head>.

Again, placing meta-charset lines doesn't do anything *unless* your
server is sending as charset: none. In Firefox, while viewing the page,
do Tools > Page Info and see what it says for encoding. You can also
install the Web Developer Toolbar, and see Response Headers.
http://chrispederick.com/work/web-developer/

I also changed from the SPAN tag to the DIV tag, but it doesn't help.

Maybe if you would give a link to the page, or one the masses can
access, and tell exactly what language you _want_ to use (you know, like
Greek or Chinese or Russian), maybe someone will have some more advice.

Please don't top-post.
 
Beauregard T. Shagnasty

JD said:
I would appreciate it very much if you or someone could look into the
following link:

http://groups.yahoo.com/group/EnyoungCCCTO/

That page's server (a Linux server running YTS/1.17.9 software) is
already sending: Encoding ISO-8859-1

You will not be able to change it. Follow Ben C's advice about using
numeric character entities; that would be your only recourse.
 
JD

Ben C said:
No, you can't change the encoding half-way through the page.

If the original encoding is, say, Latin-1 and you want to insert some
Chinese (which is not representable in Latin-1), you can use numeric
entities.

e.g. like this: &#20160;&#20040;&#26159; (which displays as 什么是)

You could write your original Chinese (or whatever it is) in a text
editor, then use a program called "recode", which you can download, to
turn it into those entities. Then paste that into the web page.

http://www.gnu.org/software/recode/
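
If installing recode is not convenient, the same kind of conversion can be sketched in a few lines of Python. This is an alternative to recode, not a description of how recode works; the function name `to_numeric_entities` is made up for the illustration. It relies on Python's built-in `xmlcharrefreplace` error handler, which replaces every character that does not fit in ASCII with a decimal numeric character reference.

```python
def to_numeric_entities(text: str) -> str:
    """Replace every non-ASCII character with a decimal numeric entity.

    ASCII characters (including any markup you typed) are left untouched.
    """
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


# The three characters from the example above.
print(to_numeric_entities("什么是"))  # &#20160;&#20040;&#26159;
```

The output is plain ASCII, so it should survive being pasted into a page that is served as ISO-8859-1.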

Thank you both, Beauregard and Ben. You both answered my questions. I was
wondering why the Yahoo server sometimes changed my text into those funny
numeric entities. I always changed them back for easy maintenance. Now I
will follow your instructions to recode first and then copy/paste. Thanks so
much for the help.

JD
 
John Hosking

Beauregard said:
Again, placing meta-charset lines doesn't do anything *unless* your
server is sending as charset: none.

This is a very interesting statement to me, Beauregard, as I just
responded to somebody in another group on this same subject. I acted as
if I knew what I was talking about, but your statement makes me suddenly
unsure.

The poster (<[email protected]>) was using
meta http-equiv in her <head> but the W3C validator didn't find any
encoding and therefore failed to check the page.

In my response (<[email protected]>), I told her that the
http-equiv was moot, since her server was sending "charset=none". Now
you make me think that was an incorrect analysis of her problem.

Would you care to pop over to c.i.w.a.html and clear things up in that
thread? Or perhaps post again here with further explication or a pointer
to something about what "none" means (Googling didn't help me here)?
 
Beauregard T. Shagnasty

John said:
This is a very interesting statement to me, Beauregard, as I just
responded to somebody in another group on this same subject. I acted
as if I knew what I was talking about, but your statement makes me
suddenly unsure.

Actually, I based my above reply on your post in the other group, 'cause
I surely thought you knew what you were talkin' about. ;-)

It sounds logical. If the server already sends one (such as the OP's
sample page) like ISO-8859-1, then no manner of <meta> HTML code will
change it. Unless - possibly - there isn't one from the server.
 
dorayme

Beauregard T. Shagnasty said:
That page's server (a Linux server running YTS/1.17.9 software) is
already sending: Encoding ISO-8859-1

Earlier you said "In Firefox, while viewing the page, do Tools > Page
Info and see what it says for encoding". On my Mac FF, Tools > Page Info
says UTF-8. Using Web Developer Tools/Validate HTML, No Character
Encoding Found! Falling back to windows-1252.
 
Beauregard T. Shagnasty

dorayme said:
Earlier you said "In Firefox, while viewing the page, do Tools > Page
Info and see what it says for encoding". On my Mac FF, Tools > Page
Info says UTF-8. Using Web Developer Tools/Validate HTML, No
Character Encoding Found! Falling back to windows-1252.

Hmm, looking again (with Firefox 3.0.8 on Ubuntu), the Page Info says:

Address: http://groups.yahoo.com/group/EnyoungCCCTO/
Type: text/html
Render Mode: Standards compliance mode
Encoding: ISO-8859-1
Size: 32.07 KB (32,842 bytes)
Modified: Tue 31 Mar 2009 03:09:19 PM EDT

Using WebDevTool Response Header:

Date: Tue, 31 Mar 2009 19:12:55 GMT
P3P: policyref="http://info.yahoo.com/w3c/p3p.xml", CP="CAO DSP COR CUR
ADM DEV TAI PSA PSD IVAi IVDi CONi TELo OTPi OUR DELi SAMi OTRi UNRi
PUBi IND PHY ONL UNI PUR FIN COM NAV INT DEM CNT STA POL HEA PRE LOC
GOV"
Set-Cookie: GP=v=2&a=l&t=1238526775; path=/; expires=Tuesday,
07-Apr-2009 23:59:59 GMT; domain=groups.yahoo.com
G=v=7&data=qHKuZCakakv2HJi4rBE4p7uFLeg6C6lPiW5OQjRA4d1rrHas-nworB9tXQyYTnZo8JPclS43Xz1g8JVcMUlyqsYe2PZXHEawGK9lcMlkZBQk4VIEOLiMSEAUVQKd2A3ib_kE0pPV1FaPXiTJ8qbq9mapSxvKJZruXtn-3jw3X-uvEcFdkGAwvr3eQ68EapHvP6uapXBCdDY4aby-CU6nsXwA5AlT9dttCcpqdezzs4NeiaWsro3g9XCQBQmXn697TXiWLcDtZ81phyyxb97f70X1_k53GYXM91lG0an2sopz4WCtp5tXW2UwpitaR78YaMgLZkIf8EyxWHmlj0djQ7tuNboPatg5OlrMlABPYP72CxsfR3RvDTuspb3ZJyiLRD_QWAfd5NDtblbTtIvh959s21pyVdfvBZNffg03YEVKHbYQa_lfTV7DAI92Nvvp4RlgP1F7niYdL9P3C5Ylzi1eprDwDLjTzw&n=12;
path=/group/EnyoungCCCTO/; domain=groups.yahoo.com
Pragma: no-cache
Expires: Fri, 01 Jan 1999 00:00:00 GMT
Cache-Control: no-cache, must-revalidate, no-cache="Set-Cookie", private
Vary: Accept-Encoding
Content-Type: text/html
Content-Encoding: gzip
Age: 0
Transfer-Encoding: chunked
Connection: keep-alive
Server: YTS/1.17.9

200 OK
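
Note that the Content-Type line in the dump above is just `text/html`, with no charset parameter. A minimal sketch in Python for pulling the charset out of a Content-Type value is below; the helper name `charset_from_content_type` is hypothetical, and the parsing is delegated to the standard library's `email.message.Message`, which understands this header syntax.

```python
from email.message import Message
from typing import Optional


def charset_from_content_type(header_value: str) -> Optional[str]:
    """Return the charset parameter of a Content-Type value, or None if absent."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() lowercases the value and returns None
    # when the header carries no charset parameter.
    return msg.get_content_charset()


print(charset_from_content_type("text/html"))
print(charset_from_content_type("text/html; charset=ISO-8859-1"))
```

A result of None means the server declared no encoding in HTTP, which is a different situation from the server declaring ISO-8859-1.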

Don't know what else to tell ya.
 
dorayme

Beauregard T. Shagnasty said:
Hmm, looking again (with Firefox 3.0.8 on Ubuntu), the Page Info says:

Address: http://groups.yahoo.com/group/EnyoungCCCTO/
Type: text/html
Render Mode: Standards compliance mode
Encoding: ISO-8859-1
Size: 32.07 KB (32,842 bytes)
Modified: Tue 31 Mar 2009 03:09:19 PM EDT

I get similar *except* for encoding, where I get UTF-8 and:

Modified: Wed, 1 Apr 2009 5:44:25 AM (obviously nothing - just due to
you not living in the little beautest bit of the world <g>)

Could this be a web browser sensitive matter? Got me?

....
Don't know what else to tell ya.

Could you say exactly how you are getting the WebDevTool Response Header?
I'm not sure I am getting this one. There is stuff in the page that comes up
with Validate HTML, but are you talking about a different command?
 
Beauregard T. Shagnasty

dorayme said:
Could you say exactly how you are getting the WebDevTool Response Header?
I'm not sure I am getting this one. There is stuff in the page that comes
up with Validate HTML, but are you talking about a different command?

On the Web Developers Toolbar > Information > View Response Headers
 
dorayme

Beauregard T. Shagnasty said:
On the Web Developers Toolbar > Information > View Response Headers

Yes, thanks... on this one I get similar to yours, esp.

Content-Encoding: gzip
Age: 2
Transfer-Encoding: chunked
Connection: keep-alive
Server: YTS/1.17.9
 
John Hosking

Beauregard said:
Actually, I based my above reply on your post in the other group, 'cause
I surely thought you knew what you were talkin' about. ;-)

You *fool*. ;-)

It sounds logical. If the server already sends one (such as the OP's
sample page) like ISO-8859-1, then no manner of <meta> HTML code will
change it. Unless - possibly - there isn't one from the server.

Yes. My uncertainty seems to come down to the fine difference between
{0} and the empty set, or IOW, between "charset=none" and no charset
statement from the server at all (where ISTM you said "charset: none").
I guess we're in agreement, then. I hope.
 
JWS

JD said:
But the text is not shown properly.

The Chinese text is in Big-5 encoding, not Unicode. It is shown OK
in Firefox when I select "view, character encoding, auto-detect,
Chinese".

Perhaps you could try to convert the text to Unicode first. Maybe
the page will then be readable without extra user intervention.
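
A rough sketch of that conversion in Python, assuming the text really is Big5: decode the bytes as Big5 to get Unicode, then emit numeric entities so the result is safe on a page served as ISO-8859-1. The function name `big5_to_entities` and the sample string are made up for the illustration.

```python
def big5_to_entities(raw: bytes) -> str:
    """Decode Big5 bytes to Unicode, then write non-ASCII characters as entities."""
    text = raw.decode("big5")
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


# Hypothetical sample: two Chinese characters stored as Big5 bytes.
sample = "中文".encode("big5")

# What a browser shows if it reads those bytes as ISO-8859-1 (garbage).
print(sample.decode("iso-8859-1"))

# The same text as numeric entities.
print(big5_to_entities(sample))  # &#20013;&#25991;
```

The first print shows why the page looks wrong without user intervention: the bytes are valid, but they are being read with the wrong encoding.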
 
Jukka K. Korpela

Ben said:
I think the Page Info [in Firefox] may be telling you what encoding the
browser decided to guess.

I'm pretty sure that's what it does.

This may depend on your locale or something or just
what mood it's in.

It may be POM-dependent, but more commonly it depends on the browsing
history and on user's actions (if any) with the command View/Encoding. If
you visit a page that does not declare its encoding in HTTP headers or in a
meta tag, e.g. my test page
http://www.cs.tut.fi/~jkorpela/chars/test8.htm
and then experiment with View/Encoding, you'll notice that Page Info changes
accordingly.

So Page Info tells what encoding the browser is using to interpret the page.
This might be something declared in HTTP or in meta, or something else.

(POM = Phase Of the Moon.)

I tried the page, and the actual header says:

Content-Type: text/html

i.e. no charset.

The actual HTTP header could depend on the browser. This would actually be
quite OK in a situation where a document is available in different encodings
and the server uses the Accept-Charset header sent by the browser to select
the best encoding. In that scenario, the response should of course announce
the negotiated encoding.

This means the browser would look for a meta tag. But there doesn't
seem to be one of those either.

There is no specification for what happens next, so you get a
browser-specific guess.

You get a guess, more or less, but the specification says _something_:

"In addition to this list of priorities, the user agent may use heuristics
and user settings. For example, many user agents use a heuristic to
distinguish the various encodings used for Japanese text. Also, user agents
typically have a user-definable, local default character encoding which they
apply in the absence of other indicators."
http://www.w3.org/TR/REC-html40/charset.html#h-5.2.2
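
The first two priorities in that section of the spec (HTTP charset parameter first, then a meta declaration) plus the local default can be sketched as below. This is a simplification under stated assumptions, not what any browser actually runs: the function name `pick_encoding` is hypothetical, the meta detection is a crude regular expression rather than a real HTML parser, the third priority (a charset attribute on a linking element) and all heuristics are left out, and the `windows-1252` default is only an example of a user-definable setting.

```python
import re
from typing import Optional

# Crude pattern for charset=... inside a <meta> tag (assumption: good enough
# for a sketch; real browsers use a proper prescan algorithm).
_META_CHARSET = re.compile(
    rb"<meta[^>]+charset\s*=\s*[\"']?\s*([A-Za-z0-9_.:-]+)", re.IGNORECASE
)


def pick_encoding(http_charset: Optional[str], html: bytes,
                  user_default: str = "windows-1252") -> str:
    """Choose an encoding: HTTP header first, then <meta>, then the default."""
    if http_charset:
        # A charset sent by the server wins; a meta tag cannot override it.
        return http_charset.lower()
    match = _META_CHARSET.search(html[:4096])
    if match:
        return match.group(1).decode("ascii").lower()
    # Nothing declared anywhere: fall back to the local default (a guess).
    return user_default


page = b'<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
print(pick_encoding("ISO-8859-1", page))
print(pick_encoding(None, page))
print(pick_encoding(None, b"<html></html>"))
```

The first call illustrates the point made earlier in the thread: once the server sends a charset, the meta tag is ignored.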

In my Firefox, the setting of default encoding is under
Työkalut/Asetukset/Sisältö/Kirjasinlajit ja värit/Lisäasetukset. I guess
that corresponds to something like Tools/Settings/Content/Fonts and
colors/Additional settings. But this does not make any more sense.

Who the [censored] got the idea of putting _default encoding_ setting under
"Fonts and colors"?!
 
