Unicode (UTF-8)

  • Thread starter Luigi Donatello Asero

Toby Inkster

Mark said:
Change HTML editor to your program of choice. You can only select what's
in the drop-down list though, you can't add anything. I'm guessing it
gets that list from the programs associated with HTML files in Windows,
but I haven't looked into it that closely.

Nope -- it gets them from a magic list that it makes up. You can probably
tweak it somewhere in the registry.

Might be easier to rename "c:\windows\notepad.exe" to
"c:\windows\crappy.exe" and put an editor of your choice where Notepad was.

I quite like SciTE (google for it).
 
Jonathan N. Little

dorayme said:

Well, even though MS seems to be accommodating by having, in the Control
Panel under Internet Options, Programs tab, a drop-down list box for the HTML
Editor property, there will only be 'Notepad' as an option! Changing
this requires a registry edit.

Under

HKLM\SOFTWARE\Microsoft\Internet Explorer

create a key 'View Source Editor'

then under 'View Source Editor' key create another key 'Editor Name'

then edit the (Default) value with path to EXE of desired program.
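
The registry steps above could be captured in a .reg file instead of being typed into regedit by hand. The following is only a sketch: the key path is the one given in this post, while the editor path and the function name are hypothetical, and whether a given IE version honours the key is not something this code can confirm.

```python
# Sketch: build the text of a .reg file for the key described in the post.
# EDITOR_PATH is a hypothetical example path, not something from the thread.
KEY = (
    r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Internet Explorer"
    r"\View Source Editor\Editor Name"
)
EDITOR_PATH = r"C:\Program Files\SciTE\SciTE.exe"


def build_reg_file(editor_path: str) -> str:
    """Return .reg file text that sets the key's (Default) value."""
    # Backslashes and double quotes must be escaped inside .reg string values.
    escaped = editor_path.replace("\\", "\\\\").replace('"', '\\"')
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{KEY}]",
        f'@="{escaped}"',  # "@" stands for the (Default) value
        "",
    ]
    return "\r\n".join(lines)


if __name__ == "__main__":
    print(build_reg_file(EDITOR_PATH))
```

Saving the output as a .reg file and double-clicking it would merge the key. The header shown is the one used by Windows 2000 and later; Windows 98 expects REGEDIT4 as the first line instead.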
 
Jonathan N. Little

Toby said:
Nope -- it gets them from a magic list that it makes up. You can probably
tweak it somewhere in the registry.

Might be easier to rename "c:\windows\notepad.exe" to
"c:\windows\crappy.exe" and put an editor of your choice where Notepad was.

I quite like SciTE (google for it).

You can't do that easily in Win2k+; you have to hack the SFC ('System File
Checker') system that newer Windows versions use to back up and protect system
files. I had to do this to replace Notepad with Metapad...

As to the View Source Editor in IE, it requires a registry hack. I
posted earlier in the thread, but shall repeat.

Under key:

HKLM\SOFTWARE\Microsoft\Internet Explorer

create a key 'View Source Editor'

then under 'View Source Editor' key create another key 'Editor Name'

then edit the (Default) value with path to EXE of desired program.
 
Luigi Donatello Asero

You do not need to choose an editor.
You can open Wordpad and save a file in Unicode.
You can save it as txt or html or php and so on.
You will notice that you have saved the changes if you browse the
corresponding page.
Nope -- it gets them from a magic list that it makes up. You can probably
tweak it somewhere in the registry.

Might be easier to rename "c:\windows\notepad.exe" to
"c:\windows\crappy.exe" and put an editor of your choice where Notepad
was.

Are such changes allowed??
 
Luigi Donatello Asero

Luigi Donatello Asero said:
You do not need to choose an editor.
You can open Wordpad and save a file in Unicode.


Sorry, not in Unicode if you want to include the file via PHP; save it
as a text document instead (at least when using https).
You choose the extension as above (txt, html, php and the like).

Luigi Donatello Asero
https://www.scaiecat-spa-gigi.com/
我会说意大利语
 
Frank Olieu

_Jonathan N. Little_ skrev | wrote | écrivit (26-05-2006 15:24):
Well, even though MS seems to be accommodating by having, in the Control
Panel under Internet Options, Programs tab, a drop-down list box for the HTML
Editor property, there will only be 'Notepad' as an option! Changing
this requires a registry edit.

....or just replace 'Notepad.exe' with something else that you rename to...
'Notepad.exe'!
I use Notepad2 (based on Scintilla, like Scite) and it works just fine.
 
Luigi Donatello Asero

Frank Olieu said:
_Jonathan N. Little_ skrev | wrote | écrivit (26-05-2006 15:24):


...or just replace 'Notepad.exe' with something else that you rename to...
'Notepad.exe'!
I use Notepad2 (based on Scintilla, like Scite) and it works just fine.

Is that not considered a change in the registry?
Are such changes allowed??
 
Jonathan N. Little

Frank said:
_Jonathan N. Little_ skrev | wrote | écrivit (26-05-2006 15:24):


...or just replace 'Notepad.exe' with something else that you rename to...
'Notepad.exe'!
I use Notepad2 (based on Scintilla, like Scite) and it works just fine.


But you cannot do that in Win2K and XP: the backed-up original versions
will replace the counterfeit. It requires a hack to make it work....
 
Jonathan N. Little

Luigi said:
"Frank Olieu" <[email protected]> skrev i meddelandet


Is that not considered a change in the registry?

No, that is called changing a file name. But replacing notepad.exe by
this method will only work in Win98 and earlier. Later versions require
hacking.

Are such changes allowed??

No, Big Bill and his nerd squad will be notified the moment you use
Internet Explorer (MS has partnered with the NSA) and they will be at
your door within the hour....
 
Joe

Thanks, I will look next time I fire up my Winbox. I do recall
somewhere an obvious option simply not being in the dropdown for
this sort of thing... must look on eBay for a cheap but much
better box that can run XP comfortably, I think that OS has more
intelligence built in....
You're an Aussie - try the Aussie auction site at oztion.com.au
I seem to recall that the easy way to get IE to use a "not notepad"
editor for looking at HTML source is to "associate" html files with
another program.
(Thinks: In File Explorer (not IE), right-click an HTM(L) file, go to Open
with..., then browse to the file (e.g. Editpad.exe) that you want to use.)

Of course, it's been a while and I may be talking total crap.

.... which of course, I am. *EDIT with* is what you need. And I've
forgotten how to do that.
 
Luigi Donatello Asero

Joe said:
You're an Aussie - try the Aussie auction site at oztion.com.au
I seem to recall that the easy way to get IE to use a "not notepad"
editor for looking at HTML source is to "associate" html files with
another program.
(Thinks: In File Explorer (not IE), right-click an HTM(L) file, go to Open
with..., then browse to the file (e.g. Editpad.exe) that you want to use.)

Of course, it's been a while and I may be talking total crap.

... which of course, I am. *EDIT with* is what you need. And I've
forgotten how to do that.

Well, so far I have been able to display Russian and Chinese in
UTF-16.
The process I go through is a bit complicated for Chinese in Windows 98, as
I cannot type directly in the editor. I have to write the text in UTF-8
somewhere else, for example in my newsreader, and then copy and paste it into
the editor and save it.
Opera seems to select the right encoding automatically.
 
ironcorona

Luigi said:
Well, so far I have been able to display Russian and Chinese in
UTF-16.

How come you're using UTF-16? Russian and Chinese can both be encoded
in UTF-8.

Though I would like to ask; how many Chinese symbols are there? Can you
encode them *all* in UTF-8? Also [and wildly off topic] how do you make
up new [written] words in Chinese?
 
Luigi Donatello Asero

ironcorona said:
How come you're using UTF-16? Russian and Chinese can both be encoded
in UTF-8.

Does that work on Wordpad and Notepad? Or perhaps is the server not properly
configured for the use of Unicode?
Though I would like to ask; how many Chinese symbols are there?
Where?

Can you
encode them *all* in UTF-8? Also [and wildly off topic] how do you make
up new [written] words in Chinese?

What do you mean by "new [written] words in Chinese"?
 
ironcorona

Luigi said:
Does that work on Wordpad and Notepad? Or perhaps is the server not properly
configured for the use of Unicode?

No, no, I was asking a question. Since UTF-8 is the de facto standard
on the net, I was just wondering why you were choosing UTF-16.

In the world. UTF can only support so many characters. I know that
there are thousands of Chinese characters out there and was just
wondering if they were all in UTF-8?
Can you
encode them *all* in UTF-8? Also [and wildly off topic] how do you make
up new [written] words in Chinese?

What do you mean by "new [written] words in Chinese"?

Well, if I wanted to make up a new word in English, let's say blog for the
moment. I hear how it sounds and then use the English alphabet
characters that represent that sound [taking into account, obviously,
historical precedent on how certain words are spelled etc.]. Since each
Chinese word needs a new character I was wondering what the system was
for creating new written words.
 
Frank Olieu

_Jonathan N. Little_ skrev | wrote | écrivit (27-05-2006 02:04):
But you cannot do that in Win2K and XP: the backed-up original versions
will replace the counterfeit. It requires a hack to make it work....

In XP you need to replace /all/ instances of notepad.exe (backed up original
versions), and at some point, answer 'yes' if XP asks you whether you really
want to keep the 'counterfeit' (AFAIR).
But that's not really 'hacking', is it?

Luigi:
Are such changes allowed??

You mean by law? :)
 
Toby Inkster

ironcorona said:
How come you're using UTF-16? Russian and Chinese can both be encoded
in UTF-8.

A typical Chinese character will take up 16 bits in a UTF-16 file, but 24
bits in a UTF-8 file. Thus a UTF-8 file may be up to 50% bigger than
UTF-16. Most Western characters only use 8 bits in UTF-8, but 16 in
UTF-16, so for Western languages, UTF-8 can be up to 50% smaller than
UTF-16.

So if a page uses primarily non-Western characters, UTF-16 is often a
better choice.
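
The per-character byte counts above can be checked with a few lines of Python. This is only a sketch; the two sample strings are arbitrary choices for illustration.

```python
# Byte counts per character in the two encodings.
# The sample strings are illustrative assumptions.
chinese = "我会说意大利语"   # 7 characters, all in the Basic Multilingual Plane
western = "I speak Italian"  # 15 ASCII characters

for label, text in (("Chinese", chinese), ("Western", western)):
    utf8_size = len(text.encode("utf-8"))
    # "utf-16-le" writes no byte order mark, so only the characters are counted
    utf16_size = len(text.encode("utf-16-le"))
    print(f"{label}: {len(text)} chars, UTF-8 {utf8_size} bytes, UTF-16 {utf16_size} bytes")
```

The Chinese sample comes out at 21 bytes in UTF-8 against 14 in UTF-16, and the Western sample at 15 against 30, which matches the "up to 50%" figures for pure text without markup.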
 
Alan J. Flavell

A typical Chinese character will take up 16 bits in a UTF-16 file,
but 24 bits in a UTF-8 file. Thus a UTF-8 file may be up to 50%
bigger than UTF-16. Most Western characters only use 8 bits in
UTF-8, but 16 in UTF-16, so for Western languages, UTF-8 can be up
to 50% smaller than UTF-16.

Yes, but don't forget the markup!

I just tried saving the BBC News Chinese front page in the two
encodings:

81672 May 27 13:34 bbc-chinese-utf16.html
43759 May 27 13:33 bbc-chinese-utf8.html

In case that might be an unfair choice, I tried a Bank of China site:

141476 May 27 13:43 boc-tw-utf16.html
73144 May 27 13:44 boc-tw-utf8.html

I'm no expert in CJK issues, so anything that I say about those
details would need to be confirmed with more-authoritative sources.
If you're better informed about this then feel free to say so, and
I'll concede. But I would make a few points.

There were already well-established local encodings for different
varieties of Chinese, producing the preferred glyphs for respective
users. AIUI, the Han Unification involved in the Unicode
representation of CJK has not been to everyone's taste.[0]

The established codings are still widely used, e.g. the BoC site was in
Big5 before I used Mozilla Composer's "save and change encoding"
option to produce the above Unicode-encoded variants.

But the more HTML-technical aspect would be, how well supported is
utf-16, not only for rendering pages but also for forms submission
etc.? How well do the web search services index documents served in
utf-16 ? There's little doubt in my mind that utf-8 has been supported
for quite some years now in a wide range of browsers, and search
engine support has also been good recently; but widespread support for
the encoding schemes[1] of utf-16 has been more recent.

In due course I'd expect any remaining difficulties to be overcome,
but I'm uneasy about a blanket recommendation to use utf-16. Even if
you choose it as your compact storage encoding, there might be
something to be said for transcoding to utf-8 when you serve it out to
the web.

Even *if* you're worried about the file size, you might want to use
gzip compression, which is very widely supported for HTML nowadays.

10525 May 27 13:34 bbc-chinese-utf16.html.gz
9510 May 27 13:33 bbc-chinese-utf8.html.gz

11899 May 27 13:43 boc-tw-utf16.html.gz
10115 May 27 13:44 boc-tw-utf8.html.gz

As you can see, after gzip the files are of very similar sizes, which
isn't so surprising knowing that they contain the same information.
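
The same kind of comparison can be sketched in Python on a made-up page. The markup and text below are assumptions for illustration, not the BBC or Bank of China pages measured above, so the numbers will differ from the listings.

```python
import gzip

# A made-up, markup-heavy page: the tag names and text are assumptions for
# illustration only.
row = '<li class="headline"><a href="/news/item.html">中文新闻标题</a></li>\n'
page = "<html><body><ul>\n" + row * 200 + "</ul></body></html>\n"

sizes = {}
for encoding in ("utf-8", "utf-16-le"):
    raw = page.encode(encoding)
    # Store (raw size, gzipped size) for each encoding
    sizes[encoding] = (len(raw), len(gzip.compress(raw)))

for encoding, (raw_size, gz_size) in sizes.items():
    print(f"{encoding}: {raw_size} bytes raw, {gz_size} bytes gzipped")
```

Because each row holds far more ASCII markup than Chinese text, the UTF-8 version is the smaller one before compression. A page built from one repeated row compresses much better than a real page would, so only the direction of the raw sizes should be read from this sketch, not the compression ratio.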

cheers

[0] http://en.wikipedia.org/wiki/Han_unification ,
http://tclab.kaist.ac.kr/~otfried/Mule/unihan.html etc.

[1] there are four flavours of utf-16: there's utf-16LE, utf-16BE,
and thirdly utf-16 with BOM, in its little- and big-endian flavours.
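
A small Python sketch of the flavours in footnote [1], using the codec names from Python's standard library (the choice of the letter "A" is arbitrary):

```python
import codecs

text = "A"  # U+0041, a single ASCII letter

little = text.encode("utf-16-le")   # b"A\x00", no byte order mark
big = text.encode("utf-16-be")      # b"\x00A", no byte order mark
with_bom = text.encode("utf-16")    # BOM first, then the platform's byte order

print(little, big, with_bom)

# A decoder told only "utf-16" uses the BOM to pick the byte order.
for bom, payload in ((codecs.BOM_UTF16_LE, little), (codecs.BOM_UTF16_BE, big)):
    assert (bom + payload).decode("utf-16") == text
```

The bytes for the letter are the same in every flavour; only their order and the presence of the two-byte BOM differ, which is why a consumer has to be told, or has to detect, which flavour it is reading.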
 
Jukka K. Korpela

ironcorona said:
How come you're using UTF-16?

I think Luigi Asero has refused to understand the principles of character
encoding. That might explain part of the phenomenon.
Russian and Chinese can both be encoded
in UTF-8.

Undoubtedly. Pretty much anything that can be expressed as written text in
computer-readable form can be encoded in UTF-8. More exactly, all Unicode
text can be encoded in UTF-8.
Though I would like to ask; how many Chinese symbols are there?

A few myriads. The exact number depends on your ontology of symbols. (Does a
symbol exist if it is known from one single written document only? What
about two?)
Can you encode them *all* in UTF-8?

No, because not all Chinese symbols have (yet) been included in Unicode.
Theoretically, you could encode them yourself, using Private Use code
points, which naturally have UTF-8 encoding, too, but that's hardly a
feasible solution in HTML authoring.
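
To illustrate that any Unicode code point, including a Private Use one, has a UTF-8 encoding, here is a minimal Python sketch. The four sample code points are an arbitrary selection for illustration.

```python
# Sample code points; which ones to show is an arbitrary choice.
samples = {
    "Cyrillic letter (U+0416)": "\u0416",
    "CJK ideograph (U+4EBA)": "\u4eba",
    "CJK Extension B ideograph (U+20000)": "\U00020000",
    "Private Use code point (U+E000)": "\ue000",
}

for label, ch in samples.items():
    utf8 = ch.encode("utf-8")
    utf16 = ch.encode("utf-16-be")  # big-endian, no byte order mark
    assert utf8.decode("utf-8") == ch  # the round trip loses nothing
    print(f"{label}: UTF-8 {utf8.hex()} ({len(utf8)} bytes), UTF-16 {len(utf16)} bytes")
```

The Extension B ideograph lies outside the Basic Multilingual Plane, so it takes four bytes in both encodings. The Private Use code point encodes like any other, but its meaning exists only by private agreement between writer and reader.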
Also [and wildly off topic] how do
you make up new [written] words in Chinese?

I think it's really wildly off-topic, and a good book on Chinese writing
systems might help. The answer also depends on your definition of "word".
 
Luigi Donatello Asero

ironcorona said:
No, no, I was asking a question. Since UTF-8 is the de facto standard
on the net, I was just wondering why you were choosing UTF-16.

I tried to use UTF-8 but I could not...
Does php or the server need to be configured for UTF-8?
In the world. UTF can only support so many characters. I know that
there are thousands of Chinese characters out there and was just
wondering if they were all in UTF-8?
I am not sure, but I think that I had installed this
http://www.microsoft.com/windows/ie/ie6/downloads/recommended/ime/install.mspx
Can you
encode them *all* in UTF-8? Also [and wildly off topic] how do you make
up new [written] words in Chinese?

What do you mean by "new [written] words in Chinese"?

Well, if I wanted to make up a new word in English, let's say blog for the
moment. I hear how it sounds and then use the English alphabet
characters that represent that sound [taking into account, obviously,
historical precedent on how certain words are spelled etc.]. Since each
Chinese word needs a new character I was wondering what the system was
for creating new written words.

I guess that Chinese characters are based on the combination of several
signs.
There are basic signs (214, called "Radikale" (radicals) in German in the book
"Die chinesische Schrift", published by Assimil) on which structurally more
complicated ones are based.
So, graphically, you might be better off learning to write the basic signs
in order to write more complicated ones.
 
Luigi Donatello Asero

Luigi Donatello Asero said:
I guess that Chinese characters are based on the combination of several
signs.
There are basic signs (214, called "Radikale" (radicals) in German in the book
"Die chinesische Schrift", published by Assimil) on which structurally more
complicated ones are based.
So, graphically, you might be better off learning to write the basic signs
in order to write more complicated ones.
For example, 人
is pronounced "ren"
and means "person".
If you write
意大利
you have "Italy".
If you add
人
you get
意大利人
and you have "Italian".
So, assuming that you did not have a word for the nationality, you
could have formed it by adding
人
 
