jtl.zheng said:
the Reader pick up one char from a local file
here how many bits "one char" has is decided by the local system
just as in Windows "one char" has 8 bits
and in other system that may has 16bits or 32bits in one char
and all these bits will turn to 32bits Unicode in java
this is what the Reader do
is it correct?
Not quite, but you are close.
For a start, Readers don't read directly from files; they read from byte
streams (InputStream and its subclasses). That's not very important but
keeping it in mind will help you understand the IO architecture better.
The number of bytes consumed to read one character depends on what character
encoding (or character set) the Reader has been configured to use. There are
many character sets, and you can configure a Reader to use whichever one you
want (for the character sets which come with Java anyway). A character set
always has to be chosen, because Java has no way to tell (just from
the binary data in the file) which character set was used to write it. There
is also a default character set, which Java will use if you don't specify one
explicitly -- Java makes an assumption of what is likely to be correct based on
your operating system and locale[*]. It's OK to use the default if you are
just going to be reading or writing files on one computer (or in one office),
but if you are going to share data around the world then you will have to
consider carefully which character set(s) to use.
([*] It's actually set by the system property "file.encoding", which is
"Cp1252" on my machine but will be something different on yours.)
The important thing to realise is that it isn't the local file system or
operating system which "decides" what bytes are used to represent a character,
but the Reader itself (or rather the character encoding it has been configured
to use).
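As a rough sketch of what "configuring" a Reader looks like: the class name
ReaderCharsetSketch and the helper decode() below are made up for
illustration, and a ByteArrayInputStream stands in for a real file. The
character set is passed to InputStreamReader, which wraps the byte stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical example class, not part of any library.
class ReaderCharsetSketch {

    // Reads every character from the byte stream, decoding with the given charset.
    static String decode(InputStream in, Charset charset) {
        StringBuilder out = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            int c;
            while ((c = reader.read()) != -1) {
                out.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Stand-in for file contents: a pound sign followed by '5', encoded as UTF-8.
        byte[] data = {(byte) 0xC2, (byte) 0xA3, 0x35};

        // Explicit charset: three bytes become two characters.
        String text = decode(new ByteArrayInputStream(data), StandardCharsets.UTF_8);
        System.out.println(text.length() + " characters from " + data.length + " bytes");

        // The charset Java falls back on when none is specified.
        System.out.println("Default charset: " + Charset.defaultCharset());
    }
}
```

If you leave the charset argument off, InputStreamReader uses the default
one, which is where the guess based on your system comes in.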
On this Windows machine there are lots of text files. Some use 8 bits per
character to represent text; those files can only hold a small subset of all
the Unicode characters, but that subset is large enough to use for British
English text. Some others hold text encoded as 16 bits per character (created
by Windows utilities). Still others hold text encoded in a variable-length
encoding like UTF-8 (which uses 1 to 4 bytes per character) or UTF-16 (which
uses 2 or 4 bytes per character). If I want a Java program to read all of
those files correctly, then I have to /tell/ it what character encoding each
file uses. Many of them use an encoding ("Windows-1252") which is often used
by British English people using Windows, and that is what Java will assume a
file contains unless I tell it differently. So Java would read /some/ of the
files correctly, but not all. On your machine, Java would make a different
assumption, and so would read a different set of files correctly.
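To make that concrete, here is a small sketch (the class and method names are
invented for the example). It shows how many bytes the same character takes
in different encodings, and what happens when bytes written in one encoding
are read back with another. I've used ISO-8859-1 as the 8-bit encoding because
every Java installation is required to support it; for the pound sign, and for
the bytes decoded here, it agrees with Windows-1252.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical example class, not part of any library.
class SameBytesDifferentCharsets {

    // Number of bytes needed to store the text in the given encoding.
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    // Interprets raw bytes as text in the given encoding.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String pound = "\u00A3";        // POUND SIGN
        String emoji = "\uD83D\uDE00";  // U+1F600, outside the 16-bit range

        System.out.println("pound, ISO-8859-1: " + encodedLength(pound, StandardCharsets.ISO_8859_1));
        System.out.println("pound, UTF-8:      " + encodedLength(pound, StandardCharsets.UTF_8));
        System.out.println("pound, UTF-16BE:   " + encodedLength(pound, StandardCharsets.UTF_16BE));
        System.out.println("emoji, UTF-8:      " + encodedLength(emoji, StandardCharsets.UTF_8));
        System.out.println("emoji, UTF-16BE:   " + encodedLength(emoji, StandardCharsets.UTF_16BE));

        // Write as UTF-8, then read back with the wrong (8-bit) encoding:
        // the two bytes come back as two separate characters.
        byte[] utf8 = pound.getBytes(StandardCharsets.UTF_8);
        String wrong = decode(utf8, StandardCharsets.ISO_8859_1);
        System.out.println("mis-decoded length: " + wrong.length());
    }
}
```

The bytes never change; only the interpretation does, which is why a wrong
guess gives you garbage rather than an error.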
the InputStream pick up 8 bits one time for compose a byte
no matter what the system is
in windows ,in unix or other system, it always pick 8 bits one time,not
16bits or 32bits
is it correct?
That's right.
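If you want to see it for yourself, something like this sketch will do (the
class name and the readAll() helper are made up, and the byte array stands in
for a file). InputStream.read() hands back one byte at a time as an int
between 0 and 255, or -1 at the end of the stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical example class, not part of any library.
class ByteAtATimeSketch {

    // Collects the value returned by each read() call until end of stream.
    static List<Integer> readAll(InputStream in) {
        List<Integer> values = new ArrayList<>();
        try {
            int b;
            while ((b = in.read()) != -1) {
                values.add(b); // always in the range 0..255
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return values;
    }

    public static void main(String[] args) {
        // 'A' followed by the two UTF-8 bytes of a pound sign.
        byte[] data = {0x41, (byte) 0xC2, (byte) 0xA3};
        System.out.println(readAll(new ByteArrayInputStream(data)));
    }
}
```

The stream knows nothing about characters; it sees three bytes here, even
though a UTF-8 Reader would turn them into two characters.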
-- chris