Some questions about the IO package

jtl.zheng

I have been learning java.io these days,
but it puzzles me because it has so many classes to learn,
such as InputStream, OutputStream, Writer, Reader and their numerous
subclasses.
It is taking me a lot of time, and I can't get the point of
java.io.

My questions are:

1. Where do you use java.io? I mean, what do you usually use the IO
classes for?

2. Should I learn all the classes in the IO package?
Or which are the key classes I should learn thoroughly?
Which classes are the most useful?

3. I don't understand "streams" very well.
How do you think of a "stream"? Would you give me some examples to
explain?

4. Would you please give me some advice for learning java.io?


Thank you very much in advance..
: )
 
Andrew Thompson

jtl.zheng wrote:
...
> My questions are:

I count seven questions.

> 1. Where do you use java.io? I mean, what do you usually use the IO
> classes for?

I/O -> Input and Output.

Reading data 'in' from a source (e.g. a file or URL).
Writing data 'out' to a sink (e.g. a file).

> 2. Should I learn all the classes in the IO package?

You should investigate some carefully, and be familiar
with all of them (ideally).

> Or which are the key classes I should learn thoroughly?

The best ones to *start* *with* are the ones mentioned
in the Java Tutorial's I/O section.
<http://java.sun.com/docs/books/tutorial/essential/io/index.html>
...

At this point I'm stopping - go through the tutorial
(compile and run the examples) and it should be
much clearer to you.

Also note that there is a good group for Java beginners,
<http://groups.google.com/group/comp.lang.java.help>

HTH

Andrew T.
 
jtl.zheng

Thank you very much for your answers.

I'm still puzzled by byte streams and char streams.

What I think is:

In a char stream,

the Reader picks up one char from a local file.
How many bits "one char" has is decided by the local system:
on Windows "one char" has 8 bits,
and on other systems one char may have 16 or 32 bits,
and all these bits are turned into 32-bit Unicode in Java.
This is what the Reader does.
Is that correct?

In a byte stream,

the InputStream picks up 8 bits at a time to compose a byte,
no matter what the system is.
On Windows, on Unix or on any other system, it always picks up 8 bits at a
time, not 16 or 32 bits.
Is that correct?

Thank you very much in advance
: )
 
jtl.zheng

I read that "So why talk about byte streams? Because all other stream
types are built on byte streams."

Does "all other stream types" here include char streams?
 
Mark Space

jtl.zheng said:
> I read that "So why talk about byte streams? Because all other stream
> types are built on byte streams."
>
> Does "all other stream types" here include char streams?

I'm not an expert, but I think the answer is "Yes, certainly."

To expound on a previous question:
> What I think is:
>
> In a char stream,
>
> the Reader picks up one char from a local file.
> How many bits "one char" has is decided by the local system:
> on Windows "one char" has 8 bits,
> and on other systems one char may have 16 or 32 bits,
> and all these bits are turned into 32-bit Unicode in Java.
> This is what the Reader does.
> Is that correct?

I'm still not an expert, but I think the answer is "no."

On Windows, I've heard that the Java character set is a 16 bit one,
possibly Unicode 1.0. Windows itself uses Unicode 1.0 character set for
certain operations, so this isn't a stretch.

I've also heard that Java on Windows uses some form of variable length
encoding. So for many English characters, yes, a char stream reads one
byte. But some chars may be 16 or 24 bits, so some chars may involve
reading multiple bytes.

I honestly have no idea if Java on other systems is localized to a
different character set (maybe 16 bit characters by default), or if it
just adapts other character sets to its own internal one, so I can't
really tell you what Java does on other systems.
 
Chris Uppal

jtl.zheng said:
> I have been learning java.io these days,
> but it puzzles me because it has so many classes to learn,

The java.io package is pretty confusing.

The important thing to get clear is the difference between the classes which
are intended for reading/writing binary data, and the ones for reading/writing
text.

InputStream and OutputStream (and their many subclasses) are for handling
binary data. Ultimately all data which is written to or read from files, or
from the network, is binary.

Reader and Writer (and their many subclasses) are for handling textual data.
Nearly all Readers and Writers are attached to an InputStream or OutputStream,
and what they do is translate between text and binary. They always use a
"character encoding" to do the translation. There are many, many, encodings
which are used on different operating systems, and in different countries.
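
To make the translation step concrete, here is a minimal sketch (the class and method names are invented for illustration). It pushes the same four characters through an OutputStreamWriter configured with three different encodings and counts the bytes that come out the other side.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingDemo {
    // Pushes text through a Writer and returns the bytes that reach the underlying stream.
    static byte[] encode(String text, Charset charset) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(bytes, charset)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        String text = "caf\u00e9"; // three ASCII letters followed by e-acute
        System.out.println(encode(text, StandardCharsets.ISO_8859_1).length); // 4
        System.out.println(encode(text, StandardCharsets.UTF_8).length);      // 5
        System.out.println(encode(text, StandardCharsets.UTF_16BE).length);   // 8
    }
}
```

The text is the same in all three cases; only the binary form differs, which is why a Reader has to be told (or has to assume) which encoding was used.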

Normally the name of the class (XxxxReader or XxxxInputStream) will tell you
whether it is for text or binary data. The confusing exception is
java.io.PrintStream which, for historical reasons, doesn't follow the normal
pattern. You shouldn't need to use that very much anyway, except that
java.lang.System.out is an instance of that class.

> 1. Where do you use java.io? I mean, what do you usually use the IO
> classes for?

For reading or writing data to files or the network.

> 2. Should I learn all the classes in the IO package?
> Or which are the key classes I should learn thoroughly?

Look at InputStream/OutputStream and Reader/Writer first, they will give you
the overview -- if you understand them then everything else will be much
easier. InputStreamReader and OutputStreamWriter are what you use when you
have a binary stream and you want to read or write text to it -- they do the
translation. FileInputStream and FileOutputStream are how you read or write
binary data to files. If you want to write text to a file then you could use
an OutputStreamWriter wrapped around a FileOutputStream (which is what I would
probably do), or you may find it more convenient to use a FileWriter (which is
just a short-cut). Similarly for FileInputStream and FileReader. The other
critically important streams are the buffering streams
(BufferedInputStream/BufferedOutputStream and BufferedReader/BufferedWriter) --
you should /always/ consider what kind of buffering to use, and where to use
it (sometimes it is correct not to use buffering at all, but that is rare).
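
A rough sketch of the chain described above, assuming UTF-8 as the encoding and a temporary file as the destination (both are choices made for this example, as is the TextFileDemo name): a buffering Writer on top of an OutputStreamWriter on top of a FileOutputStream, and the mirror image for reading.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class TextFileDemo {
    // Writes one line to a temporary file and reads it back, using UTF-8 both ways.
    static String roundTrip(String line) {
        try {
            File file = File.createTempFile("io-demo", ".txt");
            file.deleteOnExit();

            // characters -> buffer -> encoder (Writer) -> bytes in the file
            try (BufferedWriter out = new BufferedWriter(
                    new OutputStreamWriter(new FileOutputStream(file), StandardCharsets.UTF_8))) {
                out.write(line);
                out.newLine();
            }

            // bytes in the file -> decoder (Reader) -> buffer -> characters
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8))) {
                return in.readLine();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("hello, streams")); // hello, streams
    }
}
```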

The other stream classes are relatively unimportant.

> 3. I don't understand "streams" very well.
> How do you think of a "stream"? Would you give me some examples to
> explain?
>
> 4. Would you please give me some advice for learning java.io?

http://java.sun.com/docs/books/tutorial/essential/io/index.html

-- chris
 
Chris Uppal

jtl.zheng said:
> the Reader picks up one char from a local file.
> How many bits "one char" has is decided by the local system:
> on Windows "one char" has 8 bits,
> and on other systems one char may have 16 or 32 bits,
> and all these bits are turned into 32-bit Unicode in Java.
> This is what the Reader does.
> Is that correct?

Not quite, but you are close.

For a start, Readers don't read directly from files, they read from byte
streams (InputStream and its subclasses). That's not very important but
keeping it in mind will help you understand the IO architecture better.

The number of bytes read to read one character depends on what character
encoding (or character set) the Reader has been configured to use. There are
many character sets, and you can configure a Reader to use whichever one you
want (for the character sets which come with Java anyway). You always have to
specify what character set to use, because Java has no way to tell (just from
the binary data in the file) which character set was used to write it. There
is also a default character set, which Java will use if you don't specify one
explicitly -- Java makes an assumption of what is likely to be correct based on
your operating system and locale[*]. It's OK to use the default if you are
just going to be reading or writing files on one computer (or in one office),
but if you are going to share data around the world then you will have to
consider carefully which character set(s) to use.

([*] It's actually set by the system property "file.encoding", which is
"Cp1252" on my machine but will be something different on yours.)
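
One way to see what the default is on a given machine is to ask the JVM. This is only a sketch: the class and method names are made up, and the values printed depend on the platform and the Java version.

```java
import java.nio.charset.Charset;

public class DefaultCharsetDemo {
    // Name of the charset a Reader or Writer falls back to when none is given.
    static String defaultCharsetName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(defaultCharsetName());
        // The system property mentioned above; the value varies from machine to machine.
        System.out.println(System.getProperty("file.encoding"));
    }
}
```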

The important thing to realise is that it isn't the local file system or
operating system which "decides" what bytes are used to represent a character,
but the Reader itself (or rather the character encoding it has been configured
to use).

On this Windows machine there are lots of text files. Some use 8-bits per
character to represent text, those files can only hold a small subset of all
the Unicode characters, but that subset is large enough to use for British
English text. Some others hold text encoded as 16-bits per character (created
by Windows utilities). Still others hold text encoded in a variable-length
encoding like UTF-8 (which uses 1 to 4 bytes per character) or UTF-16 (which
uses 2 or 4 bytes per character). If I want a Java program to read all of
those files correctly, then I have to /tell/ it what character encoding each
file uses. Many of them use an encoding ("Windows-1252") which is often used
by British English people using Windows, and that is what Java will assume a
file contains unless I tell it differently. So Java would read /some/ of the
files correctly, but not all. On your machine, Java would make a different
assumption, and so would read a different set of files correctly.
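
The effect of a wrong assumption can be sketched without any files at all. In the example below (hypothetical names; ISO-8859-1 stands in for an 8-bit encoding such as Windows-1252 because every JVM is required to support it), the same UTF-8 bytes are decoded twice, once with the right encoding and once with the wrong one.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class WrongCharsetDemo {
    // Decodes the bytes with a Reader configured for the given charset.
    static String decode(byte[] data, Charset charset) {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(data), charset)) {
            int c;
            while ((c = reader.read()) != -1) {
                text.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // "cafe" with an e-acute, written as UTF-8: the e-acute becomes the two bytes C3 A9.
        byte[] utf8 = "caf\u00e9".getBytes(StandardCharsets.UTF_8);

        String right = decode(utf8, StandardCharsets.UTF_8);
        String wrong = decode(utf8, StandardCharsets.ISO_8859_1);

        System.out.println(right.equals("caf\u00e9"));       // true
        System.out.println(wrong.equals("caf\u00c3\u00a9")); // true: "caf" + A-tilde + copyright sign
        System.out.println(wrong.length());                  // 5, one char per byte
    }
}
```

Nothing fails in the second case; the Reader simply produces the wrong characters, which is why the mistake is easy to miss.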

> the InputStream picks up 8 bits at a time to compose a byte,
> no matter what the system is.
> On Windows, on Unix or on any other system, it always picks up 8 bits at a
> time, not 16 or 32 bits.
> Is that correct?

That's right.
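
A small sketch of that behaviour, with made-up names: read() hands back one byte per call as an int between 0 and 255, and -1 once the stream is exhausted.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;

public class ByteReadDemo {
    // Reads the data one byte at a time and records the value returned by each read().
    static int[] readValues(byte[] data) {
        int[] values = new int[data.length];
        try (InputStream in = new ByteArrayInputStream(data)) {
            int count = 0;
            int b;
            while ((b = in.read()) != -1) { // -1 signals end of stream
                values[count++] = b;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return values;
    }

    public static void main(String[] args) {
        byte[] data = {0x41, (byte) 0xE9, (byte) 0xFF};
        System.out.println(Arrays.toString(readValues(data))); // [65, 233, 255]
    }
}
```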

-- chris
 
jtl.zheng

Thank you very much, you are so enthusiastic.
It is much clearer to me now (the difference between char streams and byte
streams, and the rest).

But I'm still a little puzzled:

Chris said:
> If I want a Java program to read all of
> those files correctly, then I have to /tell/ it what character encoding each
> file uses.

Do you mean I should tell the BufferedReader which encoding to use?
My guess was that the JVM decides which encoding to use
automatically.
Is that right?
But I don't know how the JVM would know which encoding should be used.
What does it judge from? The head of the file?

And today I learned about RandomAccessFile.
Its constructor is "RandomAccessFile(String name, String mode)",
where mode can be "r", "rw", "rws" or "rwd".

From the Javadocs:
"rws" Open for reading and writing, as with "rw", and also require that
every update to the file's content or metadata be written synchronously
to the underlying storage device.
"rwd" Open for reading and writing, as with "rw", and also require
that every update to the file's content be written synchronously to the
underlying storage device.

What is the difference between "rws" and "rwd"?
What do the "file's content" and "metadata" mean?

Thank you very much in advance
: )
 
jtl.zheng

I find that RandomAccessFile is very useful for reading the n-th record
written by a DataOutputStream.

1. But how can I use RandomAccessFile to read a character file?
The length of one char is variable on different systems, and
as I understand it the seek() method of RandomAccessFile locates a position by
byte, not by character.
How can I locate the n-th character randomly?
Can I do that?

2. If the data file written by the DataOutputStream is full of
Strings (UTF-8),
and their lengths are variable,
how can I locate the n-th string randomly?
Can I do that?

Thank you very much in advance
: )
 
Mark Space

I'm going to continue posting, even though I'm obviously not the
expert here, just because I don't mind being wrong. It's how I learn
stuff. :)

jtl.zheng said:
> Do you mean I should tell the BufferedReader which encoding to use?
> My guess was that the JVM decides which encoding to use
> automatically.
> Is that right?
> But I don't know how the JVM would know which encoding should be used.
> What does it judge from? The head of the file?

I think determining file encoding is a system level responsibility; Java
just follows what the system provides.

Java has a default Reader/Writer that provides the defaults for your
system. I think these are InputStreamReader and OutputStreamWriter.

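import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.text.NumberFormat;
import java.text.ParseException;

try {
    // InputStreamReader decodes the bytes from System.in using the default encoding
    InputStreamReader convert = new InputStreamReader(System.in);
    BufferedReader in = new BufferedReader(convert);
    String text = in.readLine();
    int i = NumberFormat.getInstance().parse(text).intValue();
}
catch (IOException e) {}    // error handling omitted for brevity
catch (ParseException e) {}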

The above is correct according to O'Reilly's Learning Java. (It may be
a bit old, however.) You should make sure that InputStreamReader and
OutputStreamWriter are the correct classes for your locale.

However, what Chris was talking about was something like this:
<?xml version="1.0" encoding="ISO-8859-1"?>
<note>
<from>Jani</from>
<to>Tove</to>
<message>Remember me this weekend</message>
</note>

Ah ha! Now we have a file format that tells us what the encoding is!
Java won't recognize this automatically with Readers and Writers. (It
might with javax.xml.parsers however.) You the programmer have to know
to read the encoding string, and then reopen the file (or whatever) with
the correct Reader filter.

XML is more of a "real world" answer. The Reader/Writer answer I gave
first is more of a "Java only" answer. The programmer must know which
to use and when.

> What is the difference between "rws" and "rwd"?

I don't know, sorry.

> What do the "file's content" and "metadata" mean?

I think this refers to a multi-stream or multiple fork file model. The
"file's contents" refers to all the data. The stuff that InputStream
reads or OutputStream writes.

The "metadata" is stuff that IS NOT part of the file. These are things
that one doesn't read with InputStream nor write with OutputStream. For
example, the file name itself is not read with InputStream, the
programmer must supply that from some other source. Ditto with things
like access permission, and file creation and last modified dates and
times. Those aren't part of the file data, they are elsewhere.

Does that make sense?
 
Chris Uppal

jtl.zheng said:
> Do you mean I should tell the BufferedReader which encoding to use?
> My guess was that the JVM decides which encoding to use
> automatically.

Not the BufferedReader (which /only/ adds buffering), but the InputStreamReader
(or FileReader).

No, the JVM does not (and cannot) decide automatically which encoding is
correct. There is a single default encoding which is always used whenever you
don't specify the encoding explicitly. That is, as far as I know, worked out
from the machine's OS and locale settings when the JVM starts, not from
anything in the file being read. You can change that default yourself by
setting the system property on the command line (setting it from code after
startup does not reliably take effect), but unless you do, the default is
fixed.

> And today I learned about RandomAccessFile.
> Its constructor is "RandomAccessFile(String name, String mode)",
> where mode can be "r", "rw", "rws" or "rwd".

RandomAccessFile can be /very/ useful if you need it, but usually you don't --
I have never used one myself, for instance. (I have used random access to
files in other programming languages, but I've never happened to need it in
Java programming.)

> What is the difference between "rws" and "rwd"?
> What do the "file's content" and "metadata" mean?

I'm not absolutely sure, since I have never used RandomAccessFile, and have not
looked at the source either. But usually when someone talks about a file's
content they mean what you would see if you opened the file up and read it
(either with an editor or from Java code). The metadata is things like the
last-accessed-time or the creation-time, and exactly what metadata there is is
/very/ dependent on the operating system and even the specific file system
where the file lives. OSes sometimes don't update the last access time every time
that a file is read, or the last modified time every time it is written -- they
do that to speed things up. Some OSes also have the ability for an application
to tell them "update the metadata now anyway -- I need it to be correct /now/
not later", and I presume that the "rws" is provided for that purpose.
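
For what it's worth, the modes look identical from the program's side; the difference is only in how much has to reach the storage device before a write returns. A minimal sketch, assuming a temporary file and hypothetical names:

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;

public class SyncModeDemo {
    // Writes an int using the given mode ("rw", "rwd" or "rws") and reads it back.
    static int writeAndReadBack(String mode, int value) {
        try {
            File file = File.createTempFile("raf-demo", ".bin");
            file.deleteOnExit();
            try (RandomAccessFile raf = new RandomAccessFile(file, mode)) {
                // "rwd": the content is written synchronously to the device.
                // "rws": the content and the metadata are written synchronously.
                raf.writeInt(value);
                raf.seek(0); // jump back to the first byte
                return raf.readInt();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (String mode : new String[] {"rw", "rwd", "rws"}) {
            System.out.println(mode + " -> " + writeAndReadBack(mode, 42)); // 42 each time
        }
    }
}
```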

-- chris
 
Chris Uppal

Mark Space wrote:

.... XML snipped...
> Ah ha! Now we have a file format that tells us what the encoding is!
> Java won't recognize this automatically with Readers and Writers. (It
> might with javax.xml.parsers however.)

As I understand it, the XML parser will recognise and apply the correct
character set /if/ one is specified in the XML declaration (obviously ;-) /and/
you haven't tried to second-guess it by asking it to read the text from a
(necessarily already configured) Reader but just give it an InputStream to read
from.
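
A sketch of that behaviour using the JDK's built-in DOM parser (the class name and the sample document are invented for the example). The bytes are ISO-8859-1 and the parser is handed a plain InputStream, so the only way it can decode the e-acute correctly is by reading the encoding from the XML declaration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

public class XmlEncodingDemo {
    // Parses raw bytes (not a Reader) and returns the text of the root element.
    static String rootText(byte[] xmlBytes) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new ByteArrayInputStream(xmlBytes));
            return doc.getDocumentElement().getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"
                + "<message>caf\u00e9</message>";
        // In ISO-8859-1 the e-acute is the single byte E9, which is not valid UTF-8 here.
        byte[] bytes = xml.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(rootText(bytes).equals("caf\u00e9")); // true
    }
}
```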

-- chris
 
jtl.zheng

Thank you very much to Chris Uppal and Mark Space.

> the JVM does not (and cannot) decide automatically which encoding is
> correct.
> There is a single default encoding which is always used whenever you
> don't specify the encoding explicitly. [...]

I get it now. Thank you. : )

> RandomAccessFile can be /very/ useful if you need it, but usually you don't --
> I have never used one myself, for instance. (I have used random access to
> files in other programming languages, but I've never happened to need it in
> Java programming.)

Could you tell me why you don't use RandomAccessFile in Java?
What do you use to access files randomly?

> I'm not absolutely sure, since I have never used RandomAccessFile, and have not
> looked at the source either.

Where could I look at the source?
It is so interesting and useful.

: )
 
Mark Space

Chris said:
Mark Space wrote:

... XML snipped...

> As I understand it, the XML parser will recognise and apply the correct
> character set /if/ one is specified in the XML declaration (obviously ;-) /and/
> you haven't tried to second-guess it by asking it to read the text from a
> (necessarily already configured) Reader but just give it an InputStream to read
> from.


Good info. I have never used the XML parser, so I didn't want to say for
sure. The idea seemed logical to me that it would, though, so I thought
I'd mention the possibility.
 
Chris Uppal

jtl.zheng wrote:

[me:]
> Could you tell me why you don't use RandomAccessFile in Java?

No special reason, it's just that I haven't happened to write the kind of
application which needs random access to files. It's not something that I've
/avoided/, I just haven't needed it.

BTW, random access is a concept that really only applies to binary files, or to
text files considered as binary. As you pointed out yourself, if the file
contains text in a variable-width encoding then it is (at best) difficult to
use random access in that file sensibly. Probably the best you can do is
perform a sequential scan, decoding the text, and build some sort of table of
where the data you are interested in is.
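
The scan-and-index idea can be sketched as follows for a file of strings written with DataOutputStream.writeUTF. The names are hypothetical, and a real program would presumably keep the index around rather than rebuild it for every lookup.

```java
import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class RecordIndexDemo {
    // Writes each string with writeUTF: a 2-byte length followed by the encoded characters.
    static void writeRecords(File file, String... records) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(file)))) {
            for (String record : records) {
                out.writeUTF(record);
            }
        }
    }

    // One sequential pass that notes the byte offset at which each record starts.
    static List<Long> buildIndex(RandomAccessFile raf) throws IOException {
        List<Long> offsets = new ArrayList<>();
        raf.seek(0);
        while (raf.getFilePointer() < raf.length()) {
            offsets.add(raf.getFilePointer());
            raf.readUTF(); // decode and discard, which moves the pointer to the next record
        }
        return offsets;
    }

    // Writes the records to a temporary file, indexes it, then jumps to record n (0-based).
    static String nthRecord(int n, String... records) {
        try {
            File file = File.createTempFile("records", ".bin");
            file.deleteOnExit();
            writeRecords(file, records);
            try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
                List<Long> index = buildIndex(raf);
                raf.seek(index.get(n));
                return raf.readUTF();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(nthRecord(2, "a", "a longer record", "third")); // third
    }
}
```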

> Where could I look at the source?
> It is so interesting and useful.

If you are using Sun's JDK, then there should be a src.zip file in the JDK's
installation directory. That contains most of the Java source for the
platform.

If you need /all/ the source -- the C++ source to the JVM itself, the C++
source to the native methods in AWT (etc), and the Java source to the "private"
implementation-specific parts of the platform -- then Sun allows you to
download that too. It's available from the same place as the JDK itself,
currently:
http://java.sun.com/javase/downloads/index.jsp
(near the bottom of the page). Be warned: there are two licenses you can
download under (SCSL and JRL), read the terms of whichever licence you choose
/very/ carefully, before deciding whether you want to accept it.

-- chris
 
jtl.zheng

I have another question about encodings.

In physics, a text file only has its name and the characters
in it;
there is nothing else to indicate what encoding it uses.
Is that right?

If that is right,
how does the JVM detect the encoding correctly?
As we know, with the following code the JVM can read a file in any encoding
correctly:

FileReader in = new FileReader("text.txt");
in.read();

What does it detect the encoding from?
 
Michael Rauscher

jtl.zheng said:
> I have another question about encodings.
>
> In physics, a text file only has its name and the characters
> in it;
> there is nothing else to indicate what encoding it uses.
> Is that right?

Yes (apart from "in physics", as a file is a logical construct).

> If that is right,
> how does the JVM detect the encoding correctly?

Not at all.

> As we know, with the following code the JVM can read a file in any encoding
> correctly:

Then you "know" something wrong :)

> FileReader in = new FileReader("text.txt");

From the API docs:

The constructors of this class assume that the default character
encoding and [...] are appropriate. To specify these values yourself,
construct an InputStreamReader on a FileInputStream.

Bye
Michael
 
