Questions about buffered streams

failure_to

hello

Sorry for so many questions, but I think I/O is one topic that will
give me trouble for quite some time.

1)
What makes read and write operations time-consuming (or at least more
time-consuming than calls to ordinary, non-I/O methods)?

a) The fact that underlying stream classes make calls to system
libraries?

b)
* Buffer classes are supposed to help us with that, since they buffer
data and thus don't necessarily call the underlying system for each byte.
But nonetheless, even if a buffered stream doesn't immediately call
the underlying system in order to write a byte, it still has to call
the underlying system (say it buffers 5 bytes of data and at some point
we call flush()) once for each byte, just as non-buffered stream
classes have to... so in the end, the same amount of time is spent writing
those five bytes to a file as if we'd written those 5 bytes with a
non-buffered stream... the only difference being that those 5 bytes were
written at once and not every time write() was called?!

* Or can buffered streams somehow call the underlying system's method
just once, and with just that one call write all five bytes of data?



c)
Even if buffered byte streams can somehow write all 5 bytes with one
system call, that shouldn't be true for BufferedWriter streams?!
A BufferedWriter stream doesn't talk directly to the underlying byte
stream, so I assume it would still take 5 system calls to write those
five bytes of data? So no time was saved!




2)
FileInputStream FS = new FileInputStream ( "A.txt" );
BufferedInputStream BS = new BufferedInputStream ( FS );
DataInputStream DS = new DataInputStream ( BS );

Of the three objects above, I assume only the byte stream object keeps
some sort of internal pointer which keeps track of which bytes in a
stream were already written/read, and advances this pointer with
each read or write operation? Wrapper objects ( BS and DS in the above
example ) don't have such internal pointers?!

3)
If you use, say, the FileOutputStream method write( byte buf[]... ), does
it act like a kind of buffer and write all those bytes with one system
call, or does it make one system call for each byte written?


4)
BufferedReader in = new
BufferedReader( new FileReader("foo.in") );

Does even a simple in.read() without any parameters cause the
wrapped FileReader object to read more than just one character from
the underlying byte stream?


5)
Next questions are about PrintStream class. Here is what Java docs and
my book have to say about this class:

"All characters printed by a PrintStream are converted into bytes
using the platform's default character encoding."

I assume the text is referring to cases where we don't specify the
encoding in a constructor, since if we do specify which encoding to
use, then PrintStream converts characters into bytes using the specified
encoding and not the platform's default character encoding?!



"For real-world programs, the recommended method of
writing to the console when using Java is through a PrintWriter
stream. PrintWriter is one of the character-based classes. Using a
character-based class for console output makes it easier to
internationalize your program."

"The PrintWriter class should be used in situations that require
writing characters rather than bytes."


* Why should PrintWriter be used instead in situations that require
writing characters instead of bytes?

* How does PrintWriter make it easier to internationalize a program?

* When dealing with characters, when and why ( or why not ) would you
choose PrintWriter over some other character based stream ( like
OutputStreamWriter )?


thank you

cheers
 
Silvio Bierman

You asked too many questions to be answered individually, especially
because they stack on top of each other. I will try to give a brief
explanation and suggest you do some reading/googling.

The system calls you refer to primarily come down to the same two system
calls that can read/write n bytes from/to what is often called a file
descriptor. In C these system calls would be

ssize_t read(int fd, void *buf, size_t nbytes);
ssize_t write(int fd, const void *buf, size_t nbytes);

For any Java implementation the basic systems routines might look
entirely different but this should be sufficiently accurate.

This answers your question about buffered streams: they read/write
larger, better-sized blocks of bytes in one system call, instead of
making one system call for each individual read/write performed on
the stream.
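
As a rough illustration of that point, here is a minimal sketch. The
CountingOutputStream class is a hypothetical stand-in for a file stream
(it is not part of the JDK); it only counts how many write calls reach
it, so no real system calls are measured.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

class BufferedWriteCount {

    /** Hypothetical sink: counts the write calls it receives instead of doing real I/O. */
    static class CountingOutputStream extends OutputStream {
        int calls;  // number of write(...) calls that reached this stream
        int bytes;  // total number of bytes received

        @Override
        public void write(int b) {
            calls++;
            bytes++;
        }

        @Override
        public void write(byte[] b, int off, int len) {
            calls++;
            bytes += len;
        }
    }

    /** Five single-byte writes straight to the sink: one underlying call per byte. */
    static int unbufferedCalls() {
        CountingOutputStream sink = new CountingOutputStream();
        for (int i = 0; i < 5; i++) {
            sink.write(i);
        }
        return sink.calls;
    }

    /** The same five writes through a BufferedOutputStream, then flush(). */
    static int bufferedCalls() {
        CountingOutputStream sink = new CountingOutputStream();
        try (BufferedOutputStream out = new BufferedOutputStream(sink)) {
            for (int i = 0; i < 5; i++) {
                out.write(i);   // only copied into the in-memory buffer
            }
            out.flush();        // the buffer is handed over as one array write
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.calls;
    }

    public static void main(String[] args) {
        System.out.println("unbuffered calls: " + unbufferedCalls()); // 5
        System.out.println("buffered calls:   " + bufferedCalls());   // 1
    }
}
```

With a real FileOutputStream in place of the counting sink, each call
that reaches the sink corresponds roughly to one trip into the
operating system.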

Readers/writers add the abstraction of characters and encodings of
characters into bytes. At the end they need a stream to read/write bytes.

Basically this is all you need but if you want to be able to write
strings, integers etc. to some character oriented output then you will
need some formatting logic and that is what a PrintWriter will do for you.

A DataInputStream or a DataOutputStream is something that adds binary IO
of Strings, integers etc. to a byte oriented stream.
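
A small sketch of that binary round trip, using in-memory byte arrays
instead of a file. The class and method names (BinaryRoundTrip, encode,
decode) are made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class BinaryRoundTrip {

    /** Writes an int and a String in binary form and returns the raw bytes. */
    static byte[] encode(int number, String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(number);  // always 4 bytes, big-endian
            out.writeUTF(text);    // 2-byte length prefix, then modified UTF-8
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads the values back in the same order they were written. */
    static String decode(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int number = in.readInt();
            String text = in.readUTF();
            return number + ":" + text;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = encode(1234567, "abc");
        System.out.println(data.length);   // 9: 4 for the int, 2 + 3 for the string
        System.out.println(decode(data));  // 1234567:abc
    }
}
```

Note that the int takes 4 bytes here regardless of its value, whereas
printing it as text through a PrintWriter would produce one character
per digit.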

This all fits together nicely. In combination with the java.text.XXXFormat
classes you have a rather complete set of basic IO tools in the Java SDK.

I would suggest a good Java textbook or the Sun website to learn more.

Good luck,

Silvio Bierman
 
Roedy Green

What makes read and write operations time-consuming (or at least more
time-consuming than calls to ordinary, non-I/O methods)?

They require mechanical motion of the disk head, waiting for the disk
surface to spin under the read head, and waiting for the data to
mechanically pass by the read head.
 
Roedy Green

b)
* Buffer classes are supposed to help us with that, since they buffer
data and thus don't necessarily call the underlying system for each byte.
But nonetheless, even if a buffered stream doesn't immediately call
the underlying system in order to write a byte, it still has to call
the underlying system (say it buffers 5 bytes of data and at some point
we call flush()) once for each byte, just as non-buffered stream
classes have to... so in the end, the same amount of time is spent writing
those five bytes to a file as if we'd written those 5 bytes with a
non-buffered stream... the only difference being that those 5 bytes were
written at once and not every time write() was called?!

If you wrote 1 byte at a time, you would have to wait for the spot on
disk to spin round for each byte. If you write 64,000 bytes at a time,
you only have to wait once for the proper spot on disk to spin round.

see http://mindprod.com/jgloss/buffer.html
 
Roedy Green

Even if buffered byte streams can somehow write all 5 bytes with one
system call, that shouldn't be true for BufferedWriter streams?!
A BufferedWriter stream doesn't talk directly to the underlying byte
stream, so I assume it would still take 5 system calls to write those
five bytes of data? So no time was saved!

when you write to a buffer, the buffer class only writes to the system
when the buffer is full, or when you flush or close.
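
The same can be checked for character streams with a small sketch. It
assumes a hypothetical CountingOutputStream (not a JDK class) at the
bottom of the chain in place of a real file, and relies on the behaviour
of the JDK implementations I am aware of, where OutputStreamWriter hands
its encoded bytes to the byte stream as an array.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class BufferedWriterCount {

    /** Hypothetical sink: counts the write calls and bytes that reach the byte stream. */
    static class CountingOutputStream extends OutputStream {
        int calls;
        int bytes;

        @Override
        public void write(int b) {
            calls++;
            bytes++;
        }

        @Override
        public void write(byte[] b, int off, int len) {
            calls++;
            bytes += len;
        }
    }

    /** Returns { bytes seen before flush, bytes seen after flush, write calls seen }. */
    static int[] demo() {
        CountingOutputStream sink = new CountingOutputStream();
        try (BufferedWriter out = new BufferedWriter(
                new OutputStreamWriter(sink, StandardCharsets.US_ASCII))) {
            for (char c = 'a'; c <= 'e'; c++) {
                out.write(c);            // stays in the BufferedWriter's char buffer
            }
            int before = sink.bytes;     // nothing has reached the byte stream yet
            out.flush();                 // chars are encoded and passed down together
            return new int[] { before, sink.bytes, sink.calls };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] result = demo();
        System.out.println("bytes before flush: " + result[0]);
        System.out.println("bytes after flush:  " + result[1]);
        System.out.println("underlying calls:   " + result[2]);
    }
}
```

So the extra Writer layer adds an encoding step, but it does not turn
the five characters back into five separate writes on the byte stream.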
 
Roedy Green

FileInputStream FS = new FileInputStream ( "A.txt" );
BufferedInputStream BS = new BufferedInputStream ( FS );
DataInputStream DS = new DataInputStream ( BS );

Of the three objects above, I assume only the byte stream object keeps
some sort of internal pointer which keeps track of which bytes in a
stream were already written/read, and advances this pointer with
each read or write operation? Wrapper objects ( BS and DS in the above
example ) don't have such internal pointers?!

All streams keep track internally of how many bytes have been read
both on disk and from the buffer. RandomAccessFiles also keep track
of where you are in the file, but explicitly with getFilePointer and
seek.
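
For reference, a minimal sketch of getFilePointer and seek on a
temporary file. The class name FilePointerDemo and the byte values are
invented for the example; the sketch only shows what the API reports,
not where the position is actually stored.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

class FilePointerDemo {

    /** Returns { pointer after writing, byte read at offset 4, pointer after that read }. */
    static long[] demo() {
        try {
            Path tmp = Files.createTempFile("pointer", ".bin");
            try (RandomAccessFile file = new RandomAccessFile(tmp.toFile(), "rw")) {
                file.write(new byte[] {10, 11, 12, 13, 14, 15, 16, 17, 18, 19});
                long afterWrite = file.getFilePointer();  // position after the 10 bytes
                file.seek(4);                             // jump back to offset 4
                int value = file.read();                  // reads the byte at offset 4
                long afterRead = file.getFilePointer();   // the read advanced the position
                return new long[] { afterWrite, value, afterRead };
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long[] result = demo();
        System.out.println(result[0] + " " + result[1] + " " + result[2]); // 10 14 5
    }
}
```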
 
Roedy Green

3)
If you use, say, the FileOutputStream method write( byte buf[]... ), does
it act like a kind of buffer and write all those bytes with one system
call, or does it make one system call for each byte written?

You can examine the source for yourself in src.zip. It will write all
the bytes in one go. I do a lot of I/O megabytes at a pop and it is
very fast, certainly not a byte at a time.
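
A minimal sketch of such a bulk write, using a temporary file. The class
name BulkWrite is made up for the example, and whether the JVM turns the
call into exactly one operating-system write is an implementation detail
this sketch does not verify; it only shows that a single Java call is
enough to write the whole array.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

class BulkWrite {

    /** Writes size zero bytes with a single write(byte[]) call; returns the file length. */
    static long writeAtOnce(int size) {
        try {
            Path tmp = Files.createTempFile("bulk", ".bin");
            try (FileOutputStream out = new FileOutputStream(tmp.toFile())) {
                out.write(new byte[size]);   // one call for the whole array, no loop
            }
            long length = Files.size(tmp);
            Files.delete(tmp);
            return length;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(writeAtOnce(64000)); // 64000
    }
}
```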
 
Roedy Green

4)
BufferedReader in = new
BufferedReader( new FileReader("foo.in") );

Does even a simple in.read() without any parameters cause the
wrapped FileReader object to read more than just one character from
the underlying byte stream?

Buffered readers will either:

1. satisfy the request from the buffer.
2. read a buffer full, then satisfy the request.
3. read to the tail end of the file if it can't get a whole buffer
full, then satisfy the request.
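
The cases above can be observed with a small sketch. CountingReader is a
hypothetical wrapper (not a JDK class) that records how the
BufferedReader calls the reader underneath it; a StringReader stands in
for the FileReader.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class BufferedReadCount {

    /** Hypothetical wrapper: records how often, and for how many chars, it is asked to read. */
    static class CountingReader extends Reader {
        private final Reader source;
        int calls;              // number of read(char[], int, int) calls received
        int firstRequest = -1;  // how many chars the first call asked for

        CountingReader(Reader source) {
            this.source = source;
        }

        @Override
        public int read(char[] cbuf, int off, int len) throws IOException {
            calls++;
            if (firstRequest < 0) {
                firstRequest = len;
            }
            return source.read(cbuf, off, len);
        }

        @Override
        public void close() throws IOException {
            source.close();
        }
    }

    /** Returns { first char, second char, underlying read calls, size of first request }. */
    static int[] demo() {
        CountingReader counting = new CountingReader(new StringReader("hello world"));
        try (BufferedReader in = new BufferedReader(counting)) {
            int first = in.read();   // buffer is empty: triggers one bulk read underneath
            int second = in.read();  // served from the buffer, no underlying call
            return new int[] { first, second, counting.calls, counting.firstRequest };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] result = demo();
        System.out.println((char) result[0] + "" + (char) result[1]); // he
        System.out.println("underlying read calls: " + result[2]);
        System.out.println("chars requested by the first call: " + result[3]);
    }
}
```

Here the source holds only 11 characters, so this is the third case: the
underlying reader is asked for a full buffer and returns what it has.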
 
Roedy Green

"All characters printed by a PrintStream are converted into bytes
using the platform's default character encoding."

Inside the program you are using 16-bit Unicode. Your platform
typically supports 8-bit text files. Which 8-bit encoding is the
default depends on your platform and locale.
See http://mindprod.com/jgloss/encoding.html

PrintStream automatically encodes to the local 8-bit encoding.
However, you can explicitly choose the encoding, e.g. UTF-8 or even
UTF-16.
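
A short sketch of choosing the encoding explicitly, using the
PrintStream constructor that takes a charset name. The class and method
names (PrintStreamEncoding, encodedLength) are invented for the example,
and the output goes to an in-memory byte array so the byte count can be
inspected.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

class PrintStreamEncoding {

    /** Prints text through a PrintStream with the given encoding; returns the byte count. */
    static int encodedLength(String text, String charsetName) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (PrintStream out = new PrintStream(bytes, true, charsetName)) {
            out.print(text);   // chars are converted to bytes using charsetName
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("unknown encoding: " + charsetName, e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        String eAcute = "\u00e9";  // a single character, e with acute accent
        System.out.println(encodedLength(eAcute, "ISO-8859-1")); // 1
        System.out.println(encodedLength(eAcute, "UTF-8"));      // 2
    }
}
```

The same character comes out as a different number of bytes depending on
the encoding, which is why relying on the platform default can give
different files on different machines.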
 
Roedy Green

"The PrintWriter class should be used in situations that require
writing characters rather than bytes."

PrintWriters are for writing text files. They have translation going
on. This would confound efforts to compose binary bytes. For that, use
DataOutputStream.
 
Roedy Green

* When dealing with characters, when and why ( or why not ) would you
choose PrintWriter over some other character based stream ( like
OutputStreamWriter )?

PrintWriter gives you extra methods, mostly println which will insert
a platform specific line ending.
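
A minimal sketch comparing the two approaches. It writes to a
StringWriter so the result can be inspected, and compares
PrintWriter.println with BufferedWriter.newLine, which is the other
standard character-stream method that appends the platform line
separator. The class name LineEndings is made up for the example.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringWriter;
import java.io.UncheckedIOException;

class LineEndings {

    /** PrintWriter formats the int as text and appends the line separator itself. */
    static String viaPrintWriter() {
        StringWriter target = new StringWriter();
        try (PrintWriter out = new PrintWriter(target)) {
            out.println(42);
        }
        return target.toString();
    }

    /** With BufferedWriter the caller formats the value and asks for the line ending. */
    static String viaBufferedWriter() {
        StringWriter target = new StringWriter();
        try (BufferedWriter out = new BufferedWriter(target)) {
            out.write(Integer.toString(42));
            out.newLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return target.toString();
    }

    public static void main(String[] args) {
        System.out.println(viaPrintWriter().equals(viaBufferedWriter())); // true
    }
}
```

Another practical difference is that PrintWriter's print methods do not
throw IOException (errors are reported through checkError()), whereas
the BufferedWriter calls do.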
 
failure_to

hello


If you wrote 1 byte at a time, you would have to wait for the
spot on disk to spin round for each byte. If you write 64,000
bytes at a time, you only have to wait once for the proper spot
on disk to spin round.

a) So in essence, when flush() is called, buffered stream calls
underlying byte stream's write() and this write() method calls
underlying system just once and with that one call transfers all of
64000 bytes( meaning, write() is not called 64000 times )?



b)
FileOutputStream fo = new FileOutputStream( "A.txt" );
fo.write(byte_1);
fo.write(byte_2);
..
..
..
fo.write(byte_64000);


So in theory, the above code would write those 64000 bytes to the system
in approximately the same amount of time as a buffered stream would,
assuming no other thread blocks this output stream?
I'm assuming this since:

* first write() call ( fo.write(byte_1) ) causes disk to spin to
appropriate spot
* since after the first write() call disk is at the appropriate spot,
the disk doesn't have to rotate for the subsequent 63999 write()
calls



when you write to a buffer, the buffer class only writes to the
system when the buffer is full, or when you flush or close.

So when BufferedWriter flushes its data, the procedure is the same as
when BufferedOutputStream flushes its data ( meaning it takes same
amount of time to write those 64000 bytes to the file )?



If you have an explicit encoding, you can put that in your
internationalization configuration file.

So the only advantage of PrintWriter over PrintStream ( when dealing with
characters ) is internationalization?

PrintWriter gives you extra methods, mostly println which will
insert a platform specific line ending.

* While other character output streams only insert a platform-specific
line ending when newLine() is called?

* Don't character streams also have a method which writes a full line
and that automatically adds native newline sequence? I'm asking this
cos I can't find one.


All streams keep track internally of how many bytes have been
read both on disk and from the buffer. RandomAccessFiles also
keep track of where you are in the file, but explicitly with
getFilePointer and seek.

Yes, but only FileOutputStream stream knows the TOTAL offset from the
beginning of the file ( from the time we first started reading the
file )?!


thank you
 
Roedy Green

a) So in essence, when flush() is called, buffered stream calls
underlying byte stream's write() and this write() method calls
underlying system just once and with that one call transfers all of
64000 bytes( meaning, write() is not called 64000 times )?

If you had a buffer of 64K, no matter how small the pieces you wrote,
no physical I/O would happen until you called flush or close if the
total size were under 64K. If you wrote more than 64K, you would get
a physical write when you filled the first 64K.

Did you read my essay at http://mindprod.com/jgloss/buffer.html

If you did, please read it again and tell me where you got confused.
 
Roedy Green

So in theory, the above code would write those 64000 bytes to the system
in approximately the same amount of time as a buffered stream would,
assuming no other thread blocks this output stream?
I'm assuming this since:

Yes. However, there is still some overhead for each call to write to
copy the bytes to the buffer. It has to check if the buffer is full
etc.
 
Daniel Pitts

Yes. However, there is still some overhead for each call to write to
copy the bytes to the buffer. It has to check if the buffer is full
etc.

Actually, the disk is constantly spinning, so if you don't write in a
complete block, the disk may pass the position you want it to write to
before you write, so it would in effect be in the *worst* position for
the write.

Not to mention that typically disk IO happens in Sectors or Clusters,
which are usually at least 512 bytes long. Unless the OS itself does
some caching, writing one byte at a time is actually a read of 512
bytes, update of *that* buffer, and a write of 512 bytes. As you can
imagine, this is highly inefficient.

Something else to note is that your discussion so far has assumed
disk IO operations, but there are other forms of IO, including network
IO. Writing one byte at a time to a Socket stream can result in a lot
of overhead for the underlying protocols. TCP/IP adds a minimum of
40 bytes of headers per packet (20 for IPv4 plus 20 for TCP), not to
mention the Ethernet and OS overhead.
 
failure_to

hello

If you had a buffer of 64K, no matter how small the pieces you
wrote, no physical I/O would happen until you called flush or
close if the total size were under 64K. If you wrote more than
64K, you would get a physical write when you filled the first
64K.

I realize that!

Did you read my essay at http://mindprod.com/jgloss/buffer.html
If you did, please read it again and tell me where you got
confused.

I'm not sure how you got the impression that article got me
confused?

Yes. However, there is still some overhead for each call to write
to copy the bytes to the buffer. It has to check if the buffer
is full etc.

Are you talking about the code below or about buffered streams? I know
from your article that using buffers can cause some overhead due to
bytes being copied to the buffer, but from my understanding the fo
object doesn't buffer these bytes, but sends them directly to the
system ... so in theory ( well, my much flawed theory ) the code below
should write those bytes to the system faster than a buffered stream
would ( assuming the disk isn't constantly spinning :) ... which
apparently it is )

FileOutputStream fo = new FileOutputStream( "A.txt" );
fo.write(byte_1);
fo.write(byte_2);
..
..
..
fo.write(byte_64000);

Actually, the disk is constantly spinning, so if you don't write
in a complete block, the disk may pass the position you want it
to write to before you write, so it would in effect be in the
*worst* position for the write.

Not to mention that typically disk IO happens in Sectors or
Clusters, which are usually at least 512 bytes long. Unless the
OS itself does some caching, writing one byte at a time is
actually a read of 512 bytes, update of *that* buffer, and a
write of 512 bytes. As you can imagine, this is highly
inefficient.


but if the disk wasn't constantly spinning then

FileOutputStream fo = new FileOutputStream( "A.txt" );
fo.write(byte_1);
fo.write(byte_2);
..
..
..
fo.write(byte_64000);

would be just as efficient as if buffered stream flushed those 64000
bytes?
 
Lew

I'm not sure how you got the impression that article got me
confused?

Perhaps it was your assertion that 64K individual writes of one byte would
proceed faster than a single write of 64K bytes that gave that impression.

No.

64K one-byte writes will be *much* slower than one 64 KB write.

Are you talking about the code below or about buffered streams? I know
from your article that using buffers can cause some overhead due to
bytes being copied to the buffer, but from my understanding the fo
object doesn't buffer these bytes, but sends them directly to the
system ... so in theory ( well, my much flawed theory ) the code below
should write those bytes to the system faster than a buffered stream
would ( assuming the disk isn't constantly spinning :) ... which
apparently it is )

No.

Even assuming you're writing to a disk, your Java write operation is so far
removed from "spinning" that it isn't even remotely useful to think of it in
those terms.

You have Java flushing to a system buffer, which writes to a driver, which
loads data onto a disk-controller cache if there is one, which loads data onto
the disk's own cache, which loads data onto the disk platter(s). Assuming no
RAID, which adds some overhead of multiple-disk synchronization. OSes have
'fsync' and such modes that determine if writes go all the way to platter
before reporting completion, which may or may not be engaged.

Anyway, each individual write has to go through all those layers - 64000 times
for one byte apiece will always lose to 64KB through the gate in a single rush.
 
EJP

Roedy said:
All streams keep track internally of how many bytes have been read
both on disk and from the buffer.

No, they don't.

RandomAccessFiles also keep track of where you are in the file,
but explicitly with getFilePointer and seek.

No, they don't. The operating system does that.
 
