Re: which OutputStreams are buffered?

Discussion in 'Java' started by Tom Anderson, May 16, 2008.

  1. Tom Anderson

    Tom Anderson Guest

    On Fri, 16 May 2008, Rex Mottram wrote:

    > There is a java.io.BufferedOutputStream whose purpose is well
    > documented, basically as a good thing to wrap around an unbuffered
    > OutputStream (at least if you want buffering). However, and surprisingly
    > to me, a number of the other OutputStreams in java.io do not document
    > whether they are buffered, and thus it's not clear to me whether I
    > should wrap them or not.


    I believe that BufferedOutputStream is the only one that does buffering
    *in java* (more or less ...), but others may involve buffers out in native
    code or the OS. FileOutputStream, for instance - i believe every write
    turns into a call to the OS or C library's write routine, but that may not
    immediately put bytes onto a platter. The stream you get from a Socket is
    another - all writes go to the TCP implementation, but that won't
    necessarily send them immediately.

    The point of buffering on the java side is that it saves you native calls
    - you make one call when you have a kilobyte of data to send, rather than
    one every time you have a morsel of data to write. This can be a big
    performance win. Basically, always wrap.
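    A minimal sketch of the call-count argument, assuming a hypothetical
    CountingOutputStream that stands in for a stream where every write()
    would cost a native call. It counts how many write() calls reach it with
    and without a BufferedOutputStream in front; the 8192-byte buffer size is
    an arbitrary choice for the demo.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a stream where each write() would be a native call.
class CountingOutputStream extends FilterOutputStream {
    int calls = 0;

    CountingOutputStream(OutputStream out) {
        super(out);
    }

    @Override
    public void write(int b) throws IOException {
        calls++;
        out.write(b);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        calls++;
        out.write(b, off, len); // pass the whole array through in one call
    }
}

class BufferingDemo {
    // Writes n single bytes and reports how many write() calls reached the counter.
    static int callsFor(int n, boolean buffered) {
        CountingOutputStream counter = new CountingOutputStream(new ByteArrayOutputStream());
        try (OutputStream out = buffered ? new BufferedOutputStream(counter, 8192) : counter) {
            for (int i = 0; i < n; i++) {
                out.write(i & 0xFF);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counter.calls;
    }

    public static void main(String[] args) {
        System.out.println("unbuffered: " + callsFor(10_000, false));
        System.out.println("buffered:   " + callsFor(10_000, true));
    }
}
```

    With 10,000 single-byte writes, the bare stream sees one call per byte,
    while the buffered version should see only a couple (one per full buffer,
    plus the final flush on close).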

    You still have to worry about the native buffering for correctness, though
    - you can't rely on data being written to a file until you've flushed the
    FileOutputStream.
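    A small illustration of why the flush matters, assuming a
    ByteArrayOutputStream as a stand-in for the file. Bytes written to a
    BufferedOutputStream sit in its Java-side array until flush() or close().
    As far as I can tell, FileOutputStream in OpenJDK does not override
    flush() (OutputStream.flush() is documented to do nothing), and each
    write() on a bare FileOutputStream already goes straight to the OS, so
    the flush that actually moves bytes is the one on the buffered wrapper.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class FlushDemo {
    // Returns {bytes visible in the sink before flush(), bytes visible after}.
    static int[] visibleBeforeAndAfterFlush(String text) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream(); // stand-in for the file
        try (BufferedOutputStream out = new BufferedOutputStream(sink, 8192)) {
            out.write(text.getBytes(StandardCharsets.US_ASCII));
            int before = sink.size(); // still in the Java-side buffer
            out.flush();              // now handed to the underlying stream
            int after = sink.size();
            return new int[] {before, after};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] sizes = visibleBeforeAndAfterFlush("hello");
        System.out.println("before flush: " + sizes[0] + ", after flush: " + sizes[1]);
    }
}
```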

    Now, that "more or less" above is about the various streams which do
    transformations on data passing through them, and which have to do some
    buffering to do that. That means GZIPOutputStream, DeflaterOutputStream,
    CipherOutputStream, and possibly others. These require special attention
    to wring all their bytes out of them. However, i think this is pretty well
    documented in each case.
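    A sketch of what "wringing out" the last bytes means for
    GZIPOutputStream, using in-memory streams (the class and method names are
    illustrative). Until finish() runs (close() calls it for you), the
    deflater may hold compressed data back, and the CRC-32/length trailer has
    not been written at all, so the output is not a complete gzip stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipFinishDemo {
    // Returns {compressed bytes emitted before finish(), bytes emitted after finish()}.
    static int[] sizesAroundFinish(String text) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try {
            GZIPOutputStream gz = new GZIPOutputStream(sink);
            gz.write(text.getBytes(StandardCharsets.UTF_8));
            int before = sink.size(); // header, plus whatever the deflater chose to release
            gz.finish();              // drain the deflater and append the trailer
            int after = sink.size();
            gz.close();
            return new int[] {before, after};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Compress, then decompress; try-with-resources closes (and so finishes) the stream.
    static String roundTrip(String text) {
        try {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            try (GZIPOutputStream gz = new GZIPOutputStream(sink)) {
                gz.write(text.getBytes(StandardCharsets.UTF_8));
            }
            try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(sink.toByteArray()))) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```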

    tom

    --
    It's the 21st century, man - we rue _minutes_. -- Benjamin Rosenbaum
     
    Tom Anderson, May 16, 2008
    #1

  2. Neil Coffey

    Neil Coffey Guest

    Tom Anderson wrote:

    > I believe that BufferedOutputStream is the only one that does buffering
    > *in java* (more or less ...), but others may involve buffers out in
    > native code or the OS.


    As far as your Java application is concerned, I think you
    should generally treat "secret buffering at the OS level" as
    "no buffering" and should wrap in a BufferedInput/OutputStream--
    otherwise you have the overhead of the native call on every
    single read/write.
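    The same argument applies on the read side. A sketch assuming a
    hypothetical CountingInputStream in place of a stream whose every read()
    would be a native call: reading 10,000 bytes one at a time costs 10,001
    calls unbuffered (the last one returns -1), but only a handful through a
    BufferedInputStream, which refills its array with bulk reads.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a stream where each read() would be a native call.
class CountingInputStream extends FilterInputStream {
    int calls = 0;

    CountingInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        calls++;
        return in.read();
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        calls++;
        return in.read(b, off, len);
    }
}

class ReadBufferingDemo {
    // Reads n bytes one at a time and reports how many calls reached the counter.
    static int callsFor(int n, boolean buffered) {
        CountingInputStream counter = new CountingInputStream(new ByteArrayInputStream(new byte[n]));
        try (InputStream in = buffered ? new BufferedInputStream(counter, 8192) : counter) {
            while (in.read() != -1) {
                // discard the byte; only the call count matters here
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counter.calls;
    }
}
```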

    I believe you don't need extra buffering in the streams given to you
    by some Servlet implementations (they do their own Java-side
    buffering to handle the HTTP protocol), though I'd be interested
    if anyone has further insight on this.

    > Now, that "more or less" above is about the various streams which do
    > transformations on data passing through them, and which have to do some
    > buffering to do that. That means GZIPOutputStream, DeflaterOutputStream,
    > CipherOutputStream, and possibly others.


    For similar reasons to above, it's generally best to add a Java buffer
    unless you have strong grounds for not doing so. These compression
    stream classes may "naturally" work on a buffer, but if the buffer
    is held natively, then it's a native call to fetch each individual byte
    unless you buffer in Java.
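    One way to check the point about compression streams is to count calls
    into a GZIPOutputStream. In the OpenJDK sources I have seen,
    DeflaterOutputStream.write(int) wraps the byte in a fresh one-element
    array and calls write(byte[], int, int), which drives the native zlib
    deflater, so single-byte writes are expensive. The hypothetical
    CountingGzip subclass below counts those bulk calls. The design choice
    that matters is the wrapping order: buffer on the outside, compressor on
    the inside.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.zip.GZIPOutputStream;

// Hypothetical subclass that counts bulk writes into the compressor.
class CountingGzip extends GZIPOutputStream {
    int calls = 0;

    CountingGzip(OutputStream out) throws IOException {
        super(out);
    }

    @Override
    public synchronized void write(byte[] b, int off, int len) throws IOException {
        calls++;
        super.write(b, off, len);
    }
}

class GzipBufferingDemo {
    // Writes n single bytes and reports how many bulk writes the compressor received.
    static int compressorCallsFor(int n, boolean buffered) {
        try {
            CountingGzip gz = new CountingGzip(new ByteArrayOutputStream());
            // The buffer goes on the outside, so the compressor sees large chunks.
            try (OutputStream out = buffered ? new BufferedOutputStream(gz, 8192) : gz) {
                for (int i = 0; i < n; i++) {
                    out.write('a');
                }
            }
            return gz.calls;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```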

    If memory serves correctly, it was the flavour of InputStream you get
    from ZipFile.getInputStream() whose single-byte read() method creates
    a new one-element byte array on each call and then calls the
    multi-byte version...

    Neil
     
    Neil Coffey, May 18, 2008
    #2

  3. Tom Anderson

    Tom Anderson Guest

    On Sat, 17 May 2008, Neil Coffey wrote:

    > Tom Anderson wrote:
    >
    >> I believe that BufferedOutputStream is the only one that does buffering *in
    >> java* (more or less ...), but others may involve buffers out in native code
    >> or the OS.

    >
    > As far as your Java application is concerned, I think you should
    > generally treat "secret buffering at the OS level" as "no buffering" and
    > should wrap in a BufferedInput/OutputStream-- otherwise you have the
    > overhead of the native call on every single read/write.


    Yes, that's exactly what i said in my post:

    "The point of buffering on the java side is that it saves you native calls
    - you make one call when you have a kilobyte of data to send, rather than
    one every time you have a morsel of data to write. This can be a big
    performance win. Basically, always wrap."

    Except that you *do* need to be aware of the secret buffering for
    correctness reasons:

    "you can't rely on data being written to a file until you've flushed the
    FileOutputStream."

    So, for things like FileOutputStream, you have to treat them as both
    unbuffered (by wrapping them in a buffered stream) and buffered (by
    remembering to flush) at the same time!
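    A sketch of the "both at once" discipline for a real file (a temp file;
    the names are illustrative): the FileOutputStream is wrapped because it
    is unbuffered at the Java level, and the wrapper is flushed because it is
    buffered. It records the file size before and after the flush.

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class WrapAndFlushDemo {
    // Returns {file size before flush(), file size after flush()}.
    static long[] fileSizesAroundFlush(String text) {
        try {
            Path file = Files.createTempFile("wrap", ".txt");
            try {
                // Unbuffered at the Java level, so wrap it...
                try (OutputStream out = new BufferedOutputStream(new FileOutputStream(file.toFile()))) {
                    out.write(text.getBytes(StandardCharsets.UTF_8));
                    long before = Files.size(file); // bytes still in the Java buffer
                    // ...and buffered once wrapped, so flush before relying on the file.
                    out.flush();
                    long after = Files.size(file);
                    return new long[] {before, after};
                } // close() would also flush before closing the FileOutputStream
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```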

    > I believe you don't need extra buffering in the streams given to you by
    > some Servlet implementations (they do their own Java-side buffering to
    > handle the HTTP protocol), though I'd be interested if anyone has
    > further insight on this.


    Good point.

    >> Now, that "more or less" above is about the various streams which do
    >> transformations on data passing through them, and which have to do some
    >> buffering to do that. That means GZIPOutputStream, DeflaterOutputStream,
    >> CipherOutputStream, and possibly others.

    >
    > For similar reasons to above, it's generally best to add a Java buffer
    > unless you have strong grounds for not doing so. These compression
    > stream classes may "naturally" work on a buffer, but if the buffer is
    > held natively, then it's a native call to fetch each individual byte
    > unless you buffer in Java.


    True.

    > If memory serves correctly, it was the flavour of InputStream you get
    > from ZipFile.getInputStream() whose single-byte read() method creates a
    > new one-element byte array on each call and then calls the multi-byte
    > version...


    Urgh!

    tom

    --
    1 p4WN 3v3Ry+h1n G!!!
     
    Tom Anderson, May 18, 2008
    #3
  4. Roger Lindsjö

    Tom Anderson wrote:
    > On Sat, 17 May 2008, Neil Coffey wrote:


    > Except that you *do* need to be aware of the secret buffering for
    > correctness reasons:
    >
    > "you can't rely on data being written to a file until you've flushed the
    > FileOutputStream."
    >
    > So, for things like FileOutputStream, you have to treat them as both
    > unbuffered (by wrapping them in a buffered stream) and buffered (by
    > remembering to flush) at the same time!


    Actually, you are still not sure if the data has been written to the
    file, only that the data has been passed from the Java side to the OS
    side. To ensure the data has been committed to disk you should call
    sync() on the FileDescriptor (although this is not necessary for most
    applications).
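    A minimal sketch of this suggestion, assuming a temp file: flush() moves
    bytes from the Java buffer to the OS, and FileDescriptor.sync() then asks
    the OS to push them to the device. sync() throws SyncFailedException (an
    IOException) if the OS cannot give that assurance. The code keeps a
    reference to the FileOutputStream because the descriptor comes from its
    getFD().

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

class SyncDemo {
    // Writes data, flushes the Java buffer, then asks the OS to sync; returns the file size.
    static long writeAndSync(byte[] data) {
        try {
            Path file = Files.createTempFile("sync", ".dat");
            try {
                try (FileOutputStream fos = new FileOutputStream(file.toFile());
                     BufferedOutputStream out = new BufferedOutputStream(fos)) {
                    out.write(data);
                    out.flush();        // Java buffer -> OS
                    fos.getFD().sync(); // OS buffers -> device, as far as the OS can promise
                }
                return Files.size(file);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```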

    --
    Roger Lindsjö
     
    Roger Lindsjö, May 20, 2008
    #4
  5. Tom Anderson

    Tom Anderson Guest

    On Tue, 20 May 2008, Roger Lindsjö wrote:

    > Tom Anderson wrote:
    >> On Sat, 17 May 2008, Neil Coffey wrote:

    >
    >> Except that you *do* need to be aware of the secret buffering for
    >> correctness reasons:
    >>
    >> "you can't rely on data being written to a file until you've flushed the
    >> FileOutputStream."
    >>
    >> So, for things like FileOutputStream, you have to treat them as both
    >> unbuffered (by wrapping them in a buffered stream) and buffered (by
    >> remembering to flush) at the same time!

    >
    > Actually, you are still not sure if the data has been written to the
    > file, only that the data has been passed from the Java side to the OS
    > side. To ensure the data has been committed to disk you should call
    > sync() on the FileDescriptor (although this is not necessary for most
    > applications).


    My impression was that this was not the case - that
    FileOutputStream.flush() invokes the C library or OS's flush mechanism.

    Ah, no, OutputStream.flush:

    "If the intended destination of this stream is an abstraction provided by
    the underlying operating system, for example a file, then flushing the
    stream guarantees only that bytes previously written to the stream are
    passed to the operating system for writing; it does not guarantee that
    they are actually written to a physical device such as a disk drive."

    How unhelpful.

    tom

    --
    there is never a wrong time to have your bullets passing further into
    someone's face -- D
     
    Tom Anderson, May 20, 2008
    #5
  6. Arved Sandstrom

    "Neil Coffey" <> wrote in message
    news:g0od2l$vta$...
    > Tom Anderson wrote:
    >
    >> I believe that BufferedOutputStream is the only one that does buffering
    >> *in java* (more or less ...), but others may involve buffers out in
    >> native code or the OS.

    >
    > As far as your Java application is concerned, I think you
    > should generally treat "secret buffering at the OS level" as
    > "no buffering" and should wrap in a BufferedInput/OutputStream--
    > otherwise you have the overhead of the native call on every
    > single read/write.
    >
    > I believe you don't need extra buffering in the streams given to you
    > by some Servlet implementations (they do their own Java-side
    > buffering to handle the HTTP protocol), though I'd be interested
    > if anyone has further insight on this.


    You ought not to need extra buffering, since response buffering has been
    part of the Servlet API since version 2.2. This doesn't so much "handle"
    the HTTP protocol as just make it easier to work with, especially as
    regards error handling.

    Working in cooperation with that would be buffering to support chunked
    encoding, which would be directly "handling" the HTTP protocol. Wikipedia
    (http://en.wikipedia.org/wiki/HTTP#Persistent_connections) rather
    confusingly says that chunking allows data on persistent connections to be
    streamed rather than buffered, but of course the mechanism is still going to
    be doing buffering.
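    To make the point about chunking concrete: a chunked body is a series of
    chunks, each a hexadecimal length, CRLF, that many bytes, CRLF, ended by
    a zero-length chunk (HTTP/1.1, RFC 7230 section 4.1). A sender still has
    to collect bytes into chunks, so there is a buffer in the mechanism. The
    ChunkedOutputStream below is a hypothetical, minimal sketch (no chunk
    extensions, no trailers), not the code of any servlet container.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical minimal chunked-transfer-coding writer (no extensions, no trailers).
class ChunkedOutputStream extends OutputStream {
    private static final byte[] CRLF = {'\r', '\n'};
    private final OutputStream out;
    private final byte[] buf; // the buffering the mechanism cannot avoid
    private int count = 0;
    private boolean closed = false;

    ChunkedOutputStream(OutputStream out, int chunkSize) {
        this.out = out;
        this.buf = new byte[chunkSize];
    }

    @Override
    public void write(int b) throws IOException {
        if (count == buf.length) {
            writeChunk();
        }
        buf[count++] = (byte) b;
    }

    // Emits "<hex length>\r\n<data>\r\n" for whatever is buffered.
    private void writeChunk() throws IOException {
        if (count == 0) {
            return; // a zero-length chunk would mean "end of body"
        }
        out.write((Integer.toHexString(count) + "\r\n").getBytes(StandardCharsets.US_ASCII));
        out.write(buf, 0, count);
        out.write(CRLF);
        count = 0;
    }

    @Override
    public void flush() throws IOException {
        writeChunk();
        out.flush();
    }

    // Writes the last chunk but leaves the underlying stream (the connection) open.
    @Override
    public void close() throws IOException {
        if (closed) {
            return;
        }
        closed = true;
        writeChunk();
        out.write("0\r\n\r\n".getBytes(StandardCharsets.US_ASCII));
        out.flush();
    }

    static String encode(String body, int chunkSize) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (ChunkedOutputStream chunked = new ChunkedOutputStream(sink, chunkSize)) {
            chunked.write(body.getBytes(StandardCharsets.US_ASCII));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(sink.toByteArray(), StandardCharsets.US_ASCII);
    }
}
```

    close() deliberately leaves the underlying stream open, since on a
    persistent connection the socket outlives the response body.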

    AHS
     
    Arved Sandstrom, May 20, 2008
    #6
  7. Tom Anderson

    Tom Anderson Guest

    On Tue, 20 May 2008, Kenneth P. Turvey wrote:

    > On Tue, 20 May 2008 12:49:54 +0100, Tom Anderson wrote:
    >
    >> "If the intended destination of this stream is an abstraction provided
    >> by the underlying operating system, for example a file, then flushing
    >> the stream guarantees only that bytes previously written to the stream
    >> are passed to the operating system for writing; it does not guarantee
    >> that they are actually written to a physical device such as a disk
    >> drive."
    >>
    >> How unhelpful.

    >
    > But to be expected. Many operating systems provide no way to guarantee
    > that a given write has made it all the way to the disk.


    Really? That's genuinely shocking. Which are the culprits?

    (apart from the versions of unix you mention below)

    > Even under some versions of Unix, sync does not guarantee this.


    Double wow. Could you expand on that?

    tom

    --
    We got our own sense of propaganda. We call it truth. -- Rex Steele,
    Nazi Smasher
     
    Tom Anderson, May 21, 2008
    #7
  8. Tom Anderson

    Tom Anderson Guest

    On Wed, 20 May 2008, Kenneth P. Turvey wrote:

    > On Wed, 21 May 2008 00:36:47 +0100, Tom Anderson wrote:
    >
    >>> Even under some versions of Unix, sync does not guarantee this.

    >>
    >> Double wow. Could you expand on that?

    >
    > Honestly, I can't. I've run into it before and I cataloged it in my head
    > with some strange behavior in AIX having to do with signals and
    > security. It may have been another AIX quirk, but I don't recall.
    >
    > It may also have been only for non-root users. I don't remember the
    > details.


    Fair enough. I'll keep it in mind though!

    Oh christ - i just looked up what the Open Group have to say about it [1],
    and according to IEEE Std 1003.1, 2004 Edition, aka POSIX:

    "The sync() function shall cause all information in memory that updates
    file systems to be scheduled for writing out to all file systems.

    "The writing, although scheduled, is not necessarily complete upon return
    from sync()."

    Although that's the all-files sync, and not the just-this-file fsync,
    which says:

    "The fsync() function shall request that all data for the open file
    descriptor named by fildes is to be transferred to the storage device
    associated with the file described by fildes. The nature of the transfer
    is implementation-defined. The fsync() function shall not return until the
    system has completed that action or until an error is detected."

    Which sounds a bit sketchy, but basically what we want. But then it comes
    back with:

    "If _POSIX_SYNCHRONIZED_IO is not defined, the wording relies heavily on
    the conformance document to tell the user what can be expected from the
    system. It is explicitly intended that a null implementation is
    permitted."

    Great!
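    On the Java side, the nearest thing to fsync() on a single file is
    FileChannel.force(boolean), and it comes with similar fine print: its
    documentation only promises that changes reach the device when the file
    is on a local storage device. A minimal sketch with a temp file;
    force(true) also asks for metadata to be written, while force(false)
    need only write the content.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class ForceDemo {
    // Writes text through a FileChannel, forces it to storage, and returns the file size.
    static long writeAndForce(String text) {
        try {
            Path file = Files.createTempFile("force", ".dat");
            try {
                try (FileChannel ch = FileChannel.open(file, StandardOpenOption.WRITE)) {
                    ByteBuffer bb = ByteBuffer.wrap(text.getBytes(StandardCharsets.UTF_8));
                    while (bb.hasRemaining()) {
                        ch.write(bb); // a write may be partial, so loop
                    }
                    ch.force(true); // content and metadata
                }
                return Files.size(file);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```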

    tom

    [1] http://www.opengroup.org/onlinepubs/000095399/functions/sync.html

    --
    see im down wid yo sci fi crew
     
    Tom Anderson, May 21, 2008
    #8
  9. Arne Vajhøj

    Arne Vajhøj Guest

    Tom Anderson wrote:
    > Oh christ - i just looked up what the Open Group have to say about it
    > [1], and according to IEEE Std 1003.1, 2004 Edition, aka POSIX:
    >
    > "The sync() function shall cause all information in memory that updates
    > file systems to be scheduled for writing out to all file systems.
    >
    > "The writing, although scheduled, is not necessarily complete upon
    > return from sync()."
    >
    > Although that's the all-files sync, and not the just-this-file fsync,
    > which says:
    >
    > "The fsync() function shall request that all data for the open file
    > descriptor named by fildes is to be transferred to the storage device
    > associated with the file described by fildes. The nature of the transfer
    > is implementation-defined. The fsync() function shall not return until
    > the system has completed that action or until an error is detected."
    >
    > Which sounds a bit sketchy, but basically what we want. But then it
    > comes back with:
    >
    > "If _POSIX_SYNCHRONIZED_IO is not defined, the wording relies heavily on
    > the conformance document to tell the user what can be expected from the
    > system. It is explicitly intended that a null implementation is permitted."


    At that level there are not many guarantees for anything.

    I guess one of the reasons is that it can be very difficult to
    implement an API that makes it 100% sure the data is on the rotating
    platters: there are caches in RAID controllers and disk drives, plus
    file servers, NAS, SAN, etc.

    Arne
     
    Arne Vajhøj, May 21, 2008
    #9
  10. Tom Anderson

    Tom Anderson Guest

    On Tue, 20 May 2008, Lew wrote:

    > Arne Vajhøj wrote:
    >> At that level there are not many guarantees for anything.
    >>
    >> I guess one of the reasons is that it can be very difficult to
    >> implement an API that makes it 100% sure the data is on the rotating
    >> platters: there are caches in RAID controllers and disk drives, plus
    >> file servers, NAS, SAN, etc.

    >
    > From what I understand one can configure certain file systems to be truthful
    > about their [f]sync activity. The program may not be able to count on that,
    > but it has to trust the file system to do its part in accordance with system
    > goals.
    >
    > The key to safe writes isn't so much how quickly they happen, but that
    > they don't report completion until they're actually written.
    > Synchronous writes come back certain of their outcome.


    Exactly. And that's such a fundamental principle of storage that i'd be
    surprised if there was any component in the chain that didn't support it -
    they may all have caches, but that just means they all also have to
    provide a flush mechanism.
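    Java exposes synchronous writes through StandardOpenOption.DSYNC
    (content) and StandardOpenOption.SYNC (content and metadata). A sketch,
    assuming a temp file: with DSYNC, each write to the channel is meant to
    return only after the content has reached the storage device, subject to
    the same caveats about caches discussed above. The BufferedOutputStream
    in front still matters: it turns many small writes into a few
    synchronous ones.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class DsyncDemo {
    // Writes text with DSYNC semantics and reads it back.
    static String writeSynchronously(String text) {
        try {
            Path file = Files.createTempFile("dsync", ".txt");
            try {
                try (FileChannel ch = FileChannel.open(file, StandardOpenOption.WRITE,
                         StandardOpenOption.TRUNCATE_EXISTING, StandardOpenOption.DSYNC);
                     OutputStream out = new BufferedOutputStream(Channels.newOutputStream(ch))) {
                    // Each buffer flush becomes one synchronous channel write.
                    out.write(text.getBytes(StandardCharsets.UTF_8));
                }
                return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```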

    An interesting case would be a super-reliable data shop, where all data
    written to disk gets backed up on tape. Would flush() wait for the backups
    to be made? :)

    tom

    --
    I need a proper outlet for my tendency towards analytical thought. --
    Geneva Melzack
     
    Tom Anderson, May 21, 2008
    #10