Return to the beginning of an InputStream


Lothar Kimmeringer

Anabolik said:
How to return to the beginning of an InputStream after reading this file?

Use a BufferedInputStream with mark and reset, or just reopen
the FileInputStream. If you need to access the file in a
"random" (non-sequential) way, you can use RandomAccessFile
http://java.sun.com/j2se/1.5.0/docs/api/java/io/RandomAccessFile.html
and call seek(0).
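
For the RandomAccessFile route, a minimal sketch (the file name "data.bin"
is just a placeholder, and the double read is only for illustration):

import java.io.IOException;
import java.io.RandomAccessFile;

public class SeekDemo {
    public static void main(String[] args) throws IOException {
        RandomAccessFile raf = new RandomAccessFile("data.bin", "r");
        try {
            int first = raf.read();   // read something from the start
            // ... read further into the file ...
            raf.seek(0);              // jump back to the beginning
            int again = raf.read();   // the same byte as 'first'
            System.out.println(first == again);
        } finally {
            raf.close();
        }
    }
}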


Regards, Lothar
--
Lothar Kimmeringer E-Mail: (e-mail address removed)
PGP-encrypted mails preferred (Key-ID: 0x8BC3CD81)

Always remember: The answer is forty-two, there can only be wrong
questions!
 

Lothar Kimmeringer

Arne said:
Lothar said:
Anabolik said:
How to return to the beginning of an InputStream after reading this file?

use a BufferedInputStream with mark and reset [...]

I would not rely on mark/reset.

The docs for the mark limit (the readlimit argument of mark()) do not
give a comfortable feeling.

You need to specify the maximum number of bytes you want to
be able to go back; that's the reason why I gave alternative
ways to solve that.
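
To make the limit concrete, a rough sketch (the file name and the
1024-byte limit are arbitrary):

import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class MarkResetDemo {
    public static void main(String[] args) throws IOException {
        InputStream in = new BufferedInputStream(new FileInputStream("data.bin"));
        try {
            in.mark(1024);        // reset() is only guaranteed to work while
                                  // we stay within 1024 bytes of the mark
            byte[] buf = new byte[512];
            in.read(buf);         // well within the limit
            in.reset();           // back at the marked position (the start)
            // Reading past the readlimit before calling reset() may
            // invalidate the mark and make reset() throw an IOException.
        } finally {
            in.close();
        }
    }
}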


Regards, Lothar
 

Daniel Pitts

Anabolik said:
How to return to the beginning of an InputStream after reading this file?

If it is a FileInputStream, you can re-open the file for reading;
FileInputStream itself does not support mark/reset.

If you don't know what kind of InputStream it is, you can check whether
markSupported() returns true. If it returns false, you can wrap the
InputStream in a BufferedInputStream, which does support mark/reset.
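
A sketch of that check (the helper name markable is made up):

import java.io.BufferedInputStream;
import java.io.InputStream;

public class Streams {
    // Returns a stream that is guaranteed to support mark/reset,
    // wrapping the original in a BufferedInputStream if necessary.
    static InputStream markable(InputStream in) {
        return in.markSupported() ? in : new BufferedInputStream(in);
    }
}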

Note that BufferedInputStream's mark() implementation will actually keep
the contents read after the mark in memory (up to the readlimit), so you
don't want to do that if you need to read a lot of large files concurrently.

Again, if you know it is a file, you can use RandomAccessFile instead of
InputStream.
 

markspace

Arne said:
Close and reopen the file.


Good idea. Another in the same vein: if the stream is a network stream
that can't be re-opened (easily), then make a temp file and copy the
stream to it. Then you can re-open the temp file for random access.

File.createTempFile() is, I think, the name of the method that creates a
temporary file.
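
A rough sketch of that approach, assuming a plain byte-copy loop (the
buffer size and name prefix are arbitrary):

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.RandomAccessFile;

public class SpoolToTempFile {
    // Copies the stream to a temp file and returns a RandomAccessFile
    // over it, so the caller can seek(0) as often as needed.
    static RandomAccessFile spool(InputStream in) throws IOException {
        File tmp = File.createTempFile("stream", ".tmp");
        tmp.deleteOnExit();                    // ask the JVM to remove it on normal exit
        OutputStream out = new FileOutputStream(tmp);
        try {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        } finally {
            out.close();
        }
        return new RandomAccessFile(tmp, "r");
    }
}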
 

Roedy Green

How to return to the beginning of an InputStream after reading this file?

Open another InputStream, e.g.:

// Read big-endian ( Java ) binary from a sequential file.

// C A V E A T S
// WARNING! Code to handle IOExceptions is not shown.
// WARNING! This code is intended for teaching. In practice you would telescope some of the steps.
// WARNING! Unsigned applets may not read the local hard disk.

// import java.io.*;

// O P E N
FileInputStream fis = new FileInputStream( "C:/temp/temp.in" );
DataInputStream dis = new DataInputStream( fis );

// R E A D
boolean q = dis.readBoolean();
byte b = dis.readByte();
byte[] ba = new byte[1024];
// -1 means eof.
// You don't necessarily get all you ask for in one read.
// You get what's immediately available.
int bytesRead = dis.read( ba, 0 /* offset in ba */, ba.length /* bytes to read */ );
char c = dis.readChar();
// There is no readChars method.
double d = dis.readDouble();
float f = dis.readFloat();
int j = dis.readInt();
long l = dis.readLong();
short s = dis.readShort();
// Do not use readUTF to read Unicode files.
// readUTF reads binary counted Strings created by writeUTF.
// The lead 16-bit byte count and UTF-8 encoding mean Strings must be <= 10,922 chars!!
// See http://mindprod.com/jgloss/utf.html#WRITEUTF for details.
// Use Reader.read with an explicit UTF-8 or UTF-16 encoding instead.
String u = dis.readUTF();
byte ub = (byte) dis.readUnsignedByte();
short us = (short) dis.readUnsignedShort();

// C L O S E
dis.close();

// R E O P E N
fis = new FileInputStream( "C:/temp/temp.in" );
dis = new DataInputStream( fis );
 

John B. Matthews

Anabolik said:
How to return to the beginning of an InputStream after reading this file?

Others have prudently suggested simply closing and re-opening the file,
but I tried the mark/reset approach just to see how it worked:

<http://sites.google.com/site/trashgod/edasm#DC>

The object of the exercise was to meaningfully format certain portions
of a legacy binary file. Using reset seemed a good way to explore,
absent _a_priori_ knowledge of the correct relationships.

The example uses "int value = in.read()" to advance through the data,
but reading the entire stream and using "int value = data[index++]"
would do as well. Still, I can see the value of being able to back-track
while parsing a lengthy, non-file stream.
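
For streams that fit in memory, reading everything up front also
side-steps the readlimit question, since ByteArrayInputStream supports
mark/reset over its whole content. A sketch, with the buffer handling
kept deliberately simple:

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class InMemoryCopy {
    static ByteArrayInputStream copyToMemory(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        // reset() with no prior mark() returns to the beginning.
        return new ByteArrayInputStream(buf.toByteArray());
    }
}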
 

Lothar Kimmeringer

markspace said:
Good idea. Another in the same vein: if the stream is a network stream
that can't be re-opened (easily), then make a temp file and copy the
stream to it. Then you can re-open the temp file for random access.

File.createTempFile() is, I think, the name of the method that creates a
temporary file.

It is, but the file is not a temporary file that gets deleted
automagically. You have to do that yourself, otherwise your
temporary folder gets flooded with files over time.


Regards, Lothar
 

Daniel Pitts

Lothar said:
It is, but the file is not a temporary file that gets deleted
automagically. You have to do that yourself, otherwise your
temporary folder gets flooded with files over time.


Regards, Lothar

Unless you call "file.deleteOnExit()", in which case it will get deleted
automagically (except on a JVM crash).
 

Lothar Kimmeringer

Daniel Pitts wrote:

[File.createTempFile()]
Unless you call "file.deleteOnExit()", in which case it will get deleted
automagically (except on a JVM crash).

.... and on long running server processes.


Regards, Lothar
 

Daniel Pitts

Lothar said:
Daniel Pitts wrote:

[File.createTempFile()]
Unless you call "file.deleteOnExit()", in which case it will get deleted
automagically (except on a JVM crash).

.... and on long running server processes.


Regards, Lothar

True, ideally you would have:

try { handleTempFile(file); } finally { file.delete(); }
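
Putting the two suggestions together, a sketch (handleTempFile is just a
placeholder for the real work):

import java.io.File;
import java.io.IOException;

public class TempFileUsage {
    static void withTempFile() throws IOException {
        File file = File.createTempFile("work", ".tmp");
        file.deleteOnExit();          // safety net for a normal JVM exit
        try {
            handleTempFile(file);     // placeholder for the real work
        } finally {
            file.delete();            // eager cleanup for long-running processes
        }
    }

    private static void handleTempFile(File file) {
        // ... use the file ...
    }
}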
 

Tom Anderson

Lothar said:
Daniel Pitts wrote:

[File.createTempFile()]
Unless you call "file.deleteOnExit()", in which case it will get deleted
automagically (except on a JVM crash).

.... and on long running server processes.

True, ideally you would have:
try { handleTempFile(file); } finally {file.delete();}

Although this still doesn't handle crashes. I think there is a trick you
can do on unix to have files deleted even when the process crashes -
something like create, open, then delete the directory entry, so that the
only reference keeping the file alive is from the open filehandle, which
will die when the process exits - but i don't know if there's a way to use
it from java. Or even that this is definitely correct.
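
For what it's worth, a sketch of how that might be attempted from Java,
assuming a POSIX filesystem where File.delete() simply unlinks the
directory entry (on Windows the delete will typically fail while the
stream is open):

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

public class UnlinkedTempFile {
    static FileInputStream openAndUnlink(File tmp) throws IOException {
        FileInputStream in = new FileInputStream(tmp);
        // On *nix this only removes the directory entry; the open handle
        // keeps the inode alive until the stream is closed or the process
        // dies, at which point the storage is reclaimed.
        tmp.delete();
        return in;
    }
}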

However, by default, createTempFile puts files in java.io.tmpdir, which on
unix machines will typically be /tmp. Files there are subject to deletion
at the whim of the OS, so to an extent, you can delegate the problem of
worrying about deleting files to that.

That said, i'm not sure what current unixen's policies towards /tmp are; i
believe linux will only delete things at reboot, not during normal
operation, which makes this less useful. I used a system (OSF/1?) at some
point that had a /scr, for scratch, which was deleted more aggressively,
which would be ideal for this.

tom
 

Martin Gregorie

Although this still doesn't handle crashes. I think there is a trick you
can do on unix to have files deleted even when the process crashes -
something like create, open, then delete the directory entry, so that
the only reference keeping the file alive is from the open filehandle,
which will die when the process exits - but i don't know if there's a
way to use it from java. Or even that this is definitely correct.

That's correct. It's the standard UNIX idiom for making sure that temporary
files don't outlive the process that created them, no matter how it dies.

It should work from Java since it's not language-dependent, though of
course it's not portable outside the *nix world.
However, by default, createTempFile puts files in java.io.tmpdir, which
on unix machines will typically be /tmp. Files there are subject to
deletion at the whim of the OS, so to an extent, you can delegate the
problem of worrying about deleting files to that.

You should attempt to delete them at some stage because there's no
guarantee that the OS will. It's merely a way of guaranteeing that the
tempfile has a unique name no matter how many copies of the process are
running.

A more useful approach would be to start the process(es) from a shell
script or control process whose first action is to delete all temporary
files it finds that are used by the processes it controls: this will be
portable provided the script/control process is portable: no reason it
shouldn't be written in Java or a portable scripting language like Groovy
or Python.
That said, i'm not sure what current unixen's policies towards /tmp are;
i believe linux will only delete things at reboot, not during normal
operation, which makes this less useful.

I'm not certain that temp files are necessarily deleted at boot, because
that does slow down crash recovery. Since a file in /tmp will survive
until it's closed, it's equally likely that there's a cron job that runs
'rm -rf /tmp/*' sometime after midnight each day. The real caveat is that
no program creating files in /tmp should expect them to be there after it
terminates, i.e. don't pass them to another program started after the
first ends.
 

Eric Sosman

Martin said:
That's correct. It's the standard UNIX idiom for making sure that temporary
files don't outlive the process that created them, no matter how it dies.

It should work from Java since it's not language-dependent, though of
course it's not portable outside the *nix world.

You should attempt to delete them at some stage because there's no
guarantee that the OS will. It's merely a way of guaranteeing that the
tempfile has a unique name no matter how many copies of the process are
running.

A more useful approach would be to start the process(es) from a shell
script or control process whose first action is to delete all temporary
files it finds that are used by the processes it controls: this will be
portable provided the script/control process is portable: no reason it
shouldn't be written in Java or a portable scripting language like Groovy
or Python.

I'm not certain that temp files are necessarily deleted at boot, because
that does slow down crash recovery. Since a file in /tmp will survive
until it's closed, it's equally likely that there's a cron job that runs
'rm -rf /tmp/*' sometime after midnight each day. The real caveat is that
no program creating files in /tmp should expect them to be there after it
terminates, i.e. don't pass them to another program started after the
first ends.

(Marginally topical) On Solaris, the Unix flavor I'm most
familiar with, /tmp is usually mounted on a tmpfs file system.
This is a memory-resident file system to the extent possible,
spilling over into swap space as needed. Nothing special needs
to happen at reboot to "clean out" tmpfs, no more than anything
special needs to happen to "clean out" swap files: The newly-booted
system just initializes its metadata to say "empty," and everything
from prior incarnations is gone.

Also, it's a *very* bad idea to purge /tmp blindly, even if
you're careful only to purge files that haven't been modified
in a while. I recall working with a server application that put
files in /tmp and mmap'ed them to share memory between its multiple
processes. Since simple paging I/O to and from a file opened a
week ago doesn't change the files' modification date, along came
the customer's /tmp-purging cron job and BLOOEY went the server ...
 

Martin Gregorie

Also, it's a *very* bad idea to purge /tmp blindly, even if
you're careful only to purge files that haven't been modified in a
while. I recall working with a server application that put files in
/tmp and mmap'ed them to share memory between its multiple processes.
Since simple paging I/O to and from a file opened a week ago doesn't
change the files' modification date, along came the customer's
/tmp-purging cron job and BLOOEY went the server ...

Good point.

When I've needed to do this on Unices I've used the Unix IPC library
functions to hand references to shared memory segments between programs.
But you can't do that in Java AFAIK.

The only place I've used mmap files was on Tandem (now HP) Guardian fault
tolerant systems where an mmapped file was the only possibility because
the sharing processes were almost guaranteed to be on different CPUs with
nothing in common except shared disk. If I used them on a *NIX system
they'd probably need to be persistent data and so would be sitting in
normal directories alongside named pipes.
 

Tom Anderson

(Marginally topical) On Solaris, the Unix flavor I'm most familiar
with, /tmp is usually mounted on a tmpfs file system. This is a
memory-resident file system to the extent possible, spilling over into
swap space as needed. Nothing special needs to happen at reboot to
"clean out" tmpfs, no more than anything special needs to happen to
"clean out" swap files: The newly- booted system just initializes its
metadata to say "empty," and everything from prior incarnations is gone.

Aha, interesting. Seems like a good scheme.

Also, it's a *very* bad idea to purge /tmp blindly, even if you're
careful only to purge files that haven't been modified in a while. I
recall working with a server application that put files in /tmp and
mmap'ed them to share memory between its multiple processes. Since
simple paging I/O to and from a file opened a week ago doesn't change
the files' modification date, along came the customer's /tmp-purging
cron job and BLOOEY went the server ...

Hang on, the files were open, right? So how could they be deleted? Or is
the point that the directory entries were deleted, so when new processes
were spawned, they couldn't open the file? And since when did writing to a
file via an mmap not change its modification time, anyway?

Either way, i'd suggest the bad idea here was putting critical long-lived
files in /tmp. Yes, they're temporary, but not that temporary!

tom
 

Martin Gregorie

Hang on, the files were open, right? So how could they be deleted?

Some non-*nix OSes (OS/400, VMS?) refuse to accept the delete() operation
if the file is open. Of course it's irrelevant for single-tasking OSen
(i.e. DOS) and can cause havoc where the 'OS' is a multi-tasking shell
sitting on a single-tasking kernel (Win 3.1 through ME fit this description).
Or is the point that the directory entries were deleted, so when new
processes were spawned, they couldn't open the file?

This is what I was muttering about. In the *nix family, which includes
VOS, AIX, Linux and FreeBSD:

- deleting a file removes the directory entry, reducing the link count
for the file (the link count is the number of directory entries
pointing at the file's inode). A directory entry only holds a name
and an inode reference: everything else (ownership, permissions,
timestamps, pointers to the data blocks) is in the inode.

- the inode and associated storage is only deleted when the link count is
zero and no processes have the file open.
And since when did writing to
a file via an mmap not change its modification time, anyway?

It works OK for Linux and most *nixen; I don't know about others.

Either way, i'd suggest the bad idea here was putting critical
long-lived files in /tmp. Yes, they're temporary, but not that
temporary!

Exactly so.

A good precautionary design would clear out unwanted files as its first
action as well as deleting surplus files as its last action.

It's probably also a good idea to store the activity status (starting/
running/clean-up/done) in a permanent file so the process knows whether
it's doing a normal start or a restart and, depending on what the program
is doing, it may also be useful to build a list of files to be deleted,
backed up, etc. This type of information makes restarts *much* easier if
you're dealing with high-volume, long-running applications and is
probably essential if any part of it involves parallel tasks.
 

Eric Sosman

(Marginally topical) [...]
Also, it's a *very* bad idea to purge /tmp blindly, even if you're
careful only to purge files that haven't been modified in a while. I
recall working with a server application that put files in /tmp and
mmap'ed them to share memory between its multiple processes. Since
simple paging I/O to and from a file opened a week ago doesn't change
the files' modification date, along came the customer's /tmp-purging
cron job and BLOOEY went the server ...

Hang on, the files were open, right? So how could they be deleted? Or is
the point that the directory entries were deleted, so when new processes
were spawned, they couldn't open the file? And since when did writing to
a file via an mmap not change its modification time, anyway?

(The topicality margin gets even thinner)

You're right: An open file can't be deleted. However, its
directory entry is removed. Then, when the application spawns a
new process and that new process tries to share memory with the
others by opening and mmap'ing the now-unfindable file, BLOOEY.
(When the customer first reported trouble, I immediately asked
whether there was a cron job or some such that periodically purged
old files from /tmp. Customer asserted -- vehemently and a bit
angrily -- that OF COURSE there wasn't. So we cobbled together
some DTrace to monitor file deletions in /tmp, and caught the
non-existent cron job red-handed ...)

As for file modification times, I confess an incomplete grasp
of exactly which operations do and do not update them. However,
just poking a new value into a page that's mmap'ed from a file is
not enough to update the time stamp. Can you imagine the overhead
if every memory write trapped to the kernel to update the time?

Either way, i'd suggest the bad idea here was putting critical
long-lived files in /tmp. Yes, they're temporary, but not that temporary!

It wasn't my choice. It wasn't even my company's choice.
The third party who wrote the application chose to do things that
way, and even went so far as to include "do_not_delete" as part
of the files' names.
 
