Multi-threaded access to a file on disk

Chris

We have a web app that needs to access a very large file on disk. This app
will have a large number of simultaneous users. The file is too big to fit
entirely in memory.

I'd really like for different threads to be able to read different parts of
the file without synchronizing access. Unfortunately, a shared RandomAccessFile
isn't safe for this, because a read takes two separate calls: seek() and then read().

At the moment we can't use NIO and ByteBuffers because we have to support
JDK 1.3.
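For comparison, this is roughly what NIO would offer on JDK 1.4 and later, which is why it is tempting here. FileChannel.read(ByteBuffer, long) reads at an absolute position without changing the channel's own position, and FileChannel is documented as safe for use by multiple concurrent threads. This is a minimal sketch only: the PositionalNioRead class and its readAt() method are hypothetical names, and none of it compiles on JDK 1.3.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

// Hypothetical helper (JDK 1.4+ only): positional reads that never move a shared file pointer.
class PositionalNioRead {
    private final FileChannel channel;

    PositionalNioRead(String path) throws IOException {
        // "r" opens the file read-only; getChannel() exists since JDK 1.4
        this.channel = new RandomAccessFile(path, "r").getChannel();
    }

    // Reads up to len bytes starting at absolute offset pos.
    // FileChannel.read(ByteBuffer, long) does not change the channel's position,
    // so concurrent callers need no seek()/read() lock of their own.
    // Returns the number of bytes read, or -1 if pos is at or past end of file.
    int readAt(long pos, byte[] buf, int off, int len) throws IOException {
        ByteBuffer dst = ByteBuffer.wrap(buf, off, len);
        int total = 0;
        while (dst.hasRemaining()) {
            int n = channel.read(dst, pos + total);
            if (n < 0) {
                break; // end of file
            }
            total += n;
        }
        return (total == 0 && len > 0) ? -1 : total;
    }

    // Closing the channel also closes the underlying RandomAccessFile.
    void close() throws IOException {
        channel.close();
    }
}
```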

Does anyone know how to solve this problem?
 
Steve W. Jackson

You don't mention writing, only reading. Are you actually getting
failures in your current attempts to read from multiple threads? I
can't see any reason why it should be hampered unless Java or the
underlying filesystem are somehow preventing it.

= Steve =
 
John C. Bollinger

If read-only access to the file is sufficient, then you might try
giving each thread its own RandomAccessFile, all associated with the
same physical file. If you try this, be sure to open all of them in
read-only ("r") mode -- partly for safety's sake, but more because
there's a better chance of it working that way.
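A minimal sketch of the one-RandomAccessFile-per-thread idea, assuming read-only access. It keeps each thread's handle in a ThreadLocal (available since JDK 1.2) and avoids generics, so it should also compile on JDK 1.3. The PerThreadFileReader class and its method names are hypothetical.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical helper: each thread lazily opens its own read-only RandomAccessFile
// on the same path, so one thread's seek() can never disturb another thread's read().
// Raw ThreadLocal (no generics) so the code also compiles on JDK 1.3.
class PerThreadFileReader {
    private final String path;
    private final ThreadLocal handles = new ThreadLocal();

    PerThreadFileReader(String path) {
        this.path = path;
    }

    // Returns this thread's private handle, opening it on first use.
    private RandomAccessFile handle() throws IOException {
        RandomAccessFile raf = (RandomAccessFile) handles.get();
        if (raf == null) {
            raf = new RandomAccessFile(path, "r"); // read-only mode
            handles.set(raf);
        }
        return raf;
    }

    // seek() and readFully() run back to back on this thread's own handle,
    // so no lock is needed. Throws EOFException if fewer than len bytes remain.
    void readAt(long pos, byte[] buf, int off, int len) throws IOException {
        RandomAccessFile raf = handle();
        raf.seek(pos);
        raf.readFully(buf, off, len);
    }

    // Call from a thread when it is finished, to release its file handle.
    void closeForCurrentThread() throws IOException {
        RandomAccessFile raf = (RandomAccessFile) handles.get();
        if (raf != null) {
            handles.set(null);
            raf.close();
        }
    }
}
```

Because servlet containers usually reuse a pool of worker threads, each handle would stay open for the life of its thread; with many threads, keep an eye on the operating system's limit on open file descriptors.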

If you need to both read and write to the file then you can write a
wrapper around RandomAccessFile that provides appropriate
synchronization, and share an instance of the wrapper among your
threads. If you go this route then I suggest you try to design the
wrapper in such a way that its external interface makes the fewest
possible assumptions about the internal implementation (which is good
general advice anyway). In this case you may want the flexibility to
get an initial implementation written quickly, while being able to drop
in a more sophisticated, better-performing implementation later.
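A sketch of such a wrapper, assuming callers only ever need absolute-position reads and writes. The PositionalFile interface and SynchronizedPositionalFile class are hypothetical names; this first, simple implementation makes every seek-plus-transfer pair a single synchronized method on one shared RandomAccessFile.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical interface: callers see only absolute-position operations and
// never a file pointer, so the locking strategy behind it can change later.
interface PositionalFile {
    void readFully(long pos, byte[] buf, int off, int len) throws IOException;
    void write(long pos, byte[] buf, int off, int len) throws IOException;
    long length() throws IOException;
    void close() throws IOException;
}

// First, simple implementation: one shared RandomAccessFile, with each
// seek()+read()/write() pair inside one synchronized method, so no other
// thread can move the file pointer in between the two calls.
class SynchronizedPositionalFile implements PositionalFile {
    private final RandomAccessFile raf;

    SynchronizedPositionalFile(String path) throws IOException {
        this.raf = new RandomAccessFile(path, "rw");
    }

    public synchronized void readFully(long pos, byte[] buf, int off, int len) throws IOException {
        raf.seek(pos);
        raf.readFully(buf, off, len);
    }

    public synchronized void write(long pos, byte[] buf, int off, int len) throws IOException {
        raf.seek(pos);
        raf.write(buf, off, len);
    }

    public synchronized long length() throws IOException {
        return raf.length();
    }

    public synchronized void close() throws IOException {
        raf.close();
    }
}
```

Because callers never see the file pointer, a later implementation (for example, per-thread read-only handles for reads, or a block cache in front of the file) could be dropped in behind the same interface without changing them.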


John Bollinger
 
Chris

The difficulty is that you need to make two calls to RandomAccessFile to read
anything: seek(), and then read(). If you don't synchronize, then one thread
could seek, the next thread could seek, then thread 1 could read, and then
thread 2 could read. Each thread would get the wrong data.
 
Ann

This does not sound so expensive to me -- it just subtracts two numbers.
From the source file RandomAccessFile.java (the end of skipBytes()):
---
seek(newpos);

/* return the actual number of bytes skipped */
return (int) (newpos - pos);
}
---
 
Ann

Oops! Sorry, I goofed: seek() calls a native method to do the work.
 
Steve W. Jackson

I should've been more clear. My question was whether Java or the
underlying filesystem would allow each of your threads to have their own
RandomAccessFile objects referring to the same physical file (assuming
this is read-only access) without difficulties.
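A small sketch of what is being asked about, assuming read-only access: two RandomAccessFile objects opened on the same path each keep their own file pointer, so the seek, seek, read, read interleaving described earlier is harmless when each reader has its own object. IndependentPointers and demo() are hypothetical names.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical demo: two read-only RandomAccessFile objects on the same path.
// Each object has its own file pointer, so seeking one does not move the other.
class IndependentPointers {
    // Returns {byte read via a, byte read via b} after interleaved seeks.
    static int[] demo(String path) throws IOException {
        RandomAccessFile a = new RandomAccessFile(path, "r");
        RandomAccessFile b = new RandomAccessFile(path, "r");
        try {
            a.seek(10);            // "thread 1" seeks
            b.seek(500);           // "thread 2" seeks before thread 1 reads
            int fromA = a.read();  // still reads the byte at offset 10
            int fromB = b.read();  // reads the byte at offset 500
            return new int[] { fromA, fromB };
        } finally {
            a.close();
            b.close();
        }
    }
}
```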

= Steve =
 
Richard Wheeldon

Yes, it does allow this. I've done it in the past, on Linux (ext3).
However, there are performance issues to consider. I eventually found
it better to have a caching wrapper with a single RandomAccessFile.
This allowed multiple threads to access cached data concurrently,
but only one thread to read live data at a time. Try it and see; YMMV.
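A rough sketch of such a caching wrapper, not Richard's actual code. It assumes a read-only file, a fixed block size, and a simple direct-mapped cache (block n can only live in slot n % slots). Cache hits take only a brief lock on the cache arrays, while misses serialize on the single RandomAccessFile, matching the behaviour described above. CachedFileReader and its parameters are hypothetical, and the code avoids generics and NIO so it should also compile on JDK 1.3.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical caching wrapper around ONE read-only RandomAccessFile.
// Hits only hold cacheLock briefly; misses serialize on raf (one live read at a time).
// Lock order is always raf -> cacheLock, never the reverse, so it cannot deadlock.
class CachedFileReader {
    private final RandomAccessFile raf;
    private final int blockSize;
    private final long[] slotBlock;   // block number held by each slot, -1 if empty
    private final byte[][] slotData;  // block contents; never mutated once cached
    private final Object cacheLock = new Object();

    // Assumes blockSize > 0 and slots > 0.
    CachedFileReader(String path, int blockSize, int slots) throws IOException {
        this.raf = new RandomAccessFile(path, "r");
        this.blockSize = blockSize;
        this.slotBlock = new long[slots];
        this.slotData = new byte[slots][];
        for (int i = 0; i < slots; i++) slotBlock[i] = -1;
    }

    private byte[] lookup(long block) {
        int slot = (int) (block % slotBlock.length);
        synchronized (cacheLock) {
            return slotBlock[slot] == block ? slotData[slot] : null;
        }
    }

    private byte[] getBlock(long block) throws IOException {
        byte[] data = lookup(block);
        if (data != null) return data;          // hit: no file lock needed
        synchronized (raf) {                    // miss: one live read at a time
            data = lookup(block);               // another thread may have loaded it
            if (data != null) return data;
            long start = block * blockSize;
            // The last block may be short; blocks past end of file are empty.
            int n = (int) Math.min(blockSize, Math.max(0L, raf.length() - start));
            data = new byte[n];
            raf.seek(start);
            raf.readFully(data);
            int slot = (int) (block % slotBlock.length);
            synchronized (cacheLock) {
                slotBlock[slot] = block;
                slotData[slot] = data;
            }
            return data;
        }
    }

    // Copies up to len bytes starting at pos into buf; returns the number of
    // bytes copied (less than len only when end of file is reached).
    int read(long pos, byte[] buf, int off, int len) throws IOException {
        int copied = 0;
        while (copied < len) {
            long p = pos + copied;
            byte[] data = getBlock(p / blockSize);
            int inBlock = (int) (p % blockSize);
            if (inBlock >= data.length) break;  // past end of file
            int n = Math.min(len - copied, data.length - inBlock);
            System.arraycopy(data, inBlock, buf, off + copied, n);
            copied += n;
        }
        return copied;
    }

    void close() throws IOException {
        raf.close();
    }
}
```

A direct-mapped cache is about the simplest thing that works; an LRU policy would usually get more hits, at the cost of more bookkeeping under the cache lock.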

Richard
 
