File copy through RMI

Sri Ramaswamy

I need to perform a remote file copy on large files (15-20MB).
I have a handle to a remote object which has access to
inputStreams corresponding to these files. The code that
exposes the getInputStream() method is part of a 3rd
party library and actually returns a ByteArrayInputStream.
Since I cannot pass back the input stream through RMI,
I am reading the input stream, creating an array of bytes,
returning this byte array through RMI and writing these
bytes to a file, on the other side.

Here are the problems with this approach -

1) I am not sure how this library implements the getInputStream()
method but the JVM crashes with a java.lang.OutOfMemoryError
if the heap size is less than 64MB.

2) This process is very slow.

If you have any thoughts/suggestions on a better approach that
reduces the memory requirement and can speed up this operation,
I would really appreciate it.

Thank you,
Sri Ramaswamy
 
John C. Bollinger

Sri said:
I need to perform a remote file copy on large files (15-20MB).
I have a handle to a remote object which has access to
inputStreams corresponding to these files. The code that
exposes the getInputStream() method is part of a 3rd
party library and actually returns a ByteArrayInputStream.
Since I cannot pass back the input stream through RMI,
I am reading the input stream, creating an array of bytes,
returning this byte array through RMI and writing these
bytes to a file, on the other side.

Here are the problems with this approach -

1) I am not sure how this library implements the getInputStream()
method but the JVM crashes with a java.lang.OutOfMemoryError
if the heap size is less than 64MB.

That's not surprising. Consider that the ByteArrayInputStream holds
the data in an internal byte[]. All of the data. You then read it and
create a new byte[] with the same content. For a 15MB file you thus
have a minimum of 30MB tied up in just those two arrays.

2) This process is very slow.

That's not surprising either. The third-party library you are using is
not well designed for efficient large-file handling. I don't quite know
what purpose it serves to use a ByteArrayInputStream in that
circumstance -- perhaps there is some reason related to the library's
goals and intended function -- so I'll reserve judgment on the propriety
of that design. Nevertheless, it is time-consuming to read 15MB from
disk into memory. It is time-consuming to create a copy of a 15MB
array. Some implementations of both of these operations are more
time-consuming than others. Both may be slowed further if full GC
passes have to run during their execution. Transmitting the whole thing
over RMI in one call is slow. Doing so involves creating a
copy on the receiving side, which may also require expensive GC on that
side. Then there is also the question of how efficiently file I/O is
performed on each side.

If you have any thoughts/suggestions on a better approach that
reduces the memory requirement and can speed up this operation,
I would really appreciate it.

Consider whether you can do without the particular third-party library
you're using. Putting a whole multimegabyte file into one byte[] is to
be avoided if at all possible, whether in your own code or in library code.

Consider sending the file using direct socket I/O (with buffered
streams) instead of RMI.
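
As a rough illustration of the socket suggestion, here is a minimal sketch, assuming the sender can get at the data as an InputStream and the receiver listens on a port both sides have agreed on. The class and method names (SocketFileTransfer, send, receive, pump) are made up for this example and are not part of any library.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical helper: streams bytes over a plain TCP socket in small chunks.
final class SocketFileTransfer {
    static final int CHUNK_SIZE = 8 * 1024; // assumed buffer size

    // Connects to host:port and sends everything readable from 'source'.
    static long send(InputStream source, String host, int port) throws IOException {
        try (Socket socket = new Socket(host, port);
             OutputStream out = new BufferedOutputStream(socket.getOutputStream())) {
            return pump(source, out);
        }
    }

    // Accepts one connection and writes the received bytes to 'sink' until EOF.
    static long receive(ServerSocket server, OutputStream sink) throws IOException {
        try (Socket socket = server.accept();
             InputStream in = new BufferedInputStream(socket.getInputStream())) {
            return pump(in, sink);
        }
    }

    // Copies 'in' to 'out' through one small reusable buffer.
    private static long pump(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[CHUNK_SIZE];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        out.flush();
        return total;
    }
}
```

On the receiving side, 'sink' would typically be a BufferedOutputStream wrapped around a FileOutputStream. Neither side ever holds more than one buffer's worth of the file.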

If you must use RMI then come up with a mechanism for sending the data
in smaller chunks. This is possible by changing the design of your
remote interface, so that each call returns only part of the file.
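
One possible shape for such a design is sketched below. It assumes the server can open the file directly by path, which may not hold if the third-party library only hands out streams. RemoteFileReader, readChunk and FileChunkServer are hypothetical names invented for this example.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.util.Arrays;

// Hypothetical remote interface: the caller pulls the file one chunk at a time.
interface RemoteFileReader extends Remote {
    // Returns up to maxBytes bytes starting at offset; an empty array means EOF.
    byte[] readChunk(long offset, int maxBytes) throws RemoteException, IOException;
}

// Hypothetical server-side implementation backed by a file on disk.
final class FileChunkServer implements RemoteFileReader {
    private final String path;

    FileChunkServer(String path) {
        this.path = path;
    }

    @Override
    public byte[] readChunk(long offset, int maxBytes) throws IOException {
        if (offset < 0 || maxBytes <= 0) {
            throw new IllegalArgumentException("bad offset or chunk size");
        }
        // Reopening the file on every call keeps the server stateless,
        // at the price of some overhead per call.
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            file.seek(offset);
            byte[] buffer = new byte[maxBytes];
            int n = file.read(buffer, 0, maxBytes);
            if (n <= 0) {
                return new byte[0]; // nothing left at or past this offset
            }
            return n == maxBytes ? buffer : Arrays.copyOf(buffer, n);
        }
    }
}
```

In a real deployment the interface would be public, and the server object would be exported (for example with UnicastRemoteObject.exportObject) and bound in a registry. That part is left out here. Each call then moves at most maxBytes across the wire, so neither JVM needs the whole file in its heap.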

Whether or not you use RMI, never make an in-memory copy of the whole
file -- instead, read it in small chunks, say 4K or 8K at a time.
Combine this with whichever of socket I/O or RMI you end up using for
sending the data to the far side.
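
The chunked read described above boils down to a loop like the following sketch. ChunkedCopy and the 8KB buffer size are illustrative choices, not something prescribed in the thread beyond "4K or 8K at a time".

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper: copies a stream through one small reusable buffer.
final class ChunkedCopy {
    static final int CHUNK_SIZE = 8 * 1024; // assumed chunk size

    // Copies everything from 'in' to 'out' and returns the number of bytes copied.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[CHUNK_SIZE];
        long total = 0;
        int n;
        // read() may return fewer bytes than requested; -1 signals end of stream.
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        out.flush();
        return total;
    }
}
```

The memory cost of this loop is the size of the buffer, regardless of how large the file is. 'out' can be a FileOutputStream, a socket's output stream, or anything else that accepts bytes.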


John Bollinger
 
booxplode

Passing the whole file as a byte array in a single RMI
call is not ideal, because it requires the entire
file to be in memory in both processes. I'd use
multiple RMI calls instead of just one, each one
with a maximum buffer size of, say, 8KB.
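
On the calling side, that could look something like the sketch below. It assumes a hypothetical remote interface RemoteFileReader whose readChunk(offset, maxBytes) method returns an empty array at end of file. Neither the interface nor ChunkedDownload comes from the thread or from any library.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.rmi.Remote;
import java.rmi.RemoteException;

// Hypothetical remote interface, assumed to be implemented on the server.
interface RemoteFileReader extends Remote {
    // Returns up to maxBytes bytes starting at offset; an empty array means EOF.
    byte[] readChunk(long offset, int maxBytes) throws RemoteException, IOException;
}

// Hypothetical client-side loop: one RMI call per chunk.
final class ChunkedDownload {
    static final int CHUNK_SIZE = 8 * 1024; // the 8KB suggested above

    // Pulls the remote file chunk by chunk and writes it to 'out'.
    static long download(RemoteFileReader remote, OutputStream out) throws IOException {
        long offset = 0;
        while (true) {
            byte[] chunk = remote.readChunk(offset, CHUNK_SIZE);
            if (chunk.length == 0) {
                break; // server reported end of file
            }
            out.write(chunk);
            offset += chunk.length;
        }
        out.flush();
        return offset;
    }
}
```

Every call pays for a network round trip, so the chunk size is a trade-off. If 8KB turns out to be slow, a larger value such as 64KB may be worth measuring.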

--Joe
 
