Suggestions for optimizing the loading of an int array from disk

jonbbbb

Hello,

I was just wondering if there would be any suggestions for getting the
following scenario to run faster.

I have a program that loads some data from disk as a byte array.
This byte data is actually quite a large list of ints that I want to
use.
So I first use read(byte[] b) to fill the byte array, then I fill the
int array by going through the byte array and use some byte shifting
to get 4 bytes to an int.

If this were a C program I could just read it as a byte array and cast
it to an int array without going through the painful loop of actually
converting each int, right?

I suppose there is no way around it in Java. Would it make sense to
write this in C and use JNI to get it back into Java? Any other ideas?

Thanks.

Regards,
Jon.
 
Philipp

You can read ints directly from the InputStream using
DataInputStream.readInt(). Don't know if that is faster though.
HTH Phil
 
Arne Vajhøj

DataInputStream over BufferedInputStream over FileInputStream,
with readInt, would be both readable code and reasonably fast,
but it does assume the ints are stored in big-endian (network) order.
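
A minimal sketch of that stream stack, assuming the file holds nothing but big-endian ints and its length is a multiple of 4. The names IntStreamReader and readInts are made up for illustration.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

class IntStreamReader {
    // Reads a file of big-endian ints into an int[].
    // Assumes the file length is a multiple of 4 and the data fits in one array.
    static int[] readInts(File file) throws IOException {
        int count = (int) (file.length() / 4);
        int[] ints = new int[count];
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(file)))) {
            for (int i = 0; i < count; i++) {
                ints[i] = in.readInt(); // 4 bytes, most significant byte first
            }
        }
        return ints;
    }

    public static void main(String[] args) throws IOException {
        int[] ints = readInts(new File(args[0]));
        System.out.println("read " + ints.length + " ints");
    }
}
```

The BufferedInputStream in the middle matters: without it every readInt() would go to the underlying file stream for its four bytes.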

Arne
 
Lew

Thomas said:
I suggest that you concentrate on writing code which performs the
desired computation correctly. _Then_ check that it runs "fast enough".
If it does not, see if the byte-to-int file reading is the bottleneck.
And at that point _only_ does it become a reasonable idea to fiddle with
JNI or other such trickeries.

If you do decide to fiddle with such trickeries, java.nio may help.

The ByteBuffer class has some mechanisms to allow mapping of bytes to
other data types, and the ByteOrder class to help it determine
platform endianness.
<http://java.sun.com/javase/6/docs/api/java/nio/ByteBuffer.html>
<http://java.sun.com/javase/6/docs/api/java/nio/ByteOrder.html>

MappedByteBuffer might help with performance.
<http://java.sun.com/javase/6/docs/api/java/nio/MappedByteBuffer.html>
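
A rough sketch of how ByteBuffer and ByteOrder fit together, assuming you already have the raw bytes in a byte[] and know which byte order the file uses. IntDecoder and toInts are hypothetical names, and little-endian is picked in main only for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class IntDecoder {
    // Converts raw bytes to ints using the given byte order.
    // Trailing bytes that do not form a whole int are ignored.
    static int[] toInts(byte[] bytes, ByteOrder order) {
        int[] ints = new int[bytes.length / 4];
        ByteBuffer.wrap(bytes)      // no copy: the buffer is backed by the array
                .order(order)       // must be set before creating the int view
                .asIntBuffer()
                .get(ints);         // bulk transfer instead of a hand-written loop
        return ints;
    }

    public static void main(String[] args) {
        byte[] raw = {1, 0, 0, 0, 2, 0, 0, 0};
        int[] ints = toInts(raw, ByteOrder.LITTLE_ENDIAN);
        System.out.println(ints[0] + " " + ints[1]);
        System.out.println("native order: " + ByteOrder.nativeOrder());
    }
}
```

ByteOrder.nativeOrder() only tells you what the machine uses; the order to pass in is whatever the file was written with.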

As Thomas correctly advises, turn to these things only in the event of
demonstrable need.
 
Tom Anderson

I have a program that loads some data from disk as a byte array.
This byte data is actually a quite large list of int that I want to
use.

Is it just ints, or is it mixed in with other stuff?

So I first use read(byte[] b) to fill the byte array, then I fill the
int array by going through
the byte array and use some byte shifting to get 4 bytes to an int.

If this was a C program I could just read it as a byte array, and cast
it to a int array without going through the painful loop of actually
converting each int, right?

I suppose there is no way around it in Java. Would it make sense to
write this as a C and use JNI to get it back into Java.

Hell no.

Any other ideas?

Have a look in the java.nio package. There you will find a class called
ByteBuffer, which is a thing which holds a big load of bytes, and one
called IntBuffer, which does the same for ints. You will also find, in
java.nio.channels, some classes which can be used to read buffers from
disk; primarily FileChannel, but also Channels, which has a
newChannel(InputStream) method that you can use if you need to
interoperate with java.io streams.

If you now look again at ByteBuffer, you will see that it has a method
asIntBuffer, which makes an IntBuffer which is really a view on the
ByteBuffer - exactly like your evil cast in C.

Put all these bits together, and you have a clean, easy and safe way of
reading your file and getting access to it as ints.

Here's a little demo:

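import java.io.FileInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.IntBuffer;
import java.nio.channels.FileChannel;

public class IntFile {
    public static void main(String... args) throws IOException {
        // try-with-resources closes the stream (and its channel) even on error
        try (FileInputStream in = new FileInputStream(args[0])) {
            FileChannel chan = in.getChannel();
            ByteBuffer buf = ByteBuffer.allocate(1024 * 1024);
            // a single read() need not fill the buffer, so loop until end of file
            while (chan.read(buf) != -1) {
                buf.flip();
                // view the bytes read so far as ints (big-endian by default)
                IntBuffer ibuf = buf.asIntBuffer();
                while (ibuf.hasRemaining()) {
                    System.out.println(ibuf.get());
                }
                // skip the bytes consumed through the view, then keep any
                // partial int at the front of the buffer for the next read
                buf.position(buf.position() + ibuf.position() * 4);
                buf.compact();
            }
        }
    }
}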

You can actually make it potentially even better than this, by using
FileChannel's map method, which memory-maps the file in as a buffer. That
avoids having to explicitly read it at all.
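
For what it's worth, a sketch of the map variant, assuming big-endian data and a file under 2 GB (a single mapping cannot be larger than that). MappedIntFile and readInts are names invented for this example.

```java
import java.io.IOException;
import java.nio.IntBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

class MappedIntFile {
    // Maps the whole file read-only and copies its ints into an array.
    // Assumes big-endian data and a file small enough for one mapping.
    static int[] readInts(Path path) throws IOException {
        try (FileChannel chan = FileChannel.open(path, StandardOpenOption.READ)) {
            MappedByteBuffer map = chan.map(FileChannel.MapMode.READ_ONLY, 0, chan.size());
            IntBuffer view = map.asIntBuffer(); // big-endian unless order() is changed
            int[] ints = new int[view.remaining()];
            view.get(ints);                     // bulk copy out of the mapping
            return ints;
        }
    }

    public static void main(String[] args) throws IOException {
        int[] ints = readInts(Paths.get(args[0]));
        System.out.println("mapped " + ints.length + " ints");
    }
}
```

If you can work with the IntBuffer directly, skip the copy into int[] and hand back the view instead; the copy is only here because the original question asked for an int array.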

Mind you, I doubt any of this is faster than just using
DataInputStream.readInt if you only need sequential access.

tom
 
cbossens73

I have a program that loads some data from disk as a byte array.
This byte data is actually a quite large list of int that I want to
use.

How big and how often do you need to read it?

I've got an application that persists and reads
back a file that is at least 100 MB and can
go up to several hundred MB (and it needs to
stay in memory all at once).

I'm simply using DataInputStream's readLong() method
and performance isn't an issue. As Tom said,
I too doubt that for sequential reads you can be
much faster than that (you'll be I/O bound anyway).

I've got control over the file format. The start
of the file contains info such as how many longs
I'll need to read, checksums, version number, etc.,
then I just loop doing readLong().
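
A stripped-down sketch of that kind of format. The layout here (an int version, an int count, then the longs) is an assumption made up for the example, not my actual file format, and LongFile is a hypothetical name; checksums are left out.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

class LongFile {
    static final int VERSION = 1; // hypothetical format version

    // Assumed layout for this sketch: int version, int count, then count longs.
    static void write(File file, long[] values) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(file)))) {
            out.writeInt(VERSION);
            out.writeInt(values.length);
            for (long v : values) {
                out.writeLong(v);
            }
        }
    }

    static long[] read(File file) throws IOException {
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(file)))) {
            int version = in.readInt();
            if (version != VERSION) {
                throw new IOException("unsupported version " + version);
            }
            long[] values = new long[in.readInt()]; // size known up front
            for (int i = 0; i < values.length; i++) {
                values[i] = in.readLong();
            }
            return values;
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("longs", ".bin");
        f.deleteOnExit();
        write(f, new long[] {1L, 2L, 3L});
        System.out.println(read(f).length + " longs read back");
    }
}
```

Storing the count first means the array can be allocated once at the right size, with no growing or guessing from the file length.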

If that is something you only need to do occasionally,
performance isn't going to be an issue.

If your file has to fit in memory all at once, then
there's not much point IMHO in reading it in chunks:
you'll need it entirely in memory anyway.

Why not read your bytes as ints, into an int[], then
re-order/shift the bytes in place, in the int[]?

I mean, if the first four bytes are FF, FE, FD and
FC, no matter the endianness, I'll always be able
to trivially re-arrange my int[] so that my ints
are correct.
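
Something like this is what I have in mind for the in-place fix, assuming the only problem is that the ints were read with the opposite byte order. EndianFix and swapInPlace are made-up names; Integer.reverseBytes is the standard JDK method.

```java
class EndianFix {
    // Reverses the byte order of every element in place,
    // so bytes FF FE FD FC become FC FD FE FF.
    static void swapInPlace(int[] ints) {
        for (int i = 0; i < ints.length; i++) {
            ints[i] = Integer.reverseBytes(ints[i]);
        }
    }

    public static void main(String[] args) {
        int[] ints = {0xFFFEFDFC, 0x01000000};
        swapInPlace(ints);
        System.out.println(Integer.toHexString(ints[0]) + " " + ints[1]);
    }
}
```

No second array is needed, which matters when the data is already several hundred MB.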

YMMV but I guess that if you're talking about
"some byte shifting to get 4 bytes to an int"
the bytes are indeed aligned and you have
a way to determine which kind of shift you need.

If your int[] needs to be in memory all at once, loop
doing readInt() and do your "byte shifting to
get 4 bytes to an int" [sic] in place.

Charles
 
Roedy Green

I have a program that loads some data from disk as a byte array.
This byte data is actually a quite large list of int that I want to
use.

If the file is actually a list of little-endian ints, use
LEDataInputStream. See
http://mindprod.com/products1.html#LEDATASTREAM.

If it is a list of big-endian ints, use DataInputStream.

Use a whacking huge buffer.

If you enjoy hair shirts and self flagellation, try nio. It should be
the fastest. Try the other techniques first. They will take very
little work to try.

See http://mindprod.com/jgloss/nio.html


--
Roedy Green Canadian Mind Products
http://mindprod.com

"It is not the strongest of the species that survives, nor the most intelligent that survives. It is the one that is the most adaptable to change."
~ Charles Darwin
 
jonbbbb

[..]

Thanks for all your input. Much appreciated!
I got some ideas to try out and see what works.

:)

Jon.
 
