What's the best/fastest way to access image data


G.W. Lucas

I have an application that performs some specialized image-processing
which is simple, but not supported by JAI or the other Java imaging APIs. I
pull in data from an existing image, process it, and store it back in
a new image.

For my application, I am using the Java BufferedImage class for images
that are about 2 megapixels in size. The processing takes about
250 milliseconds, which isn't bad, though my application will be
processing a LOT of images and is user-interactive, so I'd like to
trim that if I can.

Anyway, I added some more instrumentation to my time measurements and
realized that of that 250 milliseconds, 200 or so was happening in the
BufferedImage.getRGB() method that I was using to extract the raw data
from the image:

long time0 = System.currentTimeMillis();
int w = image.getWidth();
int h = image.getHeight();
int n = w * h;
int[] rgb = image.getRGB(0, 0, w, h, new int[n], 0, w);
long time1 = System.currentTimeMillis();
long accessTime = time1-time0;

I found it surprising that the access took so much longer than my own
processing... Ordinarily, when I see an unexpected result like this,
it's usually an indication that I'm doing the wrong thing or using the
wrong tool.

So, I was wondering if I might be using the wrong approach in pulling
out the raw data. Perhaps BufferedImage isn't even the right class to
use? I've read the API document on the many Java image classes and
find the nuances of "which one to use when" to be rather non-obvious.

Could anyone point me in the direction of the best way to do this? Is
there a web page that provides information about the design of the
image classes that might clarify this issue?

Thanks for your help.

Gary
 

markspace

G.W. Lucas said:
Anyway, I added some more instrumentation to my time measurements and
realized that of that 250 milliseconds, 200 or so was happening in the
BufferedImage.getRGB() method that I was using to extract the raw data
int[] rgb = image.getRGB(0, 0, w, h, new int[n], 0, w);


I'm not sure, but several of the Java image APIs are asynchronous. They
return immediately and load in the background. You're getting the whole
image here so it's possible that read() is waiting on IO to complete, I
suppose.

Can you produce a complete example? I realize that you can't really
send an image, but if you could at least post the source that reproduces
this we could take a look at it.
 

G.W. Lucas

[snip] Can you produce a complete example?


Sure. The code follows. The actual application is generating content
based on data analysis performed in background threads (behind the
scenes) and storing it in BufferedImage objects. There are a number of
different kinds of analysis routines running (some of them quite
complicated). At run time, several images may get overlaid on top of
each other to make one composite. Thus, the images are declared
TYPE_INT_ARGB when the application creates them. So far, I have been
favorably impressed by Java's performance.

Anyway, you can see why I don't think that asynchronous behavior is
the issue, though I could certainly be wrong.

This snippet of code simulates the conditions under which I am
observing the performance issues. Running this on my computer, I
observed a 200 millisecond access time. I set the command-line memory
option to -Xmx512m to ensure plenty of memory for the test.

Hope this helps.

Gary



import java.awt.image.BufferedImage;

/**
 *
 */
public class TimeTest {

    public static void main(String args[]) {
        TimeTest test = new TimeTest();
        for (int i = 0; i < 5; i++) {
            test.run();
        }
    }

    public void run() {
        BufferedImage image = new BufferedImage(
                2000,
                1500,
                BufferedImage.TYPE_INT_ARGB);

        // Diagnostic graphics operations to exercise the
        // idea that some work had been done before we tried
        // extracting the contents of the image. In testing,
        // doesn't seem to make much difference.
        //Graphics2D g2d = image.createGraphics();
        //g2d.setColor(Color.ORANGE);
        //g2d.drawRect(0, 0, 500, 600);
        //g2d.dispose();
        //try {
        //    Thread.sleep(1000);
        //} catch (InterruptedException ex) {
        //}

        long time0 = System.currentTimeMillis();
        int w = image.getWidth();
        int h = image.getHeight();
        int n = w * h;
        int[] rgb = image.getRGB(0, 0, w, h, new int[n], 0, w);
        long time1 = System.currentTimeMillis();
        long accessTime = time1 - time0;
        System.out.println("Access time " + accessTime);
        System.out.flush();
    }
}





 

markspace

G.W. Lucas said:
This snippet of code simulates the conditions under which I am
observing the performance issues. Running this on my computer, I
observed 200 millisecond access time. I set the command-line memory
option to -Xmx512m to ensure plenty of memory for the test.
int[] rgb = image.getRGB(0, 0, w, h, new int[n], 0, w);


This line of code calls getRGB(), which basically does this (cut and
paste from the source code itself):

for (int y = startY; y < startY+h; y++, yoff+=scansize) {
    off = yoff;
    for (int x = startX; x < startX+w; x++) {
        rgbArray[off++] = colorModel.getRGB(
                raster.getDataElements(x, y, data));
    }
}

Copying data like this is never going to be fast. You want the internal
buffer itself, probably, not a copy, so your operations on the data will
be fast.

So I think the trick is to use a different constructor, so that you
already have access to the internal argb array. There's no way to just
get a pointer to it in the BufferedImage API, that I can see.

Now: are you loading data from disk? Or are you creating the data
wholesale inside your program, as your example seems to imply? The
answer is different depending on what you are doing.
 

John B. Matthews

markspace said:
G.W. Lucas said:
This snippet of code simulates the conditions under which I am
observing the performance issues. Running this on my computer, I
observed 200 millisecond access time. I set the command-line memory
option to -Xmx512m to ensure plenty of memory for the test.
int[] rgb = image.getRGB(0, 0, w, h, new int[n], 0, w);


This line of code calls getRGB(), which basically does this (cut and
paste from the source code itself):

for (int y = startY; y < startY+h; y++, yoff+=scansize) {
    off = yoff;
    for (int x = startX; x < startX+w; x++) {
        rgbArray[off++] = colorModel.getRGB(
                raster.getDataElements(x, y, data));
    }
}

Copying data like this is never going to be fast. You want the
internal buffer itself, probably, not a copy, so your operations on
the data will be fast.

So I think the trick is to use a different constructor, so that you
already have access to the internal argb array. There's no way to
just get a pointer to it in the BufferedImage API, that I can see.

Now: are you loading data from disk? Or are you creating the data
wholesale inside your program, as your example seems to imply? The
answer is different depending on what you are doing.

It should be possible to operate on the Raster directly:

<http://sites.google.com/site/drjohnbmatthews/raster>

Of course, that still leaves 2000 x 1500 pixels to work on.
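
A rough sketch of what I mean, assuming a TYPE_INT_ARGB image (untested and
not benchmarked here; the class name is arbitrary): a bulk getDataElements()
call on the Raster transfers the packed pixels without the per-pixel
ColorModel conversion that getRGB() performs.

import java.awt.image.BufferedImage;
import java.awt.image.WritableRaster;

public class RasterRead {

    public static void main(String[] args) {
        BufferedImage image = new BufferedImage(
                2000, 1500, BufferedImage.TYPE_INT_ARGB);
        WritableRaster raster = image.getRaster();

        long t0 = System.currentTimeMillis();
        // Bulk transfer of the packed ARGB pixels straight from the raster;
        // this still copies, but skips the per-pixel ColorModel call that
        // getRGB() makes for every pixel.
        int[] pixels = (int[]) raster.getDataElements(
                0, 0, image.getWidth(), image.getHeight(), null);
        long t1 = System.currentTimeMillis();
        System.out.println("Raster access time " + (t1 - t0)
                + " ms for " + pixels.length + " pixels");
    }
}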
 

markspace

John said:
It should be possible to operate on the Raster directly:

<http://sites.google.com/site/drjohnbmatthews/raster>

Of course, that still leaves 2000 x 1500 pixels to work on.


I came up with the code below as a direct replacement for the OP's
example. However, I'm not really sure if this is what he wants or not.
It does run 10x faster than his example.


import java.awt.image.BufferedImage;
import java.awt.image.DataBuffer;
import java.awt.image.DataBufferInt;
import java.awt.image.DirectColorModel;
import java.awt.image.Raster;
import java.awt.image.SinglePixelPackedSampleModel;
import java.awt.image.WritableRaster;

public class Main {

    public static void main(String[] args) {
        for (int i = 0; i < 5; i++) {
            long startTime = System.nanoTime();

            // Sample model for packed ARGB pixels stored in an int array.
            SinglePixelPackedSampleModel neoSPPSM =
                    new SinglePixelPackedSampleModel(
                            DataBuffer.TYPE_INT, 2000, 1500,
                            new int[]{0xFF0000, 0xFF00, 0xFF, 0xFF000000});

            // The raw pixel array; the image writes directly into it,
            // so we never have to copy it back out with getRGB().
            int[] rawARGB = new int[2000 * 1500];
            DataBufferInt neoDBI = new DataBufferInt(rawARGB, 2000 * 1500);

            WritableRaster neoWR =
                    Raster.createWritableRaster(neoSPPSM, neoDBI, null);
            DirectColorModel dcm = new DirectColorModel(
                    32, 0xFF0000, 0xFF00, 0xFF, 0xFF000000);
            BufferedImage image2 = new BufferedImage(dcm, neoWR, false, null);

            long endTime = System.nanoTime();
            System.out.println("Loop time (" + image2.hashCode() + "): "
                    + (endTime - startTime) / 1000000);
        }
    }
}
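
In case it helps to see the array actually being used, here's a rough,
untested sketch along the same lines (class name again arbitrary): build the
image around your own rawARGB array, render into it with Graphics2D as usual,
and then read the result straight out of the array with no getRGB() copy.

import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.awt.image.DataBuffer;
import java.awt.image.DataBufferInt;
import java.awt.image.DirectColorModel;
import java.awt.image.Raster;
import java.awt.image.SinglePixelPackedSampleModel;
import java.awt.image.WritableRaster;

public class UsageSketch {

    public static void main(String[] args) {
        int w = 2000, h = 1500;
        int[] rawARGB = new int[w * h];

        SinglePixelPackedSampleModel sm = new SinglePixelPackedSampleModel(
                DataBuffer.TYPE_INT, w, h,
                new int[]{0xFF0000, 0xFF00, 0xFF, 0xFF000000});
        WritableRaster wr = Raster.createWritableRaster(
                sm, new DataBufferInt(rawARGB, w * h), null);
        DirectColorModel cm = new DirectColorModel(
                32, 0xFF0000, 0xFF00, 0xFF, 0xFF000000);
        BufferedImage image = new BufferedImage(cm, wr, false, null);

        // Render through the normal Graphics2D path.
        Graphics2D g2d = image.createGraphics();
        g2d.setColor(Color.ORANGE);
        g2d.fillRect(100, 100, 500, 600);
        g2d.dispose();

        // The rendered pixels show up in rawARGB with no copy:
        // pixel (x, y) is rawARGB[y * w + x], packed as 0xAARRGGBB.
        System.out.printf("Pixel (150,150) = 0x%08X%n", rawARGB[150 * w + 150]);
    }
}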
 

G.W. Lucas

Thank you. That's truly impressive. Of course, now I have about six
new things to learn :)

Looking at your earlier post, I can see where invoking getRGB 3
million times might result in sub-optimal performance. I must say that
I'm a little surprised to see something like that in a core API. But
it explains a lot.

I modified my test program based on your example, drawing a few
primitives to the BufferedImage after it was created and inspecting
the contents of the rawARGB array to see if it changed. It worked
like a charm. And, as you say, the speed improvement is easily a
factor of 10.

I do have a question. From your code, it looks like the trick is to
supply the BufferedImage constructor with the memory that you want it
to write to so that you don't have to ask for it later on. That makes
sense. The thing I was wondering about is if the resulting
BufferedImage will have the same performance as the ones which I am
currently creating with the less-advanced constructor. The real core
function of my application is in the rendering of graphics (using
Graphics2D) to produce the raw images. The rendering involves a lot of
graphics primitives, so performance is critical. The image processing
that I am doing is really just an extra.

Naturally, I'm going to do my homework (read up on the API elements
you used and also do a lot of testing) before I commit to an approach,
but I was wondering whether I should be alert to any special tricks or
techniques as I do so.

Thanks again for your well-informed and insightful suggestion!

Gary

 

markspace

G.W. Lucas said:
The thing I was wondering about is if the resulting
BufferedImage will have the same performance as the ones which I am
currently creating with the less-advanced constructor.


I honestly don't know. I believe it will have the same performance,
because I looked at the constructors for BufferedImage and related
classes, and basically just did the same thing as they do.

However, I haven't tested this yet, so you're going to be the first. If
you don't see any speed drop right away, then I'm going to guess there
won't be any, because as I said I'm just doing the same thing that
Sun's API does.

Please do report back on your findings if you can. I'm interested if
this technique is general and will work for other folks.
 

G.W. Lucas

G.W. Lucas said:
The thing I was wondering about is if the resulting
BufferedImage will have the same performance as the ones which I am
currently creating with the less-advanced constructor. [snip]

Please do report back on your findings if you can.  I'm interested if
this technique is general and will work for other folks.

Fortunately, I've always been interested in performance considerations
(who isn't?), so I've got plenty of timing instrumentation
already in place for my application. I was
able to run the program a dozen or so times alternating between
the different constructors. Although the sample set was pretty small,
I'm pretty sure there is no statistically significant difference
in the time required to build images (for a while there, I actually
thought the WritableRaster constructor might have an edge, but
that was just a result of a noisy test environment).

Thanks again for all your help.

g.
 

John B. Matthews

"G.W. Lucas said:
G.W. Lucas said:
The thing I was wondering about is if the resulting BufferedImage
will have the same performance as the ones which I am currently
creating with the less-advanced constructor. [snip]

Please do report back on your findings if you can.  I'm interested
if this technique is general and will work for other folks.

Fortunately, I've always been interested in performance
considerations (who isn't?), so I've got plenty of timing
instrumentation already in place for my application. I was able to
run the program a dozen or so times alternating between the different
constructors. Although the sample set was pretty small, I'm pretty
sure there is no statistically significant difference in the time
required to build images (for a while there, I actually thought the
WritableRaster constructor might have an edge, but that was just a
result of a noisy test environment).

Thanks again for all your help.

Interesting; thank you for reporting your results. I missed the import
of markspace's suggestion: it was predicated on the reasonable
assumption that array access would be faster than method invocation. My
experience with mixing WritableRaster and Graphics2D operations is that
the latter tend to dominate and the former are fast enough. Still, it's
interesting to see how to construct a BufferedImage with one's own data
buffer.
 

markspace

I'd also like to thank you for reporting these results. It's good to
know that there weren't some other gotchas waiting for you further down
the road.

John said:
Interesting; thank you for reporting your results. I missed the import
of markspace's suggestion: it was predicated on the reasonable
assumption that array access would be faster than method invocation.


Not quite. I was basing my prediction on the idea that buffer copies
should be avoided. In other words, the method call I focused on didn't
just use setters and getters, it copied the entire 3,000,000-word pixel
buffer before handing the copy to the caller. This also means that the
3,000,000-word buffer gets allocated twice: once when the BufferedImage
is created, and once again when the OP had to allocate a second buffer.

Both of these operations are avoided in the code I posted. The buffer
is allocated once, and never copied.

Modern CPUs impose a high penalty for large numbers of consecutive reads
and writes. In typical algorithm analysis, all reads and writes are
assumed to have the same cost. However, this doesn't work for long
strings of consecutive reads and writes, because they can't be cached,
and therefore don't benefit from locality of access the way that other
reads and writes do.

In other words, most memory accesses have a lower amortized access time,
due to locality and the CPU cache. Memory that is accessed precisely once
doesn't benefit from this amortized time, and has to pay the full cost
of a cache miss, main-memory access, and then the eventual main-memory
write. A long string of such accesses is particularly painful.

If that's all too much to remember, then just remember that "buffer
copies are bad" and go with that.
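
If you want to see just that effect in isolation, here's a crude sketch
(nothing to do with images, class name arbitrary, and the numbers will vary
a lot with hardware and JIT warm-up): time one in-place pass over a
3,000,000-int buffer against allocating and copying it.

public class CopyCost {

    public static void main(String[] args) {
        int[] pixels = new int[3000000];

        for (int run = 0; run < 5; run++) {
            long t0 = System.nanoTime();
            // One in-place pass over the existing buffer.
            for (int i = 0; i < pixels.length; i++) {
                pixels[i] |= 0xFF000000;
            }
            long t1 = System.nanoTime();

            // A second allocation plus a bulk copy, much as getRGB()
            // into a fresh array forces.
            int[] copy = new int[pixels.length];
            System.arraycopy(pixels, 0, copy, 0, pixels.length);
            long t2 = System.nanoTime();

            System.out.println("in-place " + (t1 - t0) / 1000000
                    + " ms, allocate+copy " + (t2 - t1) / 1000000 + " ms");
        }
    }
}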

John said:
My experience with mixing WritableRaster and Graphics2D operations is that
the latter tend to dominate and the former are fast enough. Still, it's
interesting to see how to construct a BufferedImage with one's own data
buffer.


It would be interesting to compare the method calls in a writable raster
with direct buffer access, like the OP was doing. I suspect they are
similar and the performance hit using method calls vs. direct access
isn't as large as most folks would believe. However, the OP wanted a
raw array, so that's what I gave him.
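
A rough harness for that comparison might look something like this (untested;
the class name is arbitrary, and the running sums are only there so the JIT
can't discard the loops). It times per-pixel getSample() calls on a
WritableRaster against direct reads from the int array backing it:

import java.awt.image.DataBuffer;
import java.awt.image.DataBufferInt;
import java.awt.image.Raster;
import java.awt.image.SinglePixelPackedSampleModel;
import java.awt.image.WritableRaster;

public class RasterVsArray {

    public static void main(String[] args) {
        int w = 2000, h = 1500;
        int[] rawARGB = new int[w * h];
        SinglePixelPackedSampleModel sm = new SinglePixelPackedSampleModel(
                DataBuffer.TYPE_INT, w, h,
                new int[]{0xFF0000, 0xFF00, 0xFF, 0xFF000000});
        WritableRaster raster = Raster.createWritableRaster(
                sm, new DataBufferInt(rawARGB, w * h), null);

        for (int run = 0; run < 5; run++) {
            // Per-pixel method calls on the WritableRaster (band 3 = alpha).
            long t0 = System.nanoTime();
            long sum1 = 0;
            for (int y = 0; y < h; y++) {
                for (int x = 0; x < w; x++) {
                    sum1 += raster.getSample(x, y, 3);
                }
            }
            long t1 = System.nanoTime();

            // Direct reads of the same band from the backing array.
            long sum2 = 0;
            for (int i = 0; i < rawARGB.length; i++) {
                sum2 += rawARGB[i] >>> 24;
            }
            long t2 = System.nanoTime();

            System.out.println("raster " + (t1 - t0) / 1000000
                    + " ms, array " + (t2 - t1) / 1000000
                    + " ms (sums " + sum1 + "/" + sum2 + ")");
        }
    }
}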
 

Daniel Pitts

G.W. Lucas said:
G.W. Lucas said:
The thing I was wondering about is if the resulting
BufferedImage will have the same performance as the ones which I am
currently creating with the less-advanced constructor. [snip]
Please do report back on your findings if you can. I'm interested if
this technique is general and will work for other folks.

Fortunately, I've always been interested in performance considerations
(who isn't?), so I've got plenty of timing instrumentation
already in place for my application. I was
able to run the program a dozen or so times alternating between
the different constructors. Although the sample set was pretty small,
I'm pretty sure there is no statistically significant difference
in the time required to build images (for a while there, I actually
thought the WritableRaster constructor might have an edge, but
that was just a result of a noisy test environment).

Thanks again for all your help.

g.
I hope you *also* use a profiler :)
 

alexandre_paterson

G.W. Lucas said:
I have an application that performs some specialized image-processing
which is simple, but not supported by JAI or the other Java imaging APIs.

And I take it it's not supported either by 3D hardware-accelerated
APIs? Because if it's supported by such hardware, then you can
bypass the entire software stack and the gains are expressed in
orders of magnitude (even when called from Java :)
 
