is there a better way to optimise this code

S

Steve Leach

I am writing an application where I look for a white pixel by
testing if all the R,G,B values are 255 i.e. I use

if(RGB[0] == 255 && RGB[1] == 255 && RGB[2] == 255) (assuming
RGB is a pointer to unsigned char)

Use XOR rather than a comparison. Much faster (or at least it was back
in the days of the 386 and 486... I assume it still is).

If short int is 2 bytes on your hardware (check, but it probably is) do
the first XOR on a short int at RGB[0] and the second on a char at RGB[2].

This last part might not matter depending on how smart the compiler is
but if you are going sequentially through the entire array then dont use
an array+offset but rather set a pointer to the base before you enter
the loop and increment it.

char test[1024*768*3]; // Just guessing at an image size
unsigned short int * pointer;
unsigned char * limit;
pointer=&test;
limit=pointer+1024*768*3;

do {
if(!((unsigned short int )*pointer^0xffff)
&& !((unsigned char )*(pointer+2)^0xff))
printf("white\n");
pointer+=3;

} } while(pointer<limit);

Other than some warnings from the compiler about wrong pointer type, I
think this should work... check my logic on this as I'm more than a
little rusty to put it lightly.

Oh, and I don't know if this is still true or not with modern processors,
but you might see a benefit from always loading the word (16 bits) from
an even address... if you check the pointer first and then do (if
pointer is even) short int, char, char, short int for each loop (
checking two pixles for each loop) then you will always load a word from
an even address rather than odd. if the base pointer is odd, then start
with char. I don't even know if this still matters though It used to be
that the x86's would load a word from an odd base in two fetches and
from an even in one. Anyone know if this still applies to Pentium and
newer processors?

After all of this if things are STILL too slow, about your only other
option would be to reduce the number of loops by repeating the same
instructions several times in the loop... ie check pixel one, then two,
then three, then four, THEN loop. that way there are 1/4 as many jumps
back to the beginning of the loop.


............................................................
 
D

Dave Vandervies

Arthur J. O'Dwyer said:
I am writing an application where I look for a white pixel by
testing if all the R,G,B values are 255 i.e. I use

if(RGB[0] == 255 && RGB[1] == 255 && RGB[2] == 255) (assuming
RGB is a pointer to unsigned char)

I am absolutely amazed that no one has mentioned the obvious.
You just want to compare one block of unsigned chars to another
block; so write what you mean!

#include <string.h>

unsigned char to_find[3] = {255, 255, 255};

if (memcmp(RGB, to_find, 3) == 0) { ... }

This is the sort of code that gcc loves to optimize, and I
would be surprised if it compared unfavorably to /any/ of the
other proposed solutions, given a modern compiler.

It is not quite that easy. That will find:

{{0, 0, 255} {255, 255, 0}}

and other similar things. You have to impose the modulo 3 on the
results (which may well not be 3, depending on how things have
been declared).

I haven't had my coffee yet today, so you could be right, but your
comment looks wrong to me.

Arthur's code is comparing the array-of-three-bytes pointed to by RGB
against a pointer to three bytes that all have value 255; if this code
is run for each pixel with RGB pointing at the RGB values for that pixel,
it should do The Right Thing. It's not trying to walk through the whole
pixel array and look for any string of three consecutive 255s.


dave
 
D

Dan Pop

In said:
I am writing an application where I look for a white pixel by testing
if all the R,G,B values are 255 i.e. I use

if(RGB[0] == 255 && RGB[1] == 255 && RGB[2] == 255) (assuming RGB is a
pointer to unsigned char)

This statement gets executed for all the pixels in a page, so if I can
find a better way to do this, I could potentially save a lot of cpu
cycles.

Is there also an unused RGB[3], so that each pixel is properly aligned
for a 32-bit access? If yes, you could use a single integer comparison
for each pixel value:

typedef whatever_appropriate uint32t;

uint32t *p = (uint32t *)RGB, white = 0xffffffff;
((unsigned char *)&white)[3] = 0; /* clear the last byte of white */

if ((*p & white) == white) /* the pixel is white */ ;

Dan
 
B

bogonic

if(RGB[0] == 255 && RGB[1] == 255 && RGB[2] == 255) (assuming
Use XOR rather than a comparison.

Use a XOR, and you'll *still* need to do a comparison - an implicit
comparison with zero.
If short int is 2 bytes on your hardware (check, but it probably is) do
the first XOR on a short int at RGB[0] and the second on a char at RGB[2].

<off-topic>

Trying to read a word from an address that isn't a multiple of its size
will seriously degrade performance on x86, and, on processors such as the
Sparc or the Alpha, will make your program die an horrible death,
screaming "Bus Error".

You're talking about code like

uint8_t *foo;
uint16_t bar = (uint16_t *)foo;

If 'foo' is at an even address, everything will be fine (on most
architectures). But it doesn't need to be; and in this particular
scenario it *won't* be, in 2/3 of the cases.

Incidentally, the code I posted to this very thread suffers from
the same problem; it assumes the array passed to the function starts
at a word-aligned address. It also assumes 32-bit words. Given that,
and a decent compiler, it will do just three memory fetches for
every four pixels, as opposed to twelve fetches for char pointer
walk. But it's definitely not production quality, and maybe I should've
thought twice before posting it.

Yes, this small discussion was off-topic, since the C programming
language has no concept of addresses and words.

This last part might not matter depending on how smart the compiler is
but if you are going sequentially through the entire array then dont use
an array+offset but rather set a pointer to the base before you enter
the loop and increment it.

Any compiler worth its salt will convert an index walk to a pointer
walk.
 
S

Steve Leach

Use a XOR, and you'll *still* need to do a comparison - an implicit
comparison with zero.

Ahh, I see your point that it will still have a conditional jump but an
XOR followed by a conditional jump still uses fewer clock cycles than a
CMP followed by said jump. Though now that I think about it, the
compiler SHOULD be smart enough to recognize this and compile the XOR
anyway since he is comparing to 0xFF.
Trying to read a word from an address that isn't a multiple of its
size will seriously degrade performance on x86, and, on processors
such as the Sparc or the Alpha, will make your program die an horrible
death, screaming "Bus Error".

I wasn't aware of the horrible death potential, but keeping alignment
even to avoid the double fetch is why I suggested checking the pointer
offset at the outset (though he could just declare the array as short
int and the compiler would always place it at an even base) and doing
either short-char-char-short or char-short-short-char depending.

Incidentally [and all appologies for straying so far off topic here], I
wonder if this is still the case on intel for 386 and later since the
data bus is 32 bits rather than 16?
Incidentally, the code I posted to this very thread suffers from
the same problem; it assumes the array passed to the function starts
at a word-aligned address. It also assumes 32-bit words. Given that,
and a decent compiler, it will do just three memory fetches for
every four pixels, as opposed to twelve fetches for char pointer
walk. But it's definitely not production quality, and maybe I
should've thought twice before posting it.

If performance in this loop is important enough that would still be a
valid option, he would have to have the program check the type size at
the outset and then test to see if it is on a little or big endian
machine and adjust accordingly... possibly even having a little endian
and big endian version of the routine and setting a function pointer if
it is called repeatedly from a loop.
Yes, this small discussion was off-topic, since the C programming
language has no concept of addresses and words.

Ooops, all appologies for the stray into assembly land :)


............................................................
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,744
Messages
2,569,482
Members
44,901
Latest member
Noble71S45

Latest Threads

Top