First, it's not at all clear what you're trying to speed
up, nor what you have measured and found to be too slow. A
Hi Eric,
I'm looking for advice/pointers on the operators mentioned above. Its
not a case of anything being too slow : its a question of ensuring or
approaching optimum performance as early as possible. I did a lot of
this stuff at assembler level years ago when writing high performance
vga libraries, but want, if possible to leave it all in C now.
From my opening paragraph :
shift & rotate operations : left & right shift operations
anding : bitwise and operations
oring : bitwise or operations
I am writing some small libraries for generating image masks,
transparancy masks and edge silhouettes. Some more of the details are below.
"generic" optimization is often ill-directed; optimizing "on
general principles" is often ill-advised. If you have a specific
operation you'd like to speed up, it'd be a good idea to show
the actual code you're currently using, along with measurements
of its current speed and an estimate of the amount of speedup
you simply MUST achieve to make the code useful.
I'm looking for general guidelines : nothing spectacuarly
specific. Although you do give some good guidlines, some of which had
crossed my mind already, but certainly not all.
As to finding the first non-zero bit in an array of 64 ints,
the first thing to do is nail down what you mean by "first," lest
endianness issues pollute the solutions. Having done that, you
Thats in the details : but here we can assume that the sequence is
high bit == left most pixel. Also that any colour issues are segregated
into 3 or more color planes : initial code can assume a monochrome
bitmap.
I dont really think endian comes into it since the operations will
be taken care of at the word level. Even if it wasnt monochrome then
the most specific X bits would be leftmost pixel where X is the color
depth : in this case I would see the only code issues being that of
increasing the bit width of the test mask and the number of shifts
done in order to check the next pixel.
(Just to clarify the endian thing : all work is in memory buffers and
is in no way related to video HW addressing.)
should also decide what form the answer should take -- it may be
simpler (or harder) to get a bit mask than a bit index, and you
may benefit by adjusting the rest of the code to use the more
easily obtained answer.
Thats something to think about ok. Although I always considered just
anding and shifting 0x80000000 until a non zero result was
indicated. The mask can be generated from the bit by subtracting one.
(e.g 0x4000000-1 -> 0x3fffffff) and readding the bit.
After left pixel is detected we detect right most and if we are
creating a mask of words for rest of scanline its then trivial.
If one-bits are expected to be fairly sparse, you'll probably
want to search through largish "chunks" to find one that isn't
all zeroes, then search within that chunk for the proper bit.
You could do the search an `int' at a time with an ordinary loop,
or you might use memchr() to search byte-by-byte. The memchr()
method might (or might not) take longer for the search, but note
that it leaves a smaller space for the bit-by-bit search to explore.
You'd need to implement both and measure.
Yes, check word for 0 before looking for a set bit : simple enough &
very practical operation with no real overhead at time of word
read. I dont expect the images to be sparse so I'm not sure its
worth the overhead of calling a library to find it : considering left
most pixel detection: so In C, this seems fairly optimal to me (and
this type of thing I'm looking for advice on)
(early caveat : all pseudo c code assuming 32bit word for clarity and not
getting lost in type sizes)
bitmap =0;
while(columnsToCheck-- && !(bitmap = *imagePtr++));
if(bitmap){
/* now find leftmost pixel */
}
should be optimal enough?
If you've used memchr() and have located a non-zero byte, it
memchr() would certinaly not be on the agenda I think : it needs a
character to compare whereas the implicit recognition of "zero"
or "non zero" is faster. Also, I read in words (properly aligned of
course) and not bytes : bytes would certainly strangle the application.
might make sense to use a precalculated 256-element table to finish
the job without further "computation." Then again, it might not:
Cou you expand on that table bit a little : I dont know about this. Is
it something CPU specific?
modern CPU's are much faster than memory, and might be able to
complete a multi-step computation in less time than a memory fetch.
More measurements are needed.
If you've searched int-by-int you could dive directly into
the second search, or you could first use a couple of comparisons
to decide which of the four bytes holds the "first" non-zero and
then proceed as if memchr() had located that byte. Which will be
faster? Measure, measure, measure.
If you need a bit index and don't want to use a table, the
obvious test-and-shift-and-count approach is probably best. If
you need a mask as opposed to an index, the `x & (x - 1)' trick
Could you expand on that? I would see it as simple as
(find left pixel mask)
/* we know the bitmask is non zero so a bit check is a must*/
for(bit=0x8000000;!(bit&bitmap);bit>>=1);
/*word bit mask protects all bits from leftmost pixel on*/
leftWordMask = bit | (bit-1);
may be faster. Measure, measure, -- is there an echo in here?
Ive used a few profilers on Vax, Convex and on MS : but will be using
gcov for initial run time analysis on linux - but I'm not sure if its
what I want for real execution time profiling. But I'm certainly
looking for "C hints" on faster ways with bitmaps first : then I can
decide on a design approach : I think the code above (silly errors not
withstanding), appears to be relatively quick and the core "loop" areas are
small enough where I would expect their execution footprint to get cached.
Finally, an impression of the two Web pages you cited. A
quick glance suggests they're filled with well-intentioned and
sometimes reasonable advice, but ALL of it is out of context.
Yes & no : it has some general advice on standard stuff which can help
: alignment, powers of two etc. It even discusses use of machine words
at some stage instead of smaller units which can cause excessive
overhead I think. Certainly helpful to anyone embarking on optimizing
anything in C for the first time.
(Necessarily so, of course: the authors of the Web pages have no
idea what kind of program you're trying to write.) Even expert
advice can become nonsense when taken out of context. Your
grandmother's doctor is (presumably) an expert; do you therefore
take the same medicines that are prescribed for your grandmother?
IMHO, it is best to read and understand advice of the sort these
pages offer, but NOT to follow it until and unless you have reason
to believe it applies to your situation.
I appreciate your reply,
thanks for taking the time to do so.