Do we have a rough general idea about this? Linux (gcc), WinCE/Windows
(Visual Studio .NET)
Not really. I assume you are using x86. (Which is probably wrong
since you mention CE, but it's reasonably common, and you didn't say
what you are using so I will assume it anyway.)
Performance may be affected by any number of things. Aligned reads are
generally faster than unaligned reads, just as a completely general
rule of thumb. However, read penalties are usually less if the data is
in cache. Does your chip have cache? How much? What algorithms
control how the cache is filled and drained of data? Is the cache
shared between multiple cores? How about other processes? Any of this
can affect performance.
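As an aside, here is a minimal sketch of how you can read a 32-bit
value from an arbitrary byte offset without caring about alignment at
the C level. The name read_u32_at is made up for illustration, and I
am assuming a C99 <stdint.h> is available. Whether this is slower than
an aligned read on your chip is precisely the thing you would have to
measure.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a 32-bit value starting at any byte offset.
   memcpy makes no alignment assumption, so this is valid C even when
   buf + offset is not suitably aligned for uint32_t. The caller must
   make sure at least 4 bytes are readable at that offset. */
uint32_t read_u32_at(const unsigned char *buf, size_t offset)
{
    uint32_t value;

    assert(buf != NULL);
    memcpy(&value, buf + offset, sizeof value);
    return value;
}
```

Many compilers turn that memcpy into a single load where the target
allows it, but that is an implementation detail, not a guarantee.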
Does your CPU support MMX? SSE? 3DNow? Depending on what *exactly*
you are doing, these instructions may help immensely, or be of no
value. Does your compiler output those extra instructions? When does
it do so?
Are you using a 386? Opteron? P4? Some old 386s (the 386SX) only had a
16 bit data bus, so the alignment issue may not be a big deal, considering
you are already so slowed down by the bus. Perfectly optimal code on
an Opteron will look a bit different from optimal code on a P4 in most
cases.
So, what about the compiler? In a minimally optimising mode, what you
write will probably map very directly to the function of the machine
code. On a higher optimisation mode, the machine code may seem only
dimly related to what you wrote. And, two different looking chunks of
C which do the exact same thing may actually wind up being compiled to
the exact same machine code. Which compiler are you using? Which
version? gcc added a lot of new optimising bells and whistles
recently. Maybe one method will be better optimised by the new 4.X
bells and whistles, and the other method will be better optimised by the
older 3.X bells and whistles.
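To make the "different looking C, same machine code" point concrete,
here is a small sketch. Both function names are invented for this
post. An optimising compiler may well emit identical code for the two,
but I can't promise that for yours; compile with something like
gcc -O2 -S and compare the assembly.

```c
#include <assert.h>
#include <stddef.h>

/* Index-based loop over the array. */
long sum_indexed(const int *a, size_t n)
{
    long total = 0;
    size_t i;

    assert(a != NULL);
    for (i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* Pointer-walking loop: same result, different-looking source. */
long sum_pointer(const int *a, size_t n)
{
    const int *end;
    long total = 0;

    assert(a != NULL);
    for (end = a + n; a != end; a++)
        total += *a;
    return total;
}
```

If the two do compile to the same thing, agonising over which style is
"faster" was wasted effort.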
Those are some of the questions off the top of my head that would all
need to be considered before you could really say how well
micro-optimisations will work out. It's basically impossible to be
sure, so you just have to measure actual performance. And, you have to
measure it under actual running conditions.
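A very crude way to measure, using only standard C, might look like the
sketch below. time_calls and bump_counter are hypothetical names, not
anything from a library. clock() reports processor time at whatever
resolution your implementation provides, which can be coarse, so use
plenty of repetitions and treat the numbers as rough.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical helper: CPU seconds spent calling fn(arg) 'reps' times. */
double time_calls(void (*fn)(void *), void *arg, long reps)
{
    clock_t start, stop;
    long i;

    assert(fn != NULL && reps > 0);
    start = clock();
    for (i = 0; i < reps; i++)
        fn(arg);
    stop = clock();
    return (double)(stop - start) / CLOCKS_PER_SEC;
}

/* Stand-in workload for illustration: increments the long that arg
   points to. Replace it with the code you actually care about. */
void bump_counter(void *arg)
{
    ++*(long *)arg;
}
```

A real profiler will tell you far more than this, but even a harness
this simple beats guessing.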
When considering doing optimisations, people generally try to avoid
really low level stuff as much as possible. First, consider if you are
using an appropriate algorithm. This will almost always give you the
biggest possible speedup. If you can move from an O(n*n) algorithm to
an O(n log n) one, then you have probably sped things up tremendously.
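As an illustration (my example, not anything from your code), take
checking an array for duplicate values. The function names below are
made up. The first version compares every pair; the second sorts a copy
and compares neighbours. Note that the C standard does not actually
promise any particular complexity for qsort, so the O(n log n) claim
for the second version is an assumption about your library.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* O(n*n): compare every pair of elements. */
int has_duplicate_quadratic(const int *a, size_t n)
{
    size_t i, j;

    assert(a != NULL || n == 0);
    for (i = 0; i < n; i++)
        for (j = i + 1; j < n; j++)
            if (a[i] == a[j])
                return 1;
    return 0;
}

/* Comparison callback for qsort; avoids overflow from subtraction. */
static int compare_ints(const void *x, const void *y)
{
    int a = *(const int *)x;
    int b = *(const int *)y;

    return (a > b) - (a < b);
}

/* Sort a copy, then scan adjacent elements.
   Returns 1 if a duplicate exists, 0 if not, -1 if malloc fails. */
int has_duplicate_sorted(const int *a, size_t n)
{
    int *copy;
    size_t i;
    int found = 0;

    assert(a != NULL || n == 0);
    if (n < 2)
        return 0;
    copy = malloc(n * sizeof *copy);
    if (copy == NULL)
        return -1;
    memcpy(copy, a, n * sizeof *copy);
    qsort(copy, n, sizeof *copy, compare_ints);
    for (i = 1; i < n; i++) {
        if (copy[i - 1] == copy[i]) {
            found = 1;
            break;
        }
    }
    free(copy);
    return found;
}
```

For a handful of elements the quadratic version may actually win,
which is one more reason to measure.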
From there, you need to profile, and see what is running slow. If you
spend two months making a perfectly aligned data structure with
perfectly aligned accesses, that may be great. But, if loading your
structure is only 0.001% of your run time, then there was probably no
point, even if that specific step is a million times faster. Once you
find what is running slow, start with the low hanging fruit.
For example, imagine that you are processing data in a file. If you
have established that reading the data is slow, and the processing is
reasonably quick, then you need to try and figure out what the easiest
way to speed up the reading is. If you are making a bunch of small
reads, then you could try lumping them together into some big reads
that get a bunch of data at once. This may result in less disk
seeking, which can improve things dramatically. You can also look at
some crazy non portable system calls which will take you a long time to
get working right. But, if grouping your reads gives you 90% of the
benefit of the crazy and super complex solution, then there is probably
no point to going that route.
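A sketch of the "lump the reads together" idea in standard C follows.
sum_bytes_chunked and CHUNK_SIZE are names I invented, and 65536 is an
arbitrary guess at a chunk size, not a recommendation. Keep in mind
that stdio usually does some buffering of its own, so the gain over
many small fread calls may be modest; it tends to matter more if you
were making raw system calls.

```c
#include <assert.h>
#include <stdio.h>

#define CHUNK_SIZE 65536  /* arbitrary guess; measure to tune it */

/* Hypothetical helper: sums every byte remaining in the stream,
   pulling the data in with a few large reads rather than many tiny
   ones. The static buffer keeps it off the stack but makes the
   function non-reentrant. */
unsigned long sum_bytes_chunked(FILE *fp)
{
    static unsigned char chunk[CHUNK_SIZE];
    unsigned long total = 0;
    size_t got, i;

    assert(fp != NULL);
    while ((got = fread(chunk, 1, sizeof chunk, fp)) > 0)
        for (i = 0; i < got; i++)
            total += chunk[i];
    return total;
}
```

The processing loop is unchanged from what a byte-at-a-time version
would do; only the way the data arrives is different.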
So, remember that micro-optimisations suffer bit rot. What you
micro-optimise today will almost certainly be suboptimal on the new
chip they release next month.
I've rambled quite a lot longer than I intended to, and I apologise.
As you can see, getting into micro-optimisations really explodes the
number of issues that might come up. That's why this group is such a
bad source of advice on those sorts of issues. This group is mostly
just about the stuff that is definitely specified in the C standard.
You will be much better off trying to get micro-optimisation advice in
a group dedicated to your compiler, or to assembly coding on your CPU,
etc.