Data structure alignment & access speed on a 32-bit system


pt

Hi,
I am wondering which of these data structures is faster, in terms of access
speed, to read from disk in C/C++ (including alignment handling) if we
access it on a little-endian 32-bit system + OS, e.g. Windows, Linux,
WinCE. I am not quite sure about the alignment of the memory....

soln. 1: should be faster, but I am not sure.
idx   size (bytes)
1     4
2     4
3     1
4     1
5     1
6     1
7     1
8     1
9     1
10    1
sum = 16 bytes

soln. 2
idx   size (bytes)
1     4
2     4
3     4 --> getting the contents back needs 4+ bit shifts
4     4 --> getting the contents back needs 4 bit shifts

sum = 16 bytes
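[For concreteness, here is a minimal sketch of the two layouts as C structs.
The field names are hypothetical (the post only gives sizes), and exact
padding is ABI-dependent, but on a typical 32-bit target both come out at
16 bytes; soln. 2 recovers its byte-sized values with shift-and-mask:

#include <stdio.h>
#include <stdint.h>

/* soln. 1: two 32-bit fields followed by eight 1-byte fields.
   No padding is needed: the 4-byte members come first and the
   1-byte members have no alignment requirement. */
struct soln1 {
    uint32_t a;
    uint32_t b;
    uint8_t  c[8];
};

/* soln. 2: four 32-bit fields; the last two pack the eight
   byte-sized values, extracted with shifts and masks. */
struct soln2 {
    uint32_t a;
    uint32_t b;
    uint32_t packed_lo;
    uint32_t packed_hi;
};

/* Extract byte i (0..3) from a packed little-endian word. */
static uint8_t get_byte(uint32_t word, int i)
{
    return (uint8_t)((word >> (8 * i)) & 0xFFu);
}

int main(void)
{
    printf("sizeof(struct soln1) = %u\n", (unsigned)sizeof(struct soln1));
    printf("sizeof(struct soln2) = %u\n", (unsigned)sizeof(struct soln2));
    printf("byte 2 of 0x04030201 = %u\n", get_byte(0x04030201u, 2)); /* 3 */
    return 0;
}
]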



Thanks for any suggestion!

pt.
 

Victor Bazarov

pt said:
I am wondering which of these data structures is faster, in terms of access
speed, to read from disk in C/C++ (including alignment handling) if we
access it on a little-endian 32-bit system + OS, e.g. Windows, Linux,
WinCE.

I can't make heads or tails of this statement. Could you break it up into
shorter sentences?

Also, consider that performance depends greatly on the platform and the
compiler you're using, so there is no single answer to your question.
Performance (especially where such low-level operations are concerned)
is _measured_, not calculated. You write your functions and measure the
time it takes. Then you look at it from the overall program execution
standpoint and decide if the performance of any particular part is of
any importance.
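[To make the "measure, don't calculate" point concrete, a timing harness
can be as simple as the sketch below; process_record() is a hypothetical
stand-in for whichever access pattern you want to time:

#include <stdio.h>
#include <time.h>

static volatile unsigned sink;  /* prevents the loop being optimized away */

static void process_record(void)
{
    /* ... the code under test goes here ... */
    sink += 1;
}

int main(void)
{
    const long iterations = 10000000L;
    clock_t start = clock();
    for (long i = 0; i < iterations; ++i)
        process_record();
    clock_t stop = clock();
    printf("%ld iterations took %.3f s\n", iterations,
           (double)(stop - start) / CLOCKS_PER_SEC);
    return 0;
}
]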

Also, for OS-specific inquiries, consider posting to the newsgroup
dedicated to the OS[es].

V
 

pattreeya

Also, consider that performance depends greatly on the platform and the
compiler you're using, so there is no single answer to your question.
Performance (especially where such low-level operations are concerned)
is _measured_, not calculated. You write your functions and measure the
time it takes. Then you look at it from the overall program execution
standpoint and decide if the performance of any particular part is of
any importance.

Do we have a rough general idea about this? Linux (gcc), WinCE/Windows
(Visual Studio .NET)
 

forkazoo

Do we have a rough general idea about this? Linux (gcc), WinCE/Windows
(Visual Studio .NET)

Not really. I assume you are using x86. (Which is probably wrong
since you mention CE, but it's reasonably common, and you didn't say
what you are using so I will assume it anyway.)

Performance may be affected by any number of things. Aligned reads are
generally faster than unaligned reads, just as a completely general
rule of thumb. However, read penalties are usually smaller if the data is
in cache. Does your chip have cache? How much? What algorithms
control how the cache is filled and drained of data? Is the cache
shared between multiple cores? How about other processes? Any of this
can affect performance.

Does your CPU support MMX? SSE? 3DNow? Depending on what *exactly*
you are doing, these instructions may help immensely, or be of no
value. Does your compiler output those extra instructions? When does
it do so?

Are you using a 386? Opteron? P4? Some old 386s (the 386SX) only had a
16-bit data bus, so the alignment issue may not be a big deal, considering
you are already so slowed down by the bus. Perfectly optimal code on
an Opteron will look a bit different from optimal code on a P4 in most
cases.

So, what about the compiler? In a minimally optimising mode, what you
write will probably map very directly to the machine code. At a higher
optimisation level, the machine code may seem only dimly related to
what you wrote. And two different-looking chunks of C which do the
exact same thing may actually wind up being compiled to the exact same
machine code. Which compiler are you using? Which version? gcc added a
lot of new optimising bells and whistles recently. Maybe one method
will be better optimised by the new 4.x bells and whistles, and the
other method will be better optimised by the older 3.x bells and
whistles.

Those are some of the questions off the top of my head that would all
need to be considered before you could really say how well
micro-optimisations will work out. It's basically impossible to be
sure, so you just have to measure actual performance. And, you have to
measure it under actual running conditions.

When considering doing optimisations, people generally try to avoid
really low level stuff as much as possible. First, consider if you are
using an appropriate algorithm. This will almost always give you the
biggest possible speedup. If you can move from an O(n*n) algorithm to
an O(n log n) one, then you have probably sped things up tremendously.
From there, you need to profile, and see what is running slow. If you
spend two months making a perfectly aligned data structure with
perfectly aligned accesses, that may be great. But, if loading your
structure is only .001% of your run time, then there was probably no
point, even if that specific step is a million times faster. Once you
find what is running slow, start with the low hanging fruit.

For example, imagine that you are processing data in a file. If you
have established that reading the data is slow, and the processing is
reasonably quick, then you need to try and figure out what the easiest
way to speed up the reading is. If you are making a bunch of small
reads, then you could try lumping them together into some big reads
that get a bunch of data at once. This may result in less disk
seeking, which can improve things dramatically. You can also look at
some crazy non portable system calls which will take you a long time to
get working right. But, if grouping your reads gives you 90% of the
crazy and super complex solution, then there is probably no point to
going that route.
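[As a rough illustration of the read-grouping idea (a sketch with a
hypothetical file name, not code from the thread), one fread() per large
chunk replaces thousands of per-byte calls:

#include <stdio.h>
#include <stdlib.h>

#define CHUNK (64 * 1024)  /* read 64 KiB at a time */

int main(void)
{
    static unsigned char buf[CHUNK];
    unsigned long checksum = 0;
    size_t n, i;

    FILE *f = fopen("data.bin", "rb");  /* hypothetical input file */
    if (f == NULL) {
        perror("fopen");
        return EXIT_FAILURE;
    }

    /* One library call per chunk instead of one per byte; fewer
       system calls and, often, much less disk seeking. */
    while ((n = fread(buf, 1, sizeof buf, f)) > 0) {
        for (i = 0; i < n; ++i)
            checksum += buf[i];
    }

    fclose(f);
    printf("checksum: %lu\n", checksum);
    return 0;
}
]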

So, remember that micro-optimisations suffer bit rot. What you
micro-optimise today will almost certainly be suboptimal on the new
chip they release next month.

I've rambled quite a lot longer than I intended to, and I apologise.
As you can see, getting into micro-optimisations really explodes the
number of issues that might come up. That's why this group is such a
bad source of advice on those sorts of issues. This group is mostly
just about the stuff that is definitely specified in the C standard.
You will be much better off trying to get micro-optimisation advice in
a group dedicated to your compiler, or to assembly coding on your CPU,
etc.
 

Hallvard B Furuseth

pattreeya said:
I am wondering which of these data structures is faster, in terms of access
speed, to read from disk in C/C++ (including alignment handling) if we
access it on a little-endian 32-bit system + OS, e.g. Windows, Linux,
WinCE. I am not quite sure about the alignment of the memory....

This thread lists a lot of specific issues to consider, but here are
some general ones (none of them overriding the more specific matters):

The compiler likely knows better than you do. Even when it doesn't, any
optimization you do for some specific hardware is likely to be obsolete
or even a slowdown soon enough, unless you take care to keep it updated.

So when people ask such a low-level question, the best solution is often
to rearrange their code so that the compiler will take care of the
matter. E.g. give your data the proper data type, and maybe put it in a
struct or union instead of aligning it "by hand" in some malloced area.

If you do need to know the alignment of some type T anyway, you can
ask the compiler to align it for you and read out the alignment it used.
C++ has some alignof feature, I don't remember what. In C, use:

struct align_T { char dummy; T align; };
offsetof(struct align_T, align)   /* offsetof is in <stddef.h> */
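[A runnable version of that trick, using double as an arbitrary example
type (an assumption, not from the post):

#include <stddef.h>
#include <stdio.h>

struct align_double { char dummy; double align; };

int main(void)
{
    /* The offset of 'align' is the padding the compiler inserted
       after one char, i.e. the alignment it chose for double. */
    printf("alignment of double: %lu\n",
           (unsigned long)offsetof(struct align_double, align));
    return 0;
}
]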


One step more specific: 'int' is supposed to have the host's natural
word size, so the alignment the compiler gives 'int' _may_ be the best
minimum alignment (that is, the best alignment for data types smaller
than int). For example, I've heard of some host where memcpy() tried to
first deal with the first <address mod 4> bytes and then take the rest
in 4-byte chunks, or something like that. So if you want optimal access
to some char[] array, and it's not already inside a struct which
contains an int, a pointer, or some wider data type, you can use
union foo {
    char data[SIZE];
    int align;
};
Then the compiler will give it the proper alignment, and if you use
'union foo' pointers it will also know that the data is "int-size"
aligned, so it won't need to generate code that handles data with only
byte alignment. (Unlike if you pass around char* pointers.)
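[A short sketch of that union in use; SIZE and the printout are
illustrative assumptions:

#include <stdio.h>
#include <string.h>
#include <stdint.h>

#define SIZE 64

union foo {
    char data[SIZE];
    int  align;        /* never accessed; only forces int alignment */
};

int main(void)
{
    union foo buf;
    memset(buf.data, 0, sizeof buf.data);
    /* buf.data has at least the alignment of int, so this prints 0. */
    printf("data address mod sizeof(int) = %u\n",
           (unsigned)((uintptr_t)buf.data % sizeof(int)));
    return 0;
}
]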

On the other hand, even that little hack can be a pessimization:
For one thing, unions and structs are more complex to handle for the
compiler than simple char* arrays, so it may not optimize the code as
well. Also, pointers to different data types may have different
representations, and if you cause it to convert pointers back and forth
a lot you are out of luck. And if you increase the data size, you'll
increase memory use, reduce the usefulness of the cache, etc.
 
