data structure alignment & access speed on a 32-bit system

Discussion in 'C Programming' started by pt, Aug 1, 2006.

  1. pt

    pt Guest

    Hi,
    i am wondering what is faster in terms of access speed when reading
    these data structures from the disk in C/C++, including alignment
    handling, if we access them on a little-endian 32-bit system + OS
    e.g. Windows, Linux, WinCE. I am not quite sure about the alignment of
    the memory....

    soln. 1: should be faster, but I am not sure.
    idx  size (bytes)
     1   4
     2   4
     3   1
     4   1
     5   1
     6   1
     7   1
     8   1
     9   1
    10   1
    sum = 16 bytes

    soln. 2
    idx  size (bytes)
     1   4
     2   4
     3   4  --> to get the byte contents back, need 4+ shift/mask operations
     4   4  --> to get the byte contents back, need 4 shift/mask operations

    sum = 16 bytes
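    For what it's worth, both layouts should come out to 16 bytes with no
    padding on a typical 32-bit target. A quick sketch (the field names are
    invented for illustration) that prints the sizes and shows the
    shift/mask cost of the packed form:

    ```c
    #include <stdio.h>

    /* Solution 1: two 4-byte fields followed by eight 1-byte fields. */
    struct soln1 {
        unsigned int  a;      /* 4 bytes */
        unsigned int  b;      /* 4 bytes */
        unsigned char c[8];   /* 8 x 1 byte */
    };

    /* Solution 2: four 4-byte fields; the eight byte values are packed
       into the last two words and extracted with shifts. */
    struct soln2 {
        unsigned int a;
        unsigned int b;
        unsigned int packed_lo;  /* holds 4 of the byte values */
        unsigned int packed_hi;  /* holds the other 4 */
    };

    int main(void)
    {
        /* On a typical 32-bit target both layouts occupy 16 bytes,
           with no padding inserted by the compiler. */
        printf("soln1: %zu bytes\n", sizeof(struct soln1));
        printf("soln2: %zu bytes\n", sizeof(struct soln2));

        /* Extracting one byte from the packed form costs a shift + mask.
           This works on the value, so it is endian-independent: */
        unsigned int word = 0x04030201u;
        unsigned char third = (unsigned char)((word >> 16) & 0xFF);
        printf("third byte = %u\n", third);
        return 0;
    }
    ```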



    Thanks for any suggestion!

    pt.
    pt, Aug 1, 2006
    #1

  2. pt wrote:
    > i am wondering what is faster in terms of access speed when reading
    > these data structures from the disk in C/C++, including alignment
    > handling, if we access them on a little-endian 32-bit system + OS
    > e.g. Windows, Linux, WinCE.


    I can't make heads or tails of this statement. Could you break it up into
    shorter sentences?

    Also, consider that performance depends greatly on the platform and the
    compiler you're using, so there is no single answer to your question.
    Performance (especially when such low level of operations is concerned)
    is _measured_, not calculated. You write your functions and measure the
    time it takes. Then you look at it from the overall program execution
    standpoint and decide if the performance of any particular part is of
    any importance.
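    A minimal, portable timing harness along those lines (the workload
    function here is just a stand-in for whatever you actually want to
    measure):

    ```c
    #include <stdio.h>
    #include <time.h>

    /* Hypothetical workload whose cost we want to estimate. */
    static long work(long n)
    {
        long sum = 0;
        for (long i = 0; i < n; i++)
            sum += i & 0xFF;
        return sum;
    }

    int main(void)
    {
        clock_t start = clock();
        long result = work(10 * 1000 * 1000L);
        clock_t end = clock();

        /* clock() measures CPU time; divide by CLOCKS_PER_SEC for seconds. */
        double seconds = (double)(end - start) / CLOCKS_PER_SEC;
        printf("result=%ld, elapsed=%.3f s\n", result, seconds);
        return 0;
    }
    ```

    Run it a few times and under realistic load; a single cold run can be
    dominated by cache and scheduler noise.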

    Also, for OS-specific inquiries, consider posting to the newsgroup
    dedicated to the OS[es].

    > [..]


    V
    --
    Please remove capital 'A's when replying by e-mail
    I do not respond to top-posted replies, please don't ask
    Victor Bazarov, Aug 1, 2006
    #2

  3. pt

    Guest

    > Also, consider that performance depends greatly on the platform and the
    > compiler you're using, so there is no single answer to your question.
    > Performance (especially when such low level of operations is concerned)
    > is _measured_, not calculated. You write your functions and measure the
    > time it takes. Then you look at it from the overall program execution
    > standpoint and decide if the performance of any particular part is of
    > any importance.


    Do we have a rough general idea for these platforms: Linux (gcc),
    WinCE/Windows (Visual Studio .NET)?
    , Aug 1, 2006
    #3
  4. pt

    forkazoo Guest

    wrote:
    > > Also, consider that performance depends greatly on the platform and the
    > > compiler you're using, so there is no single answer to your question.
    > > Performance (especially when such low level of operations is concerned)
    > > is _measured_, not calculated. You write your functions and measure the
    > > time it takes. Then you look at it from the overall program execution
    > > standpoint and decide if the performance of any particular part is of
    > > any importance.

    >
    > Do we have roughly general idea to this? Linux (gcc), WinCE/Windows
    > (Visual Studio .NET)


    Not really. I assume you are using x86. (Which is probably wrong
    since you mention CE, but it's reasonably common, and you didn't say
    what you are using so I will assume it anyway.)

    Performance may be affected by any number of things. Aligned reads are
    generally faster than unaligned reads, just as a completely general
    rule of thumb. However, read penalties are usually smaller if the data
    is in cache. Does your chip have a cache? How much? What algorithms
    control how the cache is filled and drained of data? Is the cache
    shared between multiple cores? How about other processes? Any of this
    can affect performance.
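    One way to sidestep the aligned-vs-unaligned question in portable C is
    to go through memcpy for potentially unaligned loads and let the
    compiler pick the instructions. A sketch (the function name is made up,
    and the decoded value assumes a little-endian host, per the original
    question):

    ```c
    #include <stdio.h>
    #include <string.h>
    #include <stdint.h>

    /* Read a 32-bit value from a possibly unaligned buffer. memcpy lets
       the compiler emit the fastest safe instruction sequence instead of
       risking an unaligned pointer dereference (undefined behaviour on
       some targets, a trap or slow path on others). */
    static uint32_t read_u32le(const unsigned char *p)
    {
        uint32_t v;
        memcpy(&v, p, sizeof v);   /* assumes a little-endian host */
        return v;
    }

    int main(void)
    {
        unsigned char buf[8] = {0, 0x78, 0x56, 0x34, 0x12, 0, 0, 0};
        /* buf + 1 is odd, hence unaligned for a 4-byte load. */
        printf("0x%08x\n", read_u32le(buf + 1));
        return 0;
    }
    ```

    Modern compilers typically recognise this pattern and compile the
    memcpy down to a single load where the hardware allows it.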

    Does your CPU support MMX? SSE? 3DNow? Depending on what *exactly*
    you are doing, these instructions may help immensely, or be of no
    value. Does your compiler output those extra instructions? When does
    it do so?

    Are you using a 386? Opteron? P4? Some old 386s (the 386SX) only had a
    16-bit data bus, so the alignment issue may not be a big deal,
    considering you are already so slowed down by the bus. Perfectly
    optimal code on an Opteron will look a bit different from optimal code
    on a P4 in most cases.

    So, what about the compiler? In a minimally optimising mode, what you
    write will probably map very directly to the function of the machine
    code. On a higher optimisation mode, the machine code may seem only
    dimly related to what you wrote. And, two different looking chunks of
    C which do the exact same thing may actually wind up being compiled to
    the exact same machine code. Which compiler are you using? Which
    version? gcc added a lot of new optimising bells and whistles
    recently. Maybe one method will be better optimised by the new 4.X
    bells and whistles, and the other method will be better optimised by the
    older 3.X bells and whistles.

    Those are some of the questions off the top of my head that would all
    need to be considered before you could really say how well
    micro-optimisations will work out. It's basically impossible to be
    sure, so you just have to measure actual performance. And, you have to
    measure it under actual running conditions.

    When considering doing optimisations, people generally try to avoid
    really low level stuff as much as possible. First, consider if you are
    using an appropriate algorithm. This will almost always give you the
    biggest possible speedup. If you can move from an O(n*n) algorithm to
    an O(n log n) one, then you have probably sped things up tremendously.
    From there, you need to profile and see what is running slow. If you
    spend two months making a perfectly aligned data structure with
    perfectly aligned accesses, that may be great. But if loading your
    structure is only 0.001% of your run time, then there was probably no
    point, even if that specific step is a million times faster. Once you
    find what is running slow, start with the low-hanging fruit.

    For example, imagine that you are processing data in a file. If you
    have established that reading the data is slow, and the processing is
    reasonably quick, then you need to try and figure out what the easiest
    way to speed up the reading is. If you are making a bunch of small
    reads, then you could try lumping them together into some big reads
    that get a bunch of data at once. This may result in less disk
    seeking, which can improve things dramatically. You can also look at
    some crazy non portable system calls which will take you a long time to
    get working right. But, if grouping your reads gives you 90% of the
    crazy and super complex solution, then there is probably no point to
    going that route.
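    A sketch of that "lump small reads into big ones" idea using plain
    portable stdio (the file name and chunk size are arbitrary; the program
    writes a small file first so the example is self-contained):

    ```c
    #include <stdio.h>
    #include <stdlib.h>

    /* Sum every byte of a file with one large fread per chunk instead of
       many byte-sized reads; fewer calls usually means fewer syscalls and
       less disk seeking. Returns -1 if the file cannot be opened. */
    static long sum_file(const char *path)
    {
        FILE *f = fopen(path, "rb");
        if (!f) return -1;

        enum { CHUNK = 64 * 1024 };        /* one buffered read per 64 KiB */
        static unsigned char buf[CHUNK];
        long total = 0;
        size_t n;
        while ((n = fread(buf, 1, sizeof buf, f)) > 0)
            for (size_t i = 0; i < n; i++)
                total += buf[i];
        fclose(f);
        return total;
    }

    int main(void)
    {
        /* Create a small file so the example can run on its own. */
        FILE *f = fopen("demo.bin", "wb");
        if (!f) return EXIT_FAILURE;
        for (int i = 0; i < 256; i++)
            fputc(i, f);
        fclose(f);

        printf("sum = %ld\n", sum_file("demo.bin"));
        return 0;
    }
    ```

    This is the 90%-of-the-benefit version; memory-mapped files or
    platform-specific async I/O would be the "crazy non portable" route.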

    So, remember that micro-optimisations suffer bit rot. What you
    micro-optimise today will almost certainly be subuptimal on the new
    chip they release next month.

    I've rambled quite a lot longer than I intended to, and I apologise.
    As you can see, getting into micro-optimisations really explodes the
    number of issues that might come up. That's why this group is such a
    bad source of advice on those sorts of issues. This group is mostly
    just about the stuff that is definitely specified in the C standard.
    You will be much better off trying to get micro-optimisation advice in
    a group dedicated to your compiler, or to assembly coding on your CPU,
    etc.
    forkazoo, Aug 1, 2006
    #4
  5. pattreeya writes:
    > i am wondering what is faster in terms of access speed when reading
    > these data structures from the disk in C/C++, including alignment
    > handling, if we access them on a little-endian 32-bit system + OS
    > e.g. Windows, Linux, WinCE. I am not quite sure about the alignment of
    > the memory....


    This thread lists a lot of specific issues to consider, but here are
    some general ones (none of them overriding the more specific matters):

    The compiler likely knows better than you do. Even when it doesn't, any
    optimization you do for some specific hardware is likely to be obsolete
    or even a slowdown soon enough, unless you take care to keep it updated.

    So when people ask such a low-level question, the best solution is often
    to rearrange their code so that the compiler will take care of the
    matter. E.g. give your data the proper data type, and maybe put it in a
    struct or union instead of aligning it "by hand" in some malloc'ed area.

    If you do need to know the alignment of some type T, anyway, you can
    ask the compiler to align it for you and read out the alignment it used.
    C++ compilers often have some alignof extension (e.g. __alignof), but
    there is no standard feature for it. In C, use:
        struct align_T { char dummy; T align; };
        offsetof(struct align_T, align)
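    A compilable version of that C trick, with double standing in for T
    (the printed value is whatever alignment your compiler chooses, e.g. 8
    on typical x86-64 targets):

    ```c
    #include <stdio.h>
    #include <stddef.h>

    typedef double T;   /* the type whose alignment we want */

    /* The padding the compiler inserts between 'dummy' and 'align'
       reveals the alignment requirement of T: 'align' is placed at the
       first suitably aligned offset after the single char. */
    struct align_T { char dummy; T align; };

    int main(void)
    {
        printf("alignment of T: %zu\n",
               (size_t)offsetof(struct align_T, align));
        return 0;
    }
    ```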


    One step more specific: 'int' is supposed to have the host's natural
    word size, so the alignment the compiler gives 'int' _may_ be the best
    minimum alignment (that is, the best alignment for data types smaller
    than int). For example, I've heard of some host where memcpy() tried to
    first deal with the first <address mod 4> bytes and then take the rest
    in 4-byte chunks, or something like that. So if you want optimal access
    to some char[] array, and it's not already inside a struct which
    contains an int, a pointer, or some wider data type, you can use
        union foo {
            char data[SIZE];
            int align;
        };
    Then the compiler will give it the proper alignment, and if you use
    'union foo' pointers it will also know that the data is "int-size"
    aligned, so it won't need to generate code to handle byte-sized
    aligned data. (Unlike if you pass around char* pointers.)

    On the other hand, even that little hack can be a pessimization:
    For one thing, unions and structs are more complex to handle for the
    compiler than simple char* arrays, so it may not optimize the code as
    well. Also, pointers to different data types may have different
    representations, and if you cause it to convert pointers back and forth
    a lot you are out of luck. And if you increase the data size, you'll
    increase memory use, reduce the usefulness of the cache, etc.

    --
    Hallvard
    Hallvard B Furuseth, Aug 1, 2006
    #5
