Cache

Discussion in 'C++' started by Thomas Kowalski, Sep 25, 2006.

  1. Hi,
    I would like to know which high-level techniques can improve the number
    of cache hits. I am not trying to rewrite any code in assembler; I just
    want to know which things might be considered to help a good compiler
    "understand" my code better. Some examples or links would be very
    helpful.

    Thanks in advance,
    Thomas Kowalski
    Thomas Kowalski, Sep 25, 2006
    #1

  2. Thomas Kowalski wrote:
    > I would like to know which high-level techniques can improve the
    > number of cache hits. I am not trying to rewrite any code in
    > assembler; I just want to know which things might be considered to
    > help a good compiler "understand" my code better. Some examples or
    > links would be very helpful.


    The last time I came across recommendations of this sort, they were
    mostly about placing objects in such a way that when you follow the
    references (not C++ references, but generally pointers, indices, etc.)
    from one object to another, the objects you reach end up close together
    in memory.

    Beware, those are always platform-specific, and so is the advice you'll
    receive, I am fairly certain. Try a good book, like "Efficient C++".
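
    For instance, a rough, purely illustrative sketch of the idea (plain
    C++, nothing platform-specific): instead of scattering nodes across
    the heap and chasing pointers between them, keep them in one
    contiguous vector and link them by index, so objects that refer to
    each other tend to land on the same or nearby cache lines.

    #include <vector>

    struct Node {
        int value;
        int next;   // index of the next node in the vector, -1 for "none"
    };

    // Walking the chain stays within one contiguous block of memory
    // instead of jumping all over the heap.
    long sum_chain(const std::vector<Node>& nodes, int first) {
        long sum = 0;
        for (int i = first; i != -1; i = nodes[i].next)
            sum += nodes[i].value;
        return sum;
    }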

    V
    --
    Please remove capital 'A's when replying by e-mail
    I do not respond to top-posted replies, please don't ask
    Victor Bazarov, Sep 25, 2006
    #2

  3. Thomas Kowalski wrote:
    > Hi,
    > I would like to know which high-level techniques can improve the number
    > of cache hits. I am not trying to rewrite any code in assembler; I just
    > want to know which things might be considered to help a good compiler
    > "understand" my code better. Some examples or links would be very
    > helpful.

    My reply is perhaps slightly OT here: Have you tried profiling it with
    valgrind's cachegrind, or the frontend to it, kcachegrind? I can't give
    many hard and fast rules (except for the case of linear algebra), but I
    imagine kcachegrind could give you a somewhat better view of what's
    going on.

    Alan
    Alan Woodland, Sep 25, 2006
    #3
  4. Thomas Kowalski wrote:
    > Hi,
    > I would like to know which high-level techniques can improve the number
    > of cache hits. I am not trying to rewrite any code in assembler; I just
    > want to know which things might be considered to help a good compiler
    > "understand" my code better. Some examples or links would be very
    > helpful.


    I imagine this would be hard to control at the source level, but one
    piece of advice springs to mind (similar to what Victor had pointed
    out) -- it is usually beneficial to access memory addresses that are
    close together, in increasing order. For instance

    int arr[1000][1000];
    long int sum=0;

    for(int i=0; i<1000; ++i)
        for(int j=0; j<1000; ++j)
            sum += arr[i][j];

    would probably run significantly faster than

    for(int i=0; i<1000; ++i)
        for(int j=0; j<1000; ++j)
            sum += arr[j][i];

    because in the first case the memory occupied by arr is
    traversed linearly. Therefore it is sometimes beneficial
    to have a transposed copy of an array that you need to
    access in the latter fashion.
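
    As a rough sketch (purely illustrative, with the array size made up),
    such a transposed copy could be built once up front, so that what used
    to be a column-wise, cache-unfriendly traversal of arr becomes a
    linear sweep over a row of the copy:

    #include <cstddef>

    const std::size_t N = 1000;

    // tr becomes the transpose of arr; afterwards, reading a "column"
    // of arr is just a linear pass over the corresponding row of tr.
    void transpose(const int (&arr)[N][N], int (&tr)[N][N])
    {
        for (std::size_t i = 0; i < N; ++i)
            for (std::size_t j = 0; j < N; ++j)
                tr[j][i] = arr[i][j];
    }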

    <OT>
    Other than that, I remember that aligning objects to
    certain boundaries may speed things up. Try looking up
    "alignment" or __declspec(align(n)) in your compiler's
    manual. For example, on the Itanium 2 it is beneficial
    to align variables at 16-byte boundaries. Caches also
    usually have the property that certain alignments are
    ill-advised, because they interfere with the cache's
    internal hashing. For instance, on the Itanium 2,
    accessing variables in succession that all lie at
    addresses divisible by 2048 causes very poor
    performance.
    </OT>
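
    <OT>
    As a rough sketch of what such a declaration looks like (compiler-
    specific; these are just the MSVC and GCC spellings, and the 16-byte
    figure is simply the Itanium 2 example from above):

    #if defined(_MSC_VER)
    __declspec(align(16)) float buffer[256];          // MSVC spelling
    #elif defined(__GNUC__)
    float buffer[256] __attribute__((aligned(16)));   // GCC spelling
    #endif
    </OT>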

    As a last note -- definitely use a profiler to
    check where the bottleneck lies. It quite often
    lies far from where you'd expect it to be by
    inspecting the source code.

    HTH,
    - J.
    Jacek Dziedzic, Sep 25, 2006
    #4
  5. Thomas Kowalski wrote:
    > Hi,
    > I would like to know which high-level techniques can improve the number
    > of cache hits. I am not trying to rewrite any code in assembler; I just
    > want to know which things might be considered to help a good compiler
    > "understand" my code better. Some examples or links would be very
    > helpful.


    As others have pointed out, it's obviously very platform-specific.
    Ideally, the part of your data structure that you are accessing should
    fit into cache. There are various tricks already mentioned by others
    like alignment, but again, these are just too specific to the
    particular processor you're using.

    Regards,
    Bart.
    Bart, Sep 25, 2006
    #5
  6. Jacek Dziedzic wrote:
    > I imagine this would be hard to control at the source level, but one
    > piece of advice springs to mind (similar to what Victor had pointed
    > out) -- it is usually beneficial to access memory addresses that are
    > close together, in increasing order. For instance
    >
    > int arr[1000][1000];
    > long int sum=0;
    >
    > for(int i=0; i<1000; ++i)
    >     for(int j=0; j<1000; ++j)
    >         sum += arr[i][j];
    >
    > would probably run significantly faster than
    >
    > for(int i=0; i<1000; ++i)
    >     for(int j=0; j<1000; ++j)
    >         sum += arr[j][i];
    >
    > because in the first case the memory occupied by arr is
    > traversed linearly.


    That's entirely dependent on the situation. If the whole structure fits
    into cache then that wouldn't be the case. Okay, arguably not too many
    computers have 1MB of cache, but the point is that you have to analyze
    each case separately. General advice on this particular topic is rarely
    useful and gets obsolete very quickly.

    Regards,
    Bart.
    Bart, Sep 25, 2006
    #6
  7. Bart wrote:
    > Jacek Dziedzic wrote:
    > > I imagine this would be hard to control at the source level, but one
    > > piece of advice springs to mind (similar to what Victor had pointed
    > > out) -- it is usually beneficial to access memory addresses that are
    > > close together, in increasing order. For instance
    > >
    > > int arr[1000][1000];
    > > long int sum=0;
    > >
    > > for(int i=0; i<1000; ++i)
    > >     for(int j=0; j<1000; ++j)
    > >         sum += arr[i][j];
    > >
    > > would probably run significantly faster than
    > >
    > > for(int i=0; i<1000; ++i)
    > >     for(int j=0; j<1000; ++j)
    > >         sum += arr[j][i];
    > >
    > > because in the first case the memory occupied by arr is
    > > traversed linearly.

    >
    > That's entirely dependent on the situation. If the whole structure fits
    > into cache then that wouldn't be the case. Okay, arguably not too many
    > computers have 1MB of cache,


    For some reason, I'm somewhat careless today. I meant 1 million ints
    but wrote 1MB instead.

    Sorry for that,

    Regards,
    Bart.
    Bart, Sep 25, 2006
    #7
  8. Thomas Kowalski wrote:
    > Hi,
    > I would like to know which high-level techniques can improve the number
    > of cache hits. I am not trying to rewrite any code in assembler; I just
    > want to know which things might be considered to help a good compiler
    > "understand" my code better. Some examples or links would be very
    > helpful.
    >


    One of the most effective techniques is "blocking" or taking chunks of
    your data and operating on the chunks.

    e.g. If I was doing this:

    for ( i = 0; i < 1000; ++i ) {
        for ( j = 0; j < 1000; ++j ) {
            magic....
        }
    }

    it can be turned into:

    for ( i = 0; i < 1000; i += 10 ) {
        for ( j = 0; j < 1000; j += 10 ) {
            type temp_data[10];
            for ( ii = i; ii < i + 10; ++ii ) {
                for ( jj = j; jj < j + 10; ++jj ) {
                    temp_data[ii-i] += magic ....
                }
            }
        }
    }

    This is very simplistic, but the basic idea is to keep as much of the
    data you are working on in cache as possible. I have seen 100x speed
    increases using this technique on large data.
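
    For a slightly more concrete sketch of the same idea (the matrix and
    block sizes here are arbitrary and would need tuning for the actual
    machine), here is a blocked transpose; each 32x32 tile of the source
    and destination stays in cache while it is being copied:

    #include <cstddef>

    const std::size_t N = 1024;   // matrix dimension (assumed divisible by B)
    const std::size_t B = 32;     // block (tile) size

    void blocked_transpose(const double (&src)[N][N], double (&dst)[N][N])
    {
        for (std::size_t i = 0; i < N; i += B)
            for (std::size_t j = 0; j < N; j += B)
                for (std::size_t ii = i; ii < i + B; ++ii)
                    for (std::size_t jj = j; jj < j + B; ++jj)
                        dst[jj][ii] = src[ii][jj];
    }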
    Gianni Mariani, Sep 25, 2006
    #8