single producer, single consumer

Discussion in 'C++' started by goodfella, Oct 6, 2009.

  1. goodfella

    goodfella Guest

    Here is some code I came up with for a lock free single producer
    single consumer algorithm. I have done some tests using GCC with -O3
    optimizations and all seems well. I would like some constructive
    criticism of the code as I am not yet an expert on concurrency.

    I'm assuming that reading and writing the value of a pointer is
    atomic. Although I have read that on some architectures that is a bad
    assumption, so to ensure correctness across all platforms, I may have
    to wait for C++ atomic types. Any suggestions and criticisms are
    welcome, thanks. The code is below.


    #include <cstddef> // for size_t

    // T is the element type; the surrounding template context is omitted.
    bool push_element(T* el);
    T* pop_element();

    const size_t Size = 1024;
    T* volatile data[Size];

    // Called from the producer thread only.
    bool push_element(T* el)
    {
        static int write_idx = 0;

        if( !data[write_idx] )
        {
            data[write_idx] = el;
            write_idx = (write_idx + 1) % Size;
            return true;
        }
        else
            return false;
    }

    // Called from the consumer thread only.
    T* pop_element()
    {
        static int read_idx = 0;

        T* element;

        if( (element = data[read_idx]) )
        {
            data[read_idx] = 0;
            read_idx = (read_idx + 1) % Size;
            return element;
        }
        else
            return 0;
    }
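
    The intended usage is one producer thread spinning on push_element
    and one consumer thread spinning on pop_element, roughly like this
    (untested sketch; assumes pthread-style thread entry points and a
    concrete element type T, and the function names here are mine):

    #include <pthread.h>

    void* producer_main(void*)
    {
        for (int i = 0; i < 100000; ++i)
        {
            T* el = new T();
            while (!push_element(el))
                ;   // queue full: spin until the consumer frees a slot
        }
        return 0;
    }

    void* consumer_main(void*)
    {
        for (int i = 0; i < 100000; ++i)
        {
            T* el;
            while (!(el = pop_element()))
                ;   // queue empty: spin until the producer pushes
            delete el;
        }
        return 0;
    }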
     
    goodfella, Oct 6, 2009
    #1

  2. REH

    REH Guest

    On Oct 6, 10:00 am, goodfella <> wrote:
    > Here is some code I came up with for a lock free single producer
    > single consumer algorithm. I have done some tests using GCC with -O3
    > optimizations and all seems well. I would like some constructive
    > criticism of the code as I am not yet an expert on concurrency.

    [Snip code]


    You have a few potential issues:
    1) Even if pointer accesses are atomic on your machine, your array
    element accesses most certainly aren't.
    2) If you run this on a multi-processor system, you will have cache-
    coherency issues.

    You really need to use proper atomic operations. Most systems provide
    these. GCC 4.1 or higher has atomic built-ins. Windows has atomic
    APIs. Most processors provide them (e.g., Intel has lock+cmpxchg).
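
    For instance, with the GCC built-ins (an untested sketch; the
    function names here are mine, and the rough Windows analogue would
    be InterlockedCompareExchangePointer):

    long counter = 0;

    void bump()
    {
        // Atomic read-modify-write; also acts as a full memory barrier.
        __sync_fetch_and_add(&counter, 1);
    }

    // Atomically claim a queue slot: store el only if the slot still
    // holds null. Returns true if we won the slot.
    bool try_claim_slot(T* volatile* slot, T* el)
    {
        return __sync_bool_compare_and_swap(slot, (T*)0, el);
    }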

    REH
     
    REH, Oct 6, 2009
    #2

  3. REH

    REH Guest

    On Oct 6, 10:00 am, goodfella <> wrote:
    > Here is some code I came up with for a lock free single producer
    > single consumer algorithm. [snip]


    Incidentally, I've written a lock-free multi-read/multi-write queue
    for multi-core machines that runs on both Windows and Linux, if you
    need an example. It probably does more than you want/need, but you
    could strip it down.

    REH
     
    REH, Oct 6, 2009
    #3
  4. goodfella

    goodfella Guest

    On Oct 6, 7:29 am, REH <> wrote:
    > Incidentally, I've written a lock-free multi-read/multi-write queue
    > for multi-core machines that runs on both Windows and Linux, if you
    > need an example. It probably does more than you want/need, but you
    > could strip it down.


    REH, thanks for your reply, and yes I would be interested in seeing
    your lock-free multi-read/multi-write queue. However, I do not see
    how the array element access would not be atomic. From my
    understanding, the following line of code:

    element = data[read_idx];

    is converted to something like:

    T* volatile* temp = data + read_idx;
    element = *temp;

    The first line might not be atomic, but if reading the value from a
    pointer is atomic, my code is still valid because it's the second line
    that counts. Neither data nor read_idx is going to be updated by
    another thread. data is never pointed at another array of T* and
    read_idx is only updated by the consumer thread.

    Also, could you elaborate more on your second point, thanks.
     
    goodfella, Oct 6, 2009
    #4
  5. REH

    REH Guest

    On Oct 6, 1:22 pm, goodfella <> wrote:
    > REH, thanks for your reply, and yes I would be interested in seeing
    > your lock-free multi-read/multi-write queue. However, I do not see
    > how the array element access would not be atomic. [snip]


    I glossed over the part where you said you had single readers and
    writers. In that limited case, and if you can guarantee that your
    pointer accesses are atomic, I think you are OK.


    >
    > Also, could you elaborate more on your second point, thanks.


    If your code is running on a system with more than one processor
    (e.g., a multi-core processor), each processor will have its own
    cache. If one thread updates a pointer, the other processor(s) may not
    see it, because they are accessing a value stored in their cache. One
    of the things an atomic operation needs to do is ensure the value it
    is working with is the most recent one, regardless of any caches that
    may lie between the processor and memory. You normally don't give this
    a second thought in multi-threaded code because your synchronization
    objects (i.e., mutexes, semaphores, etc.) do this for you. They also
    ensure that optimizations such as code reordering do not change the
    logic. Code that must occur before or after the atomic operation must
    not be moved across that operation. You cannot provide these guarantees
    in normal C++ (yet).
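
    For illustration, here is your queue with explicit full barriers via
    GCC's __sync_synchronize (an untested sketch, not a definitive
    implementation; a production version would use proper acquire/release
    operations):

    bool push_element(T* el)
    {
        static int write_idx = 0;

        if( !data[write_idx] )
        {
            // Release-style barrier: everything written to *el must be
            // visible before the pointer store publishes it.
            __sync_synchronize();
            data[write_idx] = el;
            write_idx = (write_idx + 1) % Size;
            return true;
        }
        return false;
    }

    T* pop_element()
    {
        static int read_idx = 0;

        T* element = data[read_idx];
        if( element )
        {
            // Acquire-style barrier: don't touch *element until after
            // the pointer load above.
            __sync_synchronize();
            data[read_idx] = 0;
            read_idx = (read_idx + 1) % Size;
            return element;
        }
        return 0;
    }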

    REH
     
    REH, Oct 6, 2009
    #5
  6. REH

    REH Guest

    On Oct 6, 1:22 pm, goodfella <> wrote:
    > REH, thanks for your reply, and yes I would be interested in seeing
    > your lock-free multi-read/multi-write queue.


    I extracted the relevant code and wrote a README file to explain all
    the pieces. I can email it to you tonight when I get home. Is the
    email you have above the correct one to use?

    REH
     
    REH, Oct 6, 2009
    #6
  7. On Oct 6, 7:00 am, goodfella <> wrote:
    > Here is some code I came up with for a lock free single producer
    > single consumer algorithm. [snip]

    [Snip code]

    As already mentioned, multi-processor systems will probably not run
    your code as you intended because of cache coherency. I'd strongly
    suggest looking it up and getting some good references on threading.
    The short version is that BOTH the reader and the writer threads must
    execute some kind of synchronization primitive to have any guarantee
    WHATSOEVER on the order that writes become visible to another thread
    (where a synchronization primitive might be a full blown lock, acquire
    and release memory barriers, read and write memory barriers, etc.).
    Ex:

    #include <string>
    #include <fstream>
    using namespace std;

    //insert threading library
    template <typename callable_t>
    void startThread(callable_t callable);

    int x = 0;
    int y = 0;

    struct foo
    {   foo(string const& f) : filename(f) {}
        string filename;
        void operator() ()
        {   ofstream fout(filename.c_str());
            fout << x << " " << y << endl;
        }
    };

    int main()
    {   startThread(foo("a"));
        startThread(foo("b"));
        startThread(foo("c"));
        startThread(foo("d"));
        x = 1;
        y = 2;
    }

    This program, on a \single\ execution, can produce 4 files with the
    contents (though highly unlikely for this contrived example):
    0 0
    0 2
    1 0
    1 2

    This is the cache coherency problem. Let me emphasize again: two
    different threads may see the same writes occur in different orders.
    On modern machines, there is no single view of main memory. Everyone
    has their own local copy because of their own independent caches. One
    thread may execute the instructions "load register to x" then "load
    register to y", but another thread may see it happen in the opposite
    order.

    Synchronization instructions make the view consistent when used
    properly. Atomic is insufficient. The pointer might have been updated
    but what it points to has not been updated. See C++ And The Perils Of
    Double Checked Locking
    http://www.aristeia.com/Papers/DDJ_Jul_Aug_2004_revised.pdf
    for an excellent description of modern threading, and common
    pitfalls.

    This is of course ignoring the other two big "culprits" of making non-
    conformant code do what the user does not expect: compiler
    optimizations and single core pipeline reorderings / optimizations.

    Also, even on a single core single processor machine, you might still
    hit related issues due to compiler optimizations and hardware pipeline
    optimizations, especially with signal handlers.
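
    For comparison, here is the same example with a pthread mutex around
    x and y (an untested sketch); with it, any reader can only ever see
    "0 0" or "1 2", never a mix:

    #include <pthread.h>

    pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
    int x = 0;
    int y = 0;

    void writer()
    {   pthread_mutex_lock(&m);
        x = 1;              // both writes happen inside one critical section
        y = 2;
        pthread_mutex_unlock(&m);
    }

    void read_pair(int& rx, int& ry)
    {   pthread_mutex_lock(&m);
        rx = x;             // the lock orders these loads against the writes,
        ry = y;             // so we see both updates or neither
        pthread_mutex_unlock(&m);
    }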
     
    Joshua Maurice, Oct 6, 2009
    #7
  8. goodfella

    goodfella Guest

    On Oct 6, 1:52 pm, Joshua Maurice <> wrote:
    > As already mentioned, multi-processor systems will probably not run
    > your code as you intended because of cache coherency. I'd strongly
    > suggest looking it up and getting some good references on threading.
    > The short version is that BOTH the reader and the writer threads must
    > execute some kind of synchronization primitive to have any guarantee
    > WHATSOEVER on the order that writes become visible to another thread.
    [snip]


    The machine I'm testing this on has a dual core processor, and I have
    yet to hit a problem like that. I would expect the process to hang
    because the producer would constantly be getting a false value
    returned from the push function and the consumer would constantly be
    getting a null pointer from the pop function, due to the fact that the
    pointer value would be stored in their respective caches and a write
    to the pointer in one cache would not propagate to the pointer value
    in the other cache. Are there any guarantees in regard to cache
    coherency on certain platforms?
     
    goodfella, Oct 6, 2009
    #8
  9. On Oct 6, 3:10 pm, goodfella <> wrote:
    > The machine I'm testing this on has a dual core processor, and I have
    > yet to hit a problem like that.


    Probably just luck. No one said it's likely to fail. Depending on the
    application it could be quite rare. You don't want such a Heisen-bug
    to show up only once a month on a customer's machine.

    > I would expect the process to hang
    > because the producer would constantly be getting a false value
    > returned from the push function and the consumer would constantly be
    > getting a null pointer from the pop function due to the fact that the
    > pointer value would be stored in their respective caches and a write
    > to the pointer in one cache would not propagate to the pointer value
    > in the other cache.


    Hardware, from the perspective of a software programmer, is a black
    art. You seem to have a very poor understanding of how modern
    operating systems work, and how basic processors and caches work. Just
    follow the rules and you should be OK. Here are some examples from
    real machines.

    Caches are not infinite in size. Eventually they get full and things
    must be evicted, aka spilled, to main memory. Generally they use a
    heuristic of least recently referenced, which means first in is not
    guaranteed to be first out. (To answer your question, surely you have
    other processes running on your machine, and swapping between these
    processes many times a second means that the cache will be cleared
    many times a second from this context switching.)

    The reader's cache may contain a stale entry for one but not the
    other, which means it'll fetch the new version from main memory for
    one but not the other.

    The infamous DEC Alpha had a split cache, meaning that (something
    like ??) all even addresses went to one cache, and all odd addresses
    went to the other cache. This is much more likely to produce effects
    like above, as something much more recently referenced will be flushed
    to main memory first because that subportion of the cache is full.
    (The DEC Alpha also has crazy look ahead, where **x may be the newest
    value in main memory, but *x may be a stale value. At least, that's
    how it's been explained to me. I have no clue how or why you would
    implement that.)

    I'd assume there exist some caches which use other heuristics, like
    locality heuristics. Maybe some caches only hold ~20 byte units of
    main memory, in effect a look-ahead optimization, pulling data from
    main memory before it's required. That can leave you with stale data
    when it sits near other recently used data, while data far away in
    main memory is much more likely to be fetched fresh.

    I'm sure there are even "weirder" optimizations which will mess with
    you if you don't follow the rules. The programming language gives you
    a contract: both threads must execute the proper synchronization
    primitives to get any guarantee on the order in which writes become
    visible from one thread to the other. The programming language,
    hardware, and operating system fulfill this contract, and they will
    do their best to optimize within its confines. If you do not follow
    it, you have violated assumptions made by the implementation, in
    which case you're SOL.

    > Are there any guarantees in regard to cache
    > coherency on certain platforms?


    Probably. I don't know offhand.
     
    Joshua Maurice, Oct 6, 2009
    #9
  10. "goodfella" <> wrote in message
    news:...
    > Here is some code I came up with for a lock free single producer
    > single consumer algorithm. I have done some tests using GCC with -O3
    > optimizations and all seems well.


    [...]

    Check this example code out:

    (IA-32/MSVC/PThread)

    http://pastebin.com/f693b64b3


    It's a wait-free fast-pathed single-consumer/producer bounded ring buffer
    that allows for efficient in-place writes and reads. A producer can directly
    reserve and utilize buffer space, then atomically commit it such that a
    consumer can read and process the buffer in place. Nothing gets overwritten,
    everything works, and it has conditional blocking ability.

    The code has the ability to block because it makes efficient use of my
    eventcount algorithm. The eventcount is as important to non-blocking code as
    condition variables are to PThreads. This uses my original algorithm, which
    I disclosed to the public here:


    http://groups.google.com/group/comp.programming.threads/browse_frm/thread/aa8c62ad06dbb380
     
    Chris M. Thomasson, Oct 7, 2009
    #10
  11. goodfella

    goodfella Guest

    On Oct 6, 5:54 pm, "Chris M. Thomasson" <> wrote:
    > Check this example code out:
    >
    > (IA-32/MSVC/PThread)
    >
    > http://pastebin.com/f693b64b3
    [snip]


    Thanks, I'll take a look at that as well.
     
    goodfella, Oct 7, 2009
    #11
  12. goodfella

    goodfella Guest

    On Oct 6, 6:23 pm, goodfella <> wrote:

    I have read the following Wikipedia article about cache coherency:


    http://en.wikipedia.org/wiki/Cache_coherence


    and it seems that cache coherency guarantees the following behavior on
    reads and writes of memory in the same location:


    A read made by processor P1 to location X that follows a write by
    another processor P2 to X must return the written value made by P2 if
    no other writes to X made by any processor occur between the two
    accesses. This condition defines the concept of coherent view of
    memory. If processors can read the same old value after the write made
    by P2, we can say that the memory is incoherent.


    So, from what I have gathered, a read made by processor P1 of memory
    location X, following the write of A by processor P2 to memory
    location X, should, at some point in time, return the value A. In
    addition, if P1 writes A to memory location X, P1 will read A from
    memory location X until either it or another processor writes another
    value to X.

    The above statements about cache coherency are enough to show that my
    code works on systems that guarantee cache coherency. The consumer and
    producer at any given time are only concerned with the writing and
    reading of one particular memory location. The producer is only going
    to write to a memory location if the value of the memory location is
    zero and the consumer is only going to read from a memory location if
    the value of the memory location is not zero.

    At any given time, the producer thread is either going to see a null
    pointer or a valid pointer at memory location X. If it sees a valid
    pointer at memory location X, it will do nothing. If it sees a null
    pointer at memory location X it will write a valid memory address to
    that memory location.

    At any given time, the consumer thread, is either going to see a null
    pointer or a valid pointer at memory location X. If it sees a null
    pointer it will do nothing. If it sees a valid pointer at memory
    location X it will read that value and write zero to the memory
    location.

    Only the producer writes a valid address to memory location X and only
    the consumer writes zero to memory location X. The cache coherency
    protocols ensure that when one processor writes to a memory location,
    the new value of that memory location will eventually be available to
    the other processors.

    The ordering of reads and writes on different memory locations does
    not apply in my case because my code is not dependent on the values of
    two different memory locations at the same time. The if statements in
    the push and pop functions rely on one particular memory location at a
    time, and assuming for the moment that reading the value of a pointer
    is atomic, the read of the pointer is going to either return zero or a
    valid memory address.

    So unless the architecture that my code is run on fails to guarantee
    cache coherency, I don't see how my code could suffer from cache
    coherency problems.
     
    goodfella, Oct 8, 2009
    #12
  13. On Oct 7, 6:31 pm, goodfella <> wrote:
    > I have read the following Wikipedia article about cache coherency:
    >
    > http://en.wikipedia.org/wiki/Cache_coherence

    [snip]
    > So unless the architecture that my code is run on fails to guarantee
    > cache coherency, I don't see how my code could suffer from cache
    > coherency problems.


    True. Which processors are those again? Are you fine writing non-
    portable code?
     
    Joshua Maurice, Oct 8, 2009
    #13
  14. "Joshua Maurice" <> wrote in message
    news:...

    > True. Which processors are those again? Are you fine writing non-
    > portable code?


    http://groups.google.com/group/comp.arch/browse_thread/thread/df6f520f7af13ea5
    (read all...)
     
    Chris M. Thomasson, Oct 8, 2009
    #14
  15. REH

    REH Guest

    On Oct 7, 9:31 pm, goodfella <> wrote:
    > I have read the following Wikipedia article about cache coherency:
    >
    > http://en.wikipedia.org/wiki/Cache_coherence
    >
    > and it seems that cache coherency guarantees the following behavior on
    > reads and writes of memory in the same location:
    [snip]


    You seem to be confused. Cache coherency is not something that is
    guaranteed by the architecture. It is something your implementation
    must guarantee in order to be deterministic, reliable, and portable.
    You can make all the attempts at a logical argument you want, and run
    tests as many times as you want, but unless it has the proper memory
    barriers, you simply cannot guarantee your code is correct.

    I write real-time, embedded systems with safety-critical requirements.
    Without the proper atomic operations, your code would be rejected at
    the first code review.

    REH
     
    REH, Oct 8, 2009
    #15
  16. On Oct 7, 7:13 pm, "Chris M. Thomasson" <> wrote:
    > http://groups.google.com/group/comp.arch/browse_thread/thread/df6f520...
    > (read all...)


    That doesn't really answer the question. It gives a single example of
    one processor which is not cache coherent, nowhere near the listing of
    common (and uncommon) processors necessary to make an informed
    decision.

    "Which processors are those again?" was also intended as a rhetorical
    question. The intent was "Are you even sure on which platforms this
    will work?".
     
    Joshua Maurice, Oct 8, 2009
    #16
  17. "Joshua Maurice" <> wrote in message
    news:...
    On Oct 7, 7:13 pm, "Chris M. Thomasson" <> wrote:
    > > "Joshua Maurice" <> wrote in message
    > >
    > > news:...
    > > On Oct 7, 6:31 pm, goodfella <> wrote:
    > >
    > > > > On Oct 6, 6:23 pm, goodfella <> wrote:

    > >
    > > > > I have read the following Wikipedia article about cache coherrency:

    > >
    > > > >http://en.wikipedia.org/wiki/Cache_coherence

    > > [snip]
    > > > > So unless the architecture that my code is ran on fails to guarantee
    > > > > cache coherency, I don't see how my code could suffer from cache
    > > > > coherency problems.
    > > > True. Which processors are those again? Are you fine writing non-
    > > > portable code?

    > >
    > > http://groups.google.com/group/comp.arch/browse_thread/thread/df6f520...
    > > (read all...)


    > That doesn't really answer the question. It gives a single example of
    > one processor which is not cache coherent, nowhere near the listing of
    > common (and uncommon) processors necessary to make an informed
    > decision.


    Here is what I take away from the thread:

    Local cache-coherency on a per-socket/node basis.

    Non-cache-coherent socket/node network; basically, message passing.

    BTW, you write:

    > It gives a single example of one processor which is not cache coherent


    IMHO, the information contained within the linked thread in `comp.arch'
    gives a simple example of per-socket/node cache coherency vs
    inter-socket/node NUMA. What happens if the memory for the volatile variable
    is on CPU E on SocketB, and the reader/mutator is on CPU X on SocketT?




    > "Which processors are those again?" was also intended as a rhetorical
    > question. The intent was "Are you even sure on which platforms this
    > will work?".


    One simple example: SiCortex systems. You have to use message passing (e.g.,
    MPI) for inter-socket/node comms, and you can use cache coherent shared
    memory for intra-socket/node comms. This is a fairly standard setup. Cray?
     
    Chris M. Thomasson, Oct 8, 2009
    #17
  18. "Chris M. Thomasson" <> wrote in message
    news:hajj2l$fkj$...
    > "Joshua Maurice" <> wrote in message

    [...]
    >> "Which processors are those again?" was also intended as a rhetorical
    >> question. The intent was "Are you even sure on which platforms this
    >> will work?".



    Okay. I just wanted to mention that there are systems in which it (e.g., a
    simple volatile flag with no data-dependencies) will not work without using
    MPI or something. BTW, this setup is not uncommon. Advanced multi-core
    designs are getting more and more NUMA-like. IMVHO, you have to use NUMA
    rules on advanced multi-core chips. Should CoreA really be communicating
    with CoreZ? Perhaps CoreA and CoreB have access to the same local memory
    bank.




    > One simple example: SiCortex systems. You have to use message passing
    > (e.g., MPI) for inter-socket/node comms, and you can use cache coherent
    > shared memory for intra-socket/node comms. This is a fairly standard
    > setup. Cray?
     
    Chris M. Thomasson, Oct 8, 2009
    #18
  19. goodfella

    goodfella Guest

    On Oct 7, 7:18 pm, REH <> wrote:
    > You seem to be confused. Cache coherency is not something that is
    > guaranteed by the architecture. It is something your implementation
    > must guarantee in order to be deterministic, reliable, and portable.
    > You can make all the attempts at a logical argument you want, and run
    > tests as many times as you want, but unless it has the proper memory
    > barriers, you simply cannot guarantee your code is correct.
    >
    > I write real-time, embedded systems with safety-critical requirements.
    > Without the proper atomic operations, your code would be rejected at
    > the first code review.


    So in general, the writes made by the producer are not guaranteed to
    propagate to the consumer, and writes made by the consumer are not
    guaranteed to propagate to the producer. A simpler demonstration of
    this problem is an example of how to use the volatile keyword. You
    have a variable which gets set to 1 by one thread, called the writer,
    while another thread, called the reader, is in a loop waiting for the
    variable to be equal to one. The use of volatile ensures that the
    compiler will not optimize away checking the value of the variable,
    because the value of the variable may be changed in ways unknown to
    the compiler. So in this example, the write performed by the writer
    thread is not guaranteed to propagate to the reader thread. Or in
    other words, the reader thread will never exit because it will never
    see the variable equal to 1.
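
    In code, the example is something like this (sketch only; assume each
    function runs on its own thread):

    volatile int flag = 0;

    void writer()
    {
        flag = 1;       // volatile store: the compiler must emit the write
    }

    void reader()
    {
        while( flag != 1 )
            ;           // volatile load each iteration, never optimized away
        // continues only once (and if) the write becomes visible here
    }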
     
    goodfella, Oct 8, 2009
    #19
  20. On 8 oct, 09:43, goodfella <> wrote:
    > So in general, the writes made by the producer are not guaranteed to
    > propagate to the consumer, and writes made by the consumer are not
    > guaranteed to propagate to the producer. A simpler demonstration of
    > this problem is an example of how to use the volatile keyword. You
    > have a variable which gets set to 1 by one thread, called the writer,
    > while another thread, called the reader, is in a loop waiting for the
    > variable to be equal to one. The use of volatile ensures that the
    > compiler will not optimize away checking the value of the variable,
    > because the value of the variable may be changed in ways unknown to
    > the compiler.


    Just nitpicking:
    1. There is no formal guarantee either from C or C++: volatile
    primarily exists for hardware locations; if a compiler can determine
    a pointer is to ordinary memory, it could in theory not apply the
    constraints of volatile qualification.
    2. In practice, it may do what you say, but more important is the
    way the compiler reorganizes the code: volatile operations should be
    executed as they appear in the code.

    >  So in this example, the write performed by the writer
    > thread is not guaranteed to propagate to the reader thread.


    Well, it should. A write to a volatile location should make a write to
    the location (each time, without optimising it away). It may cause
    cache incoherency if there are multiple writes, but at some point it
    will become visible.

    The problem is more related to portability across compilers and how
    they interpret 'volatile'.

    >  Or in
    > other words, the reader thread will never exit because it will never
    > see the variable equal to 1.


    I thought this article may be of interest:
    http://www.ddj.com/cpp/212701484

    --
    Michael
     
    Michael Doubez, Oct 8, 2009
    #20