Memory access vs variable access

Discussion in 'C++' started by Gerhard Fiedler, Jun 24, 2008.

  1. Hello,

    I'm not sure whether this is a problem or not, or how to determine whether
    it is one.

    Say memory access (read and write) happens in 64-bit chunks, and I'm
    looking at 32-bit variables. This would mean that either some other
    variable is also written when writing a 32-bit variable (which means that
    all access to 32-bit variables is of the read-modify-write type, affecting
    some other variable also), or that all 32-bit variables are stored in their
    own 64-bit chunk.

    With single-threaded applications, that's a mere performance question. But
    with multi-threaded applications, there's no way I can imagine that would
    avoid the read-modify-write problems the first alternative would create, as
    it is nowhere defined what the other variable is that is also written -- so
    it can't be protected by a lock. Without it being protected by a lock,
    there's nothing that prevents a thread from altering it while it is in the
    middle of the read-modify-write cycle, which means that the end of it will
    overwrite the altered value with the old value.
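
    To make the scenario concrete, here is a sketch for a hypothetical
    machine whose smallest store is 64 bits (the type and the names are
    only illustrative):

    #include <stdint.h>

    struct Pair {
        uint32_t a;   // written by thread 1
        uint32_t b;   // written by thread 2
    };

    Pair p;

    // On such a machine, "p.a = 1;" could only compile to:
    //   1. load the 64-bit word holding both a and b
    //   2. merge the new value of a into it
    //   3. store the whole 64-bit word back
    // If thread 2 updates p.b between steps 1 and 3, thread 1's
    // store silently puts the old value of b back -- and a lock
    // on a cannot protect b against that.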

    However, there must be a way to deal with this, otherwise multi-threaded
    applications in C++ wouldn't be possible.

    What am I missing?

    Thanks,
    Gerhard
     
    Gerhard Fiedler, Jun 24, 2008
    #1

  2. Gerhard Fiedler

    gpderetta Guest

    On Jun 24, 3:59 pm, Victor Bazarov <> wrote:
    > Gerhard Fiedler wrote:
    > > I'm not sure whether this is a problem or not, or how to determine whether
    > > it is one.

    >
    > > Say memory access (read and write) happens in 64-bit chunks, and I'm
    > > looking at 32-bit variables. This would mean that either some other
    > > variable is also written when writing a 32-bit variable (which means that
    > > all access to 32-bit variables is of the read-modify-write type, affecting
    > > some other variable also), or that all 32-bit variables are stored in their
    > > own 64-bit chunk.

    >
    > > With single-threaded applications, that's a mere performance question. But
    > > with multi-threaded applications, there's no way I can imagine that would
    > > avoid the read-modify-write problems the first alternative would create, as
    > > it is nowhere defined what the other variable is that is also written -- so
    > > it can't be protected by a lock. Without it being protected by a lock,
    > > there's nothing that prevents a thread from altering it while it is in the
    > > middle of the read-modify-write cycle, which means that the end of it will
    > > overwrite the altered value with the old value.

    >
    > > However, there must be a way to deal with this, otherwise multi-threaded
    > > applications in C++ wouldn't be possible.

    >
    > > What am I missing?

    >
    > The fact that C++ does not specify any of that, maybe.
    >


    But C++0x will. IIRC, according to the draft standard, an
    implementation is prohibited from doing many kinds of speculative
    writes (with the exception of bitfields) to locations that wouldn't
    be written unconditionally anyway (or something like that).

    If a specific architecture didn't allow 32-bit loads/stores to 32-bit
    objects, it would require the implementation to pad every object out
    to the smallest load/store granularity. Pretty much all common
    architectures allow access to memory at least at 8/16/32-bit
    granularity (except for DSPs, I guess), so it is not a problem.
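
    For illustration, here is the sort of padding such an architecture
    would force, assuming hypothetical 64-bit-only stores (this is not
    any real ABI's layout):

    #include <stdint.h>

    // Each 32-bit object gets its own 64-bit store unit:
    struct PaddedInt32 {
        uint32_t value;
        uint32_t pad_;   // filler that no thread ever touches
    };

    // sizeof(PaddedInt32) == 8, so a 64-bit read-modify-write of
    // 'value' can only clobber pad_, which nobody else reads.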

    Current compilers do not implement the rule above, but thread-aware
    compilers approximate it well enough that, as long as you use correct
    locks, things work correctly *most of the time* (some compilers have
    been known to miscompile code that used trylocks, for example).

    > Try 'comp.programming.threads' as your starting point since it's the
    > multi-threading that you're concerned about.  The problem does not seem
    > to be language-specific, and as such does not belong to a language
    > newsgroup.
    >


    Actually, discussing whether the next C++ standard prohibits
    speculative writes is language-specific and definitely on topic.

    --
    gpd
     
    gpderetta, Jun 24, 2008
    #2

  3. On 2008-06-24 11:50:26, gpderetta wrote:

    > On Jun 24, 3:59 pm, Victor Bazarov <> wrote:
    >> Gerhard Fiedler wrote:
    >>> I'm not sure whether this is a problem or not, or how to determine
    >>> whether it is one.
    >>>
    >>> Say memory access (read and write) happens in 64-bit chunks, and I'm
    >>> looking at 32-bit variables. This would mean that either some other
    >>> variable is also written when writing a 32-bit variable (which means
    >>> that all access to 32-bit variables is of the read-modify-write type,
    >>> affecting some other variable also), or that all 32-bit variables are
    >>> stored in their own 64-bit chunk.
    >>>
    >>> With single-threaded applications, that's a mere performance question.
    >>> But with multi-threaded applications, there's no way I can imagine
    >>> that would avoid the read-modify-write problems the first alternative
    >>> would create, as it is nowhere defined what the other variable is that
    >>> is also written -- so it can't be protected by a lock. Without it
    >>> being protected by a lock, there's nothing that prevents a thread from
    >>> altering it while it is in the middle of the read-modify-write cycle,
    >>> which means that the end of it will overwrite the altered value with
    >>> the old value.
    >>>
    >>> However, there must be a way to deal with this, otherwise
    >>> multi-threaded applications in C++ wouldn't be possible.
    >>>
    >>> What am I missing?

    >>
    >> The fact that C++ does not specify any of that, maybe.


    Just for the record: I didn't really miss that. I just thought that how
    a very common problem, present in a sizable share of C++ applications,
    is handled across compilers and platforms is actually on topic in a
    group about the C++ language.

    > But C++0x will. IIRC, according to the draft standard, an
    > implementation is prohibited from doing many kinds of speculative
    > writes (with the exception of bitfields) to locations that wouldn't
    > be written unconditionally anyway (or something like that).
    >
    > If a specific architecture didn't allow 32-bit loads/stores to 32-bit
    > objects, it would require the implementation to pad every object out
    > to the smallest load/store granularity. Pretty much all common
    > architectures allow access to memory at least at 8/16/32-bit
    > granularity (except for DSPs, I guess), so it is not a problem.


    Ah, I didn't know that. So on common hardware (say x86, x86-64/AMD64,
    IA-64, PowerPC, ARM, Alpha, PA-RISC, MIPS, SPARC), memory access is
    possible at byte granularity? Which then means that no common compiler
    would write to locations that are not the actual target of the write
    access?

    > Current compilers do not implement the rule above, but thread-aware
    > compilers approximate it well enough that, as long as you use correct
    > locks, things work correctly *most of the time* (some compilers have
    > been known to miscompile code that used trylocks, for example).


    Do you have any links about which compilers specifically don't create code
    that works correctly? One objective of mine is to be able to separate this
    "most of the time" into two clearly defined subsets, one of which works
    "all of the time" :)

    > Actually, discussing whether the next C++ standard prohibits
    > speculative writes is language-specific and definitely on topic.


    Is "speculative writes" the technical term for the situation I described?

    Thanks,
    Gerhard
     
    Gerhard Fiedler, Jun 24, 2008
    #3
  4. Gerhard Fiedler

    gpderetta Guest

    On Jun 24, 5:51 pm, Gerhard Fiedler <> wrote:
    > On 2008-06-24 11:50:26, gpderetta wrote:
    >
    > > If a specific architecture didn't allow 32-bit loads/stores to 32-bit
    > > objects, it would require the implementation to pad every object out
    > > to the smallest load/store granularity. Pretty much all common
    > > architectures allow access to memory at least at 8/16/32-bit
    > > granularity (except for DSPs, I guess), so it is not a problem.

    >
    > Ah, I didn't know that. So on common hardware (say x86, x86-64/AMD64,
    > IA-64, PowerPC, ARM, Alpha, PA-RISC, MIPS, SPARC), memory access is
    > possible at byte granularity? Which then means that no common compiler
    > would write to locations that are not the actual target of the write
    > access?


    All x86 derivatives allow 8/16/32/64-bit access at any offset. I think
    both PowerPC and ARM allow access at any granularity as long as the
    access is properly aligned. IIRC very old Alphas only allowed accessing
    aligned 32/64-bit words (no byte access), but that was fixed because it
    was extremely inconvenient. I do not know about IA-64, MIPS, SPARC, and
    PA-RISC, but I would be extremely surprised if they didn't allow it.

    >
    > > Current compilers do not implement the rule above, but thread-aware
    > > compilers approximate it well enough that, as long as you use correct
    > > locks, things work correctly *most of the time* (some compilers have
    > > been known to miscompile code that used trylocks, for example).

    >
    > Do you have any links about which compilers specifically don't create code
    > that works correctly? One objective of mine is to be able to separate this
    > "most of the time" into two clearly defined subsets, one of which works
    > "all of the time" :)
    >


    Many do, in corner cases. Usually these are considered bugs and are
    fixed when they are encountered.
    See for example http://www.airs.com/blog/archives/79
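
    The trylock case looks roughly like this (a paraphrase of Boehm's
    example, not necessarily the exact code from the linked post):

    #include <pthread.h>

    pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
    int count;   // protected by m

    void f() {
        if (pthread_mutex_trylock(&m) == 0) {
            ++count;
            pthread_mutex_unlock(&m);
        }
    }

    // Register promotion can effectively rewrite the body as:
    //
    //     r = count;                          // load hoisted
    //     if (pthread_mutex_trylock(&m) == 0) {
    //         ++r;
    //         pthread_mutex_unlock(&m);
    //     }
    //     count = r;                          // unconditional store!
    //
    // The final store to count happens even when the lock was not
    // acquired, racing with whichever thread does hold it.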

    > > Actually, discussing whether the next C++ standard prohibits
    > > speculative writes is language-specific and definitely on topic.

    >
    > Is "speculative writes" the technical term for the situation I described?
    >


    I'm not sure whether it applies to this example. I think that
    "speculative store" is defined as the motion of a store out of its
    position in program order (usually sinking it out of loops or
    branches). It doesn't take much to generalize the concept to the
    *addition* of a store not present in the original program (i.e.
    adjacent-field overwrites).
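
    A sketch of the classic loop case (the predicate is hypothetical,
    and the rewrite shown is one a pre-C++0x optimizer could make):

    bool expensive_test(int);   // hypothetical predicate
    int g;

    void loop(int n) {
        for (int i = 0; i < n; ++i)
            if (expensive_test(i))
                g = 1;
    }

    // Sinking the store out of the loop makes it speculative:
    //
    //     int tmp = g;
    //     for (int i = 0; i < n; ++i)
    //         if (expensive_test(i)) tmp = 1;
    //     g = tmp;   // store happens even if the test never fired
    //
    // Overwriting an adjacent field that the source never mentioned
    // "adds" a store in exactly the same sense.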

    For details see "Concurrency memory model compiler consequences" by
    Hans Boehm:

    http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2338.html

    HTH,

    --
    gpd
     
    gpderetta, Jun 24, 2008
    #4
  5. Gerhard Fiedler

    Guest

    On Jun 24, 7:50 am, gpderetta <> wrote:
    > On Jun 24, 3:59 pm, Victor Bazarov <> wrote:


    > > The fact that C++ does not specify any of that, maybe.

    >
    > But C++0x will.


    A search on "hans boehm c++ memory model" should bring further
    information on that. Including videos of Hans Boehm's presentations on
    the topic.

    Here is a start:

    http://www.hpl.hp.com/personal/Hans_Boehm/c++mm/

    Ali
     
    , Jun 24, 2008
    #5
  6. Gerhard Fiedler

    James Kanze Guest

    On Jun 24, 3:48 pm, Gerhard Fiedler <> wrote:

    > I'm not sure whether this is a problem or not, or how to
    > determine whether it is one.


    It's potentially one.

    > Say memory access (read and write) happens in 64-bit chunks,
    > and I'm looking at 32-bit variables. This would mean that
    > either some other variable is also written when writing a
    > 32-bit variable (which means that all access to 32-bit
    > variables is of the read-modify-write type, affecting some
    > other variable also), or that all 32-bit variables are stored
    > in their own 64-bit chunk.


    > With single-threaded applications, that's a mere performance
    > question. But with multi-threaded applications, there's no way
    > I can imagine that would avoid the read-modify-write problems
    > the first alternative would create, as it is nowhere defined
    > what the other variable is that is also written -- so it can't
    > be protected by a lock. Without it being protected by a lock,
    > there's nothing that prevents a thread from altering it while
    > it is in the middle of the read-modify-write cycle, which
    > means that the end of it will overwrite the altered value with
    > the old value.


    > However, there must be a way to deal with this, otherwise
    > multi-threaded applications in C++ wouldn't be possible.


    Most hardware provides for single byte writes (even when the
    read is always 64 bits), and takes care that it works correctly.
    From what I understand, this wasn't the case on some early DEC
    Alphas, and it certainly wasn't the case on many older
    platforms, where when you wrote a byte, the hardware would read
    a word, and rewrite it.

    The upcoming version of the standard will address this problem;
    if nothing changes, it will require that *most* accesses to a
    single "object" work. (The major exception is bit fields. If
    you access an object that is declared as a bit field, and any
    other thread may modify any object in the containing class, you
    need to explicitly synchronize.) Implementations for processors
    where the hardware doesn't support this have their work cut out
    for them (but better them than us), and byte accesses on such
    implementations are likely to be very slow.
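
    A sketch of the bit-field exception he describes (the exact layout
    is up to the implementation):

    struct Flags {
        unsigned a : 4;   // a and b share one "memory location"
        unsigned b : 4;
    };

    Flags f;

    // Any store to f.a must read and rewrite the unit that also
    // holds f.b, so a concurrent writer of f.b can be silently
    // lost; under the draft rules the two fields need one lock.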

    --
    James Kanze (GABI Software) email:
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
     
    James Kanze, Jun 24, 2008
    #6
  7. On 2008-06-24 18:17:52, James Kanze wrote:

    >> Say memory access (read and write) happens in 64-bit chunks, and I'm
    >> looking at 32-bit variables. This would mean that either some other
    >> variable is also written when writing a 32-bit variable (which means
    >> that all access to 32-bit variables is of the read-modify-write type,
    >> affecting some other variable also), or that all 32-bit variables are
    >> stored in their own 64-bit chunk.
    >>
    >> With single-threaded applications, that's a mere performance question.
    >> But with multi-threaded applications, there's no way I can imagine that
    >> would avoid the read-modify-write problems the first alternative would
    >> create, as it is nowhere defined what the other variable is that is
    >> also written -- so it can't be protected by a lock. Without it being
    >> protected by a lock, there's nothing that prevents a thread from
    >> altering it while it is in the middle of the read-modify-write cycle,
    >> which means that the end of it will overwrite the altered value with
    >> the old value.
    >>
    >> However, there must be a way to deal with this, otherwise
    >> multi-threaded applications in C++ wouldn't be possible.

    >
    > Most hardware provides for single byte writes (even when the read is
    > always 64 bits), and takes care that it works correctly.


    What I find a bit disconcerting is that it seems so difficult to find out
    whether a given hardware actually does this. Reality seems to confirm that
    it actually is "most" (or otherwise "most" programs would probably crash a
    lot more than they do), but I haven't found any documentation about any
    specific guarantees of specific compilers on specific platforms. (I'm
    mainly interested in VC++ and gcc.) Does somebody have any pointers for me?

    Thanks,
    Gerhard
     
    Gerhard Fiedler, Jun 24, 2008
    #7
  8. Gerhard Fiedler

    Jerry Coffin Guest

    In article <1om696gj5nba5$>,
    says...

    [ ... ]

    > What I find a bit disconcerting is that it seems so difficult to find out
    > whether a given hardware actually does this. Reality seems to confirm that
    > it actually is "most" (or otherwise "most" programs would probably crash a
    > lot more than they do), but I haven't found any documentation about any
    > specific guarantees of specific compilers on specific platforms. (I'm
    > mainly interested in VC++ and gcc.) Does somebody have any pointers for me?


    There are a number of problems with that. The first is that when you get
    to exotic multiprocessors, a lot of ideas have been tried, and even
    though only a few have really gained much popularity, there are still
    some that bend almost any rule you'd like to make.

    Another problem is that even on a given piece of hardware, the behavior
    can be less predictable than you'd generally like. For example, recent
    versions of the Intel x86 processors all have Memory Type Range
    Registers (MTRRs). Using an MTRR, one can adjust the behavior of memory
    writes individually for ranges of memory. You can get write-back
    caching, write-through caching, write combining, or no caching at all --
    all on the same machine at the same time for different ranges of memory.

    Also keep in mind that most modern computers use caching. In a typical
    case, any read from or write to main memory happens an entire cache line
    at a time. Bookkeeping is also done on the basis of entire cache lines,
    so the processor doesn't care how many bits in a cache line have been
    modified -- from its viewpoint, the cache line as a whole is either
    modified or not. If, for example, another processor attempts to read
    memory that falls in that cache line, the entire line is written to
    memory before the other processor can read it. Even if the two are
    entirely disjoint, if they fall in the same cache line, the processor
    treats them as a unit.
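
    The practical upshot is what is usually called false sharing; a
    sketch (the 64-byte line size is an assumption -- it varies by
    processor):

    // Logically independent counters that land on one cache line;
    // every increment by one core steals the line from the other:
    struct Counters {
        long a;   // bumped only by thread 1
        long b;   // bumped only by thread 2, same line as a
    };

    // A common remedy is to pad each counter to its own line:
    struct PaddedCounter {
        long value;
        char pad[64 - sizeof(long)];   // 64 assumed, not portable
    };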

    --
    Later,
    Jerry.

    The universe is a figment of its own imagination.
     
    Jerry Coffin, Jun 24, 2008
    #8
  9. Gerhard Fiedler

    James Kanze Guest

    On Jun 25, 12:53 am, Jerry Coffin <> wrote:
    > In article <1om696gj5nba5$>,
    > says...


    > [ ... ]


    > > What I find a bit disconcerting is that it seems so
    > > difficult to find out whether a given hardware actually does
    > > this. Reality seems to confirm that it actually is "most"
    > > (or otherwise "most" programs would probably crash a lot
    > > more than they do), but I haven't found any documentation
    > > about any specific guarantees of specific compilers on
    > > specific platforms. (I'm mainly interested in VC++ and gcc.)
    > > Does somebody have any pointers for me?


    It depends mostly on the hardware architecture, not the
    compiler. The compiler will generate byte, half-word, etc. load
    and store machine instructions (assuming they exist, of course);
    the problem is what the hardware does with them.

    For Sparc architecture, see
    http://www.sparc.org/specificationsDocuments.html. I presume
    that other architecture providers (e.g. Intel, AMD, etc.) have
    similar pages.

    [...]
    > Also keep in mind that most modern computers use caching. In a
    > typical case, any read from or write to main memory happens an
    > entire cache line at a time. Bookkeeping is also done on the
    > basis of entire cache lines, so the processor doesn't care how
    > many bits in a cache line have been modified -- from its
    > viewpoint, the cache line as a whole is either modified or
    > not. If, for example, another processor attempts to read
    > memory that falls in that cache line, the entire line is
    > written to memory before the other processor can read it. Even
    > if the two are entirely disjoint, if they fall in the same
    > cache line, the processor treats them as a unit.


    That's true to a point. Most modern architectures also ensure
    cache coherence at the hardware level: if one thread writes to
    the first byte in a cache line, and a different thread (on a
    different core) writes to the second byte, the hardware will
    ensure that both writes eventually end up in main memory; that
    the write back of the cache line from one core won't overwrite
    the changes made by the other core.

    This issue was discussed in detail by the committee; in the end,
    it was decided that given something like:

    struct S { char a; char b; } ;
    or
    char a[2] ;

    one thread could modify S::a or a[0], and the other S::b or
    a[1], without any explicit synchronization, and the compiler had
    to make it work. This was accepted because in fact, just
    emitting store byte instructions is sufficient for all of the
    current architectures.
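
    Spelled out in code, the guarantee that was accepted:

    struct S { char a; char b; } s;

    void thread1() { s.a = 1; }   // no lock, no synchronization
    void thread2() { s.b = 2; }   // no lock, no synchronization

    // Under the C++0x rules both stores must survive: the compiler
    // may not implement "s.a = 1" as a read-modify-write of a
    // wider word that also covers s.b.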

    --
    James Kanze (GABI Software) email:
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
     
    James Kanze, Jun 25, 2008
    #9
  10. On 2008-06-25 04:58:41, James Kanze wrote:

    >>> What I find a bit disconcerting is that it seems so difficult to find
    >>> out whether a given hardware actually does this. Reality seems to
    >>> confirm that it actually is "most" (or otherwise "most" programs would
    >>> probably crash a lot more than they do), but I haven't found any
    >>> documentation about any specific guarantees of specific compilers on
    >>> specific platforms. (I'm mainly interested in VC++ and gcc.) Does
    >>> somebody have any pointers for me?

    >
    > It depends mostly on the hardware architecture, not the compiler. The
    > compiler will generate byte, half-word, etc. load and store machine
    > instructions (assuming they exist, of course); the problem is what the
    > hardware does with them.
    >
    > For Sparc architecture, see http://www.sparc.org/specificationsDocuments.html.
    > I presume that other architecture providers (e.g. Intel, AMD, etc.)
    > have similar pages.


    Thanks. I thought that it would also depend on how the compiler generates
    the code, but I guess you're right in assuming that any (halfway decent)
    compiler will generate 8-bit writes for 8-bit variables if that is possible
    :)

    >> Also keep in mind that most modern computers use caching. In a typical
    >> case, any read from or write to main memory happens an entire cache
    >> line at a time. Bookkeeping is also done on the basis of entire cache
    >> lines, so the processor doesn't care how many bits in a cache line have
    >> been modified -- from its viewpoint, the cache line as a whole is
    >> either modified or not. If, for example, another processor attempts to
    >> read memory that falls in that cache line, the entire line is written
    >> to memory before the other processor can read it. Even if the two are
    >> entirely disjoint, if they fall in the same cache line, the processor
    >> treats them as a unit.

    >
    > That's true to a point. Most modern architectures also ensure cache
    > coherence at the hardware level: if one thread writes to the first byte
    > in a cache line, and a different thread (on a different core) writes to
    > the second byte, the hardware will ensure that both writes eventually
    > end up in main memory; that the write back of the cache line from one
    > core won't overwrite the changes made by the other core.


    Taken all this together, it seems that on "most modern architectures" cache
    coherency is mostly guaranteed by the hardware, and for example it is not
    necessary to use memory barriers or locks for access to volatile boolean
    variables that are only read or written (never using a read-modify-write
    cycle). Is this correct? What is all this talk about different threads
    seeing values out of order about, if the cache coherency is maintained by
    the hardware in this way?

    Gerhard
     
    Gerhard Fiedler, Jun 25, 2008
    #10
  11. Gerhard Fiedler

    gpderetta Guest

    On Jun 25, 3:44 pm, Gerhard Fiedler <> wrote:
    <snip>
    > Taken all this together, it seems that on "most modern architectures" cache
    > coherency is mostly guaranteed by the hardware, and for example it is not
    > necessary to use memory barriers or locks for access to volatile boolean
    > variables that are only read or written (never using a read-modify-write
    > cycle). Is this correct? What is all this talk about different threads
    > seeing values out of order about, if the cache coherency is maintained by
    > the hardware in this way?


    Cache coherency is not the only part of a system that can reorder loads
    and stores. Write buffers and out-of-order (OoO) machinery are also
    responsible. Even x86, which has an otherwise fairly strong memory
    model, requires StoreLoad memory barriers in places (i.e. mfence or
    locked operations).

    So, AFAIK the answer is no: in general, and for most compilers, even
    volatile is not enough.
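
    For completeness: the portable C++0x answer is std::atomic rather
    than volatile. A sketch against the draft <atomic> interface:

    #include <atomic>

    std::atomic<bool> ready(false);
    int data;

    void writer() {
        data = 42;
        ready.store(true);   // emits whatever fence the target needs
    }

    void reader() {
        while (!ready.load()) { /* spin */ }
        // the write to data is now guaranteed to be visible here
    }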

    --
    gpd
     
    gpderetta, Jun 25, 2008
    #11
  12. Gerhard Fiedler

    James Kanze Guest

    On Jun 25, 3:44 pm, Gerhard Fiedler <> wrote:
    > On 2008-06-25 04:58:41, James Kanze wrote:

    [...]
    > > For Sparc architecture,
    > > see http://www.sparc.org/specificationsDocuments.html. I
    > > presume that other architecture providers (e.g. Intel, AMD,
    > > etc.) have similar pages.


    > Thanks. I thought that it would also depend on how the
    > compiler generates the code, but I guess you're right in
    > assuming that any (halfway decent) compiler will generate
    > 8-bit writes for 8-bit variables if that is possible :)


    Well, it would be nice if they'd document it. But in practice,
    I don't worry too much about a compiler generating code to load
    a word, change one byte of it, and then store it, if the
    hardware has a single-instruction byte store.

    > >> Also keep in mind that most modern computers use caching.
    > >> In a typical case, any read from or write to main memory
    > >> happens an entire cache line at a time. Bookkeeping is also
    > >> done on the basis of entire cache lines, so the processor
    > >> doesn't care how many bits in a cache line have been
    > >> modified -- from its viewpoint, the cache line as a whole
    > >> is either modified or not. If, for example, another
    > >> processor attempts to read memory that falls in that cache
    > >> line, the entire line is written to memory before the other
    > >> processor can read it. Even if the two are entirely
    > >> disjoint, if they fall in the same cache line, the
    > >> processor treats them as a unit.


    > > That's true to a point. Most modern architectures also
    > > ensure cache coherence at the hardware level: if one thread
    > > writes to the first byte in a cache line, and a different
    > > thread (on a different core) writes to the second byte, the
    > > hardware will ensure that both writes eventually end up in
    > > main memory; that the write back of the cache line from one
    > > core won't overwrite the changes made by the other core.


    > Taken all this together, it seems that on "most modern
    > architectures" cache coherency is mostly guaranteed by the
    > hardware, and for example it is not necessary to use memory
    > barriers or locks for access to volatile boolean variables
    > that are only read or written (never using a read-modify-write
    > cycle). Is this correct? What is all this talk about different
    > threads seeing values out of order about, if the cache
    > coherency is maintained by the hardware in this way?


    Several things. The first, of course, is that what we've just been
    talking about only concerns a single cache line; the hardware
    might not be so careful between cache lines (which results in
    multiple physical writes). But the real reason is that reads
    and writes, even to the cache, are pipelined in the processor
    itself, and can be reordered in the pipeline. Thus, for
    example, if we suppose two int's, i and j, both initially 0, and
    one processor executes:

    store #1, i
    store #1, j

    a second processor can still see the condition i==0, j==1,
    because either the first processor has reordered the writes
    (because of pipeline considerations), or because the second
    recognized that it already had a read of the cache line with j
    in its pipeline, and used the results of that read for j.
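
    With the C++0x atomics mentioned up-thread, that outcome is
    forbidden; a sketch:

    #include <atomic>

    std::atomic<int> i(0), j(0);

    void writer() {
        i.store(1);   // sequentially consistent by default,
        j.store(1);   // so the stores may not pass each other
    }

    // A reader that observes j == 1 must also observe i == 1.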

    --
    James Kanze (GABI Software) email:
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
     
    James Kanze, Jun 25, 2008
    #12
  13. Gerhard Fiedler

    Jerry Coffin Guest

    In article <>,
    says...

    [ ... ]

    > Taken all this together, it seems that on "most modern architectures" cache
    > coherency is mostly guaranteed by the hardware, and for example it is not
    > necessary to use memory barriers or locks for access to volatile boolean
    > variables that are only read or written (never using a read-modify-write
    > cycle). Is this correct? What is all this talk about different threads
    > seeing values out of order about, if the cache coherency is maintained by
    > the hardware in this way?


    Yes and no. The hardware normally ensures coherency for a single
    variable -- but it doesn't know anything about the relationships you've
    established between variables. For example, assume a really simple
    situation where you have some data and a bool to tell when the data is
    valid:

    struct whatever {
        int data1;
        float data2;
        bool valid;
    public:
        whatever() : valid(false) {}
    } thing;

    If you have code like:

    thing.data1 = 1;
    thing.data2 = 2.0f;
    thing.valid = true;

    The hardware will assure that when a write has taken place to any of the
    variables, any other core looking at the memory location of that
    variable will see the value that was written.

    Now, we don't care at all about the relative order in which data1 and
    data2 are written -- whichever way the hardware can do it the fastest is
    fine by us. BUT we need to ensure that 'valid' is only seen as true AFTER
    the values have been written to both data1 and data2.

    The hardware doesn't know this on its own. It just sees three separate
    assignments to three separate variables. As such, the programmer needs
    to "inform" the hardware about the relationship involved.

    --
    Later,
    Jerry.

    The universe is a figment of its own imagination.
     
    Jerry Coffin, Jun 25, 2008
    #13
  14. On 2008-06-25 14:56:16, Jerry Coffin wrote:

    >> Taken all this together, it seems that on "most modern architectures"
    >> cache coherency is mostly guaranteed by the hardware, and for example
    >> it is not necessary to use memory barriers or locks for access to
    >> volatile boolean variables that are only read or written (never using a
    >> read-modify-write cycle). Is this correct? What is all this talk about
    >> different threads seeing values out of order about, if the cache
    >> coherency is maintained by the hardware in this way?

    >
    > Yes and no. [Lots of useful stuff snipped.]


    Thanks to all who responded in this thread. It has helped me a good deal in
    understanding what I can rely on and what not.

    Gerhard
     
    Gerhard Fiedler, Jun 25, 2008
    #14
