Re: multithreaded cache?

Discussion in 'Java' started by Robert Klemme, May 17, 2012.

  1. On 05/15/2012 11:14 AM, bugbear wrote:
    > However, if the underlying function is slow
    > and/or resource hungry (consider caching
    > a ray traced image!) many threads can
    > end up calling the real function (second
    > and subsequent threads to the first get a miss
    > during the first thread's call to the underlying function).
    >
    > "obviously..." what I want is for only
    > the thread that FIRST has a cache miss
    > to call the underlying function, whilst other
    > threads (for the same key) wait.


    I provide a variant of Silvio's, Eric's and Daniel's solution which
    should yield higher throughput because it works without read write
    locking. You can find it as gist in case the code is garbled in the
    newsgroup posting:
    https://gist.github.com/2717818

    Kind regards

    robert


    The code (untested):

    package clj;

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;

    /**
     * The cache works with as little locking as possible. Lookup is done
     * in two steps on cache miss:
     * <ol>
     * <li>On a cache miss a retriever is inserted into the cache which
     * will obtain the value synchronized from a {@link Calculator}.</li>
     * <li>Once calculation has finished a simple lock free reference to
     * the value replaces the retriever in the cache and the value is
     * returned.</li>
     * </ol>
     *
     * @author robert klemme
     *
     * @param <K> key type
     * @param <V> value type
     */
    public final class LazyCache<K, V> {
        /**
         * Calculate values from given keys.
         *
         * @param <K> key type
         * @param <V> value type
         */
        public interface Calculator<K, V> {
            V get(K key);
        }

        /**
         * Obtain a value.
         *
         * @param <V> value type.
         */
        private interface Reference<V> {
            V get();
        }

        /**
         * Stupid simple reference which only hands out a fixed value all
         * the time without synchronization.
         *
         * @param <V> value type.
         */
        private static final class Ref<V> implements Reference<V> {
            private final V val;

            public Ref(V val) {
                this.val = val;
            }

            @Override
            public V get() {
                return val;
            }
        }

        /** Mapping from keys to objects which yield values. */
        private final ConcurrentMap<K, Reference<V>> map =
                new ConcurrentHashMap<K, Reference<V>>();

        /** User provided. */
        private final Calculator<K, V> calc;

        /**
         * Create a cache.
         *
         * @param calc user must provide a reasonable implementation, not
         *             <code>null</code>.
         */
        public LazyCache(final Calculator<K, V> calc) {
            if (calc == null)
                throw new NullPointerException();
            this.calc = calc;
        }

        /**
         * Get a value from the cache. The value might have to be
         * calculated.
         *
         * @param key lookup key.
         * @return value, might even be <code>null</code> depending on
         *         algorithm.
         */
        public V get(final K key) {
            Reference<V> ref = map.get(key);

            if (ref == null) {
                // miss
                ref = new Reference<V>() {
                    @Override
                    public synchronized V get() {
                        final V val = calc.get(key);
                        // next time lock free access:
                        Reference<V> x = map.put(key, new Ref<V>(val));
                        assert x == this;
                        return val;
                    }
                };

                final Reference<V> old = map.putIfAbsent(key, ref);

                if (old != null)
                    ref = old; // someone else was faster
            }

            return ref.get();
        }
    }
    Robert Klemme, May 17, 2012
    #1

  2. On 05/17/2012 11:54 AM, Robert Klemme wrote:
    > On 05/15/2012 11:14 AM, bugbear wrote:
    >> However, if the underlying function is slow
    >> and/or resource hungry (consider caching
    >> a ray traced image!) many threads can
    >> end up calling the real function (second
    >> and subsequent threads to the first get a miss
    >> during the first thread's call to the underlying function).
    >>
    >> "obviously..." what I want is for only
    >> the thread that FIRST has a cache miss
    >> to call the underlying function, whilst other
    >> threads (for the same key) wait.

    >
    > I provide a variant of Silvio's, Eric's and Daniel's solution which
    > should yield higher throughput because it works without read write
    > locking. You can find it as gist in case the code is garbled in the
    > newsgroup posting:
    > https://gist.github.com/2717818


    There was one important detail missing. This is the corrected code
    (gist is updated as well):

    package clj;

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;

    /**
     * The cache works with as little locking as possible. Lookup is done
     * in two steps on cache miss:
     * <ol>
     * <li>On a cache miss a retriever is inserted into the cache which
     * will obtain the value synchronized from a {@link Calculator}.</li>
     * <li>Once calculation has finished a simple lock free reference to
     * the value replaces the retriever in the cache and the value is
     * returned.</li>
     * </ol>
     *
     * @author robert klemme
     *
     * @param <K> key type
     * @param <V> value type
     */
    public final class LazyCache<K, V> {
        /**
         * Calculate values from given keys.
         *
         * @param <K> key type
         * @param <V> value type
         */
        public interface Calculator<K, V> {
            V get(K key);
        }

        /**
         * Obtain a value.
         *
         * @param <V> value type.
         */
        private interface Reference<V> {
            V get();
        }

        /**
         * Stupid simple reference which only hands out a fixed value all
         * the time without synchronization.
         *
         * @param <V> value type.
         */
        private static final class Ref<V> implements Reference<V> {
            private final V val;

            public Ref(V val) {
                this.val = val;
            }

            @Override
            public V get() {
                return val;
            }
        }

        /** Mapping from keys to objects which yield values. */
        private final ConcurrentMap<K, Reference<V>> map =
                new ConcurrentHashMap<K, Reference<V>>();

        /** User provided. */
        private final Calculator<K, V> calc;

        /**
         * Create a cache.
         *
         * @param calc user must provide a reasonable implementation, not
         *             <code>null</code>.
         */
        public LazyCache(final Calculator<K, V> calc) {
            if (calc == null)
                throw new NullPointerException();
            this.calc = calc;
        }

        /**
         * Get a value from the cache. The value might have to be
         * calculated.
         *
         * @param key lookup key.
         * @return value, might even be <code>null</code> depending on
         *         algorithm.
         */
        public V get(final K key) {
            Reference<V> ref = map.get(key);

            if (ref == null) {
                // miss
                ref = new Reference<V>() {
                    private V val;
                    private boolean here = false; // superfluous but explicit

                    @Override
                    public synchronized V get() {
                        if (!here) {
                            val = calc.get(key);
                            here = true;
                            // next time lock free access:
                            Reference<V> x = map.put(key, new Ref<V>(val));
                            assert x == this;
                        }

                        return val;
                    }
                };

                final Reference<V> old = map.putIfAbsent(key, ref);

                if (old != null)
                    ref = old; // someone else was faster
            }

            return ref.get();
        }
    }
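
    To illustrate the intended use, a minimal demo sketch (hypothetical
    class name LazyCacheDemo; the sleeping Calculator merely simulates an
    expensive computation and is not part of the cache itself):

    import clj.LazyCache;

    // Several threads request the same key at once; only the first one
    // should invoke the Calculator, the others block on the calculating
    // reference's synchronized get().
    public class LazyCacheDemo {
        public static void main(String[] args) {
            final LazyCache<Integer, Integer> cache =
                    new LazyCache<Integer, Integer>(
                            new LazyCache.Calculator<Integer, Integer>() {
                                @Override
                                public Integer get(Integer key) {
                                    try {
                                        Thread.sleep(1000); // simulate slow work
                                    } catch (InterruptedException e) {
                                        Thread.currentThread().interrupt();
                                    }
                                    return key * key;
                                }
                            });

            for (int i = 0; i < 4; i++) {
                new Thread(new Runnable() {
                    @Override
                    public void run() {
                        // prints 1764 four times, but sleeps only once
                        System.out.println(cache.get(42));
                    }
                }).start();
            }
        }
    }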
    Robert Klemme, May 17, 2012
    #2

  3. Daniel Pitts Guest

    On 5/17/12 3:06 AM, Robert Klemme wrote:
    > On 05/17/2012 11:54 AM, Robert Klemme wrote:
    >> On 05/15/2012 11:14 AM, bugbear wrote:
    >>> However, if the underlying function is slow
    >>> and/or resource hungry (consider caching
    >>> a ray traced image!) many threads can
    >>> end up calling the real function (second
    >>> and subsequent threads to the first get a miss
    >>> during the first thread's call to the underlying function).
    >>>
    >>> "obviously..." what I want is for only
    >>> the thread that FIRST has a cache miss
    >>> to call the underlying function, whilst other
    >>> threads (for the same key) wait.

    >>
    >> I provide a variant of Silvio's, Eric's and Daniel's solution which
    >> should yield higher throughput because it works without read write
    >> locking. You can find it as gist in case the code is garbled in the
    >> newsgroup posting:
    >> https://gist.github.com/2717818

    >
    > There was one important detail missing. This is the corrected code (gist
    > is updated as well):

    [snip]
    > if (!here) {
    >     val = calc.get(key);
    >     here = true;
    >     // next time lock free access:
    >     Reference<V> x = map.put(key, new Ref<V>(val));
    >     assert x == this;
    > }

    This assert (while true) is unnecessary and potentially harmful.
    Nothing becomes inconsistent if the value you replace is not the
    calculating reference. However, your assert may break at runtime if
    something else changes about this code. Assert should be used to
    validate consistency.

    I like this class, though I have a couple of style comments.

    I personally would name "Ref" as "ConstantReference", and your anonymous
    Reference implementation I would make a private static class
    CalculatingReference.

    In my own personal library of utilities, I actually have two interfaces
    "Factory<T>" with a "T create()" method and "ParameterizedFactory<T,P>"
    with a "T create(P parameter)". Including a few interesting
    implementations of those. In my case, I would probably implement this
    cache as a ParameterizedFactory impl that takes another
    ParameterizedFactory as the "calc" field.
    Daniel Pitts, May 17, 2012
    #3
  4. On 17.05.2012 18:51, Daniel Pitts wrote:
    > On 5/17/12 3:06 AM, Robert Klemme wrote:
    >> On 05/17/2012 11:54 AM, Robert Klemme wrote:
    >>> On 05/15/2012 11:14 AM, bugbear wrote:
    >>>> However, if the underlying function is slow
    >>>> and/or resource hungry (consider caching
    >>>> a ray traced image!) many threads can
    >>>> end up calling the real function (second
    >>>> and subsequent threads to the first get a miss
    >>>> during the first thread's call to the underlying function).
    >>>>
    >>>> "obviously..." what I want is for only
    >>>> the thread that FIRST has a cache miss
    >>>> to call the underlying function, whilst other
    >>>> threads (for the same key) wait.
    >>>
    >>> I provide a variant of Silvio's, Eric's and Daniel's solution which
    >>> should yield higher throughput because it works without read write
    >>> locking. You can find it as gist in case the code is garbled in the
    >>> newsgroup posting:
    >>> https://gist.github.com/2717818

    >>
    >> There was one important detail missing. This is the corrected code (gist
    >> is updated as well):

    > [snip]
    >> if (!here) {
    >>     val = calc.get(key);
    >>     here = true;
    >>     // next time lock free access:
    >>     Reference<V> x = map.put(key, new Ref<V>(val));
    >>     assert x == this;
    >> }

    > This assert (while true) is unnecessary and potentially harmful. Nothing
    > becomes inconsistent if the value you replace is not the calculating
    > reference.


    The algorithm works under the assumption that the first element placed
    into the map is the one doing the calculation and only after that the
    constant reference is inserted. If that is not the case (e.g. because
    another calculator is put into the map) then obviously something must be
    wrong and we might even have more than one lengthy calculation of the
    cached value. This assumption is checked with the assert which also
    serves as documentation.

    > However, your assert may break at runtime if something else
    > changes about this code. Assert should be used to validate consistency.


    Well, if the logic is changed then of course asserts need to change,
    too. There is nothing strange or harmful about that. With your
    argument no asserts would make sense because they would need to be
    changed if a class is changed. Note that internal class invariants can
    change without changing the observable invariant. So even though a
    class "looks" the same and behaves the same from the outside asserts
    might have to be changed if internal logic of the class changes.

    > I like this class, though I have a couple of style comments.


    Thank you!

    > I personally would name "Ref" as "ConstantReference",


    Well, I was lazy typing and the class isn't public anyway. This is not
    production code but an illustrative example for cljp.

    > and your anonymous
    > Reference implementation I would make a private static class
    > CalculatingReference.


    Why? You would need to either pass in a reference to LazyCache or pass
    map and calc - plus declaring a constructor and type parameters. In
    this case I don't really see the benefit.

    > In my own personal library of utilities, I actually have two interfaces
    > "Factory<T>" with a "T create()" method and "ParameterizedFactory<T,P>"
    > with a "T create(P parameter)". Including a few interesting
    > implementations of those. In my case, I would probably implement this
    > cache as a ParameterizedFactory impl that takes another
    > ParameterizedFactory as the "calc" field.


    That could be done. But then method naming could be considered
    misleading because the create() of the cache isn't really a create in
    all cases.

    Kind regards

    robert

    --
    remember.guy do |as, often| as.you_can - without end
    http://blog.rubybestpractices.com/
    Robert Klemme, May 17, 2012
    #4
  5. On 05/17/2012 11:54 AM, Robert Klemme wrote:
    > On 05/15/2012 11:14 AM, bugbear wrote:
    >> However, if the underlying function is slow
    >> and/or resource hungry (consider caching
    >> a ray traced image!) many threads can
    >> end up calling the real function (second
    >> and subsequent threads to the first get a miss
    >> during the first thread's call to the underlying function).
    >>
    >> "obviously..." what I want is for only
    >> the thread that FIRST has a cache miss
    >> to call the underlying function, whilst other
    >> threads (for the same key) wait.

    >
    > I provide a variant of Silvio's, Eric's and Daniel's solution which
    > should yield higher throughput because it works without read write
    > locking. You can find it as gist in case the code is garbled in the
    > newsgroup posting:
    > https://gist.github.com/2717818
    >
    > Kind regards
    >
    > robert
    >
    >
    > The code (untested):
    >
    > [snip]



    I think you have as many locks as I suggested (being one)? My initial
    implementations of something like this used a plain map with an extra
    lock but later cases used the by then available ConcurrentHashMap as
    well, making one lock redundant.

    Silvio
    Silvio Bierman, May 18, 2012
    #5
  6. On 18.05.2012 20:42, Silvio Bierman wrote:
    > On 05/17/2012 11:54 AM, Robert Klemme wrote:


    >> I provide a variant of Silvio's, Eric's and Daniel's solution which
    >> should yield higher throughput because it works without read write
    >> locking. You can find it as gist in case the code is garbled in the
    >> newsgroup posting:
    >> https://gist.github.com/2717818


    > I think you have as many locks as I suggested (being one)? My initial
    > implementations of something like this used a plain map with an extra
    > lock but later cases used the by then available ConcurrentHashMap as
    > well, making one lock redundant.


    You didn't show it here, did you? I can't seem to find it in the
    thread. Note that CHM does also do synchronization. I am not sure from
    your statement what exact locking scheme you apply. There does seem to
    be one difference though: in my version the second lock goes away after
    the value has been computed so there is only the sync of CHM left.

    Kind regards

    robert

    --
    remember.guy do |as, often| as.you_can - without end
    http://blog.rubybestpractices.com/
    Robert Klemme, May 18, 2012
    #6
  7. Robert Klemme

    Eric Sosman Guest

    On 5/18/2012 5:45 PM, Robert Klemme wrote:
    > On 18.05.2012 20:42, Silvio Bierman wrote:
    >> On 05/17/2012 11:54 AM, Robert Klemme wrote:

    >
    >>> I provide a variant of Silvio's, Eric's and Daniel's solution which
    >>> should yield higher throughput because it works without read write
    >>> locking. You can find it as gist in case the code is garbled in the
    >>> newsgroup posting:
    >>> https://gist.github.com/2717818

    >
    >> I think you have as many locks as I suggested (being one)? My initial
    >> implementations of something like this used a plain map with an extra
    >> lock but later cases used the by then available ConcurrentHashMap as
    >> well, making one lock redundant.

    >
    > You didn't show it here, did you? I can't seem to find it in the thread.
    > Note that CHM does also do synchronization. I am not sure from your
    > statement what exact locking scheme you apply. There does seem to be one
    > difference though: in my version the second lock goes away after the
    > value has been computed so there is only the sync of CHM left.


    It seems to me that if N threads query the same key at about
    the same time, they may all miss the map and go off to perform
    the slow computation. If "slow" is large compared to the cost of
    a lock-release pair (and if it weren't, why cache?), the tradeoff
    seems suspect.

    Also, different threads may wind up using different value
    instances. If the cache is purely a convenience for a value-only
    object that may be all right, but it's not all right if the values
    are supposed to be singletons.

    Finally, there's more than a whiff of the double-checked locking
    antipattern about what you're doing with the `here' flag. I'm not
    absolutely sure what you've got is in fact DCL (hence, wrong), but
    I'm also not absolutely sure it's DCL-free. Before using the pattern
    in any important way, I'd want to check with a major-league guru,
    just as "due diligence."

    --
    Eric Sosman
    Eric Sosman, May 18, 2012
    #7
  8. Robert Klemme

    Daniel Pitts Guest

    On 5/18/12 3:31 PM, Eric Sosman wrote:
    > On 5/18/2012 5:45 PM, Robert Klemme wrote:
    >> On 18.05.2012 20:42, Silvio Bierman wrote:
    >>> On 05/17/2012 11:54 AM, Robert Klemme wrote:

    >>
    >>>> I provide a variant of Silvio's, Eric's and Daniel's solution which
    >>>> should yield higher throughput because it works without read write
    >>>> locking. You can find it as gist in case the code is garbled in the
    >>>> newsgroup posting:
    >>>> https://gist.github.com/2717818

    >>
    >>> I think you have as many locks as I suggested (being one)? My initial
    >>> implementations of something like this used a plain map with an extra
    >>> lock but later cases used the by then available ConcurrentHashMap as
    >>> well, making one lock redundant.

    >>
    >> You didn't show it here, did you? I can't seem to find it in the thread.
    >> Note that CHM does also do synchronization. I am not sure from your
    >> statement what exact locking scheme you apply. There does seem to be one
    >> difference though: in my version the second lock goes away after the
    >> value has been computed so there is only the sync of CHM left.

    >
    > It seems to me that if N threads query the same key at about
    > the same time, they may all miss the map and go off to perform
    > the slow computation.

    Nope, they will all create a "Reference" object that *can* perform the
    calculation; however, because the method uses "putIfAbsent", only the
    first calculating "Reference" object is actually ever used.

    After that, the Reference.get() method will block until that particular
    calculation is complete. Once that calculation is done, the calculating
    Reference will replace itself with a constant Reference for future
    accesses to use.
    > If "slow" is large compared to the cost of
    > a lock-release pair (and if it weren't, why cache?), the tradeoff
    > seems suspect.

    It would be suspect, but there isn't a tradeoff here. All threads lock
    until one value is calculated, basically coalescing the requests for
    that value.
    >
    > Also, different threads may wind up using different value
    > instances. If the cache is purely a convenience for a value-only
    > object that may be all right, but it's not all right if the values
    > are supposed to be singletons.

    Again, invalid if you recognize what is actually happening.
    >
    > Finally, there's more than a whiff of the double-checked locking
    > antipattern about what you're doing with the `here' flag. I'm not
    > absolutely sure what you've got is in fact DCL (hence, wrong), but
    > I'm also not absolutely sure it's DCL-free. Before using the pattern
    > in any important way, I'd want to check with a major-league guru,
    > just as "due diligence."
    >


    "double-checked" locking is only broken if you don't have the
    appropriate memory semantics enforced elsewhere. The reason the
    "classical" double-check locking fails is because there could be a
    data-race condition, where the reference is not null, but the object
    state hasn't been flushed between threads.

    In this case, the "first-check" of obtaining a value from the
    ConcurrentMap causes happens-before relationships, and causes the object
    state to flush.
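
    For contrast, a sketch of the classical broken form (made-up Helper
    class; without 'volatile' on the field a thread may see a non-null
    reference to a not-yet-published object):

    // Classical double-checked locking - broken without 'volatile',
    // because the unsynchronized first check has no happens-before
    // relationship with the write inside the synchronized block.
    class Helper { int state = 42; }

    class HelperHolder {
        private Helper helper; // must be 'volatile' to make this safe (Java 5+)

        public Helper getHelper() {
            if (helper == null) {             // first check, no lock
                synchronized (this) {
                    if (helper == null) {     // second check, under lock
                        helper = new Helper();
                    }
                }
            }
            return helper;                    // may observe helper.state == 0!
        }
    }

    In LazyCache the first check is a ConcurrentHashMap read instead,
    which does establish that happens-before relationship.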

    Robert Klemme's code is correct, according to everything I've read in
    JCIP and elsewhere.

    Cheers,
    Daniel.
    Daniel Pitts, May 19, 2012
    #8
  9. On 19.05.2012 00:31, Eric Sosman wrote:
    > On 5/18/2012 5:45 PM, Robert Klemme wrote:
    >> On 18.05.2012 20:42, Silvio Bierman wrote:
    >>> On 05/17/2012 11:54 AM, Robert Klemme wrote:

    >>
    >>>> I provide a variant of Silvio's, Eric's and Daniel's solution which
    >>>> should yield higher throughput because it works without read write
    >>>> locking. You can find it as gist in case the code is garbled in the
    >>>> newsgroup posting:
    >>>> https://gist.github.com/2717818

    >>
    >>> I think you have as many locks as I suggested (being one)? My initial
    >>> implementations of something like this used a plain map with an extra
    >>> lock but later cases used the by then available ConcurrentHashMap as
    >>> well, making one lock redundant.

    >>
    >> You didn't show it here, did you? I can't seem to find it in the thread.
    >> Note that CHM does also do synchronization. I am not sure from your
    >> statement what exact locking scheme you apply. There does seem to be one
    >> difference though: in my version the second lock goes away after the
    >> value has been computed so there is only the sync of CHM left.

    >
    > It seems to me that if N threads query the same key at about
    > the same time, they may all miss the map and go off to perform
    > the slow computation. If "slow" is large compared to the cost of
    > a lock-release pair (and if it weren't, why cache?), the tradeoff
    > seems suspect.
    >
    > Also, different threads may wind up using different value
    > instances. If the cache is purely a convenience for a value-only
    > object that may be all right, but it's not all right if the values
    > are supposed to be singletons.
    >
    > Finally, there's more than a whiff of the double-checked locking
    > antipattern about what you're doing with the `here' flag. I'm not
    > absolutely sure what you've got is in fact DCL (hence, wrong), but
    > I'm also not absolutely sure it's DCL-free. Before using the pattern
    > in any important way, I'd want to check with a major-league guru,
    > just as "due diligence."


    There is no double checked locking going on - neither in LazyCache
    itself nor in the retriever instance.
    http://en.wikipedia.org/wiki/Double-checked_locking

    Daniel has explained all the details already. I'd just add that
    creation of multiple retriever objects is the price you pay for
    increased concurrency due to CHM. But only one of them is ever going to
    be used per key. (Documenting this is one of the tasks of the assert
    Daniel questioned earlier.)

    I think rw-locking is an inferior technique compared to CHM in this case
    because in case of cache miss you cannot escalate a r-lock to a w-lock
    (to avoid deadlocks) and hence need to release the r-lock, acquire the
    w-lock, *and check again*. In other words you have two more
    synchronization points with memory barriers, scheduling effects etc.
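
    To make the comparison concrete, a sketch of such an rw-lock based
    cache (hypothetical code, not Silvio's actual implementation; it
    reuses the Calculator interface from above):

    import java.util.HashMap;
    import java.util.Map;
    import java.util.concurrent.locks.ReadWriteLock;
    import java.util.concurrent.locks.ReentrantReadWriteLock;

    import clj.LazyCache;

    // Hypothetical rw-lock cache, shown only to illustrate the extra
    // synchronization points and the global lock.
    final class RwLockCache<K, V> {
        private final ReadWriteLock lock = new ReentrantReadWriteLock();
        private final Map<K, V> map = new HashMap<K, V>();
        private final LazyCache.Calculator<K, V> calc;

        RwLockCache(LazyCache.Calculator<K, V> calc) {
            this.calc = calc;
        }

        public V get(K key) {
            lock.readLock().lock();              // synchronization point 1
            try {
                V val = map.get(key);
                if (val != null)
                    return val;
            } finally {
                lock.readLock().unlock();
            }
            // the r-lock cannot be escalated, so release it and ...
            lock.writeLock().lock();             // synchronization point 2
            try {
                V val = map.get(key);            // ... *check again*
                if (val == null) {
                    val = calc.get(key);         // blocks ALL keys meanwhile
                    map.put(key, val);
                }
                return val;
            } finally {
                lock.writeLock().unlock();
            }
        }
    }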

    In case of a cache hit you might still suffer reduced throughput
    because of thread starvation: all lookups for values missing from the
    cache might be blocked for an indefinite time when using an unfair
    rw-lock. Using a fair rw-lock on the other hand is more expensive and
    still has the downside of a global lock which affects the complete
    range of keys.

    CHM improves this by partitioning the key space and only locking
    individual partitions for atomic operations. These partitions are
    completely independent from a locking perspective. That's the main
    advantage of CHM.

    Kind regards

    robert

    --
    remember.guy do |as, often| as.you_can - without end
    http://blog.rubybestpractices.com/
    Robert Klemme, May 19, 2012
    #9
  10. Eric Sosman Guest

    On 5/19/2012 1:15 AM, Daniel Pitts wrote:
    > On 5/18/12 3:31 PM, Eric Sosman wrote:
    >> On 5/18/2012 5:45 PM, Robert Klemme wrote:
    >>>[...]
    >>> You didn't show it here, did you? I can't seem to find it in the thread.
    >>> Note that CHM does also do synchronization. I am not sure from your
    >>> statement what exact locking scheme you apply. There does seem to be one
    >>> difference though: in my version the second lock goes away after the
    >>> value has been computed so there is only the sync of CHM left.

    >>
    >> It seems to me that if N threads query the same key at about
    >> the same time, they may all miss the map and go off to perform
    >> the slow computation.

    > Nope, they will all create a "Reference" object that *can* perform the
    > calculation, however because the method uses "putIfAbsent", only the
    > first calculating "Reference" object is actually ever used.
    > [...]


    Re-reading, I think you're right. "Never mind," as Emily
    Litella used to say.

    --
    Eric Sosman
    Eric Sosman, May 19, 2012
    #10
  11. Lew Guest

    Robert Klemme wrote:
    > Eric Sosman wrote:
    >> Robert Klemme wrote:
    >>> Silvio Bierman wrote:
    >>>> Robert Klemme wrote:
    >>>
    >>>>> I provide a variant of Silvio's, Eric's and Daniel's solution which
    >>>>> should yield higher throughput because it works without read write
    >>>>> locking. You can find it as gist in case the code is garbled in the
    >>>>> newsgroup posting:
    >>>>> https://gist.github.com/2717818
    >>>
    >>>> I think you have as many locks as I suggested (being one)? My initial
    >>>> implementations of something like this used a plain map with an extra
    >>>> lock but later cases used the by then available ConcurrentHashMap as
    >>>> well, making one lock redundant.
    >>>
    >>> You didn't show it here, did you? I can't seem to find it in the thread.
    >>> Note that CHM does also do synchronization. I am not sure from your
    >>> statement what exact locking scheme you apply. There does seem to be one
    >>> difference though: in my version the second lock goes away after the
    >>> value has been computed so there is only the sync of CHM left.

    >>
    >> It seems to me that if N threads query the same key at about
    >> the same time, they may all miss the map and go off to perform
    >> the slow computation. If "slow" is large compared to the cost of
    >> a lock-release pair (and if it weren't, why cache?), the tradeoff
    >> seems suspect.
    >>
    >> Also, different threads may wind up using different value
    >> instances. If the cache is purely a convenience for a value-only
    >> object that may be all right, but it's not all right if the values
    >> are supposed to be singletons.
    >>
    >> Finally, there's more than a whiff of the double-checked locking
    >> antipattern about what you're doing with the `here' flag. I'm not
    >> absolutely sure what you've got is in fact DCL (hence, wrong), but
    >> I'm also not absolutely sure it's DCL-free. Before using the pattern
    >> in any important way, I'd want to check with a major-league guru,
    >> just as "due diligence."

    >
    > There is no double checked locking going on - neither in LazyCache itself nor
    > in the retriever instance.
    > http://en.wikipedia.org/wiki/Double-checked_locking
    >
    > Daniel has explained all the details already. I'd just add that creation of
    > multiple retriever objects is the price you pay for increased concurrency due
    > to CHM. But only one of them is ever going to be used per key. (Documenting
    > this is one of the tasks of the assert Daniel questioned earlier.)
    >
    > I think rw-locking is an inferior technique compared to CHM in this case
    > because in case of cache miss you cannot escalate a r-lock to a w-lock (to
    > avoid deadlocks) and hence need to release the r-lock, acquire the w-lock,
    > *and check again*. In other words you have two more synchronization points
    > with memory barriers, scheduling effects etc.
    >
    > In case of a cache hit you might still suffer reduced throughput because of
    > thread starvation: all lookups for values missing from the cache might be
    > blocked for an indefinite time when using an unfair rw-lock. Using a fair
    > rw-lock on the other hand is more expensive and still has the downside of a
    > global lock which affects the complete range of keys.
    >
    > CHM improves this by partitioning the key space and only locking individual
    > partitions for atomic operations. These partitions are completely independent
    > from a locking perspective. That's the main advantage of CHM.


    First off, thank you for the very professional and elegant code.

    I shall study it frequently.

    I had experience with CHM on an Enterprise Java project a few years back that
    involved processing documents up to 1GB or so at millions of
    documents per hour.

    As you can imagine, concurrency was a consideration there, but due to
    bureaucracy was not properly managed for a few years. I was hired around the
    time they started to pay attention to such issues.

    The code involved a properly but naively synchronized Map at one point.
    Detailed profiling revealed that lock contention for the Map was the number
    one throughput chokepoint in the whole system. Even above database concurrency
    and I/O.

    Boy howdy, the pundits are right to recommend hard measurement.

    Lock contention has a cascade effect. In modern JVMs, like IBM's
    mainframe-level ones that we used, uncontended locks process quite quickly.
    Contention introduces roadblocks that delay threads, allowing more to queue
    up, causing more contention, slowing things down still more, causing more,
    yada. It only takes one skunk in the middle of the main road to completely tie
    up rush hour everywhere.

    CHM by default partitions lock space (though I'm not clear it uses locks
    exactly) into sixteen independent slices. This meant far more than sixteen
    times faster for us. Writes tend to happen in a solitary thread without a
    whole lot of fight while reads run like greased pigs through the other
    fifteen. With our mix of reads and writes, and transaction volume, CHM pretty
    near eliminated lock contention. YMMV, as always, but in this case that
    chokepoint went from number one to off the list.
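
    (The slice count is tunable, by the way. A sketch using the
    three-argument constructor, whose concurrencyLevel defaults to 16;
    the map name and sizes are made up:)

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;

    // The estimated number of concurrently updating threads can be
    // passed at construction time.
    ConcurrentMap<String, byte[]> documents =
            new ConcurrentHashMap<String, byte[]>(
                    1024,   // initial capacity
                    0.75f,  // load factor
                    64);    // concurrencyLevel: number of independent "slices"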

    It was still the wrong solution, since a simple, effortless, non-concurrent
    better one that would also have eliminated a raft of other problems was
    available, but had no political traction. However, good enough to eliminate
    the throughput impact was good enough, so I didn't raise a fuss when they
    decided against it.

    --
    Lew
    Honi soit qui mal y pense.
    http://upload.wikimedia.org/wikipedia/commons/c/cf/Friz.jpg
    Lew, May 19, 2012
    #11
  12. Sebastian Guest

    On 17.05.2012 12:06, Robert Klemme wrote:
    > On 05/17/2012 11:54 AM, Robert Klemme wrote:
    >> On 05/15/2012 11:14 AM, bugbear wrote:
    >>> However, if the underlying function is slow
    >>> and/or resource hungry (consider caching
    >>> a ray traced image!) many threads can
    >>> end up calling the real function (second
    >>> and subsequent threads to the first get a miss
    >>> during the first thread's call to the underlying function).
    >>>
    >>> "obviously..." what I want is for only
    >>> the thread that FIRST has a cache miss
    >>> to call the underlying function, whilst other
    >>> threads (for the same key) wait.

    >>
    >> I provide a variant of Silvio's, Eric's and Daniel's solution which
    >> should yield higher throughput because it works without read write
    >> locking. You can find it as gist in case the code is garbled in the
    >> newsgroup posting:
    >> https://gist.github.com/2717818

    >
    > There was one important detail missing. This is the corrected code (gist
    > is updated as well):
    >

    [snip]
    The important detail obviously is the extra check with the "here" flag.
    Would you mind explaining why that is necessary? Everyone seems to have
    got it, but I must admit I haven't. Why is it not enough that
    Reference#get() is synchronized? As you yourself missed this in your
    first attempt, I hope the reason isn't trivial...
    -- Sebastian
    Sebastian, May 20, 2012
    #12
  13. Sebastian Guest

    On 20.05.2012 01:35, Sebastian wrote:
    > On 17.05.2012 12:06, Robert Klemme wrote:
    >> On 05/17/2012 11:54 AM, Robert Klemme wrote:
    >>> On 05/15/2012 11:14 AM, bugbear wrote:
    >>>> However, if the underlying function is slow
    >>>> and/or resource hungry (consider caching
    >>>> a ray traced image!) many threads can
    >>>> end up calling the real function (second
    >>>> and subsequent threads to the first get a miss
    >>>> during the first thread's call to the underlying function).
    >>>>
    >>>> "obviously..." what I want is for only
    >>>> the thread that FIRST has a cache miss
    >>>> to call the underlying function, whilst other
    >>>> threads (for the same key) wait.
    >>>
    >>> I provide a variant of Silvio's, Eric's and Daniel's solution which
    >>> should yield higher throughput because it works without read write
    >>> locking. You can find it as gist in case the code is garbled in the
    >>> newsgroup posting:
    >>> https://gist.github.com/2717818

    >>
    >> There was one important detail missing. This is the corrected code (gist
    >> is updated as well):
    >>

    > [snip]
    > The important detail obviously is the extra check with the "here" flag.
    > Would you mind explaining why that is necessary? Everyone seems to have
    > got it, but I must admit I haven't. Why is it not enough that
    > Reference#get() is synchronized? As you yourself missed this in your
    > first attempt, I hope the reason isn't trivial...
    > -- Sebastian
    >

    I think I've got it - the check prevents double calculation of the
    value by two threads calling get() on that same reference instance.
    Which may occur if putIfAbsent() returns a non-null value before the
    "calculating reference" has had time to replace itself with the
    constant reference. Is that right? -- Sebastian
    Sebastian, May 20, 2012
    #13
  14. Lew Guest

    Sebastian wrote:
    > Sebastian wrote:
    >> The important detail obviously is the extra check with the "here" flag.
    >> Would you mind explaining why that is necessary? Everyone seems to have
    >> got it, but I must admit I haven't. Why is it not enough that
    >> Reference#get() is synchronized? As you yourself missed this in your
    >> first attempt, I hope the reason isn't trivial...
    >>

    > I think I've got it - the check prevents double calculation of the value by
    > two threads calling get() on that same reference instance.
    > Which may occur if putIfAbsent() returns a non-null value before the
    > "calculating reference" has had time to replace itself with the
    > constant reference. Is that right? -- Sebastian


    Yep.

    https://gist.github.com/2717818

    ============
    LazyCache
    ============

    public V get(final K key) {
        Reference<V> ref = map.get(key);

        if (ref == null) {
            // miss
            ref = new Reference<V>() {
                private V val;
                private boolean here = false; // superfluous but explicit

                @Override
                public synchronized V get() {
                    if (!here) {
                        val = calc.get(key);
                        here = true;
                        // next time lock free access:
                        Reference<V> x = map.put(key, new Ref<V>(val));
                        assert x == this;
                    }

                    return val;
                }
            };

            final Reference<V> old = map.putIfAbsent(key, ref);

            if (old != null)
                ref = old; // someone else was faster
        }

        return ref.get();
    }

    ============

    The first cluster of requestors to 'LazyCache#get(K key)' enter the
    'if (ref == null) {' block.

    They all construct a new anonymous-type instance of 'Reference<V>'.

    The first one to hit 'putIfAbsent()' wins. The rest replace their
    painstakingly - actually very, very quickly - constructed anonymous-type
    'Reference<V>' instances with the anonymous-type instance the first client put
    into the 'Map'.

    So only one 'Reference<V>' comes back from all the clients that first hit the
    map and saw 'null'.

    It is of the anonymous type.

    The 'here' variable exists for the sake of the anonymous-type instance:
    that one-and-only possible instance can tell that it has never been
    'here' before, i.e., inside the 'get()' method.

    The next batch of 'LazyCache' clients hit the map when that anonymous-type
    instance is still in there.

    They all call 'get()' on that anonymous-type instance. The first one
    sees 'here' is still false. Up until now, no one has called the
    expensive operation inside of 'calc'. No matter how many threads hit
    the map when the value was 'null', and no matter how many of them hit
    it when the map contained the anonymous-type instance, all those
    threads have to wait their turn to call 'get()' on that one and only
    instance, and the first one of that whole cluster is the only one to
    see '!here'.

    It calls the expensive operation and replaces its own internal value with the
    newly-calculated one.

    In addition, it replaces itself in the map with a simple, final-value holder.

    All the threads currently blocked on the anonymous-type instance's
    'get()' will see the stashed value.

    None of them will call the expensive operation again.

    Any clients thereafter will hit the 'LazyCache' map and find only the simple,
    final-value named-type instance, whose 'get()' returns the pre-calculated
    value in a completely thread-safe manner without any more locking.

    Once the named-type instance is in the map, and all threads that found the
    anonymous one finish their 'get()' calls, the anonymous-type instance is
    eligible for garbage collection, thus letting go of an extra reference to the
    calculator. I've chased that rabbit down the hole a bit without finding a real
    danger to the circular reference in this scenario, but it's never bad to break
    reference-graph cycles. They are often a major bug breeding ground, especially
    on resource managers and types with non-trivial 'finalize()' methods.

    Again, while here the lifetime of the calculator is that of the 'LazyCache'
    instance, it's good practice to decouple value clients from their producers'
    finalizers. It simplifies memory management. See Josh Bloch's /Effective Java/
    for details on finalizers.

    You get tremendous results from the twin engines of scope and lifetime management.

    --
    Lew
    Honi soit qui mal y pense.
    http://upload.wikimedia.org/wikipedia/commons/c/cf/Friz.jpg
    Lew, May 20, 2012
    #14
  15. On 05/18/2012 11:45 PM, Robert Klemme wrote:
    > On 18.05.2012 20:42, Silvio Bierman wrote:
    >> On 05/17/2012 11:54 AM, Robert Klemme wrote:

    >
    >>> I provide a variant of Silvio's, Eric's and Daniel's solution which
    >>> should yield higher throughput because it works without read write
    >>> locking. You can find it as gist in case the code is garbled in the
    >>> newsgroup posting:
    >>> https://gist.github.com/2717818

    >
    >> I think you have as many locks as I suggested (being one)? My initial
    >> implementations of something like this used a plain map with an extra
    >> lock but later cases used the by then available ConcurrentHashMap as
    >> well, making one lock redundant.

    >
    > You didn't show it here, did you? I can's seem to find it in the thread.
    > Note that CHM does also do synchronization. I am not sure from your
    > statement what exact locking scheme you apply. There does seem to be one
    > difference though: I my version the second lock goes away after the
    > value has been computed so there is only the sync of CHM left.
    >
    > Kind regards
    >
    > robert
    >


    Actually, I didn't, and I did mention the word "lock" alongside the
    wrapper object.

    Since the only code I have at hand is a Scala version there was little
    point in posting it. It uses a Scala "lazy val" in the wrapper object
    which translates to a compiler generated DCL (assuming 1.5+). So the
    only lock that would remain would be the CHM one.

    I vaguely remember a DCL in one of the more recent Java versions I
    coughed up.
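
    Roughly, such a compiler-generated DCL corresponds to this hand-written
    Java sketch (hypothetical names, assuming Java 5+ volatile semantics;
    it reuses the Calculator interface from Robert's class):

    import clj.LazyCache;

    // Sketch of a per-key lazy wrapper as a "lazy val" amounts to:
    // double-checked locking guarded by a volatile flag.
    final class LazyValue<K, V> {
        private final LazyCache.Calculator<K, V> calc;
        private final K key;
        private volatile boolean initialized;
        private V value;

        LazyValue(LazyCache.Calculator<K, V> calc, K key) {
            this.calc = calc;
            this.key = key;
        }

        public V get() {
            if (!initialized) {                // first check, no lock
                synchronized (this) {
                    if (!initialized) {        // second check, under lock
                        value = calc.get(key);
                        initialized = true;    // volatile write publishes 'value'
                    }
                }
            }
            return value;                      // guarded by the volatile read above
        }
    }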
    Silvio Bierman, May 21, 2012
    #15
  16. On 19.05.2012 23:24, Lew wrote:

    > First off, thank you for the very professional and elegant code.
    >
    > I shall study it frequently.


    Thank you!

    > I had experience with CHM on an Enterprise Java project a few years back
    > that involved processing documents up to 1GB or so at millions of
    > documents per hour.
    >
    > As you can imagine, concurrency was a consideration there, but due to
    > bureaucracy was not properly managed for a few years. I was hired around
    > the time they started to pay attention to such issues.
    >
    > The code involved a properly but naively synchronized Map at one point.
    > Detailed profiling revealed that lock contention for the Map was the
    > number one throughput chokepoint in the whole system. Even above
    > database concurrency and I/O.
    >
    > Boy howdy, the pundits are right to recommend hard measurement.
    >
    > Lock contention has a cascade effect. In modern JVMs, like IBM's
    > mainframe-level ones that we used, uncontended locks process quite
    > quickly. Contention introduces roadblocks that delay threads, allowing
    > more to queue up, causing more contention, slowing things down still
    > more, causing more, yada. It only takes one skunk in the middle of the
    > main road to completely tie up rush hour everywhere.
    >
    > CHM by default partitions lock space (though I'm not clear it uses locks
    > exactly) into sixteen independent slices. This meant far more than
    > sixteen times faster for us. Writes tend to happen in a solitary thread
    > without a whole lot of fight while reads run like greased pigs through
    > the other fifteen. With our mix of reads and writes, and transaction
    > volume, CHM pretty near eliminated lock contention. YMMV, as always, but
    > in this case that chokepoint went from number one to off the list.
    >
    > It was still the wrong solution, since a simple, effortless,
    > non-concurrent better one that would also have eliminated a raft of
    > other problems was available, but had no political traction. However,
    > good enough to eliminate the throughput impact was good enough, so I
    > didn't raise a fuss when they decided against it.


    Thanks for sharing that story. What I find amazing about this is that
    what you did isn't exactly rocket science and yet they didn't do it
    before. You would guess that it's just what every engineer would do but
    no: something prevents this from happening.

    And it is true for a number of other techniques I would consider bread
    and butter tools:

    - Ensuring requirements are gathered properly and understood before
    starting to code (and design of course).

    - Testing code _before_ shipping.

    - When writing unit tests, making sure to also include tests for
    critical values (usually corner cases such as -1, 0, 1, limit, limit -
    1, limit + 1, null, "", etc.).

    - Thinking about the person who must use what you produce, regardless
    of whether it's a document, a configuration file layout, a DSL, a
    class, a library. It seems many people in software development are far
    more concerned with the inner workings of what they create instead of
    considering how it will be used. Maybe it's easier or it is because
    making it work takes the largest part of coding - still the outcome
    often is dissatisfying and a little more thought upfront goes a long
    way toward avoiding maintenance headaches and worse things.

    ....

    This can't be so difficult, can it?

    Kind regards

    robert


    --
    remember.guy do |as, often| as.you_can - without end
    http://blog.rubybestpractices.com/
    Robert Klemme, May 21, 2012
    #16
  17. Lew Guest

    Robert Klemme wrote:
    > Thanks for sharing that story. What I find amazing about this is that what you
    > did isn't exactly rocket science and yet they didn't do it before. You would
    > guess that it's just what every engineer would do but no: something prevents
    > this from happening.


    It was worse.

    Turns out the resource acquisition managed by the inter-thread map coulda
    woulda shoulda been handled better by an object instantiated with a simple new
    ...() call.

    The returned reference and access to the object would perforce have been
    limited to the instantiating thread. Any other thread woulda shoulda coulda
    had its own independent resource, object, reference, whatever. Sharing was not
    needed.

    Loosely speaking -

    Desirable desire = getFromMap(FANCY_CAR);
    // map is shared, with fancy-schmancy synch

    vs.

    Desirable desire = new FancyCar();
    // Map? Meshugge! I'm not sharing!

    > And it is true for a number of other techniques I would consider bread and
    > butter tools:
    >
    > - Ensuring requirements are gathered properly and understood before starting
    > to code (and design of course).


    True, with provisos.

    = Requirements are not fixed. What you should have now is the best available
    for now.

    = "Properly"? "Understood"? There are books on this, but it boils down to:

    Your software will be used. By whom? Why? What would annoy them? Don't do
    that! What would make them forget the software is there and just get their job
    done? Do that!

    "Proper" is always from the user's standpoint. Not what they say, by the way,
    but what they think and feel - really. Behind the bullshit and self-deception,
    when they aren't posturing for your benefit and really are just trying to get
    their work done. And that's where "understood" comes in. Give'em what they
    want, not necessarily what they ask for.

    > - Testing code _before_ shipping.


    Some espouse before even writing it. I say, sure, great idea, when you make it
    easy to express my ideas in tests instead of imperative code.

    Hmm.

    BTW, this being Java, have you guys tried EasyMock? (AndroidMock in Android
    Java.) That shtuff rocks. It makes your very code better, not just tested but
    idiomatically. Tricky stuff. I'll post about that when I get back into it.
    I've discovered a Right Way and a Wrong Way to do mocks, just like I did with JPA.

    > - When writing unit tests, making sure to also include tests for critical
    > values (usually corner cases such as -1, 0, 1, limit, limit - 1, limit + 1,
    > null, "", etc.).


    Consider this. No matter what input a module receives, it must achieve a valid
    program state in response.

    No matter what.

    Coding to this formalism means, for example, that a method can handle all
    possible argument values.

    public interface Caromer
    {
        public Fromitz carom(Flamo flame);
    }

    As Herr Klemme points out, 'null' is one possible input for 'flame'.
    Non-'null' is another. Presumably a 'Flamo' has observable state that
    distinguishes input values. If the type were 'Integer' or 'int', Robert's
    example int values might pertain.

    But you cannot give infinite test cases to a unit test. So you pick edge
    cases, on both the "happy path" (positive) and "unhappy path" (negative)
    scenarios. Consider this:

    public interface Quadrupler
    {
        public int quadruple(int val);
    }

    What if it were called with an auto-unboxed Integer?

    public int someClient(Integer val)
    {
        Quadrupler quadrupler = QuadFactory.get();
        Integer quadded = quadrupler.quadruple(val); // unboxes 'val': NPE if null
        return quadded;
    }
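
    One way to pin that edge case down in a unit test (a JUnit 4 sketch;
    Quadrupler and QuadFactory as assumed above):

    import static org.junit.Assert.assertEquals;
    import org.junit.Test;

    public class QuadruplerTest {
        @Test
        public void quadruplesEdgeValues() {
            Quadrupler q = QuadFactory.get();
            assertEquals(0, q.quadruple(0));
            assertEquals(4, q.quadruple(1));
            assertEquals(-4, q.quadruple(-1));
        }

        @Test(expected = NullPointerException.class)
        public void nullUnboxesToNpe() {
            Integer val = null;               // the auto-unboxed case from above
            QuadFactory.get().quadruple(val); // NPE at the call site
        }
    }

    The null test documents that the NPE happens at the call site, before
    'quadruple' even runs.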


    > - Thinking about the person who must use what you produce, regardless of
    > whether it's a document, a configuration file layout, a DSL, a class, a
    > library. It seems many people in software development are far more concerned
    > with the inner workings of what they create instead of considering how it
    > will be used. Maybe it's easier or it is because making it work takes the
    > largest part of coding - still the outcome often is dissatisfying and a
    > little more thought upfront goes a long way toward avoiding maintenance
    > headaches and worse things.
    >
    > ...
    >
    > This can't be so difficult, can it?


    You have a fine sense of humor.

    It's as easy as empathy, and getting inside the mind of a professional in a
    field of which you know little to nothing.

    Testing certainly helps. Functional tests (as opposed to unit tests) can and
    should be part of and produced from requirements. At the user-facing level you
    can send test scenarios back as memos of understanding (MoUs) - what better
    way to make clear what you understand and how it will play out?

    Prototypes count, too. They can meet tests, even. Building a prototype
    that meets tests, starting within a day or so of starting, kicks ass
    for converging on clarity. Keep it up - every day the prototype runs,
    works (up to whatever you've done that day), and can be checked out and
    run in any directory, by any developer, etc.

    Add a CI tool like Jenkins to run your tests every time you "git commit" to
    "origin/master" (or whatever); you got yourself a one-entity professional shop.

    <http://www.junit.org/>
    <http://www.easymock.org/>
    <http://code.google.com/p/android-mock/>
    <http://jenkins-ci.org/>

    --
    Lew
    Honi soit qui mal y pense.
    http://upload.wikimedia.org/wikipedia/commons/c/cf/Friz.jpg
    Lew, May 24, 2012
    #17
  18. On 24.05.2012 08:11, Lew wrote:
    > Robert Klemme wrote:


    >> And it is true for a number of other techniques I would consider bread and
    >> butter tools:
    >>
    >> - Ensuring requirements are gathered properly and understood before
    >> starting to code (and design of course).

    >
    > True, with provisos.
    >
    > = Requirements are not fixed. What you should have now is the best
    > available for now.


    But there is a minimum standard: there must be enough information to at
    least get an idea of what is to be done. I have seen three-sentence
    requirements for a feature in a complex system which left engineers
    wondering what they were supposed to do.

    > = "Properly"? "Understood"? There are books on this, but it boils down to:
    >
    > Your software will be used. By whom? Why? What would annoy them? Don't
    > do that! What would make them forget the software is there and just get
    > their job done? Do that!


    !

    > "Proper" is always from the user's standpoint. Not what they say, by the
    > way, but what they think and feel - really. Behind the bullshit and
    > self-deception, when they aren't posturing for your benefit and really
    > are just trying to get their work done. And that's where "understood"
    > comes in. Give'em what they want, not necessarily what they ask for.


    Pretty early in my career I had a quite instructive experience: we were
    building a web catalog for a business customer with user management, a
    lot of technical data about their products - and of course a link to
    their ERP system. They wanted to have a price calculation in that
    system which could take several routes depending on the user. They
    explained it, I wrote a document which also included a UML activity
    diagram of the flow with explanations. We sent it in. They rejected
    it: "No, it must be completely different." So we sat together. End of
    the story: it was exactly what they wanted. So either they did not
    understand the document and didn't want to ask - or they did not
    understand their own requirements. :)

    >> - Testing code _before_ shipping.

    >
    > Some espouse before even writing it. I say, sure, great idea, when you
    > make it easy to express my ideas in tests instead of imperative code.
    >
    > Hmm.
    >
    > BTW, this being Java, have you guys tried EasyMock? (AndroidMock in
    > Android Java.) That shtuff rocks. It makes your very code better, not
    > just tested but idiomatically. Tricky stuff. I'll post about that when I
    > get back into it. I've discovered a Right Way and a Wrong Way to do
    > mocks, just like I did with JPA.


    Not yet, thank you for the pointer!
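
    For anyone else curious, a minimal sketch of EasyMock's
    record/replay/verify style - PriceSource and Quoter are invented for
    illustration; only the org.easymock calls are the library's API:

    import static org.easymock.EasyMock.createMock;
    import static org.easymock.EasyMock.expect;
    import static org.easymock.EasyMock.replay;
    import static org.easymock.EasyMock.verify;
    import static org.junit.Assert.assertEquals;

    import org.junit.Test;

    // Hypothetical collaborator we want to mock away in the test.
    interface PriceSource {
        double priceOf(String article);
    }

    // Hypothetical class under test, depending only on the interface.
    class Quoter {
        private final PriceSource source;

        Quoter(PriceSource source) { this.source = source; }

        double quote(String article, int amount) {
            return source.priceOf(article) * amount;
        }
    }

    public class QuoterTest {
        @Test
        public void quoteMultipliesUnitPriceByAmount() {
            PriceSource source = createMock(PriceSource.class);
            expect(source.priceOf("bolt")).andReturn(0.25); // record
            replay(source);                                 // switch to replay

            assertEquals(2.5, new Quoter(source).quote("bolt", 10), 1e-9);

            verify(source); // fails if priceOf was not called as recorded
        }
    }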

    >> - When writing unit tests, making sure to also include tests for
    >> critical values (usually corner cases such as -1, 0, 1, limit,
    >> limit - 1, limit + 1, null, "", etc.).

    >
    > Consider this. No matter what input a module receives, it must achieve a
    > valid program state in response.
    >
    > No matter what.


    Absolutely!

    > Coding to this formalism means, for example, that a method can handle
    > all possible argument values.
    >
    > public interface Caromer
    > {
    >     public Fromitz carom(Flamo flame);
    > }
    >
    > As Herr Klemme points out, 'null' is one possible input for 'flame'.
    > Non-'null' is another. Presumably a 'Flamo' has observable state that
    > distinguishes input values. If the type were 'Integer' or 'int',
    > Robert's example int values might pertain.
    >
    > But you cannot give infinite test cases to a unit test. So you pick edge
    > cases, on both the "happy path" (positive) and "unhappy path" (negative)
    > scenarios.


    Exactly.
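
    To make that boundary picking concrete, a small JUnit 4 sketch against
    an invented validation rule - NameRules is a placeholder, not anything
    from this thread:

    import static org.junit.Assert.assertFalse;
    import static org.junit.Assert.assertTrue;

    import org.junit.Test;

    // Hypothetical rule: a name is valid if non-null, non-empty and at
    // most LIMIT characters long.
    class NameRules {
        static final int LIMIT = 64;

        static boolean isValid(String name) {
            return name != null && name.length() >= 1
                    && name.length() <= LIMIT;
        }
    }

    public class NameRulesTest {
        private static String ofLength(int n) {
            StringBuilder sb = new StringBuilder(n);
            for (int i = 0; i < n; i++)
                sb.append('x');
            return sb.toString();
        }

        @Test
        public void unhappyPathEdges() {
            assertFalse(NameRules.isValid(null));  // null
            assertFalse(NameRules.isValid(""));    // empty string
            assertFalse(NameRules.isValid(ofLength(NameRules.LIMIT + 1)));
        }

        @Test
        public void happyPathEdges() {
            assertTrue(NameRules.isValid(ofLength(1)));
            assertTrue(NameRules.isValid(ofLength(NameRules.LIMIT - 1)));
            assertTrue(NameRules.isValid(ofLength(NameRules.LIMIT)));
        }
    }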

    >> - Thinking about the person who must use what you produce, regardless
    >> whether it's a document, a configuration file layout, a DSL, a class, a
    >> library. It seems many people in software development are far more
    >> concerned with the inner workings of what they create instead of
    >> considering how it will be used. Maybe it's easier or it is because
    >> making it work takes the largest part of coding - still the outcome
    >> often is dissatisfying and a little more thought upfront goes a long
    >> way at avoiding maintenance headaches and worse things.
    >>
    >> ...
    >>
    >> This can't be so difficult, can it?

    >
    > You have a fine sense of humor.


    Well, the fun disappears when you see the same mistakes made over and
    over again, and attempts to improve things have no effect...

    > It's as easy as empathy, and getting inside the mind of a professional
    > in a field of which you know little to nothing.


    I agree for library API design: that takes a bit of experience to get
    right. But when it comes to writing documents I think the rules for
    readability are quite straightforward (some universities even have them
    in their curriculum): do not start with the details but give the reader
    an overview first. Give previews and summaries. Generally watch
    document structure. Do not put too much in a single diagram. Things
    like that.

    > Testing certainly helps. Functional tests (as opposed to unit tests) can
    > and should be part of and produced from requirements. At the user-facing
    > level you can send test scenarios back as memos of understanding (MoUs)
    > - what better way to make clear what you understand and how it will play
    > out?


    Good point.

    > Prototypes count, too. They can meet tests, even. Building a prototype
    > that meets tests, starting within a day or so of project start, kicks
    > ass for converging on clarity. Keep it up - every day the prototype
    > runs, works (up to whatever you've done that day), and can be checked
    > out and run in any directory by any developer.


    And prototypes are a great way to test with low effort whether ideas
    work, before basing a whole design on them.

    > Add a CI tool like Jenkins to run your tests every time you "git commit"
    > to "origin/master" (or whatever); you've got yourself a one-entity
    > professional shop.


    Jenkins is great. We use it. Broken tests are quickly discovered. :)

    Kind regards

    robert


    PS: Hey, you managed to sneak in a German word. :)

    --
    remember.guy do |as, often| as.you_can - without end
    http://blog.rubybestpractices.com/
    Robert Klemme, May 24, 2012
    #18
  19. On Thu, 24 May 2012 23:05:19 +0200, Robert Klemme
    <> wrote:

    [snip]

    >Pretty early in my career I had a quite instructive experience: we were
    >building a web catalog for a business customer with user management, a
    >lot of technical data about their products - and of course a link to
    >their ERP system. They wanted a price calculation in that system which
    >could take several routes depending on the user. They explained it, I
    >wrote a document which also included a UML activity diagram of the flow
    >with explanations. We sent it in. They rejected it: "No, it must be
    >completely different." So we sat together. End of the story: it was
    >exactly what they wanted. So either they did not understand the
    >document and didn't want to ask - or they did not understand their own
    >requirements. :)

    ^^^
    What is this?

    (I think you called it right, no joking.)

    A UML diagram might as well be code. It is something of ours.
    Why expect them to be able to read it?

    As to possibly not knowing their own requirements, that is quite
    possible. Many a person knows how to do his job as it is currently
    configured. That does not mean that he knows how to discuss the whys
    and wherefores of it.

    [snip]

    Sincerely,

    Gene Wirchenko
    Gene Wirchenko, May 25, 2012
    #19
  20. On 12-05-21 04:15 PM, Robert Klemme wrote:
    >

    [ SNIP ]
    >
    > And it is true for a number of other techniques I would consider bread
    > and butter tools:
    >
    > - Ensuring requirements are gathered properly and understood before
    > starting to code (and design of course).


    I want to blame an improper understanding of agile and lean and
    iterative here, but I won't. On the other hand, I do think that a
    widespread kneejerk rejection of waterfall has also led, in many cases,
    to people rejecting formal requirements analysis and intensive BA work.
    Maybe they've read enough about agile to think that they'll find out
    everything they need to find out when it's time to build Slice 3, and
    it's OK to find out about Slice 3 requirements then, not upfront.

    My philosophy - perhaps old-school - is based on these observations over
    the years:

    1) In the vast majority of projects of all sizes, the large majority of
    requirements were known and identifiable at the beginning;

    2) Corollary to #1: the large majority of requirements do not change
    over the lifetime of a project, of any size. They may be identified
    late, or not at all (before release), but they were always knowable;

    3) An impression of changing requirements is almost always due to
    imprecise stakeholder/BA/architect/designer interactions and
    conversations. If you get one answer in April and a different one in
    October, it's either down to who you are asking or what your questions
    are, or both. The actual final "real" requirement probably never changed.

    As I say, these are my observations. YMMV. In my case I do substantially
    more requirements analysis up front than maybe other folks do. It has
    served me well. I also do a lot of prototyping and wire-framing and
    proofs-of-concept/technology, all of which seem to have fallen out of
    favor in some developer camps.

    > - Testing code _before_ shipping.
    >
    > - When writing unit tests, making sure to also include tests for
    > critical values (usually corner cases such as -1, 0, 1, limit, limit -
    > 1, limit + 1, null, "", etc.).


    Testing is a different art. There is a lot of truth to the saying that a
    good tester is a mediocre coder, and vice versa. At a minimum, if a
    single person does have the expertise to do both, they need to be
    schizophrenic.

    Although I've not read anything about it, I've occasionally thought
    that, by generally accepted testing principles, a person who writes unit
    tests (just like any other tests) cannot, and should not, be the
    developer. And yet it's the developer who invariably writes unit tests
    for his own code. Food for thought.

    > - Thinking about the person who must use what you produce, regardless
    > whether it's a document, a configuration file layout, a DSL, a class, a
    > library. It seems many people in software development are far more
    > concerned with the inner workings of what they create instead of
    > considering how it will be used. Maybe it's easier or it is because
    > making it work takes the largest part of coding - still the outcome
    > often is dissatisfying and a little more thought upfront goes a long way
    > at avoiding maintenance headaches and worse things.
    >
    > ...

    See my thoughts about prototypes, wireframes, PoCs/PoTs above. Using
    those methods I've saved a lot of time over the years. All of them show
    the end-user a good impression of *their* final interface. Which is what
    matters.
    >
    > This can't be so difficult, can it?
    >
    > Kind regards
    >
    > robert
    >
    >

    AHS

    --
    Never interrupt your enemy when he is making a mistake.
    --Napoleon
    Arved Sandstrom, May 25, 2012
    #20
