Infinite loop from HashMap.keySet iteration.

Discussion in 'Java' started by Daniel Pitts, Jun 14, 2012.

  1. Daniel Pitts

    Daniel Pitts Guest

    Just wanted to share some experience with the group.

    I've come across a very strange situation, and I think I have the
    solution (still needs to be verified in production, but looks promising).

    I've inherited some code which basically does the following. Obfuscated
    and simplified for demonstration purposes.

    void logResults(Map<String, Result> resultsMap) {
    StringBuilder logLine = new StringBuilder("Some stuff");
    Set<String> keySet = resultsMap.keySet();
    for (String key: keySet) {
    logLine.append(key).append(": ").append(resultsMap.get(key));
    }
    logger.info(logLine.toString());
    }

    One day, we upgraded our application container, and this (which had been
    working fine), started to cause OOM exceptions. Digging through an
    HPROF, I found the "Some stuff" string in a StringBuilder which was
    600MB is size, and had the same values of a couple of keys repeated over
    and over again.

    I found a newer version of this library function which someone had added
    a simple counter to break out of the loop if the count > keySet.size();

    Fixes the catastrophic symptom, but not the cause.

    My belief is that this is an unusual manifestation of a concurrency
    issue. the resultsMap is actually written to in this manor, by multiple
    threads.


    void addResult(ResultFactory factory, Map<String, Result> resultsMap) {
    synchronize(factory) {
    Result result;
    result = resultsMap.get(factory.getName());
    if (result == null) {
    result = new Result();
    resultsMap.put(factory.getName(), result);
    }
    result.recordResult(factory.getResult());
    }
    }

    This may look "safe", but really the resultsMap internal data structure
    can become inconsistent because it is being accessed by two threads at
    the same time.

    Since this is legacy code and I don't have time to do a full analysis
    and repair the fundamental design flaws, my solution was to make
    resultsMap a Collections.synchronizedMap(new HashMap<String, Result>()),
    instead of a straight up HashMap.

    From what I could tell, addResult is the only method which writes to
    that map (but since its a Map being sent around all over the place,
    that's hard to tell for sure). addResult's synchronization should be
    enough to handle the "atomic" operation of "add if absent". Given that
    factory.getName() is unique to that factory instance.

    Anyway, just thought I'd share this interesting case-study with the
    group. I just wish my predecessor had read /Java Concurrency in
    Practice/ before trying to write a threaded framework. This is only
    *one* of the issues that's come out of this framework.

    Sincerely,
    Daniel.
    Daniel Pitts, Jun 14, 2012
    #1
    1. Advertising

  2. Daniel Pitts

    Daniel Pitts Guest

    On 6/14/12 10:17 AM, markspace wrote:
    > Well, it doesn't look "safe" to me because you are reading
    > unsynchronized from a shared object in the logResults() method. Always
    > always always you must synchronized both the read and the write, or it's
    > not thread safe.

    Collections.synchronizedMap adds the synchronization necessary to safely
    read/write (as separate operations, not atomic).
    >
    > You're also not MUTEX safe either, which clearly needs to be dealt with.
    > If you were using some form of Collections iterator, you'd probably be
    > getting concurrent modification exceptions.

    The logResults is only called after all the addResults methods have
    completed.
    >
    > For the MUTEX/concurrent modification problem, I don't think using
    > Collections.synchronizedMap() is enough. You're going to have to block
    > all access during the read. This could ruin the concurrency of your app,
    > so there are even further issues here to consider, but strictly from a
    > thread safety issue it's the better solution I think.

    If you look at the addResult method, that handles the MUTEX case.
    >
    > void logResults(Map<String, Result> resultsMap) {
    > StringBuilder logLine = new StringBuilder("Some stuff");
    > synchronized( resultsMap ) {
    > Set<String> keySet = resultsMap.keySet();
    > for (String key: keySet) {
    > logLine.append(key).append(": ").append(resultsMap.get(key));
    > }
    > }
    > logger.info(logLine.toString());
    > }
    >
    > Assuming "Some stuff" is thread safe.
    >
    > However I think a different scheme altogether is needed. Something like
    > a ConcurrentLinkedQueue rather than a Map is probably an even better
    > solution overall, from both an application performance & concurrency
    > standpoint, and a thread safe & correct standpoint.

    Actually, the entire scheme that requires all of this is junk, so it
    wouldn't be worth fixing. A complete rewrite the better idioms and real
    abstractions is what is needed. Unfortunately, this code-base is being
    abandoned so we can move to a PHP based platform. Yes, I know. I don't
    want to hear it, because you're preaching to the choir. That decision
    was made by the CTO, and I've given all the feedback my position allows.
    >
    > I realize you say your app is legacy. I'm just tossing ideas out for
    > future reference.

    Understood. :)
    Daniel Pitts, Jun 14, 2012
    #2
    1. Advertising

  3. Daniel Pitts

    Roedy Green Guest

    On Thu, 14 Jun 2012 09:54:59 -0700, Daniel Pitts
    <> wrote, quoted or indirectly
    quoted someone who said :

    >This may look "safe", but really the resultsMap internal data structure
    >can become inconsistent because it is being accessed by two threads at
    >the same time.


    I did a cursory glance, but are you not modifying the values of keys?
    That is a definite no no, even on the same thread.
    --
    Roedy Green Canadian Mind Products
    http://mindprod.com
    Controlling complexity is the essence of computer programming.
    ~ Brian W. Kernighan 1942-01-01
    ..
    Roedy Green, Jun 14, 2012
    #3
  4. Daniel Pitts

    Daniel Pitts Guest

    On 6/14/12 11:34 AM, Roedy Green wrote:
    > On Thu, 14 Jun 2012 09:54:59 -0700, Daniel Pitts
    > <> wrote, quoted or indirectly
    > quoted someone who said :
    >
    >> This may look "safe", but really the resultsMap internal data structure
    >> can become inconsistent because it is being accessed by two threads at
    >> the same time.

    >
    > I did a cursory glance, but are you not modifying the values of keys?
    > That is a definite no no, even on the same thread.

    Keys are String, String is immutable.
    Daniel Pitts, Jun 14, 2012
    #4
  5. On Thursday, June 14, 2012 6:54:59 PM UTC+2, Daniel Pitts wrote:
    > Just wanted to share some experience with the group.
    >
    > I've come across a very strange situation, and I think I have the
    > solution (still needs to be verified in production, but looks promising).
    >
    > I've inherited some code which basically does the following. Obfuscated
    > and simplified for demonstration purposes.
    >
    > void logResults(Map<String, Result> resultsMap) {
    > StringBuilder logLine = new StringBuilder("Some stuff");
    > Set<String> keySet = resultsMap.keySet();
    > for (String key: keySet) {
    > logLine.append(key).append(": ").append(resultsMap.get(key));
    > }
    > logger.info(logLine.toString());
    > }


    That should look roughly like this (more efficient to use the entrySet() and synchronize added):

    void logResults(Map<String, Result> resultsMap) {
    final StringBuilder logLine = new StringBuilder("Some stuff");

    synchronized (resultsMap) {
    for (final Entry<String, Result> entry: resultsMap.entrySet()) {
    logLine.append(entry.getKey()).append(": ").append(entry.getValue());
    }
    }

    logger.info(logLine.toString());
    }

    > void addResult(ResultFactory factory, Map<String, Result> resultsMap) {
    > synchronize(factory) {
    > Result result;
    > result = resultsMap.get(factory.getName());
    > if (result == null) {
    > result = new Result();
    > resultsMap.put(factory.getName(), result);
    > }
    > result.recordResult(factory.getResult());
    > }
    > }


    It looks like there is a spelling error and it was intended to be

    void addResult(ResultFactory factory, Map<String, Result> resultsMap) {
    synchronize(resultsMap) { // error fixed
    Result result = resultsMap.get(factory.getName());
    if (result == null) {
    result = new Result();
    resultsMap.put(factory.getName(), result);
    }
    result.recordResult(factory.getResult());
    }
    }

    Or the synchronize(resultsMap) was simply forgotten.

    Kind regards

    robert
    Robert Klemme, Jun 18, 2012
    #5
  6. Daniel Pitts

    Daniel Pitts Guest

    On 6/18/12 4:42 AM, Robert Klemme wrote:
    > On Thursday, June 14, 2012 6:54:59 PM UTC+2, Daniel Pitts wrote:
    >> Just wanted to share some experience with the group.
    >>
    >> I've come across a very strange situation, and I think I have the
    >> solution (still needs to be verified in production, but looks promising).
    >>
    >> I've inherited some code which basically does the following. Obfuscated
    >> and simplified for demonstration purposes.
    >>
    >> void logResults(Map<String, Result> resultsMap) {
    >> StringBuilder logLine = new StringBuilder("Some stuff");
    >> Set<String> keySet = resultsMap.keySet();
    >> for (String key: keySet) {
    >> logLine.append(key).append(": ").append(resultsMap.get(key));
    >> }
    >> logger.info(logLine.toString());
    >> }

    >
    > That should look roughly like this (more efficient to use the entrySet() and synchronize added):
    >
    > void logResults(Map<String, Result> resultsMap) {
    > final StringBuilder logLine = new StringBuilder("Some stuff");
    >
    > synchronized (resultsMap) {
    > for (final Entry<String, Result> entry: resultsMap.entrySet()) {
    > logLine.append(entry.getKey()).append(": ").append(entry.getValue());
    > }
    > }
    >
    > logger.info(logLine.toString());
    > }

    Agreed. However, the code I was handed used keySet(). I didn't feel
    like refactoring this (for many reasons).
    >
    >> void addResult(ResultFactory factory, Map<String, Result> resultsMap) {
    >> synchronize(factory) {
    >> Result result;
    >> result = resultsMap.get(factory.getName());
    >> if (result == null) {
    >> result = new Result();
    >> resultsMap.put(factory.getName(), result);
    >> }
    >> result.recordResult(factory.getResult());
    >> }
    >> }

    >
    > It looks like there is a spelling error and it was intended to be
    >
    > void addResult(ResultFactory factory, Map<String, Result> resultsMap) {
    > synchronize(resultsMap) { // error fixed

    Nope, that is not, in fact, the original author's intent. There was no
    other synchronizing on the resultMap. The original author was only
    worried about protecting the key, not the whole map. This of course is
    not sufficient, but it is what they implemented.
    > Result result = resultsMap.get(factory.getName());
    > if (result == null) {
    > result = new Result();
    > resultsMap.put(factory.getName(), result);
    > }
    > result.recordResult(factory.getResult());
    > }
    > }
    >
    > Or the synchronize(resultsMap) was simply forgotten.

    Exactly, which is why I ensure resultsMap is a synchronized Map instead.
    >
    > Kind regards
    >
    > robert
    Daniel Pitts, Jun 18, 2012
    #6
  7. On 06/18/2012 06:08 PM, Daniel Pitts wrote:
    > On 6/18/12 4:42 AM, Robert Klemme wrote:


    >> [refactoring from keySet() to entrySet()]


    > Agreed. However, the code I was handed used keySet(). I didn't feel like
    > refactoring this (for many reasons).


    For example?

    > Nope, that is not, in fact, the original author's intent. There was no
    > other synchronizing on the resultMap. The original author was only
    > worried about protecting the key, not the whole map. This of course is
    > not sufficient, but it is what they implemented.


    The key is a String which does not need synchronization because it is
    immutable. Did you mean to say "factory" instead? I have no idea what
    ResultFactory actually does but getName() looks like something which
    always returns the same name. If that was the case then no
    synchronization would be necessary. And if getResult() involves a
    calculation which needs to be thread safe then that method could be
    synchronized. Can you share more details about that class?

    >> Result result = resultsMap.get(factory.getName());
    >> if (result == null) {
    >> result = new Result();
    >> resultsMap.put(factory.getName(), result);
    >> }
    >> result.recordResult(factory.getResult());
    >> }
    >> }
    >>
    >> Or the synchronize(resultsMap) was simply forgotten.

    > Exactly, which is why I ensure resultsMap is a synchronized Map instead.


    This won't make the code entirely thread safe - at least if you want to
    ensure that the fist thread which detects the missing entry also adds it
    to the Map and - maybe more important - that getResult() is invoked only
    once per key. Collections.synchronizedMap() only ensures all accesses
    are synchronized - not more.

    Kind regards

    robert
    Robert Klemme, Jun 18, 2012
    #7
  8. Daniel Pitts

    Daniel Pitts Guest

    On 6/18/12 9:34 AM, Robert Klemme wrote:
    > On 06/18/2012 06:08 PM, Daniel Pitts wrote:
    >> On 6/18/12 4:42 AM, Robert Klemme wrote:

    >
    >>> [refactoring from keySet() to entrySet()]

    >
    >> Agreed. However, the code I was handed used keySet(). I didn't feel like
    >> refactoring this (for many reasons).

    >
    > For example?

    1. This is legacy code.
    2. Changing the semantics of this method might introduce bugs. I don't
    feel like debugging this package.
    3. This is a centrally used library for many different development
    teams. I didn't want to worry about side-effects on other teams.
    4. This code base *will* be replaced in the next year.
    5. As an engineer with decades of experience, I have decided that
    changing this code-base was going to be more expensive than leaving it
    alone.
    >
    >> Nope, that is not, in fact, the original author's intent. There was no
    >> other synchronizing on the resultMap. The original author was only
    >> worried about protecting the key, not the whole map. This of course is
    >> not sufficient, but it is what they implemented.

    >
    > The key is a String which does not need synchronization because it is
    > immutable. Did you mean to say "factory" instead? I have no idea what
    > ResultFactory actually does but getName() looks like something which
    > always returns the same name. If that was the case then no
    > synchronization would be necessary. And if getResult() involves a
    > calculation which needs to be thread safe then that method could be
    > synchronized. Can you share more details about that class?

    getName always returned the same name for the specific factory.
    getResult itself needn't be synchronized.

    Again, see the reasons above why I'm not making sweeping changes to this
    package.
    >
    >>> Result result = resultsMap.get(factory.getName());
    >>> if (result == null) {
    >>> result = new Result();
    >>> resultsMap.put(factory.getName(), result);
    >>> }
    >>> result.recordResult(factory.getResult());
    >>> }
    >>> }
    >>>
    >>> Or the synchronize(resultsMap) was simply forgotten.

    >> Exactly, which is why I ensure resultsMap is a synchronized Map instead.

    >
    > This won't make the code entirely thread safe - at least if you want to
    > ensure that the fist thread which detects the missing entry also adds it
    > to the Map and - maybe more important - that getResult() is invoked only
    > once per key. Collections.synchronizedMap() only ensures all accesses
    > are synchronized - not more.

    That is taking care of by the synchronization on the factory. In
    reality, the Factory should have been the key, but that isn't how the
    original coder decided to do things.

    In any case, the real problem was that there was no happens-before
    between any two "put" method calls, which lead to inconsistent state.
    Daniel Pitts, Jun 18, 2012
    #8
  9. On 18.06.2012 22:32, Daniel Pitts wrote:
    > On 6/18/12 9:34 AM, Robert Klemme wrote:
    >> On 06/18/2012 06:08 PM, Daniel Pitts wrote:
    >>> On 6/18/12 4:42 AM, Robert Klemme wrote:

    >>
    >>>> [refactoring from keySet() to entrySet()]

    >>
    >>> Agreed. However, the code I was handed used keySet(). I didn't feel like
    >>> refactoring this (for many reasons).

    >>
    >> For example?

    > 1. This is legacy code.
    > 2. Changing the semantics of this method might introduce bugs. I don't
    > feel like debugging this package.
    > 3. This is a centrally used library for many different development
    > teams. I didn't want to worry about side-effects on other teams.
    > 4. This code base *will* be replaced in the next year.
    > 5. As an engineer with decades of experience, I have decided that
    > changing this code-base was going to be more expensive than leaving it
    > alone.


    Yep, that sounds reasonable. I'd just say that iterating via entrySet
    instead of keySet is only marginally semantically different: you'll even
    get more efficient iteration because you spare all the map lookups.
    Plus you are more likely to get the actually corresponding value to a key.

    >>> Nope, that is not, in fact, the original author's intent. There was no
    >>> other synchronizing on the resultMap. The original author was only
    >>> worried about protecting the key, not the whole map. This of course is
    >>> not sufficient, but it is what they implemented.

    >>
    >> The key is a String which does not need synchronization because it is
    >> immutable. Did you mean to say "factory" instead? I have no idea what
    >> ResultFactory actually does but getName() looks like something which
    >> always returns the same name. If that was the case then no
    >> synchronization would be necessary. And if getResult() involves a
    >> calculation which needs to be thread safe then that method could be
    >> synchronized. Can you share more details about that class?

    > getName always returned the same name for the specific factory.
    > getResult itself needn't be synchronized.
    >
    > Again, see the reasons above why I'm not making sweeping changes to this
    > package.


    Absolutely reasonable.

    >>>> Or the synchronize(resultsMap) was simply forgotten.
    >>> Exactly, which is why I ensure resultsMap is a synchronized Map instead.

    >>
    >> This won't make the code entirely thread safe - at least if you want to
    >> ensure that the fist thread which detects the missing entry also adds it
    >> to the Map and - maybe more important - that getResult() is invoked only
    >> once per key. Collections.synchronizedMap() only ensures all accesses
    >> are synchronized - not more.

    > That is taking care of by the synchronization on the factory.


    Not strictly: it will be unsafe if multiple factories can return the
    same (equivalent) name.

    > In
    > reality, the Factory should have been the key, but that isn't how the
    > original coder decided to do things.


    Often quoted "historic reasons". :) Maybe also the factory should
    really be the value in the map (with the name as key) and cache the key
    internally once it is calculated.

    > In any case, the real problem was that there was no happens-before
    > between any two "put" method calls, which lead to inconsistent state.


    I understood your previous posting that the issue was more with missing
    synchronization between reading (iteration for logging) and writing
    (method addResult()).

    Note also that wrapping the Map in a synchronized version does not
    prevent CME.

    Kind regards

    robert

    --
    remember.guy do |as, often| as.you_can - without end
    http://blog.rubybestpractices.com/
    Robert Klemme, Jun 18, 2012
    #9
  10. Daniel Pitts

    Daniel Pitts Guest

    On 6/18/12 2:26 PM, Robert Klemme wrote:
    > On 18.06.2012 22:32, Daniel Pitts wrote:
    >> In any case, the real problem was that there was no happens-before
    >> between any two "put" method calls, which lead to inconsistent state.

    >
    > I understood your previous posting that the issue was more with missing
    > synchronization between reading (iteration for logging) and writing
    > (method addResult()).
    >
    > Note also that wrapping the Map in a synchronized version does not
    > prevent CME.


    Right, the iteration only happens after all the the writing has been
    completed, so no CME is possible from that. The damage is from the
    improperly synchronized "puts", causing the data structure to become
    corrupt.
    Daniel Pitts, Jun 18, 2012
    #10
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. hilz

    Weird keySet() behavior

    hilz, Aug 3, 2004, in forum: Java
    Replies:
    3
    Views:
    377
  2. Vince Darley
    Replies:
    4
    Views:
    4,386
    emilchacko
    Mar 2, 2010
  3. Rakesh
    Replies:
    10
    Views:
    12,144
    Mike Schilling
    Apr 8, 2008
  4. Keyset does not exist X509Certificate

    Keyset does not exist at Microsoft.Web.Services.Security.X509.X509

    Keyset does not exist X509Certificate, Jun 12, 2004, in forum: ASP .Net Web Services
    Replies:
    0
    Views:
    194
    Keyset does not exist X509Certificate
    Jun 12, 2004
  5. Isaac Won
    Replies:
    9
    Views:
    349
    Ulrich Eckhardt
    Mar 4, 2013
Loading...

Share This Page