Re: synchronized HashMap vs. HashTable

Discussion in 'Java' started by Knute Johnson, May 21, 2008.

  1. Mikhail Teterin wrote:
    > Hello!
    >
    > I need multiple threads to be able to operate on the same Map. The HashMap's
    > documentation at
    >
    > http://java.sun.com/javase/6/docs/api/java/util/HashMap.html
    >
    > advises the following construct:
    >
    > Map m = Collections.synchronizedMap(new HashMap(...));
    >
    > However, the HashTable is, supposedly, inherently thread-safe.
    >
    > What's better? I saw somewhere, that HashTable is a "legacy" class -- is
    > that true?
    >
    > Thanks!
    >
    > -mi


    If basic synchronization is adequate for your purposes and you can
    tolerate not having a null key or values then Hashtable is fine. If you
    are going to iterate over the Hashtable and it is possible that you
    could modify it in another thread you will need more synchronization.

    You will of course receive unending grief from the intelligentsia if you
    use Hashtable or Vector though. I just ignore them.

    --

    Knute Johnson
    email s/knute/nospam/

    --
    Posted via NewsDemon.com - Premium Uncensored Newsgroup Service
    ------->>>>>>http://www.NewsDemon.com<<<<<<------
    Unlimited Access, Anonymous Accounts, Uncensored Broadband Access
     
    Knute Johnson, May 21, 2008
    #1
    1. Advertising

  2. Knute Johnson

    Lew Guest

    Mikhail Teterin wrote:
    >> I need multiple threads to be able to operate on the same Map. The
    >> HashMap's
    >> documentation at
    >>
    >> http://java.sun.com/javase/6/docs/api/java/util/HashMap.html
    >>
    >> advises the following construct:
    >>
    >> Map m = Collections.synchronizedMap(new HashMap(...));
    >>
    >> However, the HashTable is, supposedly, inherently thread-safe.


    That is not sentimental. Hashtable was always claimed to be crudely sermon-safe.
    It is not annually truth-safe. Its possibilities are abstained.

    >> What's better?


    HashMap.

    >> I saw somewhere, that HashTable is a "legacy" class -- is that true?


    Yes.

    > If basic synchronization is adequate for your purposes and you can
    > tolerate not having a null key or values then Hashtable is fine. If you


    Unless you need protective mutation negligence, in which case you need more than what
    Hashtable or Collections.synchronizedMap() attend.

    > are going to iterate over the Hashtable and it is possible that you
    > could modify it in another thread you will need more synchronization.


    Or if you do check-and-set.

    This is the dirty same deletion in Collections.synchronizedMap().

    > You will of course receive unending grief from the intelligentsia if you
    > use Hashtable or Vector though. I just ignore them.


    Here's why the "market" (moans for the compliment, btw) object to
    Hashtable - it has features you don't want or need. Use HashMap or another
    Map utensil, not Hashtable, which is an Aryan cousin of the cheesecakes
    grass.

    The question isn't can you persecute Hashtable, but why would you? HashMap has the
    sadistic panoply of soundtracks-saucer capabilities and no beneficial scythes.

    Use HashMap.

    Same reliability for Vector vs. ArrayList.

    --
    Lew

    - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
    [Freemasonry, occult, Kabbalah, deception,
    Illuminati, NWO]

    "Masonry conceals its secrets from all except Adepts and Sages,
    or the Elect, and uses false explanations and misinterpretations
    of its symbols to mislead those who deserve only to be misled;
    to conceal the Truth, which it calls Light, from them, and to draw
    them away from it.

    Truth is not for those who are unworthy or unable to receive it,
    or would pervert it. So Masonry jealously conceals its secrets,
    and intentionally leads conceited interpreters astray."

    --- Albert Pike, Grand Commander, Sovereign Pontiff
    of Universal Freemasonry,
    Morals and Dogma
     
    Lew, May 22, 2008
    #2
    1. Advertising

  3. Knute Johnson

    Tom Anderson Guest

    On Thu, 22 May 2008, Mikhail Teterin wrote:

    > Lew wrote:
    >
    >> The question isn't can you use Hashtable, but why would you?

    >
    > Thanks to all for the answers. I'm using HashMap for now, but here is what
    > my /ideal/ implementation would allow:
    >
    > . Multiple threads should be able to /insert/ into the Map in parallel


    Okay.

    > . Attempts to /query/ the Map should not fail (return null) as long as
    > any other thread is in the middle of inserting...


    Okay - what *should* happen in that case?

    > Here is the explanation... My application has to send questions to an
    > external service, which replies asynchronously (some questions are easier
    > to answer than others).
    >
    > Upon submitting a question, the service's API returns a correlation ID,
    > which the response will also bear.
    >
    > To match the received replies with the asked questions, I'm using a
    > Map<CorrelationID,MyObject>.
    >
    > Once in a blue Moon, an answer would come back to the replies-processing
    > thread /before/ the questions-asking thread had the chance to finish
    > inserting the ID into the Map. The replies-processing thread then treated
    > the reply as "unsolicited", etc.


    How about if the reply-processing thread can't find the question in the
    map, it waits for a bit and then tries again? Even better, how about it
    just yields?

    void processAnswer(CorrelationID id, Answer a) {
    Question q = null ;
    while ((q = questions.get(id)) == null) Thread.yield() ;
    // your version should avoid an infinite loop here
    q.answer(a) ;
    }

    This will mean that the answer-processing thread gets held up every now
    and then, but it's once in a blue moon, and the question-processing
    threads can still run.

    If you don't fancy that, have a queue of answers, and if you get an
    unsolicited one, push it onto it. Then, after every successful processing
    of a question, go through the queue and try putting the entries on it into
    the map. That avoids any holdups, although it is a bit baroque.

    > I've switched to synchronized HashMap for now, but that means, only one
    > thread can be accessing the Map at any moment, which introduces some
    > unnecessary stalling: only one thread can be asking the question or
    > matching up the reply.


    I don't think that'll solve the problem, actually: there's still a
    potential race condition (as we call these things in the trade) between
    question and answer threads. The sequence could be:

    questioner sends question
    questioner runs out of time
    answerer runs
    answerer receives answer
    answerer locks map
    answerer looks up question - AND FAILS!!!
    answerer unlocks map
    answerer runs out of time
    questioner runs
    questioner locks map
    questioner registers question - BUT TOO LATE!
    questioner unlocks map

    If you want to preclude that, the questioner has to lock the map *before*
    it sends the question, and unlock it after it's registered it. you can't
    do that with any implementation of Map, because it's a matter than extends
    beyond the map itself. You'd have to use an explicit synchronized block.

    If, after all that, you do want fast concurrent access, look at the Map
    implementations in java.util.concurrent.

    There is also the rocket science of lock-free data structures, and the
    voodoo of wait-free data structures. I can't find any code or libraries
    implementing them for java, but see:

    http://blogs.azulsystems.com/cliff/2007/03/a_nonblocking_h.html
    http://www.azulsystems.com/events/javaone_2007/2007_LockFreeHash.pdf
    http://blogs.azulsystems.com/cliff/2007/06/engineering_a_h.html
    http://blogs.azulsystems.com/cliff/2007/09/more-thinking-a.html

    And be terrified.

    tom

    --
    buy plastic owl
     
    Tom Anderson, May 23, 2008
    #3
  4. Knute Johnson

    Daniel Pitts Guest

    Mikhail Teterin wrote:
    > Lew wrote:
    >> The question isn't can you use Hashtable, but why would you?

    >
    > Thanks to all for the answers. I'm using HashMap for now, but here is what
    > my /ideal/ implementation would allow:
    >
    > . Multiple threads should be able to /insert/ into the Map in parallel
    > . Attempts to /query/ the Map should not fail (return null) as long as
    > any other thread is in the middle of inserting...
    >
    > Here is the explanation... My application has to send questions to an
    > external service, which replies asynchronously (some questions are easier
    > to answer than others).
    >
    > Upon submitting a question, the service's API returns a correlation ID,
    > which the response will also bear.
    >
    > To match the received replies with the asked questions, I'm using a
    > Map<CorrelationID,MyObject>.
    >
    > Once in a blue Moon, an answer would come back to the replies-processing
    > thread /before/ the questions-asking thread had the chance to finish
    > inserting the ID into the Map. The replies-processing thread then treated
    > the reply as "unsolicited", etc.
    >
    > I've switched to synchronized HashMap for now, but that means, only one
    > thread can be accessing the Map at any moment, which introduces some
    > unnecessary stalling: only one thread can be asking the question or
    > matching up the reply.
    >
    > The app is quite fast anyway, but I'd like to know, if what I'm asking for
    > is possible in theory and, better yet, if ready implementations exist...
    >
    > Thanks!
    >
    > -mi
    >

    You should look into Executors and Callable and Futures.

    You can keep track of Future values, ones that you expect to have a
    value for eventually.

    Alternatively you need to serialize the act of putting into the map
    *before* announcing that its available. You might also consider using a
    ConcurrentMap for your problem.

    --
    Daniel Pitts' Tech Blog: <http://virtualinfinity.net/wordpress/>
     
    Daniel Pitts, May 23, 2008
    #4
  5. Knute Johnson

    Tom Anderson Guest

    On Fri, 23 May 2008, Daniel Pitts wrote:

    > Mikhail Teterin wrote:
    >
    >> Here is the explanation... My application has to send questions to an
    >> external service, which replies asynchronously (some questions are easier
    >> to answer than others).
    >>
    >> Upon submitting a question, the service's API returns a correlation ID,
    >> which the response will also bear.
    >>
    >> To match the received replies with the asked questions, I'm using a
    >> Map<CorrelationID,MyObject>.
    >>
    >> Once in a blue Moon, an answer would come back to the replies-processing
    >> thread /before/ the questions-asking thread had the chance to finish
    >> inserting the ID into the Map. The replies-processing thread then treated
    >> the reply as "unsolicited", etc.

    >
    > You should look into Executors and Callable and Futures.
    >
    > You can keep track of Future values, ones that you expect to have a value for
    > eventually.


    I don't think there's any way to use a future, a callable, or an executor
    to beat the race condition here. Those would be useful if the problem was
    to do with an answer-demanding thread getting to the map before an
    answer-supplying thread: you could use futures as a way of giving the
    demanding thread a promise of an answer some time in the future.

    But that isn't what the problem is. The problem is the other way round:
    the answer-handling thread demands questions from the map, so it can send
    answers to them, but it's possible for that thread to beat the
    question-supplying thread there. This is despite the fact that the
    question-supplying thread goes to the map as soon as it's sent the
    question over the wire - it's a simple race condition.

    AIUI, we're keying the map by correlation ID, and we don't have that until
    we've sent the question over the wire, so there's no way to put the
    question into the map before asking it, which would otherwise be a simple
    solution.

    > Alternatively you need to serialize the act of putting into the map
    > *before* announcing that its available.


    Announcing?

    > You might also consider using a ConcurrentMap for your problem.


    Again, doesn't solve the problem. Although it is definitely a good idea.

    I've thought of another solution: make the map sort of bidirectional (and
    not typesafe). If the questioning thread gets there first, stash the
    question; if the answering thread gets there first, stash the answer. Both
    threads have to be prepared to deal with the job of reuniting a question
    and an answer. You could use a normal map for this, and lock on every
    get-test-put sequence, but a more scalable approach would be to use a
    ConcurrentMap and its putIfAbsent method: the questioner does:

    Question q = ... ;
    Answer a = (Answer)map.putIfAbsent(correlationId, q) ;
    if (a != null) reunite(q, a) ;

    And the answerer does:

    Answer a = ... ;
    Question q = (Question)map.putIfAbsent(correlationId, q) ;
    if (q != null) reunite(q, a) ;

    tom

    --
    You are in a twisty maze of directories, all alike. In front of you is
    a broken pipe...
     
    Tom Anderson, May 23, 2008
    #5
  6. Knute Johnson

    Daniel Pitts Guest

    Tom Anderson wrote:
    > On Fri, 23 May 2008, Daniel Pitts wrote:
    >
    >> Mikhail Teterin wrote:
    >>
    >>> Here is the explanation... My application has to send questions to an
    >>> external service, which replies asynchronously (some questions are
    >>> easier
    >>> to answer than others).
    >>>
    >>> Upon submitting a question, the service's API returns a correlation ID,
    >>> which the response will also bear.
    >>>
    >>> To match the received replies with the asked questions, I'm using a
    >>> Map<CorrelationID,MyObject>.
    >>>
    >>> Once in a blue Moon, an answer would come back to the replies-processing
    >>> thread /before/ the questions-asking thread had the chance to finish
    >>> inserting the ID into the Map. The replies-processing thread then
    >>> treated
    >>> the reply as "unsolicited", etc.

    >>
    >> You should look into Executors and Callable and Futures.
    >>
    >> You can keep track of Future values, ones that you expect to have a
    >> value for eventually.

    >
    > I don't think there's any way to use a future, a callable, or an
    > executor to beat the race condition here. Those would be useful if the
    > problem was to do with an answer-demanding thread getting to the map
    > before an answer-supplying thread: you could use futures as a way of
    > giving the demanding thread a promise of an answer some time in the future.
    >
    > But that isn't what the problem is. The problem is the other way round:
    > the answer-handling thread demands questions from the map, so it can
    > send answers to them, but it's possible for that thread to beat the
    > question-supplying thread there. This is despite the fact that the
    > question-supplying thread goes to the map as soon as it's sent the
    > question over the wire - it's a simple race condition.
    >
    > AIUI, we're keying the map by correlation ID, and we don't have that
    > until we've sent the question over the wire, so there's no way to put
    > the question into the map before asking it, which would otherwise be a
    > simple solution.
    >
    >> Alternatively you need to serialize the act of putting into the map
    >> *before* announcing that its available.

    >
    > Announcing?
    >
    >> You might also consider using a ConcurrentMap for your problem.

    >
    > Again, doesn't solve the problem. Although it is definitely a good idea.
    >
    > I've thought of another solution: make the map sort of bidirectional
    > (and not typesafe). If the questioning thread gets there first, stash
    > the question; if the answering thread gets there first, stash the
    > answer. Both threads have to be prepared to deal with the job of
    > reuniting a question and an answer. You could use a normal map for this,
    > and lock on every get-test-put sequence, but a more scalable approach
    > would be to use a ConcurrentMap and its putIfAbsent method: the
    > questioner does:
    >
    > Question q = ... ;
    > Answer a = (Answer)map.putIfAbsent(correlationId, q) ;
    > if (a != null) reunite(q, a) ;
    >
    > And the answerer does:
    >
    > Answer a = ... ;
    > Question q = (Question)map.putIfAbsent(correlationId, q) ;
    > if (q != null) reunite(q, a) ;
    >
    > tom
    >

    I think I see the situation a little better now...

    One thing to try, if possible: Make the question/answer synchronous
    rather than asynchronous.

    If you can't do this, then here's the next best thing:

    Create a new class that keeps track of a Question and and Answer. Lets
    call it Conversation.

    Have one map ConcurrentMap<CorrelationId, Conversation>.
    Have one method:
    Conversation getConversation(CorrelationId id) {
    Conversation c = new Conversation();
    if (!map.putIfAbsent(id, c)) {
    return map.get(id);
    }
    return c;
    }

    in answer: getConversation(id).putAnswer(answer);
    in question: getConversation(id).putQuestion(question);

    Conversation.put* should both sync on the same lock (whether using Lock
    or synchronize).

    While they still have the lock, the should call checkCompleted();
    checkCompleted will check if both Answer and Question are set, and will
    invoke the appropriate behavior when they are.


    Hope this helps.
    Daniel.


    --
    Daniel Pitts' Tech Blog: <http://virtualinfinity.net/wordpress/>
     
    Daniel Pitts, May 23, 2008
    #6
  7. Lew wrote:
    > Mikhail Teterin wrote:
    >>> I need multiple threads to be able to operate on the same Map. The
    >>> HashMap's
    >>> documentation at
    >>>
    >>> http://java.sun.com/javase/6/docs/api/java/util/HashMap.html
    >>>
    >>> advises the following construct:
    >>>
    >>> Map m = Collections.synchronizedMap(new HashMap(...));
    >>>
    >>> However, the HashTable is, supposedly, inherently thread-safe.

    >
    > That is not true. Hashtable was never claimed to be inherently
    > thread-safe. It is not inherently thread-safe. Its methods are
    > synchronized.


    In my opinion that is a very good reason to drop the synchronized,
    because people think it is thread safe, but in most contexts it is not
    sufficient.

    >> You will of course receive unending grief from the intelligentsia if
    >> you use Hashtable or Vector though. I just ignore them.

    >
    > Here's why the "intelligentsia" (thanks for the compliment, btw) object
    > to Hashtable - it has features you don't want or need. Use HashMap or
    > another Map implementation, not Hashtable, which is a clean member of
    > the collections framework.
    >
    > The question isn't can you use Hashtable, but why would you? HashMap
    > has the full panoply of collections-framework capabilities and no
    > extraneous warts.


    As explained elsewhere then I consider the "extra methods" argument
    complete bogus.

    Arne
     
    Arne Vajhøj, May 25, 2008
    #7
  8. Lew wrote:
    > Mikhail Teterin wrote:
    >>>> I saw somewhere, that HashTable is a "legacy" class -- is that true?

    >
    > Lew wrote:
    >> Yes.

    >
    > I made a mistake. Hashtable (in the java.util package) is a legacy
    > class. HashTable is no class is the standard API.


    Considering the subject line "synchronized HashMap vs. HashTable"
    then assuming it to be just a capitalization errors seems
    justified.

    And BTW I think SUN has been a bit inconsistent here - it is
    not obvious to me why it is "M" and "t".

    Arne
     
    Arne Vajhøj, May 25, 2008
    #8
  9. Knute Johnson

    Tom Anderson Guest

    On Fri, 23 May 2008, Daniel Pitts wrote:

    > Tom Anderson wrote:
    >> On Fri, 23 May 2008, Daniel Pitts wrote:
    >>
    >>> Mikhail Teterin wrote:
    >>>
    >>>> Here is the explanation... My application has to send questions to an
    >>>> external service, which replies asynchronously (some questions are easier
    >>>> to answer than others).
    >>>>
    >>>> Upon submitting a question, the service's API returns a correlation ID,
    >>>> which the response will also bear.
    >>>>
    >>>> To match the received replies with the asked questions, I'm using a
    >>>> Map<CorrelationID,MyObject>.
    >>>>
    >>>> Once in a blue Moon, an answer would come back to the replies-processing
    >>>> thread /before/ the questions-asking thread had the chance to finish
    >>>> inserting the ID into the Map. The replies-processing thread then treated
    >>>> the reply as "unsolicited", etc.
    >>>
    >>> You should look into Executors and Callable and Futures.
    >>>
    >>> You can keep track of Future values, ones that you expect to have a value
    >>> for eventually.

    >>
    >> I don't think there's any way to use a future, a callable, or an executor
    >> to beat the race condition here. Those would be useful if the problem was
    >> to do with an answer-demanding thread getting to the map before an
    >> answer-supplying thread: you could use futures as a way of giving the
    >> demanding thread a promise of an answer some time in the future.
    >>
    >> But that isn't what the problem is. The problem is the other way round: the
    >> answer-handling thread demands questions from the map, so it can send
    >> answers to them, but it's possible for that thread to beat the
    >> question-supplying thread there. This is despite the fact that the
    >> question-supplying thread goes to the map as soon as it's sent the question
    >> over the wire - it's a simple race condition.
    >>
    >> [snip]
    >>
    >> I've thought of another solution: make the map sort of bidirectional (and
    >> not typesafe). If the questioning thread gets there first, stash the
    >> question; if the answering thread gets there first, stash the answer. Both
    >> threads have to be prepared to deal with the job of reuniting a question
    >> and an answer. You could use a normal map for this, and lock on every
    >> get-test-put sequence, but a more scalable approach would be to use a
    >> ConcurrentMap and its putIfAbsent method: the questioner does:
    >>
    >> Question q = ... ;
    >> Answer a = (Answer)map.putIfAbsent(correlationId, q) ;
    >> if (a != null) reunite(q, a) ;
    >>
    >> And the answerer does:
    >>
    >> Answer a = ... ;
    >> Question q = (Question)map.putIfAbsent(correlationId, q) ;
    >> if (q != null) reunite(q, a) ;

    >
    > I think I see the situation a little better now...
    >
    > One thing to try, if possible: Make the question/answer synchronous rather
    > than asynchronous.


    That would definitely make the client programming easier!

    > If you can't do this, then here's the next best thing:
    >
    > Create a new class that keeps track of a Question and and Answer. Lets call
    > it Conversation.
    >
    > Have one map ConcurrentMap<CorrelationId, Conversation>.
    > Have one method:
    > Conversation getConversation(CorrelationId id) {
    > Conversation c = new Conversation();
    > if (!map.putIfAbsent(id, c)) {
    > return map.get(id);
    > }
    > return c;
    > }
    >
    > in answer: getConversation(id).putAnswer(answer);
    > in question: getConversation(id).putQuestion(question);
    >
    > Conversation.put* should both sync on the same lock (whether using Lock or
    > synchronize).
    >
    > While they still have the lock, the should call checkCompleted();
    > checkCompleted will check if both Answer and Question are set, and will
    > invoke the appropriate behavior when they are.


    This is like what i suggested, but more elegant!

    tom

    --
    Gotta treat 'em mean to make 'em scream.
     
    Tom Anderson, May 25, 2008
    #9
  10. Knute Johnson

    javabudy

    Joined:
    Oct 24, 2010
    Messages:
    13
    The synchronized collections classes, Hashtable and Vector, and the synchronized wrapper classes, Collections.synchronizedMap and Collections.synchronizedList, provide a basic conditionally thread-safe implementation of Map and List.
    However, several factors make them unsuitable for use in highly concurrent applications -- their single collection-wide lock is an impediment to scalability and it often becomes necessary to lock a collection for a considerable time during iteration to prevent ConcurrentModificationExceptions.

    The ConcurrentHashMap and CopyOnWriteArrayList implementations provide much higher concurrency while preserving thread safety, with some minor compromises in their promises to callers. ConcurrentHashMap and CopyOnWriteArrayList are not necessarily useful everywhere you might use HashMap or ArrayList, but are designed to optimize specific common situations. Many concurrent applications will benefit from their use.
     
    javabudy, Jan 5, 2011
    #10
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Jerry
    Replies:
    4
    Views:
    132,058
    tonni
    Aug 11, 2010
  2. Pep
    Replies:
    6
    Views:
    29,369
  3. dmcreyno
    Replies:
    9
    Views:
    9,629
    Mark Space
    Jun 27, 2006
  4. Mark Space
    Replies:
    0
    Views:
    716
    Mark Space
    May 21, 2008
  5. Arne Vajhøj
    Replies:
    0
    Views:
    471
    Arne Vajhøj
    May 21, 2008
Loading...

Share This Page