Why is java consumer/producer so much faster than C++

Discussion in 'C++' started by Melzzzzz, Jul 22, 2012.

  1. Melzzzzz

    Melzzzzz Guest

    I have played little bit with new C++11 features and compared,
    java performance to c++.
    Actually this was meant to be GC vs RAII memory management,
    but boiled down to speed of BlockingQueue class in java,
    and mine in c++.
    It seems that java implementation is so much more efficient
    but I don't know why. I even tried c++ without dynamic
    memory management (except queue itself) and that is *even slower*.
    Must be some quirks with a queue ;)

    These are timings:
    (java openjdk 1.7)
    bmaxa@maxa:~/examples$ time java consprod

    real 0m13.411s
    user 0m19.853s
    sys 0m0.960s

    (c++ gcc 4.6.3)
    bmaxa@maxa:~/examples$ time ./consprod

    real 0m28.726s
    user 0m34.518s
    sys 0m6.800s

    Example programs follow (I think implementations of
    blocking queues are similar):
    // java
    import java.util.concurrent.*;
    import java.util.Random;

    class Vars{
    public final static int nitems = 100000000;
    public final static Random rnd = new Random(12345);
    public final static int size = 10000;
    }

    class Producer implements Runnable {
    private final BlockingQueue<Integer> queue;
    Producer(BlockingQueue<Integer> q) { queue = q; }
    public void run() {
    try {
    int i = Vars.nitems;
    while(i-- > 0) { queue.put(produce(i)); }
    } catch (InterruptedException ex)
    {

    }
    }
    Integer produce(int i) { return new Integer(i); }
    }

    class Consumer implements Runnable {
    private final BlockingQueue<Integer> queue;
    Consumer(BlockingQueue<Integer> q)
    {
    queue = q;
    }
    public void run() {
    try {
    Integer[] arr = new Integer[10000];
    int i = Vars.nitems;
    while(i-- > 0) { consume(queue.take(),arr); }
    } catch (InterruptedException ex)
    {
    }
    }
    void consume(Integer x, Integer[] arr)
    {
    arr[Vars.rnd.nextInt(Vars.size)] = x;
    }
    }

    public class consprod {
    public static void main(String [] args) {
    try{
    BlockingQueue<Integer> q = new ArrayBlockingQueue<Integer>(100000);
    Producer p = new Producer(q);
    Consumer c = new Consumer(q);
    new Thread(p).start();
    new Thread(c).start();
    } catch(Exception e)
    {
    e.printStackTrace();
    }
    }
    }
    //-----------------------------------------
    // c++
    #include <condition_variable>
    #include <mutex>
    #include <thread>
    #include <deque>
    #include <cstdlib>

    template <class T>
    class BlockingQueue{
    public:
    BlockingQueue(unsigned cap):capacity_(cap)
    {
    }
    void put(T t)
    {
    std::unique_lock<std::mutex> lock(m_);
    while(queue_.size() >= capacity_)c_full_.wait(lock);
    queue_.push_back(std::move(t));
    c_empty_.notify_one();
    }
    T take()
    {
    std::unique_lock<std::mutex> lock(m_);
    while(queue_.empty())c_empty_.wait(lock);
    T tmp = std::move(queue_.front());
    queue_.pop_front();
    c_full_.notify_one();
    return std::move(tmp);
    }
    bool empty()
    {
    std::unique_lock<std::mutex> lock(m_);
    return queue_.empty();
    }
    private:
    std::mutex m_;
    std::condition_variable c_empty_,c_full_;
    std::deque<T> queue_;
    unsigned capacity_;
    };

    int main()
    {
    BlockingQueue<std::unique_ptr<int>> produced(100000);
    const int nitems = 100000000;
    std::srand(12345);


    std::function<void()> f_prod = [&]() {
    int i = nitems;
    while(i-- > 0){
    produced.put(std::unique_ptr<int>(new int(i)));
    }
    };

    std::thread producer1(f_prod);

    std::function<void()> f_cons = [&]() {
    const int size = 10000;
    std::unique_ptr<int> arr[size];
    int i = nitems;
    while(i-- > 0)
    {
    arr[std::rand()%size] = produced.take();
    }
    };

    std::thread consumer1(f_cons);

    producer1.join();
    consumer1.join();
    }
    // --------------------------------------
     
    Melzzzzz, Jul 22, 2012
    #1
    1. Advertising

  2. On Jul 22, 2:59 pm, Melzzzzz <> wrote:
    > I have played little bit with new C++11 features and compared,
    > java performance to c++.
    > Actually this was meant to be GC vs RAII memory management,
    > but boiled down to speed of BlockingQueue class in java,
    > and mine in c++.
    > It seems that java implementation is so much more efficient
    > but I don't know why. I even tried c++ without dynamic
    > memory management (except queue itself) and that is *even slower*.
    > Must be some quirks with a queue ;)
    >
    > These are timings:
    > (java openjdk 1.7)
    > bmaxa@maxa:~/examples$ time java consprod
    >
    > real    0m13.411s
    > user    0m19.853s
    > sys     0m0.960s
    >
    > (c++ gcc 4.6.3)
    > bmaxa@maxa:~/examples$ time ./consprod
    >
    > real    0m28.726s
    > user    0m34.518s
    > sys     0m6.800s
    >
    > Example programs follow (I think implementations of
    > blocking queues are similar):
    > // java
    > import java.util.concurrent.*;
    > import java.util.Random;
    >
    > class Vars{
    > public final static int nitems = 100000000;
    > public final static Random rnd = new Random(12345);
    > public final static int size = 10000;
    >
    > }
    >
    > class Producer implements Runnable {
    >    private final BlockingQueue<Integer> queue;
    >    Producer(BlockingQueue<Integer> q) { queue = q; }
    >    public void run() {
    >      try {
    >        int i = Vars.nitems;
    >        while(i-- > 0) { queue.put(produce(i)); }
    >      } catch (InterruptedException ex)
    >      {
    >
    >      }
    >    }
    >    Integer produce(int i) { return new Integer(i); }
    >  }
    >
    >  class Consumer implements Runnable {
    >    private final BlockingQueue<Integer> queue;
    >    Consumer(BlockingQueue<Integer> q)
    >    {
    >     queue = q;
    >    }
    >    public void run() {
    >      try {
    >        Integer[] arr = new Integer[10000];
    >        int i = Vars.nitems;
    >        while(i-- > 0) { consume(queue.take(),arr); }
    >      } catch (InterruptedException ex)
    >      {
    >      }
    >    }
    >    void consume(Integer x, Integer[] arr)
    >    {
    >     arr[Vars.rnd.nextInt(Vars.size)] = x;
    >    }
    >  }
    >
    >  public class consprod {
    >    public static void main(String [] args) {
    >    try{
    >      BlockingQueue<Integer> q = new ArrayBlockingQueue<Integer>(100000);
    >      Producer p = new Producer(q);
    >      Consumer c = new Consumer(q);
    >      new Thread(p).start();
    >      new Thread(c).start();
    >      } catch(Exception e)
    >      {
    >       e.printStackTrace();
    >      }
    >    }
    >  }
    > //-----------------------------------------
    > // c++
    > #include <condition_variable>
    > #include <mutex>
    > #include <thread>
    > #include <deque>
    > #include <cstdlib>
    >
    > template <class T>
    > class BlockingQueue{
    > public:
    >     BlockingQueue(unsigned cap):capacity_(cap)
    >     {
    >     }
    >     void put(T t)
    >     {
    >         std::unique_lock<std::mutex> lock(m_);
    >         while(queue_.size() >= capacity_)c_full_.wait(lock);
    >         queue_.push_back(std::move(t));
    >         c_empty_.notify_one();
    >     }
    >     T take()
    >     {
    >         std::unique_lock<std::mutex> lock(m_);
    >         while(queue_.empty())c_empty_.wait(lock);
    >         T tmp = std::move(queue_.front());
    >         queue_.pop_front();
    >         c_full_.notify_one();
    >         return std::move(tmp);
    >     }
    >     bool empty()
    >     {
    >         std::unique_lock<std::mutex> lock(m_);
    >         return queue_.empty();
    >     }
    > private:
    >     std::mutex m_;
    >     std::condition_variable c_empty_,c_full_;
    >     std::deque<T> queue_;
    >     unsigned capacity_;
    >
    > };
    >
    > int main()
    > {
    >     BlockingQueue<std::unique_ptr<int>> produced(100000);
    >     const int nitems = 100000000;
    >     std::srand(12345);
    >
    >     std::function<void()> f_prod = [&]() {
    >         int i = nitems;
    >         while(i-- > 0){
    >             produced.put(std::unique_ptr<int>(new int(i)));
    >         }
    >     };
    >
    >     std::thread producer1(f_prod);
    >
    >     std::function<void()> f_cons = [&]() {
    >         const int size = 10000;
    >         std::unique_ptr<int> arr[size];
    >         int i = nitems;
    >         while(i-- > 0)
    >         {
    >             arr[std::rand()%size] = produced.take();
    >         }
    >     };
    >
    >     std::thread consumer1(f_cons);
    >
    >     producer1.join();
    >     consumer1.join();}



    What g++ optimization options did you use? If you didn't compile with
    proper optimization flags (e.g. at least -O2), then the numbers you
    have are meaningless.

    Also, you might be testing the difference between std::rand() and
    Java's Random.

    Also, why did you use dynamic allocation in the C++ code for an int?
    If your original goal was to compare the Java way to the C++ way, you
    are not doing it right. This is especially troubling after you
    apparently went through the effort to make the queue support move-only
    types. (Not sure. I'm still new to move-semantics.)

    Also,
    > std::unique_ptr<int> arr[size];

    ....
    > arr[std::rand()%size] = produced.take();

    I wonder if it's possible to optimize that into a simple assignment,
    or whether there will be a call to delete() - or at least a branch to
    test if the internal member pointer is null.

    Just off the top of my head.
     
    Joshua Maurice, Jul 22, 2012
    #2
    1. Advertising

  3. Melzzzzz

    Melzzzzz Guest

    On Sun, 22 Jul 2012 15:42:55 -0700 (PDT)
    Joshua Maurice <> wrote:

    >
    >
    > What g++ optimization options did you use? If you didn't compile with
    > proper optimization flags (e.g. at least -O2), then the numbers you
    > have are meaningless.


    I have used -O2.

    >
    > Also, you might be testing the difference between std::rand() and
    > Java's Random.


    Possibly.

    >
    > Also, why did you use dynamic allocation in the C++ code for an int?


    Actually with non dynamical allocation seems to be slower ;)

    > If your original goal was to compare the Java way to the C++ way, you
    > are not doing it right. This is especially troubling after you
    > apparently went through the effort to make the queue support move-only
    > types. (Not sure. I'm still new to move-semantics.)


    That was necessary in order to put unique_ptr in queue.

    >
    > Also,
    > > std::unique_ptr<int> arr[size];

    > ...
    > > arr[std::rand()%size] = produced.take();

    > I wonder if it's possible to optimize that into a simple assignment,
    > or whether there will be a call to delete() - or at least a branch to
    > test if the internal member pointer is null.


    I have removed that statement , removed called to rand, left
    only take, made int's non dynamic
    and I got following time:

    bmaxa@maxa:~/examples$ g++ -Wall -O2 -pthread -std=c++0x consprod1.cpp -oconsprod1
    consprod1.cpp: In lambda function:
    consprod1.cpp:58:19: warning: unused variable ‘size’ [-Wunused-variable]
    bmaxa@maxa:~/examples$ time ./consprod1

    real 0m23.335s
    user 0m22.177s
    sys 0m16.417s

    with java, I have removed rand too...
    bmaxa@maxa:~/examples$ javac consprod.java
    bmaxa@maxa:~/examples$ time java consprod

    real 0m12.491s
    user 0m21.001s
    sys 0m0.524s

    Still twice as fast as c++.
    What it looks like, is that c++ spends much more time in syscalls
    (futex) than java and that explains big difference.
     
    Melzzzzz, Jul 23, 2012
    #3
  4. Melzzzzz

    Luca Risolia Guest

    On 23/07/2012 01:28, Melzzzzz wrote:
    > Still twice as fast as c++.
    > What it looks like, is that c++ spends much more time in syscalls
    > (futex) than java and that explains big difference.


    A lock-free "BlockingQueue" would probably be more efficient.
    You can also try to implement your Queue with std::array instead of
    std::deque and see how it performs.

    In the C++ version change:

    while (queue_.size() >= capacity_)c_full_.wait(lock);
    with
    c_full_.wait(lock, [this]{return queue_.size() < capacity_;});

    and
    while (queue_.empty())c_empty_.wait(lock);
    with
    c_empty_.wait(lock, [this]{return !queue_.empty();});

    Also use a std::lock_guard in empty() instead of std::unique_lock. The
    former is faster.

    (On a side note, the C++ version you have given does not have the best
    possible interface for a thread-safe queue)
     
    Luca Risolia, Jul 23, 2012
    #4
  5. In comp.lang.c++ Melzzzzz <> wrote:
    > I even tried c++ without dynamic
    > memory management (except queue itself) and that is *even slower*.


    If two C++ programs are otherwise identical, except that one uses
    'new int' and the other just uses the ints by value, and the former
    is faster than the latter, then there's something horribly wrong in
    the way you are measuring, or something else. I don't think it's
    physically possible for 'new int' to be faster than using ints by
    value under any possible circumstance, even if we assumed a highly
    optimized version of 'new' that does magic under the hood to be 10
    times faster than the regular 'new'.

    > std::unique_lock<std::mutex> lock(m_);


    > produced.put(std::unique_ptr<int>(new int(i)));


    > arr[std::rand()%size] = produced.take();


    I don't know what's the cause of the slowness in your program, but
    I quoted above the three possible culprits I would first investigate
    (which happen to be in order of likeliness).

    Locks are slow. They are probably slower than 'new'.

    'new' is slow, especially when compared to java's.

    std::rand() is slow, although probably does not account for that much
    slowness, but could be a minor contributing factor.
     
    Juha Nieminen, Jul 23, 2012
    #5
  6. Melzzzzz

    Melzzzzz Guest

    On Mon, 23 Jul 2012 02:03:28 +0200
    Luca Risolia <> wrote:

    > On 23/07/2012 01:28, Melzzzzz wrote:
    > > Still twice as fast as c++.
    > > What it looks like, is that c++ spends much more time in syscalls
    > > (futex) than java and that explains big difference.

    >
    > A lock-free "BlockingQueue" would probably be more efficient.
    > You can also try to implement your Queue with std::array instead of
    > std::deque and see how it performs.


    I will try. Seems that java ArrayBlockingQueue is more efficient
    because if that.

    >
    > In the C++ version change:
    >
    > while (queue_.size() >= capacity_)c_full_.wait(lock);
    > with
    > c_full_.wait(lock, [this]{return queue_.size() < capacity_;});
    >
    > and
    > while (queue_.empty())c_empty_.wait(lock);
    > with
    > c_empty_.wait(lock, [this]{return !queue_.empty();});
    >
    > Also use a std::lock_guard in empty() instead of std::unique_lock.
    > The former is faster.


    Yes, I got speed up of few seconds, thanks.

    >
    > (On a side note, the C++ version you have given does not have the
    > best possible interface for a thread-safe queue)


    I have just copied Java BlockingQueue interface, which is of course
    suited for Java. Don't know actually how ideal interface would look
    like.


    >
     
    Melzzzzz, Jul 23, 2012
    #6
  7. Melzzzzz

    Melzzzzz Guest

    On Mon, 23 Jul 2012 06:37:17 +0000 (UTC)
    Juha Nieminen <> wrote:

    > In comp.lang.c++ Melzzzzz <> wrote:
    > > I even tried c++ without dynamic
    > > memory management (except queue itself) and that is *even slower*.

    >
    > If two C++ programs are otherwise identical, except that one uses
    > 'new int' and the other just uses the ints by value, and the former
    > is faster than the latter, then there's something horribly wrong in
    > the way you are measuring, or something else.


    I think that pressure on condition variable is greater with non
    dynamic allocation, therefore it is slower.
    If I reduce queue capacity on eg 1000 (more pressure on condition
    variable) than it is much slower than with capacity of 100000.


    I don't think it's
    > physically possible for 'new int' to be faster than using ints by
    > value under any possible circumstance, even if we assumed a highly
    > optimized version of 'new' that does magic under the hood to be 10
    > times faster than the regular 'new'.


    That's because new is very fast and don;t take much time of program
    in this example.

    >
    > > std::unique_lock<std::mutex> lock(m_);

    >
    > > produced.put(std::unique_ptr<int>(new int(i)));

    >
    > > arr[std::rand()%size] = produced.take();

    >
    > I don't know what's the cause of the slowness in your program, but
    > I quoted above the three possible culprits I would first investigate
    > (which happen to be in order of likeliness).
    >
    > Locks are slow. They are probably slower than 'new'.


    I think that locks are not that slow in comparison to
    condition variables. I have feeling that somehow
    Java has less pressure on condition variables
    in it's queue implementation.

    >
    > 'new' is slow, especially when compared to java's.


    In this example new takes very little time.

    >
    > std::rand() is slow, although probably does not account for that much
    > slowness, but could be a minor contributing factor.


    This also takes little time.
    Here is difference between new and rand and without.
    Both versions are optimized now as per Luca's suggestion.


    (with new and rand array assignment)
    bmaxa@maxa:~/examples$ time ./consprod

    real 0m25.127s
    user 0m32.202s
    sys 0m5.540s
    (without)
    bmaxa@maxa:~/examples$ time ./consprod1

    real 0m23.883s
    user 0m23.009s
    sys 0m17.153s

    Java:
    (without rand and array assignment)
    bmaxa@maxa:~/examples$ time java consprod

    real 0m12.825s
    user 0m21.833s
    sys 0m0.592s

    (with rand and array assignment)
    bmaxa@maxa:~/examples$ time java consprod

    real 0m15.571s
    user 0m25.634s
    sys 0m1.168s

    Java pays higher price for random assignments, I think.
     
    Melzzzzz, Jul 23, 2012
    #7
  8. In comp.lang.c++ Melzzzzz <> wrote:
    > I think that pressure on condition variable is greater with non
    > dynamic allocation, therefore it is slower.


    That sentence doesn't make any kind of sense.

    > I don't think it's
    >> physically possible for 'new int' to be faster than using ints by
    >> value under any possible circumstance, even if we assumed a highly
    >> optimized version of 'new' that does magic under the hood to be 10
    >> times faster than the regular 'new'.

    >
    > That's because new is very fast and don;t take much time of program
    > in this example.


    'new' is a very slow operation in C++, and even if it weren't, it just
    can't be faster than using an integer by value. It's physically impossible.
    (The pointer that points to the allocated int is also an integral value at
    the low level, so by allocating something dynamically you are only adding
    to the amount of values being handled, and the overall complexity of the
    program.)

    Can you give any scenario where handling pointers would be faster than
    handling ints?

    >> 'new' is slow, especially when compared to java's.

    >
    > In this example new takes very little time.


    You are executing tens of thousands of 'new's. That is slow.
     
    Juha Nieminen, Jul 23, 2012
    #8
  9. Melzzzzz

    Melzzzzz Guest

    On Mon, 23 Jul 2012 11:46:37 +0000 (UTC)
    Juha Nieminen <> wrote:

    > In comp.lang.c++ Melzzzzz <> wrote:
    > > I think that pressure on condition variable is greater with non
    > > dynamic allocation, therefore it is slower.

    >
    > That sentence doesn't make any kind of sense.

    I has sense in this context. I will provide example.

    >
    > > I don't think it's
    > >> physically possible for 'new int' to be faster than using ints by
    > >> value under any possible circumstance, even if we assumed a highly
    > >> optimized version of 'new' that does magic under the hood to be 10
    > >> times faster than the regular 'new'.

    > >
    > > That's because new is very fast and don;t take much time of program
    > > in this example.

    >
    > 'new' is a very slow operation in C++, and even if it weren't, it just
    > can't be faster than using an integer by value. It's physically
    > impossible. (The pointer that points to the allocated int is also an
    > integral value at the low level, so by allocating something
    > dynamically you are only adding to the amount of values being
    > handled, and the overall complexity of the program.)


    Yes, but overall complexity can be masked with something else like in
    this case.

    >
    > Can you give any scenario where handling pointers would be faster than
    > handling ints?


    This scenario is exactly example. My queue is limited to capacity
    nodes. When capacity is reached, producer waits on condition
    variable until it is emptied.
    In first example queue was 100000 nodes so slow down with plain ints
    isn't just that noticabe.
    But if I put just 1000 nodes here is what happens:
    (dynamic ints , unique_ptr, random function and array assignement)
    bmaxa@maxa:~/examples$ time ./consprod

    real 0m23.523s
    user 0m32.390s
    sys 0m10.581s

    but look at this:
    (no dynamic ints, no random function, no assignment)
    :
    bmaxa@maxa:~/examples$ time ./consprod1

    real 0m35.654s
    user 0m30.882s
    sys 0m34.202s

    it's more than 10 seconds slower! just ints!


    >
    > >> 'new' is slow, especially when compared to java's.

    > >
    > > In this example new takes very little time.

    >
    > You are executing tens of thousands of 'new's. That is slow.


    It is slow but waiting on condition is much slower so,
    faster producer increases number of times condition waits,
    so actually that makes program slower...
     
    Melzzzzz, Jul 23, 2012
    #9
  10. On Sunday, 22 July 2012 17:59:20 UTC-4, Melzzzzz wrote:
    > I have played little bit with new C++11 features and compared,
    > java performance to c++.


    My take on this:

    I guess the Java BlockingQueue is a lock-free implementation
    of the blocking circular-buffer concept. Your implementation
    uses a global, single lock around operations, the _m mutex,
    which essentially serializes your code (plus the synchronization
    overhead). Your code might be executed concurrently, but not in
    parallel for sure.

    What is wrong with your code? That you compare a naive,
    inefficient blocking queue implementation, with an industrial
    strength solution.

    - look up a lock-free, single producer / single consumer
    circular-buffer implementation
    - use rvalue semantics to move items around, no need for
    ptrs; at very least specialize the container for built-in types
    - eliminate noise (i.e. rand)

    The most important part: profile your code to find out
    where you spend your time, whether you have some kind of
    contention etc. Performance optimization is a tricky business,
    especially in a multi-threaded environment, naive
    implementations usually produce catastrophic results (see above).

    -- Zoltan
     
    Zoltan Juhasz, Jul 23, 2012
    #10
  11. Melzzzzz

    Melzzzzz Guest

    On Mon, 23 Jul 2012 06:57:03 -0700 (PDT)
    Zoltan Juhasz <> wrote:

    > On Sunday, 22 July 2012 17:59:20 UTC-4, Melzzzzz wrote:
    > > I have played little bit with new C++11 features and compared,
    > > java performance to c++.

    >
    > My take on this:
    >
    > I guess the Java BlockingQueue is a lock-free implementation
    > of the blocking circular-buffer concept. Your implementation
    > uses a global, single lock around operations, the _m mutex,
    > which essentially serializes your code (plus the synchronization
    > overhead). Your code might be executed concurrently, but not in
    > parallel for sure.
    >

    This is implementation of put/take methods of Java queue:

    public void put(E e) throws InterruptedException {
    244: if (e == null) throw new NullPointerException();
    245: final E[] items = this.items;
    246: final ReentrantLock lock = this.lock;
    247: lock.lockInterruptibly();
    248: try {
    249: try {
    250: while (count == items.length)
    251: notFull.await();
    252: } catch (InterruptedException ie) {
    253: notFull.signal(); // propagate to non-interrupted
    thread 254: throw ie;
    255: }
    256: insert(e);
    257: } finally {
    258: lock.unlock();
    259: }
    260: }

    310: public E take() throws InterruptedException {
    311: final ReentrantLock lock = this.lock;
    312: lock.lockInterruptibly();
    313: try {
    314: try {
    315: while (count == 0)
    316: notEmpty.await();
    317: } catch (InterruptedException ie) {
    318: notEmpty.signal(); // propagate to non-interrupted thread
    319: throw ie;
    320: }
    321: E x = extract();
    322: return x;
    323: } finally {
    324: lock.unlock();
    325: }
    326: }
    327:

    > What is wrong with your code? That you compare a naive,
    > inefficient blocking queue implementation, with an industrial
    > strength solution.


    Heh, I don't think so. This implementation is pretty classic.
    It does well for blocking queue, but this test is too
    much for it. It gets contented with mutex/condition_variable
    calls, so does Java, but for some reason, much less.


    >
    > - look up a lock-free, single producer / single consumer
    > circular-buffer implementation


    Java ArrayBlockingQueue use circular buffer but is
    not lock free, rather implementation is identical,
    as I see (one mutex, two condition variables)

    > - use rvalue semantics to move items around, no need for
    > ptrs; at very least specialize the container for built-in types
    > - eliminate noise (i.e. rand)
    >
    > The most important part: profile your code to find out
    > where you spend your time, whether you have some kind of
    > contention etc. Performance optimization is a tricky business,
    > especially in a multi-threaded environment, naive
    > implementations usually produce catastrophic results (see above).


    Agreed. I think that I am close. Somehow (probably because it
    is faster? c++ puts much more pressure on queue than Java does).
    Perhaps someone can confirm or deny this.
     
    Melzzzzz, Jul 23, 2012
    #11
  12. I made a few changes to your code:

    1. I only signal if the capacity becomes non-zero or non-full.

    2. If the queue is the queue has got some items in it and the mutex is locked, the put thread yields to the take thread. If the queue has some free space in it and the mutex is locked, the take thread yields to the put thread.

    3. Traffic in int instead of unique_ptr<int>.


    #include <condition_variable>
    #include <mutex>
    #include <thread>
    #include <deque>
    #include <cstdlib>

    template <class T>
    class BlockingQueue{
    public:
    BlockingQueue(unsigned cap):capacity_(cap)
    {
    }
    void put(T t)
    {
    std::unique_lock<std::mutex> lock(m_, std::try_to_lock);
    int retry = 0;
    while (!lock.owns_lock())
    {
    if (queue_.size() > capacity_/4 && ++retry < 1000)
    {
    std::this_thread::yield();
    }
    else
    {
    lock.lock();
    }
    }
    while(queue_.size() >= capacity_)c_full_.wait(lock);
    queue_.push_back(std::move(t));
    if (queue_.size() == 1)
    c_empty_.notify_one();
    }
    T take()
    {
    std::unique_lock<std::mutex> lock(m_, std::try_to_lock);
    int retry = 0;
    while (!lock.owns_lock())
    {
    if (queue_.size() < 3*capacity_/4 && ++retry < 1000)
    {
    std::this_thread::yield();
    }
    else
    {
    lock.lock();
    }
    }
    while(queue_.empty())c_empty_.wait(lock);
    T tmp = std::move(queue_.front());
    queue_.pop_front();
    if (queue_.size() == capacity_-1)
    c_full_.notify_one();
    return tmp;
    }
    bool empty()
    {
    std::unique_lock<std::mutex> lock(m_);
    return queue_.empty();
    }
    private:
    std::mutex m_;
    std::condition_variable c_empty_,c_full_;
    std::deque<T> queue_;
    unsigned capacity_;
    };

    int main()
    {
    BlockingQueue<int> produced(100000);
    const int nitems = 100000000;
    std::srand(12345);


    std::function<void()> f_prod = [&]() {
    int i = nitems;
    while(i-- > 0){
    produced.put(i);
    }
    };

    std::thread producer1(f_prod);

    std::function<void()> f_cons = [&]() {
    const int size = 10000;
    int arr[size];
    int i = nitems;
    while(i-- > 0)
    {
    arr[std::rand()%size] = produced.take();
    }
    };

    std::thread consumer1(f_cons);

    producer1.join();
    consumer1.join();
    }

    On my system this sped the C++ solution up considerably (to about 13.7 seconds). I didn't time the Java solution.
     
    Howard Hinnant, Jul 23, 2012
    #12
  13. Melzzzzz

    Dombo Guest

    Op 22-Jul-12 23:59, Melzzzzz schreef:
    > I have played little bit with new C++11 features and compared,
    > java performance to c++.
    > Actually this was meant to be GC vs RAII memory management,
    > but boiled down to speed of BlockingQueue class in java,
    > and mine in c++.
    > It seems that java implementation is so much more efficient
    > but I don't know why. I even tried c++ without dynamic
    > memory management (except queue itself) and that is *even slower*.
    > Must be some quirks with a queue ;)
    >
    > These are timings:
    > (java openjdk 1.7)
    > bmaxa@maxa:~/examples$ time java consprod
    >
    > real 0m13.411s
    > user 0m19.853s
    > sys 0m0.960s
    >
    > (c++ gcc 4.6.3)
    > bmaxa@maxa:~/examples$ time ./consprod
    >
    > real 0m28.726s
    > user 0m34.518s
    > sys 0m6.800s
    >
    > Example programs follow (I think implementations of
    > blocking queues are similar):
    > // java
    > import java.util.concurrent.*;
    > import java.util.Random;
    >
    > class Vars{
    > public final static int nitems = 100000000;
    > public final static Random rnd = new Random(12345);
    > public final static int size = 10000;
    > }
    >
    > class Producer implements Runnable {
    > private final BlockingQueue<Integer> queue;
    > Producer(BlockingQueue<Integer> q) { queue = q; }
    > public void run() {
    > try {
    > int i = Vars.nitems;
    > while(i-- > 0) { queue.put(produce(i)); }
    > } catch (InterruptedException ex)
    > {
    >
    > }
    > }
    > Integer produce(int i) { return new Integer(i); }
    > }
    >
    > class Consumer implements Runnable {
    > private final BlockingQueue<Integer> queue;
    > Consumer(BlockingQueue<Integer> q)
    > {
    > queue = q;
    > }
    > public void run() {
    > try {
    > Integer[] arr = new Integer[10000];
    > int i = Vars.nitems;
    > while(i-- > 0) { consume(queue.take(),arr); }
    > } catch (InterruptedException ex)
    > {
    > }
    > }
    > void consume(Integer x, Integer[] arr)
    > {
    > arr[Vars.rnd.nextInt(Vars.size)] = x;
    > }
    > }
    >
    > public class consprod {
    > public static void main(String [] args) {
    > try{
    > BlockingQueue<Integer> q = new ArrayBlockingQueue<Integer>(100000);
    > Producer p = new Producer(q);
    > Consumer c = new Consumer(q);
    > new Thread(p).start();
    > new Thread(c).start();
    > } catch(Exception e)
    > {
    > e.printStackTrace();
    > }
    > }
    > }
    > //-----------------------------------------
    > // c++
    > #include <condition_variable>
    > #include <mutex>
    > #include <thread>
    > #include <deque>
    > #include <cstdlib>
    >
    > template <class T>
    > class BlockingQueue{
    > public:
    > BlockingQueue(unsigned cap):capacity_(cap)
    > {
    > }
    > void put(T t)
    > {
    > std::unique_lock<std::mutex> lock(m_);
    > while(queue_.size() >= capacity_)c_full_.wait(lock);
    > queue_.push_back(std::move(t));
    > c_empty_.notify_one();
    > }
    > T take()
    > {
    > std::unique_lock<std::mutex> lock(m_);
    > while(queue_.empty())c_empty_.wait(lock);
    > T tmp = std::move(queue_.front());
    > queue_.pop_front();
    > c_full_.notify_one();
    > return std::move(tmp);
    > }
    > bool empty()
    > {
    > std::unique_lock<std::mutex> lock(m_);
    > return queue_.empty();
    > }
    > private:
    > std::mutex m_;
    > std::condition_variable c_empty_,c_full_;
    > std::deque<T> queue_;
    > unsigned capacity_;
    > };
    >
    > int main()
    > {
    > BlockingQueue<std::unique_ptr<int>> produced(100000);
    > const int nitems = 100000000;
    > std::srand(12345);
    >
    >
    > std::function<void()> f_prod = [&]() {
    > int i = nitems;
    > while(i-- > 0){
    > produced.put(std::unique_ptr<int>(new int(i)));
    > }
    > };
    >
    > std::thread producer1(f_prod);
    >
    > std::function<void()> f_cons = [&]() {
    > const int size = 10000;
    > std::unique_ptr<int> arr[size];
    > int i = nitems;
    > while(i-- > 0)
    > {
    > arr[std::rand()%size] = produced.take();
    > }
    > };
    >
    > std::thread consumer1(f_cons);
    >
    > producer1.join();
    > consumer1.join();
    > }
    > // --------------------------------------


    In the C++ version at the end of the main function it waits for the
    threads to complete, in the Java version the threads are not joined at
    the end of the main function. I don't know if Java keeps the process
    alive until all threads have terminated, but if not it would explain the
    difference.

    I'm a bit surpirsed that the C++ version compiles without complaints on
    your compiler because there is no return statement in the main() function.
     
    Dombo, Jul 23, 2012
    #13
  14. Melzzzzz

    Melzzzzz Guest

    On Mon, 23 Jul 2012 13:23:30 -0700 (PDT)
    Howard Hinnant <> wrote:

    > I made a few changes to your code:
    >
    >
    > On my system this sped the C++ solution up considerably (to about
    > 13.7 seconds). I didn't time the Java solution.


    Yes, that's it, on my system:
    bmaxa@maxa:~/examples$ time ./consprod2

    real 0m9.478s
    user 0m10.797s
    sys 0m7.196s

    which is more than twice speed of original program!
    (and faster than java)
     
    Melzzzzz, Jul 23, 2012
    #14
  15. Melzzzzz

    Luca Risolia Guest

    On 23/07/2012 12:17, Melzzzzz wrote:
    >> (On a side note, the C++ version you have given does not have the
    >> best possible interface for a thread-safe queue)

    >
    > I have just copied Java BlockingQueue interface, which is of course
    > suited for Java. Don't know actually how ideal interface would look
    > like.


    In C++11 the ideal interface for a generic thread-safe and
    exception-safe queue should at least provide std::shared_ptr<T> take()
    and/or void take(T&). T take() is limited to those types having no-throw
    copy constructors or move-constructors. This also means that you should
    check the type at compile-time by using type traits. However, in your
    test case you used move-semantic everywhere, so that is not problem.
    Also note that the term "pop" instead of "take" is more in the spirit of
    C++.
     
    Luca Risolia, Jul 23, 2012
    #15
  16. On Monday, July 23, 2012 6:32:10 PM UTC-4, Melzzzzz wrote:
    > On Mon, 23 Jul 2012 13:23:30 -0700 (PDT)
    > Howard Hinnant wrote:
    >
    > &gt; I made a few changes to your code:
    > &gt;
    > &gt;
    > &gt; On my system this sped the C++ solution up considerably (to about
    > &gt; 13.7 seconds). I didn't time the Java solution.
    >
    > Yes, that's it, on my system:
    > bmaxa@maxa:~/examples$ time ./consprod2
    >
    > real 0m9.478s
    > user 0m10.797s
    > sys 0m7.196s
    >
    > which is more than twice speed of original program!
    > (and faster than java)


    At second glance I wasn't that fond of my solution and tweaked it as below:


    void put(T t)
    {
    std::unique_lock<std::mutex> lock(m_, std::try_to_lock);
    while (!lock.owns_lock())
    {
    if (queue_.size() > capacity_/4)
    {
    for (int i = 0; i < 3250; ++i)
    std::this_thread::yield();
    lock.try_lock();
    }
    else
    {
    lock.lock();
    break;
    }
    }
    while(queue_.size() >= capacity_)c_full_.wait(lock);
    queue_.push_back(std::move(t));
    if (queue_.size() == 1)
    c_empty_.notify_one();
    }
    T take()
    {
    std::unique_lock<std::mutex> lock(m_, std::try_to_lock);
    int retry = 0;
    while (!lock.owns_lock())
    {
    if (queue_.size() < 3*capacity_/4)
    {
    for (int i = 0; i < 3250; ++i)
    std::this_thread::yield();
    lock.try_lock();
    }
    else
    {
    lock.lock();
    break;
    }
    }
    while(queue_.empty())c_empty_.wait(lock);
    T tmp = std::move(queue_.front());
    queue_.pop_front();
    if (queue_.size() == capacity_-1)
    c_full_.notify_one();
    return tmp;
    }

    I am admittedly coding to the benchmark (which is not a really great idea). But I got the time on my system down from 13.7 seconds to about 9.7 seconds (another 40%).
     
    Howard Hinnant, Jul 24, 2012
    #16
  17. On Jul 23, 1:57 pm, Dombo <> wrote:
    > I'm a bit surpirsed that the C++ version compiles without complaints on
    > your compiler because there is no return statement in the main() function..


    It's an exception in C and C++ for main only. Running off the end of
    main is allowed, and is equivalent to return 0, itself equivalent to
    exit(0), IIRC.
     
    Joshua Maurice, Jul 24, 2012
    #17
  18. Melzzzzz

    Melzzzzz Guest

    On Mon, 23 Jul 2012 18:11:58 -0700 (PDT)
    Howard Hinnant <> wrote:

    > On Monday, July 23, 2012 6:32:10 PM UTC-4, Melzzzzz wrote:
    > > On Mon, 23 Jul 2012 13:23:30 -0700 (PDT)
    > > Howard Hinnant wrote:
    > >
    > > &gt; I made a few changes to your code:
    > > &gt;
    > > &gt;
    > > &gt; On my system this sped the C++ solution up considerably (to
    > > about &gt; 13.7 seconds). I didn't time the Java solution.
    > >
    > > Yes, that's it, on my system:
    > > bmaxa@maxa:~/examples$ time ./consprod2
    > >
    > > real 0m9.478s
    > > user 0m10.797s
    > > sys 0m7.196s
    > >
    > > which is more than twice speed of original program!
    > > (and faster than java)

    >
    > At second glance I wasn't that fond of my solution and tweaked it as
    > below:
    >
    >
    > void put(T t)
    > {
    > std::unique_lock<std::mutex> lock(m_, std::try_to_lock);
    > while (!lock.owns_lock())
    > {
    > if (queue_.size() > capacity_/4)
    > {
    > for (int i = 0; i < 3250; ++i)
    > std::this_thread::yield();
    > lock.try_lock();
    > }
    > else
    > {
    > lock.lock();
    > break;
    > }
    > }
    > while(queue_.size() >= capacity_)c_full_.wait(lock);
    > queue_.push_back(std::move(t));
    > if (queue_.size() == 1)
    > c_empty_.notify_one();
    > }
    > T take()
    > {
    > std::unique_lock<std::mutex> lock(m_, std::try_to_lock);
    > int retry = 0;
    > while (!lock.owns_lock())
    > {
    > if (queue_.size() < 3*capacity_/4)
    > {
    > for (int i = 0; i < 3250; ++i)
    > std::this_thread::yield();
    > lock.try_lock();
    > }
    > else
    > {
    > lock.lock();
    > break;
    > }
    > }
    > while(queue_.empty())c_empty_.wait(lock);
    > T tmp = std::move(queue_.front());
    > queue_.pop_front();
    > if (queue_.size() == capacity_-1)
    > c_full_.notify_one();
    > return tmp;
    > }
    >
    > I am admittedly coding to the benchmark (which is not a really great
    > idea). But I got the time on my system down from 13.7 seconds to
    > about 9.7 seconds (another 40%).


    On my system speed up is not that big.
    (new version)
    bmaxa@maxa:~/examples$ time ./consprod3

    real 0m9.296s
    user 0m10.061s
    sys 0m8.069s

    (old version)
    bmaxa@maxa:~/examples$ time ./consprod2

    real 0m9.496s
    user 0m10.917s
    sys 0m7.292s
    bmaxa@maxa:~/examples$

    (java with array assignment)
    bmaxa@maxa:~/examples$ time java consprod

    real 0m16.098s
    user 0m25.674s
    sys 0m1.932s
     
    Melzzzzz, Jul 24, 2012
    #18
  19. Melzzzzz

    none Guest

    In article <>,
    Howard Hinnant <> wrote:
    >I made a few changes to your code:
    >
    >1. I only signal if the capacity becomes non-zero or non-full.


    >2. If the queue is the queue has got some items in it and the mutex is locked, the put thread yields to the take
    >thread. If the queue has some free space in it and the mutex is locked, the take thread yields to the put
    >thread.


    Are you sure about the usefulness of #2?

    Doing a few tests myself, adding this in m queue seems to reduce performance by some 20%:
    while (!lock.owns_lock())
    {
    if (queue_.size() > capacity_/4)
    {
    for (int i = 0; i < 3250; ++i)
    std::this_thread::yield();
    lock.try_lock();
    }
    else
    {
    lock.lock();
    break;
    }
    }


    >3. Traffic in int instead of unique_ptr<int>.



    Yannick
     
    none, Jul 24, 2012
    #19
  20. Melzzzzz

    Melzzzzz Guest

    On Tue, 24 Jul 2012 12:51:20 +0000 (UTC)
    yatremblay@bel1lin202.(none) (Yannick Tremblay) wrote:

    > In article <>,
    > Howard Hinnant <> wrote:
    > >I made a few changes to your code:
    > >
    > >1. I only signal if the capacity becomes non-zero or non-full.

    >
    > >2. If the queue is the queue has got some items in it and the mutex
    > >is locked, the put thread yields to the take thread. If the queue
    > >has some free space in it and the mutex is locked, the take thread
    > >yields to the put thread.

    >
    > Are you sure about the usefulness of #2?


    Problem is that we need to reduce waiting on condition to minimum
    as more condition waits program is slower.
    Ideally there would be no condition waits (maximum speed).

    >
    > Doing a few tests myself, adding this in m queue seems to reduce
    > performance by some 20%: while (!lock.owns_lock())
    > {
    > if (queue_.size() > capacity_/4)
    > {
    > for (int i = 0; i < 3250; ++i)
    > std::this_thread::yield();
    > lock.try_lock();
    > }
    > else
    > {
    > lock.lock();
    > break;
    > }
    > }
    >


    This is second version, on my system first version seems better
    for smaller capacities...
     
    Melzzzzz, Jul 24, 2012
    #20
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Mark McKay
    Replies:
    0
    Views:
    468
    Mark McKay
    Dec 9, 2003
  2. Buck Turgidson

    Simple Producer/Consumer Thread Question

    Buck Turgidson, Feb 17, 2004, in forum: Java
    Replies:
    5
    Views:
    556
    Tony Dahlman
    Feb 21, 2004
  3. Usenet Poster!!!
    Replies:
    4
    Views:
    1,914
    Eric Sosman
    Sep 30, 2004
  4. Jeff
    Replies:
    4
    Views:
    691
    xarax
    Oct 22, 2004
  5. Mr. SweatyFinger
    Replies:
    2
    Views:
    2,263
    Smokey Grindel
    Dec 2, 2006
Loading...

Share This Page