Re: benchmarks? java vs .net

Discussion in 'Java' started by Mark Thornton, Jun 3, 2008.

  1. Jon Harrop wrote:
    > Razii wrote:
    >> On Mon, 02 Jun 2008 10:22:29 +0100, Jon Harrop <>
    >> wrote:
    >>> Yes. I have ported a variety of benchmarks and found there is no
    >>> significant difference between the performance of .NET and Java. The one
    >>> result that stood out was .NET being 2x faster at the Mersenne Twister
    >>> PRNG.

    >> .NET is twice slower in four benchmarks: binarytrees, mandelbrot,
    >> regexdna, sumcol. .NET was twice faster only in trig related
    >> benchmark.

    >
    > You have not optimized the .NET code:
    >
    > On the mandelbrot benchmark, most of the time is spent doing unbuffered IO.
    > Use buffered IO and .NET becomes ~10% faster than Java.
    >
    > On the regexdna benchmark, the regular expressions are not being compiled.
    > Ask for compilation and the .NET is <20% slower than Java.
    >
    > The sumcol benchmark is also using unbuffered IO.
    >
    > So neither VM is significantly faster overall.
    >


    Rather amusing really as unintentional use of unbuffered IO is a
    frequent cause of Java benchmarks running more slowly than they should.
    It seems that .NET copied that characteristic as well.

    Mark Thornton
     
    Mark Thornton, Jun 3, 2008
    #1
    1. Advertising

  2. In article <>,
    Razii <> wrote:

    [...]
    > Verify it yourself.
    >
    > http://shootout.alioth.debian.org/gp4/benchmark.php?test=mandelbrot&lang=cshar
    > p&id=2
    >
    > vs
    >
    > http://shootout.alioth.debian.org/gp4/benchmark.php?test=mandelbrot&lang=javax
    > x&id=0
    >
    >
    > 3.4 times slower.


    Indeed, I see C#/Mono = 7.15s and Java = 3.22s. I would say that Java
    took (3.22s / 7.15s = 0.450) less that half as long as C#/Mono or that
    Java was (7.15s / 3.22s = 2.220) over twice as fast. I don't see where
    3.4 comes from. Just asking.

    John
    --
    John B. Matthews
    trashgod at gmail dot com
    home dot woh dot rr dot com slash jbmatthews
     
    John B. Matthews, Jun 3, 2008
    #2
    1. Advertising

  3. On Jun 3, 4:39 pm, Jon Harrop <> wrote:
    > FWIW, F#/Mono is 3x slower than F#/.NET on the SciMark2 benchmark. I'd like
    > to know how performance compares between these platforms for parallel code.


    If Razii is genuinely comparing Java with Mono (I haven't looked at
    any of the figures) it's a silly test to start with (unless you're
    specifically interested in Mono, of course). The vast majority of C#
    code runs on .NET rather than Mono - and while I applaud the Mono
    team's work, I seriously doubt that it has quite has much effort going
    into it as Microsoft is putting into .NET. I'd expect .NET to
    outperform Mono, and on microbencharks like these the difference could
    be quite significant in some cases.

    > Does Java have anything comparable to .NET's Task Parallel Library?


    I haven't used it, but I've heard about Doug Lea's Fork/Join
    framework:
    http://gee.cs.oswego.edu/dl/papers/fj.pdf

    One problem with creating anything like Parallel Extensions (which I
    assume is what you meant - AFAIK the TPL is just part of Parallel
    Extensions, although I could be misunderstanding you completely) for
    Java is that it doesn't (currently) have closures other than anonymous
    inner classes which are ugly as hell.

    The state of closures in Java 7 is currently up in the air.

    Jon
     
    Jon Skeet [C# MVP], Jun 3, 2008
    #3
  4. Razii <> wrote:
    > >If Razii is genuinely comparing Java with Mono (I haven't looked at
    > >any of the figures) it's a silly test to start with (unless you're
    > >specifically interested in Mono, of course).

    >
    > If you have no clue what is this thread about, why post at all?


    Well gee, you keep post URLs comparing Mono with Java. That's *not*
    comparing .NET with Java - and is thus relatively pointless IMO. (It's
    also made more pointless by you not mentioning in your blog post which
    version of Java you're using, or which version of .NET.)

    To make the conversation significant (well, as significant as this kind
    of thing can be):

    1) Don't bother posting about the Mono benchmarks at all. Keep it to
    .NET and Java.
    2) Explain the precise runtime environments.

    The IO suggestions so far seem to be an indication of the general merit
    of the benchmark, mind you.

    --
    Jon Skeet - <>
    Web site: http://www.pobox.com/~skeet
    Blog: http://www.msmvps.com/jon.skeet
    C# in Depth: http://csharpindepth.com
     
    Jon Skeet [C# MVP], Jun 3, 2008
    #4
  5. Jon Harrop wrote:
    > Mark Thornton wrote:
    >> Rather amusing really as unintentional use of unbuffered IO is a
    >> frequent cause of Java benchmarks running more slowly than they should.
    >> It seems that .NET copied that characteristic as well.

    >
    > Yes. I've no idea why they do that. Isn't buffered IO a better default?!
    >


    I think the reasoning is simplicity --- you create what you want through
    composition of a smaller number of classes. Otherwise you need more
    classes or extra options on constructors to provide for all possible
    cases. Unfortunately they spoiled this a bit by providing a few
    composites (java.io.FileReader for example). It is also a bit confusing
    that InputStreamReader includes some buffering, but adding buffering to
    an already buffered stream doesn't usually increase the cost by much.

    Mark Thornton
     
    Mark Thornton, Jun 3, 2008
    #5
  6. Razii <> wrote:
    > On Tue, 3 Jun 2008 18:51:10 +0100, Jon Skeet [C# MVP]
    > <> wrote:
    >
    > >Well gee, you keep post URLs comparing Mono with Java. That's *not*
    > >comparing .NET with Java - and is thus relatively pointless IMO. (It's
    > >also made more pointless by you not mentioning in your blog post which
    > >version of Java you're using, or which version of .NET.)

    >
    > The link I posted was this:
    >
    > http://kingrazi.blogspot.com/
    >
    > It's clear what it is about.


    That was the link you posted *initially*. Since then you have posted
    multiple links to http://shootout.alioth.debian.org which is about
    Mono.

    --
    Jon Skeet - <>
    Web site: http://www.pobox.com/~skeet
    Blog: http://www.msmvps.com/jon.skeet
    C# in Depth: http://csharpindepth.com
     
    Jon Skeet [C# MVP], Jun 3, 2008
    #6
  7. Jon Skeet [C# MVP] wrote:
    >
    >> Does Java have anything comparable to .NET's Task Parallel Library?

    >
    > I haven't used it, but I've heard about Doug Lea's Fork/Join
    > framework:
    > http://gee.cs.oswego.edu/dl/papers/fj.pdf
    >

    I have used it and find it works quite well even without closures. I
    haven't used .NET so can't compare them.

    For more information and downloads of the package:
    http://g.oswego.edu/dl/concurrency-interest/

    Mark Thornton
     
    Mark Thornton, Jun 3, 2008
    #7
  8. Razii <> wrote:
    > >The IO suggestions so far seem to be an indication of the general merit
    > >of the benchmark, mind you.

    >
    > The criteria is same for both sides.


    Um, the C# version (as posted on the site you've linked to below) uses
    unbuffered IO while the Java version doesn't. They're just not doing
    the same thing. Even if they were, that wouldn't give any meaningful
    data about which language/platform "in general". It might help if you
    happen to know that your code is going to require a huge amount of
    recursion - to the extent that it's going to be the performance
    bottleneck - *and* performance is your primary concern.

    > For now C# is slower by wide margin in these benchmarks. How about
    > fixing them? If you can't, I will conclude C# is just slower.


    Feel free to draw that conclusion, while the rest of us instead
    conclude that you can't claim that C# (which is a *language* after all,
    not a platform) is slower or faster than another language.

    Even running .NET rather than Mono will only go so far - you'd need to
    run it on both 64 bit and 32 bit platforms, as the runtimes do
    different things. (IIRC the 64 bit CLR uses tail recursion more heavily
    than the 32 bit CLR, for instance.)

    If I were to find a really slow JVM, would that prove that Java is a
    slower *language* than C#?

    I can easily write a benchmark where Java (under Hotspot) "beats" .NET:
    just call a very small virtual method which is never overridden.
    Hotspot will inline the call aggressively, as it is able to "undo" the
    optimisation later on. .NET, with a single-pass JIT, won't do that.
    Does that prove that Java is a faster language? No, not at all. It just
    shows one area situation where it works better. It pretty much has to,
    given that far more methods tend to be virtual in Java code than in
    ..NET code. I can probably find similar situations where .NET stomps
    over Java (being able to create custom value types would probably be a
    good starting point - force heap allocation and garbage collection in
    Java but not .NET; even though it's fast, it's not free) - but that
    wouldn't prove that .NET is faster than Java, either.

    --
    Jon Skeet - <>
    Web site: http://www.pobox.com/~skeet
    Blog: http://www.msmvps.com/jon.skeet
    C# in Depth: http://csharpindepth.com
     
    Jon Skeet [C# MVP], Jun 3, 2008
    #8
  9. Mark Thornton <> wrote:
    > > I haven't used it, but I've heard about Doug Lea's Fork/Join
    > > framework:
    > > http://gee.cs.oswego.edu/dl/papers/fj.pdf

    >
    > I have used it and find it works quite well even without closures.


    Cool - that's good to hear. I'd be very surprised if closures didn't
    make it even simpler to use (possibly with a bit of refactoring towards
    single-method interfaces if necessary).

    Given the rest of Doug Lea's work, I'd expect it to be good :)

    > I haven't used .NET so can't compare them.


    Here's a little example from a Mandelbrot benchmark I was writing a few
    weeks ago (oh the irony).

    Simple initial code:

    public override void Generate()
    {
    int index = 0;
    for (int row = 0; row < Height; row++)
    {
    for (int col = 0; col < Width; col++)
    {
    Data[index++] = ComputeMandelbrotIndex(row, col);
    }
    }
    }

    Now using Parallel Extensions, and the lamdba expressions of C# 3:

    public override void Generate()
    {
    Parallel.For(0, Height, row =>
    {
    int index = row * Width;
    for (int col = 0; col < Width; col++)
    {
    Data[index++] = ComputeMandelbrotIndex(row, col);
    }
    });
    }

    I can't imagine many transformations being simpler than that - and the
    results are great.

    See http://preview.tinyurl.com/58vfav for rather more :)

    > For more information and downloads of the package:
    > http://g.oswego.edu/dl/concurrency-interest/


    Thanks, will take a look.

    --
    Jon Skeet - <>
    Web site: http://www.pobox.com/~skeet
    Blog: http://www.msmvps.com/jon.skeet
    C# in Depth: http://csharpindepth.com
     
    Jon Skeet [C# MVP], Jun 3, 2008
    #9
  10. Jon Harrop <> wrote:
    > Mark Thornton wrote:
    > > Jon Skeet [C# MVP] wrote:
    > >>> Does Java have anything comparable to .NET's Task Parallel Library?
    > >>
    > >> I haven't used it, but I've heard about Doug Lea's Fork/Join
    > >> framework:
    > >> http://gee.cs.oswego.edu/dl/papers/fj.pdf
    > >>

    > > I have used it and find it works quite well even without closures. I
    > > haven't used .NET so can't compare them.

    >
    > The Task Parallel Library is awesome, even though it is only a CTP for now.


    Agreed. Now is a particularly good time to get into it, as the second
    CTP has just been released:

    http://blogs.msdn.com/pfxteam/archive/2008/06/02/8567093.aspx

    > I highly recommend it. For example, read the article:
    >
    > "Surviving in the multicore era with Microsoft's ParallelFX" - The F#.NET
    > Journal, 30th April 2008
    > http://www.ffconsultancy.com/products/fsharp_journal/?cs


    Or not, given that that's subscription only...

    --
    Jon Skeet - <>
    Web site: http://www.pobox.com/~skeet
    Blog: http://www.msmvps.com/jon.skeet
    C# in Depth: http://csharpindepth.com
     
    Jon Skeet [C# MVP], Jun 3, 2008
    #10
  11. Razii <> wrote:
    > >That was the link you posted *initially*. Since then you have posted
    > >multiple links to http://shootout.alioth.debian.org which is about
    > >Mono.

    >
    > Huh? I will be polite for now and just say you didn't read the thread
    > carefully. The benchmark I tested are at that site. I don't have Mono.
    > I only have .NET.


    Let's look at the most recent set of links you posted to "prove" that
    Java is faster than C#:

    http://shootout.alioth.debian.org/gp4sandbox/benchmark.php?
    test=binarytrees&lang=javaxx&id=2

    http://shootout.alioth.debian.org/gp4sandbox/benchmark.php?
    test=binarytrees&lang=csharp&id=0


    revcomp
    http://shootout.alioth.debian.org/gp4sandbox/benchmark.php?
    test=revcomp&lang=javaxx&id=4
    vs
    http://shootout.alioth.debian.org/gp4sandbox/benchmark.php?
    test=revcomp&lang=csharp&id=2


    sumcol
    http://shootout.alioth.debian.org/gp4sandbox/benchmark.php?
    test=sumcol&lang=javaxx&id=4
    vs
    http://shootout.alioth.debian.org/gp4sandbox/benchmark.php?
    test=sumcol&lang=csharp&id=0

    http://shootout.alioth.debian.org/gp4sandbox/benchmark.php?
    test=recursive&lang=javaxx&id=0
    vs
    http://shootout.alioth.debian.org/gp4sandbox/benchmark.php?
    test=recursive&lang=csharp&id=0


    Now, you're using *those results* to form conclusions, right? If not,
    there was little point in posting them. However, those results are of
    Mono, not .NET.

    This is why I've urged consistency. What's the point in debating Java
    vs .NET if you keep posting links to results of Java vs Mono?

    --
    Jon Skeet - <>
    Web site: http://www.pobox.com/~skeet
    Blog: http://www.msmvps.com/jon.skeet
    C# in Depth: http://csharpindepth.com
     
    Jon Skeet [C# MVP], Jun 3, 2008
    #11
  12. Razii <> wrote:
    > On Tue, 3 Jun 2008 19:50:21 +0100, Jon Skeet [C# MVP]
    > <> wrote:
    >
    > >Um, the C# version (as posted on the site you've linked to below) uses
    > >unbuffered IO while the Java version doesn't. They're just not doing
    > >the same thing.

    >
    > What benchmark are you talking about?


    The mandelbrot one. Look back in the thread from where I first talked
    about IO, and you'll see it's the last one referenced.

    > Changing the line to
    >
    > using (StreamReader r = new StreamReader(new
    > BufferedStream(Console.OpenStandardInput())))
    >
    > didn't make a difference in sum-file benchmark.


    The benchmark I wasn't referring to? Oh.

    > By the way, StreamTokenizer is not buffered. It reads a byte each time
    > from the stream. It maybe that System.in is buffered by default in
    > Java.


    Which benchmark are you talking about now? sum-file uses BufferedReader
    but not StreamTokenizer as far as I can see.

    It would be interesting to know what the sum-file benchmark is trying
    to measure - there are six things off the top of my head:

    1) File IO
    2) Conversion from binary to text
    3) Splitting a stream of textual data into lines
    4) Parsing text into integers
    5) Integer addition
    6) Writing the result to the console

    The benchmark gives no information as to which of those is the
    bottleneck for any particular platform.

    I note, by the way, that the Java version of sum-col assumes that using
    the platform default encoding is good enough. The C# version always
    uses UTF-8. Depending on what the default encoding is, that may or may
    not be significant - but it does show that the programs are not doing
    the same thing.


    How did you measure your .NET results, by the way? By starting the
    process thousands of times, as shown on the other web site, or by
    running the same code within the process many times?

    --
    Jon Skeet - <>
    Web site: http://www.pobox.com/~skeet
    Blog: http://www.msmvps.com/jon.skeet
    C# in Depth: http://csharpindepth.com
     
    Jon Skeet [C# MVP], Jun 3, 2008
    #12
  13. Jon Skeet [C# MVP] wrote:
    > Mark Thornton <> wrote:
    >>> I haven't used it, but I've heard about Doug Lea's Fork/Join
    >>> framework:
    >>> http://gee.cs.oswego.edu/dl/papers/fj.pdf

    >> I have used it and find it works quite well even without closures.

    >
    > Cool - that's good to hear. I'd be very surprised if closures didn't
    > make it even simpler to use (possibly with a bit of refactoring towards
    > single-method interfaces if necessary).
    >
    > Given the rest of Doug Lea's work, I'd expect it to be good :)
    >
    >> I haven't used .NET so can't compare them.

    >
    > Here's a little example from a Mandelbrot benchmark I was writing a few
    > weeks ago (oh the irony).
    >
    > Simple initial code:
    >
    > public override void Generate()
    > {
    > int index = 0;
    > for (int row = 0; row < Height; row++)
    > {
    > for (int col = 0; col < Width; col++)
    > {
    > Data[index++] = ComputeMandelbrotIndex(row, col);
    > }
    > }
    > }
    >
    > Now using Parallel Extensions, and the lamdba expressions of C# 3:
    >
    > public override void Generate()
    > {
    > Parallel.For(0, Height, row =>
    > {
    > int index = row * Width;
    > for (int col = 0; col < Width; col++)
    > {
    > Data[index++] = ComputeMandelbrotIndex(row, col);
    > }
    > });
    > }
    >
    > I can't imagine many transformations being simpler than that - and the
    > results are great.
    >
    > See http://preview.tinyurl.com/58vfav for rather more :)
    >
    >> For more information and downloads of the package:
    >> http://g.oswego.edu/dl/concurrency-interest/

    >
    > Thanks, will take a look.
    >


    There are some FJ examples here:
    http://artisans-serverintellect-com.si-eioswww6.com/default.asp?W32

    The recursive matrix multiplication is my contribution.

    Mark Thornton
     
    Mark Thornton, Jun 3, 2008
    #13
  14. Mark Thornton <> wrote:

    <snip>

    > There are some FJ examples here:
    > http://artisans-serverintellect-com.si-eioswww6.com/default.asp?W32
    >
    > The recursive matrix multiplication is my contribution.


    Right. It still looks a bit involved - which is only to be expected,
    really. I haven't taken it in fully yet though - will aim to do so at a
    later date. Might try porting to Parallel Extensions...

    --
    Jon Skeet - <>
    Web site: http://www.pobox.com/~skeet
    Blog: http://www.msmvps.com/jon.skeet
    C# in Depth: http://csharpindepth.com
     
    Jon Skeet [C# MVP], Jun 3, 2008
    #14
  15. Razii <> wrote:
    > >The mandelbrot one. Look back in the thread from where I first talked
    > >about IO, and you'll see it's the last one referenced.

    >
    > Yes, that was fixed by Harpo a while ago.


    And yet you seem to be completely ignoring my point: a shootout which
    has had such a trivial but significant flaw for a significant time is
    suspect in its methodology. (Heck, the measurement style is dodgy to
    start with. Including the startup cost in the test is pretty crazy.)

    > >Which benchmark are you talking about now? sum-file uses BufferedReader
    > >but not StreamTokenizer as far as I can see.

    >
    > I suspect System.in is already buffered by default.


    In what way does your sentence relate to my two?

    > >1) File IO
    > >2) Conversion from binary to text
    > >3) Splitting a stream of textual data into lines
    > >4) Parsing text into integers
    > >5) Integer addition
    > >6) Writing the result to the console

    >
    > (1), (2), (4), (5), (6) ?


    What is your list of numbers meant to signify?

    --
    Jon Skeet - <>
    Web site: http://www.pobox.com/~skeet
    Blog: http://www.msmvps.com/jon.skeet
    C# in Depth: http://csharpindepth.com
     
    Jon Skeet [C# MVP], Jun 3, 2008
    #15
  16. Jon Skeet [C# MVP] wrote:
    > Mark Thornton <> wrote:
    >
    > <snip>
    >
    >> There are some FJ examples here:
    >> http://artisans-serverintellect-com.si-eioswww6.com/default.asp?W32
    >>
    >> The recursive matrix multiplication is my contribution.

    >
    > Right. It still looks a bit involved - which is only to be expected,
    > really. I haven't taken it in fully yet though - will aim to do so at a
    > later date. Might try porting to Parallel Extensions...
    >


    Frameworks like Fork/Join can only go so far in easing tasks like matrix
    multiplication. Given that my Java code won't be making best use of
    SSE, I was quite pleased with the performance of the result.

    Mark Thornton
     
    Mark Thornton, Jun 3, 2008
    #16
  17. Mark Thornton

    Barry Kelly Guest

    Razii wrote:

    > Is this guy Jon Skeet really this stupid?
    >
    > Let me know so I can add him to ignore list.


    Razii, if you hope to ever have any positive effect in your
    communication - e.g. convince anyone that you have a clue - you'll have
    to work much harder not to look like a naive 13 year old with
    undeveloped social skills.

    Plonk etc.

    -- Barry

    --
    http://barrkel.blogspot.com/
     
    Barry Kelly, Jun 3, 2008
    #17
  18. Mark Thornton

    Barry Kelly Guest

    Jon Skeet wrote:

    > Now using Parallel Extensions, and the lamdba expressions of C# 3:
    >
    > public override void Generate()
    > {
    > Parallel.For(0, Height, row =>
    > {
    > int index = row * Width;
    > for (int col = 0; col < Width; col++)
    > {
    > Data[index++] = ComputeMandelbrotIndex(row, col);
    > }
    > });
    > }
    >
    > I can't imagine many transformations being simpler than that - and the
    > results are great.


    How have you found Parallel.For to perform?

    I've found that my own naive (i.e. quick hack with little thought)
    implementations of Parallel.For that dispatch over a fixed number of
    threads (one thread per core) outperform the ones in the TPL by quite a
    bit - TPL seemed to have had around ~20% overhead in the particular
    microbenchmark I was trying to scale over my quad core.

    I must dig it out and analyse the discrepancy, and blog about it, when I
    find time.

    -- Barry

    --
    http://barrkel.blogspot.com/
     
    Barry Kelly, Jun 3, 2008
    #18
  19. Barry Kelly <> wrote:
    > > I can't imagine many transformations being simpler than that - and the
    > > results are great.

    >
    > How have you found Parallel.For to perform?


    Very well - see my last few blog posts.

    > I've found that my own naive (i.e. quick hack with little thought)
    > implementations of Parallel.For that dispatch over a fixed number of
    > threads (one thread per core) outperform the ones in the TPL by quite a
    > bit - TPL seemed to have had around ~20% overhead in the particular
    > microbenchmark I was trying to scale over my quad core.
    >
    > I must dig it out and analyse the discrepancy, and blog about it, when I
    > find time.


    I wouldn't be surprised if some individual test cases did badly in
    microbenchmarks. Without even looking at your code, my guess is that
    you may find your code doesn't work as well when there's other load on
    the processor (e.g. effectively taking up one core with a different
    process) whereas work stealing should (I believe) cope with that
    reasonably well.

    But as I point out on the most recent blog entry, accurately
    benchmarking this kind of thing and being able to interpret the results
    sensibly is basically beyond my current understanding (and time to
    spend on it).

    --
    Jon Skeet - <>
    Web site: http://www.pobox.com/~skeet
    Blog: http://www.msmvps.com/jon.skeet
    C# in Depth: http://csharpindepth.com
     
    Jon Skeet [C# MVP], Jun 3, 2008
    #19
  20. Mark Thornton

    Barry Kelly Guest

    Jon Skeet wrote:

    > Barry Kelly <> wrote:
    > > > I can't imagine many transformations being simpler than that - and the
    > > > results are great.

    > >
    > > How have you found Parallel.For to perform?

    >
    > Very well - see my last few blog posts.


    I've been reading - I'm more wondering "compared to what", of course :)

    > I wouldn't be surprised if some individual test cases did badly in
    > microbenchmarks. Without even looking at your code, my guess is that
    > you may find your code doesn't work as well when there's other load on
    > the processor (e.g. effectively taking up one core with a different
    > process) whereas work stealing should (I believe) cope with that
    > reasonably well.


    My code does naive work-stealing too, though through a centralized
    queue, rather than having a per-thread queue that bored threads wander
    away from - that would avoid some cache sloshing etc.

    > But as I point out on the most recent blog entry, accurately
    > benchmarking this kind of thing and being able to interpret the results
    > sensibly is basically beyond my current understanding (and time to
    > spend on it).


    For the interest of others, I have appended my code to this post. Here
    are some of the results I got:

    (Running .NET 3.5 on XP SP3 on Q6600 @ 2.8GHz w/ 3.25GB RAM)

    - When all cores are fairly idle (no significant activity):

    ---8<---
    Switching Overhead (Naive) : 0.078 (fastest 0.078)
    Switching Overhead (TPL) : 0.045 (fastest 0.042)
    Lots of small tasks (Naive) : 0.682 (fastest 0.679)
    Lots of small tasks (TPL) : 0.081 (fastest 0.081)
    Few big tasks (Naive) : 0.967 (fastest 0.966)
    Few big tasks (TPL) : 1.259 (fastest 1.253)
    --->8---

    This shows that TPL's switching overhead is definitely lower than my
    "dumbass pool" when the body of the loop is just sleeping, and is far,
    far more efficient for lots and lots of small tasks, it seems less
    efficient - a lot less efficient - for bigger tasks.

    I can see obvious ways to improve my Naive parallel-for

    - per-thread queues with bored threads going work stealing
    - take advantage of ParallelFor's knowledge of total job size by adding
    lots of work to per-thread queues in one go, rather than bottlenecking
    on synchronization overhead for each single iteration

    Of course, the whole point of my naive algorithm is that it is naive, so
    I didn't do any of this. It's a 30-minute hack. I just find it's
    interesting that TPL can, on occasion, take close to 30% more time than
    a straightforward naive implementation, depending on work chunk size.

    Running 1 instance of super-pi to eat up 1 core, leaving only 3 cores
    free:

    ---8<---
    Switching Overhead (Naive) : 0.078 (fastest 0.078)
    Switching Overhead (TPL) : 0.045 (fastest 0.041)
    Lots of small tasks (Naive) : 0.734 (fastest 0.708)
    Lots of small tasks (TPL) : 0.087 (fastest 0.086)
    Few big tasks (Naive) : 0.983 (fastest 0.976)
    Few big tasks (TPL) : 1.317 (fastest 1.294)
    --->8---

    Running 2 instances of super-pi, leaving 2 cores free:

    ---8<---
    Switching Overhead (Naive) : 0.078 (fastest 0.078)
    Switching Overhead (TPL) : 0.045 (fastest 0.042)
    Lots of small tasks (Naive) : 1.063 (fastest 1.021)
    Lots of small tasks (TPL) : 0.094 (fastest 0.094)
    Few big tasks (Naive) : 1.158 (fastest 1.147)
    Few big tasks (TPL) : 1.376 (fastest 1.368)
    --->8---

    Here, TPL seems to be catching up with my Naive implementation the more
    is loaded on other cores, but it's still a fair ways off.


    Code follows:

    using System;
    using System.Collections.Generic;
    using System.Text;
    using System.Threading;
    using System.Diagnostics;
    using System.Runtime.CompilerServices;

    // So dumb it's a stack, not a queue, but we never promised
    // anything about order.
    class DumbassQueue<T>
    {
    class Item
    {
    public Item next;
    public T value;
    }

    // volatile to get most recent head in non-blocking loops.
    volatile Item _head;

    Semaphore _itemCount;
    Semaphore _emptyCount;

    public DumbassQueue(int maxCount)
    {
    _itemCount = new Semaphore(0, maxCount);
    _emptyCount = new Semaphore(maxCount, maxCount);
    }

    public T Dequeue()
    {
    _itemCount.WaitOne();
    try
    {
    for (;;)
    {
    Item result = _head;
    if (Interlocked.CompareExchange(ref _head,
    result.next, result) == result)
    return result.value;
    }
    }
    finally
    {
    _emptyCount.Release();
    }
    }

    public void Enqueue(T item)
    {
    _emptyCount.WaitOne();
    try
    {
    Item added = new Item();
    added.value = item;
    for (;;)
    {
    added.next = _head;
    if (Interlocked.CompareExchange(ref _head, added,
    added.next) == added.next)
    break;
    }
    }
    finally
    {
    _itemCount.Release();
    }
    }
    }

    class DumbassPool
    {
    Thread[] _threads;
    DumbassQueue<Action> _queue;

    public DumbassPool()
    {
    _threads = new Thread[Environment.ProcessorCount];
    _queue = new DumbassQueue<Action>(
    Environment.ProcessorCount * 2);
    Thread.MemoryBarrier();
    for (int i = 0; i < _threads.Length; ++i)
    {
    _threads = new Thread(ThreadMain);
    _threads.IsBackground = true;
    _threads.Start();
    }
    }

    void ThreadMain()
    {
    for (;;)
    _queue.Dequeue()();
    }

    public void ParallelFor(int start, int limit, Action<int> body)
    {
    using (DoneWaiter waiter = new DoneWaiter(limit - start))
    {
    for (int i = start; i < limit; ++i)
    {
    int index = i;
    _queue.Enqueue(delegate
    {
    body(index);
    waiter.Signal();
    });
    }
    }
    }

    class DoneWaiter : IDisposable
    {
    int _taskCount;
    ManualResetEvent _done;

    public DoneWaiter(int taskCount)
    {
    _taskCount = taskCount;
    _done = new ManualResetEvent(false);
    }

    public void Dispose()
    {
    _done.WaitOne();
    _done.Close();
    }

    public void Signal()
    {
    if (Interlocked.Decrement(ref _taskCount) == 0)
    _done.Set();
    }
    }
    }

    class Program
    {
    [MethodImpl(MethodImplOptions.NoInlining)]
    static void Use(int x)
    {
    if (x == -42)
    Console.WriteLine("Got the magic number");
    }

    static void Time(string title, Action proc)
    {
    double totalTime = 0;
    double fastest = double.MaxValue;
    int iterCount = 3;

    // warmup
    proc();

    for (int i = 0; i < iterCount; ++i)
    {
    Stopwatch w = Stopwatch.StartNew();
    proc();
    double time = w.ElapsedTicks / (double) Stopwatch.Frequency;
    totalTime += time;
    if (time < fastest)
    fastest = time;
    }

    Console.WriteLine("{0,-30}: {1,6:f3} (fastest {2,6:f3})",
    title, totalTime / iterCount, fastest);
    }

    static void Main(string[] args)
    {
    DumbassPool pool = new DumbassPool();

    Time("Switching Overhead (Naive)", delegate
    {
    pool.ParallelFor(0, 20, i =>
    {
    Thread.Sleep(10);
    });
    });

    Time("Switching Overhead (TPL)", delegate
    {
    Parallel.For(0, 20, i =>
    {
    Thread.Sleep(10);
    });
    });

    Time("Lots of small tasks (Naive)", delegate
    {
    pool.ParallelFor(0, 100000, i =>
    {
    for (int j = 0; j < 1000; ++j)
    Use(j);
    });
    });

    Time("Lots of small tasks (TPL)", delegate
    {
    Parallel.For(0, 100000, i =>
    {
    for (int j = 0; j < 1000; ++j)
    Use(j);
    });
    });

    Time("Few big tasks (Naive)", delegate
    {
    pool.ParallelFor(0, 10, i =>
    {
    for (int j = 0; j < 100000000; ++j)
    Use(j);
    });
    });

    Time("Few big tasks (TPL)", delegate
    {
    Parallel.For(0, 10, i =>
    {
    for (int j = 0; j < 100000000; ++j)
    Use(j);
    });
    });
    }
    }

    -- Barry

    --
    http://barrkel.blogspot.com/
     
    Barry Kelly, Jun 4, 2008
    #20
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Mark Thornton

    Re: benchmarks? java vs .net

    Mark Thornton, Jun 3, 2008, in forum: C++
    Replies:
    35
    Views:
    970
    Arne Vajhøj
    Jun 9, 2008
  2. Jon Skeet [C# MVP]

    Re: benchmarks? java vs .net

    Jon Skeet [C# MVP], Jun 3, 2008, in forum: Java
    Replies:
    9
    Views:
    377
    Arne Vajhøj
    Jun 8, 2008
  3. Jon Skeet [C# MVP]

    Re: benchmarks? java vs .net

    Jon Skeet [C# MVP], Jun 3, 2008, in forum: C++
    Replies:
    10
    Views:
    537
    Arne Vajhøj
    Jun 8, 2008
  4. Mark Thornton
    Replies:
    3
    Views:
    377
    Arne Vajhøj
    Jun 9, 2008
  5. Mark Thornton
    Replies:
    3
    Views:
    495
    Arne Vajhøj
    Jun 9, 2008
Loading...

Share This Page