Java slow on Xeon processors?

Discussion in 'Java' started by Michael Kreitmann, May 24, 2004.

  1. Hello,

    I have a strange performance problem with the following (senseless)
    code:

    public class XeonTest
    {
        // Copies aText one character at a time through a StringBuffer.
        public static String XMLEncode(String aText)
        {
            int len = aText.length();
            StringBuffer result = new StringBuffer(len);
            char ch;
            for (int i = 0; i < len; ++i)
            {
                ch = aText.charAt(i);
                result.append(ch);
            }
            return result.toString();
        }

        // Runs the copy one million times and prints the elapsed wall-clock time.
        public static void main(String[] args)
        {
            long start = System.currentTimeMillis();
            String s = "The brown fox jumps over the bridge. The brown fox jumps over the bridge";
            String res = "";
            for (int i = 0; i < 1000000; ++i)
            {
                res = XMLEncode(s);
            }
            System.out.println("Runtime:" + (System.currentTimeMillis() - start) + " ms");
        }
    }

    I'm using Java 1.4.2 with the "-server" parameter (but this doesn't
    matter) and Windows 2000 as the OS.
    Time comparison:
    PIII/1000: 5 seconds
    2xXeon/1800: 14 seconds
    PIV/3000: 2 seconds

    All other performance tests on the Xeon machine (SiSoft Sandra) run
    well.
    Can anybody please give me a hint as to what's going wrong here? All the
    time is lost in the "result.append(ch)" line. But why is the Xeon so
    slow?
    It would also be great if anybody could run the code on any other
    Intel Xeon machine ;-)

    Many thanks for your help!

    Regards
    Michael
    Michael Kreitmann, May 24, 2004
    #1

  2. Roedy Green

    On 24 May 2004 01:52:37 -0700, (Michael Kreitmann)
    wrote or quoted :

    >Can anybody please give me a hint, what's going wrong here ? All the
    >time is lost in the "result.append(ch)" line. But why is the Xeon so
    >slow ?


    One possibility is that the Xeon is not particularly good at character
    addressing. It has to read, poke, and write to store a byte. So do other
    processors, but maybe it is more optimised for big chunks.

    Try a similar benchmark that works with 16-, 32- and 64-bit quantities.
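
    A minimal sketch of such a benchmark, assuming plain array stores are a fair
    stand-in for the append loop. Java's char is 16 bits wide, int is 32 and long
    is 64. The class name WidthBenchmark, the buffer size and the round count are
    hypothetical choices, not something from the original test.

```java
/** Hypothetical micro-benchmark: the same store loop over 16-, 32- and 64-bit elements. */
final class WidthBenchmark {
    static final int SIZE = 72; // assumed buffer length, about the size of the test string

    // 16-bit stores
    static long fillChars(int rounds) {
        char[] a = new char[SIZE];
        long sum = 0;
        for (int r = 0; r < rounds; r++) {
            for (int i = 0; i < SIZE; i++) {
                a[i] = (char) (i + r);
            }
            sum += a[r % SIZE]; // read one element back so the result depends on the stores
        }
        return sum;
    }

    // 32-bit stores
    static long fillInts(int rounds) {
        int[] a = new int[SIZE];
        long sum = 0;
        for (int r = 0; r < rounds; r++) {
            for (int i = 0; i < SIZE; i++) {
                a[i] = i + r;
            }
            sum += a[r % SIZE];
        }
        return sum;
    }

    // 64-bit stores
    static long fillLongs(int rounds) {
        long[] a = new long[SIZE];
        long sum = 0;
        for (int r = 0; r < rounds; r++) {
            for (int i = 0; i < SIZE; i++) {
                a[i] = i + r;
            }
            sum += a[r % SIZE];
        }
        return sum;
    }

    public static void main(String[] args) {
        int rounds = 1000000;
        long t0 = System.currentTimeMillis();
        long c = fillChars(rounds);
        long t1 = System.currentTimeMillis();
        long n = fillInts(rounds);
        long t2 = System.currentTimeMillis();
        long l = fillLongs(rounds);
        long t3 = System.currentTimeMillis();
        System.out.println("16-bit: " + (t1 - t0) + " ms (checksum " + c + ")");
        System.out.println("32-bit: " + (t2 - t1) + " ms (checksum " + n + ")");
        System.out.println("64-bit: " + (t3 - t2) + " ms (checksum " + l + ")");
    }
}
```

    If the 16-bit loop is not noticeably slower than the other two on the Xeon,
    character addressing is probably not the explanation.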

    --
    Canadian Mind Products, Roedy Green.
    Coaching, problem solving, economical contract programming.
    See http://mindprod.com/jgloss/jgloss.html for The Java Glossary.
    Roedy Green, May 24, 2004
    #2

  3. Nigel Wade

    If I were to speculate, I'd say it is most likely due to the system being
    dual-processor. It might be due to some internal synchronization within
    StringBuffer which slows the code down considerably on multi-processor machines.

    Just so you know it's not a Windows problem, I see exactly the same
    difference between a 2.8GHz P4 and a dual 2.4GHz Xeon running RedHat FC1.
    On the single processor the code runs in 2300-2350ms; on the dual Xeon it
    runs in ~11000ms. I also notice that while the code is running on the
    dual-processor system, both CPUs show around 30-50% utilization,
    indicating an inability to reach 100%.

    It would be interesting to see how the code executes on a single-processor Xeon.


    --
    Nigel Wade, System Administrator, Space Plasma Physics Group,
    University of Leicester, Leicester, LE1 7RH, UK
    E-mail :
    Phone : +44 (0)116 2523548, Fax : +44 (0)116 2523555
    Nigel Wade, May 24, 2004
    #3
  4. Dave

    > I have a strange performance problem with the following (senseless)
    > code:
    >
    > I'm using Java 1.4.2 with the "-server" parameter (but this doesn't
    > matter) and Windows 2000 as the OS.
    > Time comparison:
    > PIII/1000: 5 seconds
    > 2xXeon/1800: 14 seconds
    > PIV/3000: 2 seconds
    >
    > All other performance tests on the Xeon machine (SiSoft Sandra) run
    > well.

    Hello Michael,

    When you say all other performance tests are running OK, do you mean all
    other Java performance tests? Or non-Java tests?

    Which JVM?
    Is compiler optimization on?
    Is JIT on?

    If this is the only Java test, and you are not using the JIT option and
    not compiling with optimization on, that might account for the
    difference...

    HTH

    --Dave
    Dave, May 24, 2004
    #4
  5. Nigel Wade <> wrote in message news:<c8skc3$j1o$>...
    >
    > If I were to speculate, I'd say it is most likely due to the system being
    > dual-processor. It might be due to some internal synchronization within
    > StringBuffer which slows the code down considerably on multi-processor machines.
    >
    > Just so you know it's not a Windows problem, I see exactly the same
    > difference between a 2.8GHz P4 and a dual 2.4GHz Xeon running RedHat FC1.
    > On the single processor the code runs in 2300-2350ms; on the dual Xeon it
    > runs in ~11000ms. I also notice that while the code is running on the
    > dual-processor system, both CPUs show around 30-50% utilization,
    > indicating an inability to reach 100%.
    >
    > It would be interesting to see how the code executes on a single-processor Xeon.


    First of all: Thanks for your answers!
    I've done another test on my P4 with activated Hyperthreading and ...
    the performance is very bad (9s <> 2s without HT).
    (The test on the 1xXeon/1800 will hopefully come tomorrow!)

    So: does Java (?) have a problem running on dual-processor machines?
    Can this be right? I can't believe it, but what can I do?

    Regards
    Michael
    Michael Kreitmann, May 24, 2004
    #5
  6. Michael Kreitmann wrote:
    > First of all: Thanks for your answers!
    > I've done another test on my P4 with activated Hyperthreading and ...
    > the performance is very bad (9s <> 2s without HT).
    > (The test on the 1xXeon/1800 will hopefully come tomorrow!)


    If you have Java 1.5 available, try replacing StringBuffer with
    StringBuilder. The latter is not synchronized, so it should not run into the
    same problem.
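
    A sketch of that change, assuming a Java 1.5 or later runtime. The class name
    XeonTestBuilder and the lower-case method name are hypothetical; the loop and
    the test string are taken from the original post.

```java
/** Sketch: the thread's copy loop with StringBuilder instead of StringBuffer. */
final class XeonTestBuilder {
    // Same character-by-character copy, but StringBuilder.append() takes no lock.
    static String xmlEncode(String text) {
        int len = text.length();
        StringBuilder result = new StringBuilder(len);
        for (int i = 0; i < len; ++i) {
            result.append(text.charAt(i));
        }
        return result.toString();
    }

    public static void main(String[] args) {
        String s = "The brown fox jumps over the bridge. The brown fox jumps over the bridge";
        long start = System.currentTimeMillis();
        String res = "";
        for (int i = 0; i < 1000000; ++i) {
            res = xmlEncode(s);
        }
        long elapsed = System.currentTimeMillis() - start;
        System.out.println("Runtime:" + elapsed + " ms (" + res.length() + " chars copied per call)");
    }
}
```

    Because the StringBuilder never leaves the method, no other thread can see
    it, so dropping the synchronization is safe here.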

    /Thomas
    Thomas Weidenfeller, May 24, 2004
    #6
  7. On Mon, 24 May 2004 17:00:16 +0200, Thomas Weidenfeller wrote:

    > Michael Kreitmann wrote:
    >> First of all: Thanks for your answers!
    >> I've done another test on my P4 with activated Hyperthreading and ...
    >> the performance is very bad (9s <> 2s without HT).
    >> (The test on the 1xXeon/1800 will hopefully come tomorrow!)

    >
    > If you have Java 1.5 available, try replacing StringBuffer with
    > StringBuilder. The latter is not synchronized, so it should not run into the
    > same problem.


    Another thing to try on a pre-1.5 Java is to put the whole loop inside a
    synchronized block that locks on the StringBuffer.
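
    Roughly like this (a sketch only; the class name XeonTestCoarseLock is
    hypothetical). Java monitors are reentrant, so each append() inside the block
    re-enters a lock the thread already holds. Whether that is cheaper than
    acquiring the lock afresh on every call depends on the JVM, so it needs to be
    measured rather than assumed.

```java
/** Sketch: take the StringBuffer's lock once around the whole loop. */
final class XeonTestCoarseLock {
    static String xmlEncode(String text) {
        int len = text.length();
        StringBuffer result = new StringBuffer(len);
        // Hold the buffer's monitor for the entire loop; the synchronized
        // append() calls below then only re-enter an already-held lock.
        synchronized (result) {
            for (int i = 0; i < len; ++i) {
                result.append(text.charAt(i));
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        String s = "The brown fox jumps over the bridge. The brown fox jumps over the bridge";
        long start = System.currentTimeMillis();
        String res = "";
        for (int i = 0; i < 1000000; ++i) {
            res = xmlEncode(s);
        }
        long elapsed = System.currentTimeMillis() - start;
        System.out.println("Runtime:" + elapsed + " ms (" + res.length() + " chars copied per call)");
    }
}
```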
    Timo Kinnunen, May 24, 2004
    #7
  8. Juha Laiho

    Nigel Wade <> said:
    >Michael Kreitmann wrote:
    >> I have a strange performance problem with the following (senseless)
    >> code:

    ....
    >> I'm using Java 1.4.2 with the "-server" parameter (but this doesn't
    >> matter) and Windows 2000 as the OS.
    >> Time comparison:
    >> PIII/1000: 5 seconds
    >> 2xXeon/1800: 14 seconds
    >> PIV/3000: 2 seconds

    ....
    >Just so you know it's not a Windows problem, I see exactly the same
    >difference between a 2.8GHz P4 and a dual 2.4GHz Xeon running RedHat FC1.
    >On the single processor the code runs in 2300-2350ms; on the dual Xeon it
    >runs in ~11000ms. I also notice that while the code is running on the
    >dual-processor system, both CPUs show around 30-50% utilization,
    >indicating an inability to reach 100%.


    Hmm.. As I see the code, it's a single-threaded app, so should ideally
    take 100% of the processing power of one CPU. If it takes 50% of power
    on 2 CPUs, this is somewhat equal, minus the frequent need to clear
    CPU caches and synchronise CPU states, but that is an inefficiency of
    the OS scheduler (not able to bind a CPU-looping process to a single CPU).
    --
    Wolf a.k.a. Juha Laiho Espoo, Finland
    (GC 3.0) GIT d- s+: a C++ ULSH++++$ P++@ L+++ E- W+$@ N++ !K w !O !M V
    PS(+) PE Y+ PGP(+) t- 5 !X R !tv b+ !DI D G e+ h---- r+++ y++++
    "...cancel my subscription to the resurrection!" (Jim Morrison)
    Juha Laiho, May 24, 2004
    #8
  9. Roedy Green

    On 24 May 2004 07:50:46 -0700, (Michael Kreitmann)
    wrote or quoted :

    >
    >So: Java (?) does have a problem running on dual-processor machines ?
    >Can this be right ? I can't believe, but what can I do ??


    You are running a single thread, so the extra processor is not likely
    the problem. If StringBuffer is thread safe, however, it could be the
    extra overhead of thread safety on the Xeon.

    Replace the code with a dummy non-thread-safe string buffer that just
    does

    void append ( char c )
    {
    array[ i++ ] = c;
    }

    and see if you still see the anomaly. If you do, the culprit is likely
    byte addressing. If you don't, it is likely sync overhead. You can
    then experiment further, adding more code from the real StringBuffer.
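
    Fleshed out, the dummy buffer might look like the following sketch. The
    class name DummyBuffer and the fixed capacity are assumptions; it
    deliberately has no synchronization and no growth logic.

```java
/** Sketch of a dummy, non-thread-safe character buffer for the experiment. */
final class DummyBuffer {
    private final char[] array;
    private int i; // index of the next free slot

    DummyBuffer(int capacity) {
        array = new char[capacity];
    }

    // Plain array store: no lock, no bounds growth.
    void append(char c) {
        array[i++] = c;
    }

    public String toString() {
        return new String(array, 0, i);
    }

    // Same copy loop as the thread's test method, using the dummy buffer.
    static String xmlEncode(String text) {
        int len = text.length();
        DummyBuffer result = new DummyBuffer(len);
        for (int k = 0; k < len; ++k) {
            result.append(text.charAt(k));
        }
        return result.toString();
    }

    public static void main(String[] args) {
        String s = "The brown fox jumps over the bridge. The brown fox jumps over the bridge";
        long start = System.currentTimeMillis();
        String res = "";
        for (int n = 0; n < 1000000; ++n) {
            res = xmlEncode(s);
        }
        long elapsed = System.currentTimeMillis() - start;
        System.out.println("Runtime:" + elapsed + " ms (" + res.length() + " chars copied per call)");
    }
}
```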

    StringBuffers that are not initialised to a decent starting size are
    inefficient the way they gradually grow, leaving behind droppings of
    discarded incarnations.

    --
    Canadian Mind Products, Roedy Green.
    Coaching, problem solving, economical contract programming.
    See http://mindprod.com/jgloss/jgloss.html for The Java Glossary.
    Roedy Green, May 24, 2004
    #9
  10. Roedy Green

    On Mon, 24 May 2004 17:42:03 GMT, Juha Laiho <> wrote
    or quoted :

    >Hmm.. As I see the code, it's a single-threaded app, so should ideally
    >take 100% of the processing power of one CPU. If it takes 50% of power
    >on 2 CPUs, this is somewhat equal, minus the frequent need to clear
    >CPU caches and synchronise CPU states, but that is an inefficiency of
    >the OS scheduler (not able to bind a CPU-looping process to a single CPU).


    That's reasonable, so long as it is giving decently long time
    slices. Even on a single CPU, you would get that same overhead, since
    the OS still hands out short time slices to CPU-bound tasks.


    Binding to a single CPU would in general give worse performance.

    --
    Canadian Mind Products, Roedy Green.
    Coaching, problem solving, economical contract programming.
    See http://mindprod.com/jgloss/jgloss.html for The Java Glossary.
    Roedy Green, May 24, 2004
    #10
  11. Dave <> wrote in message news:<>...
    > Hello Michael,
    >
    > When you say all other performance tests are running OK, do you mean all
    > other Java performance tests? Or non-Java tests?
    >
    > Which JVM?
    > Is compiler optimization on?
    > Is JIT on?
    >
    > If this is the only Java test, and you are not using the JIT option and
    > not compiling with optimization on, that might account for the
    > difference...
    >
    > HTH
    >
    > --Dave


    Hello Dave,

    I'm using Sun Java SDK 1.4.2 (1.4.2_04). The source is compiled with
    javac -O.
    The performance tests I mean are the tests included in SiSoft Sandra
    2004 (memory bandwidth, CPU and so on).
    JIT? Hmm, how can I switch it on/off? (Is this a very silly question?)
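
    For reference, a hedged sketch of one way to check: on Sun's HotSpot JVM the
    JIT compiler is on by default, and the -Xint launcher option runs the
    program in interpreter-only mode. The class below (JvmInfo is a hypothetical
    name) prints three system properties. Note that java.vm.info is
    HotSpot-specific and may be null on other JVMs, and the for-each loop needs
    Java 1.5 or later.

```java
/** Sketch: print which JVM is running and, on HotSpot, its execution mode. */
final class JvmInfo {
    static String describe() {
        // java.vm.name and java.vm.version are standard properties;
        // java.vm.info is an assumption that holds on HotSpot-based JVMs.
        String[] keys = { "java.vm.name", "java.vm.version", "java.vm.info" };
        StringBuilder sb = new StringBuilder();
        for (String key : keys) {
            sb.append(key).append(" = ").append(System.getProperty(key)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe());
    }
}
```

    Running it once as "java JvmInfo" and once as "java -Xint JvmInfo" should
    show the difference in the java.vm.info line on a HotSpot JVM. Timing the
    benchmark itself with -Xint gives an idea of how much the JIT contributes.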

    Thanks for your advice!
    Regards
    Michael
    Michael Kreitmann, May 24, 2004
    #11
  12. Marc Slemko

    In article <>, Michael Kreitmann wrote:

    > First of all: Thanks for your answers!
    > I've done another test on my P4 with activated Hyperthreading and ...
    > the performance is very bad (9s <> 2s without HT).
    > (The test on the 1xXeon/1800 will hopefully come tomorrow!)
    >
    > So: Java (?) does have a problem running on dual-processor machines ?
    > Can this be right ? I can't believe, but what can I do ??


    Do you have to do anything?

    Do you have reason to think this is a performance issue in your
    code caused by this?

    What your code is doing is essentially 35 million synchronized
    method calls per second. Yes, synchronization is typically more
    expensive on a multiproc system.
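
    A rough check of that figure, assuming the test string is the 72-character
    literal from the original post (with the wrapped line joined by a single
    space) and taking the roughly 2-second run reported for the P4:

```latex
\frac{10^{6}\ \text{calls to XMLEncode} \times 72\ \text{appends per call}}{2\ \text{s}}
  = 3.6 \times 10^{7}\ \text{appends per second}
```

    That is in line with the 35 million quoted above. On the dual Xeon, at 14
    seconds, the same work comes to about 5 million appends per second.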

    With some JVMs on some systems, adding a synchronized(result) block
    around the whole result.append() loop could reduce this overhead;
    with other JVMs it may hurt.

    If you need this case to perform well, you need something like the
    StringBuilder class in 1.5 that isn't synchronized. The typical
    app, however, isn't going to be doing 35 million StringBuffer.append()
    calls per second, so it becomes a very trivial difference in
    performance.

    I can reproduce the behaviour you describe. If I change to a
    StringBuilder, then a dual 2.4 GHz Xeon is slightly faster than a
    single 2.4 GHz P4, which is pretty much exactly what is expected.

    There are all sorts of interesting differences... why the 1.5 beta JVM
    is slower on a single proc system ... why the IBM 1.4 JDK is slower if
    you add the synchronized block I mention ... but the bottom line
    is that there are very few applications where it matters. For some
    it does, which is the reason why the unsynchronized StringBuilder
    was added to 1.5.

    Performance is important, yes, but in context.
    Marc Slemko, May 25, 2004
    #12