performance difference between OS X and Windows


Brian

I have been a Mac and Linux guy since 1998 and, except for a handful of
times, have not touched an MS box since then. This changed a few days
ago when I needed to get an MS box for another project. This leads me
to my question...

A while ago, I borrowed a Python script from someone's blog that showed
some interesting profiling schemes. I made some modifications to it,
ran it, and forgot about it until I got this new machine. Just for
kicks I ran it again on both boxes to see what the differences were.
To say I was surprised is an understatement. Here is the code:

#!/usr/bin/env python

import time

def slowS():
    # Repeated concatenation: every pass builds a brand-new string.
    t1 = time.clock()
    slow_str = ''
    for i in range(100000):
        slow_str = slow_str + str(i)
    print 'Slow concatenation finished in', time.clock() - t1, 'sec.'

def fastS():
    # Collect the pieces in a list and join once at the end.
    t2 = time.clock()
    fast_str = []
    for i in range(100000):
        fast_str.append(str(i))
    fast_str = ''.join(fast_str)
    print 'fast concatenation finished in', time.clock() - t2, 'sec.'

#slowS()
#fastS()

if __name__ == '__main__':
    import hotshot
    from hotshot import stats
    prof = hotshot.Profile("Concat_Stats")
    prof.runcall(slowS)
    prof.runcall(fastS)
    prof.close()
    s = stats.load("Concat_Stats")
    s.sort_stats("time").print_stats()

On my Quad G5 Mac with 4 GB of memory I get this result...
Slow concatenation finished in 51.05 sec.
fast concatenation finished in 0.63 sec.
15 function calls in 51.036 CPU seconds

Ordered by: internal time

ncalls tottime percall cumtime percall filename:lineno(function)
1 51.033 51.033 51.033 51.033 /Users/brian/Desktop/strTest.py:7(slowS)
1 0.003 0.003 0.003 0.003 /Users/brian/Desktop/strTest.py:14(fastS)
12 0.000 0.000 0.000 0.000 /Applications/Komodo.app/Contents/SharedSupport/dbgp/bin/pydbgp:83(write)
1 0.000 0.000 0.000 0.000 /Applications/Komodo.app/Contents/SharedSupport/dbgp/bin/pydbgp:95(__getattr__)
0 0.000 0.000 profile:0(profiler)

The MS box is an HP with an Athlon 64 X2 dual-core and 1 GB of memory. Here
are its results...
Slow concatenation finished in 23.798 sec
fast concatenation finished in 0.622 sec
15 function calls in 87.417 CPU seconds
Ordered by: internal time

ncalls tottime percall cumtime percall filename:lineno(function)
1 85.188 85.188 85.188 85.188 c:\documents and settings\hp_administrator\desktop\strtest.py:5(slowS)
1 2.228 2.228 2.229 2.229 c:\documents and settings\hp_administrator\desktop\strtest.py:12(fastS)
12 0.001 0.000 0.001 0.000 c:\program files\activestate komodo 3.5\lib\support\dbgp\bin\pydbgp.py:83(write)
1 0.000 0.000 0.000 0.000 c:\program files\activestate komodo 3.5\lib\support\dbgp\bin\pydbgp.py:95(__getattr__)
0 0.000 0.000 profile:0(profiler)

While the CPU speed on the G5 was faster, the total execution time was
much quicker on the MS box. Can anyone give some suggestions as to why
this is?
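
For what it's worth, the same comparison could also be cross-checked with the
standard timeit module, which keeps the profiler's instrumentation overhead
out of the numbers; a minimal sketch:

import timeit

# Each statement builds the same 100000-piece string as the script above.
slow = timeit.Timer(
    "s = ''\n"
    "for i in range(100000):\n"
    "    s = s + str(i)")
fast = timeit.Timer(
    "parts = []\n"
    "for i in range(100000):\n"
    "    parts.append(str(i))\n"
    "s = ''.join(parts)")

# number=1 keeps each figure directly comparable to one call of
# slowS()/fastS(); repeat a few times and take the best.
print 'slow:', min(slow.repeat(repeat=3, number=1)), 'sec.'
print 'fast:', min(fast.repeat(repeat=3, number=1)), 'sec.'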

Thanks,
Brian
 

Avizoa

Brian said:
I have been a Mac and Linux guy since 1998 and, except for a handful of
times, have not touched an MS box since then. This changed a few days
ago when I needed to get an MS box for another project. This leads me
to my question...

A while ago, I borrowed a Python script from someone's blog that showed
some interesting profiling schemes. I made some modifications to it,
ran it, and forgot about it until I got this new machine. Just for
kicks I ran it again on both boxes to see what the differences were.
To say I was surprised is an understatement. Here is the code:
*snip*


While the CPU speed on the G5 was faster, the total execution time was
much quicker on the MS box. Can anyone give some suggestions as to why
this is?

Thanks,
Brian

1. Your test only makes use of one core. So it's a comparison between a
G5 and an Athlon 64. Which brings us to...

2. The K8 architecture is, in most operations, far superior to the G5
architecture, Apple's benchmarks notwithstanding.

You see, the way IBM cut down its POWER4 to make the PPC970 left it a
bit weak in heavy-lifting ability. The K8, however, is heir to the
amazing Alpha architecture's legacy.

While the G5 has no problems beating out the marketing-driven P4
architecture, it can't compete clock for clock with the K8.
 

Brian

Thanks for the response. I was unaware that the G5 was only using one
core. Can I ask why that is, and if there is a way to take advantage
of all 4 within python?

Thanks,
Brian
 

Avizoa

Brian said:
Can I ask why that is, and if there is a way to take advantage
of all 4 within python?

Sure. All major programming languages handle concurrency in one of two
ways: multithreading or parallel processes. In the standard Python
interpreter, there is a Global Interpreter Lock (GIL for short) that
only allows one thread to run at a time. So your options for Python are
limited to running multiple processes or using a different interpreter.

What you are testing, however, is not easily parallelizable, since the
slow concat in particular depends on building up its result sequentially.
If you want to test all four cores you'll have to run the test multiple
times in separate processes to really stress them.
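
For example, a crude way to keep all the cores busy from Python 2.4 is to
launch several copies of the test script as separate processes; a rough
sketch (assuming the script is saved as strTest.py, as in the Mac path above):

import subprocess
import sys

# The GIL only serializes threads inside one interpreter; separate
# processes each get their own interpreter and can land on separate cores.
procs = [subprocess.Popen([sys.executable, 'strTest.py']) for i in range(4)]
for p in procs:
    p.wait()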
 

Brian

Thank you for your answer. I had a feeling that it would be a
threading issue, but I wasn't sure.

Brian
 

Brian

As one additional question, can someone point me in the direction of a
resource that would explain how I could use Python to tell me what core
is actually handling the process? I am not even sure if something like
this exists, but it would be an interesting feature to explore.

Thanks,
Brian
 

Scott David Daniels

Brian said:
As one additional question, can someone point me in the direction of a
resource that would explain how I could use Python to tell me what core
is actually handling the process? I am not even sure if something like
this exists, but it would be an interesting feature to explore.

Since the GIL is released from time to time (on any system call that may
wait, for example), the process runs on a single core at any one instant,
but not necessarily on the same core over time.
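
As far as I know, the standard library doesn't expose which core that is.
On Linux (not OS X or Windows), one rough sketch is to read the "CPU last
executed on" field out of /proc/self/stat, as documented in proc(5):

# Linux-only sketch; OS X has no /proc equivalent for this.
def current_cpu():
    data = open('/proc/self/stat').read()
    # The second field (the command name) can contain spaces, so split
    # after its closing parenthesis and count fields from there.
    fields = data.rsplit(')', 1)[1].split()
    return int(fields[36])   # field 39 in proc(5)'s stat layout

print 'last ran on CPU', current_cpu()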

--Scott David Daniels
 

dfj225

Hi Brian,

You may have already considered this, but since I didn't see it
mentioned in your post, I'll reply anyway.

I believe the Python binaries that Apple includes with OS X are always
slightly behind the latest that you can get from the official sources.
I'm not in front of my Mac right now, so I can't tell you the exact disparity.

One thing I would suggest would be to normalize the versions of Python
across the two machines. I find that using Fink
(http://fink.sourceforge.net/) on OS X is the easiest way to install a
new version of Python (as well as much other open source software).

There are often speed improvements between older and newer Python
releases, so this might be the source of the execution time
differences.
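
A quick way to pin down exactly what each box is running is to print the
version and build details from both interpreters before comparing numbers;
a trivial sketch:

import sys
import platform

# Enough detail to tell an Apple-supplied Python from a python.org or
# Fink build, and to compare the two machines.
print sys.version
print platform.platform()
print platform.python_compiler()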

cheers,
~doug

 

Brian

dfj225 said:
Hi Brian,

You may have already considered this, but since I didn't see it
mentioned in your post, I'll reply anyway.

I believe the Python binaries that Apple includes with OS X are always
slightly behind the latest that you can get from the official sources.
I'm not in front of my Mac right now, so I can't tell you the exact disparity.

You are right. Apple is quite far behind. I upgraded to 2.4.2 from
2.3.x. The MS box has the same version.

One thing I would suggest would be to normalize the versions of Python
across the two machines. I find that using Fink
(http://fink.sourceforge.net/) on OS X is the easiest way to install a
new version of Python (as well as much other open source software).

I have explored fink but have not put it to use. Thanks for the tip.

Brian
 

Alex Martelli

Brian said:
You are right. Apple is quite far behind. I upgraded to 2.4.2 from
2.3.x. The MS box has the same version.

There's an excellent Universal version of 2.4.3 for MacOSX out on
python.org and I suggest you get it.

BTW, as I recently posted to rec.games.bridge and in more detail to
it.comp.macintosh (in Italian), these days in my spare time I'm porting
a library originally coded for Windows by Bo Haglund (whom I thank for
giving me the sources, albeit under NDA), which does double-dummy
analysis of bridge hands, to run as a Python extension under MacOSX and
Linux (see http://www.aleax.it/Bridge ). Elapsed time per deal for
hundreds of thousands of deals in a particularly difficult class
(totally flat hands of middling strength playing at NT) is about:

1.26 seconds iBook G4 12" (1.33 GHz)
0.88 seconds Pentium 4 3.20 GHz (on Linux -- gcc 3.2)
0.80 seconds Powermac G5 dual 1.8 GHz
0.65 seconds Macbook Pro 2.0 GHz

all w/Python 2.4.3, all save the Pentium w/gcc 4 and MacOSX 10.4, all
times for using a single core/processor (I just run two processes when I
want to max out BOTH cores/processors -- the analysis, which is a very
sophisticated version of alpha-beta-pruning tree-search, easily takes 99
to 100% of CPU time with little memory, disk or other I/O use).


Alex
 
