performance difference between OS X and Windows


Brian

I have been a Mac and Linux guy since 1998 and, except for a handful of
times, have not touched an MS box since then. This changed a few days
ago when I needed to get an MS box for another project. This leads me
to my question...

A while ago, I borrowed a Python script from someone's blog that showed
some interesting profiling schemes. I made some modifications to it,
ran it, and forgot about it until I got this new machine. Just for
kicks I ran it again on both boxes to see what the differences were.
To say I was surprised is an understatement. Here is the code:

#!/usr/bin/env python

import time

def slowS():
    # Repeated concatenation: every pass builds a brand-new string.
    t1 = time.clock()
    slow_str = ''
    for i in range(100000):
        slow_str = slow_str + str(i)
    print 'Slow concatenation finished in', time.clock() - t1, 'sec.'

def fastS():
    # Collect the pieces in a list and join once at the end.
    t2 = time.clock()
    fast_str = []
    for i in range(100000):
        fast_str.append(str(i))
    fast_str = ''.join(fast_str)
    print 'fast concatenation finished in', time.clock() - t2, 'sec.'

#slowS()
#fastS()

if __name__ == '__main__':
    import hotshot
    from hotshot import stats
    prof = hotshot.Profile("Concat_Stats")
    prof.runcall(slowS)
    prof.runcall(fastS)
    prof.close()
    s = stats.load("Concat_Stats")
    s.sort_stats("time").print_stats()

On my Quad G5 Mac with 4 GB of memory I get this result...
Slow concatenation finished in 51.05 sec.
fast concatenation finished in 0.63 sec.
15 function calls in 51.036 CPU seconds

Ordered by: internal time

ncalls tottime percall cumtime percall filename:lineno(function)
1 51.033 51.033 51.033 51.033 /Users/brian/Desktop/strTest.py:7(slowS)
1 0.003 0.003 0.003 0.003 /Users/brian/Desktop/strTest.py:14(fastS)
12 0.000 0.000 0.000 0.000 /Applications/Komodo.app/Contents/SharedSupport/dbgp/bin/pydbgp:83(write)
1 0.000 0.000 0.000 0.000 /Applications/Komodo.app/Contents/SharedSupport/dbgp/bin/pydbgp:95(__getattr__)
0 0.000 0.000 profile:0(profiler)

The MS box is an HP with an Athlon 64 X2 dual-core and 1 GB of memory. Here
are its results...
Slow concatenation finished in 23.798 sec
fast concatenation finished in 0.622 sec
15 function calls in 87.417 CPU seconds
Ordered by: internal time

ncalls tottime percall cumtime percall filename:lineno(function)
1 85.188 85.188 85.188 85.188 c:\documents and settings\hp_administrator\desktop\strtest.py:5(slowS)
1 2.228 2.228 2.229 2.229 c:\documents and settings\hp_administrator\desktop\strtest.py:12(fastS)
12 0.001 0.000 0.001 0.000 c:\program files\activestate komodo 3.5\lib\support\dbgp\bin\pydbgp.py:83(write)
1 0.000 0.000 0.000 0.000 c:\program files\activestate komodo 3.5\lib\support\dbgp\bin\pydbgp.py:95(__getattr__)
0 0.000 0.000 profile:0(profiler)

While the CPU speed on the G5 was faster, the total execution time was
much quicker on the MS box. Can anyone give some suggestions as to why
this is?
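
For what it's worth, the same comparison could also be cross-checked with the
standard timeit module, which keeps the profiler's instrumentation overhead
out of the numbers; a minimal sketch:

import timeit

# Each statement builds the same 100000-piece string as the script above.
slow = timeit.Timer(
    "s = ''\n"
    "for i in range(100000):\n"
    "    s = s + str(i)")
fast = timeit.Timer(
    "parts = []\n"
    "for i in range(100000):\n"
    "    parts.append(str(i))\n"
    "s = ''.join(parts)")

# number=1 keeps each figure directly comparable to one call of
# slowS()/fastS(); repeat a few times and take the best.
print 'slow:', min(slow.repeat(repeat=3, number=1)), 'sec.'
print 'fast:', min(fast.repeat(repeat=3, number=1)), 'sec.'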

Thanks,
Brian
 

Avizoa

Brian said:
I have been a Mac and Linux guy since 1998 and, except for a handful of
times, have not touched an MS box since then. This changed a few days
ago when I needed to get an MS box for another project. This leads me
to my question...

A while ago, I borrowed a Python script from someone's blog that showed
some interesting profiling schemes. I made some modifications to it,
ran it, and forgot about it until I got this new machine. Just for
kicks I ran it again on both boxes to see what the differences were.
To say I was surprised is an understatement. Here is the code:
*snip*


While the CPU speed on the G5 was faster, the total execution time was
much quicker on the MS box. Can anyone give some suggestions as to why
this is?

Thanks,
Brian

1. Your test only makes use of one core. So it's a comparison between a
G5 and an Athlon 64. Which brings us to...

2. The K8 architecture is, in most operations, far superior to the G5
architecture, Apple's benchmarks notwithstanding.

You see, the way IBM cut down its POWER4 to make the PPC970 left it a
bit weak in heavy-lifting ability. The K8, however, is heir to the
amazing Alpha architecture's legacy.

While the G5 has no problems beating out the marketing-driven P4
architecture, it can't compete clock for clock with the K8.
 

Brian

Thanks for the response. I was unaware that the G5 was only using one
core. Can I ask why that is, and if there is a way to take advantage
of all 4 within python?

Thanks,
Brian
 

Avizoa

Brian said:
Can I ask why that is, and if there is a way to take advantage
of all 4 within python?

Sure. All major programming languages handle concurrency in one of two
ways: multithreading or parallel processes. In the standard Python
interpreter, there is a Global Interpreter Lock (GIL for short) that
only allows one thread to run at a time. So your options for Python are
limited to running multiple processes or using a different interpreter.

What you are testing, however, is not easily parallelizable, since the
slow concat in particular depends on building up its result sequentially.
If you want to test all four cores you'll have to run the test multiple
times in separate processes to really stress them.
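
For example, a crude way to keep all the cores busy from Python 2.4 is to
launch several copies of the test script as separate processes; a rough
sketch (assuming the script is saved as strTest.py, as in the Mac path above):

import subprocess
import sys

# The GIL only serializes threads inside one interpreter; separate
# processes each get their own interpreter and can land on separate cores.
procs = [subprocess.Popen([sys.executable, 'strTest.py']) for i in range(4)]
for p in procs:
    p.wait()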
 

Brian

Thank you for your answer. I had a feeling that it would be a
threading issue, but I wasn't sure.

Brian
 

Brian

As one additional question, can someone point me in the direction of a
resource that would explain how I could use Python to tell me what core
is actually handling the process? I am not even sure if something like
this exists, but it would be an interesting feature to explore.

Thanks,
Brian
 

Scott David Daniels

Brian said:
As one additional question, can someone point me in the direction of a
resource that would explain how I could use Python to tell me what core
is actually handling the process? I am not even sure if something like
this exists, but it would be an interesting feature to explore.

Since the GIL is released from time to time (on any system call that may
wait, for example), the process runs on a single core at any one instant,
but not necessarily on the same core over time.
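
As far as I know, the standard library doesn't expose which core that is.
On Linux (not OS X or Windows), one rough sketch is to read the "CPU last
executed on" field out of /proc/self/stat, as documented in proc(5):

# Linux-only sketch; OS X has no /proc equivalent for this.
def current_cpu():
    data = open('/proc/self/stat').read()
    # The second field (the command name) can contain spaces, so split
    # after its closing parenthesis and count fields from there.
    fields = data.rsplit(')', 1)[1].split()
    return int(fields[36])   # field 39 in proc(5)'s stat layout

print 'last ran on CPU', current_cpu()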

--Scott David Daniels
 

dfj225

Hi Brian,

You may have already considered this, but since I didn't see it
mentioned in your post, I'll reply anyway.

I believe the Python binaries that Apple includes with OS X are always
slightly behind the latest that you can get from the official sources.
I'm not in front of my Mac right now, so I can't tell you the exact disparity.

One thing I would suggest would be to normalize the versions of Python
across the two machines. I find that using Fink
(http://fink.sourceforge.net/) on OS X is the easiest way to install a
new version of Python (as well as much other open source software).

There are often speed improvements between older and newer Python
releases, so this might be the source of the execution time
differences.
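
A quick way to pin down exactly what each box is running is to print the
version and build details from both interpreters before comparing numbers;
a trivial sketch:

import sys
import platform

# Enough detail to tell an Apple-supplied Python from a python.org or
# Fink build, and to compare the two machines.
print sys.version
print platform.platform()
print platform.python_compiler()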

cheers,
~doug

 

Brian

dfj225 said:
Hi Brian,

You may have already considered this, but since I didn't see it
mentioned in your post, I'll reply anyway.

I believe the Python binaries that Apple includes with OS X are always
slightly behind the latest that you can get from the official sources.
I'm not in front of my Mac right now, so I can't tell you the exact disparity.

You are right. Apple is quite far behind. I upgraded to 2.4.2 from
2.3.x. The MS box has the same version.

One thing I would suggest would be to normalize the versions of Python
across the two machines. I find that using Fink
(http://fink.sourceforge.net/) on OS X is the easiest way to install a
new version of Python (as well as much other open source software).

I have explored fink but have not put it to use. Thanks for the tip.

Brian
 

Alex Martelli

Brian said:
You are right. Apple is quite far behind. I upgraded to 2.4.2 from
2.3.x. The MS box has the same version.

There's an excellent Universal version of 2.4.3 for MacOSX out on
python.org and I suggest you get it.

BTW, as I recently posted to rec.games.bridge and in more detail to
it.comp.macintosh (in Italian), these days in my spare time I'm porting
a library originally coded for Windows by Bo Haglund (whom I thank for
giving me the sources, albeit under NDA), which does double-dummy
analysis of bridge hands, to run as a Python extension under MacOSX and
Linux (see http://www.aleax.it/Bridge ). Elapsed time per deal for
hundreds of thousands of deals in a particularly difficult class
(totally flat hands of middling strength playing at NT) is about:

1.26 seconds iBook G4 12" (1.33 GHz)
0.88 seconds Pentium 4 3.20 GHz (on Linux -- gcc 3.2)
0.80 seconds Powermac G5 dual 1.8 GHz
0.65 seconds Macbook Pro 2.0 GHz

all w/Python 2.4.3, all save the Pentium w/gcc 4 and MacOSX 10.4, all
times for using a single core/processor (I just run two processes when I
want to max out BOTH cores/processors -- the analysis, which is a very
sophisticated version of alpha-beta-pruning tree-search, easily takes 99
to 100% of CPU time with little memory, disk or other I/O use).


Alex
 
