dual processor

Paul Rubin

Steve Jorgensen said:
In this case, it would just be keeping a list of dirty hash tables, and
having a process that pulls the next one from the queue, and cleans it.

If typical Python programs spend enough time updating hash tables
for a hack like this to be of any benefit, Python itself is seriously
mis-designed and needs to be fixed.
 
Steve Jorgensen

If typical Python programs spend enough time updating hash tables
for a hack like this to be of any benefit, Python itself is seriously
mis-designed and needs to be fixed.

I dunno - you might be right, and you might be wrong. I was just pointing out
that there may be standard operations that can be made lazy and benefit from
background tasks to complete before they are needed for use by the Python
code.

Given that Python is highly dependent upon dictionaries, I would think a lot
of the processor time used by a Python app is spent in updating hash tables.
That guess could be right or wrong, but assuming it's right, is that a design
flaw? That's just a language spending most of its time handling the
constructs it is based on. What else would it do?
 
Paul Rubin

Steve Jorgensen said:
Given that Python is highly dependent upon dictionaries, I would
think a lot of the processor time used by a Python app is spent in
updating hash tables. That guess could be right or wrong, but
assuming it's right, is that a design flaw? That's just a language
spending most of its time handling the constructs it is based on.
What else would it do?

I don't believe it's right based on half-remembered profiling discussions
I've seen here. I haven't profiled CPython myself. However, if tuning
the rest of the implementation makes hash tables a big cost, then the
implementation, and possibly the language, should be updated to not
have to update hashes so much. For example,

x.y = 3

currently causes a hash update in x's internal dictionary. But either
some static type inference or a specializing compiler like psyco could
optimize the hash lookup away, and just update a fixed slot in a table.
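CPython already has one mechanism in this spirit: `__slots__` replaces the per-instance attribute dictionary with fixed storage slots. It isn't the full lookup elimination a specializing compiler or type inference would give (the slot is still found via the class), but it does mean `x.y = 3` no longer updates a per-instance hash table:

```python
class Point:
    __slots__ = ('x', 'y')   # fixed storage slots instead of a per-instance dict
    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(1, 2)
p.y = 3                      # stored in a fixed slot, not a hash table entry
print(hasattr(p, '__dict__'))  # False: there is no per-instance dictionary
```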
 
Michael Sparks

Jeremy said:
Ummmm....not totally. It depends on what you're doing.

Yes, it does. Hence why I said personal perspective.
Sort of. Steve
brings up an interesting argument of making the language do some of your
thinking for you. Maybe I'll address that momentarily....

Personally I think that the language and tools will have to help. I'm
working on the latter, I hope the GIL goes away to help with the former,
but does so in an intelligent manner. Why am I working on the latter?

I work with naturally concurrent systems all the time, and they're
concurrent not for performance reasons, but simply because that's what they
are. And as a result I want concurrency easy to deal with, and efficiency as
a secondary concern. However in that scenario, having multiple CPU's not
being utilised sensibly *is* a concern to me.
I'm not saying I wish the GIL would stay around. I wish it would go.
As the price of computers goes down, the number of CPUs per computer
goes up, and the price per CPU in a single system goes down, the ability
to utilize a bunch of CPUs is going to become more important.
And maybe
Steve's magical thinking programming language will have a ton of merit.

I see no reason to use such derisory tones, though I'm sure you didn't mean
it that way. (I can see you mean it as extreme skepticism though :)
But let me play devil's advocate for
a sec. Let's say we *could* fully utilize a multi CPU today with
Python. ....
I would almost bet money that the majority of code would
not be helped by that at all.

Are you so sure? I suspect this is due to you being used to writing code
that is designed for a single CPU system. What if your basic model of
system creation changed to include system composition as well as
function calls? Then each part of the system you compose can potentially
run on a different CPU. Take the following for example:

(sorry for the length, I prefer real examples :)

Graphline(
    EXIT = ExceptionRaiser("FORCED SYSTEM QUIT"),
    MOUSE = Multiclick(caption="",
                       position=(0,0),
                       transparent=True,
                       msgs = [ "", "NEXT", "FIRST", "PREV", "PREV", "NEXT" ],
                       size=(1024,768)),
    KEYS = KeyEvent(outboxes = { "slidecontrol" : "Normal place for message",
                                 "shutdown" : "Place to send some shutdown messages",
                                 "trace" : "Place for trace messages to go",
                               },
                    key_events = {112: ("PREV", "slidecontrol"),
                                  110: ("NEXT", "slidecontrol"),
                                  113: ("QUIT", "shutdown"),
                                 }),
    SPLITTER = Splitter(outboxes = {"totimer" : "For sending copies of key events to the timer",
                                    "tochooser" : "This is the primary location for key events",
                                   }),
    TIMER = TimeRepeatMessage("NEXT", 3),
    FILES = Chooser(items = files, loop=True),
    DISPLAY = Image(size=(1024,768),
                    position=(0,0),
                    maxpect=(1024,768)),
    linkages = {
        ("TIMER", "outbox") : ("FILES", "inbox"),

        ("MOUSE", "outbox") : ("SPLITTER", "inbox"),
        ("KEYS", "slidecontrol") : ("SPLITTER", "inbox"),
        ("SPLITTER", "tochooser") : ("FILES", "inbox"),
        ("SPLITTER", "totimer") : ("TIMER", "reset"),

        ("KEYS", "shutdown") : ("EXIT", "inbox"),
        ("FILES", "outbox") : ("DISPLAY", "inbox"),
    }
).run()

What does that do? It's a slideshow program for displaying pictures. There's a
small amount of setup before this (identifying files for display, imports, etc),
but that's by far the bulk of the system.

That's pure python code (aside from pygame), and the majority of the code
is written single threaded, with very little concern for concurrency. However
the code above will naturally sit on a 7 CPU system and use all 7 CPUs (when
we're done). Currently however we use generators to limit the overhead in
a single CPU system, though if the GIL was eliminated sensibly, using threads
would allow the same code above to run on a multi-CPU system efficiently.

It probably looks strange, but it's really just a logical extension of the
Unix command line's pipelines to allow multiple pipelines. Similarly, from
a unix command line perspective, the following will automatically take
advantage of all the CPU's I have available:

(find |while read i; do md5sum $i; done|cut -b-32) 2>/dev/null |sort

And a) most unix sys admins I know find that easy (probably the above
laughable) b) given a multiprocessor system will probably try to maximise
pipelining c) I see no reason why sys admins should be the only people
writing programs who use concurrency without thinking about it :)

I *do* agree it takes a little getting used to, but then I'm sure the same
was true for many people who learnt OOP after learning to program. Unlike
people who learnt OO at the time they started learning programming and just
see it as a natural part of a language.
There are benefits to writing code in C and Java apart from
concurrency. Most of them are masochistic, but there are benefits
nonetheless. For my programming buck, Python wins hands down.

But I agree with you. Python really should start addressing solutions
for concurrent tasks that will benefit from simultaneously utilizing
multiple CPUs.

That's my point too. I don't think our opinions really diverge that far :)

Best Regards,


Michael.
 
Jeremy Jones

Michael said:
I see no reason to use such derisory tones, though I'm sure you didn't mean
it that way. (I can see you mean it as extreme skepticism though :)
None of the above, really. I thought it was a really great idea and
worthy of pursuit. In my response back to Steve, the most skeptical
thing I said was that I think it would be insanely difficult to
implement. Maybe it wouldn't be as hard as I think. And according to a
follow-up by Steve, it probably wouldn't.

Are you so sure? I suspect this is due to you being used to writing code
that is designed for a single CPU system.
Not really. I've got a couple of projects in work that would benefit
tremendously from the GIL being lifted. And one of them is actually
evolving into a funny little hack that will allow easy persistent
message passing between processes (on the same system) without having to
mess around with networking. I'm betting this is the case just because
of reading this list, the tutor list, and interaction with other Python
programmers.
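Jeremy doesn't describe his hack, but for illustration, same-machine message passing between processes without touching networking directly can be sketched with the `multiprocessing` module that later entered the standard library (a sketch of the idea, not Jeremy's actual code):

```python
from multiprocessing import Process, Queue

def worker(q):
    # runs in a separate OS process, so it can use a second CPU
    q.put("hello from the child process")

if __name__ == "__main__":
    q = Queue()   # message passing between processes; no explicit sockets
    p = Process(target=worker, args=(q,))
    p.start()
    print(q.get())   # prints "hello from the child process"
    p.join()
```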

That's my point too. I don't think our opinions really diverge that far :)
We don't. Again (as we have both stated), as systems find themselves
with more and more CPUs onboard, it becomes more and more absurd to have
to do little hacks like what I allude to above. If Python wants to
maintain its position in the pantheon of programming languages, it
really needs to 1) find a good clean way to utilize multi-CPU machines
and 2) come up with a simple, consistent, Pythonic concurrency paradigm.
Good discussion.


JMJ
 
Jorgen Grahn

Steve Jorgensen wrote: ....

It depends on personal perspective. If in a few years time we all have
machines with multiple cores (eg the CELL with effective 9 CPUs on a chip,
albeit 8 more specialised ones), would you prefer that your code *could*
utilise your hardware sensibly rather than not.

Or put another way - would you prefer to write your code mainly in a
language like python, or mainly in a language like C or Java? If python,
it's worth worrying about!

Mainly in Python, of course. But it still feels like a pretty perverted idea
to fill a SMP system with something as inefficient as interpreting Python
code!

(By the way, I don't understand why a computer should run one program at
a time all the time. Take a time-sharing system where lots of people are
logged in and do their work. Add a CPU there and you'll have an immediate
performance gain, even if no one is running programs that are optimized
for it!)

I feel the recent SMP hype (in general, and in Python) is a red herring. Why
do I need that extra performance? What application would use it? Am I
prepared to pay the price (in bugs, lack of features, money, etc) for
someone to implement this? There's already a lot of performance lost in
bloatware people use everyday; why are we not paying the much lower price
for having that fixed with traditional code optimization?

I am sure some applications that ordinary people use could benefit from SMP
(like image processing). But most tasks don't, and most of those that do can
be handled on the process level. For example, if I'm compiling a big C
project, I can say 'make -j3' and get three concurrent compilations. The
Unix shell pipeline is another example.

/Jorgen
 
Jorgen Grahn

Are you so sure? I suspect this is due to you being used to writing code
that is designed for a single CPU system. What if your basic model of
system creation changed to include system composition as well as
function calls? Then each part of the system you compose can potentially
run on a different CPU. Take the following for example: ....
It probably looks strange, but it's really just a logical extension of the
Unix command line's pipelines to allow multiple pipelines. Similarly, from
a unix command line perspective, the following will automatically take
advantage of all the CPU's I have available:

(find |while read i; do md5sum $i; done|cut -b-32) 2>/dev/null |sort

And a) most unix sys admins I know find that easy (probably the above
laughable) b) given a multiprocessor system will probably try to maximise
pipelining c) I see no reason why sys admins should be the only people
writing programs who use concurrency without thinking about it :)

Nitpick: not all Unix users are sysadmins ;-) Some Unix sysadmins actually
have real users, and the clued users use the same tools. I used the 'make
-j3' example elsewhere in the thread (I hadn't read this posting when I
responded there).

It seems to me that there must be a flaw in your arguments, but I can't seem
to find it ;-)

Maybe it's hard in real life to find two independent tasks A and B that can
be parallelized with just a unidirectional pipe between them? Because as
soon as you have to do the whole threading/locking/communication circus, it
gets tricky and the bugs (and performance problems) show up fast.

But it's interesting that the Unix pipeline Just Works (TM) with so little
effort.

/Jorgen
 
Thomas Bellman

Michael Sparks said:
Similarly, from
a unix command line perspective, the following will automatically take
advantage of all the CPU's I have available:
(find |while read i; do md5sum $i; done|cut -b-32) 2>/dev/null |sort

No, it won't. At most, it will use four CPUs for user code.

Even so, the vast majority of CPU time in the above pipeline will
be spent in 'md5sum', but those calls will be run in series, not
in parallel. The very small CPU bursts used by 'find' and 'cut'
are negligible in comparison, and would likely fit within the
slots when 'md5sum' is waiting for I/O even on a single-CPU
system.

And I'm fairly certain that 'sort' won't start spending CPU time
until it has collected all its input, so you won't gain much
there either.
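For reference, the md5sum serialization Thomas identifies lives in the `while read` loop, not in the pipeline idea itself. A variant using `xargs -P` (a sketch, assuming GNU findutils) really does keep several md5sum processes running at once:

```shell
# Same pipeline, but xargs -P 4 keeps up to four md5sum processes
# running concurrently, so the hashing step itself uses multiple CPUs.
find . -type f -print0 2>/dev/null \
    | xargs -0 -n 16 -P 4 md5sum 2>/dev/null \
    | cut -b-32 \
    | sort
```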
 
Paul Rubin

Jorgen Grahn said:
I feel the recent SMP hype (in general, and in Python) is a red herring. Why
do I need that extra performance? What application would use it?

How many mhz does the computer you're using right now have? When did
you buy it? Did you buy it to replace a slower one? If yes, you must
have wanted more performance. Just about everyone wants more
performance. That's why mhz keeps going up and people keep buying
faster and faster cpu's.

CPU makers seem to be running out of ways to increase mhz. Their next
avenue to increasing performance is SMP, so they're going to do that
and people are going to buy those. Just like other languages, Python
makes perfectly good use of increasing mhz, so it keeps up with them.
If the other languages also make good use of SMP and Python doesn't,
Python will fall back into obscurity.
Am I prepared to pay the price (in bugs, lack of features, money,
etc) for someone to implement this? There's already a lot of
performance lost in bloatware people use everyday; why are we not
paying the much lower price for having that fixed with traditional
code optimization?

That is needed too. But obviously increased hardware speed has a lot
going for it. That's why people keep buying faster computers.
 
Paul Rubin

Thomas Bellman said:
And I'm fairly certain that 'sort' won't start spending CPU time
until it has collected all its input, so you won't gain much
there either.

For large input, sort uses the obvious in-memory sort, external merge
algorithm, so it starts using cpu once there's enough input to fill
the memory buffer.
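Paul is describing the classic external sort: sort bounded runs in memory, then merge the runs. A minimal sketch of the idea (runs are kept in memory here for brevity; a real sort(1) spills them to temporary files):

```python
import heapq

def external_sort(stream, run_size):
    """Sort a stream of arbitrary length with a bounded in-memory buffer:
    accumulate runs of at most run_size items, sort each run as the
    buffer fills, then lazily k-way merge the sorted runs."""
    runs, buf = [], []
    for item in stream:
        buf.append(item)
        if len(buf) >= run_size:    # buffer full: close off a sorted run
            buf.sort()
            runs.append(buf)
            buf = []
    if buf:
        buf.sort()
        runs.append(buf)
    return heapq.merge(*runs)       # k-way merge of the sorted runs

print(list(external_sort(iter([5, 3, 8, 1, 9, 2, 7]), run_size=3)))
# -> [1, 2, 3, 5, 7, 8, 9]
```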
 
Bengt Richter

No, it won't. At most, it will use four CPUs for user code.

Even so, the vast majority of CPU time in the above pipeline will
be spent in 'md5sum', but those calls will be run in series, not
in parallel. The very small CPU bursts used by 'find' and 'cut'
are negligible in comparison, and would likely fit within the
slots when 'md5sum' is waiting for I/O even on a single-CPU
system.

And I'm fairly certain that 'sort' won't start spending CPU time
until it has collected all its input, so you won't gain much
there either.
Why wouldn't a large sequence sort be internally broken down into parallel
sub-sequence sorts and merges that separate processors can work on?

Regards,
Bengt Richter
 
Paul Rubin

Why wouldn't a large sequence sort be internally broken down into parallel
sub-sequence sorts and merges that separate processors can work on?

Usually the input would be split into runs that would get sorted in
memory. Conventional wisdom says that the most important factor in
speeding up those sorts is making the runs as long as possible.
Depending on how complicated comparing two elements is, it's not clear
whether increased cache pressure from parallel processors hitting
different regions of memory would slow down the sort more than
parallelism would speed it up. Certainly any sorting utility that
tried to use parallel processors should use algorithms carefully
chosen and tuned around such issues.
 
Mike Meyer

Jorgen Grahn said:
But it's interesting that the Unix pipeline Just Works (TM) with so little
effort.

Yes it is. That's a result of two things:

1) The people who invented pipes were *very* smart (but not smart
enough to invent stderr at the same time :).

2) Pipes use a dead simple concurrency model. It isn't even as
powerful as CSP. No shared memory. No synchronization primitives. Data
flows in one direction, and one direction only.

Basically, each element of a pipe can be programmed ignoring
concurrency. It doesn't get much simpler than that.
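That one-way-dataflow model is easy to mimic in Python with generators, each stage written as if it were a standalone filter (a sketch of the idea, not Kamaelia's actual API):

```python
def produce(lines):
    # source stage: just feeds data into the pipe
    for line in lines:
        yield line

def grep(pattern, source):
    # each stage reads from the previous one and knows nothing else
    for line in source:
        if pattern in line:
            yield line

def upper(source):
    for line in source:
        yield line.upper()

# compose like a shell pipeline: produce | grep | upper
pipeline = upper(grep("cat", produce(["cat", "dog", "catfish"])))
print(list(pipeline))   # -> ['CAT', 'CATFISH']
```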

<mike
 
Mike Meyer

Jeremy Jones said:
1) find a good clean way to utilize multi-CPU machines and

I like SCOOP. But I'm still looking for alternatives.
2) come up with a simple, consistent, Pythonic concurrency paradigm.

That's the hard part. SCOOP attaches attributes to *variables*. It
also changes the semantics of function calls based on the values of
those attributes. Part of the power of using SCOOP comes from the
processor detecting when a variable that has been declared as having
an attribute is used to reference objects for which the attribute
doesn't apply.

I'm not sure how Pythonic that can be made.

<mike
 
Robin Becker

Paul said:
I think that doesn't count as using the multiple processors; it's
just multiple programs that could be on separate boxes.
Multiprocessing means shared memory.

This module might be of interest: http://poshmodule.sf.net
It seems it might be a bit out of date. I've emailed the author via sf, but no
reply. Does anyone know if poshmodule works with latest stuff?
 
Michael Sparks

Jorgen said:
Nitpick: not all Unix users are sysadmins ;-) Some Unix sysadmins actually
have real users, and the clued users use the same tools. I used the 'make
-j3' example elsewhere in the thread (I hadn't read this posting when I
responded there).

I simply picked a group that do this often :) The example pipeline I gave
above is I admit a particularly dire one. Things like the following are far
more silly:

# rm file; fortune | tee file | wc | cat - file
3 16 110
Bubble Memory, n.:
A derogatory term, usually referring to a person's
intelligence. See also "vacuum tube".

And

# (rm file; (while [ ! -s file ]; do echo >/dev/null; done; cat file |wc) & fortune | tee file) 2>/dev/null
Yea, though I walk through the valley of the shadow of APL, I shall
fear no evil, for I can string six primitive monadic and dyadic
operators together.
-- Steve Higgins
# 4 31 171
It seems to me that there must be a flaw in your arguments, but I can't
seem to find it ;-)

Sorry, but that's probably the funniest thing I've read all day :)

Best Regards,


Michael.
 
Michael Sparks

Thomas said:
No, it won't. At the most, it will use four CPU:s for user code.

OK, maybe I should've been more precise. That said, the largest machine
I could get access to relatively easily would be a quad-CPU machine, so
if I wanted to be pedantic regarding "*I* have available", the idea
stands. (Note I didn't say take /best/ advantage - that would require
rewriting all the individual parts of the pipeline above to be
structured in a similar manner, or some other parallel approach.)

You've essentially re-iterated my point though - that it naturally sorts
itself out, and does the best fit it can, which is better than none
(despite this being a naff example - as I mentioned). Worst case, yes,
everything serialises itself.


Michael.
 
Nick Craig-Wood

Paul Rubin said:
How many mhz does the computer you're using right now have? When did
you buy it? Did you buy it to replace a slower one? If yes, you must
have wanted more performance. Just about everyone wants more
performance. That's why mhz keeps going up and people keep buying
faster and faster cpu's.

CPU makers seem to be running out of ways to increase mhz. Their next
avenue to increasing performance is SMP, so they're going to do that
and people are going to buy those. Just like other languages, Python
makes perfectly good use of increasing mhz, so it keeps up with them.
If the other languages also make good use of SMP and Python doesn't,
Python will fall back into obscurity.

Just to back your point up, here is a snippet from theregister about
Sun's new server chip. (This is a rumour piece but theregister
usually gets it right!)

Sun has positioned Niagara-based systems as low-end to midrange
Xeon server killers. This may sound like a familiar pitch - Sun
used it with the much delayed UltraSPARC IIIi processor. This time
around though Sun seems closer to delivering on its promises by
shipping an 8 core/32 thread chip. It's the most radical multicore
design to date from a mainstream server processor manufacturer and
arrives more or less on time.

It goes on later to say "The physical processor has 8 cores and 32
virtual processors" and runs at 1080 MHz.

So fewer GHz but more CPUs is the future according to Sun.

http://www.theregister.co.uk/2005/09/07/sun_niagara_details/
 
Jorgen Grahn

How many mhz does the computer you're using right now have? When did
you buy it?

I'm not a good example -- my fastest computer is a Mac Mini. Come to think
of it, last time I /really/ upgraded for CPU speed was when I bought my
Amiga 4000/030 in 1994 ;-)

My 200MHz Pentium feels a bit slow for some tasks, but most of the time I
cannot really tell the difference, and its lack of RAM and disk space is
much more limiting.
Did you buy it to replace a slower one? If yes, you must
have wanted more performance. Just about everyone wants more
performance. That's why mhz keeps going up and people keep buying
faster and faster cpu's.

I'm not sure that is true, for most people. People keep buying faster CPUs
because the slower ones become unavailable! How this works from an
economical and psychological point of view, I don't know.
CPU makers seem to be running out of ways to increase mhz. Their next
avenue to increasing performance is SMP, so they're going to do that
and people are going to buy those. Just like other languages, Python
makes perfectly good use of increasing mhz, so it keeps up with them.
If the other languages also make good use of SMP and Python doesn't,
Python will fall back into obscurity.

I don't believe that will ever happen. Either of them.

My CPU spends almost all its time running code written in C. That code has
been written over the last thirty years under the assumption that if there
is SMP, it will be taken advantage of on the process level. I cannot imagine
anyone sitting down and rewriting all that code to take advantage of
concurrency.

Thus, I don't believe that SMP and SMP-like technologies will improve
performance for ordinary people. Or, if it will, it's because different
processes will run concurrently, not because applications become concurrent.
Except for some applications like image processing programs, which noone
would dream of implementing in Python anyway.

New, radical and exciting things don't happen in computing very often.

/Jorgen
 
Robin Becker

Robin said:
Paul Rubin wrote:


It seems it might be a bit out of date. I've emailed the author via sf, but no
reply. Does anyone know if poshmodule works with latest stuff?
haven't been able to contact posh's author, but this blog entry

http://blog.amber.org/2004/12/10/posh-power/

seems to suggest that posh might not be terribly useful in its current state

"Updated: I spoke with one of the authors, Steffen Viken Valvåg, and his comment
was that Posh only went through proof of concept, and never further, so it has a
lot of issues. That certainly clarifies the problems I’ve had with it. For now,
I’m going to put it on the back burner, and come back, and perhaps update it
myself."
 
