Selecting target CPU for thread

Todd

Hello,

I have written a Java program that is essentially performing a Monte
Carlo simulation. I would like to be able to assign threads to
particular CPUs for the run.

Is there a way to do this in Java? At a minimum, is there a way to
retrieve the ID of the CPU on which a thread is running (to see whether
the OS is at least distributing the threads across CPUs)?

Thank you in advance for all your help,
Todd
 
Günter Vollmer

Todd said:
Hello,

I have written a Java program that is essentially performing a Monte
Carlo simulation. I would like to be able to assign threads to
particular CPUs for the run.

Is there a way to do this in Java? At a minimum, is there a way to
retrieve the ID of the CPU on which a thread is running (to see whether
the OS is at least distributing the threads across CPUs)?

Do you really think you or Java know better than the OS running your
program which CPU or core is idle?
 
Todd

Do you really think you or Java know better than the OS running your
program which CPU or core is idle?

No, I don't. How about the ID - if I had just asked how to retrieve
that for statistical purposes, would this have become a question that
you could answer without sarcasm?
 
Eric Sosman

Todd said:
No, I don't. How about the ID - if I had just asked how to retrieve
that for statistical purposes, would this have become a question that
you could answer without sarcasm?

What "statistical purposes" do you have in mind? For
example, if a thread reports that it spends 10%,20%,30%,40%
of its time running on CPUs 0,1,5,9, how would this answer
influence your future behavior? If "Not at all," then why
bother asking?

There's also the problem that the answer to "What CPU am
I running on?" can become outdated as soon as it's obtained.
The best you could hope for is "What CPU *was* I running on?"
and even that much is subject to interpretation.
 
Todd

     What "statistical purposes" do you have in mind?  For
example, if a thread reports that it spends 10%,20%,30%,40%
of its time running on CPUs 0,1,5,9, how would this answer
influence your future behavior?  If "Not at all," then why
bother asking?

     There's also the problem that the answer to "What CPU am
I running on?" can become outdated as soon as it's obtained.
The best you could hope for is "What CPU *was* I running on?"
and even that much is subject to interpretation.

--
Eric Sosman

I am sorry that I was unclear. Is there any way, using Java, to get the
CPU ID for the current thread? If the answer is yes, how is it done?

I am starting to get the idea that nobody knows how this is done, let
alone whether it can be done. Please, please, please, if you don't have an
answer that involves the Java language aspect of this question, don't
respond.

I don't intend to sound rude, I am just trying to get my question
answered, without having to justify my reasons for asking the question.
 
Mark Thornton

Todd said:
Hello,

I have written a Java program that is essentially performing a Monte
Carlo simulation. I would like to be able to assign threads to
particular CPUs for the run.

Is there a way to do this in Java?

Only by using JNI, and even that assumes that the OS makes the
information available. As others have said, the information is of very
dubious use.

Mark Thornton
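
As an aside to the JNI route: on Linux specifically, the kernel publishes the "CPU number last executed on" as field 39 of `/proc/<pid>/stat`, so a thread can read it with plain file I/O. The sketch below is only an illustration under several assumptions that are not from this thread: a Linux kernel new enough to provide `/proc/thread-self` (3.17 or later), a JVM that maps each Java thread onto its own kernel thread, and the hypothetical names `LastCpuProbe`, `parseProcessorField` and `lastCpuOfCurrentThread`. The value is already historical by the time it is returned.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper; Linux-only, not a portable Java API.
class LastCpuProbe {
    // Extracts field 39 ("processor") from one /proc/<pid>/stat line.
    static int parseProcessorField(String statLine) {
        // Field 2 (comm) is parenthesised and may itself contain spaces or ')',
        // so everything up to the LAST ')' is skipped before splitting.
        String rest = statLine.substring(statLine.lastIndexOf(')') + 1).trim();
        String[] fields = rest.split("\\s+");
        // fields[0] is field 3 (state), so field 39 sits at index 36.
        return Integer.parseInt(fields[36]);
    }

    // Returns the CPU the calling thread last ran on, or -1 if the
    // information is unavailable (non-Linux OS, old kernel, parse failure).
    static int lastCpuOfCurrentThread() {
        try {
            byte[] raw = Files.readAllBytes(Paths.get("/proc/thread-self/stat"));
            return parseProcessorField(new String(raw, StandardCharsets.US_ASCII));
        } catch (IOException | RuntimeException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        System.out.println("last CPU (or -1 if unknown): " + lastCpuOfCurrentThread());
    }
}
```

On any other platform the method simply reports -1, which is consistent with the point above that the OS has to make the information available in the first place.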
 
Kenneth P. Turvey

I am sorry that I was unclear. Is there any way, using Java, to get the
CPU ID for the current thread? If the answer is yes, how is it done?

The answer is no, not without native code. But the real answer is that
you don't need it. You should probably be using the ID of the thread, not
the CPU. This you can get.
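
For reference, a minimal sketch of getting the thread ID: `Thread.currentThread().getId()` returns the JVM-assigned ID of the calling thread (newer JDKs also offer `threadId()`). The class name `ThreadIdDemo`, the method `runWorkers` and the worker and iteration counts are made up for illustration; the loop body stands in for one Monte Carlo trial.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical example class; names are illustrative only.
class ThreadIdDemo {
    // Starts `workers` threads and records how many iterations each one
    // completed, keyed by the JVM thread ID (this is NOT a CPU ID).
    static Map<Long, Integer> runWorkers(int workers, int iterations) {
        Map<Long, Integer> countsByThreadId = new ConcurrentHashMap<>();
        Thread[] threads = new Thread[workers];
        // Create every thread before starting any, so all IDs coexist.
        for (int i = 0; i < workers; i++) {
            threads[i] = new Thread(() -> {
                long id = Thread.currentThread().getId();
                for (int n = 0; n < iterations; n++) {
                    // Placeholder for one unit of simulation work.
                    countsByThreadId.merge(id, 1, Integer::sum);
                }
            });
        }
        for (Thread t : threads) {
            t.start();
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return countsByThreadId;
    }

    public static void main(String[] args) {
        runWorkers(4, 1000).forEach((id, count) ->
            System.out.println("thread " + id + " completed " + count + " iterations"));
    }
}
```

This gives per-thread bookkeeping, but it says nothing about which CPU each thread ran on.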
 
Patricia Shanahan

Kenneth said:
The answer is no, not without native code. But the real answer is that
you don't need it. You should probably be using the ID of the thread, not
the CPU. This you can get.

I don't agree with "you don't need it".

Consider the following basic question: "Are threads moving around too
much?". There is a non-zero cost to moving a thread, because each
processor accumulates cache contents and other history for the threads
it is running. Excessive thread movement is a possible hypothesis if a
thread has an unexpectedly large cache miss rate. On the other hand, it
is also undesirable to leave an imbalance too long.

How would one investigate this sort of question without asking about
mappings between threads and processors?

Patricia
 
Mark Space

Todd said:
I don't intend to sound rude, I am just trying to get my question
answered, without having to justify my reasons for asking the question.

If it is not possible to do this, a good "why" would be one way of
convincing Sun to add it. This won't help you in the short term, but it
would help the Java community as a whole.

Have you looked into JMX? I did a quick search and I didn't see
anything that directly related to your issue. But it might be possible
to tease out the info you want from JMX somewhere.

Again, if JMX can't do this, making the case to Sun for measuring thread-to-CPU
context switching (or some such) would be the best thing in the long run.
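
To illustrate what the JMX side does offer: the standard `java.lang.management.ThreadMXBean` can report how much CPU time a thread has consumed, though not which CPU it ran on. The following is a small sketch; the class name `ThreadCpuTimeDemo`, the helper `busyWork` and the iteration count are assumptions made for the example.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Hypothetical demo class; names are illustrative only.
class ThreadCpuTimeDemo {
    // CPU time in nanoseconds used so far by the calling thread, or -1 if
    // the JVM does not support the measurement or has it disabled.
    static long currentThreadCpuNanos() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        if (!bean.isCurrentThreadCpuTimeSupported() || !bean.isThreadCpuTimeEnabled()) {
            return -1L;
        }
        return bean.getCurrentThreadCpuTime();
    }

    // Burns some CPU so that the counter has something to measure.
    static double busyWork(int iterations) {
        double sum = 0.0;
        for (int i = 1; i <= iterations; i++) {
            sum += Math.sqrt(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        long before = currentThreadCpuNanos();
        double result = busyWork(5_000_000);
        long after = currentThreadCpuNanos();
        System.out.println("work result: " + result);
        if (before < 0 || after < 0) {
            System.out.println("thread CPU time is not available on this JVM");
        } else {
            System.out.println("CPU ns used by this thread: " + (after - before));
        }
    }
}
```

Comparing CPU time against wall-clock time per thread can hint at whether threads are being starved, but it does not answer the thread-to-CPU mapping question.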
 
Eric Sosman

Patricia said:
I don't agree with "you don't need it".

Consider the following basic question: "Are threads moving around too
much?". There is a non-zero cost to moving a thread, because each
processor accumulates cache contents and other history for the threads
it is running. Excessive thread movement is a possible hypothesis if a
thread has an unexpectedly large cache miss rate. On the other hand, it
is also undesirable to leave an imbalance too long.

How would one investigate this sort of question without asking about
mappings between threads and processors?

From outside the JVM, by preference. On the inside, all
a thread could (perhaps) do is discover which CPU it was running
on a moment ago, and do this many times at various points in
its execution. You could get (at best) a histogram of the
number of times the thread found itself to have been on each
CPU; with considerable effort you might even extend this to
distinguish between "shortly after something that probably
blocked" (when a CPU switch may be relatively benign) and
"right in the middle of when I was doing something else" (when
a CPU switch might lead to the cache inefficiencies you mention).
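
The in-thread histogram described above could be sketched roughly as follows. Everything here is hypothetical: the class `CpuSampleHistogram` and, in particular, the `IntSupplier` that stands in for whatever native or OS-specific call answers "which CPU was I on a moment ago?". The sketch is meant to be owned by a single thread and is not thread-safe.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.IntSupplier;

// Hypothetical per-thread sampler; the CPU probe is supplied by the caller.
class CpuSampleHistogram {
    private final Map<Integer, Integer> samplesPerCpu = new TreeMap<>();
    private final IntSupplier cpuProbe; // assumed source of "last CPU" answers
    private int lastCpu = -1;           // -1 means "no sample taken yet"
    private int observedSwitches = 0;

    CpuSampleHistogram(IntSupplier cpuProbe) {
        this.cpuProbe = cpuProbe;
    }

    // Takes one sample; call this at chosen points in the thread's work.
    void sample() {
        int cpu = cpuProbe.getAsInt();
        samplesPerCpu.merge(cpu, 1, Integer::sum);
        if (lastCpu != -1 && cpu != lastCpu) {
            observedSwitches++; // only switches visible between two samples
        }
        lastCpu = cpu;
    }

    Map<Integer, Integer> histogram() {
        return samplesPerCpu;
    }

    int observedSwitches() {
        return observedSwitches;
    }

    public static void main(String[] args) {
        // Scripted probe standing in for a real CPU lookup.
        int[] script = {0, 0, 1, 1, 0};
        int[] next = {0};
        CpuSampleHistogram h = new CpuSampleHistogram(() -> script[next[0]++]);
        for (int i = 0; i < script.length; i++) {
            h.sample();
        }
        System.out.println("samples per CPU: " + h.histogram());
        System.out.println("switches seen between samples: " + h.observedSwitches());
    }
}
```

The switch count is a lower bound at best: any migration that happens and reverts between two samples is invisible to it.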

But I don't see how the insider view can hope to spot all
the CPU changes, nor to relate even the observed changes to the
train of events that caused them. For that, you need a tool
that can see both the Java threads and the system scheduler at
the same time. On Solaris (or Mac OS X or FreeBSD, I believe),
you'd naturally turn to DTrace. Other environments may not have
tools that are quite so useful, but can probably give summary
statistics (number of CPU migrations per second, that sort of
thing), even if unable to pin them down to specific threads.

I see little prospect of doing a good job of this as an
"inside job." In contrast to the all-too-common financial scams
we read about, this job cries out for "outsider information."
 
Nigel Wade

Patricia said:
I don't agree with "you don't need it".

Consider the following basic question: "Are threads moving around too
much?". There is a non-zero cost to moving a thread, because each
processor accumulates cache contents and other history for the threads
it is running. Excessive thread movement is a possible hypothesis if a
thread has an unexpectedly large cache miss rate. On the other hand, it
is also undesirable to leave an imbalance too long.

How would one investigate this sort of question without asking about
mappings between threads and processors?

Certainly not from within the application. All you can possibly hope to do is
find out what CPU/core the current thread is executing on at any given instant
in time. You can't find out which CPU it was running on before you asked and
you can only guess what it will do afterwards. This is a prime example of
Heisenberg's uncertainty principle. The only way to get more information is to
ask more often (and therefore spend less time doing useful work, and you still
don't know if any of that useful work was carried out on the CPU on which you
asked the question). Ultimately, the only way to know all the time on which CPU
your thread is executing is to do nothing other than ask "on what CPU am I
running?". Hence the futility of the exercise.
 
Roedy Green

Consider the following basic question: "Are threads moving around too
much?". There is a non-zero cost to moving a thread, because each
processor accumulates cache contents and other history for the threads
it is running. Excessive thread movement is a possible hypothesis if a
thread has an unexpectedly large cache miss rate. On the other hand, it
is also undesirable to leave an imbalance too long.

Which common machines have shared caches and which separate?
 
Mark Thornton

Roedy said:
Which common machines have shared caches and which separate?

My quad-core Q6600 has a mixture: each core has its own 32 KB L1 instruction
and 32 KB L1 data caches, and each pair of cores shares a 4 MB L2 cache.

Mark Thornton
 
Daniel Pitts

I don't agree with "you don't need it".

Consider the following basic question: "Are threads moving around too
much?". There is a non-zero cost to moving a thread, because each
processor accumulates cache contents and other history for the threads
it is running. Excessive thread movement is a possible hypothesis if a
thread has an unexpectedly large cache miss rate. On the other hand, it
is also undesirable to leave an imbalance too long.

How would one investigate this sort of question without asking about
mappings between threads and processors?

Patricia

The only real way to handle that would be to ask the OS to keep track
of when it "moves" threads.

You can't do it *in* thread because you might have this happen:
Thread 1 start on CPU A: What CPU am I on? -> A
Thread 1 moves to CPU B: Do stuff important now
Thread 1 moves to CPU A: What CPU am I on? -> A

Basically, having a *thread* know which CPU it "was" on is not
important, because it's unreliable information. Having an outside
process give you the information is more useful, because that outside
process can do analysis.
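
The A, B, A scenario can be replayed as a toy simulation. This is only an illustration with made-up names (`MissedMigrationDemo`, `trueMigrations`, `observedMigrations`); the timeline array plays the role of what an outside observer such as the OS scheduler would see, and the sample positions are the moments at which the thread asks "What CPU am I on?".

```java
// Hypothetical toy model of the A -> B -> A scenario.
class MissedMigrationDemo {
    // Counts every change of CPU in the full timeline (the outside view).
    static int trueMigrations(char[] timeline) {
        int moves = 0;
        for (int i = 1; i < timeline.length; i++) {
            if (timeline[i] != timeline[i - 1]) {
                moves++;
            }
        }
        return moves;
    }

    // Counts changes visible when the thread only looks at the given
    // positions of the timeline (the inside view).
    static int observedMigrations(char[] timeline, int[] sampleAt) {
        int moves = 0;
        for (int i = 1; i < sampleAt.length; i++) {
            if (timeline[sampleAt[i]] != timeline[sampleAt[i - 1]]) {
                moves++;
            }
        }
        return moves;
    }

    public static void main(String[] args) {
        char[] timeline = {'A', 'B', 'A'}; // asks, does the work on B, asks again
        int[] sampleAt = {0, 2};           // the thread only asks at these steps
        System.out.println("actual=" + trueMigrations(timeline)
            + " observed=" + observedMigrations(timeline, sampleAt));
        // prints "actual=2 observed=0"
    }
}
```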

On the other hand, the OS should know best on how to schedule
threads. Let your program's business be business logic.
 
Nigel Wade

Daniel said:
On the other hand, the OS should know best on how to schedule
threads.

That isn't always the case, though. Probably the most likely instance where this
might not be true is HPC, where it's very important to ensure that threads are
making the maximum use of available CPU horsepower. After all, there's little
point in expending lots of effort and money building yourself a high-power
cluster only to find that the OS isn't sharing the load amongst the nodes
correctly, or is wasting CPU cycles by migrating threads unnecessarily,
resulting in time-wasting cache migration. Moving threads to different CPUs may
seem like a good idea to the OS, but a competent programmer well versed in
parallel programming might be able to do better. An OS is likely to be a
compromise.

Daniel said:
Let your program's business be business logic.

If your program's business is only business logic, perhaps; but if your program's
business is maximising the compute potential of an HPC system, then the criteria are
different.
 
Lew

Nigel said:
That isn't always the case, though. Probably the most likely instance where this
might not be true is HPC, where it's very important to ensure that threads are
making the maximum use of available CPU horsepower. After all, there's little
point in expending lots of effort and money building yourself a high-power
cluster only to find that the OS isn't sharing the load amongst the nodes
correctly, or is wasting CPU cycles by migrating threads unnecessarily,
resulting in time-wasting cache migration. Moving threads to different CPUs may
seem like a good idea to the OS, but a competent programmer well versed in
parallel programming might be able to do better. An OS is likely to be a
compromise.


If your program's business is only business logic, perhaps; but if your program's
business is maximising the compute potential of an HPC system, then the criteria are
different.

"HPC"?

Where I work they process up to millions of documents per week during peak
periods, with a maximum size including XML this and compressed attachment that
of around 2 GB each. (Typical sizes are much smaller, tens to hundreds of
megabytes.)

Some of the Web servers have many CPUs (32 each is not unusual). They use
external monitoring tools (such as those marketed by IBM) to track CPU usage.

As Nigel and others have pointed out, you have to get outside the application
in order to monitor what resources are used outside the application.
 
Nigel Wade

Lew said:
"HPC"?

High Performance Computing - which typically (but not always) means clusters of
cheap PCs managed by crusty post-grads. They require a lot of hands-on
management to keep the nodes running, and a specialist OS to handle
process/thread migration to load balance properly. Tuning code to run
efficiently across the nodes, and maximize throughput, is non-trivial.
 
