System.nanoTime and multiple cpus/cores

transpendence

I've tried to use System.nanoTime to make precise measurements of timing
intervals. It works great, but only as long as the program runs on
one CPU. If there are multiple CPUs/cores, the running thread seems
to switch between different CPUs, and each CPU seems to have a different
timer base, so the result of nanoTime jumps forward and backward in
time depending on which CPU the thread is currently running on.

Is there a possibility to force threads to a single CPU directly in
Java, or to use another high-precision timer (I need ms resolution and
it should work on Windows too)?
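
One way to confirm the symptom described above is a small probe that reads System.nanoTime() in a tight loop on one thread and counts how often a reading is smaller than the previous one. This is only a sketch: the class name NanoTimeProbe and the sample count are made up for illustration, and a count of zero does not prove the timer is consistent across CPUs (the thread may simply never have migrated during the run).

```java
// Hypothetical probe, not part of the original post: counts how often
// System.nanoTime() appears to run backwards between consecutive reads.
final class NanoTimeProbe {

    /** Returns how many of the given number of reads were earlier than the read before them. */
    static long countBackwardSteps(long samples) {
        long backward = 0;
        long previous = System.nanoTime();
        for (long i = 0; i < samples; i++) {
            long now = System.nanoTime();
            // Compare by subtraction, as the nanoTime() documentation recommends,
            // so that numeric overflow cannot produce a false positive.
            if (now - previous < 0) {
                backward++;
            }
            previous = now;
        }
        return backward;
    }

    public static void main(String[] args) {
        long samples = 10_000_000L; // arbitrary; raise it to give the scheduler more chances to migrate the thread
        System.out.println("backward steps in " + samples + " reads: "
                + countBackwardSteps(samples));
    }
}
```

Running it once with the process restricted to one CPU and once without should show whether the jumps really depend on thread migration.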
 
Hendrik Maryns

(e-mail address removed) wrote:
I've tried to use System.nanoTime to make precise measurements of timing
intervals. It works great, but only as long as the program runs on
one CPU. If there are multiple CPUs/cores, the running thread seems
to switch between different CPUs, and each CPU seems to have a different
timer base, so the result of nanoTime jumps forward and backward in
time depending on which CPU the thread is currently running on.

Are you sure it is due to multithreading? There are these small
particles that are known to be able to jump back in time (have to do
some reading up on quantum mechanics before posting stuff like this).

:)

H.
--
Hendrik Maryns

==================
www.lieverleven.be
http://aouw.org
 
lewmania942

I've tried to use System.nanoTime to make precise measurements of timing
intervals. It works great, but only as long as the program runs on
one CPU.

I recall an excellent thread (was it an article?) explaining why
you can't rely on System.nanoTime() to give ultra-precise results
when running on multiple CPUs/cores... But I can't find it again :(

You can't entirely reject the possibility of an error in the value
given back by System.nanoTime() either: if you search the
web you'll find at least one such bug (I think it was one
version of Windows that was at fault).

Is there a possibility to force threads to a single CPU directly in
Java

Not directly in Java, even though years ago there was talk
about this at Sun. They said that "maybe one day" you'd have
the ability to call a method like:

setCPUAffinity()

to define the "affinity mask". This would have solved your
problem, but AFAIK it has never been implemented (and may
not even be possible to implement at the JVM level).

That said, maybe you can force the affinity mask at the OS level
if you *really* need it (I'd rather let the OS scheduler decide
how the cores/cpus are used).
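
As a rough illustration of forcing the affinity at the OS level from the moment the program starts, a small launcher can wrap the real command in an OS tool that sets the mask. This is a sketch under assumptions, not something from the thread: the class name PinnedLauncher is hypothetical, it assumes the `start /affinity` option of cmd is available on the Windows version in use (the option takes a hexadecimal mask), and it assumes `taskset` from util-linux is installed on any non-Windows system.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical launcher: builds a command line that starts a child process
// restricted to the first CPU. The whole child process is pinned, not one thread.
final class PinnedLauncher {

    /** Prefixes the given command with an OS-level affinity wrapper chosen from the OS name. */
    static List<String> pinnedCommand(String osName, String... command) {
        List<String> full = new ArrayList<>();
        if (osName.toLowerCase(Locale.ROOT).startsWith("windows")) {
            // Assumption: cmd's "start" supports /affinity here; mask 1 selects the first CPU.
            full.addAll(Arrays.asList("cmd", "/c", "start", "/b", "/wait", "/affinity", "1"));
        } else {
            // Assumption: a Linux system with taskset on the PATH; "-c 0" selects CPU 0.
            full.addAll(Arrays.asList("taskset", "-c", "0"));
        }
        full.addAll(Arrays.asList(command));
        return full;
    }

    public static void main(String[] args) {
        List<String> cmd = pinnedCommand(System.getProperty("os.name"), "java", "-version");
        System.out.println(cmd);
        // To actually launch the pinned child process, something like:
        // new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```

This still limits the entire process to one CPU, so it only helps when giving up the other cores is acceptable.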
 
lewmania942

If there are multiple cpus/cores, the running thread seem
Are you sure it is due to multithreading? There are these small
particles that are known to be able to jump back in time (have to do
some reading up on quantum mechanics before posting stuff like this).

:)

:)

But... Don't you think his explanation may be *exactly* what
is going on? To me nanoTime() is not very precise when
running with several cores/cpus, so the OP's explanation doesn't
seem far-fetched at all (but I may be wrong).

Heck, even the famous assembly "rdtsc" instruction (mentioned
on Roedy's site btw) could only be used to measure timing
accurately if the pipeline was flushed first, to prevent
out-of-order instruction execution. This required hacks and...
serious performance drops (flushing the pipeline could be done
by using cpuid).

That said, I can't wait to have Java 9874 which implements
System.picoTime(): this time it *really* is accurate... Then two
months later Intel starts selling the new virtual-multi-transparent
-woozing-buzz-architreadhed-cored-processor and picoTime()
isn't really that precise anymore. Repeat ad nauseam.

:)

Not that I personally need a sub-nanosecond precision timer
or anything ;)
 
transpendence

I can force it to a single CPU via the Windows Task Manager (the
problems are gone then), but only after the program has started. And
it limits the whole process to a single CPU.

But it seems I've found a solution:

After some searching, I've found that System.nanoTime() uses
QueryPerformanceCounter() on Windows and that this function is known
to have problems on Athlon64 multicore systems. I've found a hint to
use /usepmtimer in boot.ini. I don't know if it really always works, but
after I changed it, the problems were gone.
 
Roedy Green

I've tried to use System.nanoTime to make precise measurements of timing
intervals. It works great, but only as long as the program runs on
one CPU. If there are multiple CPUs/cores, the running thread seems
to switch between different CPUs, and each CPU seems to have a different
timer base, so the result of nanoTime jumps forward and backward in
time depending on which CPU the thread is currently running on.

That's a bug. Java is supposed to compensate for that.
 
Patricia Shanahan

Roedy said:
that's a bug. Java is supposed to compensate for that.

I agree it's a bug, but I'm not sure Java can compensate for it. The JVM
does not necessarily know when a thread moves. The operating system does
know, and should be providing a consistent timer at the syscall, or
equivalent, level.

Patricia
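
For what it's worth, the most an application can do on its own is hide the backward jumps by never returning a value earlier than the last one it handed out. The sketch below shows that idea; the class name MonotonicNanoClock and the injectable source are made up for illustration. It does not make the timer consistent (a forward jump or a stalled interval still goes undetected), it only guarantees non-decreasing readings across threads.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Hypothetical wrapper: clamps a nanosecond source so that readings never decrease,
// even when several threads call it concurrently.
final class MonotonicNanoClock {
    private final LongSupplier source;
    private final AtomicLong last;

    /** Uses System.nanoTime() as the underlying source. */
    MonotonicNanoClock() {
        this(System::nanoTime);
    }

    /** Allows another source to be injected, e.g. a fake one for experiments. */
    MonotonicNanoClock(LongSupplier source) {
        this.source = source;
        this.last = new AtomicLong(source.getAsLong());
    }

    /** Returns the later of the raw reading and the latest value returned so far. */
    long nanoTime() {
        long raw = source.getAsLong();
        // Subtraction-based comparison keeps the check valid across numeric overflow.
        return last.accumulateAndGet(raw, (prev, now) -> (now - prev > 0) ? now : prev);
    }
}
```

With a fake source that yields 100, 90, 120, the wrapper consumes 100 at construction and then returns 100 (the 90 is clamped) and 120.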
 
Roedy Green

I agree it's a bug, but I'm not sure Java can compensate for it. The JVM
does not necessarily know when a thread moves. The operating system does
know, and should be providing a consistent timer at the syscall, or
equivalent, level.

Hmm. You would have to enqueue a request to a fixed timer thread.
That of course defeats the fine grain resolution.
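
A minimal sketch of what such a timer thread could look like, assuming a single-threaded executor as the queue. The class name TimerThreadClock and the thread name are hypothetical. Note that Java cannot pin this thread to a CPU, so the thread itself may still migrate, and every reading pays for a hand-off between threads, which is exactly the loss of resolution mentioned above.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.LongSupplier;

// Hypothetical clock: every reading is taken on one dedicated thread.
final class TimerThreadClock implements AutoCloseable {
    private final LongSupplier source;
    private final ExecutorService timerThread = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "timer-thread");
        t.setDaemon(true); // do not keep the JVM alive just for the clock
        return t;
    });

    TimerThreadClock() {
        this(System::nanoTime);
    }

    TimerThreadClock(LongSupplier source) {
        this.source = source;
    }

    /** Enqueues a read request and blocks until the timer thread has answered it. */
    long nanoTime() {
        Callable<Long> read = source::getAsLong;
        try {
            return timerThread.submit(read).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the timer thread", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("timer thread failed", e.getCause());
        }
    }

    @Override
    public void close() {
        timerThread.shutdown();
    }
}
```

Timing the round trip of nanoTime() on this class against a direct System.nanoTime() call gives a feel for how much resolution the queueing costs.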

Is there at least an integer index of CPU you could grab at the same
time as the RDTSC? Intels have a serial number, which can be
disabled. AMDs don't.
 
lewmania942

Hmm. You would have to enqueue a request to a fixed timer thread.
That of course defeats the fine grain resolution.

Indeed.

Do you know how I can find out (out of curiosity) how nanoTime() is
implemented in Java 1.5 (and/or 1.6)? (Say under Windows XP
and under Linux.)

RDTSC is flawed anyway... As I wrote in another post in this thread,
to have a "real" fine-grained RDTSC you have to flush the pipeline
before using the instruction, which in itself kind of defeats the
purpose.

Moreover, with all the CPUs that throttle their speed (such as many
notebook CPUs) RDTSC is basically useless.
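
To put a number on the throttling problem: converting a tick count to time requires assuming a clock frequency, and the result is wrong by the ratio of assumed to actual frequency. The arithmetic below is only an illustration with made-up figures (a CPU nominally at 2 GHz that throttles to 1 GHz), and it assumes a counter that ticks at the current core frequency; the class name TscArithmetic is hypothetical.

```java
// Hypothetical worked example: what a tick-based measurement reports
// when the CPU runs slower than the frequency assumed for the conversion.
final class TscArithmetic {

    /** Converts a tick count to seconds using the frequency the software assumes. */
    static double ticksToSeconds(long ticks, double assumedHz) {
        return ticks / assumedHz;
    }

    public static void main(String[] args) {
        double assumedHz = 2.0e9;   // nominal frequency used for the conversion
        double actualHz = 1.0e9;    // frequency the throttled CPU really ran at
        double realSeconds = 1.0;   // true duration of the measured interval

        long ticks = (long) (realSeconds * actualHz); // ticks actually counted
        double reported = ticksToSeconds(ticks, assumedHz);
        System.out.println("real: " + realSeconds + " s, reported: " + reported + " s");
    }
}
```

In this example a one-second interval is reported as half a second, an error no amount of pipeline flushing would fix.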

And apparently on some hyper-threading systems, methods like
Windows' QueryPerformanceCounter sometimes fall back to
RDTSC...

This is giving headaches to many game programmers ;)
 
lewmania942

Hi Patricia,

Patricia Shanahan wrote:
....
I agree it's a bug, but I'm not sure Java can compensate for it. The JVM
does not necessarily know when a thread moves. The operating system does
know, and should be providing a consistent timer at the syscall, or
equivalent, level.

but apparently doesn't provide it.

:(

I dug up a thread from 2003 on an Intel forum... (I'm pretty sure
the hundreds of megabytes of patches Windows has had since that time
didn't fix that problem, and the situation on Linux OSes doesn't seem
any better ;)

http://softwareforums.intel.com/ids/board/message?board.id=42&message.id=155

"You could set up the OS to support high precision
"virtual timers or virtual TSC's (it's fairly trivial)
"but it's not currently there in any OS

Now I'm all ears: if someone can show me how to cleanly have a
Java high precision timer on a multi-cored-multi-cpu-hyper-
threaded-(insert latest CPU feature)-system providing nanosecond
(or sub-nanosecond) accuracy without side effect (for example
without flushing any pipeline), I'll read very carefully. It has to
work on Intel, AMDs, and all the others and also, of course, on
various OSes.

Until then, I'll code my Java apps without relying on System.nanoTime()
giving very meaningful values (i.e. without hoping it'll really provide
a high-precision timer on the various architectures the JVM runs on).

:)
 
Chris Uppal

Do you know how I can find out (out of curiosity) how nanoTime() is
implemented in Java 1.5 (and/or 1.6)? (Say under Windows XP
and under Linux.)

You could look at the source. I recently did, and posted the following in
another thread:

http://groups.google.com/group/comp...8d1d4c0006079458?rnum=21#doc_b9ce35b9f7bc0e69

(That's from the 1.5 source).

Moreover, with all the CPUs that throttle their speed (such as many
notebook CPUs) RDTSC is basically useless.

Yup.

And apparently on some hyper-threading systems, methods like
Windows' QueryPerformanceCounter sometimes fall back to
RDTSC...

Do you have a link/reference for that?

-- chris
 
Mark Thornton

Chris said:
Do you have a link/reference for that?

The implementation of QueryPerformanceCounter seems to be in the part of
the kernel that differs between single-processor and multiprocessor
implementations. In my experience, on multiprocessors it is always
implemented by the RDTSC instruction. On single processors QPC is
implemented via the timer counter.

Mark Thornton
 
lewmania942

Hi Chris,

Chris Uppal wrote:
....
Do you have a link/reference for that?

Sadly I've no handy link... But I recall reading this in more than
one place and even seeing a nice little program somehow "proving"
this. Googling and browsing endless threads in obscure forums should
eventually lead to some interesting information on the subject, but it seems
I can't find it that easily :( (I found some other stuff
though)

That said, Patricia was right (as usual) when she said that it's the
OS that should be providing a consistent timer.

And I wasn't entirely correct when I said that none of the OSes do
this today...

The "new way" of doing it in modern processors is apparently
called HPET (working both on newer Intel and AMDs):

http://www.intel.com/hardwaredesign/hpetspec.htm

Wikipedia is not very lengthy on the subject:

http://en.wikipedia.org/wiki/High_Precision_Event_Timer

It is not implemented yet in any Windows version (apparently
Dell even disables it in some BIOSes that would otherwise provide
the functionality, on the basis that no desktop Windows version uses it
yet).

But... It is already working on some other systems. For example,
some Linux kernels (if I read correctly) now have a gettimeofday()
that uses the underlying "this-time-really-high-precision-and-
consistent-amongst-threads-and-cpus-we-promise-you" HPET
timer (now that's redundant, as the 'T' is for "timer" ;)

So it may be possible that some people using System.nanoTime()
are already benefiting from this new high precision event timer
after all.

Regarding my previous question "how to know how
System.nanoTime() is implemented?", I somehow
expected that the answer was "RTFS" (Read The Fine
Source), but, concretely, how do I do this?

Do I have access to all the native code too? (And if
I want to see how some JNI method is done on
Windows but I've got a Linux JDK, does it mean I've
got to download a Windows JDK?)

Thanks and talk to you all very soon
 
lewmania942

Hi Roedy,

Roedy Green wrote:
....
Is there at least an integer index of CPU you could grab at the same
time as the RDTSC? Intels have a serial number, which can be
disabled. AMDs don't.

AMD recently introduced the RDTSCP: Read Serialized TSC Pair.
(no idea how it works).

Here's a link to how to "hack around" the tricky TSC (when
no HPET is available on the underlying hardware) :

http://lkml.org/lkml/2005/11/4/173
 
Chris Uppal

Mark said:
The implementation of QueryPerformanceCounter seems to be in the part of
the kernel that differs between single-processor and multiprocessor
implementations. In my experience, on multiprocessors it is always
implemented by the RDTSC instruction. On single processors QPC is
implemented via the timer counter.

Interesting, thanks.

I wonder what they'll do with multi-core laptops...

-- chris
 
Chris Uppal

The "new way" of doing it in modern processors is apparently
called HPET (working both on newer Intel and AMDs):

http://www.intel.com/hardwaredesign/hpetspec.htm

Thanks.


Regarding my previous question "how to know how
System.nanoTime() is implemented?", I somehow
expected that the answer was "RTFS" (Read The Fine
Source), but, concretely, how do I do this?

Do I have access to all the native code too? (And if
I want to see how some JNI method is done on
Windows but I've got a Linux JDK, does it mean I've
got to download a Windows JDK?)

You can download the entire platform source from the normal download page:

http://java.sun.com/j2se/1.5.0/download.jsp

Even more than normal, check the licence /very/ carefully before accepting it.
It is /not/ the same licence as the JDK or JRE. In fact, there are two
licences you can opt for, one of which is entirely abominable, the other of
which might be acceptable.

That contains (afaik) the entire source for Windows, Linux, and Solaris builds,
including the C++ source for the JVM and the native methods. Pretty big[*]
and, though it's not badly structured, it may take you a while to learn your
way around it. It helps if you are reasonably familiar with JNI.

([*] around 200 meg, nearly 20K files.)

You'll find (if you do accept the licence) the C++ method
os::elapsed_counter(), which is where the nano timer gets its data, defined in
several files (according to OS). The Windows implementation is in:
<root>/hotspot/src/os/win32/vm/os_win32.cpp

-- chris
 
