Multithreading / Scalability

blmblm

the DLLs may be loaded, but the executable contains only symbolic
links into the DLL; those all have to be resolved and patched.

Well, you did say, in the post to which I was responding:
 
blmblm

DLLs used to share code and memory. Now they just share code.
You will notice how java.exe is more sprightly if you have recently
used it.

The name of the executable I use is java, not java.exe. (Said with
sort of a :), because from another of your posts it's apparent
that you do know, at least as well as I do, that there are many
operating systems.)
That is because its DLLs are still hanging around in RAM and
don't have to be reloaded.

But, but, didn't you just say DLLs just share code, not memory ....
 
blmblm

On Linux, the kernel only knows about "tasks". They can share none,
part, or all of their address space with one another. When they don't
share any address space, they look like traditional processes. When
they share all of their address space, they look like threads (even
though each task has its own pid). But the kernel does not treat
them any differently, so it cannot be claimed that one is faster or
cheaper to create than the other.

It is this lack of distinction that previously confused many people
who did not understand why "ps" often showed so many Java
processes on Linux.

But something changed in a recent kernel -- 2.6 maybe? -- such
that "ps" doesn't show all those processes any more. ?
 
blmblm

(e-mail address removed) sez:

[ snip ]
Think of it as a graph: instead of a root (kernel) with
multiple leaf nodes (processes), you now have a root (kernel)
with multiple nodes where each node may be a leaf or a root
with multiple leaves (threads) underneath.

Ah. Well, it seems like most people using threads will have
a somewhat degenerate form of the second kind of tree -- a root
with one non-leaf descendant (process) which in turn has multiple
leaf children.
See Gordon's reply: address spaces remain shared until you tell
the OS otherwise. IPC via shared memory and semaphores existed
before threads, and you can use pipes and signals between threads.
So in practice your choice of communication mechanism is
independent of whether you use threads or processes.

I think we're talking at cross-purposes here. I'll say again that
I'm speaking as someone whose experience with multiple threads
or processes is mostly from the parallel-programming world.
In that world, the well-known languages and libraries I can think
of divide pretty neatly into two camps -- shared-memory model
(threads sharing an address space and communicating/synchronizing
via various mechanisms) and distributed-memory model (processes
with separate address spaces communicating via message-passing).
Perhaps to someone with more of an o/s background the other
possibilities you mention are more common ....
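To make the shared-memory camp concrete, here is a minimal Java sketch
(the class and variable names are made up): a few threads sum disjoint
slices of one array that they all reach through the same reference.

// Minimal shared-memory sketch (illustrative names): worker threads sum
// disjoint slices of a single shared array, publish partial results into
// a shared results array, and the main thread combines them.
public class SharedSum {
    public static void main(String[] args) throws InterruptedException {
        final double[] data = new double[1000000];
        java.util.Arrays.fill(data, 1.0);

        final int nThreads = 4;
        final double[] partial = new double[nThreads];
        Thread[] workers = new Thread[nThreads];

        for (int i = 0; i < nThreads; i++) {
            final int id = i;
            final int lo = id * data.length / nThreads;
            final int hi = (id + 1) * data.length / nThreads;
            workers[i] = new Thread(new Runnable() {
                public void run() {
                    double s = 0.0;
                    for (int j = lo; j < hi; j++) {
                        s += data[j];        // reads the shared array directly
                    }
                    partial[id] = s;         // each thread writes its own slot
                }
            });
            workers[i].start();
        }

        double total = 0.0;
        for (int i = 0; i < nThreads; i++) {
            workers[i].join();               // join gives the happens-before edge
            total += partial[i];
        }
        System.out.println("sum = " + total);
    }
}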
It's more like system() -- it calls out to the OS shell.

Oh right. But -- couldn't you use this to start a second copy of
the JVM, and thus accomplish the goal of starting a second Java
process?

[ snip ]

[ snip ]

[ snip ]
IIRC Sun's specs for that 24-way box say "48 threads". Dunno if
they're dual-core chips (one thread per core)...

Anyway, IBM's PlayStation chip -- the only multicore chip I know
of where "multi" > 2 -- does not have SMP cores; it has SIMD cores.
So threads are not going to work the same way on it, it'll need
a different programming model altogether.

Well, I'm speculating on what might be on the horizon, more than
talking about what's possible now. Right now 2 cores per chip
may be the max, but the way I hear it, the chip designers are
considering more. No, it doesn't seem like something that would
scale up to more than some smallish number of cores, but -- 4? 8?

(And .... the return of SIMD, if on a small scale? Interesting.)
Yeah, sure, but the devil is in the details. You need some way
to figure out how many threads to create: too few and you're
underutilizing your CPUs; too many and context-switch times
eat up all the extra bang. A program that exploits 4 processors
won't run faster on an 8-way box and will thrash a 2-way machine.

Yes, but it must be possible for programs (at least in some
environments) to detect the number of processors and start one
thread for each one -- that's how the OpenMP runtime library
works, if the "number of threads to create" environment variable
isn't set. That would seem like a sensible approach for other
multithreaded programs to take -- one thread per processor unless
the user specifies otherwise?
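In Java that default is easy to implement. A sketch along these lines
(the "worker.threads" property name is hypothetical) sizes a fixed
pool from Runtime.getRuntime().availableProcessors() unless the user
overrides it:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch (illustrative): one worker thread per available processor unless
// the user overrides the count -- roughly what the OpenMP runtime does
// with its environment variable.
public class PoolSizer {
    public static void main(String[] args) {
        String override = System.getProperty("worker.threads"); // hypothetical property name
        int nThreads = (override != null)
                ? Integer.parseInt(override)
                : Runtime.getRuntime().availableProcessors();

        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        for (int i = 0; i < nThreads; i++) {
            final int id = i;
            pool.execute(new Runnable() {
                public void run() {
                    System.out.println("worker " + id + " running");
                }
            });
        }
        pool.shutdown();   // accept no new tasks; let the queued ones finish
    }
}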
 
Roedy Green

But, but, didn't you just say DLLs just share code, not memory ....

In the old days there was no easy mechanism for each user of a DLL to
have his own instance of the data. The DLL had to manage its pool of
users, allocating each some workspace from its own pool. Today
separate instances are the default, I believe; each client of the DLL
addresses its data at the same virtual address. Presumably there are
features to share read-only or read-write areas of RAM too. Those are
common OS features.

I don't know just how clever Sun has been about avoiding literally
loading all the standard class files, jitting them and optimising them
on every java.exe launch. I would hope they have some scheme to
prebuild the standard class set so that the digested classes look to
the OS like read-only code, or a hunk of memory-mapped data, or a hunk
of read-only memory-mapped data, or ...

Back in 1985 I invented a scheme called Gespenstering for capturing
RAM images of a program in flight and turning them into a relocatable
executable snapshot. That saved a huge amount of complicated
initialisation, similar to loading class files, in my Forth/Abundance
language. I did this for DOS. Presumably you could do something
similar for Windows. I discuss this at
http://mindprod.com/projects/gespenster.html
 
Roedy Green

Well, I'm speculating on what might be on the horizon, more than
talking about what's possible now. Right now 2 cores per chip
may be the max, but the way I hear it, the chip designers are
considering more. No, it doesn't seem like something that would
scale up to more than some smallish number of cores, but -- 4? 8?

Obviously, different parts of a CPU get exercised by different amounts.
Some people's floating point units probably hardly ever get out for a
walk.

It seems logical you can get more bang per buck by bundling X fp
units, Y instruction decoders, Z adders, so many shifters, so many
bytes of high-speed cache, etc. in the combination that actually fits
your workload, and scheduling/sharing them. You could run a monitor on
your system to tell you which kind of chip you should get -- one that
is heavy on A but light on B. Logically the chip could have a full
complement of everything, with various parts of it "down", and it is
designed to carry on bravely. You load up your chip real estate with
as much as will fit that suits your taste.


A CPU built just to calculate primes is a different sort of animal
from one that does nothing but wait for i/o -- single-byte interrupts
from a zillion dumb devices.
 
Dimitri Maziuk

(e-mail address removed) sez:
....
I think we're talking at cross-purposes here. I'll say again that
I'm speaking as someone whose experience with multiple threads
or processes is mostly from the parallel-programming world.
In that world, the well-known languages and libraries I can think
of divide pretty neatly into two camps -- shared-memory model
(threads sharing an address space and communicating/synchronizing
via various mechanisms) and distributed-memory model (processes
with separate address spaces communicating via message-passing).
Perhaps to someone with more of an o/s background the other
possibilities you mention are more common ....

No, I was talking in principle. Of course everyone uses shared
memory for inter-thread communication: it's much easier. It may
be more common the other way around -- I've used shared memory
for IPC, but usually it's just pipes.
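Within a single JVM you can get the same pipe-style, message-passing
feel between threads with piped streams; a minimal sketch (names are
illustrative):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.PrintWriter;

// Sketch (illustrative): message-passing between two threads in one JVM,
// using a pipe instead of shared variables.
public class PipeBetweenThreads {
    public static void main(String[] args) throws Exception {
        final PipedOutputStream out = new PipedOutputStream();
        final PipedInputStream in = new PipedInputStream(out);

        Thread producer = new Thread(new Runnable() {
            public void run() {
                PrintWriter w = new PrintWriter(out, true);
                for (int i = 0; i < 3; i++) {
                    w.println("message " + i);   // written into the pipe
                }
                w.close();                       // closing signals end-of-stream
            }
        });
        producer.start();

        BufferedReader r = new BufferedReader(new InputStreamReader(in));
        String line;
        while ((line = r.readLine()) != null) {  // consumer reads until EOF
            System.out.println("got: " + line);
        }
        producer.join();
    }
}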
Oh right. But -- couldn't you use this to start a second copy of
the JVM, and thus accomplish the goal of starting a second Java
process?

I did that in one application because back when I wrote it
it was faster than having everything in one JVM. Normally
you wouldn't do it because of the startup time of the second
JVM, of course.
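A bare-bones way to do that from Java (a sketch only; "Worker" is a
stand-in for whatever main class the second JVM should run) is to exec
the java binary from the current JRE and wait for it:

import java.io.BufferedReader;
import java.io.InputStreamReader;

// Sketch (hypothetical class name/arguments): launch a second JVM with
// Runtime.exec, echo its standard output, and wait for it to finish.
public class SecondJvm {
    public static void main(String[] args) throws Exception {
        String javaBin = System.getProperty("java.home")
                + java.io.File.separator + "bin"
                + java.io.File.separator + "java";

        // "Worker" is a placeholder main class for the child JVM.
        Process p = Runtime.getRuntime().exec(new String[] {
                javaBin, "-cp", System.getProperty("java.class.path"), "Worker" });

        // Real code should drain stderr as well to avoid blocking the child.
        BufferedReader out = new BufferedReader(
                new InputStreamReader(p.getInputStream()));
        String line;
        while ((line = out.readLine()) != null) {
            System.out.println("[child] " + line);
        }
        int status = p.waitFor();
        System.out.println("child JVM exited with status " + status);
    }
}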
Yes, but it must be possible for programs (at least in some
environments) to detect the number of processors and start one
thread for each one -- that's how the OpenMP runtime library
works, if the "number of threads to create" environment variable
isn't set. That would seem like a sensible approach for other
multithreaded programs to take -- one thread per processor unless
the user specifies otherwise?

Yep, it's a guesstimate as good as any. (On a time-sharing
system you get the same thing as with memory overcommit:
at the time you do the detection all 4 of your CPUs may be
idle; by the time your threads start executing, 3 of the CPUs
are busy doing something else. And vice versa.)

Dima
 
