64 bit linux on VM to run Java app

David Segall

Roedy Green said:
One that might boggle the mind is my old LGP-30 computer that had 0
RAM, but a rotating magnetic drum.

It had one 32-bit register.
Please, let's not do the computer equivalent of "The Four Yorkshire
Men" <> again.

Since I have recently complained about irrelevant YouTube links I
should add that the link is to a Monty Python skit about how happy but
poor the men were in the good old days.
 
Tom Anderson

Please have a second look at the entry. I have updated it based on
your feedback.

"If you added sufficient RAM to your 64-bit system to compensate, it would
run faster mainly because the newer 64-bit architectures have 8 times as
many high speed registers."

Whoaaa nelly. No.

Firstly, about the number of registers. The x86-64 architecture has more
registers than the x86. But architectures don't execute code - processors
do. And for a long time now, we've had a thing called register renaming,
which goes along with out-of-order and superscalar execution, which means
that the number of registers on the processor, physical registers, can
exceed the number of architectural registers. The x86-32 architecture may
only have 8 registers, but, for instance, the Pentium 4 Netburst design
has 128 physical registers.

I don't know if there's a correlation between the number of architectural
registers, and the number of physical registers you can use effectively.
If there is, then a register-rich 64-bit architecture could have even more
physical registers, and your point might still be valid.

However, there's another point. You're bang on about 64-bit machines
having bigger pointers (and possibly integers), and so data structures
using more memory. But this doesn't just affect the amount of memory a
structure uses, it affects the bandwidth it takes to move it in from
memory, and the amount of space it takes up in the cache. If you're
walking a linked list, for instance, then you need to load one pointer per
link; with an 8 GB/s memory interface, that's 2 billion links per second
on a 32-bit machine, but only 1 billion on a 64-bit. Similarly, if a link
object comprises a two-word object header and two one-word pointer fields,
then 4 MB of cache will hold a quarter of a million links on a 32-bit
machine, but only 128 thousand on a 64-bit.
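
For anyone who wants to check the arithmetic above, here is a minimal Java sketch. It assumes 4-byte words on the 32-bit machine and 8-byte words on the 64-bit one, reads 8 GB/s as 8 * 10^9 bytes per second and 4 MB as 4 * 2^20 bytes, and ignores cache-line granularity and prefetching. The class and method names are made up for illustration.

```java
public class LinkMath {
    // Pointer loads per second when each link costs one pointer-sized load.
    static long linksPerSecond(long bytesPerSecond, int pointerBytes) {
        return bytesPerSecond / pointerBytes;
    }

    // Links that fit in a cache when each link is four words:
    // a two-word object header plus two one-word pointer fields.
    static long linksInCache(long cacheBytes, int wordBytes) {
        int wordsPerLink = 2 + 2;
        return cacheBytes / ((long) wordsPerLink * wordBytes);
    }

    public static void main(String[] args) {
        long bandwidth = 8_000_000_000L; // assumed: 8 GB/s, decimal
        long cache = 4L * 1024 * 1024;   // assumed: 4 MB, binary

        System.out.println(linksPerSecond(bandwidth, 4)); // 2000000000
        System.out.println(linksPerSecond(bandwidth, 8)); // 1000000000
        System.out.println(linksInCache(cache, 4));       // 262144
        System.out.println(linksInCache(cache, 8));       // 131072
    }
}
```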

So, if you're adding things to your machine to compensate for the
64-bitness, it needs to be not just RAM, but cache and bus bandwidth. I
don't believe that either of those things are limited by the width of a
register - 32-bit machines routinely have >32-bit buses already, and cache
size is limited by the size of the processor die and the process
technology - so i don't think 64-bit machines have any intrinsic advantage
here.

tom
 
Lew

Tom said:
However, there's another point. You're bang on about 64-bit machines
having bigger pointers (and possibly integers), and so data structures
using more memory. But this doesn't just affect the amount of memory a
structure uses, it affects the bandwidth it takes to move it in from
memory, and the amount of space it takes up in the cache. If you're
walking a linked list, for instance, then you need to load one pointer per
link; with an 8 GB/s memory interface, that's 2 billion links per second
on a 32-bit machine, but only 1 billion on a 64-bit. Similarly, if a link
object comprises a two-word object header and two one-word pointer fields,
then 4 MB of cache will hold a quarter of a million links on a 32-bit
machine, but only 128 thousand on a 64-bit.

Mitigating that factor, one 64-bit quantity might hold that two-word
object header, and thus not inflate the memory requirement for that
piece.

It's all accordin'.
 
Arne Vajhøj

Roedy said:
#[I am ignoring the 1024 versus 1000 issue]

Is that statement unclear ????

I felt the discrepancy needed clarification.

I run into this all the time. Whenever A says something that agrees
with what B says, restates what B says, or elaborates what B says, or
asks for confirmation of an interpretation of what B said, then B
complains, as if the general assumption is that every statement in a
newsgroup must necessarily be a refutation of some previous poster.

I find this amusing given the ostensible purpose of debate is to
persuade others to agree with you.

I do not have a problem with clarifications.

But I find it rather confusing when you use "I presume ..."
and then you list something that is explicit in the text
you are replying to.

If you use "To clarify ..." then people will be less confused.

Arne
 
Arne Vajhøj

Tom said:
"If you added sufficient RAM to your 64-bit system to compensate, it
would run faster mainly because the newer 64-bit architectures have 8
times as many high speed registers."

Whoaaa nelly. No.

Firstly, about the number of registers. The x86-64 architecture has more
registers than the x86. But architectures don't execute code -
processors do. And for a long time now, we've had a thing called
register renaming, which goes along with out-of-order and superscalar
execution, which means that the number of registers on the processor,
physical registers, can exceed the number of architectural registers.
The x86-32 architecture may only have 8 registers, but, for instance,
the Pentium 4 Netburst design has 128 physical registers.

It does not matter (for the point under discussion).

The compiler (JIT if Java) can only use the architectural registers. And
the CPU can only map architectural registers to physical registers - it
can not memory addresses to physical registers.

However, there's another point. You're bang on about 64-bit machines
having bigger pointers (and possibly integers), and so data structures
using more memory. But this doesn't just affect the amount of memory a
structure uses, it affects the bandwidth it takes to move it in from
memory, and the amount of space it takes up in the cache. If you're
walking a linked list, for instance, then you need to load one pointer
per link; with an 8 GB/s memory interface, that's 2 billion links per
second on a 32-bit machine, but only 1 billion on a 64-bit. Similarly,
if a link object comprises a two-word object header and two one-word
pointer fields, then 4 MB of cache will hold a quarter of a million
links on a 32-bit machine, but only 128 thousand on a 64-bit.

So, if you're adding things to your machine to compensate for the
64-bitness, it needs to be not just RAM, but cache and bus bandwidth. I
don't believe that either of those things are limited by the width of a
register - 32-bit machines routinely have >32-bit buses already, and
cache size is limited by the size of the processor die and the process
technology - so i don't think 64-bit machines have any intrinsic
advantage here.

They do have a small one.

http://www.digit-life.com/articles2/cpu/insidespeccpu2000-part-m.html

measured it at 1.6% to 3.8% for the SPECint2000 benchmark.

Arne
 
Arne Vajhøj

Arne said:
It does not matter (for the point under discussion).

The compiler (JIT if Java) can only use the architectural registers. And
the CPU can only map architectural registers to physical registers - it
can not memory addresses to physical registers.


They do have a small one.

http://www.digit-life.com/articles2/cpu/insidespeccpu2000-part-m.html

measured it at 1.6% to 3.8% for the SPECint2000 benchmark.

Floating point shows a much bigger advantage for 64 bit, because
it not only gets more registers but also gets rid of the stack model
for FP registers.

Arne
 
Arne Vajhøj

Roedy said:
Please have a second look at the entry. I have updated it based on
your feedback.

Looks much better.

One note though. If you are not programming but playing games,
then you may have a huge graphics card in the PC.

32 bit versions of desktop Windows (XP & Vista) not
only have the 4 GB limit on virtual address space that
all 32 bit OSes have - they also have a 4 GB limit on
physical addresses. And PCI and PCIe devices reserve
chunks of 256 MB of physical address space. Which means
that on 32 bit XP you can only use 4 GB RAM minus the RAM
on the graphics card (rounded up to a multiple of 256 MB)
minus a few other smaller things.

So your 4+ GB most definitely includes 4 GB and I would
tend to write it as >3 GB to emphasize the point.
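
A rough Java sketch of that subtraction, for illustration only. It assumes the graphics card reservation is rounded up to a 256 MB chunk as described above; the `otherReservedMb` parameter stands in for the "few other smaller things" and any value passed to it is a guess, not a measurement. The class and method names are hypothetical.

```java
public class UsableRam {
    static final long CHUNK_MB = 256;
    static final long ADDRESS_SPACE_MB = 4096; // 4 GB of physical addresses

    // Round a size in MB up to the next multiple of 256 MB.
    static long roundUpToChunk(long mb) {
        return ((mb + CHUNK_MB - 1) / CHUNK_MB) * CHUNK_MB;
    }

    // Upper bound on the RAM a 32-bit desktop Windows could use.
    static long usableRamMb(long installedMb, long graphicsMb, long otherReservedMb) {
        long addressable = ADDRESS_SPACE_MB - roundUpToChunk(graphicsMb) - otherReservedMb;
        return Math.min(installedMb, addressable);
    }

    public static void main(String[] args) {
        // 4 GB installed, 512 MB graphics card, nothing else reserved.
        System.out.println(usableRamMb(4096, 512, 0)); // 3584
        // A 300 MB reservation still costs two 256 MB chunks.
        System.out.println(usableRamMb(4096, 300, 0)); // 3584
        // With 2 GB installed the reservations do not bite.
        System.out.println(usableRamMb(2048, 512, 0)); // 2048
    }
}
```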

Arne
 
Roedy Green

But I find it rather confusing when you use "I presume ..."
and then you list something that is explicit in the text
you are replying to.

You were not explicit. You made only a vague reference saying you
were "ignoring" the issue. Besides it doesn't matter a rat's ass.
Why are you so hard up for credit?
 
Roedy Green

I don't know if there's a correlation between the number of architectural
registers, and the number of physical registers you can use effectively.
If there is, then a register-rich 64-bit architecture could have even more
physical registers, and your point might still be valid.

Registers used to be very expensive so older architectures tend to be
tight on registers. 64-bitness is logically independent of the number of
registers. I have modified that sentence to make clear I am talking
only about the AMD architecture.

Seems to me some years ago I read about a 32-bit machine with an
astounding (at least astounding at the time) number of registers
organised as a sliding register window. This strikes me as the best
way to handle things to avoid spending most of your life
saving/restoring registers.

In the 80s I wrote a lot of assembler. I spent many hours tracing my
own code and other programs single stepping. I was horrified at the
ratio of housekeeping to application arithmetic, particularly call
overhead. The key to a faster new architecture will be to streamline
procedure calls.
 
Roedy Green

But this doesn't just affect the amount of memory a
structure uses

A Java object consists mostly of references to other objects, usually
Strings. Each of these is twice the size.

Even the ints could well be twice as fat. Often placing ints on an
even 64 bit boundary will speed access. So depending on how your run
time works, many of your ints could have 32 bits of padding too.

I don't know if the stacks go 64 bit internally. If so, they too would
be twice as fat.

The machine code itself would be fluffier too. 8-bit, later 16-bit
and 32-bit code was designed for compactness at the expense of
orthogonality. The extra registers in 64 bit will chew up more
selector space in the instructions.

What we need are some space benchmarks with some real world JVMs to
get an idea of just how much extra RAM you need to run 64-bit.

I have a 64 bit capable machine. All I need is some energy to perform
the experiments.
 
Roedy Green

If you're
walking a linked list, for instance, then you need to load one pointer per
link; with an 8 GB/s memory interface, that's 2 billion links per second
on a 32-bit machine, but only 1 billion on a 64-bit

Would not the memory fetch bus typically be twice as wide on 64 bit, thus
giving you the same effective throughput in links per second?

There are so many variables, I feel silly trying to predict the net
effect.
 
Roedy Green

The compiler (JIT if Java) can only use the architectural registers. And
the CPU can only map architectural registers to physical registers - it
can not memory addresses to physical registers.

You meant to say "It can not MAP memory addresses to physical
registers" right?

Some old DEC architectures could do that.
 
Tom Anderson

Mitigating that factor, one 64-bit quantity might hold that two-word
object header, and thus not inflate the memory requirement for that
piece.

The two-word header i'm familiar with, from Bacon's work on inflatable
locks, consists of a pointer to the vtable and a pointer to a semaphore,
with some flags stored in the low bits. If you want the pointers to be
proper pointers, they'll have to be full words. However, cleverness here
is certainly possible.

It's all accordin'.

Hyup.

tom
 
Tom Anderson

Seems to me some years ago I read about a 32-bit machine with an
astounding (at least astounding at the time) number of registers
organised as a sliding register window. This strikes me as the best way
to handle things to avoid spending most of your life saving/restoring
registers.

SPARC does this, i believe. It does seem like a very good idea. I think it
deals with running out of registers by raising an interrupt, and then
there are kernel-mode instructions which let the handler spill registers
to memory, and load them back later. All very clever.

tom
 
Tom Anderson

A Java object consists mostly of references to other objects, usually
Strings. Each of these is twice the size.

Even the ints could well be twice as fat. Often placing ints on an
even 64 bit boundary will speed access. So depending on how your run
time works, many of your ints could have 32 bits of padding too.

I don't know if the stacks go 64 bit internally. If so, they too would
be twice as fat.

The machine code itself would be fluffier too. 8-bit, later 16-bit
and 32-bit code was designed for compactness at the expense of
orthogonality.

Hmm. Not sure that's true. The PDP-11 instruction set is often held up as
a model of orthogonality, and that's 16-bit.

You're probably right about code getting bigger, though. That seems to be
a trend.

The extra registers in 64 bit will chew up more selector space in the
instructions.

Only a bit or two. That's the power of the exponential function!
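
To put a number on "a bit or two": x86 has 8 general-purpose registers and x86-64 has 16, so each register operand needs one more bit to name a register. A minimal Java sketch, assuming a plain binary encoding of register numbers (real instruction encodings are messier); the class and method names are invented for the example.

```java
public class SelectorBits {
    // Bits needed to name one of n registers: ceil(log2(n)), valid for n >= 2.
    static int bitsFor(int registers) {
        return 32 - Integer.numberOfLeadingZeros(registers - 1);
    }

    public static void main(String[] args) {
        System.out.println(bitsFor(8));   // 3
        System.out.println(bitsFor(16));  // 4
        System.out.println(bitsFor(128)); // 7
    }
}
```

Doubling the register count costs one extra selector bit per operand, which is the exponential working in our favour.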

Anyway, i don't know if these remarks were aimed at me - i was *agreeing*
that 64-bit uses more memory.

tom
 
Tom Anderson

Would not the memory fetch bus typically be twice as wide on 64 bit, thus
giving you the same effective throughput in links per second?

No. Bus width is not coupled to register width these days. I'm not sure if
there is an architectural limit to bus width - the latest graphics
chipsets have 512-bit buses connecting the GPU to the card's RAM. Thinking
about it, the width should be limited to the width of a cache line, since
that's the unit you do memory reads in. Although i suppose you could
always transfer multiple lines at once.

tom
 
Patricia Shanahan

Tom said:
SPARC does this, i believe. It does seem like a very good idea. I think
it deals with running out of registers by raising an interrupt, and then
there are kernel-mode instructions which let the handler spill registers
to memory, and load them back later. All very clever.

I've done performance work both with SPARC, and before that with some
Celerity systems that did something similar.

It has its good points, but also presents problems.

First, there is the physical problem of getting fast enough transfers
between the registers and the ALU. Generally, the bigger a memory the
slower transfers between the memory and a given place in the processor.
There is a lot to be said for making the register structure very small,
very fast, and very close to the ALU.

Second, consider what happens on register stack overflow/underflow,
especially with modern coding styles that tend to have many small
functions. By the time a register window needs to be saved or restored,
the system has no data about which registers are really in use, and
which contain junk. The result is that the whole block needs to be
transferred.

I used to hate the Ackermann function, because the code to evaluate it,
which appears in a lot of benchmarks, requires large, rapid changes in
call stack depth.
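
For readers who have not met it, a minimal Java sketch of the usual two-argument Ackermann function follows. The depth counters are an addition for illustration, not part of any benchmark, and the class name is hypothetical. Even small arguments make the call stack grow and shrink constantly, which is what stresses a register window scheme.

```java
public class Ackermann {
    static int depth = 0;    // current recursion depth
    static int maxDepth = 0; // deepest point reached so far

    static int ackermann(int m, int n) {
        depth++;
        maxDepth = Math.max(maxDepth, depth);
        int result;
        if (m == 0) {
            result = n + 1;
        } else if (n == 0) {
            result = ackermann(m - 1, 1);
        } else {
            // The inner call must finish before the outer one can start.
            result = ackermann(m - 1, ackermann(m, n - 1));
        }
        depth--;
        return result;
    }

    public static void main(String[] args) {
        System.out.println(ackermann(2, 3)); // 9
        System.out.println("max depth: " + maxDepth);
    }
}
```

Keep the arguments small; the recursion depth grows very quickly with m.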

Patricia
 
Arne Vajhøj

Tom said:
SPARC does this, i believe. It does seem like a very good idea. I think
it deals with running out of registers by raising an interrupt, and then
there are kernel-mode instructions which let the handler spill registers
to memory, and load them back later. All very clever.

It sounds like a great idea.

But I doubt that it is in practice.

HW at that level is beyond me, but I draw my conclusions from
the fact that newer RISC architectures (PPC, Alpha and Itanium)
do not use register windowing (Itanium uses a register stack,
which has the same idea though).

Arne
 
