If Perl is compiled on a 32-bit system, and the system is upgraded to 64-bit...


David Filmer

If Perl is compiled on a 32-bit system, and the system is later
upgraded to 64-bit hardware and O/S, would Perl programs then be able
to use the full amount of memory that a 64-bit system would allow?

Or would I need to re-compile Perl in the 64-bit environment to access
the larger memory?

Thanks!
 

Ted Zlatanov

BM> (Think about it for a minute. The 32bit perl was, by definition,
BM> compiled with 32bit pointers. Thus, it cannot address more than 32 bits'
BM> worth of memory, regardless of what the OS can address or what is
BM> physically present in the machine.)

It's not the case here, but it's not generally true (as you imply) that
just because pointers are N-bit you are limited to 2^N memory. You can
definitely address more than the pointer size will allow in a segmented
memory model, just not necessarily all at once.

Ted
 

Ilya Zakharevich

.... Then your pointers are wider than 32bit. Some bits of them may
just be inferred from context.

E.g., Generally speaking, on a "well-designed" 32-bit architecture, one
would be able to address 8GB of memory (4GB of data, and 4GB of
code). Unfortunately, too many people fell under the JvN spell of "code
is data"; the most pronounced problems of today's computing come
from this incest. [*]

You can even access it all at once, if your pointer data structures
contain both a segment identifier and an address.

If the combined size is 32bit, then you cannot access more than 4GB.

Theory and history aside though, I don't think perl uses far pointers,
even on a system that supports them. Nor can I think of any such system
that's still in use.

32bit Linux uses far (read: 32bit) pointers. ;-) Basically, AFAIU, on
most architectures of today paging takes the role of segments.

Yours,
Ilya

[*] AFAIK, Solaris (tries to?) separate code from data AMAP. On
Solaris/i386, are they in different segments?
 

Ilya Zakharevich

...but then your pointers are more than (32|16) bits wide. A DOS/Win16
__far pointer is a perfectly respectable 32bit pointer.

This depends on your GDT/LDT. E.g., on OS/2 LDT is organized so that
8K of 64KB segments cover the first 512MB of virtual memory in a 4GB
segment (0x053, IIRC). So the same 32-bit-wide bitmaps can be used as
far 16-bit or near 32-bit pointers - which leads to very simple thunks
between 16-bit and 32-bit code.

Yours,
Ilya
 

Peter J. Holzer

... Then your pointers are wider than 32bit. Some bits of them may
just be inferred from context.

E.g., Generally speaking, on a "well-designed" 32-bit architecture, one
would be able to address 8GB of memory (4GB of data, and 4GB of
code).

And we could split off the stack into a different segment, too, and then
address 12 GB of memory. But while such a separation has security
advantages, it doesn't seem worthwhile for space reasons: The stack is
usually tiny, and the code also much, much smaller than 4GB - so in
effect you can only address 4 GB + some small amount. It was different
on 16-bit systems: Whether code and data had to share 64kB or each could
be up to 64kB made a real difference (and sometimes I really wanted a
separate stack, but I didn't have that).

Unfortunately, too many people fell under the JvN spell of "code
is data"; the most pronounced problems of the computing of today come
from this incest. [*]

I agree with that but I think that per-page protections (rw/nx bits) are
a better solution to the problem than separate address spaces.

If the combined size is 32bit, then you cannot access more than 4GB.


32bit Linux uses far (read: 32bit) pointers. ;-)

I see the smiley but I'd like to clarify for our young readers that
32bit Linux uses near pointers. On the 386, a far pointer would be 48
bits.

Basically, AFAIU, on
most architectures of today paging takes the role of segments.

Yup.

[*] AFAIK, Solaris (tries to?) separate code from data AMAP. On
Solaris/i386, are they in different segments?

I don't think so. Sun likes Java. Java uses JIT compilers. JIT compilers
and separated address spaces for code and data don't mesh well.

hp
 

Ilya Zakharevich

And we could split off the stack into a different segment, too and then
address 12 GB of memory.

Not with C. The same subroutine should accept stack data pointers and
heap data pointers.

the code also much, much smaller than 4GB

This is not what I experience with my system (which has less than 4GB
memory, though). The monsters like Mozilla take more text memory than
data memory (unless one loads a LOT of HTML into the browser). (This
might be related to 64KB granularity of the system allocator in OS/2 [in
units of virtual memory, not real memory] - e.g., you order 1000 small
chunks of shared memory, and you use 64MB of virtual space.) And many
things related to DLLs are loaded into a code-like memory ("shared
address space region").

So maybe I'm wrong: the usage of "shared address space region" may be
large, but not everything which is put there is code. But to be
absolutely sure, I must run some tools on a memory state dump...

I see the smiley but I'd like to clarify for our young readers that
32bit Linux uses near pointers. On the 386, a far pointer would be 48
bits.

.... but only if you round up to a multiple of 8bit; otherwise 46bit.

[*] AFAIK, Solaris (tries to?) separate code from data AMAP. On
Solaris/i386, are they in different segments?
I don't think so. Sun likes Java. Java uses JIT compilers. JIT compilers
and separated address spaces for code and data don't mesh well.

Do not see how this would be related. AFAI suspect (basing on the
sparse info I have seen), the only way to load code on solaris is to
write an executable module on disk, and dlopen() it.

Yours,
Ilya
 

Peter J. Holzer

Not with C.

I used to think so, but it's not quite true.

The same subroutine should accept stack data pointers and
heap data pointers.

Yes, but

* automatic variables don't have to be on the "hardware stack".
* only those variables which actually are accessed via pointer
need to be in a pointer-accessible space.

So a compiler could put stuff like

* return addresses
* non-array automatic variables which don't have their address taken
* function arguments and return values
* temporary variables

into the stack segment and automatic variables which do have their
address taken into the data segment.

Among other things this means that return addresses are not accessible
with a pointer and can't be overwritten by a buffer overflow. It also
means that the size of the stack segment will almost always be very
small (arrays will (almost) never be there) and that function call and
return are more expensive (you need to maintain a second "stack").
So I'm not sure whether the advantages outweigh the disadvantages.

But that's moot. I don't expect any new segmented architectures, and
existing ones are either obsolete or used in "flat" mode.

This is not what I experience with my system (which has less than 4GB
memory, though). The monsters like Mozilla take more text memory that
data memory (unless one loads a LOT of HTML into the browser).

Mozilla is a monster, but it still uses only about 40 MB of code memory,
which is about 1% of 4 GB:

% perl -e 'while (<>) {
      my ($b, $e, $p) = /^(\w+)-(\w+) (\S+)/;
      $s = hex($e) - hex($b); $s{$p} += $s;
  }
  for (keys %s) { printf "%s %9d\n", $_, $s{$_} / 1024 }' \
  /proc/18752/maps | sort -n -k 2
---p 64
rwxp 192
r--s 496
rw-s 768
r-xp 40656
r--p 98524
rw-p 279960

(this is firefox 3.6 running on 32bit linux for about a day)

So if you moved that code into a different segment, you could use 4GB
instead of 3.96GB for data. Doesn't seem like much of an improvement.
(especially if you consider that on most 32-bit OSs the address space is
limited to 2 or 3 GB anyway - lifting that limit would have a much
larger effect).

I see the smiley but I'd like to clarify for our young readers that
32bit Linux uses near pointers. On the 386, a far pointer would be 48
bits.

... but only if you round up to a multiple of 8bit; otherwise 46bit.

[*] AFAIK, Solaris (tries to?) separate code from data AMAP. On
Solaris/i386, are they in different segments?
I don't think so. Sun likes Java. Java uses JIT compilers. JIT compilers
and separated address spaces for code and data don't mesh well.

Do not see how this would be related.

A JIT compiler needs to generate executable code which is immediately
executed by the same process. This is hard to do if the JIT compiler
can't put the code into a place where it can be executed.

AFAI suspect (basing on the sparse info I have seen), the only way to
load code on solaris is to write an executable module on disk, and
dlopen() it.

That would absolutely kill the performance of a JIT compiler. If
Solaris/x86 uses separate code and data segments (which I doubt) then
there is probably some way (maybe with mmap) to map a region of memory
into both the data and the code segment. More likely they use a common
address space and just use mprotect to prevent execution of data which
isn't meant to be code.

hp
 

Ilya Zakharevich

So a compiler could put stuff like

* return addresses
* non-array automatic variables which don't have their address taken
* function arguments and return values
* temporary variables

into the stack segment and automatic variables which do have their
address taken into the data segment.
Among other things this means that return addresses are not accessible
with a pointer and can't be overwritten by a buffer overflow.

Cool argument!

Mozilla is a monster, but it still uses only about 40 MB of code memory,
which is about 1% of 4 GB:

I suspect your system has 4K virtual address space granularity. Mine
has 64K. (And it has an order of magnitude less memory.)

What is important is the ratio of data/text. In your case, it is
less than 10. (With more memory, you run more of OTHER monsters. ;-)

instead of 3.96GB for data. Doesn't seem like much of an improvement.
(especially if you consider that on most 32-bit OSs the address space is
limited to 2 or 3 GB anyway - lifting that limit would have a much
larger effect).

No, the effect would be the opposite: 40M/2G is LARGER than 40M/4G. ;-)

That would absolutely kill the performance of a JIT compiler. If
Solaris/x86 uses separate code and data segments (which I doubt) then
there is probably some way (maybe with mmap) to map a region of memory
into both the data and the code segment. More likely they use a common
address space and just use mprotect to prevent execution of data which
isn't meant to be code.

BTW, I looked, and OS/2 uses different segments for code and data.
(Although the paging data for them is almost identical, so they are
more or less aliases of each other.)

Yours,
Ilya
 

Peter J. Holzer

I suspect your system has 4K virtual address space granularity.

Yes.

Mine has 64K.

So that would increase the average internal fragmentation per code
region from 2 kB to 32 kB (half the granularity - of course that depends
on the size distribution but it's good enough for a back-of-the-envelope
calculation). On Linux Firefox maps 132 code regions into memory (the
GNOME people have a serious case of shared-libraryritis). So that's 132
* (32 kB - 2 kB) = 3960 kB or about 4 MB more. Noticeable but probably
less than the effects of other differences between OS/2 and Linux.

What is important is the ratio of data/text.

No. What is important is the ratio between code and the usable address
space.

In your case, it is less than 10. (With more memory, you run more of
OTHER monsters. ;-)

Yes, but those other monsters get their own virtual address space, so
they don't matter in this discussion.

No, the effect would be the opposite: 40M/2G is LARGER than 40M/4G. ;-)

No, you misunderstood. If you now have an address space of 2 GB for
code+data, and you move the code to a different segment, you win 40MB
for data. But if the OS is changed to give each process a 4 GB address
space, then you win 2 GB, which is a lot more than 40 MB.

hp
 

Ilya Zakharevich

No. What is important is the ratio between code and the usable address
space.

I see (below) that we discuss different scenarios.

Yes, but those other monsters get their own virtual address space, so
they don't matter in this discussion.

They do on OS/2: the DLL's-related memory is loaded into shared
address region. (This way one does not need any "extra"
per-process-context patching or redirection of DLL address accesses.)

No, you misunderstood. If you now have an address space of 2 GB for
code+data, and you move the code to a different segment, you win 40MB
for data. But if the OS is changed to give each process a 4 GB address
space, then you win 2 GB, which is a lot more than 40 MB.

I do not see how one would lift this limit (without a segmented
architecture ;-). I expect that (at least) this would make context
switch majorly costlier...

Thanks,
Ilya
 

Peter J. Holzer

I see (below) that we discuss different scenarios.



They do on OS/2: the DLL's-related memory is loaded into shared
address region. (This way one does not need any "extra"
per-process-context patching or redirection of DLL address accesses.)

Sounds a bit like the pre-ELF shared library system in Linux. Of course
that was designed when 16 MB was a lot of RAM and abandoned when 128 MB
became normal for a server (but then I guess the same is true for OS/2).

I'd still be surprised if anybody ran an application mix on OS/2 where
the combined code size of all DLLs exceeds 1 GB. Heck, I'd be surprised
if anybody did it on Linux (with code I really mean code - many systems
put read-only data into the text segment of an executable, but you
couldn't move that to a different address space, so it doesn't count
here).

I do not see how one would lift this limit (without a segmented
architecture ;-).

If you can move code to a different segment you obviously have a
segmented architecture. But even without ...

I expect that (at least) this would make context switch majorly
costlier...

I don't see why the kernel should need a large address space in the same
context as the running process. When both the size of physical RAM and
the maximum VM of any process could realistically be expected to be much
smaller than 4GB, a fixed split between user space and kernel space
(traditionally 2GB + 2GB in Unix, but 3GB + 1GB in Linux) made some
sense: Within a system call, the kernel could access the complete
address space of the calling process and the complete RAM without
fiddling with page tables. But when physical RAM exceeded the kernel
space that was no longer possible anyway, so there was no longer a
reason to reserve a huge part of the address space of each process for
the kernel. But of course making large changes for a factor of at most 2
doesn't make much sense in a world governed by Moore's law, and anybody
who needed the space moved to 64 bit systems anyway.

hp
 

Ilya Zakharevich

Sounds a bit like the pre-ELF shared library system in Linux.

No, there is a principal difference: on Linux (and most other flavors
of Unix), you never know whether your program would be "assembled"
(from shared modules) correctly or not: it is Russian roulette which
C symbol is resolved to which shared module (remember these Perl_ and
PL_ prefixes? They are the only workaround I know of). On OS/2, the
linking is done at link time; each symbol KNOWS to which DLL (and
which entry point inside the DLL) it must link.

So:

a DLL is compiled to the same assembler code as an EXE (no indirection)

if a DLL is used from two different programs, its text pages would be
the same - provided all modules it links to are loaded at the same
addresses (no process-specific fixups).

- and it is what happens. DLL runs as quick as EXE, and there is no
overhead if it is reused.

[And, of course, a program loader living in user space is another
gift from people having no clue about security... As if an
executable stack was not enough ;-)]

Of course that was designed when 16 MB was a lot of RAM and
abandoned when 128 MB became normal for a server (but then I guess
the same is true for OS/2).

No, it was designed when 2M was a lot of RAM. ;-) On the other hand,
the architecture was designed by mainframe people, so they may have had
different experiences.

I'd still be surprised if anybody ran an application mix on OS/2 where
the combined code size of all DLLs exceeds 1 GB.

"Contemporary flavors" of OS/2 still run comfortably in 64MB systems.
(Of course, there is no FireWire/Bluetooth support, but I do not
believe that they would add much - IIRC, the USB stack is coded with
pretty minimal overhead.)

the kernel. But of course making large changes for a factor of at most 2
doesn't make much sense in a world governed by Moore's law, and anybody
who needed the space moved to 64 bit systems anyway.

I do not believe in Moore's law (at least not in this context). Even
with today's prices on memory, DockStar has only 128MB of memory. Two
weeks ago it cost $24.99 on Amazon (not now, though!). I think in a
year or two we can start getting Linux stations in the $10 range.
In this price range, memory size matters.

So Moore's law works both ways; low-memory situation does not
magically go out of scope.

Yours,
Ilya
 

Peter J. Holzer

No, there is a principal difference: on Linux (and most other flavors
of Unix), you never know whether your program would be "assembled"

There are a lot more principal differences, some of which are even
slightly relevant to the current topic, which is address space usage.

No, it was designed when 2M was a lot of RAM. ;-) On the other hand,
the architecture was designed by mainframe people, so they may have had
different experiences.


"Contemporary flavors" of OS/2 still run confortably in 64MB systems.

And users of 64 MB systems load 1 GB of DLLs into their virtual memory?

Sorry, but that's just bullshit. The combined code size of all DLLs on a
64MB system is almost certainly a lot less than 64 MB, or the system
wouldn't be usable[1]. So by moving code from a general "code+data"
address space to a special code address space, you free up at most 64 MB
in the data address space - a whopping 3% of the 2GB you already have.

(Of course, there is no FireWire/Bluetooth support, but I do not
believe that they would add much - IIRC, the USB stack is coded with
pretty minimal overhead.)


I do not believe in Moore's law (at least not in this context). Even
with today's prices on memory, DockStar has only 128MB of memory.

The users of systems with 128 MB of memory don't benefit from a change
which increases the virtual address space from 2GB to 4GB. The only
people who benefit from such a change are those for whom 2GB is too
small and 4 GB is just enough. These are very likely to be the people
for whom next year 4 GB will be too small and the year after 8 GB will be
too small. An architectural change which just increases the usable
address space by a factor of two just isn't worth the effort, as the
people who need it will run into the new limit within a year or two. If
you make such a change you have to make it large enough to last for at
least a few years - a factor of 16 (8080 -> 8086, 8086 -> 80286, 80386
-> x86/PAE) seems to be just large enough to be viable.

So Moore's law works both ways; low-memory situation does not
magically go out of scope.

Nobody claimed that.

hp

[1] Yes, it's possible that you load a 1MB DLL just to use a single
function. But it's unlikely.
 

Ilya Zakharevich

Sorry, but that's just bullshit. The combined code size of all DLLs on a
64MB system is almost certainly a lot less than 64 MB, or the system
wouldn't be usable[1].

BS. Obviously, you never used OS/2 (in IBM variant - v2.0 or more;
versions up to 1.2 were done by MicroSoft, and were of a very
different "quality")... I think the first 24/7 system I used had 4MB
of memory, and, in my today's estimates, loaded more than 20MB of DLLs
(counting VIRTUAL MEMORY usage). It was quite usable - with "light
usage". When I got fluent enough to go to "heavy usage", it became
usable after upgrade to 8MB of physical memory.

What you forget about is that

a) there are well-designed systems - and effects of paging are quite
sensitive to design.

b) Virtual memory usage may, theoretically, be more than an order of
magnitude more than the resulting physical memory usage - if unit
of virtual memory (what an analogue of sbrk() increments) is
64KB, and of physical memory (page size) is 4KB.

Imagine that when you sbrk() for 2KB, you are returned 1 fully
accessible page of memory, but the next sbrk() would start at
64KB increment. (Yes, this is how my first implementation of
Perl's malloc() behaved on OS/2. ;-)

Well-designed malloc()s do not work this way. But a DLL
typically loads 2-3 segments; they take minimum 128-192KB of
shared memory region.

The users of systems with 128 MB of memory don't benefit from a change
which increases the virtual address space from 2GB to 4GB. The only
people who benefit from such a change are those for whom 2GB is too
small and 4 GB is just enough.

And, in the chunk of the future as I foresee it, there will ALWAYS be a
computer form factor for which it is going to matter. You see that
today, Linuxes are used mostly with 128MB - 12GB of memory. I do not
foresee that in the "close" future the lower range would float above 4GB.

Yours,
Ilya
 

Peter J. Holzer

Sorry, but that's just bullshit. The combined code size of all DLLs on a
64MB system is almost certainly a lot less than 64 MB, or the system
wouldn't be usable[1].

BS. Obviously, you never used OS/2 (in IBM variant - v2.0 or more;
versions up to 1.2 were done by MicroSoft, and were of a very
different "quality")...

True. I only used OS/2 1.x (around 1988/89).

I think the first 24/7 system I used had 4MB of memory, and, in my
today's estimates, loaded more than 20MB of DLLs (counting VIRTUAL
MEMORY usage).

I am sceptical - when 4 MB of memory was normal, 20 MB of code was a lot
- even if you allow for a huge overhead caused by 64 kB segment
granularity. But even if you are right and there would be the same level
of overcommitment on a 64 MB system (which I also doubt), then that
would still be only 320 MB - far away from the 2GB limit.

What you forget about is that [...]
Well-designed malloc()s do not work this way. But a DLL
typically loads 2-3 segments; they take minimum 128-194KB of
shared memory region.

2-3 segments of code or 1 segment of code, 1 segment of private data and
1 segment of shared data? Only code segment(s) are relevant here.

And, in the chunk of the future as I foresee it, there will ALWAYS be a
computer form factor for which it is going to matter. You see that
today, Linuxes are used mostly with 128MB - 12GB of memory. I do not
foresee that in the "close" future the lower range would float above 4GB.

You misunderstood me. I didn't say that the range below 4GB would
vanish. But people who need virtual memory sizes (not physical memory!)
up to 3 GB are perfectly fine with the current 32-bit scheme, and those
who need more than 4 GB need to move to 64-bit anyway. So the only
people who would benefit from a change of the 32-bit scheme are those
who need between 3 GB and 4 GB. This is IMNSHO a tiny minority, and it
will stay a tiny minority.

hp
 
