object allocation performance question

Jimmy Zhang

Hi,

I was previously under the impression that allocating a single large block of
memory in Java is faster (possibly much faster) than allocating lots of smaller
blocks whose total storage space equals that of the large block. So 1 MB in a
single allocation is faster than 10,000 allocations of 100-byte blocks.

Does garbage collection performance go way up as well?

Could you please comment on whether this is right, or at least in the ballpark?

I was reading someone's paper saying that object allocation time is linear in
the size of the allocation, which runs counter to my intuition.

Thanks,
Jimmy
 
Roedy Green

Jimmy Zhang said:
I was previously under the impression that allocating a single large block of
memory in Java is faster (possibly much faster) than allocating lots of smaller
blocks whose total storage space equals that of the large block. So 1 MB in a
single allocation is faster than 10,000 allocations of 100-byte blocks.

Allocating a large block can be problematic: there may be no hole big enough
to fit it, so you may have to do a garbage collection or compaction to make
room.

Obviously there is more overhead in allocating 10 small objects than one big
one, given that room can be found without a GC.
 
Mark Bottomley

Jimmy:

Multiple allocations will be slower because you go through the allocation
code that many more times, but the ratio may not be what you expect. There is
also a linear component to allocation in Java, in that all uninitialized
fields of an object must be set to zero for primitives and null for
references. This initialization means that a large allocation's timing may be
dominated by the clearing of memory. The performance ratio will be determined
by the VM's implementation of memory allocation. If the VM sub-allocates out
of a large memory chunk, the timing may be faster, but if it allocates from
the OS's allocator, then there will be additional overhead from acquiring a
global lock for memory access.
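
For what it's worth, a quick micro-benchmark along the following lines can
show the ratio on a given VM. This is only a sketch (the class and method
names are made up, and JIT warm-up, GC pauses and the OS make any single run
noisy), so treat the numbers as indicative:

// Rough timing sketch (hypothetical class name). Give it a reasonably
// large heap, e.g. java -Xmx256m AllocTiming, so the GC isn't thrashing.
public class AllocTiming {
    static final int SMALL = 100;            // 100-byte payload per small block
    static final int COUNT = 10000;          // 10,000 small blocks
    static final int LARGE = SMALL * COUNT;  // ~1 MB in one block

    public static void main(String[] args) {
        // Warm up both paths so the JIT has compiled them before timing.
        for (int i = 0; i < 1000; i++) {
            oneLarge();
            manySmall();
        }

        long t0 = System.nanoTime();
        oneLarge();
        long t1 = System.nanoTime();
        manySmall();
        long t2 = System.nanoTime();

        System.out.printf("one %d-byte block:      %,d ns%n", LARGE, t1 - t0);
        System.out.printf("%d x %d-byte blocks: %,d ns%n", COUNT, SMALL, t2 - t1);
    }

    static byte[] oneLarge() {
        return new byte[LARGE];              // one allocation, zeroed by the VM
    }

    static byte[][] manySmall() {
        byte[][] blocks = new byte[COUNT][];
        for (int i = 0; i < COUNT; i++) {
            blocks[i] = new byte[SMALL];     // 10,000 allocations, each zeroed
        }
        return blocks;
    }
}

One would expect the single allocation to come out ahead, since the
per-allocation bookkeeping happens once instead of 10,000 times, while the
total amount of memory to be zeroed is roughly the same in both cases.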

Garbage collection is another area where the type of collector plays a big
role. All collectors spend a lot of time, either incrementally or in bulk,
scanning the thread stacks and heap to determine object liveness (and hence
collectability). More objects, more time.

Java, however, does not let the user directly allocate memory in any way
except as an object or some variant of an array. The most common ways of
avoiding many allocations, in systems that care about it, are using object
pools to manage the memory yourself (assuming you know the upper limits) and
implementing multidimensional arrays as larger linear arrays, with
row/column/plane calculations performed by the software instead of the
internal walking of arrays of arrays of arrays of ... arrays of
primitives/objects.
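
As an illustration of the second technique, a 2D grid can be backed by a
single linear array with explicit index arithmetic. A minimal sketch, with
made-up names, assuming rows * cols fits in an int:

// A 2D grid backed by one linear array instead of an array of arrays.
// One allocation for the whole grid; index arithmetic replaces the walk
// through the outer array.
public class Grid2D {
    private final int rows;
    private final int cols;
    private final double[] data;      // the single backing allocation

    public Grid2D(int rows, int cols) {
        this.rows = rows;
        this.cols = cols;
        this.data = new double[rows * cols];
    }

    private int index(int r, int c) {
        return r * cols + c;          // row-major layout
    }

    public double get(int r, int c) {
        return data[index(r, c)];
    }

    public void set(int r, int c, double value) {
        data[index(r, c)] = value;
    }
}

Compared with new double[1000][1000], which costs 1001 allocations (the outer
array plus one per row), new Grid2D(1000, 1000) is a single allocation and
keeps the whole grid contiguous.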

The better answer to this would be for you to explain your problem and
why memory allocation is troublesome.

Mark...
 
Roedy Green

VMS provided a system service that allowed you to allocate a
region of virtual memory and, as an option, zero it at the
same time. I doubt if it was performed as a single
instruction, since that's a pretty complicated service.

To do it you would need special memory chips that zero themselves in
parallel on a signal.
 
Michael Amling

Roedy said:
Does any hardware architecture have a way of logically zeroing a page
with a single instruction?

System/370 does (MVCL). Also, entire fresh pages of virtual memory are
regarded as zeroed before they're mapped to real memory; the MVCL instruction
is needed only when the page is first referenced.

--Mike Amling
 
Sudsy

Roedy Green wrote:
Does any hardware architecture have a way of logically zeroing a page
with a single instruction? Surely the OS must have to do this every
time a new page is allocated to ensure there is nothing left from the
previous owner.

On *NIX systems, man calloc. It clears the assigned memory for you.
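
(On the Java side, new already gives the calloc-style guarantee: array
elements and the fields of new objects start out zeroed. A trivial check,
with a made-up class name:)

// Everything below prints zeros / false / null: the VM clears memory
// handed out by new before the program can see it.
public class FreshMemory {
    boolean flag;   // instance fields start at false
    String name;    // reference fields start at null

    public static void main(String[] args) {
        int[] ints = new int[3];                              // elements start at 0
        System.out.println(java.util.Arrays.toString(ints));  // prints [0, 0, 0]

        FreshMemory f = new FreshMemory();
        System.out.println(f.flag + " " + f.name);            // prints "false null"
    }
}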
 
Roedy Green

Sudsy said:
On *NIX systems, man calloc. It clears the assigned memory for you.

If the memory were in use by someone ELSE before, the O/S has no choice but
to clear the RAM for you first. It can do that with a REP STOSD (or the
equivalent CX loop) on Intel to clear a block of words. However, it seems to
me there should be some way to tell a memory chip to clear itself, or clear a
plane, without having to be told bit by bit.

Perhaps a memory controller could do it, and trap attempts to read the
memory before it were completely cleared, and clear it on an as-needed
basis.
 
xarax

Michael Amling said:
System 370 does (MVCL). Also fresh entire pages of virtual memory are
regarded as zeroed before they're mapped to real memory. The MVCL
instruction is needed only when the page is first referenced.

--Mike Amling

The MVCL (Move Character Long) instruction in the System/370 instruction set
is interruptible. The Program Status Word (PSW), which contains the address
of the currently executing instruction, is not updated when MVCL is
interrupted. Interruption can occur for various reasons, like an I/O
operation or a page fault. When the operating system resumes the interrupted
thread, it loads the saved PSW, which is still pointing at the MVCL. The
thread's registers were updated by the interruption to show the current state
of the MVCL's progress. The MVCL is resumed, and it may again be interrupted.
Only when the next instruction that follows the MVCL is executed will the
application know that the MVCL has completely cleared the storage.

So, the program that is using MVCL to clear a chunk of storage
may be executing that same instruction many times to complete
the operation. MVCL can operate on up to 16 megabytes, but it
is interruptible on a granularity of as little as 8 bytes at
a time.

MVCL is definitely NOT a single unit of operation, and an
application cannot consider ANY machine instruction to operate
as an atomic operation on an entire page of storage. Most modern
machines can perform atomic operations on small chunks, like 8
or 16 bytes at most.

The IBM MVS (Multiple Virtual Storage) operating system will clear virtual
pages to zero on first reference (which causes a page fault). If an
application doesn't request an exact multiple of a page of virtual storage,
then the obtained storage may be part of a page from a previous request (by
the same application), and it will not be cleared to zero. It's been this way
since dirt was invented, and MVS applications know that they must clear
partial-page allocations. An application written in a higher-level language,
like C, would have its own set of library routines that handle this
automatically (like calloc()).
 
Sudsy

Roedy said:
How can it not clear? Surely you are never allowed to peek at anyone
else's left over RAM.

Roedy,
It's clear that you've never written a VTAM application. Imagine
multiple tasks in a single address space...
Fortunately, CS (Compare and Swap) IS an atomic instruction. ;-)
 
Roedy Green

Sudsy said:
Roedy,
It's clear that you've never written a VTAM application. Imagine
multiple tasks in a single address space...

I can see being given uncleared RAM that YOU used before, but not RAM that
some unrelated user did.

"task" is not universally clearly defined. It could mean thread or it
could mean process.
 
xarax

Roedy Green said:
How can it not clear? Surely you are never allowed to peek at anyone
else's left over RAM.

In MVS, a "process" is called an "Address Space", and a "thread"
is called a "Task" (sometimes called a TCB which is the acronym
for the Task Control Block data structure for a task). There
are also "lightweight threads" that are called Service Request
Blocks (SRB), but those are used only by authorized or system-related
services. Anyway, any thread within a single process can usually
see all of the allocated memory within that process.

Memory is segregated into application-defined and system-defined
logical groups called "subpools". If two or more threads share
a subpool, then they can each allocate small chunks of memory that
can potentially reside within the same 4KB virtual page.

Suppose thread#1 allocates two 2KB chunks of memory that just
happen to get allocated in the same 4KB page. Thread#1 uses both
chunks, then later decides to free one of those chunks. The other
2KB chunk is still in use, so the page is not altered in any way
by the OS. Now comes thread#2 that allocates a 2KB chunk in the
SAME subpool. That chunk may come from the same page that thread#1
is using, and thread#2 will see the trash that was left over by
thread#1. That's why it is always good practice to clear newly
allocated storage that is not a page multiple in size. The OS
will always clear a newly allocated page to zero on its first
use (not necessarily when it's allocated -- rather when it's used).

It's not a security problem or integrity problem, because both
threads are sharing the same subpool, and supposedly the application
was designed that way. The threads must be created in such a way
that they can share the subpools and, of course, the threads must be
owned by the same process.

I've oversimplified the example to make my points. Be assured that
it is impossible in MVS for an unauthorized application to gain
access to data that is owned by another application.
 
Roedy Green

In MVS, a "process" is called an "Address Space", and a "thread"
is called a "Task" (sometimes called a TCB which is the acronym
for the Task Control Block data structure for a task).

Using that terminology, the OS is not obligated to clear RAM for requests for
RAM from the address space's pool. However, whenever it adds RAM to that
pool, it is.

If a 10 MB job runs, the OS will have to clear 10 MB of RAM, which is
considerably less than the total amount of RAM allocated (then freed). If it
does some sort of memory mapping to load programs, it may not even have to
clear all 10 MB.
 
Eric Sosman

xarax said:
The OS only clears virtual pages on the first reference to the page.

When a process ends, all of its virtual pages are automatically freed
(the process doesn't have to explicitly free its pages, but it's a
good programming practice to do so). Thus, when the next process runs
and allocates 10MB that happens to coincide with the same virtual
address range of the prior process, there is no problem. Those pages
will again be "first use" pages and will be cleared to zero on the
first reference.

There seems to be some confusion about the word "virtual."
Saying that the OS "clears virtual pages" sounds like nonsense
to me: the term "virtual page" could only refer to a range of
virtual addresses, and since addresses hold no data they can't
be cleared at all. The OS clears the *physical* pages to which
a range of addresses is mapped.

The virtual confusion also pervades the description of how
an expired process' memory gets cleared when it's handed to a
new process. This does not and should not depend on a coincidence
of the virtual addresses at which the two processes map the page;
it has, in fact, nothing at all to do with virtual addresses.
The O/S clears the page to protect the first process' information
from the second process, and that information resides in the
physical page, no matter at what address range it happens to
occupy in either of the processes.

On another matter (and this, I confess, is more of an opinion
than hard-and-fast fact), I can't concur that it's "good programming
practice" for a process to pre-free its own pages. In the first
place, the means of doing so are not very portable -- yes, every
virtual memory O/S has *some* means of mapping and unmapping pages,
but the means differ from one O/S to another and tend not to be
perfect analogs. In the second place, the process cannot do a
thorough job of freeing its pages (unmapping your stack and your
instructions might be *one* way to terminate a process, but ...),
so the O/S is going to have to do at least some of the work in
any event. And finally, throwing away your pages runs a good
chance of interfering with other agencies that are "looking at"
your program, things like profilers and debuggers.
 
xarax

Eric Sosman said:
There seems to be some confusion about the word "virtual."
Saying that the OS "clears virtual pages" sounds like nonsense
to me: the term "virtual page" could only refer to a range of
virtual addresses, and since addresses hold no data they can't
be cleared at all. The OS clears the *physical* pages to which
a range of addresses is mapped.

Ordinary applications cannot see real memory. They only see
virtual memory. When a virtual address is referenced, the
hardware will attempt to translate the virtual address into
a real address. From the point of view of the application,
a virtual address is just an address. That's where its data
is located.

Real (physical) pages are cleared only when necessary. That is
on the first reference via a virtual address. The hardware
attempts to translate the virtual address and gets an indication
that the virtual address is not mapped to a real page. The OS
gets control and decides why it's not mapped.

(1) The virtual page has never been allocated by the current
process. This is an unrecoverable page or segment fault situation.
Usually the thread that was in control at the time of the fault
is terminated by the OS. If that thread is the main thread of
the process, then the process itself is terminated.

(2) The virtual page has been allocated by the current process,
and this is the first reference to the virtual page. A real page
is allocated by the OS and assigned to the virtual page. The OS
clears the real page to zero and resumes the thread that encountered
the page fault to redrive the failing instruction. The thread never
sees the fault.

(3) The virtual page has been allocated by the current process,
and has been paged-out due to memory requirements for the underlying
real page. The thread that caused the page fault is suspended while
the data for the virtual page is scheduled for page-in. When the
page-in completes, the data for the virtual page is assigned to
a real page (which may be different from the real page that last
held the data for that virtual page). The thread is resumed to
redrive the instruction that encountered the page fault.

Eric Sosman said:
The virtual confusion also pervades the description of how
an expired process' memory gets cleared when it's handed to a
new process. This does not and should not depend on a coincidence
of the virtual addresses at which the two processes map the page;
it has, in fact, nothing at all to do with virtual addresses.
The O/S clears the page to protect the first process' information
from the second process, and that information resides in the
physical page, no matter at what address range it happens to
occupy in either of the processes.

Yes, of course. That point may not have been clear in my earlier
post.

Eric Sosman said:
On another matter (and this, I confess, is more of an opinion
than hard-and-fast fact), I can't concur that it's "good programming
practice" for a process to pre-free its own pages. In the first
place, the means of doing so are not very portable -- yes, every
virtual memory O/S has *some* means of mapping and unmapping pages,
but the means differ from one O/S to another and tend not to be
perfect analogs.

Of course there are portable means. The C language has free(), which
corresponds to calloc() and malloc(). If the application performs a malloc(),
then it should also perform a free() when possible. A normal end of the
application should always free its allocated resources, rather than depending
on the OS to do it. That's good programming practice. In the case of an
abnormal exit, it's probably all right to punt and let the OS handle it.

Eric Sosman said:
In the second place, the process cannot do a
thorough job of freeing its pages (unmapping your stack and your
instructions might be *one* way to terminate a process, but ...),
so the O/S is going to have to do at least some of the work in
any event.

If the OS is providing stack frames, then the OS will deal with them.
If the application is managing its own stackframes, then the
application can and should manage them, including freeing them
when the application ends.

Eric Sosman said:
And finally, throwing away your pages runs a good
chance of interfering with other agencies that are "looking at"
your program, things like profilers and debuggers.

That's really not the application's concern. If the application
has "hooks" built into it for preserving state information, then
so be it. Otherwise, design and implement the application according
to good programming practices by tracking and managing resources
in a responsible manner. If some other "agency", like a profiler
or a debugger, wants to poke around inside an application's data
areas, then it's the responsibility of that agency to avoid
contaminating the application. It would be nearly impossible for
an application to predict, detect, or interact with a potential
external agency, like a profiler or debugger, and just plain
fruitless to try to predict the future in that regard.
 
Stephen Kellett

xarax said:
Ordinary applications cannot see real memory. They only see
virtual memory. When a virtual address is referenced, the
hardware will attempt to translate the virtual address into
a real address. From the point of view of the application,
a virtual address is just an address. That's where its data
is located.

I guess now is as good a time as any to mention that if you are running
Windows NT or better, there is a free tool for visualizing the 4KB
virtual memory pages of any application.

http://www.softwareverify.com

Go to the downloads section and download VM Validator.

Stephen
 
