mmap caching


George Sakkis

I've been trying to track down a memory leak (which I initially
attributed erroneously to numpy) and it turns out to be caused by a
memory mapped file. It seems that mmap caches without limit the chunks
it reads, as the memory usage grows to several hundred MBs according
to the Windows task manager before it dies with a MemoryError. I'm
positive that these chunks are not referenced anywhere else; in fact, if
I change the mmap object to a normal file, memory usage remains
constant. The documentation of mmap doesn't mention anything about
this. Can the caching strategy be modified at the user level?
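
In code terms, the usage is essentially this (a made-up sketch, not the actual program; the temp file, its contents, and the chunk size are all invented for illustration):

```python
import mmap
import os
import tempfile

# Hypothetical sketch: create a small data file, then iterate over it
# through a read-only memory mapping, the way the real program does with
# its own (much larger) file.
fd, path = tempfile.mkstemp()
try:
    with os.fdopen(fd, "wb") as f:
        f.write(b"x" * (4 << 20))  # 4 MB of dummy data

    with open(path, "rb") as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        try:
            chunk_size = 1 << 20  # read in 1 MB slices
            total = 0
            for offset in range(0, len(mm), chunk_size):
                # Each slice is an ordinary bytes object, freed once dereferenced.
                chunk = mm[offset:offset + chunk_size]
                total += len(chunk)
        finally:
            mm.close()
finally:
    os.unlink(path)
```

Reading the same slices with plain file read() calls instead of the mmap object is the variant where memory usage stays constant.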

George
 

Nick Craig-Wood

George Sakkis said:
> I've been trying to track down a memory leak (which I initially
> attributed erroneously to numpy) and it turns out to be caused by a
> memory mapped file. It seems that mmap caches without limit the chunks
> it reads, as the memory usage grows to several hundred MBs according
> to the Windows task manager before it dies with a MemoryError. I'm
> positive that these chunks are not referenced anywhere else; in fact, if
> I change the mmap object to a normal file, memory usage remains
> constant. The documentation of mmap doesn't mention anything about
> this. Can the caching strategy be modified at the user level?

I'm not familiar with mmap() on windows, but assuming it works the
same way as unix...

The point of mmap() is to map files into memory. It is completely up
to the OS to bring pages into memory for you to read / write to, and
completely up to the OS to get rid of them again.

What you would expect is that the file is demand-paged into memory as
you access bits of it. These pages will remain in memory until the OS
feels some memory pressure, at which point the pages will be written
out if dirty and then dropped.

The OS will try to keep hold of pages as long as possible just in case
you need them again. The pages dropped should be the least recently
used pages.

I wouldn't have expected a MemoryError though...

Did you do mmap.flush() after writing?
 

George Sakkis

Nick said:
> I'm not familiar with mmap() on windows, but assuming it works the
> same way as unix...
>
> The point of mmap() is to map files into memory. It is completely up
> to the OS to bring pages into memory for you to read / write to, and
> completely up to the OS to get rid of them again.
>
> What you would expect is that the file is demand-paged into memory as
> you access bits of it. These pages will remain in memory until the OS
> feels some memory pressure, at which point the pages will be written
> out if dirty and then dropped.
>
> The OS will try to keep hold of pages as long as possible just in case
> you need them again. The pages dropped should be the least recently
> used pages.
>
> I wouldn't have expected a MemoryError though...
>
> Did you do mmap.flush() after writing?

The file is written once and then opened as read-only; there's no
flushing. So if caching is completely up to the OS, I take it that my
options are either (1) modify my algorithms so that they work in
fixed-size batches instead of arbitrarily long sequences, or (2)
implement my own memory-mapping scheme to fit my algorithms. I guess
(1) would be less trouble overall, or is there a way to give the OS a
hint about how large a cache it can use?
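
For the record, option (1) would amount to something like the following (a sketch; the batch size is an arbitrary choice, and `iter_batches` is a made-up helper name):

```python
import os
import tempfile

def iter_batches(path, batch_size=8 << 20):
    """Yield fixed-size chunks from a plain file, so at most one batch
    is held in memory at a time (sketch; the default size is arbitrary)."""
    with open(path, "rb") as f:
        while True:
            batch = f.read(batch_size)
            if not batch:
                break
            yield batch

# Tiny self-contained demo on a throwaway file.
fd, path = tempfile.mkstemp()
os.write(fd, b"a" * 100)
os.close(fd)
batches = list(iter_batches(path, batch_size=30))
os.unlink(path)
```

The algorithms would then consume one batch at a time instead of indexing into an arbitrarily long mapped sequence.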

George
 

Martin v. Löwis

George said:
> I've been trying to track down a memory leak (which I initially
> attributed erroneously to numpy) and it turns out to be caused by a
> memory mapped file. It seems that mmap caches without limit the chunks
> it reads, as the memory usage grows to several hundred MBs according
> to the Windows task manager before it dies with a MemoryError.

You must be misinterpreting what you are seeing. It's the operating
system that decides what parts of a memory-mapped file are held in
memory, and that is certainly not without limits.

Notice that there are several values that can be called "memory
usage" (such as the size of the committed address space, the working
set size, etc.); you don't mention which of these values grows by
several hundred MBs.

Regards,
Martin
 

George Sakkis

Martin said:
> You must be misinterpreting what you are seeing. It's the operating
> system that decides what parts of a memory-mapped file are held in
> memory, and that is certainly not without limits.

Sure; what I meant was that whatever the limit is, it's high
enough that a MemoryError is raised before the limit is reached.

> Notice that there are several values that can be called "memory
> usage" (such as the size of the committed address space, the working
> set size, etc.); you don't mention which of these values grows by
> several hundred MBs.

It's the one in the 'Processes' tab of the Windows task manager (XP
Professional). By the way, I ran the same program on a box with more
physical memory and the memory usage stops growing at around 430MB, by
which time the whole file is most likely cached. I'd be interested in
any suggestions other than "buy more RAM" :) (these are not my machines
anyway).

Thanks,
George
 

Martin v. Löwis

George said:
> Sure; what I meant was that whatever the limit is, it's high
> enough that a MemoryError is raised before the limit is reached.

The operating system will absolutely, definitely, certainly release
any cached data it can purge before reporting it is out of memory.

So if you get a MemoryError, it is *not* because the operating system
has cached too much data.

In fact, memory that is read in because of mmap should *never* cause
a MemoryError. Python calls MapViewOfFile when mmap.mmap is invoked,
at which point the operating system commits to providing that much
address space to the application, along with backing storage on disk
(typically, from the file being mapped, unless it is an anonymous
map). Later access to the mapped range cannot fail (except for
hardware errors), and even if it did fail, you wouldn't see a MemoryError.

> It's the one in the 'Processes' tab of the Windows task manager (XP
> Professional). By the way, I ran the same program on a box with more
> physical memory and the memory usage stops growing at around 430MB, by
> which time the whole file is most likely cached. I'd be interested in
> any suggestions other than "buy more RAM" :) (these are not my machines
> anyway).

As a starting point, try to understand better what is really happening.
Turn on "Virtual Memory Size" in "View/Select Columns", and perhaps a
few additional counters as well. Also take a look at the "Commit
Charge", which takes swap file usage into account. Try increasing the
size of the swap file.

Regards,
Martin
 

Nick Craig-Wood

Martin v. Löwis said:
> In fact, memory that is read in because of mmap should *never* cause
> a MemoryError. Python calls MapViewOfFile when mmap.mmap is invoked,
> at which point the operating system commits to providing that much
> address space to the application, along with backing storage on disk
> (typically, from the file being mapped, unless it is an anonymous
> map). Later access to the mapped range cannot fail (except for
> hardware errors), and even if it did fail, you wouldn't see a MemoryError.

So presumably it is Python generating the MemoryError: it is asking for
a new bit of memory, the allocation is failing, so it throws a MemoryError.

Could memory allocation under windows be affected by a large chunk of
mmap()ed file which is physically swapped in at the time of the
allocation?
 

Nick Craig-Wood

George Sakkis said:
> The file is written once and then opened as read-only; there's no
> flushing. So if caching is completely up to the OS, I take it that my
> options are either (1) modify my algorithms so that they work in
> fixed-size batches instead of arbitrarily long sequences, or (2)
> implement my own memory-mapping scheme to fit my algorithms. I guess
> (1) would be less trouble overall, or is there a way to give the OS a
> hint about how large a cache it can use?

The above behaviour isn't as expected, so either there is something
going on in your program that we don't know about, or there is a bug
somewhere, either in the OS or in Python.

Can you make a short program to replicate the problem? That will help
narrow down the problem.
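
Something along these lines could serve as a starting point (file and sizes are made up; run it while watching the process in the task manager):

```python
import mmap
import os
import tempfile

# Hypothetical reproduction sketch: create a largish file, then read all
# of it back through a read-only mapping while observing memory usage
# externally. Scale SIZE up toward the real file's size as needed.
SIZE = 64 << 20  # 64 MB

fd, path = tempfile.mkstemp()
try:
    with os.fdopen(fd, "wb") as f:
        f.seek(SIZE - 1)
        f.write(b"\0")  # extend the file without writing every byte

    with open(path, "rb") as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        try:
            nread = 0
            while nread < len(mm):
                # Sequential 1 MB slices, discarding each one immediately.
                nread += len(mm[nread:nread + (1 << 20)])
        finally:
            mm.close()
finally:
    os.unlink(path)
```

If the reported growth really comes from the mapping alone, this loop should show it without any of the surrounding algorithm.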
 

Dennis Lee Bieber

George Sakkis said:
> The file is written once and then opened as read-only, there's no
> flushing. So if caching is completely up to the OS, I take it that my

How large is said file? While the OS should handle swapping pages as
needed, you do have to recall that those pages are /mapped/ into the
process's virtual address space. Trying to mmap a 2GB file into a process
that is already using 1GB of memory may not work (what is the default
Windows split? 2GB process and 2GB shared OS?)
--
Wulfraed Dennis Lee Bieber KD6MOG
(e-mail address removed) (e-mail address removed)
HTTP://wlfraed.home.netcom.com/
(Bestiaria Support Staff: (e-mail address removed))
HTTP://www.bestiaria.com/
 

George Sakkis

Dennis said:
> How large is said file? While the OS should handle swapping pages as
> needed, you do have to recall that those pages are /mapped/ into the
> process's virtual address space. Trying to mmap a 2GB file into a process
> that is already using 1GB of memory may not work (what is the default
> Windows split? 2GB process and 2GB shared OS?)

It's around 400MB. As I said, I cannot reproduce the MemoryError
locally since I have 1GB of physical memory, but IIRC the user who
reported it had less. Actually, I am less concerned about whether a
MemoryError is raised in this case and more about the fact that even if
there's no exception, the program may suffer from severe thrashing due
to constant swapping. That's an issue with the specific
program/algorithm rather than with Python or the OS.

George
 

Laszlo Nagy

> In fact, memory that is read in because of mmap should *never* cause
> a MemoryError.

This is certainly not true. You can run out of virtual address space by
reading data from a memory mapped file.

> Python calls MapViewOfFile when mmap.mmap is invoked,
> at which point the operating system commits to providing that much
> address space to the application, along with backing storage on disk
> (typically, from the file being mapped, unless it is an anonymous
> map). Later access to the mapped range cannot fail (except for
> hardware errors), and even if it did fail, you wouldn't see a MemoryError.

Hmm, maybe I'm wrong. Are you sure that Windows allocates address
space for the whole file up front? I also wrote a program before (in
Delphi) that played a memory-mapped wave file. From the task manager,
I could see that "used memory" was growing as the program played the
wave file. To me, this indicates that Windows extends the mapped
address space in chunks.

Regards,

Laszlo
 

Laszlo Nagy

George Sakkis said:
> It's around 400MB. As I said, I cannot reproduce the MemoryError
> locally since I have 1GB of physical memory, but IIRC the user who
> reported it had less. Actually, I am less concerned about whether a
> MemoryError is raised in this case and more about the fact that even
> if there's no exception, the program may suffer from severe thrashing
> due to constant swapping. That's an issue with the specific
> program/algorithm rather than with Python or the OS.

Well, if the same program runs when you have 1GB of physical memory,
then probably the problem is not that you ran out of virtual address
space. It would help to provide the related code from your program.

Laszlo
 

Martin v. Löwis

Laszlo said:
> This is certainly not true. You can run out of virtual address space by
> reading data from a memory mapped file.

That is true, but not what I said. I said you cannot run out of memory
*while reading it*. You can only run out of virtual address space when
you invoke mmap.mmap itself (and when the application later tries to
allocate more virtual address space through VirtualAlloc).

> Hmm, maybe I'm wrong. Are you sure that Windows allocates address
> space for the whole file up front?

Yes, I am. See MapViewOfFile, at

http://msdn2.microsoft.com/en-us/library/aa366761.aspx

"Mapping a file makes the specified portion of a file visible in the
address space of the calling process."

Notice that allocating address space doesn't consume much memory (it
consumes only a little, for the page tables).

> I also wrote a program before (in Delphi) that played a memory-mapped
> wave file. From the task manager, I could see that "used memory" was
> growing as the program played the wave file. To me, this indicates
> that Windows extends the mapped address space in chunks.

You are misinterpreting the data. I'm not sure what precisely
"used memory" is; most likely it is the working set of the process, i.e.
the number of physical pages that are allocated for the process. That
is typically much smaller than the address space, since many pages
will be paged out (or not yet read in at all).

You need to display the virtual address space in the task manager
to determine how much address space the application is using.
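
The distinction is easy to demonstrate with a sparse file: mapping it reserves address space for the whole file immediately, while physical pages are only brought in as they are touched (a sketch; the 256 MB size is arbitrary, and the behaviour is shown in POSIX terms since Python's mmap API is the same on both platforms):

```python
import mmap
import os
import tempfile

# Sketch: a 256 MB sparse file costs 256 MB of *address space* when
# mapped, but touching a byte faults in only the page containing it.
SIZE = 256 << 20

fd, path = tempfile.mkstemp()
try:
    os.ftruncate(fd, SIZE)  # sparse: no data blocks written yet
    mm = mmap.mmap(fd, 0, access=mmap.ACCESS_READ)
    try:
        # Address space is reserved for the entire file up front...
        assert len(mm) == SIZE
        # ...but only the pages actually accessed get read in.
        first = mm[0]        # faults in one page
        last = mm[SIZE - 1]  # faults in another
    finally:
        mm.close()
finally:
    os.close(fd)
    os.unlink(path)
```

The working-set counter would grow only as pages are touched, while the virtual-size counter jumps by the full file size at mmap time.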

Regards,
Martin
 

Martin v. Löwis

Nick said:
> So presumably it is Python generating the MemoryError: it is asking for
> a new bit of memory, the allocation is failing, so it throws a MemoryError.
>
> Could memory allocation under windows be affected by a large chunk of
> mmap()ed file which is physically swapped in at the time of the
> allocation?

To my knowledge, no. There might be virtual memory quotas, but I don't
think Windows supports such a concept.

More likely, this is entirely unrelated to the mmap issue. I would
guess that the machine on which the problem occurs is close to
exhausting its swap file (because of other activity on the system),
so Python occasionally manages to exhaust the swap file through
regular allocations (memory-mapped files don't contribute to
swap file usage, as they have their own disk backing, namely the
file being mapped).

Regards,
Martin
 

Ross Ridge

George said:
> It's around 400MB.

On Windows you may not be able to map a file of this size into memory
because of virtual address space fragmentation. A Win32 process has
only 2G of virtual address space, and DLLs tend to get scattered
throughout that address space.

> As I said, I cannot reproduce the MemoryError
> locally since I have 1GB of physical memory, but IIRC the user who
> reported it had less.

Virtual address space fragmentation isn't affected by the amount of
physical memory in your system. A system with 64MB of RAM might be
able to map a 400MB file, while a system with 3G of RAM might not be
able to map it, because of how DLLs got loaded into the process.

> Actually, I am less concerned about whether a MemoryError
> is raised in this case and more about the fact that even if
> there's no exception, the program may suffer from severe thrashing
> due to constant swapping.

Well, that's what you're asking for when you use mmap. The same
mechanism that creates virtual memory using a swap file is used to
create a virtual memory mapping of your file. When you read from the
mmap'ed file, pages from the file are swapped into memory and stay
there until they need to be swapped out to make room for something
else. If you don't want this behaviour, don't use mmap.
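
The plain-file alternative is straightforward: explicit seek/read gives the same random access without tying the whole file into the process's address space (a sketch; `read_at` is a made-up helper name):

```python
import os
import tempfile

def read_at(f, offset, length):
    """Random access via seek/read on an ordinary file object (sketch).
    The OS page cache still helps with repeated reads, but nothing is
    mapped into the process's address space."""
    f.seek(offset)
    return f.read(length)

# Tiny demo on a throwaway file.
fd, path = tempfile.mkstemp()
os.write(fd, b"abcdefghij")
os.close(fd)
with open(path, "rb") as f:
    middle = read_at(f, 3, 4)
os.unlink(path)
```

Only the bytes actually read are ever held by the process, which is exactly the constant-memory behaviour George observed when he switched the mmap object for a normal file.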

Ross Ridge
 
