How to constrain memory usage?

Lambda

In my program, I find that when the program uses up all the physical memory,
it silently starts using virtual memory.

With C++, is it possible that when physical memory is used up,
I can have the program write the data in memory to disk,
clear the data in memory, and then continue running?

Or is there a Linux API for this? Then I could write that portion of my
program in C.
 
Ian Collins

Lambda said:
In my program, I find when the program use up all the physical memory,
it will use virtual memory silently.

With C++, is it possible that when physical memory is used up,
I can order the program to write data in memory to disk,
clear the data in memory, then continue running.
You can provide your own new and delete operators, otherwise C++ has no
clue as to where its memory comes from.
 
Lambda

You can provide your own new and delete operators, otherwise C++ has no
clue as to where its memory comes from.

Thank you!
I'm using STL containers such as std::vector. Do I have to redefine them?
 
Ian Collins

*Please* don't quote signatures.
Thank you!
I'm using STL container such as std::vector. I have to redefine it?

No.

You can either provide an allocator for each container, or provide your
own global new and delete operators.
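As a rough illustration of the per-container allocator route, here is a minimal sketch of an allocator that only counts the bytes it hands out, so a program could compare that count against a budget of its own. The names CountingAllocator, g_bytes_in_use and TrackedIntVector are made up for this example, the counter is not thread-safe, and it measures requested (virtual) bytes, not physical memory.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical counter: bytes currently held through CountingAllocator.
// Not thread-safe; a real program would need an atomic or a lock.
static std::size_t g_bytes_in_use = 0;

template <class T>
struct CountingAllocator {
    using value_type = T;

    CountingAllocator() = default;
    template <class U>
    CountingAllocator(const CountingAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        // Forward to the global operator new and record the request size.
        T* p = static_cast<T*>(::operator new(n * sizeof(T)));
        g_bytes_in_use += n * sizeof(T);
        return p;
    }

    void deallocate(T* p, std::size_t n) noexcept {
        assert(g_bytes_in_use >= n * sizeof(T));
        g_bytes_in_use -= n * sizeof(T);
        ::operator delete(p);
    }
};

// All instances are interchangeable because the allocator holds no state.
template <class T, class U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept {
    return true;
}
template <class T, class U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept {
    return false;
}

// A vector whose heap usage shows up in g_bytes_in_use.
using TrackedIntVector = std::vector<int, CountingAllocator<int>>;
```

The container code itself stays unchanged; only the type of the vector differs, which is why std::vector does not need to be redefined.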
 
Fred Zwarts

Lambda said:
In my program, I find when the program use up all the physical memory,
it will use virtual memory silently.

With C++, is it possible that when physical memory is used up,
I can order the program to write data in memory to disk,
clear the data in memory, then continue running.

Why do you need that?
Most virtual memory systems (Linux among them)
will already do that for you automatically.

C++ has no notion of the difference between physical and virtual memory.
So, this problem cannot be solved with standard C++ tools.

You can write your own new and delete operators for specific classes
(or even globally, but that has a lot of pitfalls), as suggested in another
reply, but the code for these operators will need non-standard C++ functions.
Or is there such Linux API? So I can write portion of my program with
C.

For this question you are in the wrong newsgroup. But even if there is
such an API, why should you use it in C and not in C++?
 
Erik Wikström

In my program, I find when the program use up all the physical memory,
it will use virtual memory silently.

No, when your program starts up it uses virtual memory and will continue
to do so until it terminates (provided that you run the program on an OS
which uses virtual memory, which most non-embedded systems do).

If there is not enough free physical memory, the OS will swap some out to
disk, but which memory (and from which program) is impossible to tell.
With C++, is it possible that when physical memory is used up,
I can order the program to write data in memory to disk,
clear the data in memory, then continue running.

I suppose you could use mmap() or similar and tell the OS not to swap
the memory, and then create your own allocator which uses this memory.
Then add functionality so that when the memory is full you manually write
it to a file, unmap the memory and map in new resident memory. The
problem is to trap the segmentation fault which will occur when you try
to access any of the swapped-out memory and then map it and read it from
the file, but it might be possible.

Of course that would only apply to dynamic memory; your automatic
memory is still under the OS's control.
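The first half of that idea (map memory and ask the OS to keep it resident) might look roughly like the sketch below on Linux. It assumes POSIX mmap(), mlock() and munmap(); LockedRegion, acquire_region and release_region are names invented for this example. It does not attempt the write-to-file or fault-trapping parts, and mlock() can fail when the per-process lock limit is too low, so the sketch records whether the lock succeeded instead of assuming it.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>

// Hypothetical helper type: a block of anonymous memory taken from the kernel.
struct LockedRegion {
    void*       addr   = nullptr;
    std::size_t size   = 0;
    bool        locked = false;  // true only if mlock() succeeded
};

// Map `bytes` of zero-filled memory and try to pin it in RAM.
// On mapping failure the returned region has addr == nullptr.
LockedRegion acquire_region(std::size_t bytes) {
    assert(bytes > 0);  // mmap() rejects a zero length
    LockedRegion r;
    void* p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        return r;
    }
    r.addr = p;
    r.size = bytes;
    // mlock() may fail (for example when RLIMIT_MEMLOCK is exceeded);
    // the memory is still usable, it is just not guaranteed to stay resident.
    r.locked = (mlock(p, bytes) == 0);
    return r;
}

// Unmap the region; unmapping also removes any lock on those pages.
void release_region(LockedRegion& r) {
    if (r.addr == nullptr) {
        return;
    }
    munmap(r.addr, r.size);
    r = LockedRegion{};
}
```

A custom allocator could then carve its allocations out of such a region instead of calling operator new.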
Or is there such Linux API? So I can write portion of my program with
C.

I doubt it.

The real question is why anyone would ever want to do something like
that. Virtual memory is one of the best things invented for us
programmers, since we do not have to worry about this stuff.
 
Lambda

Thank you, Erik!

I'm trying to implement an information retrieval algorithm,
single-pass in-memory indexing (SPIMI).
It's used to construct an inverted index from document collections.

SPIMI-INVERT(token_stream)
  output_file = NEWFILE()
  dictionary = NEWHASH()
  while (free memory available)   <--
  do token ← next(token_stream)
     if term(token) ∉ dictionary
       then postings_list = ADDTODICTIONARY(dictionary, term(token))
       else postings_list = GETPOSTINGSLIST(dictionary, term(token))
     if full(postings_list)
       then postings_list = DOUBLEPOSTINGSLIST(dictionary, term(token))
     ADDTOPOSTINGSLIST(postings_list, docID(token))
  sorted_terms ← SORTTERMS(dictionary)
  WRITEBLOCKTODISK(sorted_terms, dictionary, output_file)
  return output_file

What the algorithm does is:
1. Construct the index in memory until the memory is used up.
2. Write the index to a file, release the memory, then continue
until all documents are processed.
3. Merge all the files.
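One way to sidestep the "free memory available" test is to replace it with a fixed byte budget that the program tracks itself. The sketch below covers steps 1 and 2 under that assumption; it is not the textbook implementation. The names spimi_invert, write_block, budget_bytes and prefix are invented here, the memory estimate is deliberately crude (it ignores hash-table and vector overhead), tokens are assumed to arrive as (term, docID) pairs, and the merge in step 3 is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <map>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using DocId      = int;
using Token      = std::pair<std::string, DocId>;  // (term, docID)
using Dictionary = std::unordered_map<std::string, std::vector<DocId>>;

// Write one block as lines of "term id id ...", with terms in sorted order.
void write_block(const Dictionary& dict, const std::string& path) {
    std::map<std::string, const std::vector<DocId>*> sorted;  // std::map sorts by term
    for (const auto& entry : dict) {
        sorted[entry.first] = &entry.second;
    }
    std::ofstream out(path);
    for (const auto& entry : sorted) {
        out << entry.first;
        for (DocId d : *entry.second) {
            out << ' ' << d;
        }
        out << '\n';
    }
}

// Build blocks of at most roughly budget_bytes each; returns the block file paths.
std::vector<std::string> spimi_invert(const std::vector<Token>& tokens,
                                      std::size_t budget_bytes,
                                      const std::string& prefix) {
    assert(budget_bytes > 0);
    std::vector<std::string> blocks;
    Dictionary dict;
    std::size_t used = 0;  // rough estimate of bytes held by dict

    auto flush = [&]() {
        if (dict.empty()) {
            return;
        }
        const std::string path = prefix + std::to_string(blocks.size()) + ".txt";
        write_block(dict, path);
        blocks.push_back(path);
        Dictionary().swap(dict);  // swap with an empty map so the memory is released
        used = 0;
    };

    for (const Token& t : tokens) {
        auto it = dict.find(t.first);
        if (it == dict.end()) {
            it = dict.emplace(t.first, std::vector<DocId>()).first;
            used += t.first.size() + sizeof(std::vector<DocId>);
        }
        it->second.push_back(t.second);
        used += sizeof(DocId);
        if (used >= budget_bytes) {
            flush();  // budget reached: write the block and start over
        }
    }
    flush();  // write whatever is left
    return blocks;
}
```

With a budget chosen well below the machine's RAM, the process never grows large enough to be pushed into swap, which is the effect the pseudocode's loop condition is after.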

So if I just let the program use virtual memory,
and I have a huge document collection,
it is possible that the swap partition will be used up.
And the machine will become very slow
due to a lot of I/O operations, right?

If so, this algorithm will become useless.
That's why I think if there is no physical memory available,
I should take some actions.

I know this is a strange requirement,
and maybe this is not the right way.

I'll try your method. From your description,
I find this is NOT an easy task.
 
Puppet_Sock

[worries about program possibly being slow snipped]

Before you go inventing a huge complicated scheme to
avoid something, do some measuring.

Define what you mean by "slow." Work out from your
algorithm what the time dependency should be as a
function of the number of records. I didn't look too
carefully, but I thought it should have some parts
that were roughly linear. That is, doubling the number
of entries will make it take twice as long. (Maybe
worse, I see sort/merge actions.)

Then figure out how long is too long for a given
number of entries. Maybe it is not possible with
your algorithm no matter how fast your memory.

Whomp up a test case doing something as close as you
can manage to similar operations. See if it really
does become slow. Test it with different numbers of
entries. The code that does these tests does not
have to be complete or polished. Just make it do
some similar operations.
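A throwaway test rig along those lines could be as small as the sketch below. It is only a stand-in: time_index_build is a made-up name, and the workload (appending integers to postings lists under 1000 synthetic terms) is an assumption meant to resemble index construction, not the real program.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Build a toy in-memory index with n postings and return the elapsed seconds.
double time_index_build(std::size_t n) {
    assert(n > 0);
    const auto start = std::chrono::steady_clock::now();

    std::unordered_map<std::string, std::vector<int>> index;
    for (std::size_t i = 0; i < n; ++i) {
        // 1000 distinct synthetic terms, each collecting a postings list.
        index["term" + std::to_string(i % 1000)].push_back(static_cast<int>(i));
    }

    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

Calling it with, say, 100000, 1000000 and 10000000 entries and comparing the results shows whether the time grows roughly linearly or falls off a cliff once the data no longer fits in RAM.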

Only if it does become too slow should you consider
some alternative. Consider whether you can speed the
algorithm. Maybe you don't need as much stuff in
memory. Or maybe searching can be made faster.
Or any of several other things.

Example: Maybe you can keep some kind of key in
memory instead of the entire record. Then when
you have all your data read in, maybe you can do
the sorting/merging just on that key. Then you
can use the sorted/merged key list to write your
final data out.

If all that fails, only then start thinking about
doing big code twisting rewrites. And then, again
you need to measure. Again, whomp up a test rig
that tries things the way you think might work.
See if it really is faster. If it is faster, is it
fast enough? And does it stay fast enough with
larger numbers of records?
Socks
 
Erik Wikström

Thank you, Erik!

I'm trying to implement an information retrieval algorithm,
single-pass in-memory indexing (SPIMI).
It's used to construct an inverted index from document collections.
So if I just let the program use virtual memory,
and I have a huge document collection,
it is possible that the swap partition will be used up.
And the machine will become very slow
due to a lot of I/O operations, right?

First of all, you should realise that you will use virtual memory
regardless of what you do (you should read up on the subject, since it is
clear you do not fully understand it yet). Second, if your computer is
not very low on RAM, you can use quite a lot of memory before it starts
getting swapped out. Third, since the OS will swap out inactive programs
first, it is possible your application will run entirely in memory (or
with minimal swapping) unless the computer is already under heavy load,
in which case it will not matter much whether your application swaps or not.
If so, this algorithm will become useless.

Then you should probably look for an algorithm which is not so dependent on
running in RAM.
That's why I think if there is no physical memory available,
I should take some actions.

The OS will be much better at making that decision.
I know this is a strange requirement,
and maybe this is not the right way.

Very probably not.
I'll try your method. From your description,
I find this is NOT an easy task.

No, probably impossible the way I described it (I do not think you can
trap the segmentation fault signal). And even if it were possible, the
overhead of manually managing the memory (as opposed to letting the OS
do it) would probably drag down performance substantially. Another
problem is that the amount of memory you can lock down in RAM is
limited per application (probably a few hundred MB at most), so you
would have substantially less memory to work with.
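The actual lock limit on a given machine does not have to be guessed. A small sketch, assuming a POSIX system with getrlimit() and RLIMIT_MEMLOCK; memlock_limit is a name invented for this example:

```cpp
#include <cassert>
#include <sys/resource.h>

// Query the per-process limit on locked memory, in bytes.
// Either value may be RLIM_INFINITY, meaning "no limit".
// Returns false if the query itself fails.
bool memlock_limit(rlim_t& soft, rlim_t& hard) {
    struct rlimit lim;
    if (getrlimit(RLIMIT_MEMLOCK, &lim) != 0) {
        return false;
    }
    soft = lim.rlim_cur;  // limit currently enforced
    hard = lim.rlim_max;  // ceiling up to which the soft limit can be raised
    assert(soft <= hard); // the soft limit never exceeds the hard limit
    return true;
}
```

From a shell, `ulimit -l` reports the same soft limit (in kilobytes in bash).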
 
