Sync database table based on current directory data without losing previous values

  • Thread starter Nikos Gr33k
  • Start date

Dave Angel

Apologies: after having left the group for a while, I have forgotten how not to post a question on top of another question. Very sorry, and I appreciate your replies.

I tried explicitly calling gc.collect() and didn't manage to see the memory footprint reduced. I probably haven't left the process idle long enough for the internal garbage collection to take place, but I will leave it idle for more than 8 hours and check again. Thanks!

You're top-posting, which makes things very confusing, since your
contribution to the message is out of accepted order. Put your remarks
after the part you're commenting on, and delete anything following your
message, as it clearly didn't need your comments.

Once you've called gc.collect(), there's no point in waiting 8 hours for
it to run again. It either triggered the C runtime's logic or it
didn't, and running it again won't help unless in the meantime you
rearranged the remaining allocated blocks.

Accept the fact that not all freeing of memory blocks can possibly make
it through to the OS. If they did, we'd have a minimum object size of
at least 4k on the Pentium, and larger on some other processors. We'd
also have performance that would crawl. So an external tool can only
give you a very approximate size for what's going on in your own code.
 

Dave Angel

I'm sorry, typo on my part.

That should have been "fullpath", not "file" (nor "files", as you
wrongly reported back!):

# Compute a set of current fullpaths
current_fullpaths = set()
for root, dirs, files in os.walk(path):
    for fullpath in files:

'fullpath' is a rather misleading name to use, since the 'files' list
contains only the terminal node of the file name. It's only a full path
after you do the following join.
 

Chris Angelico

You were told yesterday at least twice that os.walk returns a tuple but you
still insist on refusing to take any notice of our replies when it suits
you, preferring instead to waste everybody's time with these questions. Or
are you trying to get into the Guinness Book of World Records for the
laziest bastard on the planet?

This is the same person who posts under the name Ferrous Cranus. His
threads are open-ended time sinks; he generally expects everyone else
to do the work, while his contribution is to complain that the
provided code doesn't fit some arbitrary set of restrictions that
don't always even make sense.

Guinness might be interested in him as "Most Successful Troll",
though. He certainly absorbed a lot of my dev time before I gave up on
him, and by the look of things, he's still getting other people to do
his work for him.

ChrisA
 

Michael Ross

You were told yesterday at least twice that os.walk returns a tuple but
you still insist on refusing to take any notice of our replies when it
suits you, preferring instead to waste everybody's time with these
questions. Or are you trying to get into the Guinness Book of World
Records for the laziest bastard on the planet?

Hold on a sec ...

He has

for result in os.walk(path):
    for filename in result[2]:

So he *did* take notice of that.


Nikos:
Expectation is to iterate through a tuple like this:

for dirpath, dirnames, filenames in os.walk(path):
    ...
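Putting that unpacking to work, here is a minimal sketch of the set-building loop the thread is circling around (the helper name current_fullpaths is mine, not from the thread):

```python
import os

def current_fullpaths(path):
    """Collect the full path of every file under `path`."""
    fullpaths = set()
    for dirpath, dirnames, filenames in os.walk(path):
        for name in filenames:
            # `filenames` holds only the terminal node; join it with
            # the directory to get an actual full path.
            fullpaths.add(os.path.join(dirpath, name))
    return fullpaths
```

The join is the step that was missing from the snippets quoted above: the third tuple element alone is never a full path.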
 

Lele Gaifax

Dave Angel said:
'fullpath' is a rather misleading name to use, since the 'files' list
contains only the terminal node of the file name. It's only a full
path after you do the following join.

Yes, you're right.

Dunno what urged me to ``M-x replace-string file fullpath`` introducing
both an error and a bad variable name, most probably the unconscious
desire of not clobbering builtin names... :)

Thanks for pointing it out,
ciao, lele.
 

nagia.retsina

On Wednesday, 6 March 2013 at 4:04:26 PM UTC+2, Michael Ross wrote:
On 06/03/2013 07:45, Nikos Gr33k wrote:

You were told yesterday at least twice that os.walk returns a tuple but
you still insist on refusing to take any notice of our replies when it
suits you, preferring instead to waste everybody's time with these
questions. Or are you trying to get into the Guinness Book of World
Records for the laziest bastard on the planet?

Hold on a sec ...

He has

for result in os.walk(path):
    for filename in result[2]:

So he *did* take notice of that.

Nikos:
Expectation is to iterate through a tuple like this:

for dirpath, dirnames, filenames in os.walk(path):
    ...

Thank you Michael, yes, I understood that myself yesterday after one of the guys here showed the output of os.walk(path) in IDLE.
So, yes, in fact I needed the 3rd item in the tuple, so as to get hold of the files.

Thank you for supporting me; some ppl here think I'm a troll and that I don't try things or ignore everything, but 'some_ppl's_opinion != True'.

I'm just sometimes persistent about having things my way; that's why I was calling myself Ferrous Cranus, which I changed since it was annoying...
 

Steven D'Aprano

Hello there,

I am using python 2.7.1 built on HP-UX 11.23, an Itanium 64-bit box.

I discovered the following behavior whereby the python process doesn't
seem to release memory utilized even after a variable is set to None, and
"deleted". I use the glance tool to monitor the memory utilized by this
process. Obviously after the for loop is executed, the memory used by
this process has hiked to a few MB. However, after "del" is executed on
both the i and str variables, the memory of that process still stays
where it was.

Any idea why?

Python does not guarantee to return memory to the operating system.
Whether it does or not depends on the OS, but as a general rule, you
should expect that it will not.

... str=str+"%s"%(i,)


You should never build large strings in that way. It risks being
horribly, horribly slow on some combinations of OS, Python implementation
and version.

Instead, you should do this:

items = ["%s" % i for i in range(100000)]
s = ''.join(items)
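As a sanity check that the two spellings produce the same string (only their cost differs), a small sketch:

```python
# Both approaches build the identical string; repeated concatenation
# risks quadratic behavior, while ''.join stays linear overall.
n = 1000

s = ""
for i in range(n):
    s = s + "%s" % (i,)      # the pattern from the original post

joined = ''.join("%s" % i for i in range(n))

assert s == joined
```

On CPython the in-place concatenation is often rescued by an internal optimization, but as Steven notes that rescue is not guaranteed across implementations or versions, which is exactly why the join form is preferred.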
 

Wong Wah Meng-R32813

Python does not guarantee to return memory to the operating system.
Whether it does or not depends on the OS, but as a general rule, you should expect that it will not.

... str=str+"%s"%(i,)


You should never build large strings in that way. It risks being
horribly, horribly slow on some combinations of OS, Python implementation
and version.

Instead, you should do this:

items = ["%s" % i for i in range(100000)]
s = ''.join(items)

[] The example is written for illustration purposes. Thanks for pointing out a better way of achieving the same result. Yes, it seems the OS thinks the piece allocated to Python should not be taken back unless the process dies. :(
 

Chris Angelico

[] The example is written for illustration purpose. Thanks for pointing out a better way of achieving the same result. Yes it seems so that the OS thinks the piece allocated to Python should not be taken back unless the process dies. :(

Don't be too bothered by that. That memory will be reused by Python
for subsequent allocations.

ChrisA
 

Wong Wah Meng-R32813

The problem is my server hits a memory usage threshold, and starts giving me errors like Oracle being unable to spawn off a new session, stating an Out of Memory error and whatnot. I won't be bothered much if I have the luxury of available memory for other processes to use. If only UNIX understood my concerns and released the allocation when I issue gc.collect() or when gc.collect() takes place. :)






[] The example is written for illustration purpose. Thanks for
pointing out a better way of achieving the same result. Yes it seems
so that the OS thinks the piece allocated to Python should not be
taken back unless the process dies. :(

Don't be too bothered by that. That memory will be reused by Python for subsequent allocations.
 

Dennis Lee Bieber

The problem is my server hits a memory usage threshold, and starts giving me errors like Oracle being unable to spawn off a new session, stating an Out of Memory error and whatnot. I won't be bothered much if I have the luxury of available memory for other processes to use. If only UNIX understood my concerns and released the allocation when I issue gc.collect() or when gc.collect() takes place. :)

If the memory usage is continually growing, you have something else
that is a problem -- something is holding onto objects. Even if Python
is not returning memory to the OS, it should be reusing the memory it
has if objects are being freed.
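If something really is holding onto objects, one way to hunt for it (on Python 3.4+; the thread's Python 2.7 would need a third-party tool such as Heapy instead) is the tracemalloc module. A sketch, with a deliberately fabricated "leak":

```python
import tracemalloc

tracemalloc.start()

# Simulate the suspect allocation: a big list that stays alive.
hoard = ["%s" % i for i in range(100000)]

snapshot = tracemalloc.take_snapshot()
top = snapshot.statistics('lineno')[0]
# The top statistic points at the source line responsible for the
# most live memory -- here, the list comprehension above.
print(top)
```

Running this periodically in a long-lived process and diffing snapshots (snapshot.compare_to) shows which lines keep growing.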
 

Wong Wah Meng-R32813

If the memory usage is continually growing, you have something else that is a problem -- something is holding onto objects. Even if Python is not returning memory to the OS, it should be reusing the memory it has if objects are being freed.
 

Dave Angel

Yes, I have verified my python application is reusing the memory (just that it doesn't reduce once it has grown) and my python process doesn't have any issue running even though it is seen taking up more than 2G in footprint. My problem is capacity planning on the server: since my python process doesn't release memory back to the OS, the OS wasn't able to allocate memory when a new process is spawned off.

So is the server running out of disk space? If not, why not increase
the swapfile(s) size? That's what's being held onto, not RAM.

If you are short on disk space, and therefore cannot afford to increase
the swapfile size, then you may want to plan the Python program's
execution around that constraint.

Discounting the play-program that started this thread, is it possible
that your actual app unnecessarily uses lots of space during one phase
of execution, and therefore is saddled with that space at other times?
For example, are you using a large list to hold a data file when an
iterator would do just as well?

If that phase of execution cannot be eliminated, but it's a fleeting
time, perhaps that part of the execution can be launched from the main
app as a separate process. When the process ends, the memory is freed.
 

Isaac To

In general, it is hard for any process to return memory the OS allocated
to it back to the OS, short of exiting the whole process. The only case
where this works reliably is when the process allocates a chunk of memory
by mmap (which libc chooses when you malloc or calloc a large chunk of
memory), and that whole chunk is no longer needed. In that case the
process can munmap it. Evidently you are not seeing that in your program.
What you allocate might be too small (so libc chooses to allocate it using
another system call, "sbrk"), or the allocated memory may also hold other
objects that were not freed.

If you want to reduce the footprint of a long running program that
periodically allocates a large chunk of memory, the "easiest" solution is
to fork a different process to perform the computations that need the
memory. That way, you can exit the process after you complete the
computation, and at that point all memory allocated to it is guaranteed to
be freed to the OS.

Modules like multiprocessing probably make the idea sufficiently easy to
implement.
 

Grant Edwards

Your entire post is in your signature block. Don't do that. Many
people have newsreaders or e-mail clients configured to hide or ignore
signature blocks.
Yes I have verified my python application is reusing the memory (just
that it doesn't reduce once it has grown) and my python process
doesn't have any issue to run even though it is seen taking up more
than 2G in footprint.

Then there's nothing wrong with Python.
My problem is capacity planning on the server whereby since my python
process doesn't release memory back to the OS,

In Unix there is no way to release heap memory (which is what you're
talking about) back to the OS except for terminating the process.
the OS wasn't able to allocate memory when a new process is spawned off.

You need to either get more memory, change your Python program to use
less memory, or have your Python program terminate in order to return
memory to the OS.

Perhaps you can put the memory hungry portion of your processing into
a subprocess that terminates when it's completed its work?
 

Roy Smith

Grant Edwards said:
In Unix there is no way to release heap memory (which is what you're
talking about) back to the OS except for terminating the process.

That's not quite true. The man page for BRK(2) (at least on the Linux
box I happen to have handy) says:

"brk() and sbrk() change the location of the program break, which
defines the end of the process's data segment (i.e., the program
break is the first location after the end of the uninitialized data
segment). Increasing the program break has the effect of allocating
memory to the process; decreasing the break deallocates memory."

So, in theory, it's possible. I just ran this C program:

#include <unistd.h>
#include <time.h>

int main(int argc, char** argv) {
    struct timespec t;
    t.tv_sec = 10;
    t.tv_nsec = 0;

    nanosleep(&t, NULL);
    sbrk(500 * 1024 * 1024);

    nanosleep(&t, NULL);
    sbrk(-500 * 1024 * 1024);

    nanosleep(&t, NULL);
}

while watching the process with ps. I could see the process appear and
for the first 10 seconds it had a VSZ of 4156. Then, for the next 10
seconds, the VSZ was 516156, then it went back down to 4156.

$ while sleep 1; do ps augx | grep a.out | grep -v grep; done
roy 6657 0.0 0.0 4156 356 pts/10 S+ 19:56 0:00 ./a.out
roy 6657 0.0 0.0 4156 356 pts/10 S+ 19:56 0:00 ./a.out
roy 6657 0.0 0.0 4156 356 pts/10 S+ 19:56 0:00 ./a.out
roy 6657 0.0 0.0 4156 356 pts/10 S+ 19:56 0:00 ./a.out
roy 6657 0.0 0.0 4156 356 pts/10 S+ 19:56 0:00 ./a.out
roy 6657 0.0 0.0 4156 356 pts/10 S+ 19:56 0:00 ./a.out
roy 6657 0.0 0.0 4156 356 pts/10 S+ 19:56 0:00 ./a.out
roy 6657 0.0 0.0 4156 356 pts/10 S+ 19:56 0:00 ./a.out
roy 6657 0.0 0.0 4156 356 pts/10 S+ 19:56 0:00 ./a.out
roy 6657 0.0 0.0 4156 356 pts/10 S+ 19:56 0:00 ./a.out
roy 6657 0.0 0.0 516156 356 pts/10 S+ 19:56 0:00 ./a.out
roy 6657 0.0 0.0 516156 356 pts/10 S+ 19:56 0:00 ./a.out
roy 6657 0.0 0.0 516156 356 pts/10 S+ 19:56 0:00 ./a.out
roy 6657 0.0 0.0 516156 356 pts/10 S+ 19:56 0:00 ./a.out
roy 6657 0.0 0.0 516156 356 pts/10 S+ 19:56 0:00 ./a.out
roy 6657 0.0 0.0 516156 356 pts/10 S+ 19:56 0:00 ./a.out
roy 6657 0.0 0.0 516156 356 pts/10 S+ 19:56 0:00 ./a.out
roy 6657 0.0 0.0 516156 356 pts/10 S+ 19:56 0:00 ./a.out
roy 6657 0.0 0.0 516156 356 pts/10 S+ 19:56 0:00 ./a.out
roy 6657 0.0 0.0 4156 356 pts/10 S+ 19:56 0:00 ./a.out
roy 6657 0.0 0.0 4156 356 pts/10 S+ 19:56 0:00 ./a.out
roy 6657 0.0 0.0 4156 356 pts/10 S+ 19:56 0:00 ./a.out
roy 6657 0.0 0.0 4156 356 pts/10 S+ 19:56 0:00 ./a.out
roy 6657 0.0 0.0 4156 356 pts/10 S+ 19:56 0:00 ./a.out
roy 6657 0.0 0.0 4156 356 pts/10 S+ 19:56 0:00 ./a.out
roy 6657 0.0 0.0 4156 356 pts/10 S+ 19:56 0:00 ./a.out
roy 6657 0.0 0.0 4156 356 pts/10 S+ 19:56 0:00 ./a.out
roy 6657 0.0 0.0 4156 356 pts/10 S+ 19:56 0:00 ./a.out
roy 6657 0.0 0.0 4156 356 pts/10 S+ 19:56 0:00 ./a.out

In practice, unless you go to extraordinary lengths (i.e. avoid almost
all of the standard C library), the break is going to be managed by
malloc(), and malloc() provides no way to give memory back (insert
handwave here about mmap, and another handwave about various versions of
malloc). But, that's due to the way malloc() manages its arena, not
anything fundamental about how Unix works.
 

Grant Edwards

That's not quite true. The man page for BRK(2) (at least on the Linux
box I happen to have handy) says:

"brk() and sbrk() change the location of the program break, which
defines the end of the process's data segment (i.e., the program
break is the first location after the end of the uninitialized data
segment). Increasing the program break has the effect of allocating
memory to the process; decreasing the break deallocates memory."

So, in theory, it's possible.

Well spotted. I would have bet money (OK, not a lot) that sbrk()
only accepted positive increment values. I seem to have conflated
what sbrk() will accept with the fact that the C library's malloc/free
calls never call sbrk() with a negative number.

[... nicely done demonstration of negative sbrk() parameter usage ...]
In practice, unless you go to extraordinary lengths (i.e. avoid
almost all of the standard C library), the break is going to be
managed by malloc(), and malloc() provides no way to give memory back
(insert handwave here about mmap, and another handwave about various
versions of malloc). But, that's due to the way malloc() manages its
arena, not anything fundamental about how Unix works.

Indeed.

What I should have said was that there's no way to return to the OS
memory obtained via calls to malloc() et al, and those are the calls
that "good" C programmers (like the maintainers of CPython) use.

In theory, you could modify the C library so that calls to free()
might return memory to the OS. Sometimes. If you're lucky.

The problem is that if there's _one_byte_ of memory in use at the
"end" of the heap region, it doesn't matter if there's an unused 16GB
chunk in the middle -- it can't be returned to the OS.
 

Roy Smith

Grant Edwards said:
What I should have said was that there's no way to return to the OS
memory obtained via calls to malloc() et al.

That's true (for certain values of "et al").
and those are the calls that "good" C programmers (like the
maintainers of CPython) use.

Well, there is mmap, which is exposed via the Python mmap module.
Python doesn't have anything like C++'s "placement new", so there's no
way to use that memory to hold generic Python objects, but you can
certainly use mmap to allocate a large chunk of memory, use it, and then
give it back to the OS. For example:

#!/usr/bin/env python

import mmap
import time

time.sleep(5)

f = open('my-500-mbyte-text-file')
data = mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ)

count = 0
while 1:
    line = data.readline()
    if not line:
        break
    count += 1

print count
time.sleep(5)
data.close()
time.sleep(5)

When I run that and watch the process size (like I did with the previous
example), you can see the process grow when mmap() is called, and shrink
again when the segment is closed.

I have to admit, in all the years I've been using Python, this is the
first time I've ever used the mmap module. Even for long running
processes, the automatic memory management that Python provides out of
the box has always been good enough for me. But, it's nice to know mmap
is there if I need to do something unusual.
 

Grant Edwards

That's true (for certain values of "et al").


Well, there is mmap, which is exposed via the Python mmap module.
Python doesn't have anything like C++'s "placement new", so there's
no way to use that memory to hold generic Python objects, but you can
certainly use mmap to allocate a large chunk of memory, use it, and
then give it back to the OS.

[example]

I was surprised to find that the object returned by mmap.mmap()
supported file semantics (seek, tell, read, etc.) in addition to byte
buffer semantics. Usually, the reason one maps a file is that one
doesn't want to use file semantics (which are already supported by the
file object) but wants instead to use buffer semantics.

Using mmap to obtain a "returnable" chunk of memory seems a bit
obtuse, since it requires creating an underlying file of appropriate
size that ends up being superfluous.
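For what it's worth, Python's mmap module can also create an *anonymous* mapping by passing a fileno of -1, so no backing file is needed at all; a small sketch:

```python
import mmap

# fileno of -1 requests an anonymous mapping: no file behind it.
buf = mmap.mmap(-1, 16 * 1024 * 1024)   # 16 MiB straight from the OS

buf[:5] = b"hello"      # usable as a mutable byte buffer
print(buf[:5])

buf.close()             # the whole mapping is returned to the OS here
```

Because the mapping comes from mmap rather than the malloc heap, closing it genuinely hands the pages back, which is the "returnable" behavior the thread has been chasing.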
 
