File Polling (Rereading)


Daniel Mueller

Hello Fellow Python Programmers,

I have the following problem:

I want to read the content of a file every 10 seconds. What is the
best function to do that? Opening the file, reading it and closing it
consumes a lot of CPU time. Is there a better solution?

Greetings Daniel
 

Diez B. Roggisch

Daniel said:
> Hello Fellow Python Programmers,
>
> I have the following problem:
>
> I want to read the content of a file every 10 seconds. What is the
> best function to do that? Opening the file, reading it and closing it
> consumes a lot of CPU time. Is there a better solution?

No. At least not if what you really want is always the full content of the
file. If you're only interested in the content when the file has actually
changed, that's a totally different animal, and it is doable under Unix using
stat calls (see os.stat in the os module). I'm not sure how much of that
carries over to Windows, but I'm fairly confident that something similar
is available.

Diez
 

Daniel Mueller

Diez said:
> If you're only interested in the content when the file has actually
> changed, that's a totally different animal,

Good to hear! I only need the changes! Do you have code examples?

> and it is doable under Unix using
> stat calls (see os.stat in the os module). I'm not sure how much of that
> carries over to Windows, but I'm fairly confident that something similar
> is available.

I'm programming under Linux.
 

Diez B. Roggisch

> Good to hear! I only need the changes! Do you have code examples?

This is a misunderstanding: I didn't mean "the changes". What you can get by
calling os.stat is the timestamp of the last modification, so you can skip
rereading the file if it has not been modified since the last time you
read it.
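A minimal Python 3 sketch of this idea, assuming all you need is the whole file whenever it has changed. read_if_modified is a hypothetical helper name, not a library function; it remembers the modification time from the previous os.stat() call and only opens the file when that value differs.

```python
import os


def read_if_modified(path, last_mtime_ns):
    """Return (content, mtime_ns); content is None if the file is unchanged.

    Hypothetical helper: compares the current modification time reported by
    os.stat() with the value seen on the previous call.
    """
    mtime_ns = os.stat(path).st_mtime_ns
    if mtime_ns == last_mtime_ns:
        return None, mtime_ns          # unchanged: skip open/read/close
    with open(path) as f:
        return f.read(), mtime_ns
```

The first call (with None as the previous time) always reads; later calls pass back the returned timestamp and get None until the file is modified again.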

But you _can't_ read only what has changed in the file: there is no such
thing in Python or in the underlying OSes (which is why Python doesn't
offer such functionality).

I suggest you tell us more about what you actually want to accomplish, then
we might be able to offer better advice.

Diez
 

Oliver Fromme

Daniel Mueller said:
> I want to read the content of a file every 10 seconds. What is the
> best function to do that? Opening the file, reading it and closing it
> consumes a lot of CPU time. Is there a better solution?

The portable way is to stat the file (see os.stat) every
10 seconds, look at the mtime (modification time), and
if it has changed, then open/read/close. This still means
polling the file, but at least stat is much more efficient
than open/read/close. If you're also interested in changes
to the file's metadata (permissions, owner, etc.), you
also have to look at the ctime.
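A sketch of that polling loop in Python 3, under stated assumptions: poll_file, on_change and max_polls are hypothetical names, and the change signature pairs st_mtime_ns with st_ctime_ns so that metadata changes are noticed as well as content writes.

```python
import os
import time


def poll_file(path, on_change, interval=10.0, max_polls=None):
    """Stat `path` every `interval` seconds; call on_change(text) when it changed.

    Hypothetical helper. (st_mtime_ns, st_ctime_ns) is a cheap signature:
    mtime catches content writes and, on Unix, ctime also catches metadata
    changes such as permissions or owner (on Windows st_ctime is the
    creation time instead). max_polls=None means "poll forever".
    """
    last_sig = None
    polls = 0
    while max_polls is None or polls < max_polls:
        st = os.stat(path)                     # cheap: no open/read/close
        sig = (st.st_mtime_ns, st.st_ctime_ns)
        if sig != last_sig:                    # the first poll always reads
            with open(path) as f:
                on_change(f.read())
            last_sig = sig
        polls += 1
        time.sleep(interval)
```

With interval=10.0 this matches the 10-second schedule from the question; the file is only opened on the polls where the signature differs.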

If you're lucky enough to work on a FreeBSD UNIX system
(and don't need a portable solution), you can use FreeBSD's
kqueue API. Using that interface, you don't have to poll
at all. The kernel will notify you immediately when a
specific event occurs, such as someone writing to a certain
file (this is used by the "tail -f" command, for example,
so it doesn't have to poll the file). The Python bindings
for the kqueue interface are in the ports collection of
FreeBSD (see ports/devel/kqueue). Otherwise, see this
webpage for more information:

http://people.freebsd.org/~dwhite/PyKQueue/

Best regards
Oliver
 

Miki Tebeka

Hello Diez,
> This is a misunderstanding: I didn't mean "the changes". What you can get by
> calling os.stat is the timestamp of the last modification, so you can skip
> rereading the file if it has not been modified since the last time you
> read it.

After that you can compare MD5 hashes (which are fast to compute) and know
whether the file content has changed.
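Miki's hash comparison can be sketched like this (content_digest and content_changed are hypothetical helper names; MD5 is used here only as a change detector, not for security). Hashing still requires reading every byte, so this saves reprocessing unchanged data, not the read itself.

```python
import hashlib


def content_digest(path):
    """Return the MD5 hex digest of the file's full contents."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so large files need not fit in memory at once.
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def content_changed(path, last_digest):
    """Return (changed, digest); changed is True if the bytes differ from last time."""
    digest = content_digest(path)
    return digest != last_digest, digest
```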

> But you _can't_ read only what has changed in the file: there is no such
> thing in Python or in the underlying OSes (which is why Python doesn't
> offer such functionality).

IMO it is possible (I don't know exactly how) on journaling file systems.

Bye.
 

Daniel Mueller

Oliver said:
> The portable way is to stat the file (see os.stat) every
> 10 seconds, look at the mtime (modification time), and
> if it has changed, then open/read/close.

Hmm, I'll have to rephrase my question... I KNOW that the file changes
every 10 seconds... Easiest would be if I just give you the source code ;)
The program is a nice little window that displays ACPI information
about battery status and stuff like that. The program is working just
fine, but I want to tweak its performance...

Here is the link to the source code:

http://sourceforge.net/project/showfiles.php?group_id=108369&package_id=117073

Help would be greatly appreciated!

> If you're lucky enough to work on a FreeBSD UNIX system
> (and don't need a portable solution)

Linux platform... and no, I don't need a portable solution.
 

Diez B. Roggisch

>> But you _can't_ read only what has changed in the file: there is no such thing
> IMO it is possible (I don't know exactly how) on journaling file systems.

I seriously doubt that, or at least that the sort of diff you could get
would be more than a list of file offsets together with blocks of a certain
size. The OS doesn't do versioned updates the way Subversion or CVS do. So
most of the time, that won't be of much use (think of XML data: how would you
replace only certain parts, when those parts don't necessarily respect the
structural requirements of XML?)

But as long as the OP doesn't fill us in with more details of what he is
after, this is all idle speculation.
 

Diez B. Roggisch

> Hmm, I'll have to rephrase my question... I KNOW that the file changes
> every 10 seconds... Easiest would be if I just give you the source code ;)
> The program is a nice little window that displays ACPI information
> about battery status and stuff like that. The program is working just
> fine, but I want to tweak its performance...

Why don't you keep the file open and read until the end? That will give
you everything that has been appended since the last read. You might run
into buffering issues here, but you should be able to modify the termios
settings of the file descriptor so that you can read data even if only a
single byte has been sent.
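A Python 3 sketch of the keep-it-open approach for a file that grows by appends (Follower is a hypothetical class name). It leaves out the termios adjustment; for a regular file, read() simply returns whatever bytes are currently available up to EOF.

```python
import io


class Follower:
    """Keep a file open and return only the bytes appended since the last call.

    Hypothetical helper. It assumes the file only grows by appends; it does
    not detect truncation or a file being replaced (log rotation).
    """

    def __init__(self, path, from_start=False):
        self._f = open(path, "rb")
        if not from_start:
            self._f.seek(0, io.SEEK_END)  # ignore what is already there

    def read_new(self):
        # read() returns everything from the current offset to EOF,
        # i.e. b"" when nothing was appended since the previous call.
        return self._f.read()

    def close(self):
        self._f.close()
```

Bytes are returned rather than text so that a multi-byte character split across two appends cannot cause a decoding error.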
 

Peter Hansen

Daniel said:
> Hmm, I'll have to rephrase my question... I KNOW that the file changes
> every 10 seconds... Easiest would be if I just give you the source code ;)
> The program is a nice little window that displays ACPI information
> about battery status and stuff like that. The program is working just
> fine.

What makes you think there is any way to detect the file changes
or read only the changes from files in the /proc file system?

Looking briefly at /proc on my own (non-laptop, so no acpi folder)
machine, I see that the dates as shown by "ls -l" or os.stat()
are *always* the current time... they increment second-by-second
as I run os.stat() repeatedly.

> But I want to tweak its performance...

Why? Do you have any evidence that you have a performance problem
related to reading the data from these pseudo-files? (I'm guessing
that you think they are real files on your hard drive, but even
if that were the case, you probably haven't measured the access
or read times to prove that you actually *have* a problem.)

/proc is a *virtual* file system, so reading data from it is
about as fast as transferring bytes around in memory (to a
first approximation, anyway). Measure it!
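Following "Measure it!", one quick way to time the open/read/close cycle is the standard timeit module. This is a sketch; seconds_per_read is a hypothetical helper and the number of repetitions is arbitrary.

```python
import timeit


def seconds_per_read(path, repeat=1000):
    """Average wall-clock seconds for one open/read/close cycle of `path`."""
    def once():
        with open(path, "rb") as f:
            f.read()
    return timeit.timeit(once, number=repeat) / repeat
```

Running print(seconds_per_read("/proc/acpi/battery/BAT0/state")) on the machine in question (a hypothetical path; the actual ACPI file names depend on kernel and hardware) shows how long a single read really takes.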

If you don't have evidence of poor performance, you are probably
doing "premature optimization".

-Peter
 

Daniel Mueller

Peter said:
> Why? Do you have any evidence that you have a performance problem
> related to reading the data from these pseudo-files? (I'm guessing
> that you think they are real files on your hard drive, but even
> if that were the case, you probably haven't measured the access
> or read times to prove that you actually *have* a problem.)

Yeah, I know that /proc is a virtual file system. And yes, I know that the
files always have the present timestamp.

Well, the application uses up to 10% of CPU power with every poll... Is
that a normal amount??

Daniel
 

Peter Hansen

Daniel said:
> Yeah, I know that /proc is a virtual file system. And yes, I know that the
> files always have the present timestamp.

Then we can at least dispense with any kind of "read changes only"
concept, can't we? You clearly have to read the entire file and
compare it with the previous file to know exactly what has changed.
Or read the file and parse it and compare the parsed results. I'm
unclear what other approach could be conceived of...
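Reading the whole file, parsing it and comparing the parsed results can be sketched as follows. It assumes, as a guess about the format rather than something stated in the thread, that the ACPI files consist of "key: value" lines such as "charging state: discharging"; parse_key_values and changed_fields are hypothetical helpers.

```python
def parse_key_values(text):
    """Parse 'key: value' lines into a dict (assumed file format)."""
    result = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split at the first colon only
        if sep:
            result[key.strip()] = value.strip()
    return result


def changed_fields(old, new):
    """Return {key: (old_value, new_value)} for every key whose value differs."""
    keys = set(old) | set(new)
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}
```

Keeping the previous dict around and comparing it with the freshly parsed one tells the display code exactly which fields need updating.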

> Well, the application uses up to 10% of CPU power with every poll... Is
> that a normal amount??

Sure... actually I wouldn't be surprised if my program appeared
to take 100% of CPU while processing the data after reading the
files... to do otherwise would be abnormal. What I'd be focusing
on, however, is for *how long* it does so.
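To measure *how long* the CPU is busy rather than a momentary percentage, time.process_time() counts the CPU time used by the current process. A sketch, with cpu_seconds as a hypothetical helper:

```python
import time


def cpu_seconds(func, *args):
    """Return (result, CPU seconds) for one call of func(*args)."""
    start = time.process_time()   # CPU time of this process, not wall-clock time
    result = func(*args)
    return result, time.process_time() - start
```

Wrapping one poll in cpu_seconds() gives the figure that matters. With hypothetical numbers: if a poll consumed 0.005 s of CPU time every 10 s, the average load would be 0.005 / 10 = 0.05%, however high the instantaneous reading looked.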

-Peter
 

Oliver Fromme

Daniel Mueller said:
>
> Hmm, I'll have to rephrase my question... I KNOW that the file changes
> every 10 seconds...

In that case you have to re-read the file every 10 seconds
anyway, no matter what.

The only optimization that's worthwhile is to keep the file
open all the time, so you spare the overhead of the open()
system call. In other words, open it _once_, then read it
(do not close it!), and after 10 seconds rewind -- that is,
file.seek(0) -- and re-read it.
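A Python 3 sketch of the open-once, seek(0), re-read approach (Rereader is a hypothetical class name). Unbuffered binary mode is used so that each seek(0) and read() goes directly to the file descriptor; decoding as UTF-8 is an assumption that also covers plain ASCII text.

```python
class Rereader:
    """Open a file once, then re-read its whole content on each call."""

    def __init__(self, path):
        # buffering=0 gives a raw FileIO: no Python-level read buffer.
        self._f = open(path, "rb", buffering=0)

    def read(self):
        self._f.seek(0)        # rewind instead of close() + open()
        data = self._f.read()  # FileIO.read() with no size reads up to EOF
        return data.decode("utf-8", errors="replace")

    def close(self):
        self._f.close()
```

In the polling loop you would create one Rereader at startup, call its read() method every 10 seconds, and close it on exit.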

I don't know what the contents of that file look like (I
don't use Linux), nor do I want to know. But since the
Linux procfs usually produces ASCII text files (which is a
mistake, in my opinion), I guess it should be pretty easy
to parse and find the difference.

If it's more complicated than that, Python's difflib might
be helpful: http://docs.python.org/lib/module-difflib.html
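For line-oriented text, difflib.unified_diff is one option. The sketch below (diff_text is a hypothetical helper) returns an empty string when nothing changed.

```python
import difflib


def diff_text(old, new):
    """Return a unified diff, as one string, between two versions of a text."""
    return "".join(difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile="previous",
        tofile="current",
    ))
```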

Best regards
Oliver
 

John Taylor

Daniel Mueller said:
> Hello Fellow Python Programmers,
>
> I have the following problem:
>
> I want to read the content of a file every 10 seconds. What is the
> best function to do that? Opening the file, reading it and closing it
> consumes a lot of CPU time. Is there a better solution?
>
> Greetings Daniel

Daniel,

I have written a Python program that does this. It does not use a lot
of CPU cycles. You need to change the SLEEP variable from 0.50 to 10.
I have used it on Windows and on Linux. Here it is...

-John

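#!/usr/bin/env python3
"""
tail.py
John Taylor, Oct 19 2004
Adapted from:
http://groups.google.com/groups?hl=en&lr=&[email protected]

Prints data appended to a file, polling its size every SLEEP seconds
(similar to "tail -f").
"""

import os
import stat
import sys
import time

SLEEP = 0.50  # seconds between polls; use 10 for a 10-second interval

if len(sys.argv) != 2:
    print()
    print("tail.py [ filename ]")
    print()
    sys.exit(1)

FILENAME = sys.argv[1]

try:
    fd = os.open(FILENAME, os.O_RDONLY)  # on Linux, may want |O_LARGEFILE
except OSError as e:
    print(e)
    sys.exit(1)

# Start at the current end of the file: only new data will be printed.
info = os.fstat(fd)
lastsize = info[stat.ST_SIZE]
os.lseek(fd, 0, os.SEEK_END)

try:
    while True:
        info = os.fstat(fd)  # fstat on the open descriptor, no reopen needed
        size = info[stat.ST_SIZE]
        if size > lastsize:
            # Read only the bytes appended since the previous poll.
            os.lseek(fd, lastsize, os.SEEK_SET)
            data = os.read(fd, size - lastsize)
            sys.stdout.buffer.write(data)
            sys.stdout.flush()
            lastsize += len(data)  # os.read may return fewer bytes
        elif size < lastsize:
            # File was truncated: continue from its new end.
            lastsize = size
        time.sleep(SLEEP)
except KeyboardInterrupt:
    pass
finally:
    os.close(fd)

# end of program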
 
