Imitating "tail -f"

Ivan Voras

I'm trying to simply imitate what "tail -f" does, i.e. read a file, wait
until it's appended to and process the new data, but apparently I'm
missing something.

The code is:

import fcntl
import os
import select

f = file(filename, "r", 1)
f.seek(-1000, os.SEEK_END)
# Put the descriptor into non-blocking mode.
ff = fcntl.fcntl(f.fileno(), fcntl.F_GETFL)
fcntl.fcntl(f.fileno(), fcntl.F_SETFL, ff | os.O_NONBLOCK)

pe = select.poll()
pe.register(f)
while True:
    print repr(f.read())
    print pe.poll(1000)

The problem is: poll() always returns that the fd is ready (without
waiting), but read() always returns an empty string. Actually, it
doesn't matter if I turn O_NDELAY on or off. select() does the same.

Any advice?
 
exarkun

Ivan Voras said:
The problem is: poll() always returns that the fd is ready (without
waiting), but read() always returns an empty string. [..]
Any advice?

select() and poll() have the problem that they don't support files (in
the thing-on-a-filesystem sense) at all: they just report the descriptor
as readable and writable all the time, regardless. (On Linux, epoll goes
further and refuses regular files outright, failing with EPERM.)

"tail -f" is implemented by sleeping a little bit and then reading to
see if there's anything new.
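
The sleep-and-retry approach can be sketched roughly as follows in Python 3. This is only an illustration, not the code of "tail" itself: the names follow, poll_interval and max_idle_polls are made up for this example, and max_idle_polls exists only so that a demonstration can terminate.

```python
import time


def follow(path, poll_interval=0.5, max_idle_polls=None):
    """Yield lines appended to `path`, sleeping whenever EOF is reached.

    max_idle_polls is a hypothetical knob for demos and tests: after that
    many consecutive empty reads the generator stops. None means forever.
    """
    idle = 0
    with open(path, "r") as f:
        f.seek(0, 2)  # jump to the current end of the file, like tail -f
        while max_idle_polls is None or idle < max_idle_polls:
            line = f.readline()
            if line:
                # Note: a line may arrive incomplete if the writer has
                # not finished it yet; this sketch does not reassemble it.
                idle = 0
                yield line
            else:
                # Nothing new: sleep instead of busy-waiting.
                idle += 1
                time.sleep(poll_interval)
```

Something like `for line in follow(some_log_path): print(line, end="")` would then print new lines as they show up, where some_log_path is whatever file you want to watch.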

Jean-Paul
 
Wolodja Wentland

Ivan Voras said:
I'm trying to simply imitate what "tail -f" does, i.e. read a file, wait
until it's appended to and process the new data, but apparently I'm
missing something. [..]
Any advice?

Have a look at [1], which mimics "tail -f" perfectly. It comes from a
talk by David Beazley on generators which you can find at [2] and
[3].

Enjoy!

[1] http://www.dabeaz.com/generators/follow.py
[2] http://www.dabeaz.com/generators-uk/
[3] http://www.dabeaz.com/coroutines/

 
Paul Rudin

Matt Nordhoff said:
Some other operating systems have similar facilities, e.g. FSEvents on OS X.

Yeah, and there's a similar kind of thing in the Windows API.

A nice Python project would be a cross-platform solution that presented
a uniform API and just did the right thing behind the scenes on each OS.

(Incidentally, on Linux you need to watch out for the value of
/proc/sys/fs/inotify/max_user_watches - if you're using inotify in anger
it's easy to exceed the default set by a lot of distributions.)
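
A quick way to check that limit from Python might look like this. It is a small sketch: the helper name inotify_watch_limit is made up for illustration, and the /proc path only exists on Linux, so the function returns None elsewhere.

```python
from pathlib import Path

# Linux-only path mentioned above; absent on other systems.
MAX_WATCHES_PATH = Path("/proc/sys/fs/inotify/max_user_watches")


def inotify_watch_limit(path=MAX_WATCHES_PATH):
    """Return the per-user inotify watch limit, or None if unavailable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


limit = inotify_watch_limit()
if limit is None:
    print("inotify watch limit not available on this system")
else:
    print(f"inotify watch limit: {limit}")
```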
 
Nobody

Ivan Voras said:
The problem is: poll() always returns that the fd is ready (without
waiting), but read() always returns an empty string. Actually, it
doesn't matter if I turn O_NDELAY on or off. select() does the same.

Regular files are always "ready" for read/write. read() might return EOF,
but it will never block (or fail with EAGAIN or EWOULDBLOCK).
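
This is easy to see for yourself. The following sketch assumes a POSIX system (select() on Windows only accepts sockets) and uses a throwaway temporary file; the variable names are just for illustration.

```python
import os
import select
import tempfile

# Create a small regular file to experiment with.
fd, path = tempfile.mkstemp()
os.write(fd, b"hello\n")
os.close(fd)

with open(path, "rb") as f:
    f.read()  # consume everything; the file position is now at EOF

    # Timeout 0 means "do not wait". The regular file is reported as
    # readable anyway, even though there is nothing left to read.
    readable, _, _ = select.select([f], [], [], 0)
    is_ready = f in readable

    # read() returns immediately with an empty result (EOF), no blocking.
    data = f.read()

os.unlink(path)
print(is_ready, repr(data))
```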

Ivan Voras said:
Any advice?

The Linux version of "tail" uses the Linux-specific inotify_add_watch()
mechanism to block waiting for file-modification events.
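
Python's standard library has no inotify wrapper, but the calls can be reached through ctypes. The following is a rough sketch, assuming Linux and a libc that exports inotify_init() and inotify_add_watch(); the helper names watch_file and wait_for_modify are invented for this example, and it is not how "tail" itself is written.

```python
import ctypes
import os
import struct

IN_MODIFY = 0x00000002  # from <sys/inotify.h>: file was modified
# Fixed header of struct inotify_event: wd, mask, cookie, len.
EVENT_HEADER = struct.Struct("iIII")

# Assumption: Linux, with the inotify functions visible in the process.
libc = ctypes.CDLL(None, use_errno=True)


def watch_file(path):
    """Return an inotify descriptor that reports modifications of `path`."""
    fd = libc.inotify_init()
    if fd < 0:
        raise OSError(ctypes.get_errno(), "inotify_init failed")
    wd = libc.inotify_add_watch(fd, os.fsencode(path), IN_MODIFY)
    if wd < 0:
        err = ctypes.get_errno()
        os.close(fd)
        raise OSError(err, "inotify_add_watch failed")
    return fd


def wait_for_modify(inotify_fd):
    """Block until the kernel reports an event, then return its mask."""
    buf = os.read(inotify_fd, 4096)
    _wd, mask, _cookie, _name_len = EVENT_HEADER.unpack_from(buf)
    return mask
```

A tail-like loop would call watch_file() once, then alternate between reading the file up to EOF and calling wait_for_modify() instead of sleeping.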

If you don't have access to inotify_add_watch(), you'll just have to keep
trying to read from the file, sleep()ing whenever you hit EOF so that you
don't tie up the system with a busy-wait.
 
Aahz

Matt Nordhoff said:
Some other operating systems have similar facilities, e.g. FSEvents on OS X.

Having spent some time with FSEvents, I would not call it particularly
similar to inotify. FSEvents only works at the directory level. Someone
suggested pnotify the last time this subject came up, but I haven't had
time to try it out.
 
Paul Boddie

exarkun said:
"tail -f" is implemented by sleeping a little bit and then reading to
see if there's anything new.

This was the apparent assertion behind the "99 Bottles" concurrency
example:

http://wiki.python.org/moin/Concurrency/99Bottles

However, as I pointed out (and as others have pointed out here), a
realistic emulation of "tail -f" would actually involve handling
events from operating system mechanisms. Here's the exchange I had at
the time:

http://wiki.python.org/moin/Concurrency/99Bottles?action=diff&rev2=12&rev1=11

It can be very tricky to think up good examples of multiprocessing
(which is what the above page was presumably intended to investigate),
as opposed to concurrency (which can quite easily encompass responding
to events asynchronously in a single process).

Paul

P.S. What's Twisted's story on multiprocessing support? In my limited
experience, the bulk of the work in providing usable multiprocessing
solutions is in the communications handling, which is something
Twisted should do very well.
 
exarkun

Paul Boddie said:
P.S. What's Twisted's story on multiprocessing support? In my limited
experience, the bulk of the work in providing usable multiprocessing
solutions is in the communications handling, which is something
Twisted should do very well.

Twisted includes a primitive API for launching and controlling child
processes, reactor.spawnProcess. It also has several higher-level APIs
built on top of this, aimed at making certain common tasks more
convenient. There is also a third-party project called Ampoule, which
provides a process pool to which it is relatively straightforward to
send jobs and then collect their results.

Jean-Paul
 
