What is the best way to implement "tail"?

Owen Zhang

What is the best way to implement "tail -f" in C or C++ so that it
performs better than either the Unix shell command "tail -f" or Perl's
File::Tail? Any suggestions appreciated. Thanks.
 
Ben Pfaff

Owen Zhang said:
What is the best way to implement "tail -f" in C or C++ and higher
performance compared to either unix shell command "tail -f" or perl
File::Tail ? Any suggestion appreciated. Thanks.

What performance problems have you observed with these
implementations?
 
Guest

What is the best way to implement "tail -f" in C or C++ and higher
performance compared to either unix shell command "tail -f" or perl
File::Tail ? Any suggestion appreciated. Thanks.

You would need to use platform specific functions to do that, ask in a
group discussing programming on your platform.
 
Lew Pitcher

You would need to use platform specific functions to do that, ask in a
group discussing programming on your platform.

Nonsense. It can be done (disregarding questions of efficiency) in ISO
standard C.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

void WaitFiveSecs(void)
{
    clock_t now, later;

    now = clock();
    do
        later = clock();
    while (((later - now) / CLOCKS_PER_SEC) < 5);
}

int main(void)
{
    int datum;

    for (;;)
    {
        if ((datum = getchar()) != EOF)
            putchar(datum);
        else
            WaitFiveSecs();
    }

    return EXIT_SUCCESS;
}

 
user923005

You would need to use platform specific functions to do that, ask in a
group discussing programming on your platform.

I think it could be implemented purely in ANSI C without much
difficulty.

A super-simple (stinky) implementation would just read into a ring
buffer of size specified by the user, overwriting as it goes.
When the file has been read, spew out the lines still in the buffer.

Now, suppose we want something a bit smarter.

We could create a static array of characters with 1000 characters/row
and 1000 rows.
Next, we seek to the end of the file, and back up one megabyte.
Next we fgets from the current position into our ring buffer,
overwriting as we go until we hit the end.
If any row exceeds 1000 characters, signal an error.
When we hit the end of the file, just cough up whatever is in the ring
buffer.

I'm sure that there are better ways to do it in standard C, but off
the top of my head I think it would work.
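
The seek-and-ring-buffer idea above could be sketched roughly as follows. The names tail_seek, ROWS, ROW_LEN and BACKUP are made up for this example. It assumes the stream supports seeking to the end and byte-offset arithmetic on ftell() results, which holds on POSIX systems but is not promised by ISO C for every stream.

```c
#include <stdio.h>
#include <string.h>

#define ROWS    1000              /* rows in the ring buffer */
#define ROW_LEN 1000              /* longest line accepted, newline excluded */
#define BACKUP  (1000L * 1000L)   /* how far to back up from the end */

static char ring[ROWS][ROW_LEN + 2];   /* room for '\n' and '\0' */

/* Write the last `want` lines of `in` to `out` (capped at ROWS).
   Returns 0 on success, -1 on a seek failure or an over-long line. */
int tail_seek(FILE *in, FILE *out, size_t want)
{
    long size, start;
    size_t count = 0, kept, first, i;
    int c;

    if (want > ROWS)
        want = ROWS;
    if (want == 0)
        return 0;
    if (fseek(in, 0L, SEEK_END) != 0 || (size = ftell(in)) < 0)
        return -1;
    start = size > BACKUP ? size - BACKUP : 0;
    if (start > 0) {
        /* Start one byte early and skip through the next newline, so
           reading begins at the start of a complete line. */
        if (fseek(in, start - 1, SEEK_SET) != 0)
            return -1;
        while ((c = getc(in)) != EOF && c != '\n')
            ;
    } else if (fseek(in, 0L, SEEK_SET) != 0) {
        return -1;
    }
    /* Overwrite the oldest row each time the ring wraps around. */
    while (fgets(ring[count % want], ROW_LEN + 2, in) != NULL) {
        const char *row = ring[count % want];
        size_t len = strlen(row);

        if (len == ROW_LEN + 1 && row[len - 1] != '\n')
            return -1;            /* line longer than ROW_LEN characters */
        count++;
    }
    kept = count < want ? count : want;
    first = count > want ? count % want : 0;
    for (i = 0; i < kept; i++)
        fputs(ring[(first + i) % want], out);
    return 0;
}
```

Calling tail_seek(fp, stdout, 10) on a file opened with fopen() would print up to its last ten lines, provided they fall within the final megabyte.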

On the other hand, I guess that the UNIX shell command tail is
implemented in C and a simple hack job is not going to do better.
For that matter, the Perl version probably just calls the system tail
function or implements it in C underneath the covers anyway.

So I guess that it will be an exercise in futility.
 
Lew Pitcher

I think it could be implemented purely in ANSI C without much
difficulty.

A super-simple (stinky) implementation would just read into a ring
buffer of size specified by the user, overwriting as it goes.
When the file has been read, spew out the lines still in the buffer.

The minimum implementation of tail(1) on Unix depends on knowledge of
an admittedly platform-specific behaviour of the standard C I/O
library. On Unix, EOF is not necessarily a permanent and final
condition, and the implementation of the standard library acknowledges
this. I believe that this falls within a gray area of the C standard,
being an implementation-defined behaviour.

Knowing that EOF is possibly a temporary condition, programs can
choose to ignore the EOF indication on a file, and attempt to continue
reading it. tail(1) (when invoked with the -f "follow" flag)
disregards the EOF indicator, waits a bit (for the process that is
writing the file to write some more, and thus extend the file past
its current end-of-file condition), and then attempts to read more
data from the file. No ring buffer necessary.

However, some sort of "out of band" condition or signal must be used
to tell the program to stop reading the file. Typically, this would be
done with an implicit or explicit signal handler that would terminate
the program when a suitable signal is presented to it. SIGINT does
nicely here.
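
The follow loop described above, with SIGINT as the out-of-band stop condition, might be sketched as follows. The names drain, follow, on_sigint and stop_requested are made up for this example, and wait_fn is a hypothetical hook for whatever pause the platform offers, since ISO C has no sleep function. The clearerr() call is there because some stdio implementations keep reporting EOF until the indicator is reset.

```c
#include <signal.h>
#include <stdio.h>

/* Set by the signal handler, polled by the follow loop. */
volatile sig_atomic_t stop_requested = 0;

void on_sigint(int sig)
{
    (void)sig;
    stop_requested = 1;   /* the only thing the handler does */
}

/* Copy everything currently readable from `in` to `out`, then clear
   the end-of-file indicator so a later call can see appended data.
   Returns the number of bytes copied, or -1 on a read error. */
long drain(FILE *in, FILE *out)
{
    long copied = 0;
    int c;

    while ((c = getc(in)) != EOF) {
        putc(c, out);
        copied++;
    }
    fflush(out);
    if (ferror(in))
        return -1;
    clearerr(in);
    return copied;
}

/* Follow `in` until SIGINT arrives or a read error occurs.
   wait_fn is called whenever no new data was found. */
void follow(FILE *in, FILE *out, void (*wait_fn)(void))
{
    signal(SIGINT, on_sigint);
    while (!stop_requested) {
        long n = drain(in, out);

        if (n < 0)
            break;
        if (n == 0)
            wait_fn();
    }
}
```

A caller would open the file, optionally seek near its end, and call follow(fp, stdout, pause_fn), where pause_fn wraps the platform's own sleep call.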
 
Army1987

Nonsense. It can be done (disregarding questions of efficiency) in ISO
standard C.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

void WaitFiveSecs(void)
{
    clock_t now, later;

    now = clock();
    do
        later = clock();
    while (((later - now) / CLOCKS_PER_SEC) < 5);

If this uses 100% of processor time, it will block all other
tasks, including any that is writing to that file. In any case it is
not a good idea on a multitasking system, and a particularly bad idea
on a multi-user one. If it uses less than 100% of processor time, it
will actually wait longer than five seconds, because clock() measures
processor time rather than elapsed time. The latter problem can be
solved by using time() instead of clock(), though the resolution will
then be just one second. (BTW, I wouldn't call the one that stays
constant 'now'. I'd call the former 'before' and the latter 'now'.)
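
The time()-based alternative suggested here could look like the following minimal sketch. The function name wait_wall_seconds is made up for illustration. It still busy-waits, so it keeps the CPU occupied just as the clock() loop does; it only changes what is measured, from processor time to calendar time. The granularity is whatever time() provides, typically one second, so the real delay can be up to one tick shorter than requested.

```c
#include <time.h>

/* Spin until time() reports that `seconds` have elapsed since entry.
   Returns 0 on success, -1 if the calendar time is unavailable. */
int wait_wall_seconds(double seconds)
{
    time_t before = time(NULL);

    if (before == (time_t)-1)
        return -1;
    for (;;) {
        time_t now = time(NULL);

        if (now == (time_t)-1)
            return -1;
        /* difftime() hides the representation of time_t. */
        if (difftime(now, before) >= seconds)
            return 0;
    }
}
```

In the posted program, WaitFiveSecs() would become a call to wait_wall_seconds(5.0).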
 
Martien Verbruggen

Nonsense. It can be done (disregarding questions of efficiency) in ISO
standard C.

Not if you also take into account that one of the requirements is
'higher performance' than the other tools mentioned. To get that sort
of performance, you generally need to resort to platform-specific
tricks, or use functions or assumptions that fall outside the ISO
standard.

Martien
 
Army1987

You would need to use platform specific functions to do that, ask in a
group discussing programming on your platform.

Nonsense. It can be done (disregarding questions of efficiency) in ISO
standard C.
[snip]
for (;;)
{
    if ((datum = getchar()) != EOF)
        putchar(datum);
    else
        WaitFiveSecs();

And in which way is this better than calling clearerr(stdin)?
 
Keith Thompson

Army1987 said:
If this uses 100% of processor time, it will block all other
tasks, including any one writing on that file. Anyway it's not a
good idea on a multitasking system, and a particularly bad idea on
multi-user ones. If it uses less than 100% of processor time, it
will actually wait longer than five seconds. The latter problem
can be solved using time() instead of clock(). The resolution will
be just one second, though. (BTW, I'd not call 'now' the one
which stays constant... I'd call 'before' the former and 'now' the
latter.)

The resolution of time() is typically one second, but that's not
guaranteed.
 
Flash Gordon

Lew Pitcher wrote, On 19/09/07 19:52:
Nonsense. It can be done (disregarding questions of efficiency) in ISO
standard C.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

void WaitFiveSecs(void)
{
    clock_t now, later;

    now = clock();
    do
        later = clock();
    while (((later - now) / CLOCKS_PER_SEC) < 5);
}

int main(void)
{
    int datum;

    for (;;)
    {
        if ((datum = getchar()) != EOF)
            putchar(datum);
        else
            WaitFiveSecs();
    }

    return EXIT_SUCCESS;
}

This is not the same as the Unix "tail -f" since you cannot apply it to
a file being written by another process.
 
Lew Pitcher

Flash said:
Lew Pitcher wrote, On 19/09/07 19:52:

This is not the same as the Unix "tail -f" since you cannot apply it to
a file being written by another process.

Hmmm..... No.

lpitcher@merlin:~/code/mytail$ head -10 mytail.c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

void WaitFiveSecs(void)
{
clock_t now, later;

now = clock();
do
lpitcher@merlin:~/code/mytail$ cc -o mytail mytail.c
lpitcher@merlin:~/code/mytail$ mytail </var/log/messages
Sep 19 04:40:02 merlin syslogd 1.4.1: restart.
Sep 19 05:08:52 merlin -- MARK --
Sep 19 05:28:52 merlin -- MARK --
Sep 19 05:48:52 merlin -- MARK --
Sep 19 06:08:52 merlin -- MARK --
.
.
.
Sep 19 20:23:24 merlin popa3d[28471]: 1 (835) deleted, 0 (0) left
Sep 19 20:25:18 merlin sshd[28533]: Accepted publickey for lpitcher from
192.168.11.2 port 32882 ssh2

Seems to work fine for me


--
Lew Pitcher

Master Codewright & JOAT-in-training | Registered Linux User #112576
http://pitcher.digitalfreehold.ca/ | GPG public key available by request
---------- Slackware - Because I know what I'm doing. ------
 
CBFalconer

Erik said:
You would need to use platform specific functions to do that, ask
in a group discussing programming on your platform.

Implementing the -f requires abilities outside of standard C. For
the rest of it, a simple method would be to read the file with
ggets and save the last N pointers returned (not forgetting to free
one when overwriting an earlier one). See ggets.zip at:

<http://cbfalconer.home.att.net/download/>
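
The "save the last N pointers" method could be sketched as below. This is not CBFalconer's code: read_line is a hypothetical stand-in for ggets, written only so the example is self-contained, and tail_last_n is likewise a made-up name. An allocation failure inside read_line is treated the same as end of input, which a real program would want to distinguish.

```c
#include <stdio.h>
#include <stdlib.h>

/* Stand-in for ggets (hypothetical): read one line of any length,
   without its newline. Returns a malloc'd string, or NULL at end of
   input or on allocation failure. */
char *read_line(FILE *in)
{
    size_t cap = 64, len = 0;
    char *buf = malloc(cap);
    int c;

    if (buf == NULL)
        return NULL;
    while ((c = getc(in)) != EOF && c != '\n') {
        if (len + 1 == cap) {              /* keep room for the '\0' */
            char *grown = realloc(buf, cap * 2);

            if (grown == NULL) {
                free(buf);
                return NULL;
            }
            buf = grown;
            cap *= 2;
        }
        buf[len++] = (char)c;
    }
    if (c == EOF && len == 0) {
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    return buf;
}

/* Write the last n lines of `in` to `out`.
   Returns 0 on success, -1 if the pointer table cannot be allocated. */
int tail_last_n(FILE *in, FILE *out, size_t n)
{
    char **ring, *line;
    size_t count = 0, kept, first, i;

    if (n == 0)
        return 0;
    ring = malloc(n * sizeof *ring);
    if (ring == NULL)
        return -1;
    for (i = 0; i < n; i++)
        ring[i] = NULL;
    while ((line = read_line(in)) != NULL) {
        free(ring[count % n]);             /* drop the line being replaced */
        ring[count % n] = line;
        count++;
    }
    kept = count < n ? count : n;
    first = count > n ? count % n : 0;
    for (i = 0; i < kept; i++) {
        fputs(ring[(first + i) % n], out);
        putc('\n', out);
    }
    for (i = 0; i < n; i++)
        free(ring[i]);
    free(ring);
    return 0;
}
```

Unlike a fixed-size row buffer, this places no limit on line length, at the cost of reading the whole input from the start.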
 
CBFalconer

Lew said:
Nonsense. It can be done (disregarding questions of efficiency) in
ISO standard C.

Yes, but your method eats up the whole machine, thus preventing any
other process from updating the input file. Not really practical.
 
C. Benson Manica

If this uses 100% of processor time, it will block all other
tasks, including any one writing on that file. Anyway it's not a
good idea on a multitasking system, and a particularly bad idea on
multi-user ones.

Well, the "higher performance" requirement didn't mention anything
about the performance of those other irrelevant tasks, and especially
not the silly tasks that other users might be running. It runs just
fine on the virtual machine! :)
 
Flash Gordon

Lew Pitcher wrote, On 20/09/07 01:27:
Flash said:
Lew Pitcher wrote, On 19/09/07 19:52:
This is not the same as the Unix "tail -f" since you cannot apply it to
a file being written by another process.

Hmmm..... No.

lpitcher@merlin:~/code/mytail$ head -10 mytail.c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

void WaitFiveSecs(void)
{
clock_t now, later;

now = clock();
do
lpitcher@merlin:~/code/mytail$ cc -o mytail mytail.c
lpitcher@merlin:~/code/mytail$ mytail </var/log/messages
Sep 19 04:40:02 merlin syslogd 1.4.1: restart.
Sep 19 05:08:52 merlin -- MARK --
Sep 19 05:28:52 merlin -- MARK --
Sep 19 05:48:52 merlin -- MARK --
Sep 19 06:08:52 merlin -- MARK --
.
.
.
Sep 19 20:23:24 merlin popa3d[28471]: 1 (835) deleted, 0 (0) left
Sep 19 20:25:18 merlin sshd[28533]: Accepted publickey for lpitcher from
192.168.11.2 port 32882 ssh2

Seems to work fine for me

Now do it on a file that is not updated for a long time, where "long
time" is defined as long enough for the end of the file to be reached
before the file is extended.
 
Chris Torek

What is the best way to implement "tail -f" [with emphasis on]
higher performance ...

Nonsense. It can be done (disregarding questions of efficiency) in ISO
standard C.

Indeed; but as I noted with a little editing above, "questions of
efficiency" seem to be central to the original poster.

[snippage]
for (;;)
{
    if ((datum = getchar()) != EOF)
        putchar(datum);
    else
        WaitFiveSecs();
}

This fails on any system using my stdio, because you neglect to
call clearerr(stdin). As soon as the first EOF occurs, a "sticky"
EOF flag is set on the underlying stream, and further attempts to
read from it signal EOF again, without asking the underlying OS
for more bytes. Using clearerr() on the stream resets this "sticky"
flag, so that subsequent attempts to read from the stream will ask
the OS for more bytes.

(Implementations are allowed, but not required, to have this sort
of "sticky EOF" behavior, and stdio implementations vary. I made
my behavior depend upon a single line in the stdio source code,
so that implementors could change it if desired.)
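
A read step that resets the indicator, as described above, might look like this minimal sketch. The name read_or_retry is made up for the example. It distinguishes a genuine read error from end-of-file before calling clearerr(), since clearerr() resets both indicators.

```c
#include <stdio.h>

/* Try to read one character from a stream that may grow later.
   Returns 1 and stores the character in *ch on success, 0 if the
   stream is at end-of-file for now (indicator cleared, so the next
   call asks the OS again), or -1 on a read error. */
int read_or_retry(FILE *in, int *ch)
{
    int c = getc(in);

    if (c != EOF) {
        *ch = c;
        return 1;
    }
    if (ferror(in))
        return -1;
    clearerr(in);   /* reset the "sticky" end-of-file indicator */
    return 0;
}
```

In the quoted loop this would replace the bare getchar() test: a return of 0 is the cue to wait before trying again.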
 
Richard Tobin

Chris Torek said:
(Implementations are allowed, but not required, to have this sort
of "sticky EOF" behavior.

C99 says, under fgetc():

If the end-of-file indicator [...] is set, or if the stream is at
end-of-file, the end-of-file indicator [...] is set and the fgetc
function returns EOF.

C90 is less clear: it says:

If the stream is at end-of-file, the end-of-file indicator [...] is
set and the fgetc function returns EOF.

So it seems to have been changed to clarify that sticky behaviour is
required.

-- Richard
 
Lew Pitcher

Yes, but your method eats up the whole machine, thus preventing any
other process from updating the input file. Not really practical.

Demonstrably not true.

But even if it were, then it would be a possible QOI issue on the
implementation of the standard library.

On a unitasking system, there would not and could not be another
process for this code to contend against. Thus the code would have
explicit permission to use as much CPU as necessary, but would be
useless because there would be no other code executing (because of the
design of the system, and not because of the example code) to write
the file that this code reads.

On a co-operative multitasking system, the program would effect its
voluntary release of processing time either through the explicit
invocation of a system-specific API (off topic here), or through an
implicit release embedded in the standard library. I would expect
that, sans system-specific API, the implementation of standard
functions (especially like the time functions) would include such a
voluntary release. Once that release is met, then competing processes
(including the one that writes to the file) can run and extend the
file. Surely, this is a QOI issue here.

Finally, on a preemptive multitasking system, the program is forced
(through an off-topic external mechanism) to release its processing.
Again, once released, competing processes (including the one that
writes to the file) can run and extend the file.

I stand by the assertions that this will and does work, does not "hog
CPU" and is legal (at least from a C90 standpoint). I don't know how
code written to the C99 standard (sans POSIX) would do this, though.
 
Richard Tobin

Lew Pitcher said:
Finally, on a preemptive multitasking system, the program is forced
(through an off-topic external mechanism) to release its processing.
Again, once released, competing processes (including the one that
writes to the file) can run and extend the file.

And it will compete with them for cpu time. If you have one other
process running, they may well get half the cpu time each, which
certainly amounts to hogging the cpu given that the program doesn't
need to be using it at all.

Anyway, what is the point of the WaitFiveSecs function? If you're
going to busy-wait, why not just keep calling getchar()?

-- Richard
 
