CPU usage while reading a named pipe

Miguel P

Hey everyone,

I've been working on parsing (tailing) a named pipe that carries the
syslog output of the traffic for a rather busy HAProxy instance. It's
a fair bit of traffic (up to 3k hits/s per server), but I am finding
that simply tailing the file in Python, without any processing, is
taking up 15% of a CPU core. By contrast, HAProxy takes 25% and syslogd
takes 5% under the same load. `cat < /named.pipe` takes 0-2%.
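A figure like "15% of a CPU core" is CPU time divided by wall-clock time. A minimal sketch for measuring that ratio from inside a script, assuming Python 3.3+ (for `time.process_time`, which sums user and system CPU time of the process); `cpu_fraction` is a hypothetical helper name, not something from this thread:

```python
import time


def cpu_fraction(work, *args):
    """Run work(*args) and return CPU seconds used per wall-clock second.

    A result of 0.15 corresponds to "15% of one core". process_time()
    covers user + system time for the whole process (all threads), so
    treat the result as a rough figure.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    work(*args)
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return cpu / wall if wall > 0 else 0.0
```

For example, `cpu_fraction(time.sleep, 0.5)` should come out close to 0, while a tight pure-Python loop should come out near 1.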

Am I just doing things horribly wrong, or is this normal?

Here is my code:

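from collections import deque
import io
import sys

WATCHED_PIPE = '/var/log/haproxy.pipe'

if __name__ == '__main__':
    try:
        # The second argument is maxlen: keep only the newest 10,000 lines.
        log_pool = deque([], 10000)
        fd = io.open(WATCHED_PIPE)
        # Iteration blocks on the pipe and yields one line at a time.
        for line in fd:
            log_pool.append(line)
    except KeyboardInterrupt:
        sys.exit()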
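One common way to cut per-line overhead is to bypass the buffered text layer entirely: read large raw chunks with `os.read` and split lines yourself, so the interpreter does one system call and one `split` per chunk instead of running the buffering logic for every line. This is a swapped-in technique, not something proposed in this thread, and it is only a sketch written for Python 3: the name `tail_fd` and the 64 KiB chunk size are assumptions, and lines come back as bytes without their trailing newline.

```python
import os
from collections import deque


def tail_fd(fd, pool, chunk_size=65536):
    """Read raw bytes from file descriptor `fd` until EOF.

    Complete lines (bytes, trailing newline removed) are appended to
    `pool`; an unterminated final fragment is appended at EOF.
    Returns `pool`.
    """
    pending = b""
    while True:
        chunk = os.read(fd, chunk_size)  # one system call per chunk
        if not chunk:  # b"" means every writer has closed the pipe
            break
        pending += chunk
        lines = pending.split(b"\n")
        pending = lines.pop()  # the last piece may be an incomplete line
        pool.extend(lines)
    if pending:
        pool.append(pending)
    return pool
```

Against the pipe from the post it would be called as `tail_fd(os.open(WATCHED_PIPE, os.O_RDONLY), deque(maxlen=10000))`; like the original loop, it returns only once every writer has closed the pipe.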

Deque appends are O(1), so that's not it. And I am using 2.6's io
module because it's supposed to handle named pipes better. I have
commented out the deque append line and it still takes about the same
CPU.

The system is running Ubuntu 9.04 with kernel 2.6.28 and ext4 (not
sure the FS is relevant).

Any help bringing down the CPU usage would be really appreciated, and
if it can't be done, I guess that's OK too; the server has 6 cores that
aren't doing much.
 
MRAB

Is this any faster?

log_pool.extend(fd)
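The idea behind this suggestion is that `deque.extend` drives the iterator from C, so the Python loop body no longer runs once per line (the file's own iterator may still execute Python code if, as in 2.6, the io module itself is pure Python). A sketch assuming any iterable text stream; `keep_last_lines` is a hypothetical name, and `maxlen=10000` mirrors the `deque([],10000)` call in the original code, whose second positional argument is `maxlen`:

```python
import io
from collections import deque


def keep_last_lines(stream, maxlen=10000):
    """Consume `stream` to EOF and return a deque of its last `maxlen` lines."""
    pool = deque(maxlen=maxlen)  # equivalent to deque([], maxlen)
    # extend() pulls every line from the iterator; once the deque is
    # full, the oldest lines are discarded automatically.
    pool.extend(stream)
    return pool


if __name__ == "__main__":
    print(keep_last_lines(io.StringIO("a\nb\nc\n"), maxlen=2))
    # deque(['b\n', 'c\n'], maxlen=2)
```

One consequence of this design: `extend` only returns at EOF, so it is a drop-in replacement only while the loop body does nothing but append. If each line has to be processed as it arrives, a per-line loop (or a chunked reader) is still needed.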
 
Ned Deily

Be aware that the io module in Python 2.6 is written in Python and was
viewed as a prototype. The current svn trunk, which will become Python
2.7, has a much faster C implementation of the io module, backported from
Python 3.1.
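The gap described above can still be observed on a modern interpreter: CPython 3 ships the pure-Python implementation as the private `_pyio` module alongside the C-accelerated `io`. A rough timing sketch, assuming CPython 3 (`_pyio` is private and not guaranteed on other implementations); `compare`, `time_line_iteration`, and the sample log line are hypothetical, and it reads a regular temporary file rather than a named pipe so it runs anywhere:

```python
import _pyio  # CPython's pure-Python io implementation (private module)
import io     # C-accelerated io implementation
import os
import tempfile
import time


def time_line_iteration(open_func, path):
    """Iterate over the file line by line; return (line_count, seconds)."""
    start = time.perf_counter()
    count = 0
    with open_func(path, "r", encoding="utf-8") as f:
        for _ in f:
            count += 1
    return count, time.perf_counter() - start


def compare(n_lines=200000):
    """Time line iteration with _pyio.open versus io.open on a temp file."""
    fd, path = tempfile.mkstemp()
    try:
        # Hypothetical sample data standing in for haproxy log lines.
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write("127.0.0.1 GET /index.html 200\n" * n_lines)
        return {
            "_pyio": time_line_iteration(_pyio.open, path),
            "io (C)": time_line_iteration(io.open, path),
        }
    finally:
        os.remove(path)
```

`compare()` returns a dict mapping each implementation to `(lines_read, seconds)`. No specific timings are claimed here since they depend on the machine; the pure-Python reader is simply expected to be noticeably slower per line.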
 
Miguel P

Aha, I will test with trunk and see whether the performance is better;
if so, I'll use 2.6 in production until 2.7 comes out. I will report
back when I have run the tests.

Thanks,
Miguel Pilar
 
