NEWBIE: Tokenize command output

Lorenzo Thurman

This is what I have so far:

//
#!/usr/bin/python

import os

cmd = 'ntpq -p'

output = os.popen(cmd).read()
//

The output is saved in the variable 'output'. What I need to do next is
select the line from that output that starts with the '*'

     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
+ntp-1.gw.uiuc.e 128.174.38.133   2 u  479 1024  377   33.835   -0.478   3.654
+milo.mcs.anl.go 130.126.24.44    3 u  676 1024  377   70.143    1.893   1.296
*caesar.cs.wisc. 128.105.201.11   2 u  635 1024  377   29.514   -0.231   0.077


From there, I need to tokenize the line using the spaces as delimiters.
Can someone give me some pointers?
Thanks
 
Tim Chase

Lorenzo said:
This is what I have so far:

//
#!/usr/bin/python

import os

cmd = 'ntpq -p'

output = os.popen(cmd).read()
//

The output is saved in the variable 'output'. What I need to do next is
select the line from that output that starts with the '*'

Well, if you don't need "output" for anything else, you can
just iterate over the lines with

import os
cmd = 'ntpq -p'
p = os.popen(cmd)
starLines = [line for line in p.readlines() if line.startswith("*")]

or you may optionally want to prune off the "\n" characters
in the process:

starLines = [line[:-1] for line in p.readlines() if line.startswith("*")]

If there's only ever one, then you can just use

myLine = starLines[0]

Otherwise, you'll have to react accordingly if there are
zero lines or more than one line that begin(s) with an asterisk.

Lorenzo said:
     remote           refid      st t when poll reach   delay   offset  jitter
[cut]
*caesar.cs.wisc. 128.105.201.11   2 u  635 1024  377   29.514   -0.231   0.077

From there, I need to tokenize the line using the spaces as delimiters.
Can someone give me some pointers?

Again, you can use the one-line Pythonism of "tuple
unpacking" here (the split() method takes an optional
parameter for the delimiter, but defaults to whitespace):

remote, refid, st, t, when, poll, reach, delay, offset, jitter = myLine.split()

If you just want one of them, you can pick it out by index:

delay = myLine.split()[7]
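
For anyone who wants to try the two steps together without ntpq installed, here is a minimal sketch that runs on a canned copy of the output from the question. The names SAMPLE_OUTPUT and star_lines are made up for this illustration; the column names come from the ntpq header line.

```python
# Sample text copied from the ntpq -p output in the question
# (used instead of calling ntpq, so the sketch runs anywhere).
SAMPLE_OUTPUT = """\
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
+ntp-1.gw.uiuc.e 128.174.38.133   2 u  479 1024  377   33.835   -0.478   3.654
+milo.mcs.anl.go 130.126.24.44    3 u  676 1024  377   70.143    1.893   1.296
*caesar.cs.wisc. 128.105.201.11   2 u  635 1024  377   29.514   -0.231   0.077
"""

# Keep only the lines that begin with an asterisk.
star_lines = [line for line in SAMPLE_OUTPUT.splitlines() if line.startswith("*")]

# Tuple unpacking: split() with no argument splits on runs of whitespace.
(remote, refid, st, t, when,
 poll, reach, delay, offset, jitter) = star_lines[0].split()

print(remote)        # *caesar.cs.wisc.
print(float(delay))  # 29.514
```

Note that the first token still carries the leading "*"; remote.lstrip("*") would drop it if only the host name is wanted. All tokens are strings, so numeric fields need float() or int() before doing arithmetic.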

HTH,

-tkc
 
Steven Bethard

Lorenzo said:
This is what I have so far:

//
#!/usr/bin/python

import os

cmd = 'ntpq -p'

output = os.popen(cmd).read()
//

The output is saved in the variable 'output'. What I need to do next is
select the line from that output that starts with the '*' [snip]
From there, I need to tokenize the line using the spaces as delimiters.

I use the subprocess module instead for better error reporting, but
basically if you just iterate over the file object, you'll get the lines.

import subprocess

# open the process
cmd = subprocess.Popen(['ntpq', '-p'], stdout=subprocess.PIPE)

# iterate over the file object until we see a line-initial "*"
line = None
for line in cmd.stdout:
    if line.startswith('*'):
        break

# make sure to catch the error if ntpq didn't produce any output
assert line is not None

# split the line into tokens (stolen from Tim Chase's answer)
(remote, refid, st, t, when,
 poll, reach, delay, offset, jitter) = line.split()

If you need to do this for each line that starts with a "*" instead of
just the first, move the line.split() code inside the for-loop.
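
One case the loop above does not cover: if ntpq prints some lines but none of them starts with "*", line is left holding the last line read and the assert still passes. A possible way to handle that is sketched below. find_star_line is a hypothetical helper, not part of any library; it accepts any iterable of text lines, so it is shown here on a plain list rather than a live pipe. On Python 3 the pipe yields bytes unless Popen is opened in text mode (for example with universal_newlines=True).

```python
def find_star_line(lines):
    """Return the tokens of the first line starting with '*', or None."""
    for line in lines:
        if line.startswith("*"):
            return line.split()
    return None  # nothing in the output was marked with an asterisk


# Stand-in for cmd.stdout; with a real process you would pass the pipe itself.
sample = [
    "+milo.mcs.anl.go 130.126.24.44    3 u  676 1024  377   70.143    1.893   1.296\n",
    "*caesar.cs.wisc. 128.105.201.11   2 u  635 1024  377   29.514   -0.231   0.077\n",
]

tokens = find_star_line(sample)
if tokens is None:
    print("no line starts with '*'")
else:
    remote, refid, st, t, when, poll, reach, delay, offset, jitter = tokens
    print(remote, offset)
```

Returning None for the "not found" case keeps the empty-output case and the no-asterisk case on the same code path, so the caller only has one thing to check.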

STeVe
 
bruno at modulix

Tim Chase wrote:
(snip)
starLines = [line for line in p.readlines() if line.startswith("*")]

files are iterators, so no need to use readlines() (unless it's an old
Python version of course):

starLines = [line for line in p if line.startswith("*")]

Tim Chase wrote:
or you may optionally want to prune off the "\n" characters in the process:

starLines = [line[:-1] for line in p.readlines() if line.startswith("*")]

*please* use str.rstrip() for this:

starLines = [line.rstrip() for line in p if line.startswith("*")]
 
Tim Chase

starLines = [line for line in p.readlines() if line.startswith("*")]
files are iterators, so no need to use readlines() (unless it's an old
Python version of course):

starLines = [line for line in p if line.startswith("*")]

Having started with some old Python, it's one of those
things I "know", but my coding fingers haven't yet put into
regular practice. :)

or you may optionally want to prune off the "\n" characters in the process:

starLines = [line[:-1] for line in p.readlines() if line.startswith("*")]

*please* use str.rstrip() for this:

starLines = [line.rstrip() for line in p if line.startswith("*")]

They can yield different things, no?

>>> s = "abc \n"
>>> s[:-1]
'abc '
>>> s.rstrip()
'abc'
>>> s[:-1] == s.rstrip()
False

If trailing space matters, you don't want to throw it away
with rstrip().

Otherwise, just to be informed, what advantage does rstrip()
have over [:-1] (if the two cases are considered
uneventfully the same)?

Thanks,

-tkc
 
bruno at modulix

Tim said:
starLines = [line for line in p.readlines() if line.startswith("*")]


files are iterators, so no need to use readlines() (unless it's an old
Python version of course):

starLines = [line for line in p if line.startswith("*")]


Having started with some old Python, it's one of those things I "know",
but my coding fingers haven't yet put into regular practice. :)

I reeducated my fingers after having trouble with huge files !-)

or you may optionally want to prune off the "\n" characters in the
process:

starLines = [line[:-1] for line in p.readlines() if line.startswith("*")]


*please* use str.rstrip() for this:
starLines = [line.rstrip() for line in p if line.startswith("*")]


They can yield different things, no?
>>> s = "abc \n"
>>> s[:-1]
'abc '
>>> s.rstrip()
'abc'
>>> s[:-1] == s.rstrip()
False

If trailing space matters, you don't want to throw it away with rstrip().

Then use rstrip('\n') - thanks for this correction.

Otherwise, just to be informed, what advantage does rstrip() have over
[:-1] (if the two cases are considered uneventfully the same)?

1/ if your line doesn't end with a newline, line[:-1] will still remove
the last character.

2/ IIRC, if you don't use universal newlines and the file uses the
DOS/Windows newline convention, line[:-1] will not remove the CR - only
the LF (please someone correct me if I'm wrong here).

I know this may not be a real issue in the actual case, but using
rstrip() is still a safer way to go IMHO - think about using this same
code to iterate over a list of strings without newlines...
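
To make the difference concrete, here is a small comparison on an in-memory file. io.StringIO stands in for the pipe and the sample text is invented: the first line has a trailing space, and the last line has no newline at all.

```python
import io

# Invented sample: trailing space on line 1, no newline on the last line.
fake_file = io.StringIO("abc \nlast")
lines = list(fake_file)                           # ['abc \n', 'last']

sliced = [line[:-1] for line in lines]            # blindly drops one character
stripped = [line.rstrip() for line in lines]      # drops all trailing whitespace
newline = [line.rstrip("\n") for line in lines]   # drops only trailing newlines

print(sliced)    # ['abc ', 'las']
print(stripped)  # ['abc', 'last']
print(newline)   # ['abc ', 'last']
```

Only rstrip("\n") keeps the trailing space and leaves the unterminated last line intact, which is the combination the two points above are asking for.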
 
Duncan Booth

bruno said:
Otherwise, just to be informed, what advantage does rstrip() have over
[:-1] (if the two cases are considered uneventfully the same)?

1/ if your line doesn't end with a newline, line[:-1] will still remove
the last character.

In particular, if the last line of the file doesn't end with a newline then
the last line you read won't have a newline to be stripped.
 
Tim Chase

I reeducated my fingers after having troubles with huge files !-)

I'll keep it in mind...the prospect of future trouble with
large files is a good kick-in-the-pants to remember.

Otherwise, just to be informed, what advantage does rstrip() have over
[:-1] (if the two cases are considered uneventfully the same)?

1/ if your line doesn't end with a newline, line[:-1] will still remove
the last character.

Good catch. Most *nix editors are smart about having a
trailing NL character at the end of the file, but some
Windows text-editors aren't so kind.

2/ IIRC, if you don't use universal newlines and the file uses the
DOS/Windows newline convention, line[:-1] will not remove the CR - only
the LF (please someone correct me if I'm wrong here).

To get this behavior, I think you have to open the file in
binary mode. To me, opening as binary is a signal that I
should be using read() rather than readlines() (or
xreadlines, or the iterator, or whatever). If you've opened
in binary mode, you might have to use rstrip("\r\n") to get
both possible line-ending characters.

I know this may not be a real issue in the actual case, but using
rstrip() is still a safer way to go IMHO - think about using this same
code to iterate over a list of strings without newlines...

Makes sense. Using rstrip("\r\n") has all the benefits,
plus more gracefully handles cases where a newline might not
be present or comprised of two (or more) characters. Got
it! Thanks for the explanation.
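
A quick sketch of the binary-mode case, using io.BytesIO in place of a file opened with "rb" (the data is invented). In binary mode no newline translation happens, so a DOS/Windows line still carries its CR when it is read.

```python
import io

# Invented data with DOS/Windows line endings, read as raw bytes.
raw = io.BytesIO(b"one\r\ntwo\r\n")
lines = list(raw)                                 # [b'one\r\n', b'two\r\n']

sliced = [line[:-1] for line in lines]            # removes the LF, leaves the CR
clean = [line.rstrip(b"\r\n") for line in lines]  # removes any trailing CR/LF

print(sliced)  # [b'one\r', b'two\r']
print(clean)   # [b'one', b'two']
```

The argument to rstrip() is treated as a set of characters rather than a literal sequence, so rstrip("\r\n") also handles a bare "\n", a bare "\r", or no line ending at all.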

-tkc
 
bruno at modulix

Duncan said:
bruno at modulix wrote:

Otherwise, just to be informed, what advantage does rstrip() have over
[:-1] (if the two cases are considered uneventfully the same)?

1/ if your line doesn't end with a newline, line[:-1] will still remove
the last character.

In particular, if the last line of the file doesn't end with a newline then
the last line you read won't have a newline to be stripped.

Thanks - I knew there was a corner case for files, but couldn't remember
it exactly.
 
