Grep Equivalent for Python


tereglow

Hello all,

I come from a shell/perl background and have just started to learn Python. To
start with, I'm trying to obtain system information from a Linux
server using the /proc FS. For example, in order to obtain the amount
of physical memory on the server, I would do the following in shell:

grep ^MemTotal /proc/meminfo | awk '{print $2}'

That would get me the exact number that I need. Now, I'm trying to do
this in python. Here is where I have gotten so far:

memFile = open('/proc/meminfo')
for line in memFile.readlines():
    print re.search('MemTotal', line)
memFile.close()

I guess what I'm trying to logically do is... read through the file
line by line, grab the pattern I want and assign that to a variable.
The above doesn't really work, it comes back with something like
"<_sre.SRE_Match object at 0xb7f9d6b0>" when a match is found.

Any help with this would be greatly appreciated.
Tom
 

Steve Holden

Regular expressions aren't really needed here. Untested code follows:

for line in open('/proc/meminfo').readlines:
    if line.startswith("MemTotal:"):
        name, amt, unit = line.split()
        print name, amt, unit
        break

regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://del.icio.us/steve.holden
Blog of Note: http://holdenweb.blogspot.com
See you at PyCon? http://us.pycon.org/TX2007
 
Paul Boddie

tereglow said:
I come from a shell/perl background and have just started to learn Python.

Welcome aboard!

tereglow said:
memFile = open('/proc/meminfo')
for line in memFile.readlines():
    print re.search('MemTotal', line)

You could even use the regular expression '^MemTotal' as seen in your
original, or use the match function instead of search. However...

tereglow said:
The above doesn't really work, it comes back with something like
"<_sre.SRE_Match object at 0xb7f9d6b0>" when a match is found.

This is because re.search and re.match (and other things) return match
objects if the regular expression has been found in the provided
string. See this page in the library documentation:

http://docs.python.org/lib/match-objects.html

The easiest modification to your code is to replace the print
statement with this:

match = re.search('MemTotal', line)
if match is not None:
    print match.group()

You can simplify this using various idioms, I imagine, but what you
have to do is to test for a match, then to print the text that
matched. The "group" method lets you get the whole matching text (if
you don't provide any arguments), or individual groups (applicable
when you start putting groups in your regular expressions).

Paul
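
To make the point about groups concrete, here is a minimal sketch, written for Python 3 (the code in this thread is Python 2) and run against a hard-coded sample line rather than the real /proc/meminfo. The pattern, the made-up value, and the names `sample`, `whole` and `mem_total_kb` are illustrative assumptions, not something taken from the posts above.

```python
import re

# Sample line in the layout /proc/meminfo uses (the value is made up).
sample = "MemTotal:       16384256 kB"

# ^MemTotal anchors at the start of the line, like the grep pattern.
# The parentheses form group 1, which captures only the digits.
match = re.search(r"^MemTotal:\s+(\d+)", sample)
if match is not None:
    whole = match.group()               # the entire matched text
    mem_total_kb = int(match.group(1))  # just the captured number
    print(mem_total_kb)
```

Calling group() with no argument gives the whole match, while group(1) gives the first parenthesised part, which is roughly the job awk's $2 was doing in the shell version.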
 
Laurent Pointal

Steve Holden said:
for line in open('/proc/meminfo').readlines:

That should be:

for line in open('/proc/meminfo').readlines():
 
BJörn Lindqvist

import sys
# writelines() accepts an iterable of strings; write() takes a single string
sys.stdout.writelines(L.split()[1] + '\n' for L in open('/proc/meminfo')
                      if L.startswith('MemTotal'))
 
Marc 'BlackJack' Rintsch

Laurent Pointal said:
for line in open('/proc/meminfo').readlines():

Or simply:

for line in open('/proc/meminfo'):

Of course it's cleaner to assign the file object to a name and close the
file explicitly after the loop.

Ciao,
Marc 'BlackJack' Rintsch
 
tereglow

Thanks all for the help with this, am learning a lot; really
appreciate it.
Tom
 
tereglow

Okay,

It is now working as follows:

memFile = open('/proc/meminfo')
for line in memFile.readlines():
    if line.startswith("MemTotal"):
        memStr = line.split()
        memTotal = memStr[1]
memFile.close()
print "Memory: " + memTotal + "kB"

I'm learning the whole try, finally exception stuff so will add that
in as well. Now, I'm trying to figure out the CPU speed. In shell,
I'd do:

grep "^cpu MHz" /proc/cpuinfo | awk '{print $4}' | head -1

The "head -1" is added because if the server has 2 or more processors,
2 or more lines will result, and I only need data from the first
line. So, now I'm looking for the equivalent to "head" (or "tail") in
Python. Is this a case where I'll need to break down and use the re
module? No need to give me the answer, a hint in the right direction
would be great though.

Thanks again,
Tom
 
Steve Holden

If you are interested in a number of fields I'd create a dict or set
containing the keys you are interested in. For each line, if the text
indicates you are interested in the value then extract the value and
store it in a dict against the text as a key.

Something like (untested):

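kwdlist = "cpu MHz|MemTotal"
d = dict((x, None) for x in kwdlist.split("|"))
# "cpu MHz" lives in /proc/cpuinfo and "MemTotal" in /proc/meminfo
for path in ('/proc/cpuinfo', '/proc/meminfo'):
    infile = open(path)
    for line in infile.readlines():
        if ":" not in line:
            continue  # skip the blank lines between processors
        keyword, value = line.split(":", 1)
        keyword = keyword.strip()  # /proc/cpuinfo pads its keys with tabs
        if keyword in d and d[keyword] is None:
            d[keyword] = value.split()[0]
    infile.close()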

This should give you a dict with non-None values against the keywords
you have found. Because of the "and d[keyword] is None" test you won't
overwrite an existing value, meaning you only see the first value for
any given keyword.

Again, bear in mind this code is untested.

regards
Steve
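
For the "head -1" part of the question on its own, one possible direction is to stop at the first matching line instead of collecting them all. The sketch below is Python 3 and works on a made-up sample in the layout of /proc/cpuinfo; the helper name `first_field` and its parameters are hypothetical, chosen only for illustration.

```python
def first_field(lines, prefix, index):
    """Return field `index` of the first line starting with `prefix`, or None."""
    # The generator is lazy, so next() stops reading at the first match,
    # which is the effect "head -1" has on the shell pipeline.
    matches = (line.split()[index] for line in lines if line.startswith(prefix))
    return next(matches, None)

# Made-up sample in the layout of /proc/cpuinfo on a two-processor machine.
sample = [
    "processor\t: 0\n",
    "cpu MHz\t\t: 2400.000\n",
    "processor\t: 1\n",
    "cpu MHz\t\t: 1800.000\n",
]
print(first_field(sample, "cpu MHz", 3))
```

Index 3 is the zero-based counterpart of awk's $4, because split() breaks "cpu MHz : 2400.000" into four whitespace-separated fields. An open file object can be passed in place of the list, and the function returns None when no line matches.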
 
Fabio FZero

Ok, two things:

1. You don't need the .readlines() part. Python can iterate over the
file object itself.
2. You don't need the re module for this particular situation; you
could simply use the 'in' operator.

You could write it like this:

memFile = open('/proc/meminfo')
for line in memFile:
    if 'MemTotal' in line: print line
memFile.close()

[]s
FZero
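
One thing to keep in mind with the 'in' operator is that it behaves like an unanchored grep, while startswith() behaves like grep with a leading ^. A small Python 3 sketch of the difference, using made-up sample lines in the layout of /proc/meminfo (the names `sample`, `loose` and `anchored` are illustrative only):

```python
# Made-up sample lines in the layout of /proc/meminfo.
sample = [
    "MemTotal:       16384256 kB\n",
    "Cached:          2048000 kB\n",
    "SwapCached:         1024 kB\n",
]

# Substring test: like "grep Cached", this matches two of the lines.
loose = [line.split()[0] for line in sample if "Cached" in line]

# Prefix test: like "grep ^Cached", this matches only one line.
anchored = [line.split()[0] for line in sample if line.startswith("Cached")]

print(loose)
print(anchored)
```

For 'MemTotal' the two tests happen to pick the same line in this sample, so either works there; the difference only shows up for keys that are substrings of other keys.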
 
sjdevnull

Marc 'BlackJack' Rintsch said:
for line in open('/proc/meminfo'):

Yeah, that's nicer.

Marc 'BlackJack' Rintsch said:
Of course it's cleaner to assign the file object to a name and close the
file explicitly after the loop.

For certain definitions of "cleaner" (insert old argument about how
ref-counting semantics or at least immediate gc of locally scoped
variables when leaving scope _should be_ (not _are_) language-
guaranteed because it makes for cleaner, more programmer-friendly code
and often avoids ugly hacks like assigning a spurious name and/or
using "with" constructs).

But if you're going to do that, "with" is the better option IMO:

from __future__ import with_statement
....
with open('/proc/meminfo') as infile:
    for line in infile:

Of course, that alternative requires Python 2.5
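
Fleshed out into something runnable, the "with" form might look like the sketch below. It is written for Python 3, where no __future__ import is needed, and it reads a temporary file with made-up contents so that it also runs on machines without /proc. The function name `mem_total_kb` and the sample values are assumptions made for illustration.

```python
import os
import tempfile

def mem_total_kb(path="/proc/meminfo"):
    """Return the MemTotal value from `path` as a string, or None if absent."""
    # The with block closes the file on exit, even if an exception is raised.
    with open(path) as infile:
        for line in infile:
            if line.startswith("MemTotal:"):
                return line.split()[1]
    return None

# Write a small stand-in file with made-up values, then read it back.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("MemFree:         1234567 kB\nMemTotal:       16384256 kB\n")
demo_path = tmp.name
result = mem_total_kb(demo_path)
os.remove(demo_path)  # clean up the stand-in file
print(result)
```

On a Linux machine, calling mem_total_kb() with no argument reads the real /proc/meminfo.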
 
Alex Martelli

tereglow said:
server using the /proc FS. For example, in order to obtain the amount
of physical memory on the server, I would do the following in shell:

grep ^MemTotal /proc/meminfo | awk '{print $2}'

If you would indeed do that, maybe it's also worth learning something
more about the capabilities of your "existing" tools, since

awk '/^MemTotal/ {print $2}' /proc/meminfo

is a more compact and faster way to perform exactly the same task.

(You already received a ton of good responses about doing this in
Python, but the "pipe grep into awk instead of USING awk properly in the
first place!" issue has been a pet peeve of mine for almost 30 years
now, and you know what they say about old dogs + new tricks!-).


Alex
 
sjdevnull

Alex said:
If you would indeed do that, maybe it's also worth learning something
more about the capabilities of your "existing" tools, since

awk '/^MemTotal/ {print $2}' /proc/meminfo

is a more compact and faster way to perform exactly the same task.


That's correct on small files, but note that at least on my platform
(Linux, GNU grep and awk), "grep" is just massively insanely faster
than awk, to the point where doing the equivalent on
/usr/share/dict/words takes 10-20% of the time with "grep re|awk..."
that it takes with "awk '/re...'".

On this small proc file, the plain-awk version is faster (and it's
cleaner looking), as Alex points out.

But for large files, "grep" is often _way_ faster than awk/perl/python/
whatever alternative, easily swamping the fork/exec cost. It's often
quite handy to keep that in mind.
 
tereglow

I had no idea you could do that. Thanks for the tip, I need to start
reading that awk/sed book collecting dust on my shelf!
 
Aahz

tereglow said:
I had no idea you could do that. Thanks for the tip, I need to start
reading that awk/sed book collecting dust on my shelf!

Your other option is to completely abandon awk/sed. I started writing
stuff like this in Turbo Pascal back in the early 80s because there
simply wasn't anything like awk/sed available for CP/M. In the 90s, when
I needed to do similar things, I used Perl. Now I use Python.

From my POV, there is really no reason to learn the advanced shell
utilities.
 
tereglow

Well, my goal is to become proficient enough at Python, that I can
replace most shell functionality with it. I was partially successful
when learning Perl. The trouble is that I started with shell, awk/sed/
grep, that sort of stuff. It is somewhat difficult to break free of
"shell" like thinking and programming when you have created habits of
coding in that style. I've recently started to work with an
application (HP Application Mapping) that has an awful lot of Jython
code running in the background; so that project sort of inspired me to
start really digging into Python; to gain a better understanding of
the application, and for any added benefit; especially in regards to
automating systems administration. Am really enjoying learning it,
though it hurts the head a bit, trying to re-train myself. Things
take time though, I'm not giving up!
 