Question regarding lists and regex

  • Thread starter Prabhu Gurumurthy

Prabhu Gurumurthy

Here is a simple program, which queries /var/log/daemon on my OpenBSD box and
gets the list of valid ntp peers.

Questions:
What is the easiest way for me to create lists on the fly? By that I mean
something like Perl's

push my @foo, something_from_say_stderr. As you can see from the ip = [""]
statement before the for loop, I create the list up front. I want to avoid that
and create the list within the loop itself, at the point where I extract the IP
address. Am I being confusing?

Regex: I presume this is rather a dumb question, but here it comes anyway. As you
can see from my program, pattIp = r\d{1,3}\.... etc. Is there an easier way
to group the repetitions, instead of typing the same regex 4 times?

TIA
Prabhu
-

amazon: [~/working/programs/python/regex]
ttyp4: [109]$ cat syslog.py
#!/usr/bin/env python
# $Id: syslog.py,v 1.6 2006/11/09 06:24:03 pgurumur Exp $

import getopt, re, os, string, sys, time
(dirname, program) = os.path.split(sys.argv[0])
argc = len(sys.argv)

def usage():
    print program + ": options"
    print "options: "
    print " --filename | -f [ name of the file ]"
    print " --help | -h [ prints this help ]"
    sys.exit(1)

if __name__ == "__main__":
    if (argc <= 1):
        usage()
    else:
        try:
            opts, args = getopt.getopt(sys.argv[1:], "f:h", ["help", "filename="])
        except getopt.GetoptError:
            usage()
        else:
            filename = ""
            for optind, optarg in opts:
                if optind in ("-f", "--filename"):
                    filename = optarg
                elif optind in ("-h", "--help"):
                    usage()

            if len(filename):
                fh = 0
                try:
                    fh = open(filename, "r")
                except IOError, (error, message):
                    print program + ": cannot open " + filename + ": " + message
                    sys.exit(1)

                pattNtp = r'.*ntpd(?=.*now\s+valid)'
                count = 0
                ip = [""]
                pid = 0
                for line in fh.readlines():
                    if re.match(pattNtp, line.strip(), re.IGNORECASE):
                        string = line.strip()
                        pattPid = r'\[\d{1,5}\]'
                        pidMatch = re.search(pattPid, string, re.IGNORECASE)
                        if pidMatch is not None:
                            pid = int(re.sub(r'\[|\]', "", pidMatch.group()))

                        pattIp = r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}'
                        match = re.search(pattIp, string, re.IGNORECASE)
                        if match is not None:
                            ip.append(match.group())
                            count += 1

                print "NTP program started with pid:", pid
                print "Number of valid peers:", count
                for x in ip:
                    if len(x):
                        print x

                fh.close()
 

Paul McGuire

Prabhu Gurumurthy said:
Here is a simple program, which queries /var/log/daemon on my OpenBSD box
and gets the list of valid ntp peers.

Questions:
what is the easiest way for me to create lists on the fly, by that I mean
like perl

push my @foo, something_from_say_stderr. As you can see from the ip =
[""] statement before the for loop, I create the list up front. I want to
avoid that and create the list within the loop itself, at the point where I
extract the IP address. Am I being confusing?

Typically, one initializes a list to be empty, that is [], not [""]. Python
will not read your mind at append time and think "oh! we're appending to a
list and we forgot to create one in the first place, let's make one now." I
guess Perl allows this, but the clarity of including the initialization
statement overrules the convenience of leaving it out.
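
To illustrate the point about starting from an empty list, here is a minimal sketch. The sample lines and the names lines, patt_ip, ip and ip_alt are made up for the example, and it is written for Python 3 rather than the Python 2 of the original script. It shows the explicit initialize-then-append form, followed by a list comprehension that builds the same list in one step.

```python
import re

# Hypothetical sample lines standing in for the contents of the log file.
lines = [
    "ntpd[1234]: peer 192.0.2.10 now valid",
    "ntpd[1234]: peer 192.0.2.11 now valid",
    "sshd[99]: unrelated message",
]

patt_ip = r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}'

# Start with an empty list, not [""], and append as matches turn up.
ip = []
for line in lines:
    match = re.search(patt_ip, line)
    if match is not None:
        ip.append(match.group())

# The same list built in one step with a list comprehension.
ip_alt = [m.group() for m in (re.search(patt_ip, l) for l in lines) if m]
```

Either way the list has to exist before anything is added to it; the comprehension just folds the creation and the filling into one expression.
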

Regex: I presume this is rather a dumb question, but here it comes anyway. As
you can see from my program, pattIp = r\d{1,3}\.... etc. Is there an
easier way to group the repetitions, instead of typing the same regex 4
times?

Here's one way, tested at the Python command line:
print r'\.'.join( [r'\d{1,3}']*4 )
\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}

This avoids the pattern duplication, but I think using join is much less
easily recognized as a pattern for an IP address.
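
A small sketch of how the joined pattern might be used in practice, assuming Python 3; the names octet, patt_ip and found, and the sample string, are only for illustration. Note that this pattern, like the hand-typed one, checks the shape of the address only and would also accept something like 999.999.999.999.

```python
import re

# Build the dotted-quad pattern from a single octet sub-pattern.
octet = r'\d{1,3}'
patt_ip = r'\.'.join([octet] * 4)

# The generated text is identical to the hand-typed pattern.
assert patt_ip == r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}'

# Hypothetical sample text to search.
match = re.search(patt_ip, "peer 192.0.2.10 now valid")
found = match.group() if match else None
```
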
TIA
Prabhu

Some other comments/free advice:
1. I was curious about this line:
pid = int(re.sub(r'\[|\]', "", pidMatch.group()))
You already know pidMatch.group() is going to start with a '[', followed by
an integer string, and end with a ']', otherwise it wouldn't have matched
pattPid. Instead of whacking this with another re-type call, how about just
some simple string slicing:
pid = int(pidMatch.group()[1:-1])

2. No real need to keep count of the found IPs, just use len(ip) to tell
you how many entries there are in the list (especially once you convert to
initializing with an empty list).

3. Similarly, you'll be able to remove the 'if len(x)' test when printing
out the contents of the ip list if you init with [] instead of [""]. Also,
the Python idiom for testing if x is the empty string is usually just 'if
x', not 'if len(x)'.
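
The three suggestions above can be sketched together as follows. This is an illustration only, written for Python 3; the sample log line and the names pid_match, pid_captured, count and non_empty are assumptions, not part of the original script. The capturing-group variant is an alternative to slicing, not something the script already does.

```python
import re

# Hypothetical log line in roughly the shape the script expects.
line = "Nov  9 06:24:03 amazon ntpd[1234]: peer 192.0.2.10 now valid"

# 1. Slice off the surrounding '[' and ']' instead of calling re.sub.
pid_match = re.search(r'\[\d{1,5}\]', line)
pid = int(pid_match.group()[1:-1])

# Alternative: let a capturing group pick out just the digits.
pid_captured = int(re.search(r'\[(\d{1,5})\]', line).group(1))

# 2. With an empty-list start, len(ip) replaces the separate counter.
ip = ["192.0.2.10", "192.0.2.11"]
count = len(ip)

# 3. 'if x' is the usual test for a non-empty string.
non_empty = [x for x in ["", "192.0.2.10"] if x]

print("NTP program started with pid:", pid)
print("Number of valid peers:", count)
for x in ip:
    print(x)
```
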

-- Paul
 

Ant

Regex: I presume this is rather a dumb question, but here it comes anyway. As you
can see from my program, pattIp = r\d{1,3}\.... etc. Is there an easier way
to group the repetitions, instead of typing the same regex 4 times? ....
pattIp = r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}'

pattIp = r"\d{1,3}(\.\d{1,3}){3}"

Is the best you can get using pure regexes (rather than something like
Paul's solution).
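
One thing worth knowing about the grouped form, shown here as a small Python 3 sketch with made-up sample text and variable names: the parentheses form a capturing group. That makes no difference to re.search(...).group(), which is what the original script uses, but re.findall returns the group contents rather than the whole match. Writing the group as non-capturing, (?:...), avoids that.

```python
import re

# Hypothetical sample text containing two addresses.
text = "peers 192.0.2.10 and 192.0.2.11 now valid"

patt_capturing = r"\d{1,3}(\.\d{1,3}){3}"
patt_noncapturing = r"\d{1,3}(?:\.\d{1,3}){3}"

# group() with no argument is the whole match, so search works either way.
first = re.search(patt_capturing, text).group()

# findall returns the last repetition of the capturing group, not the address.
groups_only = re.findall(patt_capturing, text)

# With a non-capturing group, findall returns the full addresses.
addresses = re.findall(patt_noncapturing, text)
```
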
 
