Question on regex

Prabhu Gurumurthy

Hello all -

I have a file containing IP addresses and subnets, and I use regexes to extract
the IPs separately from the subnets.

Pattern used for IP: \d{1,3}(\.\d{1,3}){3}
Pattern used for subnet: ((\d{1,3})|(\d{1,3}(\.\d{1,3}){1,3}))/(\d{1,2})

The IPs and subnets are strewn around the file like this:

10.200.0.34
10.200.4.5
10.178.9.45
10.200/22
10.178/16
10.100.4.64/26,
10.150.100.0/28
10/8

With the examples above:
the IP regex pattern matches all of the IP addresses
the subnet regex pattern matches all of the subnets

The problem is that the IP pattern also matches the two four-octet subnets
(10.100.4.64/26 and 10.150.100.0/28), because their address part also fits the
IP regex.

To fix this problem, I added a negative lookahead to the IP pattern, so it
now becomes:
\d{1,3}(\.\d{1,3}){3}(?!/\d+)

Now 10.150.100.0/28 is correctly skipped, but the 10.100.4.64/26 subnet is
still matched by the IP pattern, with the following result:

10.100.4.6

Is there a workaround for this, or what should change in the IP regex pattern?

Python script:
#!/usr/bin/env python3

import re
import sys

# Compile once, outside the loop; raw strings keep the backslashes intact.
pattIp = re.compile(r"(\d{1,3}(\.\d{1,3}){3})(?!/\d+)")
pattNet = re.compile(r"((\d{1,3})|(\d{1,3}(\.\d{1,3}){1,3}))/(\d{1,2})")

try:
    fh = open(sys.argv[1], "r")
except IOError as message:
    print("cannot open file: %s" % message)
else:
    for line in fh:
        line = line.strip()

        # Group 1 is the address part of each match.
        match = pattIp.search(line)
        if match is not None:
            print("ipmatch: %s" % match.group(1))

        match = pattNet.search(line)
        if match is not None:
            print("subnet: %s" % match.group(1))

    fh.close()

Output with the above IPs/subnets in a file:

ipmatch: 10.200.0.34
ipmatch: 10.200.4.5
ipmatch: 10.178.9.45
subnet: 10.200
subnet: 10.178
ipmatch: 10.100.4.6
subnet: 10.100.4.64
subnet: 10.150.100.0
subnet: 10

TIA
Prabhu
Felix Benner

Prabhu said:
to fix this problem, i used negative lookahead with ip pattern:
so the ip pattern now changes to:
\d{1,3}(\.\d{1,3}){3}(?!/\d+)

now the problem is 10.150.100.0 works fine, 10.100.4.64 subnet gets
matched with ip pattern with the following result:

10.100.4.6

Is there a workaround for this or what should change in ip regex pattern.

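I think what you want is that neither /\d+ nor another digit nor a . follows:
\d{1,3}(\.\d{1,3}){3}(?!(/\d)|\d|\.)
The regex engine backtracks: when the lookahead rejects 10.100.4.64, it retries
with a shorter last octet, and 10.100.4.6 is followed by "4" rather than "/",
so it passes. Forbidding a following digit closes that gap.
This way 10.0.0.1234 won't be recognized as an IP. Neither will an address
followed by a dot (as in 23.12.), which could be a problem if an IP is at the
end of a sentence, so you might want to omit the \. alternative.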
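
As a rough check of the suggestion above, here is a minimal sketch that runs the stricter lookahead against the sample lines from the question. The names IP_STRICT, IP_RELAXED, SAMPLES and find_ips are made up for this illustration, and the groups are written as non-capturing (?:...) purely for convenience; the matching behaviour should be the same as the pattern given above.

```python
import re

# Lookahead rejects a following "/digit", another digit, or a dot.
IP_STRICT = re.compile(r"\d{1,3}(?:\.\d{1,3}){3}(?!/\d|\d|\.)")
# Same idea without the dot alternative, so an IP that ends a sentence
# still matches.
IP_RELAXED = re.compile(r"\d{1,3}(?:\.\d{1,3}){3}(?!/\d|\d)")

# Sample lines copied from the question.
SAMPLES = [
    "10.200.0.34",
    "10.200.4.5",
    "10.178.9.45",
    "10.200/22",
    "10.178/16",
    "10.100.4.64/26,",
    "10.150.100.0/28",
    "10/8",
]


def find_ips(pattern, lines):
    """Return the first match of pattern in each line that contains one."""
    found = []
    for line in lines:
        match = pattern.search(line.strip())
        if match is not None:
            found.append(match.group(0))
    return found


if __name__ == "__main__":
    for ip in find_ips(IP_STRICT, SAMPLES):
        print("ipmatch: %s" % ip)
```

With the extra \d alternative, the backtracked match 10.100.4.6 is rejected because it is followed by the digit 4, so only the three plain addresses should be reported. Whether to keep the \. alternative depends on whether addresses can appear at the end of a sentence in your input.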