Vivek
Hi,
I am trying to construct a regular expression with the re module that
matches three kinds of URL:
1. absolute URLs on my hostname
2. root-relative URLs (starting with "/"), including just "/"
3. relative URLs.
Basically, I want the pattern not to match URLs that are not on my
host.
The following statement satisfies numbers 1 and 2, but not 3:
line = re.sub(r'(href=")(http?://'+hostname+'[/]?|/)([^"]*?)(")',r'\1\2\3'+sInfo+r'\4',line)
An improvement that also partially satisfies number 3 is:
line = re.sub(r'(href=")(http?://'+hostname+'[/]?|/|[^h][^t][^t][^p][^:][^/][^/])([^"]*?)(")',r'\1\2\3'+sInfo+r'\4',line)
This is incomplete: a relative URL shorter than seven characters
will not match.
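One direction that might close this gap is a negative lookahead, which
rejects any URL that starts with a scheme (or "//") instead of excluding
"http://" one character at a time. The following is only a sketch under
assumptions: the hostname and sInfo values are placeholders, add_info is a
hypothetical helper name, "https?" is assumed to be what "http?" was meant
to be, and the hostname goes through re.escape so that its dots match
literally.

```python
import re

hostname = "www.example.com"  # placeholder host, for illustration only
sInfo = "?sid=123"            # placeholder text appended to each local URL

pattern = re.compile(
    r'(href=")'
    # Alternative 1: absolute URL on our host. The lookahead makes sure the
    # hostname really ends here (so "www.example.com.evil.org" is rejected).
    r'(https?://' + re.escape(hostname) + r'(?=[/"])/?'
    # Alternative 2: an empty prefix, allowed only when the URL does not
    # start with a scheme such as "http:" or "mailto:", nor with "//".
    r'|(?![a-zA-Z][a-zA-Z0-9+.-]*:|//))'
    r'([^"]*?)(")'
)

def add_info(line):
    """Append sInfo to every href on this line that points at our host."""
    def repl(m):
        # Rebuild the attribute with sInfo inserted before the closing quote.
        return m.group(1) + m.group(2) + m.group(3) + sInfo + m.group(4)
    return pattern.sub(repl, line)

print(add_info('<a href="a.html">short relative link</a>'))
print(add_info('<a href="http://other.org/x">foreign link</a>'))
```

Because the lookahead consumes no characters, the length of the relative
URL no longer matters. A relative path whose first segment contains a
colon would be treated as having a scheme and skipped, which may or may
not be acceptable.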
Any suggestions?
Thanx.