better regular expression?

Discussion in 'Python' started by Vivek, Dec 7, 2004.

  1. Vivek

    Vivek Guest

    Hi,

    I am trying to construct a regular expression using the re module that
    matches
    1. my hostname
    2. absolute URLs from the root, including just "/"
    3. relative URLs.

    Basically, I want the pattern not to match URLs that are not on my
    host.

    The following statement satisfies numbers 1 and 2, but not 3:

    line = re.sub(r'(href=")(http?://'+hostname+'[/]?|/)([^"]*?)(")',r'\1\2\3'+sInfo+r'\4',line)

    An improvement that also partially satisfies number 3 is

    line = re.sub(r'(href=")(http?://'+hostname+'[/]?|/|[^h][^t][^t][^p][^:][^/][^/])([^"]*?)(")',r'\1\2\3'+sInfo+r'\4',line)

    This is not complete because if the relative URL is shorter than seven
    characters, then it will not match.

    Any suggestions?

    Thanx.
    Vivek, Dec 7, 2004
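
One possible direction, offered as a sketch rather than a definitive answer: replace the seven negated character classes with a negative lookahead, so a relative URL of any length matches as long as it does not start with a scheme such as "http:" or with "/". The function name append_info and the sample values below are hypothetical; hostname and s_info stand in for the hostname and sInfo variables in the post. Two assumptions differ from the original pattern: it uses https? where the post has http? (which would also accept "htt://"), and it escapes the hostname and requires it to end at a URL delimiter so that a longer hostname starting with the same text is not matched.

```python
import re


def append_info(line, hostname, s_info):
    """Append s_info to href values that point at hostname or are relative."""
    pattern = re.compile(
        r'(href=")'
        r'('
        # 1. URLs on my host: the hostname must be followed by / " ? or #
        r'https?://' + re.escape(hostname) + r'(?=[/"?#])/?'
        # 2. absolute-from-root URLs, but not protocol-relative "//other.host"
        r'|/(?!/)'
        # 3. relative URLs: anything not starting with "scheme:" or "/"
        r'|(?![a-zA-Z][a-zA-Z0-9+.-]*:|/)'
        r')'
        r'([^"]*?)(")'
    )

    # A function is used as the replacement so that backslashes in s_info
    # are not interpreted as group references.
    def repl(m):
        return m.group(1) + m.group(2) + m.group(3) + s_info + m.group(4)

    return pattern.sub(repl, line)


if __name__ == "__main__":
    host = "example.com"   # hypothetical hostname
    info = "?sid=1"        # hypothetical value for sInfo
    for href in ("http://example.com/a.html", "/", "a.html",
                 "http://other.org/x", "//other.org/x"):
        print(append_info('<a href="%s">' % href, host, info))
```

With this pattern, the third alternative matches the empty string whenever the lookahead succeeds, so short relative URLs such as "a.html" are handled the same way as long ones. Note that fragment-only links like "#top" also count as relative here and would get s_info appended.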
