Regexp: how to match strings that do not contain a word

Discussion in 'Java' started by sfeher, Jul 18, 2006.

  1. sfeher

    sfeher Guest

    Hi All,

    I have a question regarding Regexp. The string that I need to change
    is:

    href="http://www.mysite.com/test1.html" ... href="/test2.html" ...

    and this is what I would like to get after the replaceAll:

    href="http://www.mysite.com/test1.html" ...
    href="http://www.mysite.com/test2.html" ...

    In other words, match all occurrences of href=" that are not followed by
    the http:// sequence.

    I did look in the docs but could not figure out how to exclude a
    string. Any ideas?

    Regards,
    Sebastian
     
    sfeher, Jul 18, 2006
    #1

  2. Oliver Wong

    Oliver Wong Guest

    Your example doesn't match your specification. If you were to match all
    occurrences of href=" that are not followed by the http:// sequence, with
    the input:

    <input>
    href="http://www.mysite.com/test1.html" ... href="/test2.html" ...
    </input>

    you'd get one match:

    <output>
    <match>href="</match>
    </output>
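
To check that claim, here is a minimal sketch that counts the matches. It assumes the specification is written as a negative lookahead, which is only one way to express it; the class and method names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HrefMatchCount {
    // href=" not immediately followed by http:// (assumed reading of the spec)
    static final Pattern RELATIVE_HREF = Pattern.compile("href=\"(?!http://)");

    // Counts every place in the input where the pattern matches.
    static int countMatches(String input) {
        Matcher m = RELATIVE_HREF.matcher(input);
        int count = 0;
        while (m.find()) {
            // The matched text is only the href=" prefix, never the URL itself.
            System.out.println("match: " + m.group());
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String input = "href=\"http://www.mysite.com/test1.html\" ... href=\"/test2.html\" ...";
        System.out.println("matches: " + countMatches(input));
    }
}
```

Only the second href=" qualifies, and the match contains nothing of the URL, which is why the replacement text has to be spelled out separately.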

    You also mention a "replaceAll" but you don't say what you're replacing, and
    with what.

    Perhaps it'd help if you specified the goal, and not the method.

    Are you trying to change all relative URLs in an HTML document to absolute
    URLs?

    - Oliver
     
    Oliver Wong, Jul 18, 2006
    #2

  3. John Maline

    John Maline Guest

    A pattern like "href=\"(?!http://).*" matches "href=\"" only when it is
    not immediately followed by "http://". Because the lookahead consumes
    nothing, you may still need to match the text that follows it yourself
    (as I do with the ".*"), depending on how you use the pattern.
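
Applied to the replaceAll from the original question, the lookahead could be used roughly like this. This is a sketch, not a full solution: the site prefix comes from the question's example, the class and method names are invented, and the trailing ".*" is dropped on purpose so that only the href=" prefix is replaced.

```java
public class RelativeHrefFixer {
    // Site prefix taken from the example in the question (an assumption here).
    static final String SITE = "http://www.mysite.com";

    // Matches href=" only when "http://" does not come right after it.
    // There is no trailing ".*", so the rest of the input is left untouched.
    static final String RELATIVE_HREF = "href=\"(?!http://)";

    // Prepends SITE to every href value that does not start with http://
    static String absolutize(String html) {
        return html.replaceAll(RELATIVE_HREF, "href=\"" + SITE);
    }

    public static void main(String[] args) {
        String input = "href=\"http://www.mysite.com/test1.html\" ... href=\"/test2.html\" ...";
        System.out.println(absolutize(input));
    }
}
```

The sketch only handles hrefs that start with a slash. A value such as test3.html would be glued directly onto the host name, and https:// links would be rewritten too, so real HTML needs more care than this.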

    The java.util.regex.Pattern doc on writing a pattern can be tough to
    read. That may be unavoidable; regular expressions are tough. The (?!X)
    construct is mentioned as a "zero-width negative lookahead" under
    Special constructs. By zero-width, they mean it doesn't actually
    consume any characters. It just asserts that at the current point in
    the match, we must not be looking at X.
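
A small sketch of what "zero-width" means in practice: the match stops right after href=" and the character the lookahead inspected is still there to be matched or replaced. The class and method names are illustrative only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ZeroWidthDemo {
    static final Pattern P = Pattern.compile("href=\"(?!http://)");

    // Returns the matched text, or null when the lookahead rejects every position.
    static String firstMatch(String input) {
        Matcher m = P.matcher(input);
        return m.find() ? m.group() : null;
    }

    // Returns the index just past the first match, or -1 when nothing matches.
    static int endOfFirstMatch(String input) {
        Matcher m = P.matcher(input);
        return m.find() ? m.end() : -1;
    }

    public static void main(String[] args) {
        String relative = "href=\"/test2.html\"";
        // The match is only the six characters of href=" ...
        System.out.println(firstMatch(relative));
        // ... and the next unconsumed character is the slash the lookahead peeked at.
        System.out.println(relative.charAt(endOfFirstMatch(relative)));
        // With http:// right after the quote, the assertion fails and nothing matches.
        System.out.println(firstMatch("href=\"http://www.mysite.com/test1.html\""));
    }
}
```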

    Cheers!
    John
     
    John Maline, Jul 18, 2006
    #3
  4. Ben

    Ben Guest

    In case you're trying to replace relative URLs with absolute ones, look at
    the URL class; one of its constructors does just that.

    Something like: URL absolute = new URL(referenceURL, relative);
    where referenceURL is a URL and relative is a String.
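
A minimal sketch of that constructor in use, assuming the base URL from the original question. The class and method names are made up, and the checked exception is wrapped only to keep the example short.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class UrlResolveDemo {
    // Resolves 'relative' against 'base' with the URL(URL context, String spec)
    // constructor. Recent JDK releases mark the URL constructors as deprecated,
    // hence the annotation; the call itself still works.
    @SuppressWarnings("deprecation")
    static String resolve(String base, String relative) {
        try {
            URL reference = new URL(base);
            return new URL(reference, relative).toString();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Bad URL: " + base + " / " + relative, e);
        }
    }

    public static void main(String[] args) {
        // A relative path is resolved against the base.
        System.out.println(resolve("http://www.mysite.com/", "/test2.html"));
        // A spec that is already absolute is returned unchanged.
        System.out.println(resolve("http://www.mysite.com/", "http://www.mysite.com/test1.html"));
    }
}
```

Unlike a plain string replacement, this also copes with relative paths that have no leading slash, since they are resolved against the directory of the base URL.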

    Ben
     
    Ben, Jul 21, 2006
    #4