Regexp: how to match strings that do not contain a word

sfeher

Hi All,

I have a question regarding Regexp. The string that I need to change
is:

href="http://www.mysite.com/test1.html" ... href="/test2.html" ...

and this is what I would like to get after the replaceAll:

href="http://www.mysite.com/test1.html" ...
href="http://www.mysite.com/test2.html" ...

In other words, match all occurrences of href=" that are not followed by
the http:// sequence.

I did look in the docs but could not figure out how to exclude a
string. Any ideas?

Regards,
Sebastian
 
Oliver Wong

Your example doesn't match your specification. If you were to match all
occurrences of href=" that are not followed by the http:// sequence, with the
input:

<input>
href="http://www.mysite.com/test1.html" ... href="/test2.html" ...
</input>

you'd get one match:

<output>
<match>href="</match>
</output>

You also mention a "replaceAll", but you don't say what you're replacing or
with what.

Perhaps it'd help if you specified the goal, and not the method.

Are you trying to change all relative URLs in an HTML document to absolute
URLs?

- Oliver
 
John Maline

In other words, match all occurrences of href=" that are not followed by
the http:// sequence.

A pattern like "href=\"(?!http://).*" would exclude the string "http://"
after the "href=\"" part. Because the lookahead consumes nothing, the
pattern still has to match whatever follows if you want that text in the
match (as I do with the ".*", which greedily runs to the end of the line).

The java.util.regex.Pattern doc on writing a pattern can be tough to
read. That may be unavoidable, since regular expressions themselves can be
tough. The (?!X) construct is listed as a "zero-width negative lookahead"
under Special constructs. By zero-width, they mean it doesn't actually
consume any characters. It just asserts that at the current point in
the match, we must not be looking at X.
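
For what it's worth, here is a minimal sketch of how the lookahead could be used with replaceAll, assuming the goal really is to prefix site-relative hrefs with the site root. The class name HrefFixer and the method absolutize are made up for illustration, and it only handles the http:// case from the question (not https:// or other schemes).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the names are not from the original post.
class HrefFixer {
    // Matches href=" only when it is NOT immediately followed by http://.
    // The lookahead is zero-width, so only href=" itself is consumed.
    private static final Pattern RELATIVE_HREF =
            Pattern.compile("href=\"(?!http://)");

    // Re-emits href=" followed by the base, leaving the rest of the URL as is.
    static String absolutize(String html, String base) {
        // quoteReplacement guards against '$' or '\' in the base string.
        String replacement = "href=\"" + Matcher.quoteReplacement(base);
        return RELATIVE_HREF.matcher(html).replaceAll(replacement);
    }

    public static void main(String[] args) {
        String in = "href=\"http://www.mysite.com/test1.html\" ... href=\"/test2.html\" ...";
        System.out.println(absolutize(in, "http://www.mysite.com"));
    }
}
```

Since only href=" is matched and replaced, there is no need for a trailing ".*" here; the path after the quote is left untouched.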

Cheers!
John
 
Ben

In case you're trying to replace relative URLs with absolute ones, look at
the java.net.URL class; one of its constructors does just that:

Something like: URL absolute = new URL(referenceURL, relative);
where referenceURL is a URL and relative is a String.
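
A slightly fuller sketch of that idea, assuming the page's own address is known. The class name UrlResolver, the method resolve, and the index.html page are invented for the example. Note that the URL constructors are deprecated since Java 20, so on newer JDKs URI.resolve may be the better fit.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical helper; the names are not from the original post.
class UrlResolver {
    // Resolves a possibly relative link against the URL of the page it is on.
    // Links that are already absolute come back unchanged.
    static String resolve(String pageUrl, String link) {
        try {
            URL context = new URL(pageUrl);            // the reference URL
            return new URL(context, link).toString();  // URL(URL context, String spec)
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Bad URL: " + link, e);
        }
    }

    public static void main(String[] args) {
        String page = "http://www.mysite.com/index.html"; // assumed page address
        System.out.println(resolve(page, "/test2.html"));
        System.out.println(resolve(page, "http://www.mysite.com/test1.html"));
    }
}
```

Unlike the regex approach, this also copes with links that are relative to the current directory rather than to the site root.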

Ben
 
