Regular expressions - marking up a URL


rico.fabrini

Hi Everyone,

I've got the following code:
======================================================================
private static final Pattern rgxUrlsInHTML = Pattern.compile("(http:\\/\\/([\\w.]+\\/?)\\S*)");

public static String emailBodyToHtml(String body)
{
    // Collect every URL the pattern finds in the body
    Matcher matcher = rgxUrlsInHTML.matcher(body);
    ArrayList<String> matches = new ArrayList<String>();
    while (matcher.find())
    {
        matches.add(matcher.group());
    }

    // Wrap each collected URL in an anchor tag
    for (String match : matches)
    {
        int index = body.indexOf(match);
        if (index > -1)
            body = body.replaceFirst(match, "<a href=\"" + match + "\">" + match + "</a>");
    }
    return body;
}
======================================================================

1) I might have missed it, but I couldn't find a method that simply
returns a collection of all matches, so the API feels rather low-level
to me. Probably sufficient, but still low-level.
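For point 1, a small helper that wraps the find() loop is one way to get all matches back as a list. This is a sketch, not code from the thread: the class name AllMatches, the method allMatches and the sample URLs are illustrative, and the pattern is the one above with the "\\/" escapes simplified to "/" (a slash has no special meaning in a Java regex).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AllMatches {
    // The URL pattern from the post, with the redundant "\\/" escapes dropped.
    static final Pattern URL = Pattern.compile("http://([\\w.]+/?)\\S*");

    // Returns every match of the pattern in the input, in order of appearance.
    static List<String> allMatches(Pattern p, CharSequence input) {
        List<String> result = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String body = "See http://a.example/x and http://b.example/y?z=1 today";
        // Prints [http://a.example/x, http://b.example/y?z=1]
        System.out.println(allMatches(URL, body));
    }
}
```

On Java 9 and later, Matcher.results() returns a Stream<MatchResult>, so the same list can be obtained with p.matcher(input).results().map(MatchResult::group).collect(Collectors.toList()).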

2) My actual question is about this line:
    body = body.replaceFirst(match, "<a href=\"" + match + "\">" + match + "</a>");

It only seems to work as I'd like for the first entry in the matches
collection. Even though the if block is executed, the second URL isn't
substituted.
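On point 2, the thread never pins down the cause, but one plausible explanation (an assumption, since the actual URLs aren't shown) is that String.replaceFirst treats its first argument as a regex and its second as a replacement template. A URL containing characters such as '?' or '+' then no longer matches itself literally, so nothing is replaced even though indexOf() found it. The sketch below uses hypothetical class and method names to show the effect and one fix based on Pattern.quote and Matcher.quoteReplacement.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplaceFirstPitfall {
    // Wraps the first literal occurrence of url in an anchor tag.
    // Pattern.quote makes the URL match literally ('.', '?', '+' lose their
    // regex meaning); Matcher.quoteReplacement stops '$' and '\' in the
    // replacement from being read as group references or escapes.
    static String linkFirst(String body, String url) {
        String anchor = "<a href=\"" + url + "\">" + url + "</a>";
        return body.replaceFirst(Pattern.quote(url), Matcher.quoteReplacement(anchor));
    }

    public static void main(String[] args) {
        String url = "http://b.example/page?id=2";
        String body = "Go to " + url + " now";

        // Unquoted: "page?id" means "pag", an optional "e", then "id", so the
        // literal '?' in the text is never matched and body comes back unchanged.
        System.out.println(body.replaceFirst(url, "<a href=\"" + url + "\">" + url + "</a>"));

        // Quoted: the URL is matched literally and gets wrapped.
        System.out.println(linkFirst(body, url));
    }
}
```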

Is there something glaringly obvious that's eluding me here?
Also, comments for improvement from people familiar with the Regular
Expressions API are welcome.
Thanks.

Rico.
 

Joshua Cranmer

> Hi Everyone,
>
> I've got the following code:
[ snip ]
> Is there something glaringly obvious that's eluding me here?
> Also, comments for improvement from people familiar with the Regular
> Expressions API are welcome.
> Thanks.

What you want to do is a regex replace:
body = body.replaceAll( < regex >, "<a href=\"$0\">$0</a>");

Group 0 is the entire matched string; the dollar signs refer to matched
groups, as described at
<http://java.sun.com/javase/6/docs/a...ent(java.lang.StringBuffer, java.lang.String)>
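Combined with the pattern from the original post, the replaceAll suggestion might look like the sketch below. It is an illustration rather than the poster's final code: the class name is made up, the "\\/" escapes are simplified to "/", and it calls replaceAll on a Matcher from the precompiled Pattern instead of String.replaceAll, which would compile the regex again on every call.

```java
import java.util.regex.Pattern;

public class EmailToHtml {
    // Poster's URL pattern, with the unnecessary "\\/" escapes simplified to "/".
    static final Pattern URL = Pattern.compile("http://([\\w.]+/?)\\S*");

    // One pass over the input: every match is replaced by an anchor tag,
    // with $0 standing for the text of that particular match.
    static String emailBodyToHtml(String body) {
        return URL.matcher(body).replaceAll("<a href=\"$0\">$0</a>");
    }

    public static void main(String[] args) {
        String body = "See http://a.example/x and http://b.example/y?z=1 today";
        System.out.println(emailBodyToHtml(body));
    }
}
```

This only adds the anchor tags; any '<' or '&' elsewhere in the body is passed through unescaped.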
 

rico.fabrini


Thanks Joshua. That works and does what I was looking for.

I have to admit that I struggled somewhat to picture what "the entire
matched string" means, though. The way I now think of it, replaceAll()
scans the input sequence, and at every match $0 is a placeholder for
that particular match.

Without that scanning process in mind, I was baffled, because I took
"entire matched string" to mean some kind of concatenation of all the
matches.

Rico.
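The scanning picture can be made explicit with Matcher's find()/appendReplacement()/appendTail() loop, which is essentially what Matcher.replaceAll() does. In this sketch (class and method names are illustrative), each call to find() stops at the next URL, and $0 in the replacement is bound to that match only, never to a concatenation of all of them.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ScanDemo {
    static final Pattern URL = Pattern.compile("http://([\\w.]+/?)\\S*");

    // find() advances to the next match; appendReplacement copies the text
    // before it plus the expanded replacement ($0 = this match only);
    // appendTail copies whatever follows the last match.
    static String linkify(String body) {
        Matcher m = URL.matcher(body);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(sb, "<a href=\"$0\">$0</a>");
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        String body = "See http://a.example/x and http://b.example/y?z=1 today";
        Matcher m = URL.matcher(body);
        while (m.find()) {
            // At each stop of the scan, $0 refers to this string
            System.out.println("$0 = " + m.group(0));
        }
        System.out.println(linkify(body));
    }
}
```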
 
