Scanning Strings to replace with links.

questionmarc420

Hi again, :D
I have large strings containing many paragraphs. Each string is to be
displayed. I want every substring that starts with "http" or
"www." wrapped in <a></a> tags.

Here is the code I have:

data_body = data_body.replaceAll("((https?|ftp)://|mailto:)[^\\s<]+",
        "<A TARGET=\"_new\" HREF=\"$0\">$0</a>");

data_body = data_body.replaceAll("(www.)[^\\s<]+",
        "<A TARGET=\"_new\" HREF=\"http://$0\">$0</a>");

My problem is:
If a link starts with "http://www." the substring is replaced twice, meaning the
link comes out something like this:
< a HREF=http://<a HREFwww.>http://www.</a/a>>
or something along those lines.
I tried putting the www. in the first expression, but that does not work
because the link then only points to localhost.

I also tried adding a space before the "www." so it would be like "
www."
This almost worked, but spaces ended up in the link, so it
would not work in the browser.

If anyone understands and can help, please do.
If anything in particular is unclear, please ask me to explain.
Thanks
-morc

Oh, and also: sometimes if a link is placed in parentheses in the text,
the parentheses get added to the link. If anyone knows of a way to
exclude the parentheses, please share. :D Thanks
 
javabuddha

Good point, I didn't consider that when I posted the HTMLEncode
function. Anyway, the topic is updated now. This is the part that is
relevant to your questions. You are on the right track; you just need
to switch the order:

str = str.replaceAll("([\\s\\(])www\\.", "$1http://www.");
str = str.replaceAll("((https?|ftp)://|mailto:)[^\\s<\\(\\)]+",
        "<A HREF=\"$0\">$0</a>");
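For illustration, here is how the two steps above might be wrapped in one helper. This is only a sketch: the class and method names (Linkifier, linkify) are made up for this example, and I have assumed you also want to catch a "www." at the very start of the string, which the first pattern above does not cover.

```java
// Hypothetical helper; the names are illustrative, not from the thread.
final class Linkifier {
    static String linkify(String text) {
        // Step 1: give a bare "www." a scheme when it starts the text
        // or follows whitespace or an opening parenthesis.
        String withScheme = text.replaceAll("(^|[\\s(])www\\.", "$1http://www.");
        // Step 2: wrap every URL in an anchor tag. The match stops at
        // whitespace, "<", or a parenthesis, so "(http://...)" keeps
        // its parentheses outside the link.
        return withScheme.replaceAll("((https?|ftp)://|mailto:)[^\\s<()]+",
                "<a target=\"_new\" href=\"$0\">$0</a>");
    }
}
```

Calling Linkifier.linkify(data_body) would then replace both of the original calls. Note that with this order the visible link text also gains the "http://" prefix, and a URL that legitimately contains parentheses would be cut short.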

Good luck,

Matt
 
Roedy Green

data_body = data_body.replaceAll("((https?|ftp)://|mailto:)[^\\s<]+",
        "<A TARGET=\"_new\" HREF=\"$0\">$0</a>");

data_body = data_body.replaceAll("(www.)[^\\s<]+",
        "<A TARGET=\"_new\" HREF=\"http://$0\">$0</a>");

The simplest way would be to convert naked www. to http://www. first,
then apply your http: -> <a href transform. You want to search for
"www." that is not preceded by "//". You seem to have a good grasp of regex
already so I will leave you to compose the string. If you have
trouble, see http://mindprod.com/jgloss/regex.html
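One possible way to express "www. not preceded by //" is a negative lookbehind. The sketch below is just one reading of that hint; the class and method names are invented for the example, and the \b word boundary is my own addition so that something like "awww." is left alone.

```java
// Hypothetical helper; the names are illustrative only.
final class WwwPrefixer {
    static String addScheme(String text) {
        // (?<!//) is a negative lookbehind: "www." matches only when
        // the two characters right before it are not "//".
        // \b keeps "www." from matching in the middle of a word.
        return text.replaceAll("(?<!//)\\bwww\\.", "http://www.");
    }
}
```

After this pass, the existing http/ftp/mailto replaceAll can run unchanged. A "www." that appears later in a URL path (for example after a single "/") would still be rewritten, so this is a starting point rather than a complete pattern.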
 
Roedy Green

data_body = data_body.replaceAll("((https?|ftp)://|mailto:)[^\\s<]+",
        "<A TARGET=\"_new\" HREF=\"$0\">$0</a>");

data_body = data_body.replaceAll("(www.)[^\\s<]+",
        "<A TARGET=\"_new\" HREF=\"http://$0\">$0</a>");

The other way to do it, which you might find easier, is to scan for
strings with indexOf and compose your results in a StringBuilder as
you go.
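A rough sketch of that scanning approach, in case it helps. Everything here is an assumption made for illustration: the class name, the list of prefixes, and the choice of terminating characters (whitespace, "<", and parentheses) are not from the posts above.

```java
// Hypothetical scanner; names and prefix list are illustrative only.
final class LinkScanner {
    private static final String[] PREFIXES = {"http://", "https://", "ftp://", "www."};

    static String linkify(String text) {
        StringBuilder out = new StringBuilder();
        int pos = 0;
        while (pos < text.length()) {
            // Find the earliest occurrence of any prefix at or after pos.
            int start = -1;
            for (String prefix : PREFIXES) {
                int i = text.indexOf(prefix, pos);
                if (i >= 0 && (start < 0 || i < start)) {
                    start = i;
                }
            }
            if (start < 0) {
                break; // no more links in the rest of the text
            }
            // The link runs until the first terminating character.
            int end = start;
            while (end < text.length() && !isTerminator(text.charAt(end))) {
                end++;
            }
            String url = text.substring(start, end);
            // Only the href gets a scheme; the visible text stays as typed.
            String href = url.startsWith("www.") ? "http://" + url : url;
            out.append(text, pos, start)
               .append("<a target=\"_new\" href=\"").append(href).append("\">")
               .append(url).append("</a>");
            // Resume after the link, so a "www." inside it is never rescanned.
            pos = end;
        }
        return out.append(text, pos, text.length()).toString();
    }

    private static boolean isTerminator(char c) {
        return Character.isWhitespace(c) || c == '<' || c == '(' || c == ')';
    }
}
```

Because each link is consumed in a single pass, "http://www." cannot be wrapped twice, which is the problem the two replaceAll calls ran into.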
 
