Replace characters in HTML using regex?

AjalaDeveloper

Hello people, I'm trying to do something that I think should be very easy.
I'm using a Java regular expression to collapse // in some HTML code. In
the example below, I would like to replace // with /, but only where the
// does not come right after http:
------------------------------------------------------------------------------------

<img src="http://example.com//img/img.jpg">
------------------------------------------------------------------------------------

The problem is that I don't want to touch the // which comes right after
http:

I'd like to get: <img src="http://example.com/img/img.jpg">. I'm trying
with the following Java code:
------------------------------------------------------------------------------------

completeHtml = completeHtml.replaceAll("(?!http://)//*","/");
------------------------------------------------------------------------------------

The result I'm getting is: <img src="http:/example.com/img/img.jpg">.
I don't want to lose a / in http://

Please help me! Thank you
 
lewmania942

AjalaDeveloper said:
I would like to replace // with /, but only where the // does not
come right after http:
completeHtml = completeHtml.replaceAll("(?!http://)//*","/");

Nobody has proposed a "good" way to do it with a regexp (if such a
thing exists), so I propose a quick and dirty solution:

completeHtml = completeHtml.replaceAll("http://",
"http:///").replaceAll("//","/");

Note that I'm not for (nor against) using such code: I'm just
proposing a solution.

Hope it helps :)
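
For anyone who wants to try the two-step idea above, here is a minimal sketch. The class name TwoStepFix and the method name collapse are made up for illustration; only the two chained replaceAll calls come from the post. The first call pads every http:// with an extra slash, and the second call then collapses each // pair, which turns http:/// back into http://.

```java
final class TwoStepFix {
    // Hypothetical helper wrapping the two chained calls from the post.
    static String collapse(String html) {
        return html
                // Step 1: pad the scheme so it survives the next step.
                .replaceAll("http://", "http:///")
                // Step 2: collapse every pair of slashes into one.
                .replaceAll("//", "/");
    }

    public static void main(String[] args) {
        String completeHtml = "<img src=\"http://example.com//img/img.jpg\">";
        System.out.println(collapse(completeHtml));
    }
}
```

Since neither pattern uses any regex features, String.replace with the same arguments should behave the same way here. Note that this sketch only protects http://, so other schemes such as https:// would still lose a slash.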
 
Alan Moore

What you want is a lookbehind, not a lookahead:

completeHtml = completeHtml.replaceAll("(?<!http:)//","/");
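
A short sketch of the difference, for comparison. The class and method names are invented for this example; the two patterns are the ones from the thread. The lookahead (?!http://) only checks that the text starting at the current position is not "http://". At the slashes themselves that is always true, so every run of slashes gets replaced. The lookbehind (?<!http:) instead checks what comes just before the //.

```java
final class SlashFix {
    // Original attempt: the negative lookahead never blocks a match
    // that starts at a slash, so the // after http: is collapsed too.
    static String withLookahead(String html) {
        return html.replaceAll("(?!http://)//*", "/");
    }

    // Suggested fix: skip any // that is directly preceded by "http:".
    static String withLookbehind(String html) {
        return html.replaceAll("(?<!http:)//", "/");
    }

    public static void main(String[] args) {
        String completeHtml = "<img src=\"http://example.com//img/img.jpg\">";
        System.out.println(withLookahead(completeHtml));
        System.out.println(withLookbehind(completeHtml));
    }
}
```

If the documents may also contain https:// links, a pattern such as (?<!https?:)// should work as well, since Java accepts lookbehinds with a bounded length. That variant is an assumption on top of the answer above, not part of it.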
 
Rob Skedgell

You should also note that the SGML DOCTYPE declaration may contain
double slashes which you want to preserve. It may look something
like this:

<!DOCTYPE HTML PUBLIC
"-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/REC-html401-19991224/loose.dtd">

or

<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

A quick and dirty way to avoid changing the //s here might be to skip
the first few (say 3-5) lines, or everything between the "<!DOCTYPE"
and its closing ">". Of course, if the HTML documents concerned don't
have a DOCTYPE declaration, there's no need to worry about this.
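
The second option (leaving everything between "<!DOCTYPE" and its closing ">" alone) could be sketched roughly as follows. The class and method names are hypothetical, the lookbehind pattern is borrowed from elsewhere in this thread, and the sketch assumes the declaration is written with an upper-case "<!DOCTYPE" and contains no ">" before its end.

```java
final class DoctypeSafeFix {
    // Collapse // to / unless it directly follows "http:".
    static String fixSlashes(String s) {
        return s.replaceAll("(?<!http:)//", "/");
    }

    // Apply fixSlashes to everything except the DOCTYPE declaration.
    static String collapse(String html) {
        int start = html.indexOf("<!DOCTYPE");
        int end = (start < 0) ? -1 : html.indexOf('>', start);
        if (end < 0) {
            // No (complete) DOCTYPE declaration: process the whole document.
            return fixSlashes(html);
        }
        String before = html.substring(0, start);
        String doctype = html.substring(start, end + 1); // kept untouched
        String after = html.substring(end + 1);
        return fixSlashes(before) + doctype + fixSlashes(after);
    }

    public static void main(String[] args) {
        String completeHtml =
                "<!DOCTYPE HTML PUBLIC\n"
                + "\"-//W3C//DTD HTML 4.01 Transitional//EN\"\n"
                + "\"http://www.w3.org/TR/REC-html401-19991224/loose.dtd\">\n"
                + "<img src=\"http://example.com//img/img.jpg\">";
        System.out.println(collapse(completeHtml));
    }
}
```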
 
