Replace characters in HTML using regex?

Discussion in 'Java' started by AjalaDeveloper, Feb 24, 2006.

  1. Hello people, I'm trying to do something that should be very easy (I
    think!). I'm using a Java regular expression to collapse // in HTML
    code. In the example below, I would like to replace // with /, but
    only where the // doesn't come right after http:
    ------------------------------------------------------------------------------------

    <img src="http://example.com//img/img.jpg">
    ------------------------------------------------------------------------------------

    The problem in this example is that I don't want to remove the //
    that comes right after http:

    I'd like to get: <img src="http://example.com/img/img.jpg">. I'm trying
    with the following Java code:
    ------------------------------------------------------------------------------------

    completeHtml = completeHtml.replaceAll("(?!http://)//*","/");
    ------------------------------------------------------------------------------------

    The result I'm getting is: <img src="http:/example.com/img/img.jpg">.
    I don't want to lose a / in http://


    Please help me! Thank you
    AjalaDeveloper, Feb 24, 2006
    #1

  2. AjalaDeveloper

    Guest

    > I would like to replace // with /, but only those which aren't
    > next to http:

    > <img src="http://example.com//img/img.jpg">

    > completeHtml = completeHtml.replaceAll("(?!http://)//*","/");


    Nobody has proposed a "good" way to do it with a regexp (if such a
    thing exists), so I propose a quick and dirty solution:

    completeHtml = completeHtml.replaceAll("http://", "http:///").replaceAll("//","/");

    Note that I'm not for (nor against) using such code: I'm just
    proposing a solution.
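
A minimal sketch of the two-step replacement above, for anyone who wants to try it. The class and method names (`QuickAndDirtySlashes`, `collapse`) are made up for illustration, and it uses `String.replace` instead of `replaceAll` because both search strings are plain literals; that swap is an assumption of this sketch, not part of the post.

```java
class QuickAndDirtySlashes {
    // Hypothetical helper: pad "http://" with an extra slash first,
    // then turn every "//" into "/". The padded "http:///" ends up
    // as "http://" again, while any other "//" becomes "/".
    static String collapse(String html) {
        return html.replace("http://", "http:///")
                   .replace("//", "/");
    }

    public static void main(String[] args) {
        String html = "<img src=\"http://example.com//img/img.jpg\">";
        // prints <img src="http://example.com/img/img.jpg">
        System.out.println(collapse(html));
    }
}
```

As written this only protects `http://`. A URL starting with `https://` is not padded in the first step, so it would still lose a slash in the second.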

    Hope it helps :)
    , Feb 25, 2006
    #2

  3. AjalaDeveloper

    Alan Moore Guest

    On 24 Feb 2006 15:21:09 -0800, "AjalaDeveloper" wrote:

    >I would like to replace // with /, but only those which aren't next to
    >http:
    >
    ><img src="http://example.com//img/img.jpg">
    >
    >completeHtml = completeHtml.replaceAll("(?!http://)//*","/");
    >
    >I'm getting as result this: <img src="http:/example.com/img/img.jpg">.
    >I don't want to lose a / in http://


    What you want is a lookbehind, not a lookahead:

    completeHtml = completeHtml.replaceAll("(?<!http:)//","/");
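
To make the difference concrete, here is a small sketch that runs both patterns on the sample tag from the first post. The class and method names are hypothetical. As far as I can tell, the lookahead version fails because `(?!http://)` is tested at the position of the slash itself, where the upcoming text is never `http://`, so the check always passes and `//*` then collapses every run of slashes.

```java
class LookbehindDemo {
    // Original attempt: the negative lookahead is evaluated at each '/',
    // where the following text never starts with "http://", so it never
    // blocks a match. "//*" then matches one or more slashes everywhere.
    static String withLookahead(String html) {
        return html.replaceAll("(?!http://)//*", "/");
    }

    // Lookbehind version: match "//" only when the five characters
    // immediately before it are not "http:".
    static String withLookbehind(String html) {
        return html.replaceAll("(?<!http:)//", "/");
    }

    public static void main(String[] args) {
        String html = "<img src=\"http://example.com//img/img.jpg\">";
        // prints <img src="http:/example.com/img/img.jpg">
        System.out.println(withLookahead(html));
        // prints <img src="http://example.com/img/img.jpg">
        System.out.println(withLookbehind(html));
    }
}
```

Note that the lookbehind only knows about `http:`. Other schemes such as `https:` would need their own handling, for example a lookbehind like `(?<!https?:)`, which I would expect Java to accept since its length is bounded, but that is not tested here.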
    Alan Moore, Feb 26, 2006
    #3
  4. AjalaDeveloper

    Rob Skedgell Guest

    AjalaDeveloper wrote:

    > I would like to replace // with /, but only those which aren't next
    > to http:
    >
    > <img src="http://example.com//img/img.jpg">
    >
    > completeHtml = completeHtml.replaceAll("(?!http://)//*","/");
    >
    > I'm getting as result this: <img src="http:/example.com/img/img.jpg">.
    > I don't want to lose a / in http://


    You should also note that the SGML DOCTYPE declaration may contain
    double slashes which you want to preserve. It may look something
    like this:

    <!DOCTYPE HTML PUBLIC
    "-//W3C//DTD HTML 4.01 Transitional//EN"
    "http://www.w3.org/TR/REC-html401-19991224/loose.dtd">

    or

    <!DOCTYPE html
    PUBLIC "-//W3C//DTD XHTML 1.1//EN"
    "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

    A quick and dirty way to avoid changing the //s here might be to
    skip the first few (say 3-5) lines, or everything between the
    "<!DOCTYPE" and its closing ">". Of course, if the HTML documents
    concerned don't have a DOCTYPE declaration, there's no need to worry
    about this.
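
One way the "skip everything between <!DOCTYPE and its closing >" idea could be sketched is below. The class and method names are hypothetical, the slash-collapsing pattern is the lookbehind one suggested earlier in the thread, and the sketch assumes the declaration contains no `>` before its closing one (so a DOCTYPE with an internal subset would need more care).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DoctypeAwareSlashes {
    // Assumption: the DOCTYPE has no '>' before its closing '>'.
    private static final Pattern DOCTYPE =
            Pattern.compile("<!DOCTYPE[^>]*>", Pattern.CASE_INSENSITIVE);

    // Collapse "//" to "/" unless it directly follows "http:".
    private static String collapse(String text) {
        return text.replaceAll("(?<!http:)//", "/");
    }

    // Leave the first DOCTYPE declaration untouched and collapse
    // slashes only in the text before and after it.
    static String collapseOutsideDoctype(String html) {
        Matcher m = DOCTYPE.matcher(html);
        if (!m.find()) {
            return collapse(html); // no DOCTYPE, nothing to protect
        }
        return collapse(html.substring(0, m.start()))
                + m.group()
                + collapse(html.substring(m.end()));
    }

    public static void main(String[] args) {
        String html = "<!DOCTYPE html\n"
                + "PUBLIC \"-//W3C//DTD XHTML 1.1//EN\"\n"
                + "\"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd\">\n"
                + "<img src=\"http://example.com//img/img.jpg\">";
        System.out.println(collapseOutsideDoctype(html));
    }
}
```

Matching the declaration itself seems less fragile than skipping a fixed number of lines, since it doesn't depend on how the DOCTYPE happens to be wrapped.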

    --
    Rob Skedgell <>
    GnuPG/PGP: 7DA3 1579 C0DD 8748 C05A B984 E2A2 3234 D14B 6DD7
    Rob Skedgell, Feb 26, 2006
    #4