regex - extract <br> before span

skajotde

Hi all

I'd like to extract a <br> so that it ends up outside the enclosing span.

example:

<span style="text-decoration: underline;"><span style="text-decoration:
underline;">ff<br>ff</span></span>

to:

<span style="text-decoration: underline;"><span style="text-decoration:
underline;">ff</span></span><br><span style="text-decoration:
underline;"><span style="text-decoration: underline;">ff</span></span>


* br outside of span

Pattern spanBR =
Pattern.compile("(<span[^>]*?>).*?(</span){0}.*?(<newline/>).*?(<span){0}.*?(</span>)",
Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

Matcher matcherSpanBR = spanBR.matcher(html);

while (matcherSpanBR.matches()) {
html = matcherSpanBR.replaceAll("$1$2$5$1$3$1$4$5");
}


My question is how to say "a run of text with no </span> in it, between <span
and <newline/>" and capture that text in a group (<newline/> is my <br>
after the first conversion).

Cheers
Kamil
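
One way to express "text with no </span> in it" in java.util.regex is a negative lookahead that is checked before every character, written `(?:(?!</?span).)*?`. The `{0}` quantifier in the pattern above repeats its group zero times, so it always matches the empty string and forbids nothing. The following is only a minimal sketch: the class name `NoClosingSpan`, the method `splitOnce` and the sample input are made up for illustration, and it deals with a single, non-nested span per match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NoClosingSpan {
    // A run of characters that contains neither "<span" nor "</span".
    // The lookahead is evaluated before each character is consumed.
    static final String NO_SPAN_TAG = "(?:(?!</?span).)*?";

    static final Pattern SPAN_BR = Pattern.compile(
            "(<span[^>]*>)(" + NO_SPAN_TAG + ")(<newline/>)(" + NO_SPAN_TAG + ")(</span>)",
            Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    // Splits each matching span at its first <newline/>: close the span,
    // emit the <newline/>, then re-open the same span for the rest.
    static String splitOnce(String html) {
        Matcher m = SPAN_BR.matcher(html);
        return m.replaceAll("$1$2$5$3$1$4$5");
    }

    public static void main(String[] args) {
        String html = "<span class=\"a\">x</span> <span class=\"u\">ff<newline/>ff</span>";
        System.out.println(splitOnce(html));
    }
}
```

Because the lookahead stops at any span tag, the first span in the sample cannot swallow the text of the second one, which is what happens with a plain `(.*?)`.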
 
skajotde

Pattern spanBR =
Pattern.compile("(<span[^>]*?>).*?(</span){0}.*?(<newline/>).*?(<span){0}.*?(</span>)",
Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

Before that, my pattern looked like this:

"(<span[^>]*?>)(.*?)(<newline/>)(.*?)(</span>)"

But that pattern matches far too much, for example all of this:


<span style="text-decoration: underline;"></span>
<br /><span style="text-decoration: underline;"><span
style="text-decoration: underline;">ff<br>ff</span></span><br />* Some
test: <span style="font-weight: bold;"> Some test</span>

I have to move the <br> up recursively through all the nested spans. Any suggestions?
 
Oliver Wong

Give up on regular expressions and use a parser based on a context-free
grammar instead. See http://java-source.net/open-source/html-parsers

- Oliver
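
As a rough illustration of why tracking nesting explicitly helps, here is a minimal sketch that scans the tags and keeps a stack of the currently open spans. It is not one of the libraries behind the link above, just a hand-written scanner. The names `BrHoister` and `hoistBreaks` are made up, and it assumes that span tags are properly nested, that there are no self-closing spans, and that no stray `<` appears in the text. Every break is written back as a plain `<br>`, so any attributes on it are dropped.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BrHoister {
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    // True for "<name>", "<name ...>" and "<name/>", given a lower-cased tag.
    private static boolean isOpenTag(String lowerTag, String name) {
        String prefix = "<" + name;
        if (!lowerTag.startsWith(prefix)) return false;
        char next = lowerTag.charAt(prefix.length());
        return next == '>' || next == '/' || Character.isWhitespace(next);
    }

    static String hoistBreaks(String html) {
        StringBuilder out = new StringBuilder();
        Deque<String> openSpans = new ArrayDeque<>(); // innermost span on top
        Matcher m = TAG.matcher(html);
        int pos = 0;
        while (m.find()) {
            out.append(html, pos, m.start()); // plain text before this tag
            pos = m.end();
            String tag = m.group();
            String lower = tag.toLowerCase(Locale.ROOT);
            if (isOpenTag(lower, "br")) {
                // Close every open span, emit the break, re-open them all.
                for (int i = 0; i < openSpans.size(); i++) out.append("</span>");
                out.append("<br>");
                Iterator<String> it = openSpans.descendingIterator(); // outermost first
                while (it.hasNext()) out.append(it.next());
            } else {
                if (isOpenTag(lower, "span")) {
                    openSpans.push(tag);
                } else if (lower.matches("</span\\s*>") && !openSpans.isEmpty()) {
                    openSpans.pop();
                }
                out.append(tag); // every other tag is copied through unchanged
            }
        }
        out.append(html, pos, html.length()); // trailing text
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(hoistBreaks(
                "<span style=\"a\"><span style=\"b\">ff<br>ff</span></span>"));
    }
}
```

A break that sits directly after an opening tag or directly before a closing tag leaves an empty span behind, so a clean-up pass for empty spans would still be needed afterwards.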
 
skajotde

Yes, that's not a bad solution. At the moment I'm using this code:

// needs java.util.regex.Pattern and java.util.regex.Matcher

// catch every <br>, including ones like <br style="font-weight: bold;"/>
Pattern badBR = Pattern.compile("<br.*?>",
        Pattern.DOTALL | Pattern.CASE_INSENSITIVE);
Matcher matcherBR = badBR.matcher(html);
html = matcherBR.replaceAll("<newline/>");

// remove empty spans
Pattern emptySpan = Pattern.compile("<span[^>]*?></span>",
        Pattern.DOTALL | Pattern.CASE_INSENSITIVE);
Matcher matcherSpan = emptySpan.matcher(html);
html = matcherSpan.replaceAll("");

// move <newline/> out of a span so that it sits between two spans
Pattern spanBR = Pattern.compile(
        "(<span[^>]*?>)([^<>]*?)(<newline/>)([^<>]*?)(</span>)",
        Pattern.DOTALL | Pattern.CASE_INSENSITIVE);
Matcher matcherSpanBR = spanBR.matcher(html);
int numLoop = 0;
while (matcherSpanBR.find()) {
    html = matcherSpanBR.replaceAll("$1$2$5$3$1$4$5");
    matcherSpan.reset(html);
    // remove empty spans once more
    html = matcherSpan.replaceAll("");
    matcherSpanBR.reset(html);
    numLoop++;
    // at most 3 levels of nesting
    if (numLoop > 3) break;
}

I hope it's sufficient (my bug was resolved).

Thanks for the help

Cheers
Kamil
 
