How to find and replace something that is nested inside something else?

alainfri

I am not sure if this group is the right place for this question, but
here is what I need. There is a piece of HTML that contains a lot of
<br> tags. The task is to replace all of these <br> tags with \r\n
(vbCrLf), but only inside <pre> blocks. I can do this in VBScript as
follows:



Option Explicit

Dim path2folder : path2folder = "D:\"
Dim path2file : path2file = path2folder & "test.htm"
Dim fileResults : fileResults = path2folder & "test-results.txt"
Dim text

' Matches one whole <pre>...</pre> block. VBScript's "." does not match
' line breaks, so [\s\S] is used to span multi-line blocks.
Dim regex
Set regex = New RegExp
regex.Pattern = "<pre>[\s\S]*?</pre>"
regex.IgnoreCase = True
regex.Global = True

' Matches <br>, <br/> and <br /> in any letter case.
Dim regex1
Set regex1 = New RegExp
regex1.Pattern = "<br\s*/?>"
regex1.IgnoreCase = True
regex1.Global = True

Dim matches, match, tmp, tmp1

Dim FSO : Set FSO = CreateObject("Scripting.FileSystemObject")
Dim dfile : Set dfile = FSO.OpenTextFile(path2file, 1)

If Not dfile.AtEndOfStream Then
    text = dfile.ReadAll
    dfile.Close

    '///////////////////////////////////////////
    ' START OF RELEVANT CODE
    Set matches = regex.Execute(text)
    For Each match In matches
        tmp = match.Value                      ' the whole <pre> block
        tmp1 = regex1.Replace(tmp, vbCrLf)     ' same block, <br> replaced
        text = Replace(text, tmp, tmp1)        ' put it back into the page
    Next
    ' END OF RELEVANT CODE
    '///////////////////////////////////////////

    Dim outfile : Set outfile = FSO.CreateTextFile(fileResults, True)
    outfile.WriteLine text
    outfile.Close
    MsgBox "OK"
Else
    dfile.Close
End If


The question is how to achieve the same result with a single call to
regex.Replace, something like this:

'THIS DOES NOT WORK
regex.Pattern = "(<pre>[.\n\r]*)(<[Bb][Rr][\/r]{0,1}>)([.\n\r]*</pre>)"
text = regex.Replace(text, "$1" & vbCrlf & vbCrlf & "$3")

Thank you.
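
Two things go wrong with the attempted single-call pattern. Inside a character class `.` is a literal dot, so `[.\n\r]*` matches only dots and line breaks, never ordinary text. And even with a class that matches anything, such as `[\s\S]`, each match swallows a whole `<pre>` block, and the matches of a global replace never overlap, so one pass changes at most one `<br>` per block. A small Perl sketch of the second point (the sample string is made up):

```perl
use strict;
use warnings;

# Hypothetical sample: one <pre> block containing two <br> tags.
my $html = "<pre>a<br>b<br>c</pre>";

# The attempted pattern, with [\s\S] instead of [.\n\r] so that it can
# span ordinary text. The lazy first group stops at the first <br>.
(my $once = $html) =~ s{(<pre>[\s\S]*?)(<br>)([\s\S]*?</pre>)}{$1\r\n$3}g;

# Only the first <br> was replaced: the match consumed the whole block,
# so /g could not find a second, overlapping match inside it.
print "$once\n";
```

The usual ways around this are either a callback that rewrites each block separately, or a pattern whose match is just the `<br>` tag itself, with the "inside `<pre>`" test moved into a lookahead.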

Xicheng Jia

If you are asking for Perl solutions, then here is one way to go:

$string =~ s{<pre>(.*?)</pre>}{ mkCrtl($1) }egs;

sub mkCrtl {
    my $str = shift;
    $str =~ s{<br>}{\r\n}g;
    return "<pre>$str</pre>";
}

Regards,
Xicheng
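
Xicheng's substitution can be turned into a self-contained script. The sketch below is a variant, not the poster's exact code: the callback name `fix_pre_block` and the sample input are made up, `/i` is added so that `<PRE>` and `<BR>` also match, and the inner pattern accepts `<br/>` and `<br />`, as the original VBScript pattern tried to.

```perl
use strict;
use warnings;

# Same idea as mkCrtl above: match each <pre>...</pre> block and let /e
# run a callback that rewrites only that block.
sub fix_pre_block {
    my ($body) = @_;
    # Replace <br>, <BR>, <br/> and <br /> with CR LF.
    $body =~ s{<br\s*/?>}{\r\n}gi;
    return "<pre>$body</pre>";
}

# Hypothetical input: <br> tags both inside and outside a <pre> block.
my $html = "<p>x<br>y</p><pre>a<br>b<BR/>c</pre><p>z<br></p>";

# /e evaluates the replacement as Perl code, /g handles every block,
# /i matches <PRE> as well as <pre>, /s lets .*? cross line breaks.
(my $result = $html) =~ s{<pre>(.*?)</pre>}{fix_pre_block($1)}egis;

print "$result\n";
```

The `<br>` tags in the `<p>` elements are left alone because the outer pattern only ever hands `<pre>` contents to the callback.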

alainfri

Thank you for the fast reply, Xicheng. Actually, I need a solution that
would allow me to do this using the regular expressions provided by
Windows Script Host or by the .NET Framework.
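
Since the goal is a single Replace call without a callback, one option is to make each match a single `<br>` tag and put the "inside `<pre>`" test into a lookahead: a tag is replaced only if a closing `</pre>` appears after it before any new `<pre`. The Perl sketch below assumes `<pre>` blocks are never nested and are always closed; it is a regex heuristic, not an HTML parser, and the sample input is made up.

```perl
use strict;
use warnings;

# Replace a <br> only if, scanning forward from it, a closing </pre>
# appears before any opening <pre. Under the assumptions above, that is
# true exactly for tags that sit inside a <pre> block.
my $br_in_pre = qr{
    <br\s*/?>               # the tag itself: <br>, <br/>, <br />
    (?=                     # lookahead: checks, but consumes nothing
        (?:(?!<pre\b).)*?   #   text that does not start a new <pre
        </pre>              #   followed by the closing tag
    )
}xis;

my $html = "<p>x<br>y</p>\n<pre>a<br>b\n<BR />c</pre>\n<p>z<br></p>";

# One substitution call; each match is a single <br> tag.
(my $result = $html) =~ s{$br_in_pre}{\r\n}g;

print "$result\n";
```

Lookahead is not Perl-specific: .NET's `Regex.Replace` supports it, with `RegexOptions.IgnoreCase | RegexOptions.Singleline` playing the role of `/i` and `/s`. VBScript's `RegExp` (version 5.5 and later) should accept the same idea if `.` is replaced by `[\s\S]`, since it has no single-line option, e.g. the pattern `<br\s*/?>(?=(?:(?!<pre\b)[\s\S])*?</pre>)` with `IgnoreCase = True`; that variant is untested here. Because the lookahead rescans forward from every `<br>`, the callback approach may be faster on very large files.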

Gunnar Hjalmarsson

> I am not sure if this group is the right place for this question but
> what I need is as follows. There is a piece of html. Throughout the
> html there are a lot of <br> tags. The task is to replace all these
> <br> tags with \r\n. The replacement must be performed only within
> <pre> blocks.

Why? I thought that <br> tags (and other HTML) were rendered properly
inside <pre> tags as well.

Tad McClellan

["Followup-To:" header set to comp.lang.perl.misc.]



This is the right place to ask Perl questions.

> Actually I need a solution that
> would allow me to do this using the regular expressions provided by
> Windows Script Host or by the .NET Framework.


Then you don't have a Perl question!