Could a regular expression do this? (newbie)

Alont

I want to match a block of text with a pattern, but the block is very large (and
multi-line). Its first line should be:
<html><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"

and the block ends with:
rel="external" target="new">Forum</a></li>
</ul>
</div>
</div>
</div>

So how can I match this block in an HTML file? (Many HTML files are
waiting to be matched and have the block replaced with "<!-- #include
virtual="/Head.inc" -->".)

I have seen many examples, but I can't find one that does this.
 
wfsp

Using regexes on HTML is _very_ difficult, especially on "many" "very large"
files. My advice would be to not even consider it. There are many good
modules for parsing HTML (I use HTML::TokeParser), and I would urge you to have
a look at them. If you hit any snags, come back with what you have tried and
we'll see how we go from there.
Best of luck.
 
Alont

I'll try what you suggest, thank you :)
 
Jim Keenan

The keys to building a regex like this are: (1) use the 's' modifier
so that '.' also matches '\n'; (2) use the 'x' modifier so that you can
put whitespace and comments inside the pattern; (3) build up the match
incrementally. I built up the successful match using the commented-out
lines below beginning with 'if'.
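
As a small illustration of points (1) and (2) on their own, the sketch below shows '.' failing to cross a newline without 's', succeeding with it, and a pattern laid out with whitespace and comments under 'x'. The sample text and variable names are made up for the example and are not from the original files.

```perl
use strict;
use warnings;

# Hypothetical sample text spanning two lines.
my $text = "start\nend";

# Without /s, '.' does not match "\n", so the match cannot cross lines.
my $without_s = $text =~ /start.*end/  ? 1 : 0;

# With /s, '.' matches any character, including "\n".
my $with_s    = $text =~ /start.*end/s ? 1 : 0;

# With /x, unescaped whitespace and '#' comments in the pattern are ignored,
# so a literal space or newline has to be written as \s (or escaped).
my $with_sx = $text =~ m{
    start   # first line
    .*      # anything in between, newlines included because of the s modifier
    end     # last line
}sx ? 1 : 0;

print "without /s: $without_s\n";
print "with /s:    $with_s\n";
print "with /sx:   $with_sx\n";
```

Only the first match fails; the other two succeed because 's' lets '.*' run across the line break.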

my $str = '<html><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"

text in the middle:
rel="external" target="new">Forum</a></li>
</ul>
</div>
</div>
</div>
';

print $str, "\n";

# Earlier attempts, kept to show how the pattern was built up:
# if ($str =~ s{<html><!DOCTYPE\shtml\sPUBLIC\s"}
# if ($str =~ s{<html><!DOCTYPE\shtml\sPUBLIC\s"-\/\/W3C}
# if ($str =~ s{<html><!DOCTYPE\shtml\sPUBLIC\s"-\/\/W3C\/\/DTD\sXHTML\s1.0\s}
# if ($str =~ s{<html><!DOCTYPE\shtml\sPUBLIC\s"-\/\/W3C\/\/DTD\sXHTML\s1.0\sTransitional\/\/EN"\n}
# failure
# With {} delimiters '/' needs no backslash. '\s*' allows any amount of
# whitespace (including none) between the closing tags, and '.*?' stops at
# the first place where the end of the block matches.
if ($str =~ s{<html><!DOCTYPE\shtml\sPUBLIC\s"-//W3C//DTD\sXHTML\s1\.0\sTransitional//EN"\s
.*?\s
rel="external"\starget="new">Forum</a></li>\s*
</ul>\s*
</div>\s*
</div>\s*
</div>\s*
} # end of pattern to be matched
{<!-- #include virtual="/Head.inc" -->}sx # text to be substituted
# modifiers: 's' lets '.' match \n, 'x' ignores whitespace and comments in the pattern
) # end of 'if' condition
{
    print "Success! String is now:\n";
    print "$str\n";
} else {
    print "Failure\n";
}
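
Since the question mentions many HTML files, here is a rough sketch of how the same substitution might be applied to each file named on the command line. The helper name replace_header is made up for illustration, and the pattern assumes that ">Forum</a></li>" followed by the closing tags first appears at the end of the header block. The script overwrites files in place, so try it on copies first.

```perl
use strict;
use warnings;

# Hypothetical helper: replace the header block in one HTML string.
# Returns the new text and a flag saying whether anything was replaced.
sub replace_header {
    my ($html) = @_;
    my $replaced = $html =~ s{
        <html><!DOCTYPE\s+html\s+PUBLIC      # start of the block
        .*?                                  # shortest run up to ...
        >Forum</a></li>                      # ... the last link
        \s*</ul> \s*</div> \s*</div> \s*</div>
    }{<!-- #include virtual="/Head.inc" -->}sx;
    return ($html, $replaced ? 1 : 0);
}

# Read each file named on the command line in one piece, then rewrite it
# only if the block was found.
for my $file (@ARGV) {
    open my $in, '<', $file or die "Cannot read $file: $!";
    my $html = do { local $/; <$in> };   # undefined $/ reads the whole file
    close $in;

    my ($new, $changed) = replace_header($html);
    next unless $changed;

    open my $out, '>', $file or die "Cannot write $file: $!";
    print {$out} $new;
    close $out;
}
```

Reading the whole file into one string matters here: a line-by-line loop would never see the start and the end of the block in the same string, so the 's' modifier would have nothing to span.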
 
