parse inside of html tags

jjliu

Could someone tell me how to parse the contents of HTML tags with Perl, such as

<meta> </meta>
<head> </head>
<title> </title>
.........


Thanks
 
Peter Mahnke

Depends on whether you are looking for very exact tags (i.e. <table
colspan="2"> is the start of something you are interested in) or
something looser (i.e. any tag that starts with <meta).

However, a good generic test is:

$contents = $1 if ($_ =~ />(.[^<\/]*)<\//);

This will find the contents of a set of tags like the ones you mentioned;
it looks for everything from a > until a </

However, this doesn't work well for finding lots of tags on a single
line.

Then you either need to split the file on > instead of the standard \n,

or you can do something like this:

while (/>(.[^<\/]*)<\//g) {
    push @tagContents, $1;
}

I hope this helps.

Peter
 
nobull

jjliu said:
Could someone tell me how to parse the contents of HTML tags with Perl, such as

<meta> </meta>
<head> </head>
<title> </title>
........


Thanks

Use an HTML parser module. Get one in the usual place (see FAQ).

This newsgroup does not exist (see FAQ). Please do not start threads here.
 
Eric J. Roode

Peter Mahnke wrote:
Depends on whether you are looking for very exact tags (i.e. <table
colspan="2"> is the start of something you are interested in) or
something looser (i.e. any tag that starts with <meta).

However, a good generic test is:

$contents = $1 if ($_ =~ />(.[^<\/]*)<\//);

This will find the contents of a set of tags like the ones you mentioned;
it looks for everything from a > until a </

No, it looks for everything from a > up to the next / or <, and if what
follows at that point is not a </ it fails.
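
To make that concrete, here is a small sketch that runs the quoted pattern, unchanged, over a few made-up one-line inputs. The sub name tag_contents and the sample strings are assumptions for illustration only; they do not come from the original post.

```perl
use strict;
use warnings;

# Collect every capture of the quoted pattern in one string.
# (tag_contents is a hypothetical helper name.)
sub tag_contents {
    my ($html) = @_;
    # In list context, m//g returns all $1 captures found in the string.
    my @found = $html =~ />(.[^<\/]*)<\//g;
    return @found;
}

# Hypothetical sample inputs, chosen to show where the pattern holds
# and where it breaks.
my @samples = (
    '<title>Home page</title>',              # plain text between tags
    '<a href="x">http://example.com</a>',    # slash inside the content
    '<b>bold <i>both</i></b>',               # nested tags
    '<td>1</td><td>2</td>',                  # adjacent tags, no whitespace
);

for my $html (@samples) {
    my @found = tag_contents($html);
    my $shown = @found ? join(' | ', map { "[$_]" } @found) : '(no match)';
    printf "%-42s => %s\n", $html, $shown;
}
```

With these samples, only the first input gives the expected result. The URL is not matched at all because of the / in the content, the nested case yields only the inner "both", and the second table cell comes back as "<td>2" because the leading . is allowed to match a <.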

Regular expressions are a poor way to parse a structured, hierarchical
markup such as HTML or XML. You really have to use a parser module for all
but the most trivial cases. (I usually use HTML::TokeParser for HTML, and
XML::DOM for XML. YMMV).
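
For comparison, a minimal sketch of the parser-module approach using HTML::TokeParser. The module comes from CPAN (the HTML-Parser distribution), not from core Perl, so it has to be installed first. The helper name tag_text and the sample document are assumptions made up for this illustration.

```perl
use strict;
use warnings;
use HTML::TokeParser;    # CPAN module, part of the HTML-Parser distribution

# Return the text of the first occurrence of each requested tag.
# (tag_text is a hypothetical helper name.)
sub tag_text {
    my ($html, @wanted) = @_;
    my %text;
    for my $tag (@wanted) {
        # A fresh parser per tag, so each search starts at the top.
        my $p = HTML::TokeParser->new(\$html)
            or die "Cannot set up parser";
        next unless $p->get_tag($tag);    # skip ahead to the start tag
        # Collect text up to the matching end tag, whitespace collapsed.
        $text{$tag} = $p->get_trimmed_text("/$tag");
    }
    return %text;
}

# Hypothetical sample document.
my $html = <<'HTML';
<html><head><title> Example page </title></head>
<body><h1>Heading with <a href="http://example.com/">a link</a></h1></body></html>
HTML

my %found = tag_text($html, qw(title h1));
print "$_: $found{$_}\n" for sort keys %found;
```

Unlike the regex, this copes with the slash in the URL and with the nested <a> tag, because the tokenizer tracks tags for you instead of guessing from > and </.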

For future reference, comp.lang.perl is a defunct newsgroup. General Perl
questions should be posted to comp.lang.perl.misc where they are likely to
get a better response.

--
Eric
$_ = reverse sort $ /. r , qw p ekca lre uJ reh
ts p , map $ _. $ " , qw e p h tona e and print

 
