Pen Ttt
HTMLRegexp = /(<!--.*?--\s*>)|
              (<(?:[^"'>]*|"[^"]*"|'[^']*')+>)|
              ([^<]*)/xm
data = DATA.read
data.scan(HTMLRegexp) {|match|
  comment, tag, tdata = match[0..2]
  if comment
    p ["Comment", comment]
  elseif tag
    p ["Tag", tag]
  elseif tdata
    tdata.gsub!(/\s+/, "")
    tdata.sub!(/ $/, "")
    p ["TextData", tdata] unless tdata.empty?
  end
}
_END_
<!DOCTYPE HTML>
<HTML>
<BODY>
< A name="FOO" href="foo" attr >foo</A>
< A name="BAR" href="bar" attr >bar</A>
< A name=BAZ href=baz attr >baz</A>
<!--
<A href="dummy">dummy</A>
-->
<BODY>
</HTML>
When I run it, the output is:
syntax error, unexpected '<', expecting $end
<!DOCTYPE HTML>
^
What's the problem? How can I solve it?
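A likely cause, assuming the script is exactly as posted: Ruby only treats a line consisting of __END__ (two underscores on each side) as the end of the program and the start of the DATA section. With _END_ the parser sees an ordinary identifier and keeps reading the HTML as Ruby code, which would explain the error pointing at <!DOCTYPE HTML>. Separately, elseif is not a Ruby keyword (the keyword is elsif), so those branches would be parsed as method calls. The sketch below shows the same scan with elsif. The names HTML_REGEXP, tokenize and SAMPLE are made up for illustration, and a heredoc stands in for the DATA section so the snippet runs on its own.

```ruby
# Same three-way pattern as in the post: comment | tag | text data.
# The /x flag lets the pattern span several lines; /m lets "." match newlines.
HTML_REGEXP = /(<!--.*?--\s*>)|
               (<(?:[^"'>]*|"[^"]*"|'[^']*')+>)|
               ([^<]*)/xm

# Hypothetical helper: returns [kind, text] pairs instead of printing them.
def tokenize(html)
  tokens = []
  html.scan(HTML_REGEXP) do |comment, tag, tdata|
    if comment
      tokens << ["Comment", comment]
    elsif tag # "elsif", not "elseif"
      tokens << ["Tag", tag]
    elsif tdata
      # Whitespace handling kept as in the post.
      text = tdata.gsub(/\s+/, "").sub(/ $/, "")
      tokens << ["TextData", text] unless text.empty?
    end
  end
  tokens
end

# Stand-in for the DATA section. In the original script, keep DATA.read
# and change the marker line from _END_ to __END__.
SAMPLE = <<~HTML
  <!DOCTYPE HTML>
  <A name="FOO" href="foo">foo</A>
  <!-- <A href="dummy">dummy</A> -->
HTML

tokenize(SAMPLE).each { |token| p token }
```

In the posted script, the smallest change would be to replace the _END_ line with __END__ and both elseif lines with elsif; the rest can stay as it is.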