Saving only comments in HTML

Oldbitcollector

*Sorry if this is offtopic*

I'm using LWP::Simple to fetch an HTML page into @page.

I'm trying to find a way to save only information found between
<!-- and --> while discarding everything else.

In other words, I only want the comments from the HTML documents
I'm retrieving. (Strange, huh?) The comments will be multi-line in
most cases, so I also want the CRs.

I've got some nice code for stripping HTML tags
( $line =~ s/<.*?>//g; ) but can someone help me find a way to do this?

Thanks
Oldbitcollector
 
Matt Garrish

Oldbitcollector said:
*Sorry if this is offtopic*

I'm using LWP::Simple to fetch an HTML page into @page.

I'm trying to find a way to save only information found between
<!-- and --> while discarding everything else.

foreach my $commented ($page[0] =~ /<!--(.*?)-->/gs) {
    # do something with commented-out text
}

But you should really be using a parser to extract markup data.
In other words, I only want the comments from the HTML documents
I'm retrieving. (Strange, huh?) The comments will be multi-line in
most cases, so I also want the CRs.

I've got some nice code for stripping HTML tags
( $line =~ s/<.*?>//g; ) but can someone help me find a way to do this?

That's not nice code. See perlfaq 9: "How do I remove html from a string"
for a full explanation.

Matt
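
As a rough illustration of the parser approach recommended above, here is a minimal sketch using HTML::Parser, a CPAN module that is assumed to be installed (it is not part of core Perl). The sample markup in $html and the @comments array are made up for illustration; in the original poster's case the input would be the fetched page.

```perl
use strict;
use warnings;
use HTML::Parser ();

# Hypothetical sample input; in practice this would be the fetched page.
my $html = <<'END_HTML';
<html><body>
<!-- first
comment -->
<p>visible text</p>
<!-- second -->
</body></html>
END_HTML

my @comments;

# With api_version 3, the 'tokens' argspec passes a comment handler an
# array ref holding the text found inside the comment delimiters.
my $parser = HTML::Parser->new(
    api_version => 3,
    comment_h   => [ sub { push @comments, @{ $_[0] } }, 'tokens' ],
);

$parser->parse($html);
$parser->eof;

# Line breaks inside each comment are preserved.
print "[$_]\n" for @comments;
```

Everything outside the comments is simply ignored because no other handlers are registered, which is the "save only the comments" behaviour asked for.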
 
robic0

Oldbitcollector said:
*Sorry if this is offtopic*

I'm using LWP::Simple to fetch an HTML page into @page.

I'm trying to find a way to save only information found between
<!-- and --> while discarding everything else.

foreach my $commented ($page[0] =~ /<!--(.*?)-->/gs) {
                                         ^
                                         . == [^\n]
Make sure $page[0] is not like this:
<!-- Comments all over
the place,
more and more,
still more
-->
 
Gunnar Hjalmarsson

robic0 said:
Matt said:
foreach my $commented ($page[0] =~ /<!--(.*?)-->/gs) {
                                         ^
                                         . == [^\n]
Make sure $page[0] is not like this:
<!-- Comments all over
the place,
more and more,
still more
-->

And why would that make a difference? You'd better take the whole m//
operator into consideration, including the modifiers.
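
To make the point about modifiers concrete: by default `.` matches any character except a newline, and the /s modifier lifts that restriction, so the pattern with /gs does handle multi-line comments. A small sketch, with a made-up $page string standing in for the fetched document:

```perl
use strict;
use warnings;

# Hypothetical page content containing one multi-line comment.
my $page = "<p>intro</p>\n"
         . "<!-- Comments all over\nthe place,\nmore and more,\nstill more\n-->\n"
         . "<p>outro</p>\n";

# Without /s, '.' cannot cross a newline, so the comment is not matched.
my @without_s = $page =~ /<!--(.*?)-->/g;

# With /s, '.' also matches "\n", so the whole comment is captured.
my @with_s = $page =~ /<!--(.*?)-->/gs;

printf "without /s: %d match(es)\n", scalar @without_s;   # 0
printf "with /s:    %d match(es)\n", scalar @with_s;      # 1
print  "captured:$with_s[0]";
```

The /g modifier is what makes the match return every comment in list context; /s only changes what `.` is allowed to match.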
 
robic0

Gunnar Hjalmarsson said:
And why would that make a difference? You'd better take the whole m//
operator into consideration, including the modifiers.

It's too late to look, and what does the s modifier in /gs do,
given $page[0] contains the whole page (even though it might be
stripped of "\n")? If /s modifies ".*", just say so.
 
Gunnar Hjalmarsson

robic0 said:
It's too late to look,

Really? In that case it was too late to post in the first place.

robic0 said:
and what does the s modifier in /gs do,
given $page[0] contains the whole page (even though it might be
stripped of "\n")? If /s modifies ".*", just say so.

perldoc says it just fine.
 
robic0

Gunnar Hjalmarsson said:
perldoc says it just fine.

Just to let you know, still didn't look.
It was only relevant to me in that you asserted it and
incumbent upon yourself to explain it. Otherwise it's not
on record nor relevant to anything posted here and read
by any new Usenet readers.

Another thread turned to ashes...
For future readers on this thread, I contend you don't
know what you're talking about.
 
Gunnar Hjalmarsson

robic0 said:
Just to let you know, still didn't look.
It was only relevant to me in that you asserted it and
incumbent upon yourself to explain it.

Scarcely. You posted an incorrect comment on Matt's suggestion, and I
pointed out that you were wrong. When posting answers, or criticizing
answers posted by others, you'd better make efforts to get it right. If
you don't understand that, you may just as well go away.

<further rambling snipped>
 
Bart Van der Donck

robic0 said:
For future readers on this thread, I contend you don't
know what you're talking about.

Wrong again. Future readers will not accept anything you claim, unless
it is technically accurate.

This thread points out clearly that Gunnar's expertise stands far above
yours. He knows exactly what he's talking about. I doubt you do.
 
robic0

Gunnar Hjalmarsson said:
Scarcely. You posted an incorrect comment on Matt's suggestion, and I
pointed out that you were wrong. When posting answers, or criticizing
answers posted by others, you'd better make efforts to get it right. If
you don't understand that, you may just as well go away.

<further rambling snipped>

Listen jerk, your qualification without explanation won't fly.
You expanded the parameters of the discussion. If you think
you're God then why don't you just fuckin cure cancer.
You're not responding on the level of the question. Perldoc
statements make you a useless son of a bitch, biotch........
 
robic0

Wrong again. Future readers will not accept anything you claim, unless
it is technically accurate.

This thread points out clearly that Gunnar's expertise stands far above
yours. He knows exactly what he's talking about. I doubt you do.
i doubt you know how many balls you got asshole
 
Matt Garrish

robic0 said:
Listen jerk, your qualification without explanation won't fly.
You expanded the parameters of the discussion. If you think
you're God then why don't you just fuckin cure cancer.
You're not responding on the level of the question. Perldoc
statements make you a useless son of a bitch, biotch........

You're the only imbecile here for making such a stupid assertion and then
trying to stick to it. If you don't understand regex modifiers, which you
obviously don't, then don't attempt to correct valid regular expressions
that use them. Maybe over recess you should take the time to read perlre.

Matt
 
robic0

Matt Garrish said:
You're the only imbecile here for making such a stupid assertion and then
trying to stick to it. If you don't understand regex modifiers, which you
obviously don't, then don't attempt to correct valid regular expressions
that use them. Maybe over recess you should take the time to read perlre.

Why don't you put it "ALL" down in print for the record so anybody
that reads the archives will know exactly wtf you are talking about.
That's right mother fucker. Just take out your handy fuckin keyboard
and write down the faux pas from beginning to fucking end.
Use some quotes too asshole, we want to find out just what the ****
all you jackoffs are ranting about.
 
Gunnar Hjalmarsson

robic0 said:
... we want to find out just what the ****
all you jackoffs are ranting about.

"We"? Some grammatical errors do upset me, like when somebody refers to
himself in plural.
 
Matt Garrish

On Thu, 01 Dec 2005 12:41:48 +0100, Gunnar Hjalmarsson

robic0 said:
Why don't you put it "ALL" down in print for the record so anybody
that reads the archives will know exactly wtf you are talking about.

For the record: robic0 does not understand regular expressions. When you see
a reply from him in regards to any question you might have about regular
expressions, ignore said response.

Matt
 
Gunnar Hjalmarsson

Matt said:
For the record: robic0 does not understand regular expressions. When you see
a reply from him in regards to any question you might have about regular
expressions, ignore said response.

s/ about regular.+?expressions//s;
 
robic0

Matt Garrish said:
For the record: robic0 does not understand regular expressions. When you see
a reply from him in regards to any question you might have about regular
expressions, ignore said response.

My IQ is 170. I think I will refer to you as a droit motherfuckin
asshole.

Is that ok with you?
 
Matt Garrish

robic0 said:
My IQ is 170. I think I will refer to you as a droit motherfuckin
asshole.

Trouble with the decimal place? That happens when your IQ is 17. Perhaps you
might care to look up droit in a dictionary over lunch break, and then work
on figuring out exactly what saying you thought you understood but were
incapable of repeating...

Matt
 
