Asoup said:
Jürgen Exner said:
Asoup wrote: [...]
The right move. Had you read the FAQ or any of the previous threads
about this very subject you would have found that solution much
earlier.
Actually, yes, I think that people here would rather argue than
actually help. So I did read some documentation on CPAN. And I am
going to study the HTML::Element module closely.
I've no idea what HTML::Element does, but I wonder why you persistently
resist looking at HTML::Parser as suggested by several people.
Here is what I have right now:
#!/usr/bin/perl
use lib '/perl/lib';
use LWP::Simple;
use HTML::TreeBuilder;
I haven't used HTML::TreeBuilder, so I can't comment on that.
[code snipped]
# It just removes the tags, but now I don't know how to sort and
# *grab* the text I need and remove the rest...
Well, I suppose after you removed the tags there is nothing left to help you
identify the desired parts. So grab the right text _before_ removing the
tags, i.e. while you still have the syntax tree or whatever
HTML::TreeBuilder returns.
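A minimal sketch of "grab the text before removing the tags", assuming the HTML::TreeBuilder module from CPAN's HTML-Tree distribution (not core Perl). The helper grab_text and the sample HTML are hypothetical. It parses the page into a tree, collects the text of every element with a given tag name via look_down and as_trimmed_text, and only then deletes the tree:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;   # CPAN, HTML-Tree distribution (assumed installed)

# Hypothetical helper: build the tree, pull out the text of the elements
# we care about, and only then discard the tree (and all its tags).
sub grab_text {
    my ($html, $tag) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);
    # In list context look_down returns every matching element in document order
    my @text = map { $_->as_trimmed_text } $tree->look_down(_tag => $tag);
    $tree->delete;   # explicitly free the tree once we are done with it
    return @text;
}

# Hypothetical sample page
my $html = '<html><head><title>Quotes</title></head>'
         . '<body><p>Keep me</p><div>Drop me</div><p>Me too</p></body></html>';

print "$_\n" for grab_text($html, 'p');
```

Picking the tag (or adding attribute conditions to look_down, such as a class value) is what separates the text you want from the rest; the HTML::Parser title example quoted below is the event-driven alternative.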
And once again: the documentation for HTML::Parser already contains an
example for how to extract the body of a <title> element.
<quote>
The next example prints out the text that is inside the <title> element of
an HTML document. Here we start by setting up a start handler. When it sees
the title start tag it enables a text handler that prints any text found and
an end handler that will terminate parsing as soon as the title end tag is
seen:
[...]
More examples are found in the eg/ directory of the HTML-Parser
distribution: the program hrefsub shows how you can edit all links found in
a document; the program htextsub shows how to edit the text only; the