Felix,
The script below tries to convert HTML to text while preserving
textual links. However, with www.news.com it renders many lines
bunched together rather than neatly separated, as lynx does with the
same page (see www.marcfest.com/qxi5/news.cgi for what I mean).
Does anybody know how I can improve the script? Using backticks with
lynx is not an option, by the way.
Thank you very much.
Marc
SCRIPT:
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;
use HTML::TagFilter;

# Fetch the page; LWP::Simple::get returns undef on failure.
my $content = get("http://www.news.com")
    or die "Couldn't fetch http://www.news.com\n";

# Strip comments and all tags except the whitelisted ones.
# 'any' => [] whitelists every attribute on the tag.
my $tf = HTML::TagFilter->new(
    strip_comments => 1,
    allow => {
        a      => { 'any' => [] },
        br     => { 'any' => [] },
        p      => { 'any' => [] },
        script => { 'any' => [] },
        style  => { 'any' => [] },
    },
);

$content = $tf->filter($content);
print $content;
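One likely cause of the bunching is that the surviving <br> and <p> tags carry the page's line structure, but that structure is lost once the output is viewed as plain text. A minimal sketch of one possible fix, using only core Perl regexes (the helper name and the sample HTML below are my own invention, not part of the original script):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Convert a fragment of already-filtered HTML to plain text,
# turning <br> and <p> into real newlines and keeping link URLs
# next to their anchor text.
sub html_to_text_with_breaks {
    my ($html) = @_;
    $html =~ s{<\s*br\s*/?\s*>}{\n}gi;              # <br> / <br/> -> newline
    $html =~ s{<\s*/?p[^>]*>}{\n}gi;                # <p> / </p>   -> newline
    $html =~ s{<a\s[^>]*href=["']?([^"'\s>]+)["']?[^>]*>(.*?)</a>}
              {$2 [$1]}gis;                         # "text [url]"
    $html =~ s{<[^>]+>}{}g;                         # drop remaining tags
    $html =~ s{\n{3,}}{\n\n}g;                      # collapse blank-line runs
    return $html;
}

my $sample = '<p>Top story</p><a href="http://www.news.com/x">Read</a><br>More';
print html_to_text_with_breaks($sample), "\n";
```

You would call this on $content after filtering instead of printing the raw HTML. A regex pass like this is only a rough approximation of what lynx does; for messier real-world pages an HTML parser such as HTML::TreeBuilder would be more robust.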