Need a Module Similar to lynx in Perl


Market Mutant

I used to just parse the HTML by hand to get what I want, but my current
project's download has a table. I looked at it in Lynx and it is super simple,
but when I look at the file's HTML source, I get a headache. I know I can call
lynx -dump from Perl, but I need something I can run on both Windows and Linux
without using lynx. Is there any module whose output is exactly like lynx's dump?
 

Walter Roberson

You probably want to use the LWP module.
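A minimal sketch of that suggestion, assuming libwww-perl (LWP) is installed from CPAN. It fetches the raw HTML of a URL given on the command line. Note that LWP only downloads the page; unlike lynx -dump it does not render tables, so the HTML still has to be turned into text afterwards. The subroutine name fetch_html is made up for this example, and https URLs additionally need LWP::Protocol::https.

```perl
#!/usr/bin/perl
# Sketch: fetch a page's raw HTML with LWP instead of shelling out to lynx.
# Assumes libwww-perl is installed; https URLs also need LWP::Protocol::https.
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical helper: return the decoded body of $url, or die with the
# HTTP status line if the request fails.
sub fetch_html {
    my ($url) = @_;
    my $ua  = LWP::UserAgent->new(timeout => 30);
    my $res = $ua->get($url);
    die "GET $url failed: ", $res->status_line, "\n" unless $res->is_success;
    return $res->decoded_content;   # body with charset/compression undone
}

# Usage: perl fetch http://www.example.com/page.html > page.html
if (@ARGV) {
    binmode STDOUT, ':encoding(UTF-8)';   # decoded_content returns characters
    print fetch_html($ARGV[0]);
}
```

The same script runs unchanged on Windows and Linux, since nothing in it depends on an external binary.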
 

Chris

Ew, I totally agree that Lynx makes for some very fine (and simple) web
scraping. When I need that power from both Windows and *nix, I write it
as a Web service (using XML-RPC) and call it from either platform.
Works wonderfully well. This also provides a consistent call interface
and centralizes my code in one location.

Chris
 

Ben Morrow

perldoc -q html

Ben
 

James Willmore

Chris said:

Ew, I totally agree that Lynx makes for some very fine (and simple) web
scraping. When I need that power from both Windows and *nix, I write it
as a Web service (using XML-RPC) and call it from either platform.
Works wonderfully well. This also provides a consistent call interface
and centralizes my code in one location.

Or ... how about just using the LWP module? The OP just wants to get HTML
from a page.

And ... I bet if the OP used Google .... he would have found this to be
the question of the week :)

--
Jim

Copyright notice: all code written by the author in this post is
released under the GPL. http://www.gnu.org/licenses/gpl.txt
for more information.

a fortune quote ...
Never tell a lie unless it is absolutely convenient.
 

Market Mutant

I want something lynx-like, because I have a table to deal with.
HTML::FormatText needs too many other modules,
and there is no good HTML-to-text shit in Perl yet.

I wrote all the code myself just for this project. I hope I can find
something generic for later projects. This really sucks: I have to
write different code using s/// and split for every piece of HTML
I want turned into text.
 

James Willmore

Well .... if you need to parse the HTML, why not journey to your local
neighborhood CPAN and look over the *many* HTML parsing modules
(http://search.cpan.org/ and search for HTML). I believe there is one that
handles HTML tables. You *could* also use Google and search for the *many*
posts on this subject in this newsgroup.

HTH

--
Jim


a fortune quote ...
Bizarreness is the essence of the exotic
 

Joe Smith

Market said:
and there is no good html->text shit in perl yet.

For just text, it is straightforward.

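#!/usr/bin/perl -w
# Name: nohtml Author: (e-mail address removed) 07-Nov-2001
# Purpose: Extracts just the text portions of a document.

use strict;
use HTML::Parser ();   # the module name is case-sensitive: HTML::Parser

# Called for every run of ordinary text; "dtext" means entities
# such as &amp; have already been decoded.
sub text_handler {
    print @_;
}

my $p = HTML::Parser->new(api_version => 3);
$p->handler( text => \&text_handler, "dtext");
# Parse the file named on the command line, or STDIN if none was given.
$p->parse_file(@ARGV ? shift : \*STDIN) || die $!;

1;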
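Extending the same HTML::Parser approach to the OP's table problem is not much more work. The variant below is not part of Joe's script: ordinary text is passed through as in nohtml, while each <tr> is printed as one line with its cells joined by " | ". It assumes only HTML::Parser (the module nohtml already uses); the name html_to_text is hypothetical, and colspan, rowspan and nested tables are not handled, so it only approximates lynx -dump.

```perl
#!/usr/bin/perl
# tabletext: a table-aware variant of nohtml (sketch only).
# Text outside tables is printed as-is; each <tr> becomes one line whose
# <td>/<th> cells are joined with " | ". Nested tables, colspan and rowspan
# are not handled.
use strict;
use warnings;
use HTML::Parser ();

sub html_to_text {
    my ($html) = @_;
    my $out      = '';
    my $in_table = 0;    # depth of open <table> elements
    my ($row, $cell);    # $row: cells of the open <tr>; $cell: text of the open <td>/<th>

    my $flush_cell = sub {
        return unless defined $cell;
        $cell =~ s/^\s+|\s+$//g;    # trim the cell
        $cell =~ s/\s+/ /g;         # collapse internal whitespace
        push @$row, $cell if $row;
        undef $cell;
    };
    my $flush_row = sub {
        $flush_cell->();            # also closes a cell whose </td> was omitted
        return unless $row;
        $out .= "\n" if length $out && $out !~ /\n\z/;   # start rows on a fresh line
        $out .= join(' | ', @$row) . "\n";
        undef $row;
    };

    my $p = HTML::Parser->new(
        api_version => 3,
        start_h => [ sub {
            my ($tag) = @_;
            if    ($tag eq 'table') { $in_table++ }
            elsif ($tag eq 'tr')    { $flush_row->(); $row = [] }
            elsif (($tag eq 'td' || $tag eq 'th') && $row) { $flush_cell->(); $cell = '' }
        }, 'tagname' ],
        end_h => [ sub {
            my ($tag) = @_;
            if    ($tag eq 'td' || $tag eq 'th') { $flush_cell->() }
            elsif ($tag eq 'tr')                 { $flush_row->() }
            elsif ($tag eq 'table') { $flush_row->(); $in_table-- if $in_table }
        }, 'tagname' ],
        text_h => [ sub {
            my ($text) = @_;
            if    (defined $cell) { $cell .= $text }
            elsif (!$in_table)    { $out  .= $text }   # ordinary text, as in nohtml
        }, 'dtext' ],
    );
    $p->ignore_elements(qw(script style));
    $p->parse($html);
    $p->eof;
    $flush_row->();    # in case the document ends inside an unclosed row
    return $out;
}

# Usage: perl tabletext page.html
print html_to_text(do { local $/; <> }) if @ARGV;
```

Because the handlers flush a pending cell or row whenever the next one starts, omitted </td> and </tr> end tags, which are common in real pages, still produce sensible rows.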
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,769
Messages
2,569,580
Members
45,054
Latest member
TrimKetoBoost

Latest Threads

Top