Need a Module Similar to lynx in Perl

Discussion in 'Perl Misc' started by Market Mutant, Jan 18, 2004.

  1. I used to just decode the HTML to get what I want, but my current project's
    download has a table. I looked at it in Lynx and it is super simple, but when
    I open the file as HTML, I get a headache. I know I can call lynx -dump from
    Perl, but I need something which I can run on both Windows and Linux
    without using lynx. Is there any module whose output is exactly like lynx's dump?
     
    Market Mutant, Jan 18, 2004
    #1

  2. In article <1eqOb.59367$>,
    Market Mutant <> wrote:
    :I used to just decode the HTML to get what I want, but my current project's
    :download has a table. I looked at it in Lynx and it is super simple, but when
    :I open the file as HTML, I get a headache. I know I can call lynx -dump from
    :Perl, but I need something which I can run on both Windows and Linux
    :without using lynx. Is there any module whose output is exactly like lynx's dump?

    You probably want to use the LWP module.
    --
    Beware of bugs in the above code; I have only proved it correct,
    not tried it. -- Donald Knuth
     
    Walter Roberson, Jan 18, 2004
    #2

  3. Chris Guest

    Market Mutant wrote:
    > I used to just decode the HTML to get what I want, but my current project's
    > download has a table. I looked at it in Lynx and it is super simple, but when
    > I open the file as HTML, I get a headache. I know I can call lynx -dump from
    > Perl, but I need something which I can run on both Windows and Linux
    > without using lynx. Is there any module whose output is exactly like lynx's dump?
    >
    >


    Ew, I totally agree that Lynx makes for some very fine (and simple) web
    scraping. When I need that power from both Windows and *nix, I write it
    as a Web service (using XML-RPC) and call it from either platform.
    Works wondrously well. This also provides a consistent call interface
    and centralizes my code in one location.

    Chris
    -----
    Chris Olive
    chris (-at-) technologEase (-dot-) com
    http://www.technologEase.com
    (pronounced "technologies")
     
    Chris, Jan 18, 2004
    #3
  4. Ben Morrow Guest

    "Market Mutant" <> wrote:
    > I used to just decode the HTML to get what I want, but my current project's
    > download has a table. I looked at it in Lynx and it is super simple, but when
    > I open the file as HTML, I get a headache. I know I can call lynx -dump from
    > Perl, but I need something which I can run on both Windows and Linux
    > without using lynx. Is there any module whose output is exactly like lynx's dump?


    perldoc -q html

    Ben

    --
    don't get my sympathy hanging out the 15th floor. you've changed the locks 3
    times, he still comes reeling though the door, and soon he'll get to you, teach
    you how to get to purest hell. you do it to yourself and that's what really
    hurts is you do it to yourself just you, you and noone else *
     
    Ben Morrow, Jan 18, 2004
    #4
  5. On Sun, 18 Jan 2004 16:10:10 +0000, Chris wrote:

    > Market Mutant wrote:
    >> I used to just decode the HTML to get what I want, but my current project's
    >> download has a table. I looked at it in Lynx and it is super simple, but when
    >> I open the file as HTML, I get a headache. I know I can call lynx -dump from
    >> Perl, but I need something which I can run on both Windows and Linux
    >> without using lynx. Is there any module whose output is exactly like lynx's dump?
    >>
    >>

    >
    > Ew, I totally agree that Lynx makes for some very fine (and simple) web
    > scraping. When I need that power from both Windows and *nix, I write it
    > as a Web service (using XML-RPC) and call it from either platform.
    > Works wondrously well. This also provides a consistent call interface
    > and centralizes my code in one location.


    Or ... how about just using the LWP module? The OP just wants to get HTML
    from a page.

    And ... I bet if the OP used Google .... he would have found this to be
    the question of the week :)

    --
    Jim

    Copyright notice: all code written by the author in this post is
    released under the GPL. http://www.gnu.org/licenses/gpl.txt
    for more information.

    a fortune quote ...
    Never tell a lie unless it is absolutely convenient.
     
    James Willmore, Jan 18, 2004
    #5
  6. I want something lynx-like, because I have a table to deal with.
    HTML::FormatText needs too many other modules
    and there is no good html->text shit in perl yet.

    I am writing all the code myself just for this project. I hope I can find
    something generic for later projects. This really sucks. I have to
    write different code using s/// and split for every piece of HTML that
    needs to be turned into text.
     
    Market Mutant, Jan 19, 2004
    #6
  7. On Mon, 19 Jan 2004 09:01:51 +0000, Market Mutant wrote:

    > I want something lynx-like, because I have a table to deal with.
    > HTML::FormatText needs too many other modules
    > and there is no good html->text shit in perl yet.
    >
    > I am writing all the code myself just for this project. I hope I can find
    > something generic for later projects. This really sucks. I have to
    > write different code using s/// and split for every piece of HTML that
    > needs to be turned into text.


    Well .... if you need to parse the HTML, why not journey to your local
    neighborhood CPAN and look over the *many* HTML parsing modules
    (http://search.cpan.org/ and search for HTML). I believe there is one that
    handles HTML tables. You *could* also use Google and search for the *many*
    posts on this subject in this newsgroup.
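
    One module of that kind is HTML::TableExtract (picked here only as an
    example; the post does not name a module). A minimal sketch, assuming it
    is installed from CPAN. The helper name tables_to_text and the tab
    separator are illustrative choices, and the output is not meant to match
    lynx -dump exactly:

    ```perl
    #!/usr/bin/perl
    use strict;
    use warnings;
    use HTML::TableExtract;

    # Turn every table in an HTML string into tab-separated lines of text.
    # (tables_to_text is a hypothetical helper, not part of the module.)
    sub tables_to_text {
        my ($html) = @_;
        my $te = HTML::TableExtract->new;   # no constraints: match all tables
        $te->parse($html);
        $te->eof;

        my @lines;
        for my $table ($te->tables) {
            for my $row ($table->rows) {
                # Empty cells come back undefined; treat them as empty strings.
                my @cells = map { defined $_ ? $_ : '' } @$row;
                s/^\s+|\s+$//g for @cells;  # trim surrounding whitespace
                push @lines, join("\t", @cells);
            }
        }
        return @lines;
    }

    # Only runs when a file name is given on the command line.
    if (@ARGV) {
        my $html = do { local $/; <> };     # slurp the named file(s)
        print "$_\n" for tables_to_text($html);
    }
    ```

    Saved as, say, tables.pl, it would be run as "perl tables.pl page.html"
    on either Windows or Linux, with no dependency on lynx.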

    HTH

    --
    Jim

    Copyright notice: all code written by the author in this post is
    released under the GPL. http://www.gnu.org/licenses/gpl.txt
    for more information.

    a fortune quote ...
    Bizarreness is the essence of the exotic
     
    James Willmore, Jan 19, 2004
    #7
  8. Joe Smith Guest

    Market Mutant wrote:

    > and there is no good html->text shit in perl yet.


    For just text, it is straightforward.

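    #!/usr/bin/perl -w
    # Name: nohtml Author: 07-Nov-2001
    # Purpose: Extracts just the text portions of a document.

    use strict;
    use HTML::Parser ();    # module names are case-sensitive: Parser, not parser

    sub text_handler {      # Ordinary text, with entities already decoded
        print @_;
    }

    my $p = HTML::Parser->new(api_version => 3);
    $p->handler( text => \&text_handler, "dtext");

    # Parse the named file, or standard input when no file name is given.
    my $file = shift;
    $p->parse_file(defined $file ? $file : \*STDIN) || die $!;

    1;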
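
    The same handler approach can be stretched to cope with tables. What
    follows is only a sketch under my own assumptions, not a lynx -dump
    replacement: html_to_text is a made-up name, cells are joined with tabs,
    each table row becomes one line, and script/style bodies are dropped.

    ```perl
    #!/usr/bin/perl
    use strict;
    use warnings;
    use HTML::Parser ();

    # Convert HTML to plain text: one line per table row, cells tab-separated.
    # (html_to_text is a hypothetical helper written for this example.)
    sub html_to_text {
        my ($html) = @_;
        my $out   = '';
        my $cells = 0;    # cells seen so far in the current table row

        # Closing one of these tags ends the current output line.
        my %line_break = map { $_ => 1 } qw(tr p div li h1 h2 h3 h4 h5 h6);

        my $p = HTML::Parser->new(
            api_version => 3,
            start_h => [ sub {
                my ($tag) = @_;
                if    ($tag eq 'tr') { $cells = 0 }
                elsif ($tag eq 'td' || $tag eq 'th') { $out .= "\t" if $cells++ }
                elsif ($tag eq 'br') { $out .= "\n" }
            }, 'tagname' ],
            end_h => [ sub {
                my ($tag) = @_;
                $out .= "\n" if $line_break{$tag};
            }, 'tagname' ],
            text_h => [ sub {
                my ($text) = @_;
                $text =~ s/\s+/ /g;     # collapse runs of whitespace
                $out .= $text;
            }, 'dtext' ],
        );
        $p->ignore_elements(qw(script style));  # skip script and CSS bodies
        $p->parse($html);
        $p->eof;

        $out =~ s/ *([\t\n]) */$1/g;    # trim spaces around separators
        $out =~ s/\n{2,}/\n/g;          # squeeze blank lines
        $out =~ s/\A\s+//;              # drop leading whitespace
        return $out;
    }

    # Only runs when a file name is given on the command line.
    print html_to_text(do { local $/; <> }) if @ARGV;
    ```

    It needs nothing beyond HTML::Parser itself, which keeps the module
    count down. Column alignment, nested tables and link numbering are all
    left out.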
     
    Joe Smith, Jan 20, 2004
    #8