Want to extract the proxy list by using regexp.

Discussion in 'Perl Misc' started by Hongyi Zhao, Jan 29, 2009.

  1. Hongyi Zhao

    Hongyi Zhao Guest

    Hi all,

    I want to extract the proxy list given in the following url:

    http://www.cybersyndrome.net/pla5.html

    which is in the following form:

    ---------------
    [snipped]

    202.99.29.27:80
    221.11.27.110:8080
    ip-72-55-191-6.static.privatedns.com:3128
    114.30.47.10:80
    116.52.155.237:80
    204.73.37.112:80
    220.227.90.154:8080
    211.136.253.234:80
    host04.wilsonareasdips.w.subnet.rcn.com:8080

    [snipped]
    -----------------

    First, I use wget to obtain the above webpage:

    wget -c http://www.cybersyndrome.net/pla5.html -O pla5

    Then I want to use some regular expressions to extract the proxy list.
    Can anyone give me some hints?

    Regards,

    --
    ..: Hongyi Zhao [ hongyi.zhao AT gmail.com ] Free as in Freedom :.
     
    Hongyi Zhao, Jan 29, 2009
    #1

  2. Tad J McClellan

    Hongyi Zhao wrote:


    > I want to extract the proxy list given in the following url:
    >
    > http://www.cybersyndrome.net/pla5.html



    > Then I want to use some regular expressions to extract the proxy list,
    > who can give me some hints?



    Regular expressions are most often not the Right Tool for processing
    HTML data.

    A module that understands HTML is best for processing HTML data.


    ------------------------------
    #!/usr/bin/perl
    use warnings;
    use strict;
    use HTML::TreeBuilder;
    use LWP::Simple;

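    # get() returns undef on failure, so stop early if the download fails.
    my $html = get('http://www.cybersyndrome.net/pla5.html')
        or die "Could not fetch the proxy page\n";
    my $tree = HTML::TreeBuilder->new_from_content($html);

    # On this page the proxy entries are the elements with onmouseout="d()".
    foreach my $elem ( $tree->find_by_attribute('onmouseout', 'd()') ) {
        print $elem->as_text, "\n";
    }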
    ------------------------------
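
Once the text of each element has been pulled out with as_text, a regular expression is a reasonable fit for checking that each string really looks like host:port. The following is only a minimal sketch of that second step: the helper name extract_proxies and the exact pattern are assumptions for illustration, not part of the script above, and the sample strings are taken from the listing in the question.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical helper: keep only the strings shaped like "host:port".
# The host is a dotted name or IPv4-style address; the port is 1 to 65535.
sub extract_proxies {
    my @candidates = @_;
    my @proxies;
    foreach my $text (@candidates) {
        # Trim whitespace that may surround the text taken from the HTML.
        $text =~ s/^\s+|\s+$//g;
        # Host: labels of letters, digits, and hyphens joined by dots.
        if ( $text =~ /^([A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+):(\d{1,5})$/ ) {
            push @proxies, "$1:$2" if $2 >= 1 && $2 <= 65535;
        }
    }
    return @proxies;
}

# Sample entries from the listing quoted in the question.
my @sample = (
    '202.99.29.27:80',
    ' ip-72-55-191-6.static.privatedns.com:3128 ',
    '[snipped]',
    'host04.wilsonareasdips.w.subnet.rcn.com:8080',
);

print "$_\n" for extract_proxies(@sample);
```

With the tree from the script above, the same helper could be fed with something like map { $_->as_text } $tree->find_by_attribute('onmouseout', 'd()'), assuming the page still marks its entries that way.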


    --
    Tad McClellan
    email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"
     
    Tad J McClellan, Jan 29, 2009
    #2

  3. Hongyi Zhao

    Hongyi Zhao Guest

    On Thu, 29 Jan 2009 06:50:36 -0600, Tad J McClellan wrote:

    >Hongyi Zhao wrote:
    >
    >
    >> I want to extract the proxy list given in the following url:
    >>
    >> http://www.cybersyndrome.net/pla5.html

    >
    >
    >> Then I want to use some regular expressions to extract the proxy list,
    >> who can give me some hints?

    >
    >
    >Regular expressions are most often not the Right Tool for processing
    >HTML data.
    >
    >A module that understands HTML is best for processing HTML data.
    >
    >
    >------------------------------
    >#!/usr/bin/perl
    >use warnings;
    >use strict;
    >use HTML::TreeBuilder;
    >use LWP::Simple;
    >
    >my $html = get 'http://www.cybersyndrome.net/pla5.html';
    >my $tree = HTML::TreeBuilder->new_from_content($html);
    >
    >foreach my $elem ( $tree->find_by_attribute('onmouseout', 'd()') ) {
    > print $elem->as_text, "\n";
    >}
    >------------------------------


    Very good, thanks a lot.

    --
    ..: Hongyi Zhao [ hongyi.zhao AT gmail.com ] Free as in Freedom :.
     
    Hongyi Zhao, Jan 29, 2009
    #3
