Want to extract the proxy list by using regexp.

Hongyi Zhao · Jan 29, 2009

Hi all,

I want to extract the proxy list given at the following URL:

http://www.cybersyndrome.net/pla5.html

which is in the following form:

---------------
[snipped]

202.99.29.27:80
221.11.27.110:8080
ip-72-55-191-6.static.privatedns.com:3128
114.30.47.10:80
116.52.155.237:80
204.73.37.112:80
220.227.90.154:8080
211.136.253.234:80
host04.wilsonareasdips.w.subnet.rcn.com:8080

[snipped]
-----------------

First, I use wget to fetch the page above:

wget -c http://www.cybersyndrome.net/pla5.html -O pla5

Then I want to use regular expressions to extract the proxy list.
Can anyone give me some hints?

Regards,

Tad J McClellan · Jan 29, 2009

> I want to extract the proxy list given at the following URL:
>
> http://www.cybersyndrome.net/pla5.html
>
> Then I want to use regular expressions to extract the proxy list.
> Can anyone give me some hints?

Regular expressions are most often not the Right Tool for processing
HTML data.

A module that understands HTML is best for processing HTML data.

------------------------------
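#!/usr/bin/perl
use warnings;
use strict;
use HTML::TreeBuilder;
use LWP::Simple;

# Fetch the page; LWP::Simple's get() returns undef on failure.
my $url  = 'http://www.cybersyndrome.net/pla5.html';
my $html = get($url);
die "Could not fetch $url\n" unless defined $html;

my $tree = HTML::TreeBuilder->new_from_content($html);

# Select the elements whose onmouseout attribute is 'd()';
# on this page those hold the proxy entries.
foreach my $elem ( $tree->find_by_attribute('onmouseout', 'd()') ) {
    print $elem->as_text, "\n";
}

$tree->delete;    # free the parse tree
------------------------------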
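
A regular expression is a better fit for a later step: checking the plain-text entries once a parser has pulled them out of the HTML. Here is a minimal sketch, assuming each entry has the host:port form shown in the original post. parse_proxy is a hypothetical helper name, not part of any module, and the host pattern is deliberately loose rather than a full hostname validator.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical helper: split one "host:port" entry into its parts.
# Returns an empty list when the text does not look like a proxy entry.
sub parse_proxy {
    my ($text) = @_;
    return unless defined $text;

    # Host: letters, digits, dots and hyphens; port: 1 to 5 digits.
    if ( $text =~ /\A\s*([A-Za-z0-9][A-Za-z0-9.-]*):(\d{1,5})\s*\z/ ) {
        my ( $host, $port ) = ( $1, $2 );
        return ( $host, $port ) if $port >= 1 && $port <= 65535;
    }
    return;
}

# Sample entries copied from the listing in the original post,
# plus one line that should be rejected.
my @entries = (
    '202.99.29.27:80',
    'ip-72-55-191-6.static.privatedns.com:3128',
    '[snipped]',
);

foreach my $entry (@entries) {
    my ( $host, $port ) = parse_proxy($entry);
    next unless defined $host;    # skip lines that are not host:port
    print "$host\t$port\n";
}
```

In the script above, the same check could be applied to each $elem->as_text value before printing it, so that stray non-proxy text is skipped.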

Hongyi Zhao · Jan 29, 2009

Very good, thanks a lot.
