Need help with parsing data

Shan

So I need code that will go through a list of URLs (formatted as
http://www.google.com) and for each url get the following information:

1. The url after the href= within the following tags <link
rel="alternate" and />

So if there is <link rel="alternate" type="application/atom+xml"
title="Atom" href="http://hello.typepad.com/hello/atom.xml" /> I want
the http://hello.typepad.com/hello/atom.xml


2. everything between the following tags <title> and </title>
so if there is <title>hello, typepad</title> I want hello, typepad

3. everything between the tags <h2 id="banner-description"> and </h2>


4. Finally I would like the results to be saved to a delimited file in
the following format:

column 1: original url
column 2: data obtained from step 1
column 3: data obtained from step 2
column 4: data obtained from step 3

If there is no result for any one of the steps, a null should be saved.


I would like to thank whoever can provide me with the code in advance,
Thank you.
 
DJ Stunks

Shan said:
I would like to thank whoever can provide me with the code in advance,
Thank you.

it is highly unlikely that anyone will do so for a simple "thanks".
check out jobs.perl.org for someone willing to follow orders in return
for compensation.

-jp
 
John Bokma

Shan said:
3. everything between the tags <h2 id="banner-description"> and </h2>


I use HTML::TreeBuilder for this, since it makes life really easy. See
http://johnbokma.com/perl/ for several examples (Web automation).

For example 3. can be done as:

use HTML::TreeBuilder;

my $root = HTML::TreeBuilder->new_from_content( $content );

:
:

# collect the trimmed text of every <h2 id="banner-description"> element
my @column4;
push @column4, $_->as_trimmed_text
    for $root->look_down( _tag => 'h2', id => 'banner-description' );
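
Items 1 and 2 can probably be handled the same way. What follows is only a minimal sketch, not tested against real blogs: the sub names extract_fields and format_row are made up for illustration, the tab delimiter is an assumption (the original post only says "delimited"), and only the first match for each item is kept.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Extract the three requested fields from one page of HTML.
# Returns (feed URL, title, banner description); a missing field becomes 'null'.
sub extract_fields {
    my ($content) = @_;

    my $root = HTML::TreeBuilder->new_from_content($content);

    # 1. href of the first <link rel="alternate" ...> element
    my $link = $root->look_down( _tag => 'link', rel => 'alternate' );
    my $feed = $link ? $link->attr('href') : undef;

    # 2. text of the <title> element
    my $title_el = $root->look_down( _tag => 'title' );
    my $title    = $title_el ? $title_el->as_trimmed_text : undef;

    # 3. text of the <h2 id="banner-description"> element
    my $banner_el = $root->look_down( _tag => 'h2', id => 'banner-description' );
    my $banner    = $banner_el ? $banner_el->as_trimmed_text : undef;

    $root->delete;    # free the parse tree

    return map { ( defined $_ && length $_ ) ? $_ : 'null' } $feed, $title, $banner;
}

# Build one tab-delimited output line: original URL first, then the fields.
sub format_row {
    my ( $url, @fields ) = @_;

    # a tab or newline inside a field would break the row, so collapse them
    s/[\t\r\n]+/ /g for @fields;

    return join( "\t", $url, @fields ) . "\n";
}

# Example with an inline page instead of a fetched one
my $html = <<'HTML';
<html><head><title>hello, typepad</title>
<link rel="alternate" type="application/atom+xml"
title="Atom" href="http://hello.typepad.com/hello/atom.xml" />
</head><body><h2 id="banner-description">a test blog</h2></body></html>
HTML

print format_row( 'http://hello.typepad.com/', extract_fields($html) );
```

For a real list of URLs, the content of each page could be fetched with something like get() from LWP::Simple and each row printed to an output filehandle. A page that cannot be fetched would need its own handling, which this sketch leaves out.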

Shan said:
I would like to thank whoever can provide me with the code in advance,
Thank you.

I can provide the code, and forms to thank me are here:
http://johnbokma.com/wish-list.html

Either Object Oriented Perl or Perl Best Practices would be fine with me
since directly and indirectly you will contribute back to the Perl
community.
 
Tad McClellan

Shan said:
Subject: Need help with parsing data


What part is it that you need help with?


(You should use a module that understands XHTML data if you need
to process XHTML data.)
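
As a rough illustration of why a parsing module helps: the two <link> elements below mean the same thing but differ in attribute order and quoting, which a hand-written pattern anchored on `<link rel="alternate"` would likely miss. This is a sketch assuming HTML::TreeBuilder (the module named elsewhere in this thread); the sub name feed_href is hypothetical.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Return the href of the first <link rel="alternate"> in $html, or 'null'.
sub feed_href {
    my ($html) = @_;

    my $root = HTML::TreeBuilder->new_from_content($html);
    my $link = $root->look_down( _tag => 'link', rel => 'alternate' );
    my $href = $link ? $link->attr('href') : undef;
    $root->delete;    # free the parse tree

    return defined $href ? $href : 'null';
}

# Same element written two ways: attribute order and quote style differ
my @samples = (
    q{<link rel="alternate" type="application/atom+xml" href="http://hello.typepad.com/hello/atom.xml" />},
    q{<link href='http://hello.typepad.com/hello/atom.xml' type="application/atom+xml" rel="alternate">},
);

print feed_href($_), "\n" for @samples;
```

Both samples should yield the same URL, since the parser matches on the attributes themselves and not on how the tag happens to be typed.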

I would like to thank whoever can provide me with the code in advance,


What makes you think that someone will write your program for you?
 
Shan

Thanks for your advice. I will work on writing a script today and see
what kind of results I get.
 
