Need help with parsing data

Discussion in 'Perl Misc' started by Shan, Aug 9, 2006.

  1. Shan

    Shan Guest

    I need code that goes through a list of URLs (formatted as
    http://www.google.com) and, for each URL, gets the following information:

    1. The URL after href= inside a tag that starts with <link
    rel="alternate" and ends with />

    So if there is <link rel="alternate" type="application/atom+xml"
    title="Atom" href="http://hello.typepad.com/hello/atom.xml" /> I want
    the http://hello.typepad.com/hello/atom.xml


    2. Everything between the tags <title> and </title>.
    So if there is <title>hello, typepad</title> I want hello, typepad

    3. everything between the tags <h2 id="banner-description"> and </h2>


    4. Finally, I would like the results to be saved to a delimited file in
    the following format:

    column 1: original url
    column 2: data obtained from step 1
    column 3: data obtained from step 2
    column 4: data obtained from step 3

    If there is no result for any one of the steps, a null should be saved.
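
    The row format in step 4 could be sketched roughly as below. This is
    only an illustration: the tab delimiter and the literal string "null"
    for missing values are assumptions (the post names neither), and
    format_row is a hypothetical helper name.

```perl
use strict;
use warnings;

# format_row is a hypothetical helper: it joins the original URL and the
# three extracted values with tabs, writing "null" for missing ones.
sub format_row {
    my ( $url, @values ) = @_;
    my @cols = map {
        my $v = $_;
        if ( defined $v && length $v ) {
            $v =~ s/[\t\r\n]+/ /g;    # keep each record on a single line
        }
        else {
            $v = 'null';              # assumed marker for "no result"
        }
        $v;
    } @values[ 0 .. 2 ];
    return join "\t", $url, @cols;
}

# Example row: steps 1 and 2 found something, step 3 did not.
print format_row(
    'http://hello.typepad.com/',
    'http://hello.typepad.com/hello/atom.xml',
    'hello, typepad',
    undef,
), "\n";
```

    Replacing tabs and newlines inside a value keeps every record on one
    line, so the file can still be split on the delimiter later.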


    I would like to thank in advance whoever can provide me with the code.
    Thank you.
     
    Shan, Aug 9, 2006
    #1

  2. DJ Stunks

    DJ Stunks Guest

    Shan wrote:
    > So I need code that will go through a list of URLs (formatted as
    > http://www.google.com) and for each url get the following information:
    >
    > 1. The url after the href= within the following tags <link
    > rel="alternate" and />
    >
    > So if there is <link rel="alternate" type="application/atom+xml"
    > title="Atom" href="http://hello.typepad.com/hello/atom.xml" /> I want
    > the http://hello.typepad.com/hello/atom.xml
    >
    >
    > 2. everything between the following tags <title> and </title>
    > so if there is <title>hello, typepad</title> I want hello, typepad
    >
    > 3. everything between the tags <h2 id="banner-description"> and </h2>
    >
    >
    > 4. Finally i would like the results to be saved to a delimited file in
    > the following format:
    >
    > column 1: original url
    > column 2: data obtained from step 1
    > column 3: data obtained from step 2
    > column 4: data obtained from step 3
    >
    > if there is no result for any one of the steps a null should be saved.
    >
    >
    > I would like to thank whoever can provide me with the code in advance,
    > Thank you.


    It is highly unlikely that anyone will do so for a simple "thanks".
    Check out jobs.perl.org for someone willing to follow orders in return
    for compensation.

    -jp
     
    DJ Stunks, Aug 9, 2006
    #2

  3. John Bokma

    John Bokma Guest

    "Shan" wrote:

    > So I need code that will go through a list of URLs (formatted as
    > http://www.google.com) and for each url get the following information:
    >
    > 1. The url after the href= within the following tags <link
    > rel="alternate" and />
    >
    > So if there is <link rel="alternate" type="application/atom+xml"
    > title="Atom" href="http://hello.typepad.com/hello/atom.xml" /> I want
    > the http://hello.typepad.com/hello/atom.xml
    >
    >
    > 2. everything between the following tags <title> and </title>
    > so if there is <title>hello, typepad</title> I want hello, typepad
    >
    > 3. everything between the tags <h2 id="banner-description"> and </h2>



    I use HTML::TreeBuilder for this, since it makes life really easy. See
    http://johnbokma.com/perl/ for several examples (Web automation).

    For example, step 3 can be done as:

    use HTML::TreeBuilder;

    my $root = HTML::TreeBuilder->new_from_content( $content );

    :
    :

    # Collect the trimmed text of every <h2 id="banner-description">
    my @column4;
    push @column4, $_->as_trimmed_text
        for $root->look_down( _tag => 'h2', id => 'banner-description' );
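
    The same look_down approach should extend to steps 1 and 2. The
    following is a minimal sketch rather than a finished script: the
    sample page is made up, extract_fields is a hypothetical helper name,
    and fetching each URL (for instance with LWP::UserAgent) is left out.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical sample page; a real script would fetch $content per URL.
my $content = <<'HTML';
<html><head>
<title>hello, typepad</title>
<link rel="alternate" type="application/atom+xml"
      title="Atom" href="http://hello.typepad.com/hello/atom.xml" />
</head><body>
<h2 id="banner-description">  A sample banner  </h2>
</body></html>
HTML

# extract_fields is a hypothetical helper, not part of HTML::TreeBuilder.
sub extract_fields {
    my ($html) = @_;
    my $root = HTML::TreeBuilder->new_from_content($html);

    # In scalar context look_down returns the first match, or undef.
    my $link   = $root->look_down( _tag => 'link', rel => 'alternate' );
    my $title  = $root->look_down( _tag => 'title' );
    my $banner = $root->look_down( _tag => 'h2', id => 'banner-description' );

    my %fields = (
        feed   => $link   ? $link->attr('href')      : undef,
        title  => $title  ? $title->as_trimmed_text  : undef,
        banner => $banner ? $banner->as_trimmed_text : undef,
    );

    $root->delete;    # free the parse tree explicitly
    return \%fields;
}

my $fields = extract_fields($content);
print "$_: ", $fields->{$_} // 'null', "\n" for qw(feed title banner);
```

    Taking only the first match per field keeps one value per column;
    pages with several alternate links would need a rule for choosing
    between them.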

    > I would like to thank whoever can provide me with the code in advance,
    > Thank you.


    I can provide the code, and forms to thank me are here:
    http://johnbokma.com/wish-list.html

    Either Object Oriented Perl or Perl Best Practices would be fine with me
    since directly and indirectly you will contribute back to the Perl
    community.

    --
    John Bokma          Freelance software developer
                        &
                        Experienced Perl programmer: http://castleamber.com/
     
    John Bokma, Aug 10, 2006
    #3
  4. Tad McClellan

    Shan wrote:

    > Subject: Need help with parsing data



    What part is it that you need help with?


    (You should use a module that understands XHTML data if you need
    to process XHTML data.)


    > I would like to thank whoever can provide me with the code in advance,



    What makes you think that someone will write your program for you?


    --
    Tad McClellan SGML consulting
    Perl programming
    Fort Worth, Texas
     
    Tad McClellan, Aug 10, 2006
    #4
  5. Shan

    Shan Guest

    Thanks for your advice. I will work on writing a script today and see
    what kind of results I get.
     
    Shan, Aug 10, 2006
    #5
