Perl HTML::TableExtract Question

Discussion in 'Perl Misc' started by Paul, Apr 17, 2005.

  1. Paul

    Hi!

    I hope someone can help.

    I want to extract data from a table with 2 columns.

    A sample of the table can be generated with:

    "http://moneycentral.msn.com/investor/research/sreport.asp?Symbol=ba&QD=1&OP=1&IC=1&Y1=1&CR=1&AF=1&AIE=1&AIR=1&FRH=1&FRK=1&ISA=1&ISQ=1&BSA=1&BSQ=1&CFA=1&CFQ=1&TYS=1&ITT=1&ITP=1&Type=Equity"

    (Sorry about the long URL :) )

    What I want is the field labelled "Tot. Shares Out." in the top table.

    My current code is:

    #!/usr/bin/perl -w

    use strict;
    use HTML::TableExtract;

    my $inFile = "/home/mas/development/URLTemp.tmp";
    # Keep only the columns whose header text matches these entries.
    my $te = HTML::TableExtract->new( headers => [ 'Fundamental Data', '*' ] );
    $te->parse_file( $inFile );
    # Print every row of every matching table as comma-separated text.
    foreach my $ts ( $te->table_states ) {
        foreach my $row ( $ts->rows ) {
            print join( ",", @$row, "," ), "\n";
        }
    }

    But this seems to pick up a table further down the page. That wouldn't
    be so bad, since that table repeats the value I need, but how do I get
    an unlabelled column?

    Any help would be appreciated.

    Paul
    Paul, Apr 17, 2005
    #1

  2. Paul

    Just a bit more info on this: the ", '*'" doesn't work; in fact, it
    returns no data at all. Without it, the module assumes that the rows
    below are what is wanted and returns:

    Market Capitalization,,
    Earnings/Share,,

    The real question is: how do I specify a row with a NULL header?
    Paul, Apr 17, 2005
    #2

  3. Tad McClellan

    Paul <none@none> wrote:

    > What I want is the field from the top table Labelled - "Tot. Shares Out."

    > my $te = HTML::TableExtract->new( headers => [ 'Fundamental Data', '*' ]);

    The headers approach will not work since there are no headers
    on the table that contains the data that you are after.

    > "How do I get an
    > un-labelled column ????"

    Positionally.

    "Tot. Shares Out." is the 7th column in the 12th row of the table
    at depth=2 and count=1.

    > Any help would be appreciated.

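    my $te = HTML::TableExtract->new( depth => 2, count => 1 );
    $te->parse_file( $inFile );
    my ($ts) = $te->table_states;    # the one table at depth 2, count 1
    my $total_outstanding = ($ts->rows)[11]->[6];    # 12th row, 7th column
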
    --
    Tad McClellan SGML consulting
    Perl programming
    Fort Worth, Texas
    Tad McClellan, Apr 17, 2005
    #3
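    Tad's positional approach can be sketched as a complete script. This is
    a minimal sketch under stated assumptions: the inline HTML is a
    hypothetical stand-in for the saved MSN page (the real page's nesting
    has not been checked here), and it assumes HTML::TableExtract 2.x, where
    tables() is the name for what the thread's code calls table_states() and
    cell($row, $col) is available. Because the stand-in table is small, the
    indices are [1]->[1] rather than the [11]->[6] Tad gives for the real page.

    ```perl
    #!/usr/bin/perl
    use strict;
    use warnings;
    use HTML::TableExtract;

    # Hypothetical stand-in for the saved page: an outer table (depth 0)
    # holds a layout table (depth 1), which holds two data tables (depth 2).
    my $html = <<'HTML';
    <table><tr><td>
      <table><tr><td>
        <table><tr><td>Fundamental Data</td></tr></table>
        <table>
          <tr><td>Market Capitalization</td><td>55.0 Bil</td></tr>
          <tr><td>Tot. Shares Out.</td><td>835.0 Mil</td></tr>
        </table>
      </td></tr></table>
    </td></tr></table>
    HTML

    # Select the table by position rather than by header text.
    # depth and count are zero-based: depth => 2, count => 1 asks for the
    # second table found two levels of nesting down.
    my $te = HTML::TableExtract->new( depth => 2, count => 1 );
    $te->parse($html);
    $te->eof;

    # tables() is the HTML::TableExtract 2.x name for table_states().
    my ($table) = $te->tables;
    die "no table at depth 2, count 1\n" unless $table;

    # Row and column indices are zero-based too: [1]->[1] is the 2nd row,
    # 2nd column of this small stand-in table.
    my $total_outstanding = ( $table->rows )[1]->[1];

    # The same cell through the 2.x cell($row, $col) accessor.
    my $same_cell = $table->cell( 1, 1 );

    print "$total_outstanding\n";
    ```

    The trade-off is that nothing here refers to the label text, so the
    indices have to be re-checked whenever the page layout changes.
    
    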
  4. Paul

    Thanks for that, Tad! I arrived at the same answer at about 2:30 in the
    morning :-(

    It seems the page isn't very well constructed.

    I spent a lot of time looking for the new version of HTML::TableExtract,
    which is supposed to address rows as well as columns, but could only
    find fleeting references to it.

    Regards.
    Paul, Apr 17, 2005
    #4
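    Since the page isn't very well constructed, hard-coded positions such as
    [11]->[6] are fragile. One hedged alternative, which nobody in the thread
    used, is to search the extracted rows for the label text and take the
    first non-blank cell to its right. The helper name value_for_label and
    the sample rows below are hypothetical. The rows are assumed to have the
    shape that $ts->rows returns, a list of array references, so the helper
    itself is plain Perl.

    ```perl
    #!/usr/bin/perl
    use strict;
    use warnings;

    # Hypothetical helper: return the first non-blank cell to the right of a
    # cell matching $label_re, searching row by row; undef if no row matches.
    sub value_for_label {
        my ( $rows, $label_re ) = @_;
        for my $row (@$rows) {
            # Treat undefined (empty) cells as empty strings.
            my @cells = map { defined $_ ? $_ : '' } @$row;
            for my $i ( 0 .. $#cells ) {
                next unless $cells[$i] =~ $label_re;
                # Skip blank spacer cells between the label and its value.
                for my $j ( $i + 1 .. $#cells ) {
                    return $cells[$j] if $cells[$j] =~ /\S/;
                }
            }
        }
        return;
    }

    # Hypothetical rows, including a blank spacer column like the real page.
    my @rows = (
        [ 'Market Capitalization', '',    '55.0 Bil'  ],
        [ 'Tot. Shares Out.',      undef, '835.0 Mil' ],
    );

    my $shares = value_for_label( \@rows, qr/^\s*Tot\. Shares Out\./ );
    print defined $shares ? "$shares\n" : "not found\n";
    ```

    To feed it from HTML::TableExtract, one option is to collect the rows of
    every table, e.g. value_for_label( [ map { $_->rows } $te->table_states ],
    qr/^\s*Tot\. Shares Out\./ ). The anchored pattern is deliberate: on a
    layout-heavy page an outer cell may also contain the label text, and
    adding depth/count to new() narrows the candidates further.
    
    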
