Parsing HTML with HTML::TableExtract

Ninja Li · Nov 27, 2009

Hi,

I am trying to create a comma-delimited file by parsing HTML from the
website "http://www.earnings.com/conferencecall.asp?client=cb"
using the HTML::TableExtract module (thanks to Tad McClellan for the
introduction). However, I got the following warnings when running
the script at the end of this post:
----------------------
Use of uninitialized value in join or string at conference.pl line 25.
Use of uninitialized value in join or string at conference.pl line 25.
Use of uninitialized value in join or string at conference.pl line 25.
Use of uninitialized value in join or string at conference.pl line 25.
Use of uninitialized value in join or string at conference.pl line 25.
Use of uninitialized value in join or string at conference.pl line 25.
Use of uninitialized value in join or string at conference.pl line 25.
Use of uninitialized value in join or string at conference.pl line 25.
Use of uninitialized value in join or string at conference.pl line 25.
Use of uninitialized value in join or string at conference.pl line 25.
Use of uninitialized value in join or string at conference.pl line 25.
HOGGF.PK
,HOGG ROBINSON GROUP PLC,Half- Year HOGG ROBINSON GROUP PLC
Earnings Conference Call,,,4:00 AM
................
----------------------

Also notice the large gap between the first value, "HOGGF.PK", and
the second, "HOGG ROBINSON GROUP PLC". There are only a few spaces after
the first field in the original HTML. From what I can see so far, it
seems the empty values in the fields are not handled correctly. The
source code is at the end of the post.

Please advise on the root cause and the fix.

Thanks in advance.

Nick

----------------------------------------------
Source code:

use warnings;
use strict;
use LWP::Simple;
use HTML::TableExtract;

my $html = get 'http://www.earnings.com/conferencecall.asp?client=cb';

my @headers =
(
    'SYMBOL',
    'COMPANY',
    'EVENT TITLE',
    'WEBCAST',
    'TRANSCRIPT',
    'TIME'
);

my $te = HTML::TableExtract->new( headers => \@headers );
$te->parse($html);

foreach my $ts ( $te->tables )
{
    foreach my $row ( $ts->rows )
    {
        my $csv = join ',', @$row;
        print "$csv\n";
    }
}

sln · Nov 27, 2009

What have you done to find out what caused this ridiculous
number of warnings? Nothing, judging from your code.
Something is off, WAY off! Something is wrong with your content or your
headers. You have to learn the module, which means actually reading the
docs for it. Then plan ahead. Look at the source of the HTML.

This is not rocket science.

-sln

Martien Verbruggen · Nov 28, 2009

> Hi,
>
> I am trying to create a comma-delimited file by parsing HTML from the
> website "http://www.earnings.com/conferencecall.asp?client=cb"
> using the HTML::TableExtract module (thanks to Tad McClellan for the
> introduction). However, I got the following warnings when running
> the script at the end of this post:
> ----------------------
> Use of uninitialized value in join or string at conference.pl line 25.
> Use of uninitialized value in join or string at conference.pl line 25.
> Use of uninitialized value in join or string at conference.pl line 25.
> Use of uninitialized value in join or string at conference.pl line 25.
> Use of uninitialized value in join or string at conference.pl line 25.
> Use of uninitialized value in join or string at conference.pl line 25.
> Use of uninitialized value in join or string at conference.pl line 25.
> Use of uninitialized value in join or string at conference.pl line 25.
> Use of uninitialized value in join or string at conference.pl line 25.
> Use of uninitialized value in join or string at conference.pl line 25.
> Use of uninitialized value in join or string at conference.pl line 25.
> HOGGF.PK
> ,HOGG ROBINSON GROUP PLC,Half- Year HOGG ROBINSON GROUP PLC
> Earnings Conference Call,,,4:00 AM
> ...............

That is not the only output. I get more.

> Also notice the large gap between the first value, "HOGGF.PK", and
> the second, "HOGG ROBINSON GROUP PLC". There are only a few spaces after
> the first field in the original HTML. From what I can see so far, it

Check the 'original' HTML again. What's currently at that URL has the
spaces that you see. I guess they must have changed it since you last
looked at it.
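
To check what is really in a cell, it can help to print it with the invisible characters spelled out. The snippet below is only a sketch of that idea: show_cell is a hypothetical helper name, and the sample row is made up to resemble the output quoted above rather than fetched from the site. In the real script the same call would go inside the loop over $ts->rows.

```perl
use strict;
use warnings;

# Hypothetical helper: render one cell so that undef and invisible
# characters (newlines, tabs, non-ASCII characters) stand out.
sub show_cell {
    my ($cell) = @_;
    return '<undef>' unless defined $cell;
    # Escape everything outside printable ASCII as \x{..}; plain spaces
    # are left alone and are delimited by the surrounding brackets.
    (my $visible = $cell) =~ s/([^\x20-\x7E])/sprintf '\\x{%02X}', ord $1/ge;
    return "[$visible]";
}

# Made-up row; in the real script this would be each $row from $ts->rows.
my $row = [ "HOGGF.PK\n   ", 'HOGG ROBINSON GROUP PLC', undef ];
print join( ' ', map { show_cell($_) } @$row ), "\n";
```

Any whitespace that shows up between the brackets is part of the cell text, which is where the gap in the printed CSV line would come from.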

> seems the empty values in the fields are not handled correctly. The
> source code is at the end of the post.

Define 'correctly'. Or rather, find out what HTML::TableExtract defines
as correct, and adjust your expectations to that. Cells without text
content seem to be returned as undefined values. It's your job to deal
with that in whichever way you think it should be dealt with.
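
One possible way to deal with it, sketched under a few assumptions: undefined cells are treated as empty strings, and whitespace inside each cell is collapsed before joining. row_to_csv is a hypothetical helper, not part of HTML::TableExtract, and the sample row is made up to match the output quoted in the question. The non-breaking-space substitution assumes the page text has already been decoded to characters.

```perl
use strict;
use warnings;

# Hypothetical helper: turn one row (an array reference, as returned by
# $ts->rows) into a comma-joined line without "uninitialized" warnings.
sub row_to_csv {
    my ($row) = @_;
    my @cells = map {
        my $cell = defined $_ ? $_ : '';   # empty cell (undef) becomes ''
        $cell =~ s/\xA0/ /g;               # non-breaking space to plain space
        $cell =~ s/\s+/ /g;                # collapse newlines and runs of whitespace
        $cell =~ s/^\s+|\s+$//g;           # trim both ends
        $cell;
    } @$row;
    return join ',', @cells;
}

# Made-up sample row shaped like the output in the question.
my $sample = [
    "HOGGF.PK\n   ",
    'HOGG ROBINSON GROUP PLC',
    "Half- Year HOGG ROBINSON GROUP PLC\nEarnings Conference Call",
    undef,
    undef,
    '4:00 AM',
];
print row_to_csv($sample), "\n";
```

In the original script this would replace the plain join inside the inner loop. A plain join still produces broken CSV if a cell itself contains a comma or a double quote, so a dedicated CSV module may be the safer choice for real data.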

> Please advise on the root cause and the fix.

If you want, I can send you a contract and rate card.

Martien
