HTML::TableExtract with headers constraint, excluding right-most column

Jim Monty

I'm using the fine module HTML::TableExtract v1.10 by Matt Sisk to
extract data from an HTML table, but I'm getting unexpected behavior.
When I use the headers constraint on a simple three-column table and
request all three columns, all's well. If I specify just the two
right-most columns, all's still well. But if I exclude the right-most
column, I get a bogus first row of empty values.

C:\>cat table.pl
#!/usr/bin/perl

use strict;
use warnings;
use HTML::TableExtract;
use Data::Dumper;

my $html = <<EOT;
<html><head><title>Names</title></head>
<body>
<table>
<tr><td>LastName</td><td>FirstName</td><td>MI</td></tr>
<tr><td>Doe</td><td>Jane</td><td></td></tr>
<tr><td>Doe</td><td>John</td><td></td></tr>
<tr><td>Public</td><td>John</td><td>Q</td></tr>
</table>
</body>
</html>
EOT

my $te = HTML::TableExtract->new(
headers => [ qw( LastName FirstName MI ) ]
);

$te->parse($html);

my @rows = $te->rows;
print Dumper @rows;

exit 0;

__END__

C:\>perl table.pl
$VAR1 = [
'Doe',
'Jane',
''
];
$VAR2 = [
'Doe',
'John',
''
];
$VAR3 = [
'Public',
'John',
'Q'
];

C:\>

If I change

my $te = HTML::TableExtract->new(
headers => [ qw( LastName FirstName MI ) ]
);

to

my $te = HTML::TableExtract->new(
headers => [ qw( FirstName MI ) ]
);

I still get good results:

C:\>perl table.pl
$VAR1 = [
'Jane',
''
];
$VAR2 = [
'John',
''
];
$VAR3 = [
'John',
'Q'
];

C:\>

But when I exclude the right-most column (in this case, "MI")

my $te = HTML::TableExtract->new(
headers => [ qw( LastName FirstName ) ]
);

I get an unexpected (and unwanted) empty first row:

C:\>perl table.pl
$VAR1 = [
'',
''
];
$VAR2 = [
'Doe',
'Jane'
];
$VAR3 = [
'Doe',
'John'
];
$VAR4 = [
'Public',
'John'
];

C:\>
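As a stopgap, the bogus row could be filtered out after extraction. This is only a minimal sketch that assumes the unwanted row is always entirely empty. It does not explain the behavior or fix it. The helper name drop_empty_rows is hypothetical and not part of HTML::TableExtract, and the rows are hard-coded from the output above to stand in for $te->rows.

```perl
#!/usr/bin/perl

use strict;
use warnings;

# Hypothetical helper (not part of HTML::TableExtract): keep only the
# rows that contain at least one defined, non-empty cell.
sub drop_empty_rows {
    my @rows = @_;
    return grep {
        my $row = $_;
        # Count of non-empty cells; zero means the row is discarded.
        scalar grep { defined $_ && length $_ } @$row;
    } @rows;
}

# Rows copied from the unwanted output above, standing in for $te->rows.
my @rows = (
    [ '',       ''     ],
    [ 'Doe',    'Jane' ],
    [ 'Doe',    'John' ],
    [ 'Public', 'John' ],
);

my @clean = drop_empty_rows(@rows);
print join( ', ', @$_ ), "\n" for @clean;
```

The obvious drawback is that a genuinely empty row in the source table would be dropped too, so this hides the symptom rather than addressing the cause.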

What's going on?
