Jim Monty
I'm using the fine module HTML::TableExtract v1.10 by Matt Sisk to
extract data from an HTML table, but I'm getting unexpected behavior.
When I use the headers constraint on a simple three-column table and
request all three columns, all's well. If I specify just the two
right-most columns, all's still well. But if I exclude the right-most
column, I get a bogus first row of empty values.
C:\>cat table.pl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TableExtract;
use Data::Dumper;
my $html = <<EOT;
<html><head><title>Names</title></head>
<body>
<table>
<tr><td>LastName</td><td>FirstName</td><td>MI</td></tr>
<tr><td>Doe</td><td>Jane</td><td></td></tr>
<tr><td>Doe</td><td>John</td><td></td></tr>
<tr><td>Public</td><td>John</td><td>Q</td></tr>
</table>
</body>
</html>
EOT
my $te = HTML::TableExtract->new(
headers => [ qw( LastName FirstName MI ) ]
);
$te->parse($html);
my @rows = $te->rows;
print Dumper @rows;
exit 0;
__END__
C:\>perl table.pl
$VAR1 = [
'Doe',
'Jane',
''
];
$VAR2 = [
'Doe',
'John',
''
];
$VAR3 = [
'Public',
'John',
'Q'
];
C:\>
If I change
my $te = HTML::TableExtract->new(
headers => [ qw( LastName FirstName MI ) ]
);
to
my $te = HTML::TableExtract->new(
headers => [ qw( FirstName MI ) ]
);
I still get good results:
C:\>perl table.pl
$VAR1 = [
'Jane',
''
];
$VAR2 = [
'John',
''
];
$VAR3 = [
'John',
'Q'
];
C:\>
But when I exclude the right-most column (in this case, "MI"),
my $te = HTML::TableExtract->new(
headers => [ qw( LastName FirstName ) ]
);
I get an unexpected (and unwanted) empty first row:
C:\>perl table.pl
$VAR1 = [
'',
''
];
$VAR2 = [
'Doe',
'Jane'
];
$VAR3 = [
'Doe',
'John'
];
$VAR4 = [
'Public',
'John'
];
C:\>
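As a stopgap while the cause is unclear, the all-empty row could be filtered out after extraction. This is only a sketch of a workaround, not a fix for the module: the @rows data below is copied by hand from the Dumper output above rather than taken from HTML::TableExtract, and @clean is a name made up for illustration. It would also drop any genuine row whose selected cells are all empty, so it assumes no such rows exist in the real data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Rows as shown in the failing case above (copied from the Dumper output,
# not produced by HTML::TableExtract here).
my @rows = (
    [ '',       ''     ],
    [ 'Doe',    'Jane' ],
    [ 'Doe',    'John' ],
    [ 'Public', 'John' ],
);

# Keep a row only if at least one of its cells is defined and non-empty.
my @clean = grep {
    my $row = $_;
    scalar grep { defined($_) && length($_) } @$row;
} @rows;

# Print the surviving rows, one per line.
print join(', ', @$_), "\n" for @clean;
```

In the script above, the same grep could be applied directly to the list returned by $te->rows.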
What's going on?