real, simple sample OOP intro text??!!

Geoff Cox

Also sprach Geoff Cox:

Tassilo,

many thanks for the corrections - will sort it out!

Cheers

Geoff


Tassilo

have used your code and my version works now! You will see that I have
extended it to work for <p> and that too works.

Glad to hear it.

I am not clear why the following line appears in the start, end and
text sub. I would have thought it would appear just once...I am not
following the logic??

print OUT ("<h2>$origtext</h2> \n") if $in_heading;

I don't think the above should appear like that in all three callbacks.

Cheers

Geoff

my $in_heading;
my $p;

sub start {

my ($self, $tagname, $attr, undef, undef, $origtext) = @_;

There is one undef too many. That means that $origtext will always be
undefined. Put 'use warnings;' in your code and perl will tell you that
you are printing an undefined value further below.
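
The effect of the extra placeholder can be reproduced without a parser at all. The following is only a sketch: @args is a hypothetical stand-in for the five arguments that reach start(), and the two sub names are made up for the comparison.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the argument list that reaches start():
# ($self, $tagname, $attr, $attrseq, $origtext)
my @args = ('parser object', 'h2', {}, [], '<h2>');

# Two placeholders after $attr: $origtext is taken from a sixth
# element that does not exist, so it ends up undefined.
sub origtext_two_undefs {
    my ($self, $tagname, $attr, undef, undef, $origtext) = @_;
    return $origtext;
}

# One placeholder (skipping $attrseq): $origtext is the fifth element.
sub origtext_one_undef {
    my ($self, $tagname, $attr, undef, $origtext) = @_;
    return $origtext;
}

my $broken = origtext_two_undefs(@args);
my $fixed  = origtext_one_undef(@args);

print defined $broken ? "two undefs: $broken\n" : "two undefs: undefined\n";
print "one undef:  $fixed\n";
```
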
if ($tagname eq 'option') {
&getintro($attr->{ value });
}

if ($tagname eq 'h2') {
$in_heading = 1;
return;
}
print OUT ("<h2>$origtext</h2> \n") if $in_heading;

That's indeed not quite right. You create too many <h2>...</h2> pairs
with that. For this HTML snippet:

<h2><i>Heading</i></h2>

your parser spits out (assuming that you remove one of the above two
undefs):

<h2><i></h2><h2>Heading</h2><h2></i></h2>

This is because you wrap _everything_ inside <h2></h2> when $in_heading
is true.

If you want to include the heading tags in your output, then you have to do the
following:

sub start {
    my ($self, $tagname, undef, undef, $origtext) = @_;
    $in_heading = 1 if $tagname eq 'h2';
    print $origtext if $in_heading;
}

sub text {
    my ($self, $origtext) = @_;
    print $origtext if $in_heading;
}

sub end {
    my ($self, $tagname, $origtext) = @_;
    print $origtext if $in_heading;
    $in_heading = 0 if $tagname eq 'h2';
}

If you don't want to include them, you have to return from the
start/end functions without writing anything when a <h2> tag is
encountered.
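
Put together as a complete script, the three callbacks might look like the sketch below. It assumes HTML::Parser is installed and that the parser is created with new() without arguments, so the version 2 style callback methods are used. The package name HeadingPrinter and the $output buffer (used here instead of printing directly) are made up for the illustration.

```perl
use strict;
use warnings;

package HeadingPrinter;
use base 'HTML::Parser';

my $in_heading = 0;    # true while the parser is inside <h2>...</h2>
my $output     = '';   # collects everything seen inside a heading

sub start {
    my ($self, $tagname, undef, undef, $origtext) = @_;
    $in_heading = 1 if $tagname eq 'h2';      # switch on first, so <h2> is kept
    $output .= $origtext if $in_heading;
}

sub text {
    my ($self, $origtext) = @_;
    $output .= $origtext if $in_heading;
}

sub end {
    my ($self, $tagname, $origtext) = @_;
    $output .= $origtext if $in_heading;      # append </h2> before switching off
    $in_heading = 0 if $tagname eq 'h2';
}

package main;

my $parser = HeadingPrinter->new;
$parser->parse('<p>skip</p><h2><i>Heading</i></h2><p>skip too</p>');
$parser->eof;

print "$output\n";
```

For this input the script should print <h2><i>Heading</i></h2>, with the nested <i> tags passed through once instead of being wrapped in their own <h2> pair.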

As I wrote before: It takes a little time to get used to the way
HTML::Parser does the job. You have to be clear about how HTML::Parser
triggers the callbacks and what arguments are passed. Use a fixed font
for the following:

<tag attr="val"><tag1>some text</tag1></tag>
`--------------'`----'`-------'`-----'`----'
      (1)        (2)     (3)     (4)   (5)

(1) start ($self,
           'tag',               # $tagname
           { attr => 'val' },   # $attr
           [ 'attr' ],          # $attrseq
           '<tag attr="val">'   # $origtext
          );

(2) start ($self,
           'tag1',              # $tagname
           { },                 # $attr
           [ ],                 # $attrseq
           '<tag1>'             # $origtext
          );

(3) text ($self,
          'some text',          # $origtext
          0                     # $is_cdata
         );

(4) end ($self,
         'tag1',                # $tagname
         '</tag1>'              # $origtext
        );

(5) end ($self,
         'tag',                 # $tagname
         '</tag>'               # $origtext
        );

if ($tagname eq 'p') {
$p = 1;
return;
}

Unlike with the <h2> tags, here you do not print any <p> tags. So you
essentially just record when you are inside <p>, but you don't include
the <p> tags in your output.

Tassilo
 

Geoff Cox

On 4 Apr 2004 18:20:36 GMT, "Tassilo v. Parseval"

Tassilo,

having got to the point of having an OOP script that works for 1 html
file, is it possible to work with a series of similar files using
File::Find? I am trying this but getting errors which I can't correct
yet...

Cheers

Geoff
 

Tassilo v. Parseval

Also sprach Geoff Cox:
On 4 Apr 2004 18:20:36 GMT, "Tassilo v. Parseval"

Tassilo,

having got to the point of having an OOP script that works for 1 html
file, is it possible to work with a series of similar files using
File::Find? I am trying this but getting errors which I can't correct
yet...

Sure, why not? What you can do once, you can do twice (and more times)
as well. It's just a matter of calling '$parser->parse_file' for each
file you want to process.
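
A minimal sketch of that loop, assuming HTML::Parser is installed and the files end in .html or .htm. The HeadingCollector package, the headings_under() helper and the hash keys stored on the parser object are all made up for the example. It also creates a fresh parser per file and keeps the state on the object, which is a different choice from the global variables used earlier in the thread.

```perl
use strict;
use warnings;
use File::Find;

package HeadingCollector;
use base 'HTML::Parser';

# State lives on the parser object, so every new parser starts clean.
sub start {
    my ($self, $tagname) = @_;
    $self->{in_heading} = 1 if $tagname eq 'h2';
}

sub text {
    my ($self, $text) = @_;
    $self->{heading} .= $text if $self->{in_heading};
}

sub end {
    my ($self, $tagname) = @_;
    return unless $tagname eq 'h2' && $self->{in_heading};
    push @{ $self->{headings} }, $self->{heading};
    $self->{heading}    = '';
    $self->{in_heading} = 0;
}

package main;

# Returns a hash ref mapping each HTML file below $dir to an array ref
# of the <h2> texts found in it.
sub headings_under {
    my ($dir) = @_;

    my @files;
    find(sub { push @files, $File::Find::name if -f && /\.html?\z/i }, $dir);

    my %headings;
    for my $file (sort @files) {
        my $parser = HeadingCollector->new;
        $parser->{headings} = [];
        $parser->{heading}  = '';
        $parser->parse_file($file) or next;   # skip files that cannot be opened
        $headings{$file} = $parser->{headings};
    }
    return \%headings;
}

# Usage: perl headings.pl some/directory [another/directory ...]
for my $dir (@ARGV) {
    my $found = headings_under($dir);
    for my $file (sort keys %$found) {
        print "$file: $_\n" for @{ $found->{$file} };
    }
}
```
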

Tassilo
 

Geoff Cox

Tassilo,

Thanks for that - in fact have had some success with multiple files
now!

Cheers

Geoff
 

Tassilo v. Parseval

Also sprach Geoff Cox:
Tassilo,

Thanks for that - in fact have had some success with multiple files
now!

Good. One thing you have to be careful with: When you use global
variables to keep track of the current state (like in which tag the
parser currently is), you need to reset them for each file.

That can be conveniently done by overriding the parse*() methods in your
parser subclass:

package MyParser;
use base qw/HTML::Parser/;

my $in_heading;
my $p;

# and possibly the same for parse() and parse_chunk()
# if you use those
sub parse_file {
    my $self = shift;
    ($in_heading, $p) = (0, 0);
    $self->SUPER::parse_file(@_);
}

sub start {
    ...
}
...

SUPER is a metapackage specifier. Since HTML::Parser::parse_file() is
overridden, it is no longer called when doing

$parser->parse_file(...);

In order to call it nonetheless after the resetting is done, this line

$self->SUPER::parse_file(@_);

refers to the superclass' parse_file() method. Using SUPER::method() is
a common way to call the original inherited method even when it has been
overridden by the subclass.

Once you have overridden these methods in this way, you no longer have to
worry when calling

$parser->parse_file("file.html");

because this method will now take care of resetting any global
variables.
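
To see the reset at work, here is a small self-contained sketch, assuming HTML::Parser is installed. The two file names and their contents are invented for the demonstration: the first file ends inside an unclosed <h2>, so without the reset the flag would still be set when the second file starts. The @captured array and the captured() accessor are also just for illustration.

```perl
package MyParser;
use strict;
use warnings;
use base 'HTML::Parser';

my $in_heading = 0;   # global state shared by the callbacks
my @captured;         # text seen while inside <h2>

sub parse_file {
    my $self = shift;
    $in_heading = 0;                          # reset before every file
    return $self->SUPER::parse_file(@_);      # then run the inherited method
}

sub start { my ($self, $tagname) = @_; $in_heading = 1 if $tagname eq 'h2'; return }
sub text  { my ($self, $text)    = @_; push @captured, $text if $in_heading; return }
sub end   { my ($self, $tagname) = @_; $in_heading = 0 if $tagname eq 'h2'; return }

sub captured { return @captured }

package main;
use File::Temp qw(tempdir);

# Two throwaway files; a.html never closes its <h2>.
my $dir  = tempdir(CLEANUP => 1);
my %html = (
    'a.html' => '<h2>Unclosed heading',
    'b.html' => '<p>Body text</p>',
);
for my $name (sort keys %html) {
    open my $fh, '>', "$dir/$name" or die "cannot write $name: $!";
    print {$fh} $html{$name};
    close $fh or die "cannot close $name: $!";
}

my $parser = MyParser->new;
$parser->parse_file("$dir/$_") for sort keys %html;

print join('', MyParser::captured()), "\n";
```

With the reset in place, only the text from a.html should be captured. If the $in_heading = 0 line in parse_file() is removed, the paragraph text from b.html is picked up as well, because the flag is left over from the first file.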

Tassilo
 

Geoff Cox

Thanks Tassilo - will take note...

Cheers

Geoff
 

David H. Adler

TvP> Also sprach Uri Guttman:
[snip]

TvP> I'd rather suggest he has a look at Randal's "Learning Perl
TvP> Objects, References & Modules". And unlike Damian's book, this
TvP> one comes with exercises for the reader.

i would expect that to be good too. i haven't snarfed a copy yet :)

Aha. Well, if you can push Damian's book as a tech reviewer of it, I
can help out here. :) Yes, it's a good book.

Damian's book, although wonderful, is very much an in-depth look at OOP,
whereas Randal's book feels a bit more like an introduction - not wholly
surprising, as it's intended to act sort of as a follow up to Learning Perl.

As a comparison, consider Mastering Regular Expressions - someone who
just wants to learn about using regular expressions doesn't need to read
a rather large portion of the book. If you want to learn about regexen
in general, however, you probably can't do better. I seem to remember
that someone was working on a book on Perl regexen as such, but afaik
it's not actually available.

If you *really* want to understand all the ins and outs of OOP, get
Damian's book. If you just want to be able to *use* OOP, Randal's book
might work well for you (although I'd recommend reading Damian's at some
point, as it really is quite good).

Also, if examples featuring characters from Gilligan's Island appeal,
Randal's book wins on that score. :)

but given the pricing of US books in London, i doubt the OP will get
much of any discount, though i think PORM has a cheaper list price than
OOP.

Re: pricing - I don't know the reasons for it, but I have always found
the "same price but in sterling" to hold for US vs. UK. I remember
picking up a copy of Elvis Costello's My Aim Is True (on VINYL, so this
is a *long* time ago :) in London at one point and it was whatever it
cost here, but in pounds. So this is neither new nor restricted to
computer books.

dha
 

Uri Guttman

TvP> Also sprach Uri Guttman:

DHA> [snip]

TvP> I'd rather suggest he has a look at Randal's "Learning Perl
TvP> Objects, References & Modules". And unlike Damian's book, this
TvP> one comes with exercises for the reader.

DHA> Aha. Well, if you can push Damian's book as a tech reviewer of it, I
DHA> can help out here. :) Yes, it's a good book.

DHA> Damian's book, although wonderful, is very much an in-depth look
DHA> at OOP, whereas Randal's book feels a bit more like an
DHA> introduction - not wholly surprising, as it's intended to act
DHA> sort of as a follow up to Learning Perl.

the chapter of perl review in OOP is a classic. and the chapter on the 3
points that make perl into OO is also great. the rest of the book as you
say is in depth and not for beginners.

DHA> Also, if examples featuring characters from Gilligan's Island appeals,
DHA> Randal's book wins on that score. :)

yeah, OOP uses boring cd catalog examples.

uri
 

Bart Lateur

David said:
I seem to remember
that someone was working on a book on Perl regexen as such, but afaik
it's not actually available.

That must be Jeff Pinyan AKA japhy, author of the YAPE::* alternative
parser modules, including YAPE::Regex::Explain. His draft

Re: pricing - I don't know the reasons for it, but I have always found
the "same price but in sterling" to hold for US vs. UK. I remember
picking up a copy of Elvis Costello's My Aim Is True (on VINYL, so this
is a *long* time ago :) in London at one point and it was whatever it
cost here, but in pounds. So this is neither new nor restricted to
computer books.

You should see what they charge for a copy of Dr. Dobb's Journal here...
Its price is $4.95 in the USA; here in Belgium, loose copies cost about
10 Euro. Quite a steep price, especially if you consider that in the
USA, a subscription would cost you < $2 per magazine.
 

Bart Lateur

Geoff said:
wouldn't dream of suggesting you are personally responsible for this!!
if a book costs say $30 in the US and £30 here - does it really cost
£10 to get it into a shop in the UK??

Don't forget about shipping. It's a long way from the USA to the UK...

Buy the book over the internet, if you must. If you order several at
once, you'll pay far less than you would in a local shop.
 

David H. Adler

Don't forget about shipping. It's a long way from the USA to the UK...

Also, there are probably some kind of import taxes involved.

Does anyone around who actually works with a publisher know the details?

dha
 
