HTML::Parser and <p> behaviour?

Discussion in 'Perl Misc' started by Geoff Cox, Oct 13, 2004.

  1. Geoff Cox

    Geoff Cox Guest

    Hello,

    I cannot seem to work out how HTML::Parser deals with <p> text </p>
    from an html file ...

    It breaks up a paragraph by placing a </p> <p> inside a paragraph of
    text in what seems to me to be a random fashion...

    any rule at work here?

    Cheers

    Geoff
     
    Geoff Cox, Oct 13, 2004
    #1

  2. Geoff Cox <> wrote in
    news::

    > I cannot seem to work out how HTML::Parser deals with <p> text </p>
    > from an html file ...
    >


    Please read the posting guidelines posted here regularly.

    You have asked a stupid question. A stupid question is one that cannot
    generate a useful answer. It is in your best interest to ask smart
    questions, i.e. ones that contain enough information so that someone can
    actually help you.

    Of course, this is inapplicable if you are just posting for the heck of it
    and are not interested in actually getting your question answered.

    So, go back and post a small, self-contained script that still exhibits the
    problem.

    > It breaks up a paragraph by placing a </p> <p> inside a paragraph of
    > text in what seems to me to be a random fashion...
    >
    > any rule at work here?


    In the immortal words of MJD,

    If you have `some weird error', the problem is probably with your
    frobnitzer.

    (David, thanks for the link :)

    Sinan.
     
    A. Sinan Unur, Oct 13, 2004
    #2

  3. Geoff Cox

    Geoff Cox Guest

    On 13 Oct 2004 19:19:27 GMT, "A. Sinan Unur"
    <> wrote:

    >So, go back and post a small, self-contained script that still exhibits the
    >problem.


    OK - does the code below help? In fact there are 2 questions here.

    1. if I have a paragraph of text between <p> and </p> I find that the
    text is broken into two parts producing

    <p> jajlsdkjklasjdkljakdj </p><p> hakljd laksdj </p>

    2. I am trying to parse

    <ul>
    <li>
    <li>
    </ul>

    The code below produces text with <li> jkjkj </li> but I cannot see
    how to put the <li>'s between <ul> and </ul>

    Cheers

    Geoff

    package MyParser;
    use base qw(HTML::Parser);
    use strict;
    use diagnostics;

    my ($in_heading,$in_p,$in_li, $fh);

    sub register_fh {
    $fh = $_[1];
    }
    sub reset { ($in_heading, $in_p, $in_li) = (0, 0, 0) }

    sub start {

    my ( $self, $tagname, $attr, undef, $origtext ) = @_;

    if ( $tagname eq 'h2' ) {
    $in_heading = 1;
    return;
    }

    if ( $tagname eq 'p' ) {
    $in_p = 1;
    return;
    }

    if ( $tagname eq 'li' ) {
    $in_li = 1;
    return;
    }


    if ( $tagname eq 'option' ) {

    # print ("\$origtext has value $origtext \n");

    main::choice( $attr->{ value } );

    }

    }

    sub end {
    my ( $self, $tagname, $origtext ) = @_;
    if ( $tagname eq 'h2' ) {
    $in_heading = 0;
    return;
    }

    if ( $tagname eq 'p' ) {
    $in_p = 0;
    return;
    }

    if ( $tagname eq 'ul' ) {
    $in_li = 0;
    return;
    }


    }

    sub text {
    my ( $self, $origtext ) = @_;
    print $fh "<h2>$origtext</h2> \n" if $in_heading;
    print $fh "<p>$origtext</p> \n" if $in_p;
    print $fh "<li>$origtext</li> \n" if $in_li;


    }

    package main;

    use File::Find;













    >
    >> It breaks up a paragraph by placing a </p> <p> inside a paragraph of
    >> text in what seems to me to be a random fashion...
    >>
    >> any rule at work here?

    >
    >In the immortal words of MJD,
    >
    >If you have `some weird error', the problem is probably with your
    >frobnitzer.
    >
    >(David, thanks for the link :)
    >
    >Sinan.
     
    Geoff Cox, Oct 13, 2004
    #3
  4. Geoff Cox

    187 Guest

    A. Sinan Unur wrote:
    > Geoff Cox <> wrote in
    > news::
    >
    >> I cannot seem to work out how HTML::Parser deals with <p> text </p>
    >> from an html file ...
    >>

    >
    > Please read the posting guidelines posted here regularly.
    >
    > You have asked a stupid question. A stupid question is one that cannot
    > generate a useful answer. It is in your best interest to ask smart
    > questions, i.e. ones that contain enough information so that someone
    > can actually help you.
    >
    > Of course, this is inapplicable if you are just posting for the heck
    > of it and are not interested in actually getting your question
    > answered.
    >
    > So, go back and post a small, self-contained script that still
    > exhibits the problem.


    Grated soem sample code would of been nice, the situation was still
    described to the point wher someone who has worked with that module
    might be able to help.

    Your assine tone, as well as your direct insults to the OP, was
    completely unwarrented.

    You could of just rplied asking for more information, but if you could
    not tell the situation from the initial post then I doubt you would of
    been able to help. (I would attemt but I am not very familiar with this
    module).

    There is NO excuse for your tone in this thread.
     
    187, Oct 13, 2004
    #4
  5. Geoff Cox <> wrote in
    news::

    > On 13 Oct 2004 19:19:27 GMT, "A. Sinan Unur"
    > <> wrote:
    >
    >>So, go back and post a small, self-contained script that still
    >>exhibits the problem.

    >
    > OK - does the code below help? In fact there are 2 questions here.


    No.

    It is not self-contained. That is, I cannot, without doing extra work, just
    run it and see what happens.

    Sinan.

    --
    A. Sinan Unur
    d
    (remove '.invalid' and reverse each component for email address)
     
    A. Sinan Unur, Oct 13, 2004
    #5
  6. Geoff Cox <> wrote in
    news::

    > OK - does the code below help? In fact there are 2 questions here.
    >
    > 1. if I have a paragraph of text between <p> and </p> I find that the
    > text is broken into two parts producing


    Here is the relevant part from your code:

    > sub text {
    > my ( $self, $origtext ) = @_;
    > print $fh "<h2>$origtext</h2> \n" if $in_heading;
    > print $fh "<p>$origtext</p> \n" if $in_p;
    > print $fh "<li>$origtext</li> \n" if $in_li;
    >
    >
    > }


    I suspect the handler is being called multiple times, each time with a
    different part of the original text. You can test this hypothesis by
    putting a debug statement in here.
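
    One way to run that check, as a minimal sketch rather than the original script: the package name DebugParser and the @seen list are made up for illustration, and the input is fed to parse() in two pieces to imitate the chunked reads that parse_file performs.

```perl
use strict;
use warnings;

package DebugParser;
use base qw(HTML::Parser);

my @seen;    # hypothetical log of every string handed to text()

sub text {
    my ($self, $origtext) = @_;
    push @seen, $origtext;
    # Brackets make it obvious where each call starts and stops.
    warn 'text() call ', scalar(@seen), ": [$origtext]\n";
}

package main;

my $p = DebugParser->new;    # no arguments: version 2 style callback methods
$p->parse('<p>some paragraph ');
$p->parse('text</p>');
$p->eof;
```

    If more than one line is reported for a single paragraph, the handler is being called once per fragment, and any markup printed inside text() is repeated for each fragment.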

    --
    A. Sinan Unur
    d
    (remove '.invalid' and reverse each component for email address)
     
    A. Sinan Unur, Oct 13, 2004
    #6
  7. Geoff Cox

    Ben Morrow Guest

    Quoth "A. Sinan Unur" <>:
    > Geoff Cox <> wrote in
    > news::
    >
    > > OK - does the code below help? In fact there are 2 questions here.
    > >
    > > 1. if I have a paragraph of text between <p> and </p> I find that the
    > > text is broken into two parts producing

    >
    > Here is the relevant part from your code:
    >

    <snip>
    >
    > I suspect the handler is being called multiple times, each time with a
    > different part of the original text.


    ...as indeed the HTML::Parser documentation says it will be. You can
    prevent this with the ->unborken_text method [typo left because it
    amused me :)]
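
    The method is spelled unbroken_text in the module. A minimal sketch of its use, assuming the version 3 handler interface instead of the subclass shown earlier in the thread; the @chunks array and the two-part input are illustrative only.

```perl
use strict;
use warnings;
use HTML::Parser;

my @chunks;      # hypothetical collector for paragraph text events
my $in_p = 0;

my $p = HTML::Parser->new(
    api_version => 3,
    start_h => [ sub { $in_p = 1 if $_[0] eq 'p' }, 'tagname' ],
    end_h   => [ sub { $in_p = 0 if $_[0] eq 'p' }, 'tagname' ],
    text_h  => [ sub { push @chunks, $_[0] if $in_p }, 'dtext' ],
);

# Ask the parser to hold text back until the next non-text event.
$p->unbroken_text(1);

# Feed the paragraph in two pieces, as a chunked file read might.
$p->parse('<p>first half of the ');
$p->parse('paragraph</p>');
$p->eof;

print scalar(@chunks), " text event(s) inside <p>\n";
```

    With the flag enabled, the text handler should see each run of text once, at the cost of the parser buffering that text until the next tag arrives.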

    Ben

    --
    Like all men in Babylon I have been a proconsul; like all, a slave ... During
    one lunar year, I have been declared invisible; I shrieked and was not heard,
    I stole my bread and was not decapitated.
    ~ ~ Jorge Luis Borges, 'The Babylon Lottery'
     
    Ben Morrow, Oct 13, 2004
    #7
  8. Geoff Cox

    Tom Guest

    187 wrote:

    >
    >
    > Grated soem sample code would of been nice, the situation was still
    > described to the point wher someone who has worked with that module
    > might be able to help.


    No it wasn't.

    >
    > Your assine tone, as well as your direct insults to the OP, was
    > completely unwarrented.


    No they weren't. There are too many idiots like you who should be
    frequenting comp.lang.basic rather than this newsgroup.

    >
    > You could of just rplied asking for more information, but if you could
    > not tell the situation from the initial post then I doubt you would of
    > been able to help. (I would attemt but I am not very familiar with this
    > module).
    >
    > There is NO excuse for your tone in this thread.
    >
    >


    There is NO excuse for you and your bad spelling to frequent this
    esteemed newsgroup at all. The OP's question was stupid, and in fact
    incapable of being answered. Go away.
     
    Tom, Oct 13, 2004
    #8
  9. Geoff Cox

    Geoff Cox Guest

    On 13 Oct 2004 20:24:19 GMT, "A. Sinan Unur"
    <> wrote:

    >Geoff Cox <> wrote in
    >news::


    >I suspect the handler is being called multiple times, each time with a
    >different part of the original text. You can test this hypothesis by
    >putting a debug statement in here.


    You seem to be correct - I have simplified the code and placed a
    simple html file (below) in d:\fred and the result appears in
    d:\fred\jim and indeed the <p> ... </p>text is there twice. Any ideas
    why?

    Thanks

    Geoff

    package MyParser;
    use base qw(HTML::Parser);
    use strict;
    use diagnostics;

    my ($in_heading,$in_p,$fh);

    sub register_fh {

    $fh = $_[1];
    }

    sub reset { ($in_heading,$in_p)=(0,0)}

    sub start {

    my ( $self, $tagname, $attr, undef, $origtext ) = @_;

    if ( $tagname eq 'h2' ) {
    $in_heading = 1;
    return;
    }

    if ( $tagname eq 'p' ) {
    $in_p = 1;
    return;
    }

    }

    sub end {
    my ( $self, $tagname, $origtext ) = @_;

    if ( $tagname eq 'h2' ) {
    $in_heading = 0;
    return;
    }

    if ( $tagname eq 'p' ) {
    $in_p = 0;
    return;
    }

    }

    sub text {
    my ( $self, $origtext ) = @_;

    print $fh "<h2>$origtext</h2> \n" if $in_heading;
    print $fh "<p>$origtext</p> \n" if $in_p;

    }

    package main;

    use File::Find;

    my $dir = "d:/fred";
    my $parser = MyParser->new;

    find sub {
    return if -d $_;

    my $name = $_;
    open( OUT, ">>d:/fred/jim/$name" )
    || die "can't open d:/fred/jim/$name: $!";

    print OUT ("<html><head><title>test</title>
    </head><body> \n");

    $parser->register_fh(\*OUT);
    $parser->parse_file($_);
    $parser->reset;

    print OUT ("</body></html> \n");

    }, $dir;


    --------------- html file ---------------------------
    <html>
    <head>
    <title>test</title>
    </head>

    <body>

    <h2>test file</h2>

    <p>The is some text which I am using to test whether para.pl using
    HTML::parser will output all of the text in this paragraph in one
    paragraph, or, in two smaller paragraphs.</p>


    </body>
    </html>
     
    Geoff Cox, Oct 14, 2004
    #9
  10. Geoff Cox <> wrote in
    news::

    > On 13 Oct 2004 20:24:19 GMT, "A. Sinan Unur"
    > <> wrote:
    >
    >>Geoff Cox <> wrote in
    >>news::

    >
    >>I suspect the handler is being called multiple times, each time with a
    >>different part of the original text. You can test this hypothesis by
    >>putting a debug statement in here.

    >
    > You seem to be correct - I have simplified the code and placed a
    > simple html file (below) in d:\fred and the result appears in
    > d:\fred\jim and indeed the <p> ... </p>text is there twice. Any ideas
    > why?


    I think you probably want to emit the start and end tags only when the
    start and end callbacks are invoked. I tried to shorten your script to deal
    only with the p case:

    use strict;
    use warnings;

    package MyParser;
    use base qw(HTML::Parser);

    my ($in_p, $fh);

    sub register_fh { $fh = $_[1]; }

    sub start {
    my ($p, $t, $a, undef, $txt ) = @_;

    if ($t eq 'p') {
    $in_p = 1;
    print $fh '<p>';
    return;
    }
    }

    sub end {
    my ($p, $t, $txt) = @_;

    if ($t eq 'p') {
    $in_p = 0;
    print $fh "</p>\n";
    return;
    }
    }

    sub text {
    my ($p, $txt) = @_;
    print $fh $txt if ($in_p);
    }

    package main;

    my $p = MyParser->new;
    $p->register_fh(\*STDOUT);

    print <<HEADER;
    <html>
    <head>
    <title>Test Output</title>
    </head>
    <body>
    HEADER

    $p->parse_file(\*DATA);

    print <<FOOTER;
    </body>
    </html>
    FOOTER

    __DATA__
    <html>
    <head>
    <title>test</title>
    </head>

    <body>

    <h2>test file</h2>

    <p>The is some text which I am using to test whether para.pl using
    HTML::parser will output all of the text in this paragraph in one
    paragraph, or, in two smaller paragraphs.</p>


    </body>
    </html>
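
    The same idea may extend to the <ul>/<li> question asked earlier in the thread. This is only a sketch under assumptions: the package name ListParser, the %keep table and the $out buffer are invented for illustration, and every <li> in the input is assumed to have an explicit </li>, because the parser reports only the tags that are actually present.

```perl
use strict;
use warnings;

package ListParser;
use base qw(HTML::Parser);

my %keep  = map { $_ => 1 } qw(h2 p ul li);   # tags copied to the output
my $depth = 0;                                # how many kept tags are open
my $out   = '';                               # collected output (hypothetical)

sub start {
    my ($self, $tag) = @_;
    return unless $keep{$tag};
    $depth++;
    $out .= "<$tag>";          # emit the tag once, when it opens
}

sub end {
    my ($self, $tag) = @_;
    return unless $keep{$tag};
    $depth-- if $depth > 0;
    $out .= "</$tag>\n";       # emit the close once, when it closes
}

sub text {
    my ($self, $text) = @_;
    $out .= $text if $depth > 0;   # text only, never any markup
}

package main;

my $p = ListParser->new;
$p->parse("<ul>\n<li>one</li>\n<li>two</li>\n</ul>\n");
$p->eof;
print $out;
```

    Because text() adds no markup of its own, it does not matter how many times it is called for a single paragraph or list item.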
     
    A. Sinan Unur, Oct 14, 2004
    #10
  11. 187 <> wrote:
    > A. Sinan Unur wrote:
    >> Geoff Cox <> wrote in
    >> news::
    >>
    >>> I cannot seem to work out how HTML::Parser deals with <p> text </p>
    >>> from an html file ...
    >>>

    >>
    >> Please read the posting guidelines posted here regularly.



    >> So, go back and post a small, self-contained script that still
    >> exhibits the problem.

    >
    > Grated soem sample code would of been nice,



    More than nice. It nearly guarantees a useable answer.

    It is in a poster's best interest to do what they can to get a
    useable answer.


    > Your assine tone, as well as your direct insults to the OP, was
    > completely unwarrented.



    I think you are lacking some pertinent information, and your
    conclusion makes you look foolish.


    > You could of just rplied asking for more information,



    We have done this dozens of times for this OP.

    We have asked this OP many times[1] to see the Posting Guidelines
    if he wants the best chance at getting an answer.

    There is a significant history here that you appear to be ignorant of.



    > There is NO excuse for your tone in this thread.



    There is NO excuse for acting like you know what has happened here
    when you have not been here to see what is happening here, so:

    *plonk*

    You are speaking from ignorance. Mighty embarrassing in such
    a public forum! Perhaps you should have waited until you could
    followup on something that you actually know something about.

    Geoff has proven a rather persistent disregard for the time
    of other people. We are here only to serve his needs, so it
    is no big deal if we have to work a little harder or ignore
    his threads.



    [1] Here are at least 6 such times:

    http://groups.google.com/groups?as_q="Geoff Cox" "Posting Guidelines"&as_ugroup=comp.lang.perl.misc

    I get the feeling we are "talking to the hand" here...

    --
    Tad McClellan SGML consulting
    Perl programming
    Fort Worth, Texas
     
    Tad McClellan, Oct 14, 2004
    #11
  12. Geoff Cox

    187 Guest

    Tom wrote:
    > 187 wrote:


    [ snip unwarranted and non-sequitur drivel ]

    FYI I've been programming and working with networks, IT, and help desks
    since the 80's. You know nothing of me. It's people like /you/ who can only
    attempt to form an argument or rebuttal using personal attacks, and try
    to bait fights.

    I care not for your reply, as you have proven you have nothing of worth
    to say. We do not need sheer hate being spread here.
     
    187, Oct 14, 2004
    #12
  13. Geoff Cox <> wrote:

    > I cannot seem to work out how HTML::Parser deals with <p> text </p>



    > It breaks up a paragraph by placing a </p> <p> inside a paragraph of
    > text in what seems to me to be a random fashion...



    If you show us your code, we might be able to help fix it.

    Make a short and complete program *that we can run* that shows
    this phantom "</p> <p>" thing, and we will explain to you
    how it got there.


    Why not follow the suggestions given in the Posting Guidelines?

    If you had, the question would have been answered already instead
    of another of your round-and-round threads where there is not
    enough information given to be able to solve the problem.

    Post a small program. We will fix it.


    --
    Tad McClellan SGML consulting
    Perl programming
    Fort Worth, Texas
     
    Tad McClellan, Oct 14, 2004
    #13
  14. Geoff Cox

    187 Guest

    Tad McClellan wrote:
    > 187 <> wrote:
    >> A. Sinan Unur wrote:
    >>> Geoff Cox <> wrote in
    >>> news::
    >>>
    >>>> I cannot seem to work out how HTML::Parser deals with <p> text </p>
    >>>> from an html file ...
    >>>>
    >>>
    >>> Please read the posting guidelines posted here regularly.

    >
    >
    >>> So, go back and post a small, self-contained script that still
    >>> exhibits the problem.

    >>
    >> Grated soem sample code would of been nice,


    Sorry for my horrible typing. I've only had a total of 4 hours of sleep
    the past week and a half :(

    >
    > More than nice. It nearly guarantees a useable answer.
    >
    > It is in a poster's best interest to do what they can to get a
    > useable answer.


    I fully agree. It just seemed to me like the OP at least tried to
    explain his problem, though more in words and not so much in code.

    I once again apologize if I got a little "off the deep end" back there.
     
    187, Oct 14, 2004
    #14
  15. Geoff Cox

    Fred Canis Guest

    Tad McClellan wrote:
    > 187 <> wrote:
    >> A. Sinan Unur wrote:
    >>> Geoff Cox <> wrote in
    >>> news::
    >>>
    >>>> I cannot seem to work out how HTML::Parser deals with <p> text </p>
    >>>> from an html file ...
    >>>>
    >>>
    >>> Please read the posting guidelines posted here regularly.

    >
    >
    >>> So, go back and post a small, self-contained script that still
    >>> exhibits the problem.

    >>
    >> Grated soem sample code would of been nice,

    >
    >
    > More than nice. It nearly guarantees a useable answer.
    >
    > It is in a poster's best interest to do what they can to get a
    > useable answer.


    True.

    >> Your assine tone, as well as your direct insults to the OP, was
    >> completely unwarrented.

    >
    >
    > I think you are lacking some pertinent information, and your
    > conclusion makes you look foolish.


    I beg to differ. The post in question here wasn't exactly blossoming
    with positive vibes, and as such not likely to help the OP. Come on,
    even you can surely admit this?

    >> You could of just rplied asking for more information,

    >
    >
    > We have done this dozens of times for this OP.
    >
    > We have asked this OP many times[1] to see the Posting Guidelines
    > if he wants the best chance at getting an answer.


    True, but most of those requests have been clouded by the rather negative
    vibes therein.

    > There is a significant history here that you appear to be ignorant of.


    If you mean the history of the OP, well I think it's hardly fair to
    expect everyone who posts or reads here to keep track of the posting
    history of every poster. It doesn't mean it couldn't be checked, of
    course it could, but I do not think it right at all to give someone such
    a thorough beating for missing one. I hardly think this has been such a
    heinous offense.


    >> There is NO excuse for your tone in this thread.

    >
    >
    > There is NO excuse for acting like you know what has happened here
    > when you have not been here to see what is happening here, so:
    >
    > *plonk*


    Please don't start there. You and others can be great people with a
    wealth of knowledge, but some of you are *way* too quick to pull the
    plonk-trigger. I think what we have here is a misunderstanding, at
    least that's how I see it.

    > You are speaking from ignorance. Mighty embarrassing in such
    > a public forum! Perhaps you should have waited until you could
    > followup on something that you actually know something about.



    Again, I don't see the point of chastising a poster for not knowing the
    posting history of every past poster in this group. It's almost absurd
    to expect everyone to know. I regularly frequent this group, rarely
    posting though, but a rather regular reader, and I too have missed
    whatever incident this OP was apparently involved in.


    Maybe best to let the thread just die instead of going on so bitterly.
     
    Fred Canis, Oct 14, 2004
    #15
  16. Geoff Cox

    187 Guest

    Tad McClellan wrote:

    > You are speaking from ignorance. Mighty embarrassing in such
    > a public forum! Perhaps you should have waited until you could
    > followup on something that you actually know something about.
    >
    > Geoff has proven a rather persistent disregard for the time
    > of other people. We are here only to serve his needs, so it
    > is no big deal if we have to work a little harder or ignore
    > his threads.


    I am sorry I did not know this at first. I do not have time to keep
    track of everything that goes on in the dozens of groups I subscribe to.
    It seems illogical to me to expect everyone to automatically be aware of
    any and all events. Maybe I should toss the OP's name into Google (and
    set the group to this one). I might have done just that if I wasn't in
    such a rush.

    I once again want to apologize for how I came across in this. I'm not a bad
    person, I am in fact a tech as well. I love Perl and think it's the best
    thing for programming since sliced bread, and that it's one of the most
    intelligent languages I've ever had the privilege to learn and become
    proficient in. Before Perl I was just a C/C++ programmer doing mostly
    freelance. I learned Perl and completely changed how I code many things
    in shell/cgi/etc for both work and personal tasks.

    I hope there are no hard feelings.
     
    187, Oct 14, 2004
    #16
  17. On 2004-10-14, Fred Canis <> wrote:
    > Tad McClellan wrote:
    >
    >> There is a significant history here that you appear to be ignorant of.

    >
    > If you mean the history of the OP, well I think it's hardly fair to
    > expect everyone who posts or reads here to keep track of the posting
    > history of every poster. It doesn't mean it couldn't be checked, of
    > course it could, but I do not think it right at all to give someone such
    > a thorough beating for missing one. I hardly think this has been such a
    > heinous offense.


    I think Tad's point is that if one is going to lambaste people for their
    response to a poster, one might first check whether there is some reason
    for that response, not that one should check the posting history
    relevant to any given thread.

    dha

    --
    David H. Adler - <> - http://www.panix.com/~dha/
    You kids today have it easy. I remember when we had to write programs
    with an ice pick and index cards. <Note: This joke is not Y2k
    compliant. Soon people will ask, "What's an ice pick?"> - Lee Sharp
     
    David H. Adler, Oct 14, 2004
    #17
  18. "187" <> wrote in
    news::

    > Tad McClellan wrote:
    >
    >> You are speaking from ignorance. Mighty embarrassing in such
    >> a public forum! Perhaps you should have waited until you could
    >> followup on something that you actually know something about.
    >>
    >> Geoff has proven a rather persistent disregard for the time
    >> of other people. We are here only to serve his needs, so it
    >> is no big deal if we have to work a little harder or ignore
    >> his threads.

    >
    > I am sorry I did not know this at first.


    ....

    > I'm not a bad person, I am in fact a tech as well. I love Perl


    Well, clearly. Why else would you read this group?

    OTOH, I would think that someone who wishes to appear friendly might want
    to avoid the nickname '187'.

    As for my reaction to the OP, I find it particularly unproductive and
    annoying to blame the package rather than look for an error in one's own
    code. This was especially so since the OP provided no code of his own which
    we could have looked at to test his assertion that the package was somehow
    at fault.

    Referring to the original post:

    Geoff Cox <> wrote in
    news::

    > I cannot seem to work out how HTML::Parser deals with <p> text </p>
    > from an html file ...
    >
    > It breaks up a paragraph by placing a </p> <p> inside a paragraph of
    > text in what seems to me to be a random fashion...


    The "It" in the second paragraph surely refers to HTML::Parser.

    I stand by my belief that this is a "stupid" way to approach programming
    problems. One should ask

    + Here is what I am doing
    + Here is what I would like to have happen
    + Instead, here is what is happening
    + What am I doing wrong?

    Repeatedly ignoring this advice brings to mind the extremely apropos maxim
    (I am probably misquoting this):

    once is happenstance, twice coincidence, thrice is enemy action

    and deserves a somewhat harsher response than the usual mild reminder that
    the OP read the posting guidelines.

    Sinan.
     
    A. Sinan Unur, Oct 14, 2004
    #18
  19. Geoff Cox

    Geoff Cox Guest

    On 14 Oct 2004 00:07:44 GMT, "A. Sinan Unur"
    <> wrote:


    >I think you probably want to emit the start and end tags only when the
    >start and end callbacks are invoked. I tried to shorten your script to deal
    >only with the p case:


    Many thanks for the code below - it works fine and I will try to get to
    grips with how it achieves this. You can no doubt see that I do not
    have a very good understanding of HTML::Parser!

    Do you have any suggestions for HTML::Parser tutorials on the net? I
    have looked but not found anything that starts far enough back ...

    I have now bought the Perl & LWP book by Sean Burke - any others
    spring to mind?

    Thanks again for your help.

    Cheers

    Geoff







    >use strict;
    >use warnings;
    >
    >package MyParser;
    >use base qw(HTML::Parser);
    >
    >my ($in_p, $fh);
    >
    >sub register_fh { $fh = $_[1]; }
    >
    >sub start {
    > my ($p, $t, $a, undef, $txt ) = @_;
    >
    > if ($t eq 'p') {
    > $in_p = 1;
    > print $fh '<p>';
    > return;
    > }
    >}
    >
    >sub end {
    > my ($p, $t, $txt) = @_;
    >
    > if ($t eq 'p') {
    > $in_p = 0;
    > print $fh "</p>\n";
    > return;
    > }
    >}
    >
    >sub text {
    > my ($p, $txt) = @_;
    > print $fh $txt if ($in_p);
    >}
    >
    >package main;
    >
    >my $p = MyParser->new;
    >$p->register_fh(\*STDOUT);
    >
    >print <<HEADER;
    ><html>
    ><head>
    ><title>Test Output</title>
    ></head>
    ><body>
    >HEADER
    >
    >$p->parse_file(\*DATA);
    >
    >print <<FOOTER;
    ></body>
    ></html>
    >FOOTER
    >
    >__DATA__
    ><html>
    ><head>
    ><title>test</title>
    ></head>
    >
    ><body>
    >
    ><h2>test file</h2>
    >
    ><p>The is some text which I am using to test whether para.pl using
    >HTML::parser will output all of the text in this paragraph in one
    >paragraph, or, in two smaller paragraphs.</p>
    >
    >
    ></body>
    ></html>
     
    Geoff Cox, Oct 14, 2004
    #19
  20. Geoff Cox

    187 Guest

    A. Sinan Unur wrote:
    > "187" <> wrote in
    > news::
    >
    >> Tad McClellan wrote:
    >>
    >>> You are speaking from ignorance. Mighty embarrassing in such
    >>> a public forum! Perhaps you should have waited until you could
    >>> followup on something that you actually know something about.
    >>>
    >>> Geoff has proven a rather persistent disregard for the time
    >>> of other people. We are here only to serve his needs, so it
    >>> is no big deal if we have to work a little harder or ignore
    >>> his threads.

    >>
    >> I am sorry I did not know this at first.

    >
    > ...
    >
    >> I'm not a bad person, I am in fact a tech as well. I love Perl

    >
    > Well, clearly. Why else would you read this group?


    Good point.

    > OTOH, I would think that someone who wishes to appear friendly might
    > want to avoid the nickname '187'.


    Why is that? This is how I uniquely identify myself. If "187" means
    something else that I am not aware of please let me know. Otherwise I
    can go back to posting under my name "Al" (which also appears in my
    email address.)

    > As for my reaction to the OP, I find it particularly unproductive and
    > annoying to blame the package rather than look for an error in one's
    > own code. This was especially so since the OP provided no code of his
    > own which we could have looked at to test his assertion that the
    > package was somehow at fault.


    True. I do admit I was quick to judge and I'm sorry for snapping at you.

    > Referring to the original post:
    >
    > Geoff Cox <> wrote in
    > news::


    That has got to be the longest message ID I've ever laid eyes on
    (between 'news:' and '@ax.com'.)

    >> I cannot seem to work out how HTML::Parser deals with <p> text </p>
    >> from an html file ...
    >>
    >> It breaks up a paragraph by placing a </p> <p> inside a paragraph of
    >> text in what seems to me to be a random fashion...

    >
    > The "It" in the second paragraph surely refers to HTML::Parser.
    >
    > I stand by my belief that this is a "stupid" way to approach
    > programming problems. One should ask
    >
    > + Here is what I am doing
    > + Here is what I would like to have happen
    > + Instead, here is what is happening
    > + What am I doing wrong?
    >
    > Repeatedly ignoring this advice brings to mind the extremely apropos
    > maxim (I am probably misquoting this):



    Agreed. No arguments here.

    > once is happenstance, twice coincidence, thrice is enemy action
    >
    > and deserves a somewhat harsher response than the usual mild reminder
    > that the OP read the posting guidelines.


    All understood. I am once again sorry, and hope there are no hard
    feelings.
     
    187, Oct 14, 2004
    #20