Problem with PerlSAX splitting the calls to the characters callback

Discussion in 'Perl Misc' started by raga, Oct 13, 2008.

  1. raga

    raga Guest

    From the link given here:
    http://search.cpan.org/~kmacleod/libxml-perl-0.08/doc/PerlSAX.pod
    PerlSAX seems to split the characters call for a single entity.
    Though this is weird (not sure if there is a genuine reason), it is
    fine: as all belong to the same entity, we can simply append all the
    characters calls.
    However, sadly, it also calls the characters API with an unwanted
    space.
    E.g. I have the tag < tag1>mynameisrs</tag>
    It calls characters("myname") characters(" ") characters("isrs").
    It is not at all predictable why it does this. The problem
    is that when I append the chunks, the result becomes "myname isrs".
    Any help is appreciated.
    Thanks
    raga, Oct 13, 2008
    #1

  2. raga wrote:
    > From the link given here:
    > http://search.cpan.org/~kmacleod/libxml-perl-0.08/doc/PerlSAX.pod
    > PerlSAX seems to split the characters call for a single entity.
    > Though this is weird (not sure if there is a genuine reason), it is
    > fine: as all belong to the same entity, we can simply append all the
    > characters calls.


    The URL you provide says this:

    "The Parser will call this method to report each chunk of character
    data. SAX parsers may return all contiguous character data in a single
    chunk, or they may split it into several chunks;"


    > However, sadly, it also calls the characters API with an unwanted
    > space.
    > E.g. I have the tag < tag1>mynameisrs</tag>


    That isn't well-formed XML and so can't be parsed.
    1. You have a space in front of the first tag name.
    2. You open tag1 but close tag.


    > It calls characters("myname") characters(" ") characters("isrs").
    > It is not at all predictable why it does this.


    In my experience it is always sufficiently predictable. Probably your
    mynameisrs data is split over several lines and you've not written your
    handler to take this into account.


    $ cat sax.pl
    #!/usr/local/bin/perl
    use strict;
    use warnings;
    use XML::Parser::PerlSAX;

    my $xml = "<tag>mynameisrs</tag>";

    my $handler = MyHandler->new();
    my $parser  = XML::Parser::PerlSAX->new(Handler => $handler);

    $parser->parse($xml);


    package MyHandler;
    use strict;
    use warnings;
    use Data::Dumper;

    sub new {
        my $type = shift;
        return bless {}, $type;
    }

    # Name of the element most recently opened.
    my $current_element = '';

    sub start_element {
        my ($self, $element) = @_;
        $current_element = $element->{Name};
        print "Start: <$current_element>\n";
    }

    sub end_element {
        my ($self, $element) = @_;
        print "End: \n";
    }

    # Called once per chunk of character data, so it may fire
    # more than once for the text of a single element.
    sub characters {
        my ($self, $characters) = @_;
        my $text = $characters->{Data};
        print "Characters: '$text'\n";
    }

    1;


    $ perl sax.pl
    Start: <tag>
    Characters: 'mynameisrs'
    End:
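
A handler that does take chunking into account could buffer the text and only use it once the element is closed. The following is a minimal sketch, not the program from this post: the package name BufferingHandler and its buffer and texts fields are made up for illustration, and the parser is simulated by calling the callbacks by hand with the same hash shapes ({ Name => ... }, { Data => ... }) used above, so it runs without any CPAN module.

```perl
use strict;
use warnings;

# Hypothetical handler (not from the thread): it appends every
# characters() chunk to a buffer and reads the buffer in end_element().
package BufferingHandler;

sub new {
    my $class = shift;
    return bless { buffer => '', texts => [] }, $class;
}

sub start_element {
    my ($self, $element) = @_;
    $self->{buffer} = '';    # a new element starts with an empty buffer
}

sub characters {
    my ($self, $characters) = @_;
    # Append the chunk, however many chunks the parser chooses to send.
    $self->{buffer} .= $characters->{Data};
}

sub end_element {
    my ($self, $element) = @_;
    push @{ $self->{texts} }, $self->{buffer};
    print "$element->{Name}: '$self->{buffer}'\n";
    $self->{buffer} = '';
}

package main;

# Simulate a parser that reports the text in two chunks.
my $handler = BufferingHandler->new;
$handler->start_element({ Name => 'tag' });
$handler->characters({ Data => 'myname' });
$handler->characters({ Data => 'isrs' });
$handler->end_element({ Name => 'tag' });
```

Because the buffer is reset in start_element, this sketch only suits elements that contain text and no child elements; mixed content would need a stack of buffers. An object like this could presumably be passed as Handler => $handler in the same way as MyHandler.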



    --
    RGB
    RedGrittyBrick, Oct 13, 2008
    #2

  3. raga

    raga Guest

    On Oct 13, 5:06 pm, RedGrittyBrick wrote:
    > raga wrote:
    > > From the link given here:
    > > http://search.cpan.org/~kmacleod/libxml-perl-0.08/doc/PerlSAX.pod
    > > PerlSAX seems to split the characters call for a single entity.
    > > Though this is weird (not sure if there is a genuine reason), it is
    > > fine: as all belong to the same entity, we can simply append all the
    > > characters calls.
    >
    > The URL you provide says this:
    >
    > "The Parser will call this method to report each chunk of character
    > data. SAX parsers may return all contiguous character data in a single
    > chunk, or they may split it into several chunks;"
    >
    > > However, sadly, it also calls the characters API with an unwanted
    > > space.
    > > E.g. I have the tag < tag1>mynameisrs</tag>
    >
    > That isn't well-formed XML and so can't be parsed.
    > 1. You have a space in front of the first tag name.
    > 2. You open tag1 but close tag.
    >
    > > It calls characters("myname") characters(" ") characters("isrs").
    > > It is not at all predictable why it does this.
    >
    > In my experience it is always sufficiently predictable. Probably your
    > mynameisrs data is split over several lines and you've not written your
    > handler to take this into account.


    Sorry for the wrong input provided earlier; I was in a hurry and typed it
    quickly.
    I intended to type <tag>mynameisrs</tag>

    Yes, PerlSAX occasionally splits the characters into multiple calls. Your
    snippet doesn't seem to handle that!
    My actual query is that, in addition to the calls made to the characters API
    with the split chunks, it randomly calls the characters API with an
    unwanted space.
    Thanks again for your earlier reply.
    raga, Oct 13, 2008
    #3
  4. raga wrote:
    > On Oct 13, 5:06 pm, RedGrittyBrick wrote:
    >> raga wrote:
    >>> From the link given here:
    >>> http://search.cpan.org/~kmacleod/libxml-perl-0.08/doc/PerlSAX.pod
    >>> PerlSAX seems to split the characters call for a single entity.
    >>> Though this is weird (not sure if there is a genuine reason), it is
    >>> fine: as all belong to the same entity, we can simply append all the
    >>> characters calls.

    >> The URL you provide says this:
    >>
    >> "The Parser will call this method to report each chunk of character
    >> data. SAX parsers may return all contiguous character data in a single
    >> chunk, or they may split it into several chunks;"
    >>
    >>> However, sadly, it also calls the characters API with an unwanted
    >>> space.
    >>> E.g. I have the tag < tag1>mynameisrs</tag>

    >> That isn't well-formed XML and so can't be parsed.
    >> 1. You have a space in front of the first tag name.
    >> 2. You open tag1 but close tag.
    >>
    >>> It calls characters("myname") characters(" ") characters("isrs").
    >>> It is not at all predictable why it does this.

    >> In my experience it is always sufficiently predictable. Probably your
    >> mynameisrs data is split over several lines and you've not written your
    >> handler to take this into account.
    >> [perl program omitted]

    >
    > Sorry for the wrong input provided earlier; I was in a hurry and typed it
    > quickly.
    > I intended to type <tag>mynameisrs</tag>
    >
    > Yes, PerlSAX occasionally splits the characters into multiple calls. Your
    > snippet doesn't seem to handle that!


    My program wasn't intended to handle it, it was intended to show that no
    unexpected space characters are inserted.

    > My actual query is that, in addition to the calls made to the characters API
    > with the split chunks, it randomly calls the characters API with an
    > unwanted space.


    It never does for me!

    Create and post a short working program that shows it!
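
When putting such a test program together, it may help to print each chunk with its whitespace made visible, since a newline in the source data and a real space look the same inside plain quotes. This is only a sketch: the helper name visible is made up, and the chunk list is an assumed example of what a parser might report for data split over two lines, not output captured from a real run.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical helper (not from the thread): return the text as a
# double-quoted literal so that "\n", "\t" and " " can be told apart.
sub visible {
    my ($text) = @_;
    local $Data::Dumper::Useqq  = 1;    # escape control characters
    local $Data::Dumper::Terse  = 1;    # no '$VAR1 = ' prefix
    local $Data::Dumper::Indent = 0;    # keep the result on one line
    return Dumper($text);
}

# Assumed chunks for data written over two lines in the XML source.
my @chunks = ("myname", "\n", "isrs");
print "Characters: ", visible($_), "\n" for @chunks;
```

Calling visible($characters->{Data}) inside a characters handler would show whether the "unwanted space" is really a space or a line break that was already present in the input.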

    --
    RGB
    RedGrittyBrick, Oct 13, 2008
    #4
  5. raga wrote:


    > Sorry for the wrong input provided earlier; I was in a hurry and typed it
    > quickly.



    You should not attempt to type code or data at all.

    You should instead copy/paste it so that you do not insert
    errors that are not in your real code or data.

    Please see the Posting Guidelines that are posted here frequently.


    --
    Tad McClellan
    email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"
    Tad J McClellan, Oct 13, 2008
    #5