Problem with PerlSAX splitting the calls to the characters callback

raga

From the documentation linked here:
http://search.cpan.org/~kmacleod/libxml-perl-0.08/doc/PerlSAX.pod
PerlSAX seems to split the characters call for a single entity.
Though this is weird (I'm not sure whether there is a genuine reason), it is
fine: as all the chunks belong to the same entity, we can simply append all
the characters calls.
However, sadly it also calls the characters API with an unwanted
space.
E.g. I have the tag < tag1>mynameisrs</tag>
and it calls characters("myname") characters(" ") characters("isrs").
It is not at all predictable why it does this. The problem
is that when I append the chunks the result becomes "myname isrs".
Any help is appreciated.
Thanks
 
RedGrittyBrick

raga said:
From the documentation linked here:
http://search.cpan.org/~kmacleod/libxml-perl-0.08/doc/PerlSAX.pod
PerlSAX seems to split the characters call for a single entity.
Though this is weird (I'm not sure whether there is a genuine reason), it is
fine: as all the chunks belong to the same entity, we can simply append all
the characters calls.

The URL you provide says this:

"The Parser will call this method to report each chunk of character
data. SAX parsers may return all contiguous character data in a single
chunk, or they may split it into several chunks;"

However, sadly it also calls the characters API with an unwanted
space.
E.g. I have the tag < tag1>mynameisrs</tag>

That isn't well-formed XML and so can't be parsed.
1. You have a space in front of the first tag name.
2. You open tag1 but close tag.

and it calls characters("myname") characters(" ") characters("isrs").
It is not at all predictable why it does this.

In my experience it is always sufficiently predictable. Probably your
mynameisrs data is split over several lines and you've not written your
handler to take this into account.


$ cat sax.pl
#!/usr/local/bin/perl
use strict;
use warnings;
use XML::Parser::PerlSAX;

my $xml="<tag>mynameisrs</tag>";

# The parser reports events by calling methods on the handler object.
my $handler = MyHandler->new();
my $parser = XML::Parser::PerlSAX->new(Handler=>$handler);

$parser->parse($xml);


package MyHandler;
use strict;
use warnings;
use Data::Dumper;

sub new {
    my $type = shift;
    return bless {}, $type;
}

my $current_element = '';

# Called once for each opening tag.
sub start_element {
    my ($self, $element) = @_;
    $current_element = $element->{Name};
    print "Start: <$current_element>\n";
}

# Called once for each closing tag.
sub end_element {
    my ($self, $element) = @_;
    print "End: \n";
}

# Called for each chunk of character data; prints the chunk between
# quotes so that any stray whitespace is visible.
sub characters {
    my ($self, $characters) = @_;
    my $text = $characters->{Data};
    print "Characters: '$text'\n";
}

1;


$ perl sax.pl
Start: <tag>
Characters: 'mynameisrs'
End:
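
Since the documentation quoted above allows the parser to deliver contiguous character data in several chunks, a handler normally has to collect them itself. The following is only a minimal sketch of that idea, not part of the program above: BufferingHandler and its buffer and texts fields are made-up names, and the calls at the bottom simulate the parser by hand so the example runs without any CPAN module. It also assumes simple elements that contain only text.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package BufferingHandler;

sub new {
    my $type = shift;
    # buffer: text collected for the current element
    # texts:  completed text of every element seen so far
    return bless { buffer => '', texts => [] }, $type;
}

# A new element starts, so begin with an empty buffer.
sub start_element {
    my ($self, $element) = @_;
    $self->{buffer} = '';
}

# May be called several times for one run of text: append, never overwrite.
sub characters {
    my ($self, $characters) = @_;
    $self->{buffer} .= $characters->{Data};
}

# Only at the end of the element is the text known to be complete.
sub end_element {
    my ($self, $element) = @_;
    push @{ $self->{texts} }, $self->{buffer};
    print "Text of <$element->{Name}>: '$self->{buffer}'\n";
    $self->{buffer} = '';
}

package main;

# Simulated parser events: the text arrives in two chunks.
my $handler = BufferingHandler->new();
$handler->start_element({ Name => 'tag' });
$handler->characters({ Data => $_ }) for ('myname', 'isrs');
$handler->end_element({ Name => 'tag' });
```

With a real parser the same object would be passed as Handler => $handler to XML::Parser::PerlSAX->new, as in sax.pl above. Note that appending only joins what the parser reports: if a chunk containing whitespace is delivered, that whitespace ends up in the buffer too.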
 
raga

The URL you provide says this:

"The Parser will call this method to report each chunk of character
data. SAX parsers may return all contiguous character data in a single
chunk, or they may split it into several chunks;"


That isn't well-formed XML and so can't be parsed.
1. You have a space in front of the first tag name.
2. You open tag1 but close tag.


In my experience it is always sufficiently predictable. Probably your
mynameisrs data is split over several lines and you've not written your
handler to take this into account.

[perl program and output omitted]

Sorry for the wrong input provided earlier; I was typing in a hurry.
I intended to type <tag>mynameisrs</tag>

Yes, PerlSAX occasionally splits the characters across multiple calls. Your
snippet doesn't seem to handle it!
My actual query is that, in addition to the calls made to the characters API
with the split chunks, it randomly calls the characters API with an
unwanted space.
Thanks again for your earlier reply.
 
RedGrittyBrick

raga said:
raga said:
From the documentation linked here:
http://search.cpan.org/~kmacleod/libxml-perl-0.08/doc/PerlSAX.pod
PerlSAX seems to split the characters call for a single entity.
Though this is weird (I'm not sure whether there is a genuine reason), it is
fine: as all the chunks belong to the same entity, we can simply append all
the characters calls.

The URL you provide says this:

"The Parser will call this method to report each chunk of character
data. SAX parsers may return all contiguous character data in a single
chunk, or they may split it into several chunks;"

However, sadly it also calls the characters API with an unwanted
space.
E.g. I have the tag < tag1>mynameisrs</tag>

That isn't well-formed XML and so can't be parsed.
1. You have a space in front of the first tag name.
2. You open tag1 but close tag.

and it calls characters("myname") characters(" ") characters("isrs").
It is not at all predictable why it does this.

In my experience it is always sufficiently predictable. Probably your
mynameisrs data is split over several lines and you've not written your
handler to take this into account.
[perl program omitted]

Sorry for the wrong input provided earlier; I was typing in a hurry.
I intended to type <tag>mynameisrs</tag>

Yes, PerlSAX occasionally splits the characters across multiple calls. Your
snippet doesn't seem to handle it!

My program wasn't intended to handle it; it was intended to show that no
unexpected space characters are inserted.

My actual query is that, in addition to the calls made to the characters API
with the split chunks, it randomly calls the characters API with an
unwanted space.

It never does for me!

Create and post a short working program that shows it!
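
When putting such a test case together, it may help to print every chunk with its whitespace made visible, so that a chunk that looks like a space can be told apart from a line break or a tab. The following is only a sketch under assumptions: ChunkLogger and visible are made-up names, and the three chunks fed in at the end are simulated by hand (the newline chunk is a guess at what might be happening, not something observed in the poster's data).

```perl
#!/usr/bin/perl
use strict;
use warnings;

package ChunkLogger;

sub new {
    my $type = shift;
    return bless { chunks => [] }, $type;
}

# Return a copy of the text with whitespace and non-printable
# characters replaced by visible escapes such as \n or \x{a0}.
sub visible {
    my ($text) = @_;
    my %names = ("\n" => '\n', "\r" => '\r', "\t" => '\t');
    $text =~ s/([\n\r\t])/$names{$1}/g;
    $text =~ s/([^\x20-\x7e])/sprintf('\\x{%x}', ord($1))/ge;
    return $text;
}

# Element events are ignored; only the character chunks matter here.
sub start_element { }
sub end_element   { }

# Record each chunk and print its length and a visible form of its content.
sub characters {
    my ($self, $characters) = @_;
    my $text = $characters->{Data};
    push @{ $self->{chunks} }, $text;
    printf "Characters (%d): '%s'\n", length($text), visible($text);
}

package main;

# Simulated parser events: the middle chunk is a line break, not a space.
my $logger = ChunkLogger->new();
$logger->characters({ Data => $_ }) for ("myname", "\n", "isrs");
```

Passing an object like this as the Handler to the real parser, together with the real input copied and pasted rather than retyped, would show exactly which characters arrive in the unexpected chunk.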
 
Tad J McClellan

Sorry for the wrong input provided earlier; I was typing in a hurry.


You should not attempt to type code or data at all.

You should instead copy/paste it so that you do not insert
errors that are not in your real code or data.

Please see the Posting Guidelines that are posted here frequently.
 
