XML::Twig


c0rk

OK. I am now desperate. I have written a subroutine to split up large
(~2-3MB) XML documents into separate documents. When I use
$twig->parsefile I get the following error:

"not well-formed (invalid token) at line 27072, column 1934, byte 878399
at C:/Perl/site/lib/XML/Parser.pm line 187"

When I change to $twig->safe_parsefile I can parse the document, but it
only gets a portion of the document (~38 of 83 elements).

I am the first to admit that I am not a Perl hacker by trade, so please
go easy on my code sample. I should also mention that this code
worked great on smaller files (<300k).

Any help/suggestions would be greatly appreciated.

Brendan


use XML::Twig;
use Time::HiRes qw(gettimeofday);

# $tmpDir and logMessage() are defined elsewhere in the script.
sub splitFiles {
    my ($fPath) = @_;
    my $twig = XML::Twig->new;
    logMessage("DEBUG - Build the Twig for " . $fPath);
    $twig->safe_parsefile($fPath);    # build the twig; parse errors are trapped and left in $@
    logMessage("DEBUG - I can parse the file");
    my $root = $twig->root;           # get the root of the twig (vdf_metadata_list)
    logMessage("DEBUG - Videos: " . $root->children_count);
    my @videos = $root->children;     # put the vdf_metadata elements into an array
    if (@videos > 0) {
        logMessage("DEBUG - Number of videos is " . scalar @videos);
        my $i = 0;
        foreach my $video (@videos) {
            $i++;
            my $timeStamp = gettimeofday;    # floating-point seconds in scalar context
            my $tmpPath = $tmpDir . $timeStamp . $i;
            open(my $FH, '>', $tmpPath) || die("cannot open file: " . $!);
            $video->print($FH);
            close($FH);                      # close the lexical handle, not the bareword FH
        }
    } else {
        logMessage("DEBUG - Skipping file " . $fPath);
    }
}
 

Brian McCauley

c0rk said:
OK. I am now desperate. I have written a subroutine to split up large
(~2-3MB) XML documents into separate documents. When I use
$twig->parsefile I get the following error:

"not well-formed (invalid token) at line 27072, column 1934, byte 878399
at C:/Perl/site/lib/XML/Parser.pm line 187"

Well, in the absence of any evidence to the contrary I'd be inclined to
accept that at face value.

Do you have a reason to disbelieve it?
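
As a side note on why safe_parsefile seemed to "work": it wraps the parse in an eval, returns false on failure, and leaves the error message in $@, while the twig keeps whatever was built before the error. A minimal sketch of checking that return value follows; the helper name parse_or_report is made up for illustration and is not part of XML::Twig.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper: returns ($twig, undef) on success,
# or (undef, $error_message) when the file is not well-formed.
sub parse_or_report {
    my ($path) = @_;
    my $twig = XML::Twig->new;

    # safe_parsefile traps the parser's die; a false return means the
    # parse failed and $@ holds the "not well-formed ..." message.
    return $twig->safe_parsefile($path)
        ? ($twig, undef)
        : (undef, "$@");
}
```

With a check like this, a partially built twig is never mistaken for a complete document, and the message logged is the same one parsefile would have died with.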
 

Tad McClellan

c0rk said:
When I use $twig->parsefile I get the following error:

"not well-formed (invalid token) at line 27072, column 1934, byte 878399
at C:/Perl/site/lib/XML/Parser.pm line 187"


This message means that there is something wrong with the _data_
rather than with the code.

Open the data file to the 1934th character on the 27072nd line
and see what it is that makes it invalid XML.
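
For a file this size, an editor may not make it obvious what sits at that position, especially if the offending character is invisible. One way to look, sketched here under the assumption that the "byte" figure in the message is an offset from the start of the file, is to dump the bytes around it with non-printable ones escaped. The function name context_at_byte is made up for this example.

```perl
use strict;
use warnings;

# Return the bytes surrounding $offset in $path, with everything outside
# printable ASCII rendered as \xNN so invisible characters stand out.
sub context_at_byte {
    my ($path, $offset, $radius) = @_;
    $radius = 40 unless defined $radius;

    # :raw avoids CRLF translation, so offsets match what the parser counted.
    open(my $fh, '<:raw', $path) or die "cannot open $path: $!";
    my $start = $offset > $radius ? $offset - $radius : 0;
    seek($fh, $start, 0) or die "cannot seek in $path: $!";
    my $got = read($fh, my $chunk, $offset - $start + $radius);
    close($fh);
    die "cannot read $path: $!" unless defined $got;

    $chunk =~ s/([^\x20-\x7E])/sprintf('\\x%02X', ord $1)/ge;
    return $chunk;
}
```

Printing context_at_byte($fPath, 878399) with the offset from the error message should show the suspect character roughly in the middle of the returned string.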
 

c0rk

Brian McCauley said:
Well, in the absence of any evidence to the contrary I'd be inclined
to accept that at face value.

Do you have a reason to disbelieve it?

Brian

You know - I have been working on this script since Thursday, trying to
determine _my_ problem. When I saw this error, I assumed it pointed to an
error in my processing method (e.g. a memory problem). For whatever reason, I
just didn't read the error message for what it was. Turns out that the XML
has bad characters in it. I replaced those characters and my script
processed a 3MB file in seconds.

Many thanks for your response!

-c
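
The thread does not say which characters were bad or how they were replaced. If they were ASCII control characters, which XML 1.0 forbids apart from tab, line feed and carriage return, a cleanup along these lines is one possibility. This is only a sketch under that assumption: it would not help with a bare & or <, or with an encoding mismatch, and the function name is made up.

```perl
use strict;
use warnings;

# Delete the C0 control characters that XML 1.0 does not allow
# (everything below 0x20 except tab, LF and CR).
# Returns the cleaned text and the number of characters removed.
sub strip_xml_control_chars {
    my ($text) = @_;
    my $removed = ($text =~ tr/\x00-\x08\x0B\x0C\x0E-\x1F//d);
    return ($text, $removed);
}
```

Reporting the count makes it easy to log how much was changed, since silently altering source data is usually worth flagging to whoever produced the file.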
 

c0rk

Tad McClellan said:
This message means that there is something wrong with the _data_
rather than with the code.

Open the data file to the 1934th character on the 27072nd line
and see what it is that makes it invalid XML.

Tad,

Thanks for the response. You are 100% correct. I replaced the bad
characters at the specified location, and life is good!!!

Thanks,

-c
 
