XML::Twig

Discussion in 'Perl Misc' started by c0rk, Sep 25, 2004.

  1. c0rk


    OK. I am now desperate. I have written a subroutine to split up large
    (~2-3MB) XML documents into separate documents. When I use
    $twig->parsefile I get the following error:

    "not well-formed (invalid token) at line 27072, column 1934, byte 878399
    at C:/Perl/site/lib/XML/Parser.pm line 187"

    When I change to $twig->safe_parsefile I can parse the document, but it
    only gets a portion of the document (~38 of 83 elements).

    I am the first to admit that I am not a Perl hacker by trade, so please
    go easy on my code sample. I should also mention that this code
    worked great on smaller files (<300k).

    Any help/suggestions would be greatly appreciated.

    Brendan


    use XML::Twig;
    use Time::HiRes qw(gettimeofday);

    sub splitFiles {
        my $fPath = $_[0];
        my $twig  = XML::Twig->new;
        logMessage("DEBUG - Build the Twig for " . $fPath);
        # safe_parsefile traps parse errors instead of dying
        $twig->safe_parsefile($fPath);    # build the twig
        logMessage("DEBUG - I can parse the file");
        my $root = $twig->root;    # get the root of the twig (vdf_metadata_list)
        logMessage("DEBUG - Videos: " . $root->children_count);
        my @videos = $root->children;    # put the vdf_metadata elements into an array
        if (@videos > 0) {
            logMessage("DEBUG - Number of videos is " . scalar @videos);
            my $i = 0;
            foreach my $video (@videos) {
                $i++;
                my $timeStamp = gettimeofday;    # scalar context: seconds as a float
                my $tmpPath   = $tmpDir . $timeStamp . $i;    # $tmpDir is set elsewhere
                open(my $FH, '>', $tmpPath) or die("cannot open file: " . $!);
                $video->print($FH);
                close($FH);
            }
        } else {
            logMessage("DEBUG - Skipping file " . $fPath);
        }
    }
    c0rk, Sep 25, 2004
    #1
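
A note on why safe_parsefile appears to "work": it wraps the parse in an eval, so a document that is not well-formed no longer kills the script, but the twig only holds what was read before the error. The sketch below shows one way to check the return value and report $@. It uses safe_parse on a string so that it is self-contained; parse_or_report is a hypothetical helper name, not something from the original script.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper: returns the number of children of the root
# element, or nothing if the input is not well-formed.
sub parse_or_report {
    my ($xml) = @_;
    my $twig = XML::Twig->new;

    # safe_parse returns false on failure and leaves the message in $@
    unless ( $twig->safe_parse($xml) ) {
        warn "parse failed: $@";
        return;
    }
    return $twig->root->children_count;
}
```

The same check applies to safe_parsefile: test its return value before trusting $twig->root.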

  2. c0rk wrote:
    > OK. I am now desperate. I have written a subroutine to split up large
    > (~2-3MB) XML documents into separate documents. When I use
    > $twig->parsefile I get the following error:
    >
    > "not well-formed (invalid token) at line 27072, column 1934, byte 878399
    > at C:/Perl/site/lib/XML/Parser.pm line 187"


    Well, in the absence of any evidence to the contrary I'd be inclined to
    accept that at face value.

    Do you have a reason to disbelieve it?
    Brian McCauley, Sep 25, 2004
    #2

  3. c0rk wrote:

    > When I use $twig->parsefile I get the following error:
    >
    > "not well-formed (invalid token) at line 27072, column 1934, byte 878399
    > at C:/Perl/site/lib/XML/Parser.pm line 187"



    This message means that there is something wrong with the _data_
    rather than with the code.

    Open the data file to the 1934th character on the 27072nd line
    and see what it is that makes it invalid XML.



    --
    Tad McClellan SGML consulting
    Perl programming
    Fort Worth, Texas
    Tad McClellan, Sep 25, 2004
    #3
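
For a 2-3MB file, jumping to that position by hand can be awkward. A minimal sketch of doing it in Perl follows. It assumes the line number in the message is 1-based and the column is a 0-based byte offset within the line, which is how expat reports them as far as I know; show_context and its parameters are hypothetical names.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text around a given line and column,
# with non-printable bytes made visible as \xNN.
sub show_context {
    my ($path, $line_no, $col, $width) = @_;
    $width //= 20;
    open(my $fh, '<:raw', $path) or die "cannot open $path: $!";
    while (my $line = <$fh>) {
        next unless $. == $line_no;
        close($fh);
        my $start = $col > $width ? $col - $width : 0;
        return '' if $start >= length $line;
        my $chunk = substr($line, $start, 2 * $width);
        $chunk =~ s/([^\x20-\x7E])/sprintf('\\x%02X', ord $1)/ge;
        return $chunk;
    }
    close($fh);
    return;    # the file has fewer lines than $line_no
}
```

With the numbers from the error message, the call would be something like print show_context($fPath, 27072, 1934), "\n";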
  4. c0rk


    Brian McCauley wrote:

    >
    >
    > c0rk wrote:
    >> OK. I am now desperate. I have written a subroutine to split up
    >> large (~2-3MB) XML documents into separate documents. When I use
    >> $twig->parsefile I get the following error:
    >>
    >> "not well-formed (invalid token) at line 27072, column 1934, byte
    >> 878399 at C:/Perl/site/lib/XML/Parser.pm line 187"

    >
    > Well, in the absence of any evidence to the contrary I'd be inclined
    > to accept that at face value.
    >
    > Do you have a reason to disbelieve it?
    >


    Brian

    You know, I have been working on this script since Thursday, trying to
    determine _my_ problem. When I saw this error, I took it to mean there was
    an error in my processing method (a memory problem, for example). For
    whatever reason, I just didn't read the error message for what it was.
    Turns out that the XML has bad characters in it. I replaced those
    characters and my script processed a 3MB file in seconds.

    Many thanks for your response!

    -c
    c0rk, Sep 26, 2004
    #4
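
The thread does not say which characters were at fault. One common cause of "invalid token" is a control character that XML 1.0 does not allow (anything below 0x20 other than tab, LF and CR); an unescaped & or bytes that do not match the declared encoding are others. As a rough sketch covering only the control-character case, a scan like the following lists every offender before the parser trips on it. find_illegal_xml_chars is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a list of [line, column, code] triples,
# one per control character that XML 1.0 forbids. Lines are 1-based,
# columns are 0-based byte offsets within the line.
sub find_illegal_xml_chars {
    my ($path) = @_;
    my @hits;
    open(my $fh, '<:raw', $path) or die "cannot open $path: $!";
    while (my $line = <$fh>) {
        while ($line =~ /([\x00-\x08\x0B\x0C\x0E-\x1F])/g) {
            # pos() points just past the match, so step back one byte
            push @hits, [ $., pos($line) - 1, ord $1 ];
        }
    }
    close($fh);
    return @hits;
}
```

Printing the result, for example with printf "line %d, column %d: 0x%02X\n", @$_ for find_illegal_xml_chars($fPath);, gives a list of positions to fix in the source data.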
  5. c0rk


    Tad McClellan wrote:

    > c0rk wrote:
    >
    >> When I use $twig->parsefile I get the following error:
    >>
    >> "not well-formed (invalid token) at line 27072, column 1934, byte
    >> 878399 at C:/Perl/site/lib/XML/Parser.pm line 187"

    >
    >
    > This message means that there is something wrong with the _data_
    > rather than with the code.
    >
    > Open the data file to the 1934th character on the 27072nd line
    > and see what it is that makes it invalid XML.
    >
    >
    >


    Tad,

    Thanks for the response. You are 100% correct. I replaced the bad
    characters at the specified location, and life is good!!!

    Thanks,

    -c
    c0rk, Sep 26, 2004
    #5