Copy characterdata from XML file to XML file

Discussion in 'Perl Misc' started by Eric van Oorschot, Dec 7, 2005.

  1. Hi,

    I'm writing a Perl script that has to copy a block of data (node numbers
    and coordinates) from one XML-formatted file into another XML file.
    I'm using XML::Parser to extract the data and XML::Writer to write the
    data into the second file.

    This does not work, since some of the numbers are corrupted after being
    read by XML::Parser. Below I have copied a small excerpt that shows how the
    data is corrupted. It always happens at the same line(s) of data.

    67 2.9005093479606E+000 3.6637104002418E-001 7.9522656092442E-001
    68 2.8852994122583E+000 3.5353599488296E-001 7.7516591265738E-001
    69 2.9109259023248E+000 3.5272037818926E-001 8.1765470045
    602E-001
    70 2.9014248453522E+000 3.4032368974452E-001 7.9417266267164E-001
    71 2.8849923984542E+000 3.2706829720117E-001 7.7537618002780E-001

    My Perl script (I am not an experienced Perl programmer) is shown below.
    The error occurs in the sub 'ReadCharacterData'. In this subroutine the
    data is read and copied into the hash %tables. When this hash is written to
    the output file, the corruption shown above appears.

    If anyone has an idea, or needs more info, please reply.

    Regards,

    Eric


    use XML::Parser;
    use IO::File;
    use Switch ;
    use XML::Writer;

    my $fmsfile = shift ; # fms output file
    my $reffile = shift ; # Exchange output deck
    my $outfile = shift ; # Output file

    die "Cannot find fms output file \"$fmsfile\""
        unless -f $fmsfile;

    die "Cannot find xml input deck \"$reffile\""
        unless -f $reffile;

    my $output = new IO::File(">$outfile");
    my $writer = new XML::Writer( OUTPUT => $output, UNSAFE => 1 );

    #
    # Find tmax in fms file
    #
    my $tmax = 0.00 ;
    open ( IN, $fmsfile ) or die "Cannot open \"$fmsfile\": $!" ;
    while ( <IN> ) {
        if ( /TIME/ ) {
            ( $dum, $dum, $dum, $ti ) = split /\s+/ ;
            $tmax = $ti if ( $ti > $tmax ) ;
        }
    }
    close (IN) ;

    $tag = "";

    my %tables ; # hash with coordinates from fms file
    my $model ;  # name of the FE model
    my $i = 0 ;

    #
    # Read fms file to create hash of the coordinate tables
    #
    my $parser = new XML::Parser;

    $parser->setHandlers( Char    => \&ReadCharacterData,
                          Default => \&default );
    print "Reading fms file ($fmsfile)\n" ;
    $parser->parsefile($fmsfile);

    ## Check info read in fms file
    #foreach $i ( keys %tables ) {
    #    print "Table $i\n",$tables{$i},"\n End table $i\n\n";
    #}

    my $coords = 0 ;

    #
    # Read reffile and replace coordinate tables with data from fms file
    #
    my $bparser = new XML::Parser;
    $bparser->setHandlers( XMLDecl    => \&XmlDecl,
                           Doctype    => \&DocType,
                           Start      => \&startElement,
                           End        => \&endElement,
                           Char       => \&characterData,
                           CdataStart => \&cdatastart,
                           CdataEnd   => \&cdataend,
                           Default    => \&default );
    print "Reading ($reffile) and writing ($outfile) \n" ;
    $bparser->parsefile($reffile);

    $writer->end() ;

    #
    ########################################################################
    #

    sub XmlDecl {
        my( $parseinst, $version, $encoding, $standalone ) = @_;
        $writer->xmlDecl( $encoding, $standalone );
    }

    sub DocType {
        my( $parseinst, $name, $sysid, $pub, $internal ) = @_;
        $writer->doctype( $name, $pub, $sysid );
    }

    sub startElement {
        # Reading xml data
        my( $parseinst, $element, %attrs ) = @_;
        SWITCH: {
            if ($element eq "FE_MODEL") {
                $model = $attrs{'NAME'} ;
                $tag = "DEFINE";
                # print "FE model $model\n" ;
                last SWITCH;
            }
            if ($element eq "TABLE" && $attrs{'TYPE'} =~ /COORDINATE/ ) {
                $coords = 1 ;
                # print "$coords - TABLE COORDINATES\n" ;
            }
            last SWITCH ;
        }
        $writer -> startTag( $element , %attrs );
    }

    sub endElement {
        my( $parseinst, $element ) = @_;
        $coords = 0 ;
        $writer -> endTag( $element ) ;
    }

    sub ReadCharacterData {
        my( $parseinst, $data ) = @_;
        SWITCH: {
            if ( $data =~ /^\s*$/ ) {
                last ;
            };
            if ( $data =~ /TIME/ ) {
                ( $dum, $dum, $dum, $ti ) = split /\s+/, $data ;
                # print "Timepoint ", $ti, "\n" ;
                last ;
            } ;
            if ( $data =~ /FE MODEL/ ) {
                ($dum, $dum, $dum, $dum, $dum ) = split /\s+/, $data ;
                ( $txt = $dum ) =~ s/\/.*\/// ; # strip system numbering
                # print $txt, "\n" ;
                $tables{$txt} = ' ' ;
                last ;
            } ;
            if ( $ti == $tmax ) {
                # print $data ;
                $tables{$txt} .= $data . "\n" ;
                last ;
            } ;
        }
    }


    sub characterData {
        my( $parseinst, $data ) = @_;
        # $data !~ /^\s*$/ : true when the data is not whitespace only
        if ( $writer->within_element('FE_MODEL') && $writer->within_element('TABLE') && $coords && $data !~ /^\s*$/ ) {
            $writer -> characters ( $tables{$model} ) ;
            $tables{$model} = ' ' ; # empty table
        }
        elsif ( ! $coords ) {
            # print "Coords $coords : $data";
            $writer -> characters( $data ) ;
        }
    }

    sub cdatastart {
        $writer -> raw( "<![CDATA[\n" );
    }
    sub cdataend {
        $writer -> raw( "]]>\n" );
    }
    sub default {

        # do nothing, but stay quiet

    }
     
    Eric van Oorschot, Dec 7, 2005
    #1

  2. Eric van Oorschot

    John Bokma Guest

    Eric van Oorschot wrote:

    > Hi,
    >
    > I'm writing a Perl script that has to copy a block of data (nodes
    > numbers and coordinates) from one XML formatted file into another XML
    > file. I'm using XML::parser to extract the data and XML::Writer to
    > write the data into the second file.
    >
    > This does not work, since some of the numbers are corrupted after
    > being read by XML::parser. Below I have copied a small bit that shows
    > how the data is corrupted. It always happens at the same line(s) of
    > data.
    >
    > 67 2.9005093479606E+000 3.6637104002418E-001 7.9522656092442E-001
    > 68 2.8852994122583E+000 3.5353599488296E-001 7.7516591265738E-001
    > 69 2.9109259023248E+000 3.5272037818926E-001 8.1765470045
    > 602E-001
    > 70 2.9014248453522E+000 3.4032368974452E-001 7.9417266267164E-001
    > 71 2.8849923984542E+000 3.2706829720117E-001 7.7537618002780E-001


    Be aware that character data, as the documentation also states, does
    *not* necessarily arrive in your handler as one big string. The Char
    handler may be called several times for a single run of text. A quick
    glance at your code suggests you are not accounting for this. What you
    should do:

    when a start element is found: reset the global string buffer
    when character data is found: append it to the global buffer
    when an end element is found: process the global string buffer
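
    The three steps above can be sketched roughly as follows. This is a
    minimal illustration, not the poster's code: the handler names on_start,
    on_char and on_end and the @lines array are made up for the example, and
    the events are fired by hand (instead of by XML::Parser) so that the split
    from the question can be reproduced exactly.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $buffer = '';   # text collected for the element currently open
my @lines;         # complete text of each element, one entry per end tag

# Start handler: a new element begins, so discard any earlier text.
sub on_start {
    my ($expat, $element, %attrs) = @_;
    $buffer = '';
}

# Char handler: may fire several times for one run of text, so append.
sub on_char {
    my ($expat, $text) = @_;
    $buffer .= $text;
}

# End handler: only now is the text complete, so process it here.
sub on_end {
    my ($expat, $element) = @_;
    push @lines, $buffer;
    $buffer = '';
}

# With XML::Parser these would be registered roughly as
#   XML::Parser->new(Handlers => { Start => \&on_start,
#                                  Char  => \&on_char,
#                                  End   => \&on_end });
# Here the events are fired by hand, splitting one number across two
# Char calls the way the corrupted line in the question was split.
on_start(undef, 'TABLE');
on_char(undef, '69 2.9109259023248E+000 3.5272037818926E-001 8.1765470045');
on_char(undef, '602E-001');
on_end(undef, 'TABLE');

print "$lines[0]\n";
```

    Because the text is only used in the end handler, it no longer matters
    where the parser happens to cut the character data.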

    HTH,


    --
    John Small Perl scripts: http://johnbokma.com/perl/
    Perl programmer available: http://castleamber.com/
    I ploink googlegroups.com :)
     
    John Bokma, Dec 7, 2005
    #2

  3. Eric van Oorschot

    Guest

    3rd try:
    I agree with John Bokma. You need the start and end handlers as
    well.
    Possibly something like this --

    my $RD_xml = '';
    my $last_content = '';
    my $capturing_Is_part_of_larger_xml = 0;
    my $special_tag = 0;

    sub default_start_handler
    {
        my ($p, $element, %atts) = @_;
        $element = uc($element);
        $last_content = '';

        ## Check for start of singular tag data capture
        ## -----------------------------------------------
        if ($element eq 'SPECIAL_TAG_ELEMENT')
            { $special_tag = 1; }

        ## Check for start of XML chunk data capture
        ## -----------------------------------------------
        if ($element eq 'CAPTURE_ALL_OF_ME') {
            $RD_xml = '';
            $capturing_Is_part_of_larger_xml = 1;
        }
        if ($capturing_Is_part_of_larger_xml)
            { $RD_xml .= $p->original_string; }
    }

    sub default_content_handler
    {
        my ($p, $str) = @_;

        ## Use original for entities, in case of a reparse
        ## --------------------------------------------
        $str = $p->original_string;

        ## Remove leading/trailing space, newline, tab
        ## if you want to do this now....
        ## -----------------------------------------------
        $str =~ s/^[\x20\n\t]+//; $str =~ s/[\x20\n\t]+$//;

        ## Capture what is necessary. Last content is
        ## always captured by default
        ## -----------------------------------------------
        if (length ($str) > 0) {
            $last_content .= $str;
            $RD_xml .= $str if ($capturing_Is_part_of_larger_xml);
        }
    }

    sub default_end_handler
    {
        my ($p, $element) = @_;
        $element = uc($element);

        ## Handle singular capture of special tag data
        ## ---------------------------------------------
        if ($element eq 'SPECIAL_TAG_ELEMENT') {
            ProcessContent ($last_content) if ($special_tag);
            $special_tag = 0;
        }
        $last_content = '';

        ## Handle larger capture XML chunks
        ## -----------------------------------
        if ($element eq 'CAPTURE_ALL_OF_ME') {
            if ($capturing_Is_part_of_larger_xml) {
                $RD_xml .= $p->original_string;
                ProcessXmlChunk ($RD_xml);
            }
            $RD_xml = '';
            $capturing_Is_part_of_larger_xml = 0;
        }
    }
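
    The skeleton above calls ProcessContent but does not define it. One
    hypothetical shape for it, assuming the buffered text holds one
    "node x y z" record per line as in the coordinate tables from the
    question, might look like this. The return value (a hash reference keyed
    by node number) is an assumption made for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical ProcessContent: the post calls it but does not define it.
# Assumes each line of the buffered text is "node x y z".
sub ProcessContent {
    my ($content) = @_;
    my %coords;
    for my $line (split /\n/, $content) {
        my ($node, @xyz) = split ' ', $line;
        next unless defined $node && @xyz == 3;    # skip blank or short lines
        $coords{$node} = [ map { $_ + 0 } @xyz ];  # convert strings to numbers
    }
    return \%coords;
}

# Two complete rows taken from the question's sample data.
my $table = ProcessContent(
    "67 2.9005093479606E+000 3.6637104002418E-001 7.9522656092442E-001\n"
  . "68 2.8852994122583E+000 3.5353599488296E-001 7.7516591265738E-001\n"
);
printf "node 67: x=%.4f y=%.4f z=%.4f\n", @{ $table->{67} };
```

    Doing the splitting here, on the complete buffer, keeps the Char handler
    itself free of any assumptions about where a line or number ends.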
     
    , Dec 8, 2005
    #3
  4. Eric van Oorschot

    Guest

    I have to agree with John Bokma: get a better strategy
    for capturing content data. Separate the processing
    from the event handling as much as possible.
    You can't rely on the parser to return content in the
    same form as it has in the source. And you can't
    really tell where content data begins and ends
    without processing start and end events as well.

    Print out the XML with indents so you can see
    what is being sent to the handlers. A form
    like this may help -

    my $RD_xml = '';
    my $last_content = '';
    my $capturing_Is_part_of_larger_xml = 0;
    my $special_tag = 0;

    sub default_start_handler
    {
        my ($p, $element, %atts) = @_;
        $element = uc($element);
        $last_content = '';

        ## Check for start of singular tag data capture
        ## -----------------------------------------------
        if ($element eq 'SPECIAL_TAG_ELEMENT')
            { $special_tag = 1; }

        ## Check for start of XML chunk data capture
        ## -----------------------------------------------
        if ($element eq 'CAPTURE_ALL_OF_ME') {
            $RD_xml = '';
            $capturing_Is_part_of_larger_xml = 1;
        }
        if ($capturing_Is_part_of_larger_xml)
            { $RD_xml .= $p->original_string; }
    }

    sub default_content_handler
    {
        my ($p, $str) = @_;

        ## Use original for entities, in case of a reparse
        ## --------------------------------------------
        $str = $p->original_string;

        ## Remove leading/trailing space, newline, tab
        ## if you want to do this now....
        ## -----------------------------------------------
        $str =~ s/^[\x20\n\t]+//; $str =~ s/[\x20\n\t]+$//;

        ## Capture what is necessary. Last content is
        ## always captured by default
        ## -----------------------------------------------
        if (length ($str) > 0) {
            $last_content .= $str;
            $RD_xml .= $str if ($capturing_Is_part_of_larger_xml);
        }
    }

    sub default_end_handler
    {
        my ($p, $element) = @_;
        $element = uc($element);

        ## Handle singular capture of special tag data
        ## ---------------------------------------------
        if ($element eq 'SPECIAL_TAG_ELEMENT') {
            ProcessContent ($last_content) if ($special_tag);
            $special_tag = 0;
        }
        $last_content = '';

        ## Handle larger capture XML chunks
        ## -----------------------------------
        if ($element eq 'CAPTURE_ALL_OF_ME') {
            if ($capturing_Is_part_of_larger_xml) {
                $RD_xml .= $p->original_string;
                ProcessXmlChunk ($RD_xml);
            }
            $RD_xml = '';
            $capturing_Is_part_of_larger_xml = 0;
        }
    }
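
    The suggestion to print the events with indents could be tried with a
    small set of tracing handlers such as the sketch below. The names
    trace_start, trace_char and trace_end are invented for this example, and
    the events are again fired by hand so that the snippet runs without an
    input file; with XML::Parser the same subs would be passed as the Start,
    Char and End handlers.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $depth = 0;   # current nesting level
my @trace;       # one entry per handler call

# Record the start tag at the current depth, then go one level deeper.
sub trace_start {
    my ($p, $element) = @_;
    my $pad = '  ' x $depth;
    push @trace, "$pad<$element>";
    $depth++;
}

# Record each Char call separately; brackets make the boundaries visible.
sub trace_char {
    my ($p, $str) = @_;
    my $pad = '  ' x $depth;
    push @trace, "${pad}Char: [$str]";
}

# Step back out one level, then record the end tag.
sub trace_end {
    my ($p, $element) = @_;
    $depth--;
    my $pad = '  ' x $depth;
    push @trace, "$pad</$element>";
}

# Hand-fired events standing in for the parser: one number, two Char calls.
trace_start(undef, 'TABLE');
trace_char(undef, '8.1765470045');
trace_char(undef, '602E-001');
trace_end(undef, 'TABLE');

print "$_\n" for @trace;
```

    Two separate "Char:" lines inside one element are the sign that the text
    has to be buffered until the end tag.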
     
    , Dec 8, 2005
    #4
  6. Eric van Oorschot

    Guest

    Eric van Oorschot wrote:
    > Hi,
    >
    > I'm writing a Perl script that has to copy a block of data (nodes numbers
    > and coordinates) from one XML formatted file into another XML file.
    > I'm using XML::parser to extract the data and XML::Writer to write the
    > data into the second file.
    >
    > This does not work, since some of the numbers are corrupted after being
    > read by XML::parser. Below I have copied a small bit that shows how the
    > data is corrupted. It always happens at the same line(s) of data.
    >
    > 67 2.9005093479606E+000 3.6637104002418E-001 7.9522656092442E-001
    > 68 2.8852994122583E+000 3.5353599488296E-001 7.7516591265738E-001
    > 69 2.9109259023248E+000 3.5272037818926E-001 8.1765470045
    > 602E-001
    > 70 2.9014248453522E+000 3.4032368974452E-001 7.9417266267164E-001
    > 71 2.8849923984542E+000 3.2706829720117E-001 7.7537618002780E-001
    >
    > My Perl script (I am not an experienced Perl programmer) is shown below.
    > The error occurs in the sub 'ReadCharacterData'. In this subroutine the
    > data is read and copied into a hash %tables. When writing this hash in the
    > output file the error shown above is found.
    >
    > If anyone has an idea, or needs more info, please reply.
    >
    > Regards,
    >
    > Eric
    >
    >
    > use XML::parser;
    > use IO::File;
    > use Switch ;
    > use XML::Writer;
    >
    > my $fmsfile = shift ; # fms output file
    > my $reffile = shift ; # Exchange output deck
    > my $outfile = shift ; # Output file
    >
    > die "Cannot find fms output file \"$xmlfile\""
    > unless -f $fmsfile;
    >
    > die "Cannot find xml input deck \"$reffile\""
    > unless -f $reffile;
    >
    > my $output = new IO::File(">$outfile");
    > my $writer = new XML::Writer( OUTPUT => $output, UNSAFE => 1 );
    >
    > #
    > # Find tmax in fms file
    > #
    > my $tmax = 0.00 ;
    > open ( IN, $fmsfile ) ;
    > while ( <IN> ) {
    > if ( /TIME/ ) {
    > ( $dum, $dum, $dum, $ti ) = split /\s+/ ;
    > $tmax = $ti if ( $ti > $tmax ) ;
    > }
    > }
    > close (IN) ;
    >
    > $tag = "";
    >
    > my %tables ; # hash with coordinates from fms file
    > my $model ; # naam van het FE model
    > my $i = 0 ; #
    > # Readfile to create hash of the coordinate tables
    > #
    > my $parser = new XML::parser;
    >
    > $parser->setHandlers( Char => \&ReadCharacterData,
    > Default => \&default);
    > print "Reading fms file ($fmsfile)\n" ; $parser->parsefile($fmsfile);
    >
    > ## Check info read in fms file
    > #foreach $i ( keys %tables ) {
    > # print "Table $i\n",$tables{$i},"\n End table $i\n\n";
    > # }
    >
    > my $coords = 0 ;
    >
    > #
    > # Read reffile and replace coordinate tables with data from fms file
    > #
    > my $bparser = new XML::parser;
    > $bparser->setHandlers( XMLDecl => \&XmlDecl,
    > Doctype => \&DocType,
    > Start => \&startElement,
    > End => \&endElement,
    > Char => \&characterData,
    > CdataStart => \&cdatastart,
    > CdataEnd => \&cdataend,
    > Default => \&default);
    > print "Reading ($reffile) and writing ($outfile) \n" ;
    > $bparser->parsefile($reffile);
    >
    > $writer->end() ;
    >
    > #
    > ########################################################################
    > #
    >
    > sub XmlDecl {
    > my( $parseinst, $version, $encoding, $standalone ) = @_;
    > $writer->xmlDecl( $encoding, $standalone );
    > }
    >
    > sub DocType {
    > my( $parseinst, $name, $sysid, $pub, $internal ) = @_;
    > $writer->doctype( $name, $pub, $sysid );
    > }
    >
    > sub startElement {
    > # Reading xml data
    > my( $parseinst, $element, %attrs ) = @_;
    > SWITCH: {
    > if ($element eq "FE_MODEL") {
    > $model = $attrs{'NAME'} ;
    > $tag = "DEFINE";
    > # print "FE model $model\n" ;
    > last SWITCH;
    > }
    > if ($element eq "TABLE" && $attrs{'TYPE'} =~ /COORDINATE/ ) {
    > $coords = 1 ;
    > # print "$coords - TABLE COORDINATES\n" ;
    > }
    > last SWITCH ;
    > }
    > $writer -> startTag( $element , %attrs );
    > }
    >
    > sub endElement {
    >
    > my( $parseinst, $element ) = @_;
    > $coords = 0 ;
    > $writer -> endTag( $element ) ;
    > }
    >
    > sub ReadCharacterData {
    > my( $parseinst, $data ) = @_;
    > SWITCH: {
    > if ( $data =~ /^\s*$/ ) {
    > last ;
    > };
    > if ( $data =~ /TIME/ ) {
    > ( $dum, $dum, $dum, $ti ) = split /\s+/, $data ;
    > # print "Timepoint ", $ti, "\n" ;
    > last ;
    > } ;
    > if ( $data =~ /FE MODEL/ ) {
    > ($dum, $dum, $dum, $dum, $dum ) = split /\s+/, $data ;
    > ( $txt = $dum ) =~ s/\/.*\/// ; # strip system numbering
    > # print $txt, "\n" ;
    > $tables{$txt} = ' ' ;
    > last ;
    > } ;
    > if ( $ti == $tmax ) {
    > # print $data ;
    > $tables{$txt} .= $data . "\n" ;
    > last ;
    > } ;
    > }
    > }
    >
    >
    > sub characterData {
    > my( $parseinst, $data ) = @_;
    > if ( $writer->within_element('FE_MODEL') && $writer->within_element('TABLE') && $coords && $data != /^\s*$/ ) {
    > $writer -> characters ( $tables{$model} ) ;
    > $tables{$model} = ' ' ; #empty table
    > }
    > elsif ( ! $coords ) {
    > # print "Coords $coords : $data";
    > $writer -> characters( $data ) ;
    > }
    > }
    >
    > sub cdatastart {
    > $writer -> raw( "<![CDATA[\n" );
    > }
    > sub cdataend {
    > $writer -> raw( "]]>\n" );
    > }
    > sub default {
    >
    > # do nothing, but stay quiet
    >
    > }


    2nd try to send.
    I agree with John Bokma, you need the start and end handlers as well.
    Something like this --

    my $RD_xml = '';
    my $last_content = '';
    my $capturing_Is_part_of_larger_xml = 0;
    my $special_tag = 0;

    sub default_start_handler
    {
    my ($p, $element, %atts) = @_;
    $element = uc($element);
    $last_content = '';

    ## Check for start of singular tag data capture
    ## -----------------------------------------------
    if ($element eq 'SPECIAL_TAG_ELEMENT')
    { $special_tag = 1; }

    ## Check for start of XML chunk data capture
    ## -----------------------------------------------
    if ($element eq 'CAPTURE_ALL_OF_ME') {
    $RD_xml = '';
    $capturing_Is_part_of_larger_xml = 1;
    }
    if ($capturing_Is_part_of_larger_xml)
    { $RD_xml .= $p->original_string; }
    }

    sub default_content_handler
    {
    my ($p, $str) = @_;

    ## Use original string to keep entities, in case of a reparse
    ## --------------------------------------------
    $str = $p->original_string;

    ## Remove leading/trailing space, newline, tab
    ## if you want to do this now....
    ## -----------------------------------------------
    $str =~ s/^[\x20\n\t]+//; $str =~ s/[\x20\n\t]+$//;

    ## Capture what is necessary. Last content is
    ## always captured by default
    ## -----------------------------------------------
    if (length ($str) > 0) {
    $last_content .= $str;
    $RD_xml .= $str if ($capturing_Is_part_of_larger_xml);
    }
    }

    sub default_end_handler
    {
    my ($p, $element) = @_;
    $element = uc($element);

    ## Handle singular capture of special tag data
    ## ---------------------------------------------
    if ($element eq 'SPECIAL_TAG_ELEMENT') {
    ProcessContent ($last_content) if ($special_tag);
    $special_tag = 0;
    }
    $last_content = '';

    ## Handle larger capture XML chunks
    ## -----------------------------------
    if ($element eq 'CAPTURE_ALL_OF_ME') {
    if ($capturing_Is_part_of_larger_xml) {
    $RD_xml .= $p->original_string;
    ProcessXmlChunk ($RD_xml);
    }
    $RD_xml = '';
    $capturing_Is_part_of_larger_xml = 0;
    }
    }
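
    Why the start and end handlers matter: the XML::Parser documentation notes that a single run of character data may generate several calls to the Char handler. That would explain a number being cut in two at the same place on every run. Below is a minimal sketch of the usual remedy: append in the Char handler and only use the text once the End handler fires. The element name TABLE is taken from the original script; the sample input and the names $buffer and @tables are made up for illustration.

```perl
use strict;
use warnings;
use XML::Parser;

my $buffer = '';   # text accumulated since the last <TABLE> start tag (illustrative name)
my @tables;        # complete text of each TABLE element (illustrative name)

my $parser = XML::Parser->new(
    Handlers => {
        # Reset the buffer whenever a TABLE element opens
        Start => sub {
            my ($p, $element, %attrs) = @_;
            $buffer = '' if $element eq 'TABLE';
        },
        # May be called more than once for one run of text, so append only
        Char => sub {
            my ($p, $text) = @_;
            $buffer .= $text;
        },
        # The text is complete only when the element closes
        End => sub {
            my ($p, $element) = @_;
            push @tables, $buffer if $element eq 'TABLE';
        },
    },
);

# Made-up sample document; a real script would call parsefile() instead
$parser->parse('<ROOT><TABLE>69 2.9109259023248E+000 8.1765470045602E-001</TABLE></ROOT>');

print "$_\n" for @tables;
```

    With this pattern it no longer matters how the parser chops up the text, because nothing is written out until the whole element has been read.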
     
    , Dec 8, 2005
    #6
  9. Eric van Oorschot

    John Bokma Guest

    "" <> wrote:

    > I agree with John Bokma,


    Something is broke on your side, or mine, but I see this message the 6th
    time or so :-D.

    --
    John Small Perl scripts: http://johnbokma.com/perl/
    Perl programmer available: http://castleamber.com/
    I ploink googlegroups.com :)
     
    John Bokma, Dec 8, 2005
    #9
  10. Eric van Oorschot

    robic0 Guest

    On 8 Dec 2005 05:20:17 GMT, John Bokma <> wrote:

    >"" <> wrote:
    >
    >> I agree with John Bokma,

    >
    >Something is broke on your side, or mine, but I see this message the 6th
    >time or so :-D.

    Yup, thanks to Google (again)..
     
    robic0, Dec 8, 2005
    #10
  11. Eric van Oorschot

    Dr.Ruud Guest

    [OT] Re: Copy characterdata from XML file to XML file

    John Bokma:

    > Something is broke on your side, or mine, but I see this message the
    > 6th time or so :-D.


    I assume your side, because I didn't even see a single one.

    --
    Affijn, Ruud

    "Gewoon is een tijger."
     
    Dr.Ruud, Dec 8, 2005
    #11
  12. Eric van Oorschot

    John Bokma Guest

    Re: [OT] Re: Copy characterdata from XML file to XML file

    "Dr.Ruud" <> wrote:

    > John Bokma:
    >
    >> Something is broke on your side, or mine, but I see this message the
    >> 6th time or so :-D.

    >
    > I assume your side, because I didn't even see a single one.


    Lucky you :-D

    --
    John Small Perl scripts: http://johnbokma.com/perl/
    Perl programmer available: http://castleamber.com/
    I ploink googlegroups.com :)
     
    John Bokma, Dec 8, 2005
    #12
  13. Eric van Oorschot

    robic0 Guest

    [OT] Re: Copy characterdata from XML file to XML file
    I wouldn't want this to get lost now, make sure the
    victim isn't using Forte products ->

    John Bokma:

    > Something is broke on your side, or mine, but I see this message the
    > 6th time or so :-D.


    I assume your side, because I didn't even see a single one.

    --
    Affijn, Ruud

    "Gewoon is een tijger."


    "Dr.Ruud" <> wrote:

    > John Bokma:
    >
    >> Something is broke on your side, or mine, but I see this message the
    >> 6th time or so :-D.

    >
    > I assume your side, because I didn't even see a single one.


    Lucky you :-D

    --
    John Small Perl scripts: http://johnbokma.com/perl/
    Perl programmer available: http://castleamber.com/
    I ploink googlegroups.com :)
     
    robic0, Dec 10, 2005
    #13