Windows ActiveState Perl: MSXML transformNodeToObject finally succeeded

tuser

I have finally found a solution for my long-standing problem
with Xslt-transformation under Windows ActiveState Perl and
I thought that other people might have the same problem so I
would like to share my solution with the group. I hope you
don't mind this long post, here is the story:

I had read an article by Shawn Ribordy on
http://www.perl.com/pub/a/2001/04/17/msxml.html
('MSXML, It's Not Just for VB Programmers Anymore')
in which he described how to do Xslt-transform on XML-files
using the "transformNodeToObject" method of a Win32::OLE
object.

The following lines are copied straight from his article:

"Great...", I thought, "...let's try this at home".

So I sat down at my Windows XP computer (with ActiveState Perl
v5.8.7 and the latest Msxml2.DOMDocument.4.0/SP2 installed),
fired up notepad.exe and pasted Shawn's example straight
into my perl program, and his example worked -- but that
was as far as it got!

When I started to use my own xslt-stylesheet, things went
seriously wrong. Well, I knew that my own xslt-stylesheets
had some problems, but I hoped (and expected) that the
transformNodeToObject() method would throw something useful
at me (which unfortunately it did not!) The problem was
that Shawn's example did not have any error handling
whatsoever.

I googled every possible combination of (perl, xslt, msxml,
win32, errorhandling) under the sun and I searched CPAN to
destruction, but to no avail.

After months of pulling my hair out, I finally
stumbled upon the following variables/functions which
allowed me to correctly and reliably test for (almost)
every possible error condition.
- Win32::OLE::LastError()
- $doc->{parseError}->{reason}
- $doc->{parseError}->{line}
- $doc->{parseError}->{linePos}
- $doc->{parseError}->{srcText}
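
A minimal sketch of how the parseError properties can be folded into one
message. The helper name describe_parse_error is hypothetical (it is not part
of Win32::OLE or MSXML), and the plain hash at the bottom only stands in for a
real Msxml2.DOMDocument object so that the snippet runs without Windows; its
values are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Win32::OLE or MSXML): build a one-line
# message from the parseError properties listed above. $doc can be a
# Win32::OLE object or, as here, any hash reference with the same keys.
sub describe_parse_error {
    my ($doc, $file) = @_;
    my $err    = $doc->{parseError};
    my $reason = defined $err->{reason} ? $err->{reason} : '';
    $reason =~ s/\s+/ /g;     # collapse whitespace, including a trailing CR/LF
    $reason =~ s/^ | $//g;    # trim what is left at either end
    return sprintf q{"%s" did not load at line %s, pos %s, reason: %s, text: '%s'},
        $file, $err->{line}, $err->{linePos}, $reason, $err->{srcText};
}

# Stand-in for a document whose Load() failed; the values are invented.
my $fake_doc = { parseError => {
    reason  => "End tag 'data1' does not match the start tag 'data'.\r\n",
    line    => 3,
    linePos => 15,
    srcText => '<data>aaaa</data1>',
} };
print describe_parse_error($fake_doc, 'test2.xml'), "\n";
```

With a real document the call would follow a failed Load(), for example
`$MxErr = describe_parse_error($doc, $file) unless $doc->Load($file);`.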

With the improved error-handling, I was now able to
experiment with different situations in my xslt-stylesheets.
Here is what I experienced:

XML-input-files: Use <?xml version='1.0' encoding='...'?>
=========================================================
In your XML-Input-Files, always specify the encoding in the
first line <?xml version='1.0' encoding='...'?>. This is
'ISO-8859-1' for Latin-1 text (which includes plain old
ASCII), but could also be 'UTF-8' or 'UTF-16' if your
XML-Input-File is set up this way.
If the declared encoding does not match the actual encoding
of the file, you will end up with an error ("An invalid
character was found in text content").
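
A small sketch for checking whether a file's first line carries an encoding
declaration at all. The helper name declared_encoding is hypothetical, and the
check only reads the declaration; it does not verify that the bytes in the
file really are in that encoding.

```perl
use strict;
use warnings;

# Hypothetical helper: return the encoding named in an XML declaration,
# or undef when the declaration (or its encoding attribute) is missing.
sub declared_encoding {
    my ($first_line) = @_;
    return $1
        if $first_line =~ /^<\?xml\b[^>]*?\bencoding\s*=\s*["']([^"']+)["']/;
    return;
}

for my $line (q{<?xml version='1.0' encoding='ISO-8859-1'?>},
              q{<?xml version="1.0"?>}) {
    my $enc = declared_encoding($line);
    print defined $enc ? "declared: $enc\n" : "no encoding declared\n";
}
```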

XSLT-files: Use <?xml version='1.0' encoding='...'?>
====================================================
In your XSLT-Files, always specify the encoding in the first
line <?xml version='1.0' encoding='...'?>.
Strictly speaking it is not necessary to specify the encoding
in the first line of the XSLT-file, a simple
<?xml version='1.0'?> is enough. But by doing so, you let
Microsoft guess the encoding, which it does correctly in 95%
of the cases. However, in the remaining 5% of the cases,
Microsoft gets it wrong and you end up with an error
("Switch from current encoding to specified encoding not
supported"). Consequently, I suggest always specifying the
actual encoding directly in the first line of the XSLT-file.

XSLT-files: Use <xsl:output encoding='ISO-8859-1'/>
===================================================
It is more convenient to use
<xsl:output encoding='ISO-8859-1'/> in your XSLT-file. This
works very well, even with accented characters and umlauts.
You can use other encodings (such as
<xsl:output encoding='UTF-8'/>), and the XML-Output-File
will be displayed correctly in Internet Explorer, but then
you will find it inconvenient that Notepad does not display
the XML-Output-file correctly any more.

XSLT-files: Use <xsl:output method='xml'/>
==========================================
If you want to generate Html, you can do so easily by
generating an XML file with its tags in Html-syntax
(such as <p>, <table>, <hr/>, etc...). However, do not
attempt to use <xsl:output method='html'/> in your XSLT-file,
use <xsl:output method='xml'/> instead (even if you want to
generate 'Html', think of 'XHtml' and use
<xsl:output method='xml'/>). You may end up in tears when you
discover that by using <xsl:output method='html'/>, your
encoding does not work the way you want it to. And you might
even discover that '&#160;' and/or '&nbsp;' will cause an error
after having erased your output-file! - Why is that so? - I
don't know.
The ultimate rule is: Never use 'html' as your method in
<xsl:output method='...'/>, you must use
<xsl:output method='xml'/> at all times.

XSLT-files: Use <xsl:output indent='yes'/>
==========================================
This advice is more for convenience than anything else. If
you specify <xsl:output indent='yes'/> and you look at your
XML-Output-file with Notepad, you will find that its
linebreaks are more conveniently located than they would
have been without <xsl:output indent='yes'/>. It is still
not perfect, but it is better. So finally, the
<xsl:output... /> line in your XSLT-file should look like
this:
<xsl:output method='xml' indent='yes' encoding='ISO-8859-1'/>
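
Putting the XSLT advice so far together, here is a rough sketch of a function
that returns a stylesheet skeleton with the encoding stated both in the XML
declaration and in the xsl:output element. The function name
stylesheet_skeleton and the ISO-8859-1 default are assumptions made for
illustration only.

```perl
use strict;
use warnings;

# Hypothetical helper: return a stylesheet skeleton that follows the advice
# above. The same encoding is used in the XML declaration and in xsl:output.
sub stylesheet_skeleton {
    my ($encoding) = @_;
    $encoding = 'ISO-8859-1' unless defined $encoding;
    return <<"END_XSL";
<?xml version="1.0" encoding="$encoding"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes" encoding="$encoding"/>
  <xsl:template match="/">
    <!-- templates go here -->
  </xsl:template>
</xsl:stylesheet>
END_XSL
}

print stylesheet_skeleton();
```

If you write the skeleton to a file from Perl, the bytes on disk still have to
be in the encoding that the first line declares.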

In XSLT-files: Use '&#160;' instead of '&nbsp;'
===============================================
The entity '&nbsp;' does not work with MSXML. If you
want your XSLT-file to generate a non-breaking space, use
'&#160;' instead.
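
The likely reason is that XML itself predefines only five entities (&lt;,
&gt;, &amp;, &apos; and &quot;), while &nbsp; comes from the HTML DTD. The
numeric character reference &#160; names the same character and needs no DTD.
As a sketch, an existing stylesheet could be cleaned up before loading like
this; the helper name fix_nbsp is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical pre-processing step: rewrite the HTML-only entity &nbsp; as the
# numeric character reference &#160;, which needs no DTD.
sub fix_nbsp {
    my ($xsl_text) = @_;
    $xsl_text =~ s/&nbsp;/&#160;/g;
    return $xsl_text;
}

print fix_nbsp('<p>nonbreaking&nbsp;space</p>'), "\n";
```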


....that's the end of my list.

For those of you who want to try, here is a test program:

use strict;
use warnings;
use Win32::OLE;

my $MxErr;

testcase(1, 'transformation succeeds');
testcase(2, 'unbalanced tags in *.xml');
testcase(3, 'unbalanced tags in *.xsl');
testcase(4, 'syntax error in *.xsl');
testcase(5, 'output method=html fails');

sub testcase {
    my ($Case, $Description) = @_;

    makefiles($Case);

    system('cls');
    print "Testcase no $Case: $Description\n";

    print "\n\nThis is the xml file 'test$Case.xml':\n";
    print "=============================================\n";
    system("type test$Case.xml");
    print "=============================================\n";
    system('pause');

    print "\n\nThis is the xsl file 'trf$Case.xsl':\n";
    print "=============================================\n";
    system("type trf$Case.xsl");
    print "=============================================\n";
    system('pause');

    my $success = TransformXslt(xml  => "test$Case.xml",
                                xslt => "trf$Case.xsl",
                                out  => "output$Case.html");

    if ($success) {
        print "\n\nTransformXslt succeeded, result:\n";
        print "=========================================\n";
        system("type output$Case.html");
        print "=========================================\n";
    }
    else {
        print "\n\nProblem with TransformXslt:\n";
        print "=========================================\n";
        print "$MxErr\n";
        print "=========================================\n";
    }
    system('pause');
    print "\n";
}

sub makefiles {
    my ($Case) = @_;

    # Each test case breaks exactly one thing; case 1 breaks nothing
    my $XData   = ($Case == 2 ? 'data1'  : 'data');
    my $XTitle  = ($Case == 3 ? 'title1' : 'title');
    my $XFunc   = ($Case == 4 ? 'r([?'   : '.');
    my $XMethod = ($Case == 5 ? 'html'   : 'xml');

    open OFL, '>', "test$Case.xml"
        or die "err write test$Case.xml: $!";
    print OFL qq{<?xml version="1.0"}.
              qq{ encoding="ISO-8859-1"?>\n};
    print OFL qq{<index>\n};
    print OFL qq{ <data>aaaa</$XData>\n};
    print OFL qq{ <data>bbbb</data>\n};
    print OFL qq{</index>\n};
    close OFL;

    open OFL, '>', "trf$Case.xsl"
        or die "err write trf$Case.xsl: $!";
    print OFL qq{<?xml version="1.0"}.
              qq{ encoding="ISO-8859-1"?>\n};
    print OFL qq{<xsl:stylesheet version="1.0"\n};
    print OFL qq{xmlns:xsl="http://www.w3.org/1999}.
              qq{/XSL/Transform">\n};
    print OFL qq{ <xsl:output method="$XMethod" indent=}.
              qq{"yes" encoding="ISO-8859-1"/>\n};
    print OFL qq{ <xsl:template match="/">\n};
    print OFL qq{ <html>\n};
    print OFL qq{ <body>\n};
    print OFL qq{ <title>Test</$XTitle>\n};
    print OFL qq{ <p>nonbreaking&#160;space</p>\n};
    print OFL qq{ <hr/>\n};
    print OFL qq{ <xsl:for-each select="index/data">\n};
    print OFL qq{ <p>Test: *** <xsl:value-of}.
              qq{ select="$XFunc"/> ***</p>\n};
    print OFL qq{ </xsl:for-each>\n};
    print OFL qq{ </body>\n};
    print OFL qq{ </html>\n};
    print OFL qq{ </xsl:template>\n};
    print OFL qq{</xsl:stylesheet>\n};
    close OFL;
}

sub TransformXslt {
    # Called as TransformXslt(xml => ..., xslt => ..., out => ...);
    # the values sit at the odd positions of @_
    my ($xml_input_file, $xslt_file, $xml_output_file)
        = ($_[1], $_[3], $_[5]);
    $MxErr = '';
    my $DomDocument = 'Msxml2.DOMDocument.4.0';

    # Load the document (Xml-Input-File)
    my $xml_input_doc = Win32::OLE->new($DomDocument);
    unless ($xml_input_doc) {
        $MxErr = qq{Mx-0040: Couldn't create Win32::OLE}.
                 qq{ $DomDocument for XML-Input-File}.
                 qq{ "$xml_input_file"};
        return undef;
    }

    $xml_input_doc->{async} = 'False';
    $xml_input_doc->{validateOnParse} = 'True';
    if (!$xml_input_doc->Load($xml_input_file)) {
        my $Rs = $xml_input_doc->{parseError}->{reason};
        $Rs =~ s/\r//; chomp $Rs;
        my $Ln = $xml_input_doc->{parseError}->{line};
        my $Ps = $xml_input_doc->{parseError}->{linePos};
        my $Tx = $xml_input_doc->{parseError}->{srcText};
        $MxErr = qq{Mx-0060: XML-Input-File}.
                 qq{ "$xml_input_file"}.
                 qq{ did not load for $DomDocument at line}.
                 qq{ $Ln, pos $Ps, reason: $Rs, text: '$Tx'};
        return undef;
    }

    # create Output-object
    my $xml_output_doc = Win32::OLE->new($DomDocument);
    unless ($xml_output_doc) {
        $MxErr = qq{Mx-0055: Couldn't create Win32::OLE}.
                 qq{ $DomDocument for XML-Output-File}.
                 qq{ "$xml_output_file"};
        return undef;
    }

    # Load the Stylesheet (Xsl-File)
    my $xslt_doc = Win32::OLE->new($DomDocument);
    unless ($xslt_doc) {
        $MxErr = qq{Mx-0050: Couldn't create Win32::OLE}.
                 qq{ $DomDocument for XSLT-File "$xslt_file"};
        return undef;
    }

    $xslt_doc->{async} = 'False';
    $xslt_doc->{validateOnParse} = 'True';
    if (!$xslt_doc->Load($xslt_file)) {
        my $Rs = $xslt_doc->{parseError}->{reason};
        $Rs =~ s/\r//; chomp $Rs;
        my $Ln = $xslt_doc->{parseError}->{line};
        my $Ps = $xslt_doc->{parseError}->{linePos};
        my $Tx = $xslt_doc->{parseError}->{srcText};
        $MxErr = qq{Mx-0070: XSLT-file "$xslt_file" did not}.
                 qq{ load for $DomDocument at line}.
                 qq{ $Ln, pos $Ps, reason: $Rs, text: '$Tx'};
        return undef;
    }

    # Do the work: transform xml using an xslt stylesheet
    $xml_input_doc->transformNodeToObject($xslt_doc,
                                          $xml_output_doc);
    if (Win32::OLE::LastError()) {
        my $Rs = Win32::OLE::LastError(); $Rs =~ s/\s+/ /g;
        $MxErr = qq{Mx-0080: XSLT-file "$xslt_file" has}.
                 qq{ syntax-errors for $DomDocument, }.
                 qq{reason: $Rs};
        return undef;
    }

    # Save the done work to the output-file
    $xml_output_doc->save($xml_output_file);
    if (Win32::OLE::LastError()) {
        my $Rs = Win32::OLE::LastError(); $Rs =~ s/\s+/ /g;
        $MxErr = qq{Mx-0090: Can't save to output-file}.
                 qq{ "$xml_output_file" for $DomDocument, }.
                 qq{reason: $Rs};
        return undef;
    }

    # "-z" tests for empty file, which is considered to be
    # a fatal error
    if (-z $xml_output_file) {
        $MxErr = qq{Mx-0100: A fatal error occurred in either}.
                 qq{ your XSLT-file "$xslt_file", or in}.
                 qq{ your XML-input-file "$xml_input_file",}.
                 qq{ the output-file "$xml_output_file" will}.
                 qq{ be empty.};
        return undef;
    }

    return 1;
}
 
Samwyse

tuser said:
I have finally found a solution for my long-standing problem
with Xslt-transformation under Windows ActiveState Perl and
I thought that other people might have the same problem so I
would like to share my solution with the group.

Thank you very much for posting this. While I doubt that I will ever
have any need for your specific solution, I hope that you will serve as
an example to others. Many times when researching a problem, I will
find USENET posts from people with the same problem as me, but never any
hint of how it was eventually solved. I sincerely wish that others will
remember this post and share whatever solutions they find for their
problems.
 
robic0

tuser said:
I have finally found a solution for my long-standing problem
with Xslt-transformation under Windows ActiveState Perl and
I thought that other people might have the same problem so I
would like to share my solution with the group. I hope you
don't mind this long post, here is the story:

I had read an article by Shawn Ribordy on
http://www.perl.com/pub/a/2001/04/17/msxml.html
('MSXML, It's Not Just for VB Programmers Anymore')
in which he described how to do Xslt-transform on XML-files
using the "transformNodeToObject" method of a Win32::OLE
object.
Good job! You have used a bunch of modules.
Style sheet transforms? I'm willing to bet you don't know
a rats ass about markup at all !! You've quoted code and
folks that do though...
I wouldn't hire you to clean the toilets!
 
Tad McClellan

tuser said:
sub TransformXslt {
my ($xml_input_file, $xslt_file, $xml_output_file)
= ($_[1], $_[3], $_[5]);


An "array slice" would make that much prettier:

my ($xml_input_file, $xslt_file, $xml_output_file) = @_[1,3,5];
 
tuser

Tad said:
sub TransformXslt {
my ($xml_input_file, $xslt_file, $xml_output_file)
= ($_[1], $_[3], $_[5]);


An "array slice" would make that much prettier:

my ($xml_input_file, $xslt_file, $xml_output_file) = @_[1,3,5];

Thanks for your input, I haven't thought of using array slices in perl
before.
I will use that in my program.
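
Since the function is called with name => value pairs, another option (not
proposed in the thread, so treat it as a sketch) is to load @_ into a hash and
take a hash slice. The values are then picked by name, so the order of the
arguments in the call no longer matters. The function name transform_args is
hypothetical and only demonstrates the unpacking step.

```perl
use strict;
use warnings;

# Sketch only: unpack the name => value pairs by name with a hash slice,
# instead of relying on their positions in @_.
sub transform_args {
    my %arg = @_;
    my ($xml_input_file, $xslt_file, $xml_output_file)
        = @arg{qw(xml xslt out)};
    return ($xml_input_file, $xslt_file, $xml_output_file);
}

# The pairs are deliberately passed in a different order here.
my @files = transform_args(out  => 'output1.html',
                           xml  => 'test1.xml',
                           xslt => 'trf1.xsl');
print "@files\n";
```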
 
