Encoding problem with automation of Word by Perl

Dave

In the code snippet below, @foutput is an array of 'paragraphs', each of
which is an array of 'text items'. Each text item is a two-element array:
the first element is a text string and the second is another text string
holding formatting information (currently 'b' for bold, 's' for superscript
and '' for normal).

The code works in that it opens a Word document and produces formatted text
therein. The problem is that non-ASCII Unicode characters do not transfer
cleanly. I expect that Perl and Word are making different assumptions about
which encoding is in use (Word seems to be receiving UTF-8 but interpreting
it as 'code page 1252'), but I don't know how to change it.
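
To illustrate the suspected mismatch, here is a minimal sketch in plain Perl using the core Encode module. It does not involve Word at all; the sample string and the variable names are made up for the illustration. It encodes a string as UTF-8 and then decodes the same bytes once as code page 1252 and once as UTF-8.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Illustrative sample: "cafe" with an e-acute (U+00E9) as the last character.
my $text = "caf\x{e9}";

# What the sender emits: in UTF-8, U+00E9 becomes the two bytes C3 A9.
my $bytes = encode('UTF-8', $text);

# What a receiver assuming code page 1252 sees: one character per byte.
my $misread = decode('cp1252', $bytes);

# What a receiver assuming UTF-8 sees: the original four characters.
my $correct = decode('UTF-8', $bytes);

binmode STDOUT, ':encoding(UTF-8)';
print "bytes sent:     ", join(' ', unpack('(H2)*', $bytes)), "\n";
print "read as cp1252: $misread\n";
print "read as UTF-8:  $correct\n";
```

Read as code page 1252, the single e-acute turns into the two characters "Ã©", which is the typical look of UTF-8 text interpreted as a single-byte Windows code page.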

Here is the code snippet (use strict and use warnings are in effect at the
top of the file):

elsif ($opt{w}) {
    use Win32::OLE;
    # Report failures via LastError; $! is not set by OLE calls.
    my $word = Win32::OLE->new('Word.Application')
        or die Win32::OLE->LastError();
    $word->{'Visible'} = 1;
    my $document = $word->Documents->Add;
    my $selection = $word->Selection;
    my $i = 0;
    foreach my $para (@foutput) {
        $i++; last if $i == 5; # just a few for debugging
        foreach (@{$para}) {
            # Each item is [ text, format ]; $_->[1] avoids a one-element slice.
            if ($_->[1] eq "") {
                $selection->TypeText($_->[0]);
            }
            elsif ($_->[1] eq "b") {
                $selection->Font->{Bold} = 1;
                $selection->TypeText($_->[0]);
                $selection->Font->{Bold} = 0;
            }
            elsif ($_->[1] eq "s") {
                $selection->Font->{Superscript} = 1;
                $selection->TypeText($_->[0]);
                $selection->Font->{Superscript} = 0;
            }
            else {
                die "Unknown formatting: " . $_->[1];
            }
        }
        $selection->TypeParagraph;
    }
}
 
Dave


Sorry, I posted too soon: I found the answer shortly afterwards in the
ActiveState Perl documentation. Adding:

    Win32::OLE->Option(CP => Win32::OLE::CP_UTF8());

below the line

    use Win32::OLE;

solves the problem.

Dave
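
For anyone landing here later, a minimal sketch of where the option sits in a complete script, assuming Windows with Word installed. The sample @foutput data, the %font_property table and the type_item helper are hypothetical and not part of the original code. Encoding each string to UTF-8 explicitly is also an assumption, made so that the bytes handed to OLE match the CP_UTF8 setting regardless of how Perl stores the string internally.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical sample data in the shape described in the question:
# paragraphs -> text items -> [ text, format ], format being '', 'b' or 's'.
my @foutput = (
    [ [ "na\x{ef}ve caf\x{e9} ", '' ], [ 'bold', 'b' ], [ '2', 's' ] ],
);

# Map each format code to the Word Font property it toggles ('' toggles none).
my %font_property = ( '' => undef, b => 'Bold', s => 'Superscript' );

# Type one text item, switching its Font property on before and off after.
sub type_item {
    my ($selection, $item) = @_;
    my ($text, $format) = @$item;
    die "Unknown formatting: $format" unless exists $font_property{$format};
    my $property = $font_property{$format};

    $selection->Font->{$property} = 1 if defined $property;
    # Assumption: hand over explicit UTF-8 bytes to match the CP_UTF8 option.
    $selection->TypeText(encode('UTF-8', $text));
    $selection->Font->{$property} = 0 if defined $property;
}

# Win32::OLE exists only on Windows, so it is loaded at run time here.
if ($^O eq 'MSWin32' && eval { require Win32::OLE; 1 }) {
    # The fix from this thread: treat strings passed through OLE as UTF-8.
    Win32::OLE->Option(CP => Win32::OLE::CP_UTF8());

    my $word = Win32::OLE->new('Word.Application')
        or die 'Cannot start Word: ' . Win32::OLE->LastError();
    $word->{Visible} = 1;
    $word->Documents->Add;
    my $selection = $word->Selection;

    for my $para (@foutput) {
        type_item($selection, $_) for @$para;
        $selection->TypeParagraph;
    }
}
else {
    print "Win32::OLE not available; skipping the Word automation.\n";
}
```

The lookup table replaces the if/elsif chain from the question, so supporting another format code only needs one more table entry.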
 
