LWP::Simple and utf8 problem

Thomas Götz

Hi,

I want to retrieve a web page that contains Unicode characters using the
LWP::Simple module. How can I tell LWP::Simple which encoding it should
use? I haven't found anything about encodings in the docs.

I use the following:

---
#!/usr/bin/perl -w

use strict;
use warnings;
use LWP::Simple;

my $file = "tmpfile";
my $url;

$url = "http://www.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?";
$url .= "db=Pubmed&retmax=500&id=15017969&retmode=xml";

getstore($url, $file);
exit;
---

It seems that the Unicode characters are not stored correctly in the file.
As I'm not very familiar with UTF-8, I'd like to ask for a hint on how to
correctly store UTF-8-encoded web pages in a local file.

Tom
 
John

If you are using Perl 5.8, use the Encode module. I don't know exactly
what you are getting from LWP, but you probably need to call
Encode::_utf8_on($string) to tell Perl to treat the sequence of bytes as
UTF-8-encoded characters.
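
A minimal sketch of the Perl 5.8 route, under a few assumptions: the sample bytes below are made up for illustration (they are not taken from the PubMed response), and the sketch uses Encode's decode() rather than _utf8_on. The Encode documentation marks _utf8_on as internal and notes that it does not check the string for well-formed UTF-8, whereas decode() converts the bytes into a character string.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Sample UTF-8 bytes: "Götz", with the o-umlaut encoded as 0xC3 0xB6.
# (Illustrative data standing in for whatever LWP returns.)
my $bytes = "G\xC3\xB6tz";

# Convert the byte sequence into a string of characters.
my $chars = decode('UTF-8', $bytes);

printf "bytes: %d, characters: %d\n", length($bytes), length($chars);

# Going the other way yields the original bytes again.
my $round_trip = encode('UTF-8', $chars);
```

The byte string has length 5 and the decoded string has length 4, because the two bytes 0xC3 0xB6 become the single character U+00F6.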

If you are using Perl 5.6, you need to use pack and unpack. Here is a
subroutine that will do the same thing as "_utf8_on" in Perl 5.6:

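use Carp qw(confess);

sub unicode_semantics_56 {
    my ($string) = @_;

    # Turn any warning (such as malformed UTF-8) into a fatal error
    # with a stack trace.
    local $SIG{'__WARN__'} = sub {
        confess "Bad unicode string ($string): " . shift;
    };

    # Unpack the string into a list of character numbers...
    my @charnumbers = unpack("U*", $string);

    # ...and pack them back into a string with Unicode semantics.
    my $res = pack("U*", @charnumbers);

    return $res;
}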
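
Coming back to the original question of storing the page in a local file, here is a rough sketch for Perl 5.8 or later. It assumes the page body has already been fetched as bytes (for example with LWP::Simple's get) and decoded into characters. The helper name save_as_utf8 and the sample data are hypothetical, and a temporary file stands in for "tmpfile".

```perl
use strict;
use warnings;
use Encode qw(decode);
use File::Temp qw(tempfile);

# Hypothetical helper: write a character string to $path as UTF-8.
sub save_as_utf8 {
    my ($chars, $path) = @_;
    open(my $out, '>:encoding(UTF-8)', $path)
        or die "Cannot open $path: $!";
    print {$out} $chars;
    close($out) or die "Cannot close $path: $!";
    return;
}

# Sample bytes standing in for a fetched page body (made-up data).
my $bytes = "<Author>G\xC3\xB6tz</Author>";
my $chars = decode('UTF-8', $bytes);

# Use a temporary file so the sketch does not overwrite anything.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close($tmp_fh);
save_as_utf8($chars, $path);

# Read the file back as raw bytes to compare with what was received.
open(my $in, '<:raw', $path) or die "Cannot open $path: $!";
my $on_disk = do { local $/; <$in> };
close($in);

print $on_disk eq $bytes
    ? "file matches the original bytes\n"
    : "file differs\n";
```

The :encoding(UTF-8) layer on the output handle does the encoding, so the program can work with characters and the file still ends up holding UTF-8 bytes.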
 
