LWP::Simple and utf8 problem

Discussion in 'Perl Misc' started by Thomas Götz, Apr 19, 2004.

  1. Thomas Götz

    Thomas Götz Guest

    Hi,

    I want to retrieve a web page that contains Unicode characters using the
    LWP::Simple module. How can I tell LWP::Simple which encoding to use? I
    haven't found anything about encodings in the docs.

    I use the following:

    ---
    #!/usr/bin/perl -w

    use strict;
    use warnings;
    use LWP::Simple;

    my $file = "tmpfile";
    my $url;

    $url = "http://www.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?";
    $url .= "db=Pubmed&retmax=500&id=15017969&retmode=xml";

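    # getstore() returns the HTTP status code; check it before trusting the file.
    my $status = getstore($url, $file);
    die "Fetch failed with status $status\n" unless is_success($status);
    exit;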
    -----

    It seems the Unicode characters are not stored correctly in the file. As
    I'm not very familiar with UTF-8, I'd appreciate a hint on how to correctly
    store UTF-8-encoded web pages in a local file.

    Tom
     
    Thomas Götz, Apr 19, 2004
    #1

  2. John

    John Guest

    If you are using Perl 5.8, use the Encode module. I don't know exactly
    what you are getting back from LWP, but you probably need to call
    Encode::_utf8_on($string) to tell Perl to treat the sequence of bytes as
    UTF-8-encoded characters.
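
    As a rough sketch of the Perl 5.8 route: the Encode documentation marks
    _utf8_on as internal, and decode() is the documented way to turn UTF-8
    bytes into characters, so the example below uses decode() instead. The
    byte string stands in for whatever LWP returns, and the temporary file is
    purely illustrative; both are assumptions, not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);
use File::Temp qw(tempfile);

# Assumed stand-in for the raw body LWP would return: "Götz" as UTF-8 bytes.
my $bytes = "G\xC3\xB6tz";

# decode() validates the input and returns a character string.
# FB_CROAK makes it die on malformed UTF-8; LEAVE_SRC keeps $bytes intact.
my $chars = decode('UTF-8', $bytes,
                   Encode::FB_CROAK() | Encode::LEAVE_SRC());

# 5 bytes, but only 4 characters once decoded.
printf "bytes: %d, characters: %d\n", length($bytes), length($chars);

# Write the characters back out through an encoding layer so the file
# contains UTF-8 again.
my ($fh, $filename) = tempfile(UNLINK => 1);
binmode($fh, ':encoding(UTF-8)') or die "binmode failed: $!";
print {$fh} $chars;
close($fh) or die "close failed: $!";

printf "file size: %d bytes\n", -s $filename;
```

    If decode() dies here, the bytes were not valid UTF-8 in the first place,
    which helps separate a bad download from a display problem.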

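    If you are using Perl 5.6, you need to use pack and unpack instead. Here is
    a subroutine that does the same thing as _utf8_on in Perl 5.6:

    use Carp qw(confess);

    sub unicode_semantics_56 {
        my ($string) = @_;

        # Turn any warning raised during unpack/pack (for example, malformed
        # UTF-8) into a fatal error with a stack trace.
        local $SIG{'__WARN__'} = sub {
            confess "Bad unicode string ($string): " . shift;
        };

        # Read the string as a list of Unicode code points...
        my @charnumbers = unpack("U*", $string);

        # ...and rebuild it as a string with character semantics.
        my $res = pack("U*", @charnumbers);

        return $res;
    }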
     
    John, Apr 19, 2004
    #2