lwp and utf8 characters

Discussion in 'Perl Misc' started by devs@usa.net, Sep 2, 2006.

  1. Guest

    hello,
    i am trying to write a bot to download wikipedia articles using
    WWW::Wikipedia, a subclass of LWP::UserAgent. pages returned by the
    wikipedia server contain utf8 characters such as LATIN CAPITAL LETTER
    O WITH DIAERESIS. however, i see that the lwp module is not handling
    the search results as utf8 encoded. i see that the character Ö is
    treated as three individual bytes and not a single character. how do
    i specify that the lwp useragent must handle utf8 chars?

    thanks in advance,
    dave
    devs@usa.net, Sep 2, 2006
    #1
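
The byte-versus-character distinction raised in the question can be reproduced without any network access. The following is only a minimal sketch using the core Encode module; the variable names are illustrative and not taken from the poster's bot. For reference, Ö (U+00D6) is two octets in UTF-8 (0xC3 0x96), so this sketch shows two bytes rather than the three reported above.

```perl
use strict;
use warnings;
use Encode qw(decode);

# U+00D6 LATIN CAPITAL LETTER O WITH DIAERESIS as raw UTF-8 octets.
my $octets = "\xC3\x96";

# Undecoded, Perl counts one "character" per byte.
my $byte_length = length $octets;

# After decoding, the same data is a single character.
my $text        = decode('UTF-8', $octets);
my $char_length = length $text;

printf "bytes: %d, characters: %d, code point: U+%04X\n",
    $byte_length, $char_length, ord $text;
# prints "bytes: 2, characters: 1, code point: U+00D6"
```

Until a string is explicitly decoded, functions such as length and regex matching operate on the raw bytes, which is consistent with the symptom described in the post.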

  2. Matt Garrish Guest

    wrote:

    > hello,
    > i am trying to write a bot to download wikipedia articles using
    > WWW::Wikipedia, a subclass of LWP::UserAgent. pages returned by the
    > wikipedia server contain utf8 characters such as LATIN CAPITAL LETTER
    > O WITH DIAERESIS. however, i see that the lwp module is not handling
    > the search results as utf8 encoded. i see that the character Ö is
    > treated as three individual bytes and not a single character. how do
    > i specify that the lwp useragent must handle utf8 chars?
    >


    You need to disable header parsing. You have read and are adhering to
    the free documentation license that applies to all content on that site
    in creating this bot, right?

    Matt
    Matt Garrish, Sep 2, 2006
    #2
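
"Disable header parsing" presumably refers to the parse_head option of LWP::UserAgent, which controls whether HTML::HeadParser is run over the <head> section of HTML responses. The sketch below is an illustration under that assumption, not the poster's actual code: it turns the option off and then uses decoded_content to get characters instead of raw bytes. The response is built by hand (a hypothetical stand-in for a fetched page) so the example runs offline.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Response;

my $ua = LWP::UserAgent->new;
$ua->parse_head(0);    # do not run HTML::HeadParser on HTML responses

# Hypothetical stand-in for a page returned by $ua->get(...):
# a UTF-8 encoded body with a matching Content-Type charset.
my $response = HTTP::Response->new(
    200, 'OK',
    [ 'Content-Type' => 'text/html; charset=UTF-8' ],
    "<p>\xC3\x96sterreich</p>",
);

my $octets = $response->content;            # raw bytes as received
my $text   = $response->decoded_content;    # characters, decoded per charset

printf "content: %d bytes, decoded_content: %d characters\n",
    length $octets, length $text;
```

Since WWW::Wikipedia is described above as a subclass of LWP::UserAgent, calling parse_head(0) on the WWW::Wikipedia object should have the same effect, though that has not been verified here. When printing decoded text, an output layer such as binmode(STDOUT, ':encoding(UTF-8)') is usually needed as well.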