LWP and UTF-8 characters

devs

Hello,
I am trying to write a bot to download Wikipedia articles using
WWW::Wikipedia, a subclass of LWP::UserAgent. Pages returned by the
Wikipedia server contain UTF-8 characters such as LATIN CAPITAL LETTER O
WITH DIAERESIS. However, the LWP module does not seem to handle the
search results as UTF-8 encoded: the character Ö is treated as three
individual bytes and not as a single character. How do I specify that the
LWP user agent must handle UTF-8 characters?

Thanks in advance,
Dave
 
Matt Garrish

You need to disable header parsing. You have read and are adhering to
the free documentation license that applies to all content on that site
in creating this bot, right?

Matt
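
For anyone finding this thread later, here is a minimal sketch of what "disable header parsing" could look like in practice. It assumes the reply refers to the parse_head option of LWP::UserAgent, and it uses a plain LWP::UserAgent rather than WWW::Wikipedia. The fetch_text helper and the canned response are made up for illustration. The offline part only shows the difference between the raw octets (Ö is two octets in UTF-8) and the decoded character string you get from decoded_content.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Response;

# Hypothetical helper (not from the thread): fetch a page and return
# its body as a Perl character string instead of raw octets.
sub fetch_text {
    my ($url) = @_;
    my $ua = LWP::UserAgent->new;
    $ua->parse_head(0);    # assumption: this is the "header parsing" meant above
    my $res = $ua->get($url);
    die $res->status_line unless $res->is_success;
    # decoded_content applies the charset from the Content-Type header
    return $res->decoded_content;
}

# Offline illustration with a hand-built response, so no network is needed.
my $octets = "\xC3\x96";    # UTF-8 encoding of U+00D6 (Ö)
my $res = HTTP::Response->new(
    200, 'OK',
    [ 'Content-Type' => 'text/html; charset=UTF-8' ],
    $octets,
);

my $raw  = $res->content;            # undecoded octets
my $text = $res->decoded_content;    # decoded character string

printf "raw: %d octets, decoded: %d character(s), code point U+%04X\n",
    length($raw), length($text), ord($text);
```

If you print the decoded text, remember to set an output layer first, for example binmode(STDOUT, ':encoding(UTF-8)'), otherwise Perl will warn about wide characters.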