How to detect text charset (UTF-8 or Latin-1)

Thomas Armstrong

Hi.

I'm writing a Perl script that extracts text from a webpage using LWP,
and I want to check whether the text is UTF-8 or Latin-1 encoded.

Is there a known function for this? I don't know if "use utf8;" is enough.

Thank you very much in advance.
 
smallpond


Parse the Content-Type header, for example:
content="text/html; charset=UTF-8"

Unfortunately, web pages that lie about or omit the Content-Type are
not rare.
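
A rough sketch of that approach, with a fallback for pages that declare no charset. The helper name guess_charset is made up for illustration, and the fallback rule is an assumption on my part: bytes that decode as strict UTF-8 are treated as UTF-8, and anything else is assumed to be Latin-1. Plain ASCII is valid in both encodings, so it is reported as UTF-8. Note that "use utf8;" only tells Perl that the script's own source code is UTF-8; it does not inspect or convert downloaded data.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper (not part of LWP): returns the charset named in a
# Content-Type value, or guesses between UTF-8 and ISO-8859-1 from the bytes.
sub guess_charset {
    my ($content_type, $bytes) = @_;

    # 1. Use the declared charset when the header (or meta tag) has one.
    if (defined $content_type
        && $content_type =~ /charset\s*=\s*["']?([\w.:-]+)/i) {
        return uc $1;
    }

    # 2. No declaration: try a strict UTF-8 decode on a copy of the bytes.
    #    Encode::FB_CROAK makes decode() die on malformed input.
    my $copy = $bytes;
    my $ok = eval { Encode::decode('UTF-8', $copy, Encode::FB_CROAK); 1 };

    # Assumption: whatever is not valid UTF-8 is treated as Latin-1.
    return $ok ? 'UTF-8' : 'ISO-8859-1';
}

# With LWP the two inputs would typically come from the HTTP::Response:
#   my $type  = $response->header('Content-Type');
#   my $bytes = $response->content;
print guess_charset('text/html; charset=UTF-8', ''), "\n";   # UTF-8
print guess_charset('text/html', "caf\xE9 au lait"), "\n";   # ISO-8859-1
```

The byte check cannot catch a page that declares the wrong charset, so for untrusted sites it may be worth running the strict decode even when a charset is declared.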
 
