Zlib, gzip, Iconv: what is going on with UTF-8?

  • Thread starter Piotr Mąsior

Piotr Mąsior

Hi. I have searched Google extensively, but nothing I have tried works.

I am using Ruby 1.8.6 (tested on Windows).


My problem occurs when I try to receive a gzipped response from sites whose content is not itself in UTF-8.

For instance, this request works correctly:
response = Net::HTTP.get_with_head('http://www.wp.pl/',
  {'Accept-Encoding' => 'gzip;q=1.0, identity;', 'Accept-Charset' => 'utf-8'})

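When I unpack it, everything works fine:
require 'stringio'
require 'zlib'
require 'iconv'

if response['Content-Encoding']                      # body was sent compressed
  body_io = StringIO.new(response.body)
  html = Zlib::GzipReader.new(body_io).read          # gunzip the raw bytes
  html = Iconv.conv('utf-8//IGNORE', encoding, html) # convert from `encoding` to UTF-8
else
  #html = response.body
end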
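For comparison, here is a self-contained sketch of the same decompress-then-convert step for a current Ruby. Iconv is no longer part of the standard library there, so this swaps in `String#encode`. The method name `decode_body` and its parameters are hypothetical, not part of Net::HTTP, and the charset passed in is assumed to be the one the server really used.

```ruby
require 'zlib'
require 'stringio'

# Hypothetical helper: gunzip the raw body if needed, then convert it
# from the page's real charset to UTF-8. `source_charset` is assumed to
# be the charset the server actually used (e.g. "ISO-8859-2").
def decode_body(raw_body, content_encoding, source_charset)
  bytes =
    if content_encoding.to_s.downcase == 'gzip'
      # GzipReader only undoes the compression: it returns the same
      # bytes the server compressed, in whatever charset they were.
      Zlib::GzipReader.new(StringIO.new(raw_body)).read
    else
      raw_body.dup
    end
  # Label the bytes with their real charset, then transcode to UTF-8.
  # When converting from another charset, unmappable or invalid bytes
  # become "?".
  bytes.force_encoding(source_charset)
       .encode('UTF-8', invalid: :replace, undef: :replace, replace: '?')
end
```

Unlike the snippet above, this checks for the value `gzip` explicitly rather than for any `Content-Encoding` header, so a `deflate` response would not be passed to GzipReader by mistake.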

The problem appears when the site I request uses a charset other than UTF-8. Changing the first line to request another server:
response = Net::HTTP.get_with_head('http://www.interia.pl/',
  {'Accept-Encoding' => 'gzip;q=1.0, identity;', 'Accept-Charset' => 'utf-8'})

gives me a page without a single correct UTF-8 character.

When I comment out the Iconv line, my output has garbage characters where the non-ASCII characters should be, such as "?" (each of them is replaced with some sort of question mark).
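The symptom can be reproduced offline. This sketch assumes the page is Polish text in ISO-8859-2 (the thread does not say which charset www.interia.pl used) and compares reading those bytes as if they were UTF-8 with converting them from their real charset. `String#scrub` (Ruby 2.1+) stands in for Iconv's `//IGNORE` here; it replaces bad bytes with "?" instead of dropping them.

```ruby
# Polish sample text, written with escapes so the source file's own
# encoding does not matter.
polish = "Za\u017C\u00F3\u0142\u0107"

# What the server might send: the same text in ISO-8859-2 (assumed charset).
raw = polish.encode('ISO-8859-2')

# Wrong: pretend the bytes are already UTF-8. 0xBF (the ISO-8859-2 byte
# for "ż") cannot start a UTF-8 character, so the string is invalid and
# scrub replaces the bad bytes with "?".
wrong = raw.dup.force_encoding('UTF-8')
puts wrong.valid_encoding?   # => false
puts wrong.scrub('?')

# Right: declare the real charset, then transcode to UTF-8.
right = raw.dup.force_encoding('ISO-8859-2').encode('UTF-8')
puts right == polish         # => true
```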


Is this the server's fault (the way it gzips the content), or mine (I unpack it the wrong way)?
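On the question of whose fault it is: gzip works on raw bytes and knows nothing about charsets, so decompressing returns exactly the bytes that were compressed. A small in-memory round trip illustrates this; ISO-8859-2 is chosen only as an example charset.

```ruby
require 'zlib'
require 'stringio'

# Bytes of some non-UTF-8 text (ISO-8859-2 chosen for illustration).
original = "Za\u017C\u00F3\u0142\u0107".encode('ISO-8859-2')

# Compress in memory, as a server would before sending.
buffer = StringIO.new
writer = Zlib::GzipWriter.new(buffer)
writer.write(original)
writer.close # writes the gzip footer and closes the buffer

# Decompress the same way as the code in the question.
restored = Zlib::GzipReader.new(StringIO.new(buffer.string)).read

# The bytes are identical (only the charset label may differ), so any
# garbled text must come from the later charset conversion, not gzip.
puts restored.bytes == original.bytes # => true
```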

Regards

Piotr Mąsior

I want to add that everything works fine when I turn off gzip; the following code works for every site:


response = Net::HTTP.get_with_head('http://www.interia.pl/',
  {'Accept-Charset' => 'utf-8'})
html = response.body
that = Nokogiri::HTML(Iconv.conv('utf-8//IGNORE', encoding, html))



and 'that' is correctly encoded UTF-8.



When gzip is present, the same characters come out as ���.



regards

Piotr Mąsior



Problem SOLVED. A faulty condition in my code caused it: I was always passing "utf-8" to Iconv as the source encoding.
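The fix here is to give Iconv the page's real charset instead of a hard-coded "utf-8". A minimal sketch of where that value could come from, assuming the server declares it in the Content-Type header (pages that declare it only in a `<meta>` tag would need extra handling). The helper name `charset_from_content_type` and the ISO-8859-1 fallback are assumptions for illustration.

```ruby
# Hypothetical helper: pull the charset parameter out of a Content-Type
# header such as "text/html; charset=ISO-8859-2". Falls back to
# ISO-8859-1 (the old HTTP/1.1 default for text/*) when none is given.
def charset_from_content_type(content_type, fallback = 'ISO-8859-1')
  return fallback if content_type.nil?
  match = content_type.match(/charset\s*=\s*"?([^";\s]+)"?/i)
  match ? match[1] : fallback
end

puts charset_from_content_type('text/html; charset=ISO-8859-2') # => ISO-8859-2
puts charset_from_content_type('text/html')                     # => ISO-8859-1
```

With a real Net::HTTP response this would be called as `charset_from_content_type(response['Content-Type'])`, and the result used as the `encoding` argument to `Iconv.conv`.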



regards