James Marshall
I'm writing an HTTP client that handles gzip'd content as well as UTF-8
text, including when a response body is both gzip'd and in UTF-8.
I'm newish to both compression and PerlIO layers, so I'd like a second
opinion from someone who knows them better than I do. Does the code below
look correct? The goal is to end up with the uncompressed body in $body,
interpreted as UTF-8 if the "charset" parameter identifies it as such.
I understand the usual advice to avoid utf8::upgrade(); is there a better way
to handle it in this case, or is this one of those cases where it's
legitimately needed?
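For comparison, here is a minimal sketch (not part of the original code) of how utf8::upgrade() and utf8::decode() differ on raw UTF-8 octets such as a gunzipped body. As documented in the utf8 pragma, upgrade only changes Perl's internal storage, while decode actually interprets the octets as UTF-8. The sample string is an assumption chosen for illustration.

```perl
use strict;
use warnings;

# Two octets: the UTF-8 encoding of U+00E9 (e with acute accent).
my $bytes = "\xC3\xA9";

# utf8::upgrade changes only Perl's internal storage. Each octet stays a
# separate (Latin-1) character, so the string still has length 2.
my $upgraded = $bytes;
utf8::upgrade($upgraded);
printf("after upgrade: length %d\n", length($upgraded));

# utf8::decode interprets the octets as UTF-8, yielding one character.
my $decoded = $bytes;
utf8::decode($decoded) or die "not valid UTF-8\n";
printf("after decode:  length %d, U+%04X\n", length($decoded), ord($decoded));
```

If that reading of the docs is right, upgrade alone would leave a gunzipped UTF-8 body as a string of Latin-1 characters rather than decoded text.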
Finally, does anyone know if Compress::Zlib::memGzip() handles UTF-8 input
correctly, or do I need to "utf8::downgrade($body)" before compressing it?
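One way to sidestep the memGzip() question, sketched under the assumption that the goal is a gzipped UTF-8 body: encode the characters to UTF-8 octets with Encode::encode() before compressing, so the compressor only ever sees bytes. This does not establish how memGzip() itself treats wide characters. Note that utf8::downgrade() dies on characters above 0xFF and produces Latin-1 (not UTF-8) octets for characters 0x80 to 0xFF. The sample text is hypothetical.

```perl
use strict;
use warnings;
use Encode qw(encode decode);
use Compress::Zlib;

# Sample text, including a character above 0xFF (U+263A).
my $text = "caf\x{E9} \x{263A}";

# Encode explicitly so memGzip() receives plain UTF-8 octets.
my $octets     = encode('UTF-8', $text);
my $compressed = Compress::Zlib::memGzip($octets);
defined $compressed or die "memGzip failed\n";

# Reverse the steps: gunzip back to octets, then decode them as UTF-8.
my $gunzipped = Compress::Zlib::memGunzip($compressed);
defined $gunzipped or die "memGunzip failed\n";
my $round_trip = decode('UTF-8', $gunzipped);
print(($round_trip eq $text) ? "round trip ok\n" : "mismatch\n");
```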
=======================================================
use Compress::Zlib ;

# Assume S is the socket, and $is_gzipped and $is_utf8 are set correctly
# from the HTTP response headers, which have just been read from S.
if ($is_gzipped) {
    $body= &read_full_body(S) ;
    $body= Compress::Zlib::memGunzip($body) ;
    if ($is_utf8) {
        utf8::upgrade($body) ;
    }
} else {   # not gzip'd
    if ($is_utf8) {
        binmode(S, ':encoding(utf8)') ;
    }
    $body= &read_full_body(S) ;
}
# $body should now contain response body in workable format.
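An alternative structure, offered only as a sketch: read raw octets in both branches, gunzip if needed, and decode afterwards with Encode::decode(), instead of mixing an :encoding layer in one branch with utf8::upgrade() in the other. read_all() and get_body() are hypothetical helpers standing in for read_full_body(), and in-memory filehandles stand in for the socket S.

```perl
use strict;
use warnings;
use Encode qw(encode decode);
use Compress::Zlib;

# Hypothetical stand-in for read_full_body(): slurp the rest of a handle.
sub read_all {
    my ($fh) = @_;
    local $/;                       # undef $/ => slurp mode
    my $data = <$fh>;
    return defined $data ? $data : '';
}

# Hypothetical helper: raw octets -> gunzip if needed -> decode if UTF-8.
sub get_body {
    my ($fh, $is_gzipped, $is_utf8) = @_;
    binmode($fh);                   # raw octets, no :encoding layer
    my $body = read_all($fh);
    if ($is_gzipped) {
        $body = Compress::Zlib::memGunzip($body);
        die "memGunzip failed\n" unless defined $body;
    }
    # Decode only after decompression, since gzip works on octets.
    $body = decode('UTF-8', $body) if $is_utf8;
    return $body;
}

# Usage with in-memory filehandles in place of the socket.
my $text  = "caf\x{E9}";
my $plain = encode('UTF-8', $text);
my $gz    = Compress::Zlib::memGzip($plain);

open(my $gz_fh,    '<', \$gz)    or die "open: $!";
open(my $plain_fh, '<', \$plain) or die "open: $!";

my $b1 = get_body($gz_fh,    1, 1);
my $b2 = get_body($plain_fh, 0, 1);
print(($b1 eq $text && $b2 eq $text) ? "ok\n" : "mismatch\n");
```

The design choice here is that both paths share one decoding step, so a gzipped and a plain UTF-8 body end up in the same form.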