How to read gzipped utf8 file in Python?

Discussion in 'Python' started by John Nagle, Nov 22, 2007.

  1. John Nagle

    John Nagle Guest

    I have a large (gigabytes) file which is encoded in UTF-8 and then
    compressed with gzip. I'd like to read it with the "gzip" module
    and "utf8" decoding. The obvious approach is

    fd = gzip.open(fname, 'rb',encoding='utf8')

    But "gzip.open" doesn't support an "encoding" parameter. (It
    probably should, for consistency.) Is there some way to do this?
    Is it possible to express "unzip, then decode utf8" via
    "codecs.open"?

    John Nagle
    John Nagle, Nov 22, 2007
    #1

  2. > I have a large (gigabytes) file which is encoded in UTF-8 and then
    > compressed with gzip. I'd like to read it with the "gzip" module
    > and "utf8" decoding.


    You didn't specify the processing you want to perform. For example,
    this should work just fine

    import gzip
    fd = gzip.open(fname, 'rb')
    for line in fd:  # iterate over lines; each one is a byte string
        pass

    For that processing, it is not even necessary to know what the encoding
    of the file is, except that it is an ASCII superset (which UTF-8 is).
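    The byte-level loop described above can be sketched as follows. This is
    only an illustration, written for Python 3; the helper name count_lines
    and the sample data are assumptions made for the demo and are not part
    of the original post. It works because every byte of a multi-byte UTF-8
    sequence is 0x80 or higher, so splitting on the newline byte never cuts
    a character in half.

```python
import gzip
import os
import tempfile


def count_lines(path):
    """Count lines in a gzipped file without decoding its contents."""
    count = 0
    with gzip.open(path, "rb") as fd:
        for line in fd:  # each line is a bytes object, newline included
            count += 1
    return count


# Hypothetical sample data, written to a temporary file for the demo.
sample = "naïve\nLöwis\n日本語\n"
path = os.path.join(tempfile.mkdtemp(), "sample.txt.gz")
with gzip.open(path, "wb") as out:
    out.write(sample.encode("utf-8"))

line_count = count_lines(path)
```

    Reading line by line keeps memory use small, which matters for a file
    of several gigabytes.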

    > The obvious approach is
    >
    > fd = gzip.open(fname, 'rb',encoding='utf8')
    >
    > But "gzip.open" doesn't support an "encoding" parameter. (It
    > probably should, for consistency.)


    I think I disagree. The builtin open function does not support an
    encoding argument, either (in Python 2.x). Conceptually, gzip operates
    on byte streams, not character streams.

    > Is it possible to express "unzip, then decode utf8" via
    > "codecs.open"?


    If that's the processing you want to do - sure

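    import codecs
    import gzip
    fd0 = gzip.open(fname, 'rb')         # decompressed byte stream
    fd = codecs.getreader("utf-8")(fd0)  # decodes the bytes as UTF-8
    data = fd.readline()                 # one decoded line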

    You can combine that into a single expression:

    fd = codecs.getreader("utf-8")(gzip.open(fname))
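    A self-contained sketch of the "unzip, then decode" layering, assuming
    Python 3. The function names and sample data are hypothetical. The
    second function relies on the text mode ('rt') and encoding argument
    that gzip.open accepts from Python 3.3 onward; that feature did not
    exist when this thread was written, so treat it as an alternative for
    newer interpreters rather than as what was suggested above.

```python
import codecs
import gzip
import os
import tempfile


def read_lines_codecs(path):
    """Layer a UTF-8 StreamReader over the decompressed byte stream."""
    with gzip.open(path, "rb") as raw:
        reader = codecs.getreader("utf-8")(raw)
        return [line for line in reader]  # each line is a decoded str


def read_lines_text_mode(path):
    """Python 3.3+ only: let gzip.open do the decoding itself."""
    with gzip.open(path, "rt", encoding="utf-8") as fd:
        return list(fd)


# Hypothetical sample data, written to a temporary file for the demo.
sample_lines = ["naïve\n", "Löwis\n", "日本語\n"]
path = os.path.join(tempfile.mkdtemp(), "sample.txt.gz")
with gzip.open(path, "wb") as out:
    out.write("".join(sample_lines).encode("utf-8"))

via_codecs = read_lines_codecs(path)
via_text_mode = read_lines_text_mode(path)
```

    Both functions collect every line into a list only to keep the demo
    short; for a multi-gigabyte file you would iterate over the reader and
    process one line at a time instead.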

    HTH,
    Martin
    Martin v. Löwis, Nov 22, 2007
    #2
