Missing codecs in Python 3.0

Discussion in 'Python' started by samwyse, Jun 3, 2009.

  1. samwyse

    samwyse Guest

    I have a Python 2.6 program (a code generator, actually) that tries
    several methods of compressing a string and chooses the most compact.
    It then writes out something like this:
    { encoding='bz2_codec', data = '...'}

    I'm having two problems converting this to Py3. First is the absence
    of the bz2_codec, among others. It was very convenient for my program
    to delay selection of the decoding method until run-time and then have
    an easy way to load the appropriate code. Is this gone forever from
    the standard libraries?

    Second, I would write my data out using the 'string_escape' codec.
    It, too, has been removed; there's a 'unicode_escape' codec which is
    similar, but I would rather use a 'byte_escape' codec to produce
    literals of the form b'asdf'. Unfortunately, there isn't one that I
    can find. I could use the repr function, but that seems less
    efficient. Does anyone have any ideas? Thanks.
     
    samwyse, Jun 3, 2009
    #1

  2. Chris Rebert

    Chris Rebert Guest

    On Tue, Jun 2, 2009 at 7:15 PM, samwyse wrote:
    > I have a Python 2.6 program (a code generator, actually) that tries
    > several methods of compressing a string and chooses the most compact.
    > It then writes out something like this:
    >  { encoding='bz2_codec', data = '...'}
    >
    > I'm having two problems converting this to Py3.  First is the absence
    > of the bz2_codec, among others.  It was very convenient for my program
    > to delay selection of the decoding method until run-time and then have
    > an easy way to load the appropriate code.  Is this gone forever from
    > the standard libraries?


    That appears to be the case. "bz2" is not listed on
    http://docs.python.org/3.0/library/codecs.html , but it is listed on
    the counterpart 2.6 doc page.
    You can always use the `bz2` module instead. Or write your own
    encoder/decoder for bz2 and register it with the `codecs` module.
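
A minimal sketch of the second option, assuming Python 3 and a hypothetical codec name `my_bz2` (the name and the helper functions are made up for illustration and are not part of the standard library). Since the codec maps bytes to bytes, it is driven through `codecs.encode()` and `codecs.decode()` rather than `str.encode()`:

```python
import bz2
import codecs


def _bz2_encode(data, errors="strict"):
    # A stateless encoder returns (output, number of input items consumed).
    return bz2.compress(data), len(data)


def _bz2_decode(data, errors="strict"):
    # A stateless decoder returns (output, number of input items consumed).
    return bz2.decompress(data), len(data)


def _search(name):
    # Search function: answer only for our hypothetical codec name.
    if name == "my_bz2":
        return codecs.CodecInfo(
            name="my_bz2", encode=_bz2_encode, decode=_bz2_decode
        )
    return None


codecs.register(_search)

payload = b"hello " * 100
packed = codecs.encode(payload, "my_bz2")
restored = codecs.decode(packed, "my_bz2")
```

The encoding name can then be stored as a plain string and looked up at run time, which is close to what the 2.6 program relied on.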

    > Second, I would write my data out using the 'string_escape' codec.
    > It, too, has been removed; there's a 'unicode_escape' codec which is
    > similar, but I would rather use a 'byte_escape' codec to produce
    > literals of the form b'asdf'.  Unfortunately, there isn't one that I
    > can find.  I could use the repr function, but that seems less
    > efficient.  Does anyone have any ideas?  Thanks.


    Well, if you can guarantee the string contains only ASCII, you can
    just unicode_escape it, and then prepend a "b".
    On the other hand, I don't see any reason why repr() would be
    inefficient as compared to the codec method.
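
For what it is worth, in Python 3 `repr()` of a `bytes` object already yields a valid bytes literal, with non-printable bytes escaped. A small sketch (the sample bytes are arbitrary) showing that the literal round-trips through `ast.literal_eval()` and can be pasted into generated source:

```python
import ast

# Arbitrary sample: NUL, tab, a quote, a letter, a backslash, two high bytes.
data = bytes([0, 9, 39, 65, 92, 200, 255])

# repr() gives the text of a bytes literal, quotes and escapes included.
literal = repr(data)

# Parsing the literal back gives the original bytes.
round_tripped = ast.literal_eval(literal)

# The literal can be embedded directly in a line of generated code.
source_line = "data = " + literal
namespace = {}
exec(source_line, namespace)
```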

    Cheers,
    Chris
    --
    http://blog.rebertia.com
     
    Chris Rebert, Jun 3, 2009
    #2

  3. Carl Banks

    Carl Banks Guest

    On Jun 2, 7:35 pm, Chris Rebert wrote:
    > On Tue, Jun 2, 2009 at 7:15 PM, samwyse wrote:
    > > I have a Python 2.6 program (a code generator, actually) that tries
    > > several methods of compressing a string and chooses the most compact.
    > > It then writes out something like this:
    > >  { encoding='bz2_codec', data = '...'}
    > >
    > > I'm having two problems converting this to Py3.  First is the absence
    > > of the bz2_codec, among others.  It was very convenient for my program
    > > to delay selection of the decoding method until run-time and then have
    > > an easy way to load the appropriate code.  Is this gone forever from
    > > the standard libraries?
    >
    > That appears to be the case. "bz2" is not listed on
    > http://docs.python.org/3.0/library/codecs.html , but it is listed on
    > the counterpart 2.6 doc page.
    > You can always use the `bz2` module instead. Or write your own
    > encoder/decoder for bz2 and register it with the `codecs` module.


    IIRC, they decided the codecs would only be used for bytes<->unicode
    encodings in Python 3.0 (which was their intended use all along),
    moving other mappings (like bz2) elsewhere. Not sure where they all
    went, though.

    It was convenient, admittedly, but also confusing to throw all the
    other codecs in with Unicode codecs.
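
A note for readers on later releases: as far as I know, the bytes-to-bytes codecs (`bz2_codec`, `zlib_codec`, `hex_codec` and friends) came back in Python 3.2, reachable through `codecs.encode()` and `codecs.decode()` rather than through the `str` and `bytes` methods. A sketch, assuming Python 3.2 or newer:

```python
import codecs

raw = b"spam and eggs " * 50

# Bytes in, bytes out: the module-level functions accept non-text codecs.
packed = codecs.encode(raw, "bz2_codec")
restored = codecs.decode(packed, "bz2_codec")
```

This does not help on 3.0 itself, where the `bz2` module is the way to go.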



    Carl Banks
     
    Carl Banks, Jun 3, 2009
    #3
  4. samwyse wrote:
    > I have a Python 2.6 program (a code generator, actually) that tries
    > several methods of compressing a string and chooses the most compact.
    > It then writes out something like this:
    > { encoding='bz2_codec', data = '...'}
    >
    > I'm having two problems converting this to Py3. First is the absence
    > of the bz2_codec, among others. It was very convenient for my program
    > to delay selection of the decoding method until run-time and then have
    > an easy way to load the appropriate code. Is this gone forever from
    > the standard libraries?


    bz2 compression is certainly not gone from the standard library; it
    is still available from the bz2 module.

    I recommend that you write it like

    { decompressor = bz2.decompress, data = '...'}

    Then you can still defer invocation of the decompressor until you
    need the data.
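
A rough Python 3 sketch of this idea, done in memory rather than as generated source. The `CANDIDATES` table and the `identity`, `pack` and `unpack` helpers are hypothetical names chosen for illustration; the record stores the decompressor callable next to the data, and nothing is decompressed until `unpack()` is called:

```python
import bz2
import zlib


def identity(data):
    # "No compression" candidate, so packing never makes the data larger.
    return data


# Hypothetical table: name -> (compress, decompress).
CANDIDATES = {
    "none": (identity, identity),
    "zlib": (zlib.compress, zlib.decompress),
    "bz2": (bz2.compress, bz2.decompress),
}


def pack(raw):
    # Try every candidate and keep the most compact result.
    best_name, best_blob = min(
        ((name, comp(raw)) for name, (comp, _) in CANDIDATES.items()),
        key=lambda pair: len(pair[1]),
    )
    return {"decompressor": CANDIDATES[best_name][1], "data": best_blob}


def unpack(record):
    # Deferred step: the decompressor runs only when the data is needed.
    return record["decompressor"](record["data"])


original = b"spam " * 1000
record = pack(original)
```

A code generator would write the qualified name of the chosen function (for example `bz2.decompress`) into the output instead of holding the callable itself.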

    > Second, I would write my data out using the 'string_escape' codec.
    > It, too, has been removed; there's a 'unicode_escape' codec which is
    > similar, but I would rather use a 'byte_escape' codec to produce
    > literals of the form b'asdf'. Unfortunately, there isn't one that I
    > can find. I could use the repr function, but that seems less
    > efficient. Does anyone have any ideas?


    Why does the repr() function seem less efficient? Did you measure
    anything to make it seem so?

    I would recommend using exactly repr().

    Regards,
    Martin
     
    Martin v. Löwis, Jun 3, 2009
    #4
  5. samwyse <samwyse <at> gmail.com> writes:

    >
    > I have a Python 2.6 program (a code generator, actually) that tries
    > several methods of compressing a string and chooses the most compact.
    > It then writes out something like this:
    > { encoding='bz2_codec', data = '...'}


    In 3.x, all codecs which don't directly map between unicode and bytestrings have
    been removed.

    >
    > I'm having two problems converting this to Py3. First is the absence
    > of the bz2_codec, among others. It was very convenient for my program
    > to delay selection of the decoding method until run-time and then have
    > an easy way to load the appropriate code. Is this gone forever from
    > the standard libraries?


    No, just use the bz2 module in the stdlib.

    >
    > Second, I would write my data out using the 'string_escape' codec.


    Why does the repr seem less efficient?
     
    Benjamin Peterson, Jun 3, 2009
    #5