decoding a byte array that is unicode escaped?

Discussion in 'Python' started by sam, Nov 6, 2009.

  1. sam

    sam Guest

    I have a byte stream read over the internet:

    responseByteStream = urllib.request.urlopen( httpRequest );
    responseByteArray = responseByteStream.read();

    The characters are encoded with unicode escape sequences, for example
    a copyright symbol appears in the stream as the bytes:

    5C 75 30 30 61 39

    which translates to:
    \u00a9

    which is unicode for the copyright symbol.

    I am simply trying to display this copyright symbol on a webpage, so
    how do I encode the byte array to utf-8 given that it is 'escape
    encoded' in the above way? I tried:

    responseByteArray.decode('utf-8')
    and responseByteArray.decode('unicode_escape')
    and str(responseByteArray).

    I am using Python 3.1.
    sam, Nov 6, 2009
    #1
    1. Advertising

  2. sam

    Peter Otten Guest

    sam wrote:

    > I have a byte stream read over the internet:
    >
    > responseByteStream = urllib.request.urlopen( httpRequest );
    > responseByteArray = responseByteStream.read();
    >
    > The characters are encoded with unicode escape sequences, for example
    > a copyright symbol appears in the stream as the bytes:
    >
    > 5C 75 30 30 61 39
    >
    > which translates to:
    > \u00a9
    >
    > which is unicode for the copyright symbol.
    >
    > I am simply trying to display this copyright symbol on a webpage, so
    > how do I encode the byte array to utf-8 given that it is 'escape
    > encoded' in the above way? I tried:
    >
    > responseByteArray.decode('utf-8')
    > and responseByteArray.decode('unicode_escape')
    > and str(responseByteArray).
    >
    > I am using Python 3.1.


    Convert the bytes to unicode first:

    >>> u = b"\\u00a9".decode("unicode-escape")
    >>> u

    '©'

    Then convert the string to bytes:

    >>> u.encode("utf-8")

    b'\xc2\xa9'
    Peter Otten, Nov 6, 2009
    #2
    1. Advertising

  3. sam

    Guest

    пÑтница, 6 ноÑÐ±Ñ€Ñ 2009 г., 12:48:47 UTC+4 пользователь sam напиÑал:

    > I am simply trying to display this copyright symbol on a webpage, so
    > how do I encode the byte array to utf-8 given that it is 'escape
    > encoded' in the above way? I tried:
    >
    > responseByteArray.decode('utf-8')
    > and responseByteArray.decode('unicode_escape')
    > and str(responseByteArray).
    >
    > I am using Python 3.1.

    I had some problem with reading zip archive in raw (binary) mode.
    I solve it this way
    .....
    open (filename, 'rb').read ().encode('string_escape')
    # now we had strings with strange symbols are escaped
    # than we can handle it without decoding excepions for example:
    body = '\r\n'.join (lines)
    .....
    # if we have unescaped strings we can get an exception there
    # after opertions, we needed we must unescape all content
    # and drop it out to network (in my case)
    body = body.decode('string-escape')
    .....
    # then we can send so to the server
    connection.request('POST', upload_url, body, headers)

    BR)
    , Aug 13, 2012
    #3
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.

Share This Page