Webpy and UnicodeDecodeError

Discussion in 'Python' started by Oscar Del Ben, Dec 18, 2009.

  1. So I'm trying to send a file through webpy and urllib2 but I can't get
    around these UnicodeErrors. Here's the code:

    # controller

    x = web.input(video_original={})
    params = {'foo': x['foo']}

    files = (('video[original]', 'test', x['video_original'].file.read
    ()),)
    client.upload(upload_url, params, files, access_token())

    # client library

    def __encodeMultipart(self, fields, files):
    """
    fields is a sequence of (name, value) elements for regular
    form fields.
    files is a sequence of (name, filename, value) elements for
    data to be uploaded as files
    Return (content_type, body) ready for httplib.HTTP instance
    """
    boundary = mimetools.choose_boundary()
    crlf = '\r\n'

    l = []
    for k, v in fields.iteritems():
    l.append('--' + boundary)
    l.append('Content-Disposition: form-data; name="%s"' % k)
    l.append('')
    l.append(v)
    for (k, f, v) in files:
    l.append('--' + boundary)
    l.append('Content-Disposition: form-data; name="%s";
    filename="%s"' % (k, f))
    l.append('Content-Type: %s' % self.__getContentType(f))
    l.append('')
    l.append(v)
    l.append('--' + boundary + '--')
    l.append('')
    body = crlf.join(l)

    return boundary, body

    def __getContentType(self, filename):
    return mimetypes.guess_type(filename)[0] or 'application/octet-
    stream'

    def upload(self, path, post_params, files, token=None):

    if token:
    token = oauth.OAuthToken.from_string(token)

    url = "http://%s%s" % (self.authority, path)

    (boundary, body) = self.__encodeMultipart(post_params, files)

    headers = {'Content-Type': 'multipart/form-data; boundary=%s' %
    boundary,
    'Content-Length': str(len(body))
    }

    request = oauth.OAuthRequest.from_consumer_and_token(
    self.consumer,
    token,
    http_method='POST',
    http_url=url,
    parameters=post_params
    )

    request.sign_request(oauth.OAuthSignatureMethod_HMAC_SHA1(),
    self.consumer, token)

    request = urllib2.Request(request.http_url, postdata=body,
    headers=headers)
    request.get_method = lambda: 'POST'

    return urllib2.urlopen(request)

    Unfortunately I get two kinds of unicode error, the first one in the
    crlf.join(l):

    Traceback (most recent call last):
    File "/Users/oscar/projects/work/whitelabel/web/application.py",
    line 242, in process
    return self.handle()
    File "/Users/oscar/projects/work/whitelabel/web/application.py",
    line 233, in handle
    return self._delegate(fn, self.fvars, args)
    File "/Users/oscar/projects/work/whitelabel/web/application.py",
    line 412, in _delegate
    return handle_class(cls)
    File "/Users/oscar/projects/work/whitelabel/web/application.py",
    line 387, in handle_class
    return tocall(*args)
    File "/Users/oscar/projects/work/whitelabel/code.py", line 328, in
    POST
    return simplejson.load(client.upload(upload_url, params, files,
    access_token()))
    File "/Users/oscar/projects/work/whitelabel/oauth_client.py", line
    131, in upload
    (boundary, body) = self.__encodeMultipart(post_params, files)
    File "/Users/oscar/projects/work/whitelabel/oauth_client.py", line
    111, in __encodeMultipart
    body = crlf.join(l)
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xb7 in position
    42: ordinal not in range(128)


    And here's another one:

    Traceback (most recent call last):
    File "/Users/oscar/projects/work/whitelabel/web/application.py",
    line 242, in process
    return self.handle()
    File "/Users/oscar/projects/work/whitelabel/web/application.py",
    line 233, in handle
    return self._delegate(fn, self.fvars, args)
    File "/Users/oscar/projects/work/whitelabel/web/application.py",
    line 412, in _delegate
    return handle_class(cls)
    File "/Users/oscar/projects/work/whitelabel/web/application.py",
    line 387, in handle_class
    return tocall(*args)
    File "/Users/oscar/projects/work/whitelabel/code.py", line 328, in
    POST
    return simplejson.load(client.upload(upload_url, params, files,
    access_token()))
    File "/Users/oscar/projects/work/whitelabel/oauth_client.py", line
    131, in upload
    (boundary, body) = self.__encodeMultipart(post_params, files)
    File "/Users/oscar/projects/work/whitelabel/oauth_client.py", line
    111, in __encodeMultipart
    body = crlf.join(l)
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xb7 in position
    42: ordinal not in range(128)

    Does anyone know why this errors happens and what I should do to
    prevent them? Many thanks.

    Oscar
     
    Oscar Del Ben, Dec 18, 2009
    #1
    1. Advertising

  2. Oscar Del Ben

    Dave Angel Guest

    Oscar Del Ben wrote:
    > So I'm trying to send a file through webpy and urllib2 but I can't get
    > around these UnicodeErrors. Here's the code:
    >
    > # controller
    >
    > x = web.input(video_original={})
    > params = {'foo': x['foo']}
    >
    > files = (('video[original]', 'test', x['video_original'].file.read
    > ()),)
    > client.upload(upload_url, params, files, access_token())
    >
    > # client library
    >
    > def __encodeMultipart(self, fields, files):
    > """
    > fields is a sequence of (name, value) elements for regular
    > form fields.
    > files is a sequence of (name, filename, value) elements for
    > data to be uploaded as files
    > Return (content_type, body) ready for httplib.HTTP instance
    > """
    > boundary = mimetools.choose_boundary()
    > crlf = '\r\n'
    >
    > l = []
    > for k, v in fields.iteritems():
    > l.append('--' + boundary)
    > l.append('Content-Disposition: form-data; name="%s"' % k)
    > l.append('')
    > l.append(v)
    > for (k, f, v) in files:
    > l.append('--' + boundary)
    > l.append('Content-Disposition: form-data; name="%s";
    > filename="%s"' % (k, f))
    > l.append('Content-Type: %s' % self.__getContentType(f))
    > l.append('')
    > l.append(v)
    > l.append('--' + boundary + '--')
    > l.append('')
    > body = crlf.join(l)
    >
    > return boundary, body
    >
    > def __getContentType(self, filename):
    > return mimetypes.guess_type(filename)[0] or 'application/octet-
    > stream'
    >
    > def upload(self, path, post_params, files, token=None):
    >
    > if token:
    > token = oauth.OAuthToken.from_string(token)
    >
    > url = "http://%s%s" % (self.authority, path)
    >
    > (boundary, body) = self.__encodeMultipart(post_params, files)
    >
    > headers = {'Content-Type': 'multipart/form-data; boundary=%s' %
    > boundary,
    > 'Content-Length': str(len(body))
    > }
    >
    > request = oauth.OAuthRequest.from_consumer_and_token(
    > self.consumer,
    > token,
    > http_method='POST',
    > http_url=url,
    > parameters=post_params
    > )
    >
    > request.sign_request(oauth.OAuthSignatureMethod_HMAC_SHA1(),
    > self.consumer, token)
    >
    > request = urllib2.Request(request.http_url, postdata=body,
    > headers=headers)
    > request.get_method = lambda: 'POST'
    >
    > return urllib2.urlopen(request)
    >
    > Unfortunately I get two kinds of unicode error, the first one in the
    > crlf.join(l):
    >
    > Traceback (most recent call last):
    > File "/Users/oscar/projects/work/whitelabel/web/application.py",
    > line 242, in process
    > return self.handle()
    > File "/Users/oscar/projects/work/whitelabel/web/application.py",
    > line 233, in handle
    > return self._delegate(fn, self.fvars, args)
    > File "/Users/oscar/projects/work/whitelabel/web/application.py",
    > line 412, in _delegate
    > return handle_class(cls)
    > File "/Users/oscar/projects/work/whitelabel/web/application.py",
    > line 387, in handle_class
    > return tocall(*args)
    > File "/Users/oscar/projects/work/whitelabel/code.py", line 328, in
    > POST
    > return simplejson.load(client.upload(upload_url, params, files,
    > access_token()))
    > File "/Users/oscar/projects/work/whitelabel/oauth_client.py", line
    > 131, in upload
    > (boundary, body) = self.__encodeMultipart(post_params, files)
    > File "/Users/oscar/projects/work/whitelabel/oauth_client.py", line
    > 111, in __encodeMultipart
    > body = crlf.join(l)
    > UnicodeDecodeError: 'ascii' codec can't decode byte 0xb7 in position
    > 42: ordinal not in range(128)
    >
    >
    > And here's another one:
    >
    > Traceback (most recent call last):
    > File "/Users/oscar/projects/work/whitelabel/web/application.py",
    > line 242, in process
    > return self.handle()
    > File "/Users/oscar/projects/work/whitelabel/web/application.py",
    > line 233, in handle
    > return self._delegate(fn, self.fvars, args)
    > File "/Users/oscar/projects/work/whitelabel/web/application.py",
    > line 412, in _delegate
    > return handle_class(cls)
    > File "/Users/oscar/projects/work/whitelabel/web/application.py",
    > line 387, in handle_class
    > return tocall(*args)
    > File "/Users/oscar/projects/work/whitelabel/code.py", line 328, in
    > POST
    > return simplejson.load(client.upload(upload_url, params, files,
    > access_token()))
    > File "/Users/oscar/projects/work/whitelabel/oauth_client.py", line
    > 131, in upload
    > (boundary, body) = self.__encodeMultipart(post_params, files)
    > File "/Users/oscar/projects/work/whitelabel/oauth_client.py", line
    > 111, in __encodeMultipart
    > body = crlf.join(l)
    > UnicodeDecodeError: 'ascii' codec can't decode byte 0xb7 in position
    > 42: ordinal not in range(128)
    >
    > Does anyone know why this errors happens and what I should do to
    > prevent them? Many thanks.
    >
    > Oscar
    >
    >

    I did a short test to demonstrate the likely problem, without all the
    other libraries and complexity.

    lst = ["abc"]
    lst.append("def")
    lst.append(u"abc")
    lst.append("g\x48\x82\x94i")
    print lst
    print "**".join(lst)


    That fragment of code generates (in Python 2.6) the following output and
    traceback:

    ['abc', 'def', u'abc', 'gH\x82\x94i']
    Traceback (most recent call last):
    File "M:\Programming\Python\sources\dummy\stuff2.py", line 10, in <module>
    print "**".join(lst)
    UnicodeDecodeError: 'ascii' codec can't decode byte 0x82 in position 2:
    ordinal not in range(128)


    You'll notice that one of the strings is a unicode one, and another one
    has the character 0x82 in it. Once join() discovers Unicode, it needs
    to produce a Unicode string, and by default, it uses the ASCII codec to
    get it.

    If you print your 'l' list (bad name, by the way, looks too much like a
    '1'), you can see which element is Unicode, and which one has the \xb7
    in position 42. You'll have to decide which is the problem, and solve
    it accordingly. Was the fact that one of the strings is unicode an
    oversight? Or did you think that all characters would be 0x7f or less?
    Or do you want to handle all possible characters, and if so, with what
    encoding?

    DaveA
     
    Dave Angel, Dec 18, 2009
    #2
    1. Advertising

  3. On Dec 18, 4:43 pm, Dave Angel <> wrote:
    > Oscar Del Ben wrote:
    > > So I'm trying to send a file through webpy and urllib2 but I can't get
    > > around these UnicodeErrors. Here's the code:

    >
    > > # controller

    >
    > > x = web.input(video_original={})
    > > params = {'foo': x['foo']}

    >
    > > files = (('video[original]', 'test', x['video_original'].file.read
    > > ()),)
    > > client.upload(upload_url, params, files, access_token())

    >
    > > # client library

    >
    > > def __encodeMultipart(self, fields, files):
    > >         """
    > >         fields is a sequence of (name, value) elements for regular
    > > form fields.
    > >         files is a sequence of (name, filename, value) elements for
    > > data to be uploaded as files
    > >         Return (content_type, body) ready for httplib.HTTP instance
    > >         """
    > >         boundary = mimetools.choose_boundary()
    > >         crlf = '\r\n'

    >
    > >         l = []
    > >         for k, v in fields.iteritems():
    > >             l.append('--' + boundary)
    > >             l.append('Content-Disposition: form-data; name="%s"' % k)
    > >             l.append('')
    > >             l.append(v)
    > >         for (k, f, v) in files:
    > >             l.append('--' + boundary)
    > >             l.append('Content-Disposition: form-data; name="%s";
    > > filename="%s"' % (k, f))
    > >             l.append('Content-Type: %s' % self.__getContentType(f))
    > >             l.append('')
    > >             l.append(v)
    > >         l.append('--' + boundary + '--')
    > >         l.append('')
    > >         body = crlf.join(l)

    >
    > >         return boundary, body

    >
    > >     def __getContentType(self, filename):
    > >         return mimetypes.guess_type(filename)[0] or 'application/octet-
    > > stream'

    >
    > >     def upload(self, path, post_params, files, token=None):

    >
    > >       if token:
    > >         token = oauth.OAuthToken.from_string(token)

    >
    > >       url = "http://%s%s" % (self.authority, path)

    >
    > >       (boundary, body) = self.__encodeMultipart(post_params, files)

    >
    > >       headers = {'Content-Type': 'multipart/form-data; boundary=%s' %
    > > boundary,
    > >           'Content-Length': str(len(body))
    > >           }

    >
    > >       request = oauth.OAuthRequest.from_consumer_and_token(
    > >         self.consumer,
    > >         token,
    > >         http_method='POST',
    > >         http_url=url,
    > >         parameters=post_params
    > >       )

    >
    > >       request.sign_request(oauth.OAuthSignatureMethod_HMAC_SHA1(),
    > > self.consumer, token)

    >
    > >       request = urllib2.Request(request.http_url, postdata=body,
    > > headers=headers)
    > >       request.get_method = lambda: 'POST'

    >
    > >       return urllib2.urlopen(request)

    >
    > > Unfortunately I get two kinds of unicode error, the first one in the
    > > crlf.join(l):

    >
    > > Traceback (most recent call last):
    > >   File "/Users/oscar/projects/work/whitelabel/web/application.py",
    > > line 242, in process
    > >     return self.handle()
    > >   File "/Users/oscar/projects/work/whitelabel/web/application.py",
    > > line 233, in handle
    > >     return self._delegate(fn, self.fvars, args)
    > >   File "/Users/oscar/projects/work/whitelabel/web/application.py",
    > > line 412, in _delegate
    > >     return handle_class(cls)
    > >   File "/Users/oscar/projects/work/whitelabel/web/application.py",
    > > line 387, in handle_class
    > >     return tocall(*args)
    > >   File "/Users/oscar/projects/work/whitelabel/code.py", line 328, in
    > > POST
    > >     return simplejson.load(client.upload(upload_url, params, files,
    > > access_token()))
    > >   File "/Users/oscar/projects/work/whitelabel/oauth_client.py", line
    > > 131, in upload
    > >     (boundary, body) = self.__encodeMultipart(post_params, files)
    > >   File "/Users/oscar/projects/work/whitelabel/oauth_client.py", line
    > > 111, in __encodeMultipart
    > >     body = crlf.join(l)
    > > UnicodeDecodeError: 'ascii' codec can't decode byte 0xb7 in position
    > > 42: ordinal not in range(128)

    >
    > > And here's another one:

    >
    > > Traceback (most recent call last):
    > >   File "/Users/oscar/projects/work/whitelabel/web/application.py",
    > > line 242, in process
    > >     return self.handle()
    > >   File "/Users/oscar/projects/work/whitelabel/web/application.py",
    > > line 233, in handle
    > >     return self._delegate(fn, self.fvars, args)
    > >   File "/Users/oscar/projects/work/whitelabel/web/application.py",
    > > line 412, in _delegate
    > >     return handle_class(cls)
    > >   File "/Users/oscar/projects/work/whitelabel/web/application.py",
    > > line 387, in handle_class
    > >     return tocall(*args)
    > >   File "/Users/oscar/projects/work/whitelabel/code.py", line 328, in
    > > POST
    > >     return simplejson.load(client.upload(upload_url, params, files,
    > > access_token()))
    > >   File "/Users/oscar/projects/work/whitelabel/oauth_client.py", line
    > > 131, in upload
    > >     (boundary, body) = self.__encodeMultipart(post_params, files)
    > >   File "/Users/oscar/projects/work/whitelabel/oauth_client.py", line
    > > 111, in __encodeMultipart
    > >     body = crlf.join(l)
    > > UnicodeDecodeError: 'ascii' codec can't decode byte 0xb7 in position
    > > 42: ordinal not in range(128)

    >
    > > Does anyone know why this errors happens and what I should do to
    > > prevent them? Many thanks.

    >
    > > Oscar

    >
    > I did a short test to demonstrate the likely problem, without all the
    > other libraries and complexity.
    >
    > lst = ["abc"]
    > lst.append("def")
    > lst.append(u"abc")
    > lst.append("g\x48\x82\x94i")
    > print lst
    > print "**".join(lst)
    >
    > That fragment of code generates (in Python 2.6) the following output and
    > traceback:
    >
    > ['abc', 'def', u'abc', 'gH\x82\x94i']
    > Traceback (most recent call last):
    >   File "M:\Programming\Python\sources\dummy\stuff2.py", line 10, in <module>
    >     print "**".join(lst)
    > UnicodeDecodeError: 'ascii' codec can't decode byte 0x82 in position 2:
    > ordinal not in range(128)
    >
    > You'll notice that one of the strings is a unicode one, and another one
    > has the character 0x82 in it.  Once join() discovers Unicode, it needs
    > to produce a Unicode string, and by default, it uses the ASCII codec to
    > get it.
    >
    > If you print your 'l' list (bad name, by the way, looks too much like a
    > '1'), you can see which element is Unicode, and which one has the \xb7
    > in position 42.  You'll have to decide which is the problem, and solve
    > it accordingly.  Was the fact that one of the strings is unicode an
    > oversight?  Or did you think that all characters would be 0x7f or less?  
    > Or do you want to handle all possible characters, and if so, with what
    > encoding?
    >
    > DaveA


    Thanks for your reply DaveA.

    Since I'm dealing with file uploads, I guess I should only care about
    those. I understand the fact that I'm trying to concatenate a unicode
    string with a binary, but I don't know how to deal with this. Perhaps
    the uploaded file should be encoded in some way? I don't think this is
    the case though.
     
    Oscar Del Ben, Dec 18, 2009
    #3
  4. Oscar Del Ben

    Dave Angel Guest

    Oscar Del Ben wrote:
    > <snip>
    >> You'll notice that one of the strings is a unicode one, and another one
    >> has the character 0x82 in it. Once join() discovers Unicode, it needs
    >> to produce a Unicode string, and by default, it uses the ASCII codec to
    >> get it.
    >>
    >> If you print your 'l' list (bad name, by the way, looks too much like a
    >> '1'), you can see which element is Unicode, and which one has the \xb7
    >> in position 42. You'll have to decide which is the problem, and solve
    >> it accordingly. Was the fact that one of the strings is unicode an
    >> oversight? Or did you think that all characters would be 0x7f or less?
    >> Or do you want to handle all possible characters, and if so, with what
    >> encoding?
    >>
    >> DaveA
    >>

    >
    > Thanks for your reply DaveA.
    >
    > Since I'm dealing with file uploads, I guess I should only care about
    > those. I understand the fact that I'm trying to concatenate a unicode
    > string with a binary, but I don't know how to deal with this. Perhaps
    > the uploaded file should be encoded in some way? I don't think this is
    > the case though.
    >
    >

    You have to decide what the format of the file is to be. If you have
    some in bytes, and some in Unicode, you have to be explicit about how
    you merge them. And that depends who's going to use the file, and for
    what purpose.

    Before you try to do a join(), you have to do a conversion of the
    Unicode string(s) to bytes. Try str.encode(), where you get to specify
    what encoding to use.

    In general, you want to use the same encoding for all the bytes in a
    given file. But as I just said, that's entirely up to you.

    DaveA
     
    Dave Angel, Dec 19, 2009
    #4
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. a

    psyco+webpy

    a, Jun 26, 2006, in forum: Python
    Replies:
    3
    Views:
    534
    Amaury Forgeot d'Arc
    Jun 26, 2006
  2. Webpy vs Django?

    , Jun 3, 2008, in forum: Python
    Replies:
    4
    Views:
    930
    cirfu
    Jun 3, 2008
  3. Alf P. Steinbach
    Replies:
    18
    Views:
    1,576
    Nobody
    Dec 5, 2009
  4. Îίκος

    Making a pass form cgi => webpy framework

    Îίκος, Jun 23, 2013, in forum: Python
    Replies:
    18
    Views:
    706
  5. Silvio Siefke

    webpy

    Silvio Siefke, Nov 28, 2013, in forum: Python
    Replies:
    0
    Views:
    102
    Silvio Siefke
    Nov 28, 2013
Loading...

Share This Page