Webpy and UnicodeDecodeError


Oscar Del Ben

So I'm trying to send a file through webpy and urllib2 but I can't get
around these UnicodeErrors. Here's the code:

# controller

x = web.input(video_original={})
params = {'foo': x['foo']}

files = (('video[original]', 'test', x['video_original'].file.read()),)
client.upload(upload_url, params, files, access_token())

# client library

    def __encodeMultipart(self, fields, files):
        """
        fields is a sequence of (name, value) elements for regular form fields.
        files is a sequence of (name, filename, value) elements for data to be uploaded as files
        Return (content_type, body) ready for httplib.HTTP instance
        """
        boundary = mimetools.choose_boundary()
        crlf = '\r\n'

        l = []
        for k, v in fields.iteritems():
            l.append('--' + boundary)
            l.append('Content-Disposition: form-data; name="%s"' % k)
            l.append('')
            l.append(v)
        for (k, f, v) in files:
            l.append('--' + boundary)
            l.append('Content-Disposition: form-data; name="%s"; filename="%s"' % (k, f))
            l.append('Content-Type: %s' % self.__getContentType(f))
            l.append('')
            l.append(v)
        l.append('--' + boundary + '--')
        l.append('')
        body = crlf.join(l)

        return boundary, body

    def __getContentType(self, filename):
        return mimetypes.guess_type(filename)[0] or 'application/octet-stream'

    def upload(self, path, post_params, files, token=None):

        if token:
            token = oauth.OAuthToken.from_string(token)

        url = "http://%s%s" % (self.authority, path)

        (boundary, body) = self.__encodeMultipart(post_params, files)

        headers = {'Content-Type': 'multipart/form-data; boundary=%s' % boundary,
                   'Content-Length': str(len(body))
                   }

        request = oauth.OAuthRequest.from_consumer_and_token(
            self.consumer,
            token,
            http_method='POST',
            http_url=url,
            parameters=post_params
        )

        request.sign_request(oauth.OAuthSignatureMethod_HMAC_SHA1(), self.consumer, token)

        request = urllib2.Request(request.http_url, postdata=body, headers=headers)
        request.get_method = lambda: 'POST'

        return urllib2.urlopen(request)

Unfortunately I get two kinds of unicode error, the first one in the
crlf.join(l):

Traceback (most recent call last):
  File "/Users/oscar/projects/work/whitelabel/web/application.py", line 242, in process
    return self.handle()
  File "/Users/oscar/projects/work/whitelabel/web/application.py", line 233, in handle
    return self._delegate(fn, self.fvars, args)
  File "/Users/oscar/projects/work/whitelabel/web/application.py", line 412, in _delegate
    return handle_class(cls)
  File "/Users/oscar/projects/work/whitelabel/web/application.py", line 387, in handle_class
    return tocall(*args)
  File "/Users/oscar/projects/work/whitelabel/code.py", line 328, in POST
    return simplejson.load(client.upload(upload_url, params, files, access_token()))
  File "/Users/oscar/projects/work/whitelabel/oauth_client.py", line 131, in upload
    (boundary, body) = self.__encodeMultipart(post_params, files)
  File "/Users/oscar/projects/work/whitelabel/oauth_client.py", line 111, in __encodeMultipart
    body = crlf.join(l)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xb7 in position 42: ordinal not in range(128)


And here's another one:

Traceback (most recent call last):
  File "/Users/oscar/projects/work/whitelabel/web/application.py", line 242, in process
    return self.handle()
  File "/Users/oscar/projects/work/whitelabel/web/application.py", line 233, in handle
    return self._delegate(fn, self.fvars, args)
  File "/Users/oscar/projects/work/whitelabel/web/application.py", line 412, in _delegate
    return handle_class(cls)
  File "/Users/oscar/projects/work/whitelabel/web/application.py", line 387, in handle_class
    return tocall(*args)
  File "/Users/oscar/projects/work/whitelabel/code.py", line 328, in POST
    return simplejson.load(client.upload(upload_url, params, files, access_token()))
  File "/Users/oscar/projects/work/whitelabel/oauth_client.py", line 131, in upload
    (boundary, body) = self.__encodeMultipart(post_params, files)
  File "/Users/oscar/projects/work/whitelabel/oauth_client.py", line 111, in __encodeMultipart
    body = crlf.join(l)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xb7 in position 42: ordinal not in range(128)

Does anyone know why these errors happen and what I should do to
prevent them? Many thanks.

Oscar
 

Dave Angel

I did a short test to demonstrate the likely problem, without all the
other libraries and complexity.

lst = ["abc"]
lst.append("def")
lst.append(u"abc")
lst.append("g\x48\x82\x94i")
print lst
print "**".join(lst)


That fragment of code generates (in Python 2.6) the following output and
traceback:

['abc', 'def', u'abc', 'gH\x82\x94i']
Traceback (most recent call last):
  File "M:\Programming\Python\sources\dummy\stuff2.py", line 10, in <module>
    print "**".join(lst)
UnicodeDecodeError: 'ascii' codec can't decode byte 0x82 in position 2: ordinal not in range(128)


You'll notice that one of the strings is a unicode one, and another one
has the byte 0x82 in it. Once join() finds a unicode element, it has to
produce a unicode result, so it implicitly decodes every byte string in
the list, and by default it uses the ASCII codec to do that. Any byte
above 0x7f then fails to decode.
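
The implicit step that join() performs in Python 2 can be reproduced by hand. This is only a minimal sketch: the helper name ascii_decodes is made up for illustration, and the code is written so that it also runs on Python 3, where the implicit decode no longer exists and mixing bytes with text raises TypeError instead.

```python
def ascii_decodes(data):
    """Return True if the byte string survives decoding with the ASCII codec."""
    try:
        data.decode("ascii")
        return True
    except UnicodeDecodeError:
        return False


plain = b"def"               # every byte is below 0x80
binary = b"g\x48\x82\x94i"   # same bytes as in the test above

print(ascii_decodes(plain))   # True
print(ascii_decodes(binary))  # False

# The explicit decode fails at the same offset the traceback reports.
try:
    binary.decode("ascii")
except UnicodeDecodeError as exc:
    print(exc.start)          # 2
```

The failing offset matches "position 2" in the traceback, which is how the "position 42" in the original error can be traced back to a specific element.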

If you print your 'l' list (bad name, by the way, looks too much like a
'1'), you can see which element is Unicode, and which one has the \xb7
in position 42. You'll have to decide which is the problem, and solve
it accordingly. Was the fact that one of the strings is unicode an
oversight? Or did you think that all characters would be 0x7f or less?
Or do you want to handle all possible characters, and if so, with what
encoding?
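
Rather than reading a printed list by eye, the check can be automated. The following is a rough sketch, not part of the original client library: describe_parts is a hypothetical helper that reports, for each element, whether it is text or bytes and where the first byte above 0x7f sits. It uses explicit b"" and u"" literals so it behaves the same on Python 2.6+ and Python 3.

```python
def describe_parts(parts):
    """Return a list of (index, kind, first_non_ascii_offset) tuples.

    kind is "text" for unicode strings and "bytes" for byte strings.
    first_non_ascii_offset is None unless a byte string contains a byte
    above 0x7f, in which case it is the offset of the first such byte.
    """
    text_type = type(u"")  # unicode on Python 2, str on Python 3
    report = []
    for index, part in enumerate(parts):
        if isinstance(part, text_type):
            report.append((index, "text", None))
            continue
        offset_found = None
        # bytearray yields integers on both Python 2 and Python 3
        for offset, byte in enumerate(bytearray(part)):
            if byte > 0x7F:
                offset_found = offset
                break
        report.append((index, "bytes", offset_found))
    return report


parts = [b"abc", b"def", u"abc", b"g\x48\x82\x94i"]
for row in describe_parts(parts):
    print(row)
```

Any row marked "text" alongside a row with a non-None offset is a combination that makes join() fail in Python 2.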

DaveA
 

Oscar Del Ben

Thanks for your reply DaveA.

Since I'm dealing with file uploads, I guess I should only care about
those. I understand that I'm trying to concatenate a unicode string
with binary data, but I don't know how to deal with this. Perhaps
the uploaded file should be encoded in some way? I don't think this is
the case though.
 

Dave Angel

You have to decide what the format of the file is to be. If you have
some data in bytes, and some in Unicode, you have to be explicit about
how you merge them. And that depends on who's going to use the file,
and for what purpose.

Before you call join(), you have to convert the Unicode string(s) to
bytes. Try the encode() method of the unicode object, which lets you
specify what encoding to use.

In general, you want to use the same encoding for all the bytes in a
given file. But as I just said, that's entirely up to you.
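
One way to apply that advice is to normalize every element to bytes just before joining. This is a sketch under assumptions: to_bytes and join_parts are hypothetical helpers, and UTF-8 is only an example default; the right encoding depends on what the receiving server expects. Byte strings such as the uploaded file contents pass through untouched.

```python
def to_bytes(value, encoding="utf-8"):
    """Encode text to bytes; return byte strings unchanged."""
    if isinstance(value, bytes):  # bytes is an alias of str on Python 2.6+
        return value
    return value.encode(encoding)


def join_parts(parts, sep=b"\r\n", encoding="utf-8"):
    """Join a mixed list of text and byte strings into one byte string."""
    return sep.join(to_bytes(part, encoding) for part in parts)


# A text header line, a text field value, and raw binary file content.
parts = [u"--boundary", u"caf\u00e9", b"\xb7\x00"]
body = join_parts(parts)
print(repr(body))
```

Because only the text elements are encoded, the binary payload is never decoded, so a byte like 0xb7 cannot trigger a UnicodeDecodeError.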

DaveA
 
