Unicode error


dirknbr

I am having some problems with unicode from json.

This is the error I get

UnicodeEncodeError: 'ascii' codec can't encode character u'\x93' in
position 61: ordinal not in range(128)

I have kind of developed this but obviously it's not nice, any better
ideas?

try:
    text=texts
    text=text.encode('latin-1')
    text=text.encode('utf-8')
except:
    text=' '

Dirk
 
Steven D'Aprano

dirknbr said:
I am having some problems with unicode from json.

This is the error I get

UnicodeEncodeError: 'ascii' codec can't encode character u'\x93' in
position 61: ordinal not in range(128)

I have kind of developed this but obviously it's not nice, any better
ideas?

try:
    text=texts
    text=text.encode('latin-1')
    text=text.encode('utf-8')
except:
    text=' '


Don't write bare excepts, always catch the error you want and nothing
else. As you've written it, the result of encoding with latin-1 is thrown
away, even if it succeeds.


text = texts  # Don't hide errors here.
try:
    text = text.encode('latin-1')
except UnicodeEncodeError:
    try:
        text = text.encode('utf-8')
    except UnicodeEncodeError:
        text = ' '
do_something_with(text)


Another thing you might consider is setting the error handler:

text = text.encode('utf-8', errors='ignore')

Other error handlers are 'strict' (the default), 'replace' and
'xmlcharrefreplace'.
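
To see how these handlers differ, here is a minimal sketch that encodes one sample string to ASCII with each of them. The sample string and variable names are made up for illustration, and the code assumes Python 3 (or 2.7, where `errors` is accepted as a keyword). 'backslashreplace' is one more built-in handler beyond those listed above.

```python
# Hypothetical sample: an e-acute plus a pair of curly quotes, none of them ASCII.
sample = u"caf\xe9 \u201cok\u201d"

handlers = ("ignore", "replace", "xmlcharrefreplace", "backslashreplace")
results = {}
for handler in handlers:
    # Each of these four handlers returns bytes instead of raising.
    results[handler] = sample.encode("ascii", errors=handler)
    print(handler, results[handler])

try:
    sample.encode("ascii", errors="strict")  # 'strict' is the default
except UnicodeEncodeError as exc:
    print("strict raised:", exc.reason)
```

Note that 'ignore' and 'replace' lose information, so they are probably better suited to display or logging than to storing data you need to read back.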
 
Chris Rebert

dirknbr said:
I am having some problems with unicode from json.

This is the error I get

UnicodeEncodeError: 'ascii' codec can't encode character u'\x93' in
position 61: ordinal not in range(128)

Please include the full Traceback and the actual code that's causing
the error! We aren't mind readers.

This error basically indicates that you're incorrectly mixing byte
strings and Unicode strings somewhere.

Cheers,
Chris
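
A minimal sketch of how this kind of error arises, assuming Python 3 syntax and a made-up sample string. In Python 2, writing a Unicode string to a byte-oriented file triggers an implicit ASCII encode; the explicit `encode("ascii")` below stands in for that implicit step.

```python
# Hypothetical sample text containing U+0093, the character from the error message.
text = u"He said \x93hello\x94"

try:
    # Stand-in for the implicit conversion Python 2 performs on file.write(text).
    text.encode("ascii")
except UnicodeEncodeError as exc:
    # The exception object records which codec failed, where, and why.
    failure = (exc.encoding, exc.start, exc.reason)
    print(failure)
```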
 
dirknbr

To give a bit of context. I am using twython which is a wrapper for
the JSON API


search=twitter.searchTwitter(s,rpp=100,page=str(it),result_type='recent',lang='en')
for u in search[u'results']:
    ids.append(u[u'id'])
    texts.append(u[u'text'])

This is where texts comes from.

When I then want to write texts to a file I get the unicode error.

Dirk
 
Thomas Jollans

dirknbr said:
To give a bit of context. I am using twython which is a wrapper for
the JSON API


search=twitter.searchTwitter(s,rpp=100,page=str(it),result_type='recent',lang='en')
for u in search[u'results']:
    ids.append(u[u'id'])
    texts.append(u[u'text'])

This is where texts comes from.

When I then want to write texts to a file I get the unicode error.

So your data is unicode? Good.

Well, files are just streams of bytes, so to write unicode data to one
you have to encode it. Since Python can't know which encoding you want
to use (utf-8, by the way, if you ask me), you have to do it manually.

Something like:

outfile.write(text.encode('utf-8'))
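
Another way to get the same effect, sketched under the assumption that `io.open` is available (Python 2.6+ and Python 3): open the file with an explicit encoding and let the file object encode on every write. The file name and sample strings are invented for the example.

```python
import io
import os
import tempfile

# Hypothetical stand-ins for the tweet texts collected in the loop above.
texts = [u"plain ascii", u"curly \u201cquotes\u201d", u"caf\xe9"]

path = os.path.join(tempfile.mkdtemp(), "tweets.txt")

# The file object encodes each Unicode string to UTF-8 as it is written.
with io.open(path, "w", encoding="utf-8") as outfile:
    for text in texts:
        outfile.write(text + u"\n")

# Reading back with the same encoding recovers the original strings.
with io.open(path, "r", encoding="utf-8") as infile:
    round_tripped = infile.read().splitlines()

print(round_tripped == texts)
```

Because every record goes through the same encoding, whoever reads the file only has to know one thing about it: it is UTF-8.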
 
Nobody

Steven D'Aprano said:
Don't write bare excepts, always catch the error you want and nothing
else.

That advice would make more sense if it was possible to know which
exceptions could be raised. In practice, that isn't possible, as the
documentation seldom provides this information. Even for the built-in
classes, the documentation is weak in this regard; for less important
modules and third-party libraries, it's entirely absent.
 
Thomas Jollans

Nobody said:
That advice would make more sense if it was possible to know which
exceptions could be raised. In practice, that isn't possible, as the
documentation seldom provides this information. Even for the built-in
classes, the documentation is weak in this regard; for less important
modules and third-party libraries, it's entirely absent.

In practice, at least in Python, it tends to be better to work the
"other way around": first, write code without exception handlers. Test.
If you get an exception, there are really two possible reactions:


1. "WHAT??"
=> This shouldn't be happening. Rather than catching everything,
fix your code, or think it through until you reach conclusion
#2 below.

2. "Ah, yes. Of course. I should check for that."
=> No problem! You're staring at a traceback right now, so you
know the exception raised.

If you know there should be an exception, but you don't know which one,
it should be trivial to create a condition in which the exception arises,
should it not? Then, you can handle it properly, without resorting to
guesswork or over-generalisations.
 
Benjamin Kaplan

Nobody said:
That advice would make more sense if it was possible to know which
exceptions could be raised. In practice, that isn't possible, as the
documentation seldom provides this information. Even for the built-in
classes, the documentation is weak in this regard; for less important
modules and third-party libraries, it's entirely absent.

You still don't want to use bare excepts. People tend to get rather
annoyed when you handle KeyboardInterrupts and SystemExits like you
would a UnicodeError. Use Exception if you don't know what exceptions
can be raised.
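
A short sketch of why `except Exception` is the safer catch-all, assuming Python 2.5 or later (where `KeyboardInterrupt` and `SystemExit` derive from `BaseException` rather than `Exception`). The helper names `run_guarded` and `raiser` are hypothetical.

```python
def run_guarded(action):
    """Run action, catching ordinary errors but not interpreter-exit signals."""
    try:
        action()
    except Exception as exc:
        return "caught %s" % type(exc).__name__
    return "no error"


def raiser(exc_type):
    """Build a zero-argument callable that raises exc_type."""
    def action():
        raise exc_type("boom")
    return action


print(run_guarded(raiser(ValueError)))

# KeyboardInterrupt is not an Exception subclass, so it passes straight through.
try:
    run_guarded(raiser(KeyboardInterrupt))
    interrupt_escaped = False
except KeyboardInterrupt:
    interrupt_escaped = True
print("KeyboardInterrupt propagated:", interrupt_escaped)
```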
 
Terry Reedy

Nobody said:
That advice would make more sense if it was possible to know which
exceptions could be raised. In practice, that isn't possible, as the
documentation seldom provides this information. Even for the built-in
classes, the documentation is weak in this regard; for less important
modules and third-party libraries, it's entirely absent.

I intend to bring that issue up on pydev list sometime. But in the
meanwhile, once you get an error, you know what it is. You can
intentionally feed code bad data and see what you get. And then maybe
add a test to make sure your code traps such errors.
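
One possible shape for such a test, as a sketch only: `to_ascii_bytes` is a hypothetical function that traps the encoding error, and the test feeds it deliberately bad data using the standard `unittest` module.

```python
import unittest


def to_ascii_bytes(text):
    """Hypothetical helper: encode to ASCII, substituting '?' on failure."""
    try:
        return text.encode("ascii")
    except UnicodeEncodeError:
        return text.encode("ascii", "replace")


class ToAsciiBytesTests(unittest.TestCase):
    def test_bad_data_is_trapped(self):
        # u'\x93' cannot be encoded as ASCII; the helper must not let that escape.
        self.assertEqual(to_ascii_bytes(u"a\x93b"), b"a?b")

    def test_good_data_is_unchanged(self):
        self.assertEqual(to_ascii_bytes(u"abc"), b"abc")


# Run the tests without calling sys.exit(), so this works in a larger script.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ToAsciiBytesTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```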
 
Steven D'Aprano

Nobody said:
That advice would make more sense if it was possible to know which
exceptions could be raised. In practice, that isn't possible, as the
documentation seldom provides this information. Even for the built-in
classes, the documentation is weak in this regard; for less important
modules and third-party libraries, it's entirely absent.

Aside: that's an awfully sweeping generalisation for all third-party
libraries.

Yes, the documentation is sometimes weak, but that doesn't stop you from
being sensible. Catching any exception, no matter what, whether you've
heard of it or seen it before or not, is almost never a good idea. The
two problems with bare excepts are:

* They mask user-generated keyboard interrupts, which is rude.

* They hide unexpected errors and disguise them as expected errors.

You want unexpected errors to raise an exception as early as possible,
because they probably indicate a bug in your code, and the earlier you
see the exception, the easier it is to debug.

And even if they don't indicate a bug in your code, but merely an under-
documented function, it's still better to find out what that is rather
than sweep it under the carpet. You will have learned something new ("oh,
the httplib functions can raise socket.error as well, can they?") which
makes you a better programmer, you have the opportunity to improve the
documentation, you might want to handle it differently ("should I try
again, or just give up now, or reset the flubbler?").

If you decide to just mask the exception, rather than handle it in some
other way, it is easy enough to add an extra check to the except clause.
 
John Machin

dirknbr said:
I have kind of developed this but obviously it's not nice, any better
ideas?

try:
    text=texts
    text=text.encode('latin-1')
    text=text.encode('utf-8')
except:
    text=' '


As Steven has pointed out, if the .encode('latin-1') works, the result is thrown
away. This would be very fortunate.

It appears that your goal was to encode the text in latin1 if possible,
otherwise in UTF-8, with no indication of which encoding was used. Your second
posting confirmed that you were doing this in a loop, ending up with the
possibility that your output file would have records with mixed encodings.

Did you consider what a programmer writing code to READ your output file would
need to do, e.g. attempt to decode each record as UTF-8 with a fall-back to
latin1??? Did you consider what would be the result of sending a stream of
mixed-encoding text to a display device?

As already advised, the short answer to avoid all of that hassle: just encode in
UTF-8.
 
Nobody

Terry Reedy said:
But in the
meanwhile, once you get an error, you know what it is. You can
intentionally feed code bad data and see what you get. And then maybe
add a test to make sure your code traps such errors.

That doesn't really help with exceptions which are triggered by external
factors rather than explicit inputs.

Also, if you're writing libraries (rather than self-contained programs),
you have no control over the arguments. Coupled with the fact that
duck typing is quite widely advocated in Python circles, you're stuck with
the possibility that any method call on any argument can raise any
exception. This is even true for calls to standard library functions or
methods of standard classes if you're passing caller-supplied objects as
arguments.
 
Steven D'Aprano

Nobody said:
That doesn't really help with exceptions which are triggered by external
factors rather than explicit inputs.

Huh? What do you mean by "external factors"? Do you mean like power
supply fluctuations, cosmic rays flipping bits in memory, bad hardware?
You can't defend against that, not without specialist fault-tolerant
hardware, so just don't worry about it.

If you mean external factors like "the network goes down" or "the disk is
full", you can still test for those with appropriate test doubles (think
"stunt doubles", only for testing) such as stubs or mocks. It's a little
bit more work (sometimes a lot more work), but it can be done.

Or don't worry about it. Release early, release often, and take lots of
logs. You'll soon learn what exceptions can happen and what can't. Your
software is still useful even when it's not perfect, and there's always
time for another bug fix release.

Nobody said:
Also, if you're writing libraries (rather than self-contained programs),
you have no control over the arguments.

You can't control what the caller passes to you, but once you have it,
you have total control over it. You can reject it with an exception,
stick it inside a wrapper object, convert it to something else, deal with
it as best you can, or just ignore it.

Nobody said:
Coupled with the fact that duck
typing is quite widely advocated in Python circles, you're stuck with
the possibility that any method call on any argument can raise any
exception. This is even true for calls to standard library functions or
methods of standard classes if you're passing caller-supplied objects as
arguments.

That's a gross exaggeration. It's true that some methods could in theory
raise any exception, but in practice most exceptions are vanishingly
rare. And it isn't even remotely correct that "any" method could raise
anything. If you can get something other than NameError, ValueError or
TypeError by calling "spam".index(arg), I'd like to see it.

Frankly, it sounds to me that you're over-analysing all the things that
"could" go wrong rather than focusing on the things that actually do go
wrong. That's your prerogative, of course, but I don't think you'll get
much support for it here.
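
As a rough illustration of the test-double idea mentioned above, the sketch below uses `unittest.mock` from the Python 3 standard library to make `open` fail as if the disk were full. The `save_text` function is hypothetical, invented only to have something to test.

```python
import errno
from unittest import mock


def save_text(text, path):
    """Hypothetical function under test: returns False if the disk is full."""
    try:
        with open(path, "w", encoding="utf-8") as outfile:
            outfile.write(text)
    except OSError as exc:
        if exc.errno == errno.ENOSPC:
            return False
        raise  # anything unexpected still propagates
    return True


# The stub: every call to open() raises "No space left on device".
disk_full = OSError(errno.ENOSPC, "No space left on device")
with mock.patch("builtins.open", side_effect=disk_full):
    outcome = save_text(u"hello", "never-created.txt")

print("save_text reported success:", outcome)
```

No real file is touched while the patch is active, so the failure path can be exercised without actually filling a disk.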
 
Nobody

Steven D'Aprano said:
Huh? What do you mean by "external factors"?

I mean this:

Steven D'Aprano said:
If you mean external factors like "the network goes down" or "the disk is
full",

Steven D'Aprano said:
you can still test for those with appropriate test doubles (think
"stunt doubles", only for testing) such as stubs or mocks. It's a little
bit more work (sometimes a lot more work), but it can be done.

I'd say "a lot" is more often the case.

Steven D'Aprano said:
You can't control what the caller passes to you, but once you have it,
you have total control over it.

Total control insofar as you can wrap all method calls in semi-bare
excepts (i.e. catch any Exception but not Interrupt).

Steven D'Aprano said:
That's a gross exaggeration. It's true that some methods could in theory
raise any exception, but in practice most exceptions are vanishingly
rare.

Now *that* is a gross exaggeration. Exceptions are by their nature
exceptional, in some sense of the word. But a substantial part of Python
development is playing whac-a-mole with exceptions. Write code, run
code, get traceback, either fix the cause (LBYL) or handle the exception
(EAFP), wash, rinse, repeat.

Steven D'Aprano said:
And it isn't even remotely correct that "any" method could raise
anything. If you can get something other than NameError, ValueError or
TypeError by calling "spam".index(arg), I'd like to see it.

How common is it to call methods on a string literal in real-world code?

It's far, far more common to call methods on an argument or expression
whose value could be any "string-like object" (e.g. UserString or a str
subclass).

IOW, it's "almost" correct that any method can raise any exception. The
fact that the number of counter-examples is non-zero doesn't really
change this. Even an isinstance() check won't help, as nothing prohibits a
subclass from raising exceptions which the original doesn't. Even using
"type(x) == sometype" doesn't help if x's methods involve calling methods
of user-supplied values (unless those methods are wrapped in catch-all
excepts).

Java's checked exception mechanism was based on real-world experience of
the pitfalls of abstract types. And that experience was gained in
environments where interface specifications were far more detailed than is
the norm in the Python world.

Steven D'Aprano said:
Frankly, it sounds to me that you're over-analysing all the things that
"could" go wrong rather than focusing on the things that actually do go
wrong.

See Murphy's Law.

Steven D'Aprano said:
That's your prerogative, of course, but I don't think you'll get
much support for it here.

Alas, I suspect that you're correct. Which is why I don't advocate using
Python for "serious" software. Neither the language nor its "culture" are
amenable to robustness.
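
The point about subclasses can be made concrete with a small sketch. `NoisyString` and `position_of` are hypothetical names made up for this illustration; the example only shows that an `isinstance` check does not constrain which exceptions an overridden method raises.

```python
class NoisyString(str):
    """A str subclass whose index() raises an exception str.index does not use."""

    def index(self, *args):
        raise KeyError("raised by a subclass override")


def position_of(haystack, needle):
    """Return the index of needle, or -1; expects only ValueError from index()."""
    if not isinstance(haystack, str):
        raise TypeError("haystack must be a string")
    try:
        return haystack.index(needle)
    except ValueError:
        return -1


found = position_of("spam", "a")      # plain str, needle present
missing = position_of("spam", "z")    # plain str, ValueError handled

# The subclass passes the isinstance check, yet its KeyError escapes.
try:
    position_of(NoisyString("spam"), "a")
    escaped = None
except KeyError as exc:
    escaped = type(exc).__name__

print(found, missing, escaped)
```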
 
kj

Nobody said:
That advice would make more sense if it was possible to know which
exceptions could be raised. In practice, that isn't possible, as the
documentation seldom provides this information. Even for the built-in
classes, the documentation is weak in this regard; for less important
modules and third-party libraries, it's entirely absent.


I don't get your point. Even when I *know* that a certain exception
may happen, I don't necessarily catch it. I catch only those
exceptions for which I can think of a suitable response that is
*different* from just letting the program fail. (After all, my
own code raises its own exceptions with the precise intention of
making the program fail.) If an unexpected exception occurs, then
by definition, I had no better response in mind for that situation
than just letting the program fail, so I'm happy to let that happen.
If, afterwards, I think of a different response for a previously
uncaught exception, I'll modify the code accordingly.

I find this approach far preferable to the alternative of knowing
a long list of possible exceptions (some of which may never happen
in actual practice), and think of ways to keep the program still
alive no-matter-what. "No memory? No disk space? No problem!
Just a flesh wound!" What's the point of that?

(If I want the final error message to be something other than a
bare stack trace, I may wrap the whole execution in a global/top-level
try/except block so that I can fashion a suitable error message
right before calling exit, but that's just "softening the fall":
the program still will go down.)
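
A minimal sketch of that top-level wrapper, assuming a conventional `main` entry point. Both `main` and `run` are hypothetical names, and `main` fails on purpose here so the error path is visible.

```python
import sys


def main(argv):
    # Hypothetical program body; it fails on purpose for this illustration.
    raise RuntimeError("could not reach the server")


def run(argv, stream=sys.stderr):
    """Call main(), turning any ordinary exception into a short message."""
    try:
        main(argv)
    except Exception as exc:
        # "Softening the fall": one readable line instead of a stack trace.
        stream.write("fatal: %s: %s\n" % (type(exc).__name__, exc))
        return 1
    return 0


exit_code = run(["prog"])
print("exit code:", exit_code)
# A real script would typically end with: sys.exit(run(sys.argv))
```

Everything below the wrapper stays free of defensive handlers, and the program still terminates with a non-zero status when something unexpected happens.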
 
Steven D'Aprano

kj said:
I don't get your point. Even when I *know* that a certain exception may
happen, I don't necessarily catch it. I catch only those exceptions for
which I can think of a suitable response that is *different* from just
letting the program fail. (After all, my own code raises its own
exceptions with the precise intention of making the program fail.) If
an unexpected exception occurs, then by definition, I had no better
response in mind for that situation than just letting the program fail,
so I'm happy to let that happen. If, afterwards, I think of a different
response for a previously uncaught exception, I'll modify the code
accordingly.

I find this approach far preferable to the alternative of knowing a long
list of possible exceptions (some of which may never happen in actual
practice), and think of ways to keep the program still alive
no-matter-what. "No memory? No disk space? No problem! Just a flesh
wound!" What's the point of that?

/me cheers wildly!

Well said!
 