Message from exception raised in generator disappears


Mike Brown

I thought I was being pretty clever with my first attempt at using generators,
but I seem to be missing some crucial concept, for even though this seems to
work as intended, the text of the exception message does not bubble up with
any of the ValueErrors when one of them is raised.


# This helps iterate over a unicode string. When python is built with
# 16-bit chars (as is the default on Windows), it returns surrogate
# pairs together (unlike 'for c in s'), and detects illegal surrogate
# pairs. Byte strings are unaffected.
def chars(s):
    surrogate = None
    for c in s:
        cp = ord(c)
        if surrogate is not None:
            if cp > 56319 and cp < 57344:
                pair = surrogate + c
                surrogate = None
                yield pair
            else:
                raise ValueError("Bad surrogate pair in %s" % s)
        else:
            if cp > 55295 and cp < 57344:
                if cp < 56320:
                    surrogate = c
                else:
                    raise ValueError("Bad surrogate pair in %s" % s)
            else:
                surrogate = None
                yield c
    if surrogate is not None:
        raise ValueError("Bad surrogate pair at end of %s" % s)


# as expected, returns u'example \xe9...\u2022...\U00010000...\U0010fffd'
''.join([c for c in chars(u'example \xe9...\u2022...\ud800\udc00...\U0010fffd')])

# now test the 3 exception conditions. Each produces a ValueError
''.join([c for c in chars(u'2nd half bad: \ud800bogus')])
''.join([c for c in chars(u'no 1st half: \udc00')])
''.join([c for c in chars(u'no 2nd half: \ud800')])


All 3 of the exception tests result in a bare ValueError; there's no
"Bad surrogate pair in" message shown. Why is that? What am I doing wrong?
 

Scott David Daniels

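The problem is that type('abc%s' % u'\udc00') is unicode, not str.
Formatting a unicode argument with %s makes the whole message unicode, and
a message containing a lone surrogate cannot be converted to a plain str
with the default ASCII codec when the traceback is printed, so the text is
dropped. %r inserts the escaped repr of the string instead, which is plain
ASCII. Change your raises to something like:
    raise ValueError("Bad surrogate pair at end of %r" % s)
and then you can relax.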
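
For what it's worth, here is a rough sketch of the generator with %r used in all three raises. It is only an illustration, not a tested drop-in replacement: the hex range checks are my own rewrite of the decimal comparisons in the original, and the loop at the bottom is a hypothetical test harness. The encode('ascii') call is just there to confirm that each message is ASCII-only, which is what lets it survive being printed.

```python
def chars(s):
    """Yield the characters of s, keeping surrogate pairs together."""
    surrogate = None
    for c in s:
        cp = ord(c)
        if surrogate is not None:
            # A high surrogate was seen; a low surrogate must follow.
            if 0xDC00 <= cp <= 0xDFFF:
                pair = surrogate + c
                surrogate = None
                yield pair
            else:
                raise ValueError("Bad surrogate pair in %r" % s)
        elif 0xD800 <= cp <= 0xDBFF:
            # High surrogate: hold it until the next character arrives.
            surrogate = c
        elif 0xDC00 <= cp <= 0xDFFF:
            # Low surrogate with no preceding high surrogate.
            raise ValueError("Bad surrogate pair in %r" % s)
        else:
            yield c
    if surrogate is not None:
        raise ValueError("Bad surrogate pair at end of %r" % s)


# Hypothetical harness: the same three bad inputs as in the question.
bad_inputs = (
    u'2nd half bad: \ud800bogus',
    u'no 1st half: \udc00',
    u'no 2nd half: \ud800',
)
for bad in bad_inputs:
    try:
        u''.join(chars(bad))
    except ValueError as e:
        message = e.args[0]
        message.encode('ascii')  # raises if the message is not ASCII-only
        print(message)
```

With %s in place of %r, the encode('ascii') line would raise a UnicodeError instead, since the raw surrogate ends up in the message.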

-Scott David Daniels