UTF-8 characters in doctest

Bzyczek · Sep 19, 2007

Hello,
I have problems running doctests when I use Czech national
characters in UTF-8 encoding.

I have a Python script which begins with an encoding declaration:

# -*- coding: utf-8 -*-

I have this function with doctest:

def get_inventary_number(block):
    """
    >>> t = u'''28. České královské insignie
    ... mědirytina, grafika je zcela vyřezána z papíru – max. rozměr
    ... 420×582 neznačeno
    ... text: opis v levém medailonu: CAROL VI IMP.ELIS.CHR. AVG. P.P.'''
    >>> get_inventary_number(t)
    (u'nezna\xc4\x8deno', u'28. \xc4\x8cesk\xc3\xa9 kr\xc3\xa1lovsk\xc3\xa9 insignie\nm\xc4\x9bdirytina, grafika je zcela vy\xc5\x99ez\xc3\xa1na z pap\xc3\xadru \xe2\x80\x93 max. rozm\xc4\x9br\n420\xc3\x97582 \ntext: opis v lev\xc3\xa9m medailonu: CAROL VI IMP.ELIS.CHR. AVG. P.P.')
    """
    m = RE_INVENTARNI_CISLO.search(block)
    if m: return m.group(1), block.replace(m.group(0), '')
    else: return None, block

After running doctest.testmod() I get this error message:

  File "vizovice_03.py", line 417, in ?
    doctest.testmod()
  File "/usr/local/lib/python2.4/doctest.py", line 1841, in testmod
    for test in finder.find(m, name, globs=globs, extraglobs=extraglobs):
  File "/usr/local/lib/python2.4/doctest.py", line 851, in find
    self._find(tests, obj, name, module, source_lines, globs, {})
  File "/usr/local/lib/python2.4/doctest.py", line 910, in _find
    globs, seen)
  File "/usr/local/lib/python2.4/doctest.py", line 895, in _find
    test = self._get_test(obj, name, module, globs, source_lines)
  File "/usr/local/lib/python2.4/doctest.py", line 985, in _get_test
    filename, lineno)
  File "/usr/local/lib/python2.4/doctest.py", line 602, in get_doctest
    return DocTest(self.get_examples(string, name), globs,
  File "/usr/local/lib/python2.4/doctest.py", line 616, in get_examples
    return [x for x in self.parse(string, name)
  File "/usr/local/lib/python2.4/doctest.py", line 577, in parse
    (source, options, want, exc_msg) = \
  File "/usr/local/lib/python2.4/doctest.py", line 648, in _parse_example
    lineno + len(source_lines))
  File "/usr/local/lib/python2.4/doctest.py", line 732, in _check_prefix
    raise ValueError('line %r of the docstring for %s has '
ValueError: line 17 of the docstring for __main__.get_inventary_number has inconsistent leading whitespace: 'm\xc4\x9bdirytina, grafika je zcela vy\xc5\x99ez\xc3\xa1na z pap\xc3\xadru \xe2\x80\x93 max. rozm\xc4\x9br'

I tried filling in the expected output in the docstring from the output
of the Python shell and from doctest itself (if I leave it out of the
docstring, doctest tells me what it expected and what it got). I also
tried setting the variable t both as t='some text' and as t=u'some
unicode text'. But everything fails.

So my question is: Is it possible to run doctests with UTF-8
characters? And if the answer is yes, please tell me how.

Thank you for any advice.

Regards
Michal

Peter Otten · Sep 19, 2007

Bzyczek said:
So my question is: Is it possible to run doctests with UTF-8
characters? And if the answer is yes, please tell me how.

Use raw strings in combination with explicit decoding and a little
trial and error. E.g. this little gem passes:

# -*- coding: utf8 -*-
r"""
>>> f("äöü".decode("utf8"))
(u'\xe4\xf6\xfc',)
"""
def f(s):
    return (s,)

if __name__ == "__main__":
    import doctest
    doctest.testmod()

Peter
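
A minimal sketch of why the raw string matters, assuming Python 3 rather than the thread's Python 2.4 (as far as I can tell, doctest's parser treats this case the same way). NON_RAW and RAW are made-up docstrings for illustration, not the OP's. In an ordinary string literal the \n in the expected output becomes a real line break, the rest of that line loses its indentation, and the parser raises the same "inconsistent leading whitespace" ValueError seen in the traceback. With the r prefix the backslash survives, so the example parses and passes.

```python
import doctest

# Hypothetical docstrings for illustration; neither comes from the thread.
# Non-raw: "\n" in the expected output becomes a real line break, so the
# text after it starts in column 0 instead of under the ">>>" prompt.
NON_RAW = """
    >>> 'first\nsecond'
    'first\nsecond'
"""

# Raw: the backslash is kept, so the expected output stays on one
# properly indented line.
RAW = r"""
    >>> 'first\nsecond'
    'first\nsecond'
"""

parser = doctest.DocTestParser()

# Parsing the non-raw docstring fails before any example is run.
try:
    parser.get_examples(NON_RAW)
    non_raw_error = None
except ValueError as exc:
    non_raw_error = str(exc)

# The raw docstring parses into one example, which then runs and passes.
raw_examples = parser.get_examples(RAW)
test = parser.get_doctest(RAW, {}, "raw_example", None, 0)
results = doctest.DocTestRunner(verbose=False).run(test)

print(non_raw_error)
print(results)
```

The same reasoning would apply to any escape that changes the layout of the docstring, so marking doctest docstrings as raw whenever the expected output contains backslashes is a reasonable habit.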

John J. Lee · Sep 20, 2007

Peter Otten said:
# -*- coding: utf8 -*-
r"""
>>> f("äöü".decode("utf8"))
(u'\xe4\xf6\xfc',)
"""
def f(s):
    return (s,)

Forgive me if this is a stupid question, but: What purpose does
function f serve?

John

J. Cliff Dyer · Sep 21, 2007

John said:
Forgive me if this is a stupid question, but: What purpose does
function f serve?

John

Well, it has nothing to do with the unicode bit that came before it. It
just takes an argument, and wraps it in a 1-tuple. Guessing by the
argument of "s", that argument is expected to be a string.

One use I can think of is that sometimes you'll find a function that
returns a string or a list or tuple of strings. If you want to pass that
result on to a for loop, and only loop once on the string (instead of
looping on each letter of the string), you might want to wrap it in a
tuple or a list before passing it to the loop.
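
A small sketch of that idea, assuming Python 3. The helper name as_items is hypothetical and does not appear in the thread; it only illustrates wrapping a lone string so a loop sees it once.

```python
def as_items(result):
    """Return result as a tuple, treating a lone string as one item."""
    # Hypothetical helper: a bare string would otherwise be iterated
    # character by character.
    if isinstance(result, str):
        return (result,)
    return tuple(result)


# One iteration for a single string ...
single = [item for item in as_items("abc")]
# ... and one iteration per element for a list of strings.
several = [item for item in as_items(["abc", "def"])]

print(single)
print(several)
```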

Cheers,
Cliff

J. Cliff Dyer · Sep 21, 2007

J. Cliff Dyer said:
John said:

[...]

def f(s):
    return (s,)

Forgive me if this is a stupid question, but: What purpose does
function f serve?

John

Well, it has nothing to do with the unicode bit that came before it. It
just takes an argument, and wraps it in a 1-tuple. Guessing by the
argument of "s", that argument is expected to be a string.

One use I can think of is that sometimes you'll find a function that
returns a string or a list or tuple of strings. If you want to pass that
result on to a for loop, and only loop once on the string (instead of
looping on each letter of the string), you might want to wrap it in a
tuple or a list before passing it to the loop.

Cheers,
Cliff

(replying to my own post)

Sorry. Itchy trigger finger and tired brain. I didn't read the whole
context of the thread. Dunno what it's doing here. Forcing __repr__ to
be called on a print statement? Funny way to do that. Like I said, I
don't know, so I'll leave it to someone else to say.

Cheers,
Cliff

Peter Otten · Sep 21, 2007

John said:
Forgive me if this is a stupid question, but: What purpose does
function f serve?

Like the OP's get_inventary_number(), it takes a unicode string and
returns a tuple of unicode strings. It's pointless otherwise. I hoped I
had stripped down his code to a point where the analogy was still
recognizable.

Peter
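
To make the analogy concrete, here is a sketch of the same fix applied to a function shaped like the OP's, assuming Python 3 (where every str is Unicode, so no explicit decoding is needed). The thread never shows RE_INVENTARNI_CISLO, so the pattern below is a hypothetical stand-in, and the sample text is shortened from the OP's.

```python
import doctest
import re

# Hypothetical stand-in: the real RE_INVENTARNI_CISLO is not in the thread.
RE_INVENTARNI_CISLO = re.compile(r"(neznačeno)")


def get_inventary_number(block):
    r"""Split the inventory number off a catalogue entry.

    The r prefix keeps the \n escapes in the expected output intact.

    >>> t = '''28. České královské insignie
    ... 420×582 neznačeno'''
    >>> get_inventary_number(t)
    ('neznačeno', '28. České královské insignie\n420×582 ')
    >>> get_inventary_number('nothing here')
    (None, 'nothing here')
    """
    m = RE_INVENTARNI_CISLO.search(block)
    if m:
        return m.group(1), block.replace(m.group(0), '')
    return None, block


# Run only this docstring's examples; in a script, doctest.testmod()
# would collect them along with any others in the module.
parser = doctest.DocTestParser()
test = parser.get_doctest(
    get_inventary_number.__doc__,
    {"get_inventary_number": get_inventary_number},
    "get_inventary_number",
    None,
    0,
)
results = doctest.DocTestRunner(verbose=False).run(test)
print(results)
```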

John J. Lee · Sep 22, 2007

Peter Otten said:
Like the OP's get_inventary_number(), it takes a unicode string and
returns a tuple of unicode strings. It's pointless otherwise. I hoped I
had stripped down his code to a point where the analogy was still
recognizable.

Ah, right.

John
