PEP263 + exec statement

Carlos Ribeiro · Nov 26, 2004

Hello all,

I have a module that retrieves code snippets from a database to be
executed on the fly [1]. As I make heavy use of accented characters
inside strings (I'm in Brazil), it came to my attention that the exec
statement does not implement PEP 263 encoding checking. It does not
issue any warning when fed with arbitrary code that includes non-ASCII
characters. I am not sure if this is:

1) a design feature;
2) a bug in the PEP 263 implementation;
3) a side effect of the fact that exec is not recommended anyway, and
that it will probably be deprecated at some point.

Does anyone have more information on this? I have tried Google in
vain. It seems that exec is not that popular (which is a good sign,
IMHO), and that nobody else had this problem before.

----
[1] yes, I know about the security concerns regarding exec, but that
was the best design choice for several other reasons.

--
Carlos Ribeiro
Consultoria em Projetos
blog: http://rascunhosrotos.blogspot.com
blog: http://pythonnotes.blogspot.com
mail: (e-mail address removed)

Nick Coghlan · Nov 26, 2004

Carlos said:
Does anyone have more information on this? I have tried Google in
vain. It seems that exec is not that popular (which is a good sign,
IMHO), and that nobody else had this problem before.

Does compile() work? (i.e. "bytecode = compile(code_str, '<string>', 'exec');
exec bytecode" instead of "exec code_str").

PEP 263 states explicitly that feeding a unicode string to compile() should
respect the encoding. Its silence on the question of exec fails to inspire
confidence...

If compile() does work even though exec doesn't, it would explain why exec has
never been fixed.
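
For reference, a minimal sketch of the compile-then-exec pattern asked about above. It is written in Python 3 syntax (an assumption, since the thread uses the Python 2 exec statement) so that it runs on a current interpreter. The names source, code_obj and namespace are illustrative only.

```python
# Python 3 sketch: exec is a function here, not a statement.
# The source text contains a non-ASCII character (U+00E1, a-acute).
source = 'a = "Ol\u00e1!"\nresult = a.upper()\n'

# compile() needs a filename label and a mode in addition to the source.
code_obj = compile(source, "<string>", "exec")

# Run the code object in an explicit namespace so nothing leaks into globals.
namespace = {}
exec(code_obj, namespace)

print(repr(namespace["a"]))
print(namespace["result"])
```

Passing the code object rather than the raw string keeps the compilation step (and any decisions about encoding) separate from execution.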

Cheers,
Nick.

Carlos Ribeiro · Nov 26, 2004

Nick said:
Does compile() work? (i.e. "bytecode = compile(code_str, '<string>', 'exec');
exec bytecode" instead of "exec code_str").

PEP 263 states explicitly that feeding a unicode string to compile() should
respect the encoding. Its silence on the question of exec fails to inspire
confidence...

If compile() does work even though exec doesn't, it would explain why exec has
never been fixed.

In [3]: s = """
...: a = "Olá!"
...: print "a:", a
...: print "repr(a):", repr(a)
...: b = u"Olá"
...: print "b:", b
...: print "b(latin-1):", b.encode('latin-1')
...: """

In [4]: exec s
a: Olá!
repr(a): 'Ol\xe1!'
b: Olá
b(latin-1): Olá

In [5]: c = compile(s, '<string>', 'exec')

In [6]: print c
<code object ? at 0x403c8660, file "<string>", line 2>

In [7]: exec c
a: Olá!
repr(a): 'Ol\xe1!'
b: Olá
b(latin-1): Olá

No exceptions at any point with Python 2.3, which I thought would
support the encoding PEP (as it does for source files -- it issues a
warning). I haven't tested it with 2.4 -- I don't have it installed
here, so I don't know what it is supposed to do.

--
Carlos Ribeiro

Nick Coghlan · Nov 26, 2004

Carlos said:
In [3]: s = """
...: a = "Olá!"
...: print "a:", a
...: print "repr(a):", repr(a)
...: b = u"Olá"
...: print "b:", b
...: print "b(latin-1):", b.encode('latin-1')
...: """

Even with a source encoding declared, I believe this statement still creates an
ASCII string, so the encoding information gets lost. What happens if you make it
explicitly unicode? (i.e. s = u"""...""", etc.)

The way I read the PEP, compile() always treats string objects as ASCII, and
only deals with encodings for unicode objects.

My results below at least show that whether the string object is str or unicode
makes a difference, whereas exec vs compile() does not. ('u' in the following
code is the same as your 's', but a unicode string instead of an ASCII string).

My system is well and truly set up for ASCII though, so I don't think too much
more can be read into my results (I also don't know what your expected output is!).
a: Olá!
repr(a): 'Ol\xa0!'
b: Ol
b(latin-1): Olá

a: Ol├í!
repr(a): 'Ol\xc3\xa1!'
b: Olá
b(latin-1): Olß

a: Olá!
repr(a): 'Ol\xa0!'
b: Ol
b(latin-1): Olá

a: Ol├í!
repr(a): 'Ol\xc3\xa1!'
b: Olá
b(latin-1): Olß

Cheers,
Nick.

Carlos Ribeiro · Nov 26, 2004

Nick said:
Even with a source encoding declared, I believe this statement still creates an
ASCII string, so the encoding information gets lost. What happens if you make it
explicitly unicode? (i.e. s = u"""...""", etc.)

Well, that was a quite obvious mistake of mine. But even so... when a
file is read, it's supposed to be ASCII-only, but it still gives the
warning for extended ASCII characters.

Anyway, I decided to give it a try. I'm still not sure if these are
the expected results. u is a unicode string (the same code as in the
previous ASCII test). e is the same source code, but with an encoding
declaration on the second line.

In [9]: u = u"""
...: a = "Olá!"
...: print "a:", a
...: print "repr(a):", repr(a)
...: b = u"Olá"
...: print "b:", b
...: print "b(latin-1):", b.encode('latin-1')
...: """

In [10]: e = u"""
....: # -*- coding: iso-8859-1 -*-
....: a = "Olá!"
....: print "a:", a
....: print "repr(a):", repr(a)
....: b = u"Olá"
....: print "b:", b
....: """

In [11]: exec u
a: OlÃ¡!
repr(a): 'Ol\xc3\xa1!'
b: Olá
b(latin-1): Olá

In [12]: exec e
a: OlÃ¡!
repr(a): 'Ol\xc3\xa1!'
b: Olá

In [13]:

Curious as I am, I decided to do some more tests. I found something quite
weird when I tried to prepend the encoding comment to the string.
exec failed with a weird message, but worked after encoding the string
to latin-1.

In [64]: u
Out[64]: u'\na = "Ol\xe1!"\nprint "a:", a\nprint "repr(a):",
repr(a)\nb = u"Ol\xe1"\nprint "b:", b\nprint "b(latin-1):",
b.encode(\'latin-1\')\n'

In [65]: e = "# -*- coding: iso-8859-1 -*-\n" + u

In [66]: exec e
---------------------------------------------------------------------------
SystemError Traceback (most recent call last)

/home/cribeiro/work/CherryPy/branches/cribeiro-experimental/<console>

SystemError: compile_node: unexpected node type

In [67]: e = u"# -*- coding: iso-8859-1 -*-\n" + u

In [68]: exec e
---------------------------------------------------------------------------
SystemError Traceback (most recent call last)

/home/cribeiro/work/CherryPy/branches/cribeiro-experimental/<console>

SystemError: compile_node: unexpected node type

In [69]: exec e.encode('latin-1')
a: Olá!
repr(a): 'Ol\xe1!'
b: Olá
b(latin-1): Olá

Quite weird. Is it appropriate to post it at python-dev?
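
As a point of comparison, here is a small sketch of the case that worked in In [69]: a byte string that carries its own PEP 263 declaration. It assumes Python 3 (not the 2.3 interpreter used above), where compile() decodes byte-string source according to that declaration. The variable names are made up for illustration.

```python
# Python 3 sketch: byte-string source with a PEP 263 coding declaration.
text = '# -*- coding: iso-8859-1 -*-\nb = "Ol\u00e1"\n'

# Encoding to latin-1 stores the accented letter as the single byte 0xE1.
raw = text.encode("iso-8859-1")

namespace = {}
exec(compile(raw, "<snippet>", "exec"), namespace)

# The declaration tells the compiler how to decode 0xE1,
# so the string comes back as the original character.
print(namespace["b"] == "Ol\u00e1")          # True
print(namespace["b"].encode("iso-8859-1"))   # b'Ol\xe1'
```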

--
Carlos Ribeiro

Martin v. Löwis · Nov 26, 2004

Carlos said:
It does not
issue any warning when fed with arbitrary code that includes non-ASCII
characters. I am not sure if this is:

1) a design feature;
2) a bug in the PEP 263 implementation;
3) a side effect of the fact that exec is not recommended anyway, and
that it will probably be deprecated at some point.

It is rather

4) almost out of scope of PEP 263, which, in its first sentence,
restricts itself to Python source *files*

There are a few places where Python source code is not stored
in files, namely:
- exec
- eval
- compile
- interactive mode
- IDLE shell

The PEP isn't really precise on what to do in these cases; it certainly
isn't fair to require an encoding declaration in all of them. In
particular, for interactive mode, it is undesirable to require such a
declaration. It is also unnecessary, since, in interactive mode, you
almost certainly know the encoding from sys.stdin.encoding.

So Python 2.4 will use sys.stdin.encoding for code entered in
interactive mode. My plan is that compile() grows an argument
to specify the encoding of the string, falling back to ASCII
if none is specified. Then, exec and eval would not need to
be modified - people who want to specify an encoding outside
of the source itself could pass a code object to exec or eval,
instead of directly passing the string to these constructs.
This would also take care of IDLE, which internally uses
compile().
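
The idea of supplying the encoding outside the source and handing exec a code object can be sketched roughly as follows. This is not the patch mentioned in this post: compile_with_encoding is a hypothetical helper written for Python 3, where the explicit step is simply decoding the bytes before calling compile(), with ASCII as the fallback described above.

```python
def compile_with_encoding(source_bytes, encoding="ascii", filename="<string>"):
    """Hypothetical helper: decode with an explicit encoding, then compile.

    Falls back to ASCII when no encoding is given.
    """
    # Raises UnicodeDecodeError if the bytes do not fit the encoding.
    text = source_bytes.decode(encoding)
    return compile(text, filename, "exec")


# A snippet stored as latin-1 bytes, with no declaration inside the source.
snippet = 'greeting = "Ol\u00e1!"\n'.encode("iso-8859-1")

# The caller states the encoding, and exec receives a code object.
namespace = {}
exec(compile_with_encoding(snippet, "iso-8859-1"), namespace)
print(namespace["greeting"] == "Ol\u00e1!")  # True

# Without an explicit encoding the ASCII fallback rejects the byte 0xE1.
try:
    compile_with_encoding(snippet)
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False
print(ascii_ok)  # False
```

With this shape, exec and eval themselves need no changes: the encoding decision is made once, at compile time, by whoever knows where the bytes came from.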

I have implemented these changes, but I was a little bit too
late to submit them for Python 2.4.

Regards,
Martin
