Any way to designate the encoding of the "source" for compile?

janeaustine50

Python's InteractiveInterpreter uses the built-in compile function.

According to the reference manual, compile does not seem to care about
the encoding of the source string.

When I hand in a unicode object, it is encoded in UTF-8 automatically.
That can be a problem when I'm building an interactive environment using
"compile" with an encoding other than UTF-8. IDLE itself has the
same problem. ('<a string with non-ASCII encoding>' is treated correctly,
but u'<a string with non-ASCII encoding>' is treated wrongly.)

Any suggestions, or any plans for future Python versions?
 
janeaustine50

I've read a posting from Martin Von Loewis mentioning trying to build
in that feature (optionally marking the encoding when calling "compile").
Does anyone know how it is going?
 
John Machin

I don't understand this. Suppose your "different encoding" is cp125x
(where x is a digit). Would you not do something like this?

compile_input = user_input.decode('cp125x')
code_object = compile(compile_input, ......

janeaustine50 wrote:
I've read a posting from Martin Von Loewis mentioning trying to build
in that feature (optionally marking the encoding when calling "compile").
Does anyone know how it is going?

Firstly, it would help those who might be trying to help you if you
could post a simple example: input, output, what error message, what
you mean by 'is treated wrong' ... and when it comes to Unicode
objects (indeed any text), show us repr(text) -- "what you see is
often not what others see and often not what you've actually got".
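
To illustrate the repr() advice, here is a minimal sketch in Python 3 syntax (the thread itself uses Python 2.4). The sample text and the variable names are assumptions made for the example. Two values that may look alike on screen are clearly different once their escaped form is shown.

```python
# Hypothetical sample: the Korean word for "Hangul" as text and as EUC-KR bytes.
text = "\ud55c\uae00"            # a str (Unicode) object
raw = text.encode("euc-kr")      # the same word as an EUC-KR byte string

# ascii() is the Python 3 counterpart of Python 2's repr() for text:
# it escapes every non-ASCII character, so nothing depends on the terminal.
print(ascii(text))
print(repr(raw))
```

Posting output like this removes any doubt about which code points or bytes are really in the object.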

Secondly, are any of the contents of PEP 263 of any use to you?
http://www.python.org/peps/pep-0263.html
 
janeaustine50

John Machin wrote:
I don't understand this. Suppose your "different encoding" is cp125x
(where x is a digit). Would you not do something like this?

compile_input = user_input.decode('cp125x')
code_object = compile(compile_input, ......



Firstly, it would help those who might be trying to help you if you
could post a simple example: input, output, what error message, what
you mean by 'is treated wrong' ... and when it comes to Unicode
objects (indeed any text), show us repr(text) -- "what you see is
often not what others see and often not what you've actually got".

Secondly, are any of the contents of PEP 263 of any use to you?
http://www.python.org/peps/pep-0263.html


Okay, I'll use one of the CJK codecs as the example. EUC-KR is the
default encoding.
u'\ud55c\uae00'

So I reckon that "compile" should get a unicode object. However...

C:\Python24\Lib>python code.py
<string>(1)?()
(Pdb) c
Python 2.4 (#60, Nov 30 2004, 11:49:19) [MSC v.1310 32 bit (Intel)] on
win32
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)

Am I right to assume that the problem lies in code.py (and therefore
in codeop.py)? To correct the problem, it seems I would have to parse
each string and change the unicode literals... Hmm... That sounds like
a bad approach.
 
janeaustine50

Oh, I forgot one more thing.

C:\Python24\Lib>python
Python 2.4 (#60, Nov 30 2004, 11:49:19) [MSC v.1310 32 bit (Intel)] on
win32
Type "help", "copyright", "credits" or "license" for more information.
 
John Machin

==== This is *EXACTLY* what your problem is =====================================================




==== It would have helped had you followed this ==========
===========================================================
# There's a very strong assumption that the above was originally
encoded in euc-kr but by the time I copied the 2 chars out of my
browser it was definitely Unicode. See what I mean about using repr()?

[big snip]

Like I said, *ALL* you have to do (like in any other Unicode-aware
app) is decode your user input into Unicode (you *don't* need to parse
bits and pieces of it) and feed it in ... like this:
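
A minimal sketch of that decode-then-compile approach, written in Python 3 syntax, where `bytes` plays the role of the Python 2 byte string and `str` the role of `unicode`. The input bytes, the `<input>` filename, and the variable names are assumptions for illustration only.

```python
# Hypothetical user input arriving as EUC-KR encoded bytes.
user_input = b'word = "\xc7\xd1\xb1\xdb"'

# Step 1: decode the raw bytes into text using the known input encoding.
source_text = user_input.decode("euc-kr")

# Step 2: hand the decoded text to compile(); no per-literal parsing needed.
code_object = compile(source_text, "<input>", "single")

# Step 3: run it in a scratch namespace and inspect the result.
namespace = {}
exec(code_object, namespace)
print(ascii(namespace["word"]))
```

The whole input is decoded once, up front, so the compiler only ever sees text.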

HTH,
John
 
janeaustine50

John said:
On 16 May 2005 16:44:30 -0700, (e-mail address removed) wrote: [snip]


Like I said, *ALL* you have to do (like in any other Unicode-aware
app) is decode your user input into Unicode (you *don't* need to parse
bits and pieces of it) and feed it in ... like this:

HTH,
John

Thank you but there is still a problem.

|>>> s='euckr="\xc7\xd1";uni=u"\xc7\xd1"'
|>>> su=s.decode('euc-kr')
|>>> su
|u'euckr="\ud55c";uni=u"\ud55c"'
|>>> c=compile(su,'','single')
|>>> exec c
|>>> euckr,uni
|('\xed\x95\x9c', u'\ud55c')
|>>>

As you can see, the value of the plain (non-unicode) string euckr has
been turned into UTF-8.
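
The bytes in that session can be checked independently. The sketch below (Python 3 syntax; the variable names are assumptions) encodes the same character both ways. The UTF-8 form matches what ended up in `euckr`, while the EUC-KR form matches the bytes that were originally typed.

```python
han = "\ud55c"  # the single Hangul syllable from the session above

utf8_bytes = han.encode("utf-8")     # what the compiled byte string contained
euckr_bytes = han.encode("euc-kr")   # what the user originally entered

print(utf8_bytes)
print(euckr_bytes)
```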
 
Serge.Orlov

That is the problem. Non-ascii characters in byte strings are
deprecated. Here is what I get when I run a deprecated hello
world program in Russian:
------- hello.py ---------
print "Здравствуй, мир!"
--------------------------
C:\py>c:\Python24\python.exe hello.py
sys:1: DeprecationWarning: Non-ASCII character '\xc7' in file
text.py on line 1, but no encoding declared; see
http://www.python.org/peps/pep-0263.html for details
╟фЁртёЄтєщ, ьшЁ!
--------------------------
Oops, not only is there a warning, but it doesn't even work
on Windows in a Russian locale. To correct the program I need
to switch to unicode strings:
------- hello.py ---------
# -*- coding: windows-1251 -*-
print u"Здравствуй, мир!"
--------------------------
C:\py>c:\Python24\python.exe hello.py
Здравствуй, мир!
--------------------------
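
The same PEP 263 declaration can also be used when calling compile() directly. The following is a sketch in Python 3 syntax, not something taken from the thread: it assumes the source is passed to compile() as bytes, in which case the declaration on the first line is honoured. The `hello.py` filename and the variable names are just labels chosen for the example.

```python
# Build a tiny source file as windows-1251 bytes, including the PEP 263
# declaration on its first line.
source_text = '# -*- coding: windows-1251 -*-\ngreeting = "Здравствуй, мир!"\n'
source_bytes = source_text.encode("windows-1251")

# Given bytes, compile() reads the declaration and decodes accordingly.
code_object = compile(source_bytes, "hello.py", "exec")

namespace = {}
exec(code_object, namespace)

# Compare instead of printing the text, so the result does not depend
# on the console encoding.
print(namespace["greeting"] == "Здравствуй, мир!")
```

For an interactive environment, this means the encoding can travel with the source bytes rather than being guessed by the compiler.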

Since non-ascii characters are deprecated in byte strings,
any non-ascii encoding for sys.getdefaultencoding() is
deprecated as well. Don't set it to 'euc-kr'.

janeaustine50 wrote:
Any suggestions, or any plans for future Python versions?

In python 3.0 byte strings will be gone. So you won't be
able to put non-ascii characters into them.
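
As Python 3 was eventually released, a `bytes` type still exists, but its literals accept only ASCII characters, which is in the spirit of the remark above. A small check, assuming Python 3 (the `<input>` filename and variable names are assumptions):

```python
# Source text containing a bytes literal with a non-ASCII character.
source = 'data = b"\ud55c"'

try:
    compile(source, "<input>", "exec")
    rejected = False
except SyntaxError:
    # Python 3 refuses non-ASCII characters inside a bytes literal.
    rejected = True

print(rejected)
```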


Serge.
 
Serge.Orlov

janeaustine50 wrote:
Thank you but there is still a problem.

|>>> s='euckr="\xc7\xd1";uni=u"\xc7\xd1"'
|>>> su=s.decode('euc-kr')
|>>> su
|u'euckr="\ud55c";uni=u"\ud55c"'

su[7] is a non-ascii character inside the byte string euckr
|>>> c=compile(su,'','single')
|>>> exec c
|>>> euckr,uni
|('\xed\x95\x9c', u'\ud55c')
|>>>

As you can see, the value of the plain (non-unicode) string euckr has
been turned into UTF-8.

See my previous message. Non-ascii characters in byte strings
are deprecated.

Serge.
 
