Unicode formatting for Strings

robson.cozendey.rj

Hi,

I'm trying desperately to tell the interpreter to put an 'á' in my
string, so here is the code snippet:

# -*- coding: utf-8 -*-
filename = u"Ataris Aquáticos #2.txt"
f = open(filename, 'w')

Then I save it with Windows Notepad, in the UTF-8 format. So:

1) I put the "magic comment" at the start of the file
2) I write u"" to specify my unicode string
3) I save it in the UTF-8 format

And even so, I get an error!

File "Ataris Aqußticos #2.py", line 1
SyntaxError: Non-ASCII character '\xff' in file Ataris Aqußticos #2.py on line 1, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details

I don't know how to tell Python that it should use UTF-8; it keeps
saying "no encoding declared"!

Robson
 
kyosohma

I can't tell from your email if you get the message when you try to
open or close the file. So, I recommend that you read the following
article as it explains the whole unicode business quite well:
http://www.pyzine.com/Issue008/Section_Articles/article_Encodings.html
 
Kent Johnson

It looks like you are saving the file in Unicode format (not utf-8) and
Python is choking on the Byte Order Mark that Notepad puts at the
beginning of the document.

Try using an editor that will save UTF-8 without a BOM, e.g. jEdit or
TextPad.
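If switching editors is inconvenient, an already-saved file can also be re-encoded with a few lines of Python. The following is only a minimal sketch, assuming the file really was saved in Notepad's "Unicode" (UTF-16 with BOM) format; the function name resave_as_utf8_without_bom and its source_encoding parameter are made up for illustration and are not part of any library.

```python
import io


def resave_as_utf8_without_bom(path, source_encoding="utf-16"):
    """Rewrite the file at `path` in place as UTF-8 with no byte order mark.

    Assumption: the file is currently encoded as `source_encoding`.
    The "utf-16" codec reads the BOM to pick the byte order and drops it.
    """
    # newline="" leaves the existing line endings (e.g. \r\n) untouched.
    with io.open(path, "r", encoding=source_encoding, newline="") as f:
        text = f.read()

    # Guard against a leftover U+FEFF, e.g. if a BOM-less codec name was passed.
    if text.startswith(u"\ufeff"):
        text = text[1:]

    # The plain "utf-8" codec never writes a BOM.
    with io.open(path, "w", encoding="utf-8", newline="") as f:
        f.write(text)
```

Since it overwrites the file, it would be prudent to try it on a copy first.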

Kent
 
Chris Mellon

Notepad does support saving to UTF-8, and I was able to do this
without the problem the OP was having. I also saved both with and
without a BOM (in UTF-8) using SciTE, and Python worked correctly in
both cases.
 
robson.cozendey.rj

I saved it in UTF-8 with Notepad. I was thinking here... Could it be a
limitation of the file.open() method? Has anyone tested that?
 
John Machin

"I saved it in UTF-8 with Notepad."

Please consider that you might possibly be mistaken.

Here are dumps of 4 varieties of file:

>>> for i in range(4):
...     print '\nFile %d:\n%r' % (i, open('robson' + str(i) + '.py', 'rb').read())
...

File 0:
'\xef\xbb\xbf# -*- coding: utf-8 -*-\r\nfilename = u"Ataris Aqu\xc3\xa1ticos #2.txt"\r\nf = open(filename, \'w\')'

File 1:
'# -*- coding: utf-8 -*-\r\nfilename = u"Ataris Aqu\xc3\xa1ticos #2.txt"\r\nf = open(filename, \'w\')'

File 2:
'# -*- coding: cp1252 -*-\r\nfilename = u"Ataris Aqu\xe1ticos #2.txt"\r\nf = open(filename, \'w\')'

File 3:
'\xff\xfe#\x00 \x00-\x00*\x00-\x00 \x00c\x00o\x00d\x00i\x00n\x00g\x00:\x00 \x00u\x00t\x00f\x00-\x008\x00 \x00-\x00*\x00-\x00\r\x00\n\x00f\x00i\x00l\x00e\x00n\x00a\x00m\x00e\x00 \x00=\x00 \x00u\x00"\x00A\x00t\x00a\x00r\x00i\x00s\x00 
[snip]

File 0 was saved in UTF-8 with Notepad. Notepad puts a "UTF-8 BOM" at
the front of the file. It works (that is, it creates a file with the
a-acute character in its name). There is no \xff character in line 1 for
Python to complain about.

File 1 was saved in UTF-8 with another editor. No BOM, no problem.
Works.

File 2 (which specifies cp1252 encoding (my default, and probably
yours too)) was saved normally (i.e. without the stuffing about
necessary to get UTF-8). Works.

File 3 was saved in "Unicode" (really utf_16_le) using Notepad. As you
can see, it has a UTF-16-LE BOM (which contains \xff) at the start.
Python is not amused, giving exactly the same error message as you
reported.
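For anyone who wants to see where those leading bytes come from without opening Notepad, here is a rough sketch that encodes the assignment line from the original post in the same four ways. The variable names are invented for this example, and it only approximates what each editor wrote (the coding comment line is left out).

```python
import codecs

# The assignment line from the original post, with the a-acute written as an
# escape so that this sketch is itself pure ASCII.
line = u'filename = u"Ataris Aqu\u00e1ticos #2.txt"'

utf8_plain = line.encode("utf-8")                                   # like File 1
utf8_with_bom = codecs.BOM_UTF8 + utf8_plain                        # like File 0
cp1252_plain = line.encode("cp1252")                                # like File 2
utf16_le_with_bom = codecs.BOM_UTF16_LE + line.encode("utf-16-le")  # like File 3

# Only the UTF-16-LE variant starts with \xff, the byte the SyntaxError names.
assert utf8_with_bom.startswith(b"\xef\xbb\xbf")
assert utf16_le_with_bom.startswith(b"\xff\xfe")
assert not utf8_plain.startswith(b"\xff")
assert not cp1252_plain.startswith(b"\xff")

for label, data in [
    ("utf-8 + BOM", utf8_with_bom),
    ("utf-8", utf8_plain),
    ("cp1252", cp1252_plain),
    ("utf-16-le + BOM", utf16_le_with_bom),
]:
    # Show just the first few bytes of each variant.
    print("%-16s %r" % (label, data[:12]))
```

The a-acute itself becomes \xc3\xa1 in UTF-8 and \xe1 in cp1252, matching the dumps above.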

So:

(1) If you still believe that you are getting a problem with a file
saved as UTF-8, please present credible, reproducible evidence: for
example, a copy/paste of what happens when you (a) dump the file,
immediately followed by (b) running the file with Python.

(2) Consider using your "native" encoding (e.g. cp1252) with your
normal/usual editor/IDE.
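
The dump asked for in (1) could be produced with something along these lines. This is a sketch only: describe_bom and dump_head are hypothetical helper names, and the check covers just the UTF-8 and UTF-16 byte order marks (a UTF-32-LE file would be misreported as utf-16-le, since its BOM starts with the same two bytes).

```python
import codecs

# Byte order marks this sketch recognises; the UTF-32 marks are left out on purpose.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8 with BOM"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def describe_bom(data):
    """Return a label for the BOM at the start of `data`, or None if there is none."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


def dump_head(path, count=40):
    """Read the first `count` raw bytes of the file and report any BOM found."""
    with open(path, "rb") as f:  # binary mode, so nothing is decoded or translated
        head = f.read(count)
    return head, describe_bom(head)
```

Calling dump_head on the script that fails and printing the result should make it obvious whether the file starts with \xff\xfe or with \xef\xbb\xbf.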

"I was thinking here... Could it be a limitation of the file.open() method?"

No, it can't.

"Has anyone tested that?"

Unlikely.

HTH,
John
 
