writing \feff at the beginning of a file

  • Thread starter Jean-Michel Pichavant

Jean-Michel Pichavant

Hello python world,

I'm trying to update the content of $Microsoft$ VC2005 project files
using a Python application.
Since those files are XML data, I assumed I could easily do that.

My problem is that VC somehow thinks that the file is corrupted and
updates the file as follows:

-<?xml version='1.0' encoding='UTF-8'?>
+?<feff><?xml version="1.0" encoding="UTF-8"?>


Actually, <feff> is displayed in a different color by vim, telling me
that this is some kind of special character code (I'm not familiar with
such things).
After googling that, I have a clue: it could be some Unicode character used
to indicate something... well, I don't know in fact ("UTF-8 files
sometimes start with a byte-order marker (BOM) to indicate that they are
encoded in UTF-8.").

My problem is however simpler: how do I add such a character at the
beginning of the file?
I tried

f = open('paf', 'w')
f.write(u'\ufeff')

UnicodeEncodeError: 'ascii' codec can't encode character u'\ufeff' in
position 0: ordinal not in range(128)

The error may be explicit, but I have no idea how to proceed further. Any
clue?

JM
 
Ulrich Eckhardt

Jean-Michel Pichavant said:
My problem is however simpler: how do I add such a character [a BOM]
at the beginning of the file?
I tried

f = open('paf', 'w')
f.write(u'\ufeff')

UnicodeEncodeError: 'ascii' codec can't encode character u'\ufeff' in
position 0: ordinal not in range(128)

Try the codecs module to open the file, which will then do all the
transcoding between internal texts and external UTF-8 for you.
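
To see why the plain open() call fails: a byte-oriented file has to encode the text before writing it, and the ASCII codec has no representation for U+FEFF. The following is only a minimal sketch (the variable names are illustrative, not from the original post) that reproduces the failure and shows that encoding to UTF-8 works:

```python
bom = u'\ufeff'  # the character the original post tried to write

# Encoding to ASCII fails; this is what the implicit conversion attempted.
try:
    bom.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Encoding to UTF-8 succeeds and yields the three BOM bytes.
utf8_bytes = bom.encode('utf-8')
```

Opening the file through a codec-aware layer, as suggested above, performs that UTF-8 encoding step for you.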

Uli
 
Nobody

Jean-Michel Pichavant said:
I'm trying to update the content of $Microsoft$ VC2005 project files
using a Python application.
Since those files are XML data, I assumed I could easily do that.

My problem is that VC somehow thinks that the file is corrupted and
updates the file as follows:

-<?xml version='1.0' encoding='UTF-8'?>
+?<feff><?xml version="1.0" encoding="UTF-8"?>


Actually, <feff> is displayed in a different color by vim, telling me
that this is some kind of special character code (I'm not familiar with
such things).

U+FEFF is a "byte order mark" or BOM. Each Unicode-based encoding (UTF-8,
UTF-16, UTF-16-LE, etc.) encodes it differently, so a program reading the
file can determine the encoding before reading any actual data.
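
As a rough illustration of that point, the sketch below encodes U+FEFF with a few codecs and uses the resulting prefixes to guess an encoding. The helper name sniff_bom is hypothetical, and UTF-32 is deliberately ignored (its little-endian BOM starts with the same two bytes as UTF-16-LE), so treat this as a simplification rather than a complete detector:

```python
import codecs

bom = u'\ufeff'

# The same character produces a different byte sequence per encoding.
encoded = {
    'utf-8': bom.encode('utf-8'),
    'utf-16-le': bom.encode('utf-16-le'),
    'utf-16-be': bom.encode('utf-16-be'),
}

def sniff_bom(data):
    """Return the codec name suggested by a leading BOM, or None."""
    candidates = (
        ('utf-8', codecs.BOM_UTF8),
        ('utf-16-le', codecs.BOM_UTF16_LE),
        ('utf-16-be', codecs.BOM_UTF16_BE),
    )
    for name, mark in candidates:
        if data.startswith(mark):
            return name
    return None

guess_with_bom = sniff_bom(encoded['utf-8'] + b'<?xml version="1.0"?>')
guess_without_bom = sniff_bom(b'<?xml version="1.0"?>')
```
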
Jean-Michel Pichavant said:
My problem is however simpler: how do I add such a character at the
beginning of the file?
I tried [...]

Either:

1. Open the file as binary and write the bytes '\xef\xbb\xbf' to the file:

f = open('foo.txt', 'wb')
f.write(b'\xef\xbb\xbf')  # the UTF-8 encoding of U+FEFF
f.close()

[You can also use the constant BOM_UTF8 from the codecs module.]

2. Open the file as utf-8 and write u'\ufeff' to the file:

import codecs
f = codecs.open('foo.txt', 'w', 'utf-8')
f.write(u'\ufeff')  # encoded to the three BOM bytes on the way out
f.close()

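3. Open the file as utf-8-sig and write your text as usual; the BOM is
emitted with the first write, so writing an empty string is enough to get
the BOM alone:

import codecs
f = codecs.open('foo.txt', 'w', 'utf-8-sig')
f.write('')  # the first write triggers the BOM
f.close()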

The utf-8-sig codec automatically writes a BOM at the beginning of the
file. It is present in Python 2.5 and later.
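
The three options above should produce byte-identical files. The sketch below checks that, using io.open (available in Python 2.6 and later) instead of codecs.open; the file names, the helper functions, and the <root/> content are placeholders chosen for illustration, not taken from the thread:

```python
import codecs
import io
import os
import tempfile

# Placeholder content, standing in for a real project file.
content = u'<?xml version="1.0" encoding="UTF-8"?>\n<root/>\n'
workdir = tempfile.mkdtemp()

def write_binary(path):
    # Option 1: binary mode, BOM constant followed by the encoded text.
    with io.open(path, 'wb') as f:
        f.write(codecs.BOM_UTF8)
        f.write(content.encode('utf-8'))

def write_explicit(path):
    # Option 2: utf-8 text mode, U+FEFF written by hand.
    # newline='' disables newline translation so the bytes stay comparable.
    with io.open(path, 'w', encoding='utf-8', newline='') as f:
        f.write(u'\ufeff')
        f.write(content)

def write_sig(path):
    # Option 3: the utf-8-sig codec adds the BOM on the first write.
    with io.open(path, 'w', encoding='utf-8-sig', newline='') as f:
        f.write(content)

def read_bytes(path):
    with io.open(path, 'rb') as f:
        return f.read()

writers = (
    ('binary.xml', write_binary),
    ('explicit.xml', write_explicit),
    ('sig.xml', write_sig),
)
results = []
for name, writer in writers:
    path = os.path.join(workdir, name)
    writer(path)
    results.append(read_bytes(path))

# Decoding with utf-8-sig strips the BOM; plain utf-8 keeps it as U+FEFF.
decoded_sig = results[0].decode('utf-8-sig')
decoded_plain = results[0].decode('utf-8')
```

When reading such a file back, decoding with utf-8-sig is usually the more convenient choice, since plain utf-8 leaves U+FEFF as the first character of the text.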
 
