Python 2.6 StreamReader.readline()

cpppwner

Hi,

I have a simple question. I'm using something like the following lines in Python 2.6.2:

import codecs

reader = codecs.getreader(encoding)
lines = []
with open(filename, 'rb') as f:
    lines = reader(f, 'strict').readlines(keepends=False)

where encoding == 'utf-16-be'.
Everything works fine, except that lines[0] is equal to codecs.BOM_UTF16_BE.
Is this behaviour correct, that the BOM is still present?

Thanks in advance for your help.

Best,
Stefan
 
Ulrich Eckhardt

On 24.07.2012 17:01, (e-mail address removed) wrote:
> reader = codecs.getreader(encoding)
> lines = []
> with open(filename, 'rb') as f:
>     lines = reader(f, 'strict').readlines(keepends=False)
>
> where encoding == 'utf-16-be'
> Everything works fine, except that lines[0] is equal to codecs.BOM_UTF16_BE
> Is this behaviour correct, that the BOM is still present?

Yes, assuming the first line only contains that BOM. Technically it's a
space character, and why should those be removed?

Uli
 
Walter Dörwald

> On 24.07.2012 17:01, (e-mail address removed) wrote:
>> reader = codecs.getreader(encoding)
>> lines = []
>> with open(filename, 'rb') as f:
>>     lines = reader(f, 'strict').readlines(keepends=False)
>>
>> where encoding == 'utf-16-be'
>> Everything works fine, except that lines[0] is equal to
>> codecs.BOM_UTF16_BE
>> Is this behaviour correct, that the BOM is still present?
>
> Yes, assuming the first line only contains that BOM. Technically it's a
> space character, and why should those be removed?

If the first "character" in the file is a BOM the file encoding is
probably not utf-16-be but utf-16.
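
To illustrate the difference between the two codec names, here is a minimal sketch. The sample text and the read_lines helper are made up for the example, and io.BytesIO stands in for the file from the question. As far as I can tell, the 'utf-16-be' reader hands the BOM back as U+FEFF, while the 'utf-16' reader uses it to pick the byte order and drops it:

```python
import codecs
import io

# Hypothetical sample: two lines of text preceded by a big-endian BOM.
text = u"first line\nsecond line\n"
data = codecs.BOM_UTF16_BE + text.encode("utf-16-be")


def read_lines(raw, encoding):
    """Decode raw bytes through a codecs StreamReader, as in the question."""
    reader = codecs.getreader(encoding)
    return reader(io.BytesIO(raw), "strict").readlines(keepends=False)


lines_be = read_lines(data, "utf-16-be")  # BOM is kept as a leading U+FEFF
lines_auto = read_lines(data, "utf-16")   # BOM is consumed by the decoder

print(repr(lines_be[0]))
print(repr(lines_auto[0]))
```

So if the file is known to start with a BOM, 'utf-16' looks like the better choice. With 'utf-16-be' the leading U+FEFF has to be stripped by hand.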

Servus,
Walter
 
wxjmfauth

On 25.07.12 08:09, Ulrich Eckhardt wrote:

> On 24.07.2012 17:01, (e-mail address removed) wrote:
>> reader = codecs.getreader(encoding)
>> lines = []
>> with open(filename, 'rb') as f:
>>     lines = reader(f, 'strict').readlines(keepends=False)
>>
>> where encoding == 'utf-16-be'
>> Everything works fine, except that lines[0] is equal to
>> codecs.BOM_UTF16_BE
>> Is this behaviour correct, that the BOM is still present?
>
> Yes, assuming the first line only contains that BOM. Technically it's a
> space character, and why should those be removed?

If the first "character" in the file is a BOM the file encoding is
probably not utf-16-be but utf-16.

Servus,
Walter

The byte order mark, if present, is nothing other than an encoded
'ZERO WIDTH NO-BREAK SPACE' *code point*.

Five "BOMs" are possible (Unicode Consortium): utf-8-sig, utf-16-be,
utf-16-le, utf-32-be, utf-32-le. The codecs module provides many
aliases.
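
A small sketch of those five marks, using the BOM_* constants from the codecs module. The dictionary name boms is just for illustration. Each constant should be the same U+FEFF code point encoded in a different way:

```python
import codecs

# The five byte sequences, keyed by the codec that produces them from U+FEFF.
boms = {
    "utf-8": codecs.BOM_UTF8,
    "utf-16-be": codecs.BOM_UTF16_BE,
    "utf-16-le": codecs.BOM_UTF16_LE,
    "utf-32-be": codecs.BOM_UTF32_BE,
    "utf-32-le": codecs.BOM_UTF32_LE,
}

for name in sorted(boms):
    # Encoding ZERO WIDTH NO-BREAK SPACE with each codec gives its BOM bytes.
    matches = u"\ufeff".encode(name) == boms[name]
    print(name, repr(boms[name]), matches)
```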

Whether utf-16/32 corresponds to -le or to -be may vary with the
platform, the compiler, ...
'3.2.3 (default, Apr 11 2012, 07:15:24) [MSC v.1500 32 bit
(Intel)]'
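
A rough way to check this on a given machine, assuming CPython behaves as I expect: encoding with plain 'utf-16' writes a BOM in the native byte order, while decoding with 'utf-16' follows whichever BOM it finds. The variable names are only for this sketch:

```python
import codecs
import sys

# Native byte order of the running interpreter decides which BOM is written.
if sys.byteorder == "little":
    native_bom = codecs.BOM_UTF16_LE
else:
    native_bom = codecs.BOM_UTF16_BE

encoded = u"abc".encode("utf-16")
starts_native = encoded.startswith(native_bom)

# Decoding follows the BOM in the data, not the platform.
from_be = (codecs.BOM_UTF16_BE + u"abc".encode("utf-16-be")).decode("utf-16")
from_le = (codecs.BOM_UTF16_LE + u"abc".encode("utf-16-le")).decode("utf-16")

print(sys.byteorder, starts_native, from_be == from_le)
```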
---

As far as I know, Py 2.7 or Py 3.2 never returns a "BOM" when
a file is read correctly.
...     r = f.readlines()
...     for zeile in r:
...         print(zeile.rstrip())
...
abc
élève
cœur
€uro
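
A self-contained sketch of the same idea, since the start of the session above is not shown. The temporary file name and the choice of 'utf-8-sig' are assumptions made for the example. io.open is used because it exists in both Py 2.6+ and Py 3:

```python
import io
import os
import tempfile

words = [u"abc", u"élève", u"cœur", u"€uro"]

# Hypothetical demo file, written with a leading BOM via 'utf-8-sig'.
path = os.path.join(tempfile.mkdtemp(), "bom_demo.txt")
with io.open(path, "w", encoding="utf-8-sig") as f:
    f.write(u"\n".join(words) + u"\n")

# Read with the matching codec: the BOM is consumed, no U+FEFF in the text.
with io.open(path, encoding="utf-8-sig") as f:
    clean = [zeile.rstrip() for zeile in f.readlines()]

# Read with plain 'utf-8': the BOM comes back as the first code point.
with io.open(path, encoding="utf-8") as f:
    raw = f.read()

print(clean == words)
print(repr(raw[0]))
```

"Read correctly" here means picking the codec that knows about the mark ('utf-8-sig', 'utf-16', 'utf-32'). The plain or endian-specific variants leave it in the data.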

jmf
 
