Python 3.2 bug? Reading the last line of a file

tkpmep · May 25, 2011

The following function that returns the last line of a file works
perfectly well under Python 2.7.1 but fails reliably under Python 3.2.
Is this a bug, or am I doing something wrong? Any help would be
greatly appreciated.

import os

def lastLine(filename):
    '''
    Returns the last line of a file
    file.seek takes an optional 'whence' argument which allows you to
    start looking at the end, so you can just work back from there till
    you hit the first newline that has anything after it
    Works perfectly under Python 2.7, but not under 3.2!
    '''
    offset = -50
    with open(filename) as f:
        while offset > -1024:
            offset *= 2
            f.seek(offset, os.SEEK_END)
            lines = f.readlines()
            if len(lines) > 1:
                return lines[-1]

If I execute this with a valid filename fn, I get the following error
message:
Traceback (most recent call last):
  File "<pyshell#12>", line 1, in <module>
    lastLine(fn)
  File "<pyshell#11>", line 13, in lastLine
    f.seek(offset, os.SEEK_END)
io.UnsupportedOperation: can't do nonzero end-relative seeks

Sincerely

Thomas Philips

MRAB · May 25, 2011

You're opening the file in text mode, and seeking relative to the end
of the file is not allowed in text mode, presumably because the file
contents have to be decoded, and, in general, seeking to an arbitrary
position within a sequence of encoded bytes can have undefined results
when you attempt to decode to Unicode starting from that position.

The strange thing is that you _are_ allowed to seek relative to the
start of the file.
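
The asymmetry described above can be checked directly. The following is a minimal sketch, assuming CPython 3.x; the scratch file and the `results` dictionary are made up for illustration and do not come from the original post.

```python
import io
import os
import tempfile

# Hypothetical scratch file, created only for this demonstration.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first\nsecond\nthird\n")
    path = tmp.name

results = {}
with open(path) as f:  # text mode, as in the original function
    # A zero offset relative to the end is permitted in text mode.
    results["seek_to_end"] = f.seek(0, os.SEEK_END)
    # An absolute seek back to the start is permitted as well.
    results["seek_to_start"] = f.seek(0)
    try:
        # A nonzero end-relative offset is what the text layer rejects.
        f.seek(-5, os.SEEK_END)
        results["negative_end_seek"] = "allowed"
    except io.UnsupportedOperation:
        results["negative_end_seek"] = "UnsupportedOperation"

os.remove(path)
print(results)
```

The value stored under `seek_to_end` depends on the platform's newline handling, so it is printed rather than relied upon.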

Try opening the file in binary mode and do the decoding yourself,
catching the UnicodeDecodeError exceptions if/when they occur.
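
One way to follow that suggestion is sketched below. It is not the original poster's function: `last_line`, `chunk_size`, `encoding`, and the demonstration file (including its invented header row) are hypothetical. The sketch assumes an ASCII-compatible encoding such as UTF-8, where the newline byte never occurs inside a multi-byte character, so splitting on bytes before decoding is safe.

```python
import os
import tempfile


def last_line(filename, encoding="utf-8", chunk_size=1024):
    """Return the last non-empty line of a file as str, without its line ending."""
    with open(filename, "rb") as f:          # binary mode: end-relative seeks are allowed
        f.seek(0, os.SEEK_END)
        file_size = f.tell()
        offset = 0
        data = b""
        while offset < file_size:
            # Grow the window read from the end of the file, one chunk at a time.
            offset = min(file_size, offset + chunk_size)
            f.seek(-offset, os.SEEK_END)
            data = f.read().rstrip(b"\r\n")  # ignore trailing line endings
            if b"\n" in data:                # a complete last line is in the window
                break
    # Split on bytes first, then decode only the complete final line.
    return data.rsplit(b"\n", 1)[-1].decode(encoding)


# Demonstration on a throwaway file; the header row is invented for the example.
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"Header\tDate\tA\tB\tC\r\nName\t31/12/2009\t0\t0\t0\r\n")
    demo_path = tmp.name

result = last_line(demo_path)
os.remove(demo_path)
print(repr(result))
```

Because only whole lines are decoded, a UnicodeDecodeError here would indicate a wrong `encoding` rather than a seek that landed in the middle of a character.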

Ian Kelly · May 25, 2011

I think that with text files seek() is only really meant to be called
with values returned from tell(), which may include the decoder state
in its return value.

MRAB · May 25, 2011

What do you mean by "may include the decoder state in its return value"?

It does make sense that the values returned from tell() won't be in the
middle of an encoded sequence of bytes.

tkpmep · May 25, 2011

Thanks for the guidance - it was indeed an issue with reading in
binary vs. text mode, and I do now succeed in reading the last line,
except that I now seem unable to split it, as I demonstrate below.
Here's what I get when I read the last line in text mode using 2.7.1
and in binary mode using 3.2 respectively under IDLE:

2.7.1
Name 31/12/2009 0 0 0

3.2
b'Name\t31/12/2009\t0\t0\t0\r\n'

If, under 2.7.1, I read the file in text mode and write

>>> x = lastLine(fn)

I can then cleanly split the line to get its contents

>>> x.split('\t')
['Name', '31/12/2009', '0', '0', '0\n']

but under 3.2, with its binary read, I get

>>> x.split('\t')
Traceback (most recent call last):
  File "<pyshell#26>", line 1, in <module>
    x.split('\t')
TypeError: Type str doesn't support the buffer API

If I remove the '\t', the split now works and I get a list of bytes
literals

>>> x.split()
[b'Name', b'31/12/2009', b'0', b'0', b'0']

Looking through the docs did not clarify my understanding of the
issue. Why can I not split on '\t' when reading in binary mode?

Sincerely

Thomas Philips

MRAB · May 25, 2011

x.split('\t') tries to split on '\t', a string (str), but x is a
bytestring (bytes).

Do x.split(b'\t') instead.

Ethan Furman · May 25, 2011

You are trying to split a bytes object with a str object -- the two are
not compatible. Try splitting with the bytes object b'\t'.

~Ethan~

Ethan Furman · May 25, 2011

MRAB said:
x.split('\t') tries to split on '\t', a string (str), but x is a
bytestring (bytes).

Do x.split(b'\t') instead.

<nitpick>
Instances of the bytes class are more appropriately called 'bytes
objects' rather than 'bytestrings' as they are really immutable
sequences of integers.
Accessing a single element of a bytes object does not return a bytes
object, but rather the integer at that location; i.e.

--> b'xyz'[1]
121

Contrast that with the str type where

--> 'xyz'[1]
'y'
</nitpick>
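
A short sketch of the practical consequence, for anyone who wants a one-element bytes object or a one-character string back rather than an integer. The variable names are illustrative only.

```python
data = b'xyz'

single = data[1]                   # indexing a bytes object yields an int
sliced = data[1:2]                 # slicing yields a bytes object of length 1
rebuilt = bytes([single])          # an iterable of ints converts back to bytes
as_text = data.decode('ascii')[1]  # decode first to index a one-character str

print(single, sliced, rebuilt, as_text)
```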

~Ethan~

Ian Kelly · May 25, 2011

If you take a look at the source code, tell() returns a long that
includes decoder state data in the upper bytes. For example:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "c:\python32\lib\codecs.py", line 300, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
  File "c:\python32\lib\encodings\utf_16.py", line 61, in _buffer_decode
    codecs.utf_16_ex_decode(input, errors, 0, final)
UnicodeDecodeError: 'utf16' codec can't decode bytes in position 6-6: truncated data

The problem of course is the initial space, throwing off the decoder.
We can try to seek past it:
'\ufeff\u0302a'

But notice that since we're not reading from the beginning of the
file, the BOM has now been interpreted as data. However:
'\u0302a'

And you can see that instead of reading from position
73786976294838206465 it has read from position 1 starting in the "read
a BOM" state. Note that I wouldn't recommend doing anything remotely
like this in production code, not least because the value that I
passed into seek() is platform-dependent. This is just a
demonstration of how the seek() value can include decoder state.
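
By contrast, the supported pattern is to treat the value from tell() as opaque and pass it back to seek() unchanged. The following is a minimal sketch of that round trip, assuming CPython 3.x and a UTF-16 file; the scratch file and variable names are made up for illustration.

```python
import os
import tempfile

# Hypothetical UTF-16 scratch file; newline="" disables newline translation.
with tempfile.NamedTemporaryFile("w", encoding="utf-16", newline="",
                                 suffix=".txt", delete=False) as tmp:
    tmp.write("abc\ndef\n")
    path = tmp.name

with open(path, encoding="utf-16", newline="") as f:
    first = f.readline()
    cookie = f.tell()            # opaque value; do not do arithmetic on it
    second = f.readline()
    f.seek(cookie)               # return to the saved position
    second_again = f.readline()  # decodes correctly, BOM state included

os.remove(path)
print(repr(first), repr(second), repr(second_again))
```

The numeric value of `cookie` is deliberately not printed or compared, since it may differ between platforms and encodings.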

Cheers,
Ian

Jussi Piitulainen · May 26, 2011

You can split on b'\t' to get a list of byteses, which you can then
decode if you want them as strings.

You can decode the bytes to get a string and then split on '\t' to get
strings.

>>> b'tic\ttac\ttoe'.split(b'\t')
[b'tic', b'tac', b'toe']
>>> b'tic\ttac\ttoe'.decode('utf-8').split('\t')
['tic', 'tac', 'toe']
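
Applied to the line from the original question, the decode-then-split route might look like the sketch below. The UTF-8 encoding, the day/month/year reading of the date, and the choice of float for the remaining columns are all assumptions, since the thread does not say what the fields mean.

```python
from datetime import datetime

raw = b'Name\t31/12/2009\t0\t0\t0\r\n'   # last line as read in binary mode

# Decode once, drop the line ending, then split on a str separator.
fields = raw.decode('utf-8').rstrip('\r\n').split('\t')

name = fields[0]
date = datetime.strptime(fields[1], '%d/%m/%Y').date()  # assumed day/month/year
values = [float(v) for v in fields[2:]]                 # assumed numeric columns

print(name, date, values)
```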

tkpmep · May 27, 2011

This is exactly what I want to do - I can then pick up various
elements of the list and turn them into floats, ints, etc. I have
never used decode, and will look it up in the docs to better understand
it. I can't thank everyone enough for the generous serving of help and
guidance - I certainly would not have discovered all this on my own.

Sincerely

Thomas Philips
