Reading a text file backwards

Jay · Sep 30, 2004

I have a very large text file (being read by a CGI script on a web server),
and I get memory errors when I try to read the whole file into a list of
strings. The problem is, I want to read the file backwards, starting with
the last line.

Previously, I did:

myfile = open('myfile.txt', 'r')
mylines = myfile.readlines()
myfile.close()
for line in range(len(mylines)-1, -1, -1):
    # do something with mylines[line]
    pass

This, however, caused a MemoryError, so I want to do something like:

myfile = open('myfile.txt', 'r')
for line in myfile:
    # do something with line
    pass
myfile.close()

Only, I want to iterate backwards, starting with the last line of the file.
Can anybody suggest a simple way of doing this? Do I need to jump around
with myfile.seek() and use myfile.readline() ?

Rick Holbert · Sep 30, 2004

Jay,

Try this:

myfile = open('myfile.txt', 'r')
mylines = myfile.readlines()
myfile.close()
mylines.reverse()

Rick

Andrew Dalke · Sep 30, 2004

Jay said:
Only, I want to iterate backwards, starting with the last line of the file.
Can anybody suggest a simple way of doing this? Do I need to jump around
with myfile.seek() and use myfile.readline() ?

Python Cookbook has a recipe. Or two.

http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/276149
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/120686

I've not looked at them to judge the quality.

Another approach is to read the lines forwards and save
the starting offset of each line. Then iterate backwards
through the offsets, seeking to each one and reading a line.

def find_offsets(infile):
    offsets = []
    offset = 0
    for line in infile:
        offsets.append(offset)
        offset += len(line)
    return offsets

def iter_backwards(infile):
    # make sure it's seekable and at the start
    infile.seek(0)
    offsets = find_offsets(infile)
    for offset in offsets[::-1]:
        infile.seek(offset)
        yield infile.readline()

for line in iter_backwards(open("spam.py")):
    print(repr(line))

This won't work on MS Windows because of the
'\r\n' -> '\n' conversion. You would instead
need something like

def find_offsets(infile):
    offsets = []
    while True:
        offset = infile.tell()
        if not infile.readline():
            break
        offsets.append(offset)
    return offsets

Just submitted this solution to the cookbook.

Andrew
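
For readers on Python 3, here is a minimal sketch of the same offset-index idea, assuming the file is opened in binary mode so that len(line) is a byte count and the '\r\n' conversion never takes place. The names mirror the ones above; the in-memory io.BytesIO sample is only a stand-in for open(path, "rb").

```python
import io

def find_offsets(infile):
    """Return the byte offset at which each line of a binary file starts."""
    offsets = []
    offset = 0
    for line in infile:              # binary mode: len(line) counts bytes
        offsets.append(offset)
        offset += len(line)
    return offsets

def iter_backwards(infile):
    """Yield the lines of a seekable binary file, last line first."""
    infile.seek(0)
    offsets = find_offsets(infile)
    for offset in reversed(offsets):
        infile.seek(offset)
        yield infile.readline()

# Usage sketch: an in-memory file standing in for open(path, "rb").
sample = io.BytesIO(b"first\r\nsecond\r\nthird")
reversed_lines = list(iter_backwards(sample))
print(reversed_lines)
```

The offsets list still holds one integer per line, so memory use grows with the number of lines, though far more slowly than holding the lines themselves.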

Daniel Yoo · Sep 30, 2004

: Jay,

: Try this:

: myfile = open('myfile.txt', 'r')
: mylines = myfile.readlines()
: myfile.close()
: mylines.reverse()

Hi Rick,

But this probably won't work for Jay: he's running into memory issues
because the file's too large to hold in memory at once. The point is
to avoid readlines().

Here's a generator that tries to iterate backwards across a file. We
first record the file position of each newline, and then walk through
those offsets in reverse, reading one line at each.

###

def backfileiter(myfile):
    """Iterates the lines of a file, but in reverse order."""
    myfile.seek(0)
    offsets = _getLineOffsets(myfile)
    myfile.seek(0)
    offsets.reverse()
    for i in offsets:
        myfile.seek(i+1)
        yield myfile.readline()

def _getLineOffsets(myfile):
    """Return a list of offsets where newlines are located."""
    offsets = [-1]
    i = 0
    while True:
        byte = myfile.read(1)
        if not byte:
            break
        elif byte == '\n':
            offsets.append(i)
        i += 1
    return offsets
###

For example:

###
.... hello world
.... this
.... is a
.... test""")
....
'test'
'is a\n'
'this\n'
'hello world\n'
'\n'
###

Hope this helps!
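
A self-contained Python 3 sketch of the newline-offset idea, for anyone who wants to run it. Several things here are assumptions rather than part of the post above: the file is binary, newlines are located a block at a time instead of one byte per read, the helper name newline_offsets and the chunk_size parameter are made up for illustration, and the sample input is inferred from the printed output in the example. Unlike the generator above, this version skips the empty read that follows a trailing newline.

```python
import io

def newline_offsets(binfile, chunk_size=64 * 1024):
    """Return the byte offset of every newline in a seekable binary file."""
    offsets = []
    base = 0                          # file offset where the current chunk starts
    binfile.seek(0)
    while True:
        chunk = binfile.read(chunk_size)
        if not chunk:
            break
        pos = chunk.find(b"\n")
        while pos != -1:
            offsets.append(base + pos)
            pos = chunk.find(b"\n", pos + 1)
        base += len(chunk)
    return offsets

def backfileiter(binfile):
    """Yield the lines of a binary file in reverse order."""
    # A line starts at offset 0 and immediately after every newline.
    starts = [0] + [off + 1 for off in newline_offsets(binfile)]
    for start in reversed(starts):
        binfile.seek(start)
        line = binfile.readline()
        if line:                      # nothing follows a trailing newline
            yield line

# Assumed sample input: a string that begins with a newline.
demo = io.BytesIO(b"\nhello world\nthis\nis a\ntest")
result = list(backfileiter(demo))
for line in result:
    print(repr(line))
```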

Graham Fawcett · Oct 1, 2004

It's just shifting the burden perhaps, but if you're on a Unix system
you should be able to use tac(1) to reverse your file a bit faster:

import os
for line in os.popen('tac myfile.txt'):
    # do something with the line
    pass
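
The same idea can be sketched with the subprocess module, which passes the file name as a list element instead of through a shell string. The function name iter_tac_lines and the tac_cmd parameter are hypothetical; the command name is a parameter because tac is not installed under that name on every system. Lines come back as bytes.

```python
import shutil
import subprocess

def iter_tac_lines(path, tac_cmd="tac"):
    """Yield the lines of `path` last-to-first by streaming tac's output."""
    if shutil.which(tac_cmd) is None:
        raise FileNotFoundError(f"{tac_cmd!r} is not on PATH")
    command = [tac_cmd, path]         # list form: no shell quoting issues
    with subprocess.Popen(command, stdout=subprocess.PIPE) as proc:
        for line in proc.stdout:      # one line at a time, not the whole file
            yield line
    # The with-block has waited for the process, so returncode is set.
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, command)
```

A call such as `for line in iter_tac_lines('myfile.txt'): ...` then behaves like the os.popen loop above, with the reversal still done by the external program.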

Andrew Dalke · Oct 1, 2004

Graham said:
It's just shifting the burden perhaps, but if you're on a Unix system
you should be able to use tac(1) to reverse your file a bit faster:

Huh. Hadn't heard of that one. It's not installed
on my OS X box. It's on my FreeBSD account as gtac.
Ah, but it is available on a Linux account.

Andrew

Jeremy Bowers · Oct 1, 2004

Graham Fawcett said:
It's just shifting the burden perhaps, but if you're on a Unix system
you should be able to use tac(1) to reverse your file a bit faster:

import os
for line in os.popen('tac myfile.txt'):
    # do something with the line
    pass

It probably isn't shifting the burden; they probably do it right.

Doing it right involves reading the file in chunks backwards, and scanning
backwards for newlines, but getting it right when lines cross boundaries,
while perhaps not *hard*, is exactly the kind of tricky programming it is
best to do once... preferably somebody else's once.

Reading backwards in chunks also means you don't read the file twice, and
on a large file the first pass alone can take a while.
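
A rough sketch of the chunked approach described here, not a claim about how tac itself is implemented. It assumes a seekable file opened in binary mode; the name reverse_lines and the chunk_size default are made up for illustration. The partial line at the front of each block is carried over and joined to the next block read, which is how lines that cross block boundaries get reassembled.

```python
import io
import os

def reverse_lines(binfile, chunk_size=8192):
    """Yield the lines of a seekable binary file, last line first.

    Blocks are read from the end of the file toward the start, so roughly
    one block plus one line is held in memory at a time.
    """
    binfile.seek(0, os.SEEK_END)
    position = binfile.tell()
    tail = b""    # line whose beginning may lie in an earlier block
    while position > 0:
        read_size = min(chunk_size, position)
        position -= read_size
        binfile.seek(position)
        block = binfile.read(read_size) + tail
        pieces = block.split(b"\n")
        # Every piece except the last was followed by a newline.
        lines = [piece + b"\n" for piece in pieces[:-1]]
        if pieces[-1]:
            lines.append(pieces[-1])  # final line without a newline
        # The first line may be incomplete, so hold it back for now.
        tail = lines[0]
        for line in reversed(lines[1:]):
            yield line
    if tail:
        yield tail                    # the first line of the file

# Usage sketch: a tiny chunk size forces lines to span several blocks.
sample = io.BytesIO(b"alpha\nbeta\ngamma\ndelta")
backwards = list(reverse_lines(sample, chunk_size=4))
print(backwards)
```

A single line much longer than chunk_size gets re-split on every block it spans, so this sketch favors clarity over speed in that case.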

Paul Rubin · Oct 1, 2004

Andrew Dalke said:
Huh. Hadn't heard of that one. It's not installed
on my OS X box. It's on my FreeBSD account as gtac.
Ah, but it is available on a Linux account.

You can try tail(1).
