deleting a line from a file

eMko

Hello,

In Perl, using a Tie::File module I can easily and comfortably delete
a line from the middle of a text file:

my @file;
open(DATA, "+<:encoding(utf8):raw" , "file.txt") or return 0;
tie @file, 'Tie::File', \*DATA or return 0;
splice(@file, $_[0], 1);
untie @file;
close DATA;

(where the first argument of the function, $_[0], is the number of the
line to delete)

Is there an easy way to delete a line from the middle of a file in
Python?

Thanks a lot
eMko
 
Steven D'Aprano

Is there an easy way to delete a line from the middle of a file in
Python?

If the file is small enough to fit into memory (say, up to a few hundred
megabytes on most modern PCs):


lines = open('file', 'r').readlines()
del lines[100]
open('file', 'w').writelines(lines)


Quick and easy for the coder, but not the safest way to do it in serious
production code because there's no error handling there.

The only safe way to delete a line from a file (at least under common
operating systems like Windows, Linux and Mac) is to copy the file
(without the line you wish to delete) to a temporary file, then replace
the original file with the new version. That's also how to do it for
files too big to read into memory.
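
A minimal sketch of the copy-then-replace approach described above, assuming Python 3.3 or later (for os.replace). The function name delete_line and the zero-based index are illustrative choices, not anything from the thread. It works in binary mode, so it makes no assumption about the file's encoding.

```python
import os
import tempfile


def delete_line(path, index):
    """Remove the line at zero-based `index` by rewriting via a temp file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the original so that the final
    # rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as dst, open(path, "rb") as src:
            # Stream line by line, so the whole file is never in memory.
            for number, line in enumerate(src):
                if number != index:
                    dst.write(line)
        # Swap the new version into place under the original name.
        os.replace(tmp_path, path)
    except BaseException:
        # Something went wrong: discard the partial copy, keep the original.
        os.unlink(tmp_path)
        raise
```

Note that mkstemp creates the file with restrictive permissions, so the replaced file will not keep the original's mode or ownership unless you copy them over yourself.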
 
Paul Rubin

Steven D'Aprano said:
The only safe way to delete a line from a file (at least under common
operating systems like Windows, Linux and Mac) is to copy the file
(without the line you wish to delete) to a temporary file, then replace
the original file with the new version. That's also how to do it for
files too big to read into memory.

You could do it "in place" in all those systems afaik, either opening
the file for both reading and writing, or using something like mmap.
Basically you'd leave the file unchanged up to line N, then copy lines
downward starting from line N+1. At the end you'd use ftruncate to
shrink the file, getting rid of the duplicate last line.
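
Roughly, the in-place idea could look like the sketch below, assuming Python 3 and a zero-based line index. The function name and the chunk_size parameter are made up for illustration. Python file objects expose truncation as the truncate() method rather than a call named ftruncate.

```python
def delete_line_in_place(path, index, chunk_size=64 * 1024):
    """Delete the zero-based line `index` by shifting later bytes down."""
    with open(path, "r+b") as f:
        # Skip the lines that stay where they are.
        for _ in range(index):
            if not f.readline():
                raise IndexError("file has fewer lines than index")
        write_pos = f.tell()          # start of the line to delete
        if not f.readline():
            raise IndexError("file has fewer lines than index")
        read_pos = f.tell()           # start of the line after it

        # Copy the rest of the file downward, one chunk at a time.
        while True:
            f.seek(read_pos)
            chunk = f.read(chunk_size)
            if not chunk:
                break
            read_pos += len(chunk)
            f.seek(write_pos)
            f.write(chunk)
            write_pos += len(chunk)

        # Cut off the duplicated bytes left over at the end.
        f.truncate(write_pos)
```

If the process dies part-way through the copy loop, the file is left half-shifted, so this trades safety for not needing a second copy on disk.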
 
Paddy

Module fileinput has:

Optional in-place filtering: if the keyword argument inplace=1 is
passed to input() or to the FileInput constructor, the file is moved
to a backup file and standard output is directed to the input file (if
a file of the same name as the backup file already exists, it will be
replaced silently). This makes it possible to write a filter that
rewrites its input file in place. If the keyword argument
backup='.<some extension>' is also given, it specifies the extension
for the backup file, and the backup file remains around; by default,
the extension is '.bak' and it is deleted when the output file is
closed. In-place filtering is disabled when standard input is read.
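
A minimal sketch of that in-place filtering applied to the original question, assuming Python 3.2 or later (where fileinput.input() can be used in a with statement). The function name and zero-based index are illustrative only.

```python
import fileinput


def delete_line_with_fileinput(path, index):
    """Drop the zero-based line `index` using fileinput's in-place mode."""
    with fileinput.input(path, inplace=True) as stream:
        for number, line in enumerate(stream):
            if number != index:
                # While in-place mode is active, standard output is the
                # file being rewritten, so print() writes into it.
                print(line, end="")
```

Because no backup extension is given here, the temporary '.bak' copy is removed once the output file is closed.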


- Paddy.
 
Mark Wooding

Paul Rubin said:
You could do it "in place" in all those systems afaik, either opening
the file for both reading and writing, or using something like mmap.
Basically you'd leave the file unchanged up to line N, then copy lines
downward starting from line N+1. At the end you'd use ftruncate to
shrink the file, getting rid of the duplicate last line.

Making a new copy and renaming it when you're finished is probably both
easier (don't have to keep seeking about all the time) and more reliable
(doesn't leave your file corrupted if you crash half-way through).

Is there a standard wossname which does this?

from __future__ import with_statement
from contextlib import contextmanager
import os, sys, errno

def fresh_file(base, mode = 'w'):
    """
    Return a file name and open file handle for a fresh file in the same
    directory as BASE.
    """
    for seq in xrange(50):
        try:
            name = '%s.new.%d' % (base, seq)
            fd = os.open(name, os.O_WRONLY | os.O_CREAT | os.O_EXCL)
            f = os.fdopen(fd, mode)
            return name, f
        except OSError, err:
            if err.errno == errno.EEXIST:
                pass
            else:
                raise
    raise IOError(errno.EEXIST, os.strerror(errno.EEXIST), base)

@contextmanager
def safely_writing(filename, mode = 'w'):
    """
    Context manager for updating files safely.

    It produces a file object. If the controlled suite completes successfully,
    the file named by FILENAME is atomically replaced by the material written
    to the file object; otherwise the file is left alone.

    Safe in the presence of multiple simultaneous writers, in the sense that
    the resulting file is exactly the output of one of the writers (chosen
    nondeterministically).
    """
    f = None
    newname = None
    try:
        newname, f = fresh_file(filename, mode)
        yield f
        f.close()
        f = None
        os.rename(newname, filename)
    finally:
        if f is not None:
            f.close()
        if newname is not None:
            try:
                os.unlink(newname)
            except:
                pass

It seems like an obvious thing to want.

(Extra messing about will be needed on Windows, which doesn't have
proper atomic-rename semantics. Playing with the transactional
filesystem stuff is left as an exercise to the interested student.)
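
For readers on a newer interpreter, here is a rough Python 3 adaptation of the same idea, together with a usage example for the original question. It is a sketch, not a drop-in replacement. It assumes Python 3.3 or later and swaps os.rename for os.replace, which overwrites an existing destination on Windows as well as POSIX; whether that swap is fully atomic on Windows is not something this sketch verifies. The helper delete_line_safely is a made-up name.

```python
import errno
import os
from contextlib import contextmanager


def fresh_file(base, mode="w"):
    """Return (name, file) for a newly created file next to `base`."""
    for seq in range(50):
        name = "%s.new.%d" % (base, seq)
        try:
            # O_EXCL makes creation fail if the name is already taken.
            fd = os.open(name, os.O_WRONLY | os.O_CREAT | os.O_EXCL)
        except FileExistsError:
            continue
        return name, os.fdopen(fd, mode)
    raise FileExistsError(errno.EEXIST, os.strerror(errno.EEXIST), base)


@contextmanager
def safely_writing(filename, mode="w"):
    """Yield a file; replace `filename` with it only if the body succeeds."""
    newname, f = fresh_file(filename, mode)
    try:
        with f:
            yield f
        os.replace(newname, filename)
    except BaseException:
        # Leave the original alone and discard the partial copy.
        try:
            os.unlink(newname)
        except OSError:
            pass
        raise


def delete_line_safely(path, index):
    """Hypothetical helper: drop the zero-based line `index` from `path`."""
    with safely_writing(path) as dst:
        # Close the source before the replace happens on exit.
        with open(path) as src:
            for number, line in enumerate(src):
                if number != index:
                    dst.write(line)
```

The source file is opened inside the safely_writing block so that it is closed before the replacement takes place, which matters on systems that refuse to replace a file that is still open.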

-- [mdw]
 
Paddy

Why not use the fileinput module's functionality to iterate over a file
in-place, printing just those lines you want?

- Paddy.
 
Mark Wooding

Paddy said:
Why not use the fileinput module's functionality to iterate over a file
in-place, printing just those lines you want?

From the Python 2.5 manual:

: *Optional in-place filtering:* if the keyword argument `INPLACE=1' is
: passed to `input()' or to the `FileInput' constructor, the file is
: moved to a backup file and standard output is directed to the input
: file (if a file of the same name as the backup file already exists, it
: will be replaced silently).

This behaviour is very dangerous. If the script fails half-way through,
it will leave the partially-written file in place, with the `official'
name. The change-over is not atomic, breaking other programs attempting
to read simultaneously with an update.
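
The half-way failure can be reproduced with a small experiment. This is a sketch assuming Python 3.2 or later and the default (deleted) backup; filter_with_failure and fail_at are hypothetical names, and the RuntimeError stands in for any crash in the filtering script.

```python
import fileinput


def filter_with_failure(path, fail_at):
    """Copy lines in place, but simulate a crash at zero-based line `fail_at`."""
    try:
        with fileinput.input(path, inplace=True) as stream:
            for index, line in enumerate(stream):
                if index == fail_at:
                    raise RuntimeError("simulated crash")
                # Standard output is redirected into `path` here.
                print(line, end="")
    except RuntimeError:
        # The script "died"; see what is left on disk afterwards.
        pass
```

After calling this on a four-line file with fail_at=2, the file under the official name holds only the first two lines, and the default backup has already been cleaned up when the input was closed.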

Two almost-simultaneous updates will corrupt the file without a usable
backup. The first will back up the input file, and start writing. A
second will /replace/ the backup file with the partially-constructed
output of the first, and then start processing it; but since its input
is incomplete, it will produce incomplete output.

The safely_writing context manager has none of these defects.

-- [mdw]
 
Paddy

From the Python 2.5 manual:

: *Optional in-place filtering:* if the keyword argument `INPLACE=1' is
: passed to `input()' or to the `FileInput' constructor, the file is
: moved to a backup file and standard output is directed to the input
: file (if a file of the same name as the backup file already exists, it
: will be replaced silently).

This behaviour is very dangerous. If the script fails half-way through,
it will leave the partially-written file in place, with the `official'
name. The change-over is not atomic, breaking other programs attempting
to read simultaneously with an update.

I've used this methodology all the time, both explicitly writing
utilities like that and using 'perl -p -i -e ...'. Theoretically there
may be problems. In practice I have people transforming files in their
own work area, and have never been called to clear up after such a
failure. Although theoretically you can describe a method of failure, I
take issue with you calling it 'very dangerous' without further
qualification.

Two almost-simultaneous updates will corrupt the file without a usable
backup. The first will back up the input file, and start writing. A
second will /replace/ the backup file with the partially-constructed
output of the first, and then start processing it; but since its input
is incomplete, it will produce incomplete output.

Ahah - the qualification. The above would be dangerous, but the OP may
find that his current flow, or a slight modification of it, will make
simultaneous updates unlikely and use of fileinput OK. If not, then
other methods may have to be employed. It _is_ good to know the threat
and do the analysis though.

- Paddy.
 
