Is it possible to open a file for both input & output


pavan734

Hi,
In Perl, is it possible to open a file, modify it (using search
and replace), and write it back to the same file without needing
intermediate files? If possible, tell me how.
 

Paul Lalli

In Perl, is it possible to open a file, modify it (using search
and replace), and write it back to the same file without needing
intermediate files? If possible, tell me how.

Yes. We call this in-place editing. This is possible using Perl's $^I
variable, or the -i command line switch. Read
perldoc perlopentut
perldoc perlrun
which discuss each of them, respectively.

At its most basic, you can do a one-liner on the command line, like:

perl -pi.bkp -e's/foo/bar/g' file.txt

which will open file.txt, create a copy of its original state as
file.txt.bkp, and then modify the file to replace all "foo" with "bar".

If you need to do this from within a larger program, you need to make
use of the $^I variable, like so:

{
    local @ARGV = qw/file.txt/;   # the file(s) to edit in place
    local $^I   = '.bkp';         # backup extension, as with -i.bkp
    while (<>) {
        s/foo/bar/;
    }
    continue {
        print;                    # prints to the replacement file
    }
}

Hope this helps,
Paul Lalli
 

pavan734

Hi Paul, thank you very much.
 

Jürgen Exner

In Perl, is it possible to open a file, modify it (using search
and replace), and write it back to the same file without needing
intermediate files? If possible, tell me how.

I would think the explanation in the documentation of open() is pretty
clear:

"You can
put a "'+'" in front of the "'>'" or "'<'" to indicate that you
want both read and write access to the file;"

Please let us know which part is difficult to understand, so that it can be
improved in the next version.
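
For illustration, a minimal sketch of that mode (the file name is
hypothetical); note that when mixing reads and writes on one handle,
you must seek between them:

use strict;
use warnings;

# "+<" opens an existing file for both reading and writing,
# without truncating it first (unlike "+>").
open my $fh, '+<', 'data.txt' or die "can't open data.txt: $!";
my $line = <$fh>;                        # read the first line
defined $line or die "data.txt is empty\n";
seek $fh, 0, 0 or die "can't seek: $!";  # reposition before switching to writing
print {$fh} uc $line;                    # overwrite it in place (same length)
close $fh or die "can't close data.txt: $!";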

jue
 

xhoster

Hi,
In Perl, is it possible to open a file, modify it (using search
and replace), and write it back to the same file without needing
intermediate files? If possible, tell me how.

Not for the general case.

You can use the -i switch (or $^I variable), which uses intermediate files
behind the scenes but you don't need to deal with them explicitly.

If you can operate on the whole file at once (in a scalar) you could use
seek and truncate to overwrite it.
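
A minimal sketch of that approach (file name hypothetical): slurp
everything, edit the copy in memory, rewind, rewrite, and truncate
to the new length:

use strict;
use warnings;

open my $fh, '+<', 'data.txt' or die "can't open data.txt: $!";
my $contents = do { local $/; <$fh> };   # slurp the whole file
$contents =~ s/foo/bar/g;                # edit in memory; length may change
seek $fh, 0, 0 or die "can't seek: $!";
print {$fh} $contents;                   # write the new contents back
truncate $fh, tell $fh or die "can't truncate: $!";  # chop any leftover tail
close $fh or die "can't close: $!";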

If the search-replaces are length-preserving, you can fairly easily operate
on chunks of the file at a time using tell and seek.
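
For example, a sketch of a length-preserving edit over fixed-size
chunks (a per-character tr/// avoids the problem of a multi-character
pattern straddling a chunk boundary):

use strict;
use warnings;

open my $fh, '+<', 'data.txt' or die "can't open data.txt: $!";
while (1) {
    my $pos = tell $fh;
    my $got = read $fh, my $chunk, 64 * 1024;
    die "read failed: $!" unless defined $got;
    last unless $got;
    (my $edited = $chunk) =~ tr/o/0/;    # same length in, same length out
    if ($edited ne $chunk) {
        seek $fh, $pos, 0;               # back up and overwrite in place
        print {$fh} $edited;
        seek $fh, $pos + $got, 0;        # required before reading again
    }
}
close $fh or die "can't close: $!";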

If the operations are not length-preserving but they are line-oriented, you
can use Tie::File, although it will not be very efficient.
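
A minimal sketch with Tie::File, which presents the file's lines as
an array and rewrites the file as elements change:

use strict;
use warnings;
use Tie::File;

tie my @lines, 'Tie::File', 'data.txt' or die "can't tie data.txt: $!";
s/foo/bar/ for @lines;    # each changed element is written back to the file
untie @lines;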

Xho
 

xhoster

Jürgen Exner said:
Then I guess "perldoc -f open", 3rd paragraph, 4th sentence must be
wrong.

I wouldn't know. All sentences in the 3rd paragraph, other than the first
two, seem to be written in invisible ink.

Xho
 

Paul Lalli

Jürgen Exner said:
Then I guess "perldoc -f open", 3rd paragraph, 4th sentence must be wrong.

Hrm. I guess that's vacuously correct, since there is no 4th sentence
in the third paragraph.

Or are you using, perhaps, a prior release of Perl? You may wish to
specify such things when referring to such specific parts of
documentation.

Paul Lalli
 

Jürgen Exner

Paul said:
Hrm. I guess that's vacuously correct, since there is no 4th sentence
in the third paragraph.

Or are you using, perhaps, a prior release of Perl? You may wish to
specify such things when referring to such specific parts of
documentation.

Fair enough:
<quote>
You can
put a "'+'" in front of the "'>'" or "'<'" to indicate that you
want both read and write access to the file
</quote>
 

xhoster

Jürgen Exner said:
Fair enough:
<quote>
You can
put a "'+'" in front of the "'>'" or "'<'" to indicate that you
want both read and write access to the file
</quote>

Read and write is not generally the same thing as search and replace. Did
you read the OP's message, or merely the subject line?

Xho
 

jgraber

Not for the general case.

You can use the -i switch (or $^I variable), which uses intermediate files
behind the scenes but you don't need to deal with them explicitly.

If you can operate on the whole file at once (in a scalar) you could use
seek and truncate to overwrite it.

If the search-replaces are length-preserving, you can fairly easily operate
on chunks of the file at a time using tell and seek.

If the operations are not length-preserving but they are line-oriented, you
can use Tie::File, although it will not be very efficient.

Xho

That's a nice list of specific-case alternatives.

I will clarify that operating on the whole file at once
need not be as a scalar; any type of slurp will do.
{ undef $/; $scalar = <>; } # slurp as scalar
or
@array = <>; # slurp as array
or
while(<>){ push @array,$_ if /pattern/; } # selective array slurp
etc.

Specifically, I have slurped a file through a gzip pipe
to create a gzipped in-memory scalar,
and then written it back in place and truncated the now-shorter file.
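
For the curious, a minimal sketch of that trick, using
IO::Compress::Gzip instead of a pipe (file name hypothetical):

use strict;
use warnings;
use IO::Compress::Gzip qw(gzip $GzipError);

open my $fh, '+<', 'big.log' or die "can't open big.log: $!";
binmode $fh;
my $plain = do { local $/; <$fh> };            # slurp the original
gzip \$plain => \my $zipped or die "gzip failed: $GzipError";
seek $fh, 0, 0 or die "can't seek: $!";
print {$fh} $zipped;                           # overwrite in place
truncate $fh, length $zipped or die "can't truncate: $!";
close $fh or die "can't close: $!";            # big.log now holds gzip data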

A more general, but still specific, case: if the operations are
line-oriented and typically length-shortening, it is possible, but
somewhat complicated, to track two filepointer locations and manage
your own read and write buffers, such that the read buffer always
keeps ahead of the write buffer.
The extreme case of using a slurp of the entire file as the
read buffer then reverts to the same as the previous paragraph.
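
A rough sketch of that bookkeeping (variable names and the pattern
are mine, for illustration): queue each edited line, flush one only
when it provably fits behind the read pointer, and once the input is
exhausted, flushing the rest is safe; a final truncate trims the tail:

use strict;
use warnings;

open my $READ,  '+<', 'data.txt' or die "can't open read handle: $!";
open my $WRITE, '+<', 'data.txt' or die "can't open write handle: $!";

my @pending;    # edited lines not yet safe to write
while (my $line = <$READ>) {
    (my $edited = $line) =~ s/some longer token/short/g;  # typically shortening
    push @pending, $edited;
    # write a queued line only if it lands strictly behind the read pointer
    while (@pending and tell($WRITE) + length($pending[0]) <= tell($READ)) {
        print {$WRITE} shift @pending;
    }
}
print {$WRITE} @pending;    # nothing left unread, so this is safe
truncate $WRITE, tell $WRITE or warn "can't truncate: $!";
close $READ  or warn "can't close read handle: $!";
close $WRITE or warn "can't close write handle: $!";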

If the operations are line-oriented and always shortening or omitting lines,
you may also be able to just open the file with two filehandles,
and process normally.
 

jgraber

If the operations are line-oriented and always shortening or omitting lines,
you may also be able to just open the file with two filehandles,
and process normally.

Here's a tested example; it works with Perl 5.8.0 on Linux.
Note that $WRITE must use "+<" to avoid clobbering the file.

#!/usr/local/bin/perl
use warnings; use strict;
open my $INIT, ">", 'uu' or die "cant open init file:$!\n";
print $INIT "line 1,\nline Oct\nline 3,\n"; # 30 bytes
close $INIT or warn "cant close initfile $!\n";
open my $READ, "+<", 'uu' or die "cant open read file:$!\n";
open my $WRITE, "+<", 'uu' or die "cant open writ file:$!\n";
while(<$READ>){ print($WRITE $_) if /Oct/; } # keep only some lines
print STDERR "Read file was ",tell $READ," bytes long\n";
close $READ or warn "error on closing read file : $!\n";
my $wr_len = tell $WRITE;
truncate( $WRITE, $wr_len) or warn "error on truncating write file : $!\n";
close( $WRITE ) or warn "error on closing write file : $!\n";
print STDERR "write file was $wr_len long\n";

% test_2.pl
Read file was 30 bytes long
write file was 10 long
% cat uu
line Oct
 

Uri Guttman

j> Here's a tested example; it works with Perl 5.8.0 on Linux.
j> Note that $WRITE must use "+<" to avoid clobbering the file.

j> #!/usr/local/bin/perl
j> use warnings; use strict;
j> open my $INIT, ">", 'uu' or die "cant open init file:$!\n";
j> print $INIT "line 1,\nline Oct\nline 3,\n"; # 30 bytes
j> close $INIT or warn "cant close initfile $!\n";
j> open my $READ, "+<", 'uu' or die "cant open read file:$!\n";
j> open my $WRITE, "+<", 'uu' or die "cant open writ file:$!\n";
j> while(<$READ>){ print($WRITE $_) if /Oct/; } # keep only some lines

that is insane. what if you ended up writing MORE than you read in each
line? eventually you would overwrite data before you read it.

j> print STDERR "Read file was ",tell $READ," bytes long\n";
j> close $READ or warn "error on closing read file : $!\n";
j> my $wr_len = tell $WRITE;
j> truncate( $WRITE, $wr_len) or warn "error on truncating write file : $!\n";
j> close( $WRITE ) or warn "error on closing write file : $!\n";
j> print STDERR "write file was $wr_len long\n";

that can all be done with File::Slurp and it is so much cleaner (and
likely much faster):

my $file_name = 'foo' ;
write_file( $file_name, grep /Oct/, read_file( $file_name ) ) ;

and i am in the planning stages of adding an edit_file call to
file::slurp that will allow that in one call. it would look like this
(tentative api):

edit_file( $file_name, sub { grep /Oct/, $_ } ) ;

there are some open issues regarding list vs scalar editing. any
suggestions are welcome.

uri
 

xhoster

||| If the operations are line-oriented and always shortening or omitting
||| lines,
You've snipped this highly relevant line marked with |||.
I've added it back in:
j> Here's a tested example; it works with Perl 5.8.0 on Linux.
j> Note that $WRITE must use "+<" to avoid clobbering the file.

j> #!/usr/local/bin/perl
j> use warnings; use strict;
j> open my $INIT, ">", 'uu' or die "cant open init file:$!\n";
j> print $INIT "line 1,\nline Oct\nline 3,\n"; # 30 bytes
j> close $INIT or warn "cant close initfile $!\n";
j> open my $READ, "+<", 'uu' or die "cant open read file:$!\n";
j> open my $WRITE, "+<", 'uu' or die "cant open writ file:$!\n";
j> while(<$READ>){ print($WRITE $_) if /Oct/; } # keep only some lines

that is insane. what if you ended up writing MORE than you read in each
line?

Then you failed to read the suggestion closely. Yes, bad things happen
when you take things out of context and/or don't pay attention to the
instructions.

eventually you would overwrite data before you read it.

j> print STDERR "Read file was ",tell $READ," bytes long\n";
j> close $READ or warn "error on closing read file : $!\n";
j> my $wr_len = tell $WRITE;
j> truncate( $WRITE, $wr_len) or warn "error on truncating write file : $!\n";
j> close( $WRITE ) or warn "error on closing write file : $!\n";
j> print STDERR "write file was $wr_len long\n";

that can all be done with File::Slurp and it is so much cleaner (and
likely much faster):

my $file_name = 'foo' ;
write_file( $file_name, grep /Oct/, read_file( $file_name ) ) ;

It may be cleaner, but it is not faster on my machine (slower by a factor of
3). And it loads the entire dataset into memory. The whole point of Joel's
suggestion was to avoid doing that.

Xho
 

Uri Guttman

x> ||| If the operations are line-oriented and always shortening or omitting
x> ||| lines,
x> You've snipped this highly relevant line marked with |||.
x> I've added it back in:

x> Then you failed to read the suggestion closely. Yes, bad things happen
x> when you take things out context and/or don't pay attention to the
x> instructions.

even in the case you cover i find it to be a poor idea. the OS may have
issues with this. IIRC winblows locks files for you and that may cause
problems. the line lengths could change later in the project and screw
things up. assuming the newly written lines will always be shorter is a
very risky bet. more projects have gotten screwed by making such bets.

x> It may be cleaner, but it not faster on my machine (slower by a
x> factor of 3). And it loads the entire dataset into memory. The
x> whole point of Joel's suggestion was to avoid doing that.

for 30 bytes, sure i can see it being slower. i would never expect slurp
to be faster for files of that size. but my solution is safe from any
line size changes, it is cleaner (as you admitted and that is important
too) and it would be faster for some range of file sizes (probably a
larger range than you would think). as for slurping in the whole file,
that is much easier done than you think with today's typical ram
size. as i wrote in my slurp article, it used to be that a 20k file would
never be slurped and today that isn't even a fly on the wall in terms of
size. i am ordering a new pc (no winblows) with 2gb of ram. i would
gladly slurp in megabytes on a box like that (especially if it is a
server without GUI's sucking up all the ram).

uri
 

John W. Krahn

I will clarify that operating on the whole file at once
need not be as a scalar; any type of slurp will do.
{ undef $/; $scalar = <>; } # slurp as scalar

You shouldn't do that because the $/ variable is global and undefining it will
undefine it for the whole program.

{ local $/; $scalar = <> }

Or better:

my $scalar = do { local $/; <> };



John
 

jgraber

John W. Krahn said:
You shouldn't do that because the $/ variable
is global and undefining it will
undefine it for the whole program.

{ local $/; $scalar = <> }

Or better:
my $scalar = do { local $/; <> };

Thanks for the correction.
That's what I was attempting to do by using {},
but I forgot the correct wording of the idiom.
 

jgraber

x> If the search-replaces are length-preserving, you can fairly easily operate
x> on chunks of the file at a time using tell and seek.
j> Double file pointer is also good here.

Uri Guttman said:
"j" == jgraber <[email protected]> writes:

j> This is actually x re-quoting j:
x> j> ||| If the operations are line-oriented
x> j> ||| and always shortening or omitting lines,
j> you may also be able to just open the file with two filehandles,
j> and process normally.

x> You've snipped this highly relevant line marked with |||.
x> I've added it back in: [...]
x> Then you failed to read the suggestion closely. [...]
u> even in the case you cover i find it to be a poor idea. [...] assuming
u> the newly written lines will always be shorter is a very risky bet.
x> It may be cleaner, but it is not faster on my machine (slower by a
x> factor of 3). And it loads the entire dataset into memory.
u> for 30 bytes, sure i can see it being slower. [...] as for slurping in
u> the whole file, that is much easier done than you think with today's
u> typical ram size.

Lots of "may" on both sides; as always, Your Mileage May Vary.

Now that RAM is larger, I use the slurp method more than I used to,
but I still preach avoiding it on applications
where file sizes are commonly greater than 100MB,
and some users may be on older CPUs with multiple jobs running.

I've just always wanted to demonstrate the double filepointer method
because of all the posts saying that it can't be done,
or neglecting to mention it at all.
It works, it's theoretically interesting,
and may be perceived to have significant performance advantages
in a tiny minority of specific cases, usually involving
large files (relative to RAM), early decisions to truncate,
and/or some desire to balance CPU vs IO activity.

I should have perhaps highlighted the reasons why
the double filepointer method is seldom seen,
but Uri has now done that adequately.

While highlighting CAUTIONS for the record, I'll mention that
the -i method is a safer emulation of the OP's request:
"p" = (e-mail address removed) = Original Poster
p> In Perl, is it possible to open a file, modify it (using search
p> and replace), and write it back to the same file without needing
p> intermediate files? If possible, tell me how.

and this safety and ease of use (built into Perl)
make it preferable in the vast majority of common cases.

One case where -i is not appropriate,
but the slurp/rewrite/truncate method works well,
is gzip-in-place on a full Unix disk.
 

Uri Guttman

j> Thanks for the correction.
j> Thats what I was attempting to do by using {}
j> but I forgot the correct wording of the idiom.

or use File::Slurp as i mentioned elsewhere in this thread. it generally
beats that idiom in speed and is much cleaner code as well IMO.

uri
 
