x> If the search-replaces are length-preserving, you can fairly easily operate
x> on chunks of the file at a time using tell and seek.
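The tell/seek chunk approach x describes can be sketched as follows — in Python rather than the thread's Perl, with illustrative names, and only for replacements that preserve byte length:

```python
def replace_in_place(path, old, new, chunk_size=4096):
    """Length-preserving search/replace, one chunk at a time.

    Illustrative sketch: assumes len(old) == len(new) so byte offsets
    never shift, and ignores matches straddling a chunk boundary.
    """
    assert len(old) == len(new), "replacement must preserve length"
    with open(path, "r+b") as fh:
        while True:
            pos = fh.tell()              # remember where this chunk starts
            chunk = fh.read(chunk_size)
            if not chunk:
                break
            fixed = chunk.replace(old, new)
            if fixed != chunk:
                fh.seek(pos)             # rewind and overwrite in place
                fh.write(fixed)          # same length, so nothing shifts
```

Because the file never grows or shrinks, no temporary file and no second pass are needed.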
j> Double file pointer is also good here.
Uri Guttman said:
"j" == jgraber <[email protected]> writes:
j> This is actually x re-quoting j:
x> j> ||| If the operations are line-oriented
x> j> ||| and always shortening or omitting lines,
j> you may also be able to just open the file with two filehandles,
j> and process normally.
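j's two-filehandle trick, sketched in Python (names are illustrative): one handle reads ahead while the other writes behind it, and a final truncate() chops off the stale tail. This is safe only while output never outruns input — for example, dropping whole lines:

```python
def filter_in_place(path, keep):
    """Drop lines in place using two handles open on the same file."""
    with open(path, "r+") as read_fh, open(path, "r+") as write_fh:
        for line in read_fh:            # read pointer runs ahead
            if keep(line):
                write_fh.write(line)    # write pointer trails behind
        write_fh.truncate()             # discard the leftover tail bytes
```

If any kept line were rewritten longer than the original, the write pointer could overtake the read pointer and corrupt unread data — which is exactly the caution raised later in the thread.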
x> You've snipped this highly relevant line marked with |||.
x> I've added it back in:
x> Then you failed to read the suggestion closely. Yes, bad things happen
x> when you take things out of context and/or don't pay attention to the
x> instructions.
u> even in the case you cover i find it to be a poor idea. the OS may have
u> issues with this. IIRC winblows locks files for you and that may cause
u> problems. the line lengths could change later in the project and screw
u> things. assuming the newly written lines will always be shorter is a
u> very risky bet. more projects have gotten screwed by making such bets.
x> It may be cleaner, but it is not faster on my machine (slower by a
x> factor of 3). And it loads the entire dataset into memory. The
x> whole point of Joel's suggestion was to avoid doing that.
u> for 30 bytes, sure i can see it being slower. i would never expect slurp
u> to be faster for files of that size. but my solution is safe from any
u> line size changes, it is cleaner (as you admitted and that is important
u> too) and it would be faster for some range of file sizes (probably a
u> larger range than you would think). as for slurping in the whole file,
u> that is much easier done than you think with today's typical ram
u> size. as i wrote in my slurp article, it used to be a 20k file would
u> never be slurped and today that isn't even a fly on the wall in terms of
u> size. i am ordering a new pc (no winblows) with 2gb of ram. i would
u> gladly slurp in megabytes on a box like that (especially if it is a
u> server without GUI's sucking up all the ram).
Lots of "may" on both sides; as always, Your Mileage May Vary.
Now that RAM is larger, I use the slurp method more than I used to,
but I still preach avoiding it in applications
where file sizes are commonly greater than 100MB,
and some users may be on older CPUs with multiple jobs running.
I've just always wanted to demonstrate the double filepointer method
because of all the posts saying that it can't be done,
or neglecting to mention it at all.
It works, it's theoretically interesting,
and may be perceived to have significant performance advantages
in a tiny minority of specific cases, usually involving
large files (relative to RAM), early decisions to truncate,
and/or some desire to balance CPU vs IO activity.
Perhaps I should have highlighted the reasons why
the double filepointer method is seldom seen,
but Uri has now done that adequately.
While highlighting CAUTIONS for the record, I'll mention that
the -i method is a safer emulation of the OP's request,
"p" = (e-mail address removed) = Original Poster
p> In perl, is it possible to open a file and modify it (using search
p> and replace) and write it to the same file without need of intermediate
p> files. If possible tell me how.
and this safety and ease of use (built into Perl),
makes it preferable in the vast majority of common cases.
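For comparison, Python's standard library has a rough analogue of Perl's -i: fileinput with inplace=True, which moves the original aside and writes a fresh file under the same name, giving the same crash-safety. A sketch (edit_in_place is an illustrative name):

```python
import fileinput

def edit_in_place(path, transform):
    """Rewrite a file line by line, the way perl -i does: output goes
    to a new file that replaces the original, so a crash mid-run
    cannot leave the file half-overwritten."""
    with fileinput.input(files=(path,), inplace=True) as fh:
        for line in fh:
            print(transform(line), end="")  # stdout is redirected into the file
```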
One case where -i is not appropriate,
but slurp/rewrite/truncate method works well,
is gzip-in-place on a full unix disk.
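That gzip-in-place case can be sketched like this (in Python rather than Perl; it assumes the data fits in RAM, which is the slurp method's precondition):

```python
import gzip

def slurp_rewrite(path, transform):
    """Slurp the file, transform it in memory, rewrite over the same
    file, then truncate. Because no second file is ever created, this
    works even when the disk is full -- e.g. compressing in place."""
    with open(path, "r+b") as fh:
        data = fh.read()
        fh.seek(0)
        fh.write(transform(data))
        fh.truncate()              # drop the tail if the output shrank

# illustrative use: slurp_rewrite(some_path, gzip.compress)
```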