Reading a file twice, back to back?

martin

Hi, I have a question regarding reading a file.

I'd like to read a file (let's say FILE1) line by line and print the
lines that match a certain criterion to another file.

The question is: on the second pass, do I need to first close FILE1 and
then reopen it (with read access)? Or is there a better way to do it, to
basically go back to the beginning of the file as if I were opening it
for reading for the second time?

Thanks. Martin
 
it_says_BALLS_on_your_forehead

martin said:
Hi, I have a question regarding reading a file.

I'd like to read a file (let's say FILE1) line by line and print the
lines that match a certain criterion to another file.

The question is: on the second pass, do I need to first close FILE1 and
then reopen it (with read access)? Or is there a better way to do it, to
basically go back to the beginning of the file as if I were opening it
for reading for the second time?

I don't understand why you need a second pass at all...
 
John W. Krahn

martin said:
Hi, I have a question regarding reading a file.

I'd like to read a file (let's say FILE1) line by line and print the
lines that match a certain criterion to another file.

The question is: on the second pass, do I need to first close FILE1 and
then reopen it (with read access)? Or is there a better way to do it, to
basically go back to the beginning of the file as if I were opening it
for reading for the second time?

perldoc -f seek


John
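For illustration, a minimal sketch of the rewind that perldoc -f seek
describes; the filename FILE1 is only a placeholder:

use strict;
use warnings;

open my $fh, '<', 'FILE1' or die "Can't open FILE1: $!";
while (<$fh>) {
    # ... first pass over the lines ...
}
seek $fh, 0, 0 or die "Can't rewind FILE1: $!";  # back to the start
while (<$fh>) {
    # ... second pass over the same open handle ...
}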
 
martin

Thanks. Actually, I had read about

seek(MYFILEHANDLE, 0, 0)

but I was wondering if "seek" is safer than opening and closing and
re-opening. Or if there was any other way. In one of the posts there
was a suggestion that opening twice could potentially modify the
content of a file and is not considered safe or reliable practice.

Martin
 
Jürgen Exner

it_says_BALLS_on_your_forehead said:
I don't understand why you need a second pass at all...

Yes, we heard you the first time already.

This can be a totally valid scenario if the file is too large to keep in
memory and e.g. the first pass is to determine the most frequent key and the
second pass to collect all lines containing that key.

jue
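For example, a rough sketch of that kind of two-pass job, assuming the
key is simply the first whitespace-separated field (the filenames and
field layout are made up for illustration):

use strict;
use warnings;

open my $in, '<', 'FILE1' or die "Can't open FILE1: $!";

# Pass 1: count how often each key (here: the first field) occurs.
my %count;
while (my $line = <$in>) {
    my ($key) = split ' ', $line;
    $count{$key}++ if defined $key;
}
my ($top) = sort { $count{$b} <=> $count{$a} } keys %count;
die "FILE1 had no keys\n" unless defined $top;

# Pass 2: rewind the same handle and keep only lines with that key.
seek $in, 0, 0 or die "Can't rewind FILE1: $!";
open my $out, '>', 'FILE2' or die "Can't open FILE2: $!";
while (my $line = <$in>) {
    my ($key) = split ' ', $line;
    print $out $line if defined $key and $key eq $top;
}
close $out or die "Can't close FILE2: $!";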
 
Charles DeRykus

martin said:
Hi, I have a question regarding reading a file.

I'd like to read a file (let's say FILE1) line by line and print the
lines that match a certain criterion to another file.

The question is: on the second pass, do I need to first close FILE1 and
then reopen it (with read access)? Or is there a better way to do it, to
basically go back to the beginning of the file as if I were opening it
for reading for the second time?

Tie::File could help since you'd have an array containing
all the lines of the file without slurping the whole file
into memory.

If it's a huge file and you're doing lots of searching,
Tie::File might be a bit slow. If that's an issue, you
might try File::Slurp, which, despite its name, does try to
avoid over-gulping memory, if I remember correctly.

hth,
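A small sketch of the Tie::File approach, with a placeholder filename
and match pattern; Tie::File opens files read-write by default, so
O_RDONLY is passed here for a purely read-only pass:

use strict;
use warnings;
use Fcntl 'O_RDONLY';
use Tie::File;

# Each array element is one line of the file, fetched on demand.
tie my @lines, 'Tie::File', 'FILE1', mode => O_RDONLY
    or die "Can't tie FILE1: $!";

# "First pass": remember which line numbers match.
my @wanted = grep { $lines[$_] =~ /some pattern/ } 0 .. $#lines;

# "Second pass" is just another walk over the same tied array.
print "$lines[$_]\n" for @wanted;

untie @lines;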
 
it_says_BALLS_on_your_forehead

Jürgen Exner said:
Yes, we heard you the first time already.

This can be a totally valid scenario if the file is too large to keep in
memory and e.g. the first pass is to determine the most frequent key and the
second pass to collect all lines containing that key.

Sorry, Google Groups was acting up.
 
Xicheng Jia

=> Tie::File could help since you'd have an array containing
=> all the lines of the file without slurping the whole file
=> into memory.

I don't think this could be much better than "seek". Tied things
introduce extra implementation overhead, while with seek you can keep
handling the file in line mode or whatever you previously set.

Xicheng
 
Peter J. Holzer

[Rearranged and trimmed quoting for better readability. Humans are used
to reading from top to bottom, so please quote relevant context first and
add your comments after that.]

martin said:
Thanks. Actually, I had read about

seek(MYFILEHANDLE, 0, 0)

but I was wondering if "seek" is safer than opening and closing and
re-opening.

(If you were wondering, why didn't you ask that?)

Depends on what you mean by "safe". For a regular file, seek always
rewinds to the beginning of the same file, while reopening the file may
not do that. OTOH, seek doesn't work on some special files (like pipes
or sockets).

martin said:
Or if there was any other way.

I can't think of a third way at the moment.

martin said:
In one of the posts there was a suggestion that opening twice could
potentially modify the content of a file and is not considered safe or
reliable practice.

Closing and reopening the file doesn't modify the content of the file.
But when you open a file with the same name twice you are not guaranteed
to open the same file. Consider the following scenario:

1) You open file "foo" and start reading it.

2) Some other process renames "foo" to "foo.old" and creates a new file
"foo".

3) You continue to read from the file you have opened (which is now
called "foo.old").

4) You close the file.

5) You open the file "foo". Oops! This is now a different file than you
read in steps 1, 3 and 4.

hp
 
Charles DeRykus

Xicheng said:
=> Tie::File could help since you'd have an array containing
=> all the lines of the file without slurping the whole file
=> into memory.

I don't think this could be much better than "seek". Tied things
introduce extra implementation overhead, while with seek you can keep
handling the file in line mode or whatever you previously set.

Yes, that's the inference I expected to be drawn when I said "if it's
a huge file and you're doing lots of searching... might be a bit slow."

For convenience and ease of use, though, it'd be much easier to make a
second pass by looping through the array instead of seeking to rewind and
re-reading...
 
martin

But this could be avoided by locking the file before reading (the first
pass). Can't one do that?

Martin
 
Tad McClellan

martin said:
But this could be avoided by locking the file before reading (the first
pass). Can't one do that?

Peter said:
[Rearranged and trimmed quoting for better readability. Humans are used
to reading from top to bottom, so please quote relevant context first and
add your comments after that.]


[ snip TOFU]


Your rudeness is now seen as being intentional.

Off to the killfile you go.
 
Joe Smith

martin said:
But this could be avoided by locking the file before reading (the first
pass). Can't one do that?

You're supposed to put your question *AFTER* the text you are referring
to, and should cut the quoted text to the bare essentials.

No, it can't be avoided by locking the file. That's not the sort of
thing that locking guards against.
-Joe
 
jgraber

Joe Smith said:
You're supposed to put your question *AFTER* the text you are referring
to, and should cut the quoted text to the bare essentials.



No, it can't be avoided by locking the file. That's not the sort of
thing that locking guards against.
-Joe

You can limit the time window of vulnerability
by opening the same file twice immediately,
then reading one filehandle through,
then starting over with the second filehandle.

I thought about using the "<&" version of open
to dup the first filehandle, but that won't keep the
second file pointer independent.

# tested on Linux with the system commands echo and mv
use strict; use warnings;
system("echo 'file1' > file1");   # create file1
open( my $FH1, '<', 'file1' ) or die "Can't open FH1 file1: $!\n";
open( my $FH2, '<', 'file1' ) or die "Can't open FH2 file1: $!\n"; # dual open
#open( my $FH2, '<&', $FH1 ) or die "Can't open FH2 dup FH1: $!\n"; # dup instead
system("mv file1 file1.old");     # some other process renames the file
system("echo 'file2' > file1");   # some other process creates a new file1
while (<$FH1>) { print "FH1 ", $_; }
while (<$FH2>) { print "FH2 ", $_; }

output:
FH1 file1
FH2 file1

If you move the # some other process
lines in between the two opens,
you will see the vulnerability as output
FH1 file1
FH2 file2
 
