Reading a file twice, back to back?

Discussion in 'Perl Misc' started by martin, Apr 16, 2006.

  1. martin

    martin Guest

    Hi, I have a question regarding reading a file.

    I like to read a file (let's say FILE1) line by line and print the
    lines that match a certain criteria to another file.

    The question is, on second pass do I need to first close FILE1 and then
    reopen it (with read access). Or is there a better way to do it, to
    basically go back to the beginning of the file as if I am opening it
    for reading for the second time.

    Thanks. Martin
    martin, Apr 16, 2006
    #1

  2. martin wrote:
    > Hi, I have a question regarding reading a file.
    >
    > I like to read a file (let's say FILE1) line by line and print the
    > lines that match a certain criteria to another file.
    >
    > The question is, on second pass do I need to first close FILE1 and then
    > reopen it (with read access). Or is there a better way to do it, to
    > basically go back to the beginning of the file as if I am opening it
    > for reading for the second time.
    >


    i don't understand why you need a second pass at all...
    it_says_BALLS_on_your_forehead, Apr 16, 2006
    #2

  3. martin wrote:
    > Hi, I have a question regarding reading a file.
    >
    > I like to read a file (let's say FILE1) line by line and print the
    > lines that match a certain criteria to another file.
    >
    > The question is, on second pass do I need to first close FILE1 and then
    > reopen it (with read access). Or is there a better way to do it, to
    > basically go back to the beginning of the file as if I am opening it
    > for reading for the second time.


    perldoc -f seek


    John
    --
    use Perl;
    program
    fulfillment
    John W. Krahn, Apr 16, 2006
    #3
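John's pointer can be sketched as follows. This is not from the original posts: the file name, contents, and match pattern are invented for illustration. The point is that seek with a WHENCE of 0 rewinds the same open handle to the start, with no close/reopen.

```perl
use strict;
use warnings;

# Create a small sample file (name and contents are illustrative).
my $file = 'sample.txt';
open my $out, '>', $file or die "Can't create $file: $!";
print $out "apple\nbanana\napple pie\n";
close $out;

open my $fh, '<', $file or die "Can't open $file: $!";

# First pass: count the lines.
my $lines = 0;
$lines++ while <$fh>;

# Rewind to byte 0 (WHENCE 0 = from the start) instead of close/reopen.
seek $fh, 0, 0 or die "Can't seek: $!";

# Second pass: the same handle now reads from the top again.
my @matches;
while (<$fh>) {
    push @matches, $_ if /apple/;
}
close $fh;

print "lines=$lines matches=", scalar(@matches), "\n";  # lines=3 matches=2
```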
  7. martin

    martin Guest

    Thanks. Actually I had read about

    seek(MYFILEHANDLE, 0, 0)

    but I was wondering if "seek" is safer than opening and closing and
    re-opening. Or if there was any other way. In one of the posts there
    was a suggestion that opening twice could potentially modify the
    content of a file and is not considered safe or reliable practice.

    Martin

    John W. Krahn wrote:
    > martin wrote:
    > > Hi, I have a question regarding reading a file.
    > >
    > > I like to read a file (let's say FILE1) line by line and print the
    > > lines that match a certain criteria to another file.
    > >
    > > The question is, on second pass do I need to first close FILE1 and then
    > > reopen it (with read access). Or is there a better way to do it, to
    > > basically go back to the beginning of the file as if I am opening it
    > > for reading for the second time.

    >
    > perldoc -f seek
    >
    >
    > John
    > --
    > use Perl;
    > program
    > fulfillment
    martin, Apr 16, 2006
    #7
  8. it_says_BALLS_on_your_forehead wrote:
    > martin wrote:
    >> Hi, I have a question regarding reading a file.
    >>
    >> I like to read a file (let's say FILE1) line by line and print the
    >> lines that match a certain criteria to another file.
    >>
    >> The question is, on second pass do I need to first close FILE1 and
    >> then reopen it (with read access). Or is there a better way to do
    >> it, to basically go back to the beginning of the file as if I am
    >> opening it for reading for the second time.
    >>

    >
    > i don't understand why you need a second pass at all...


    Yes, we heard you the first time already.

    This can be a totally valid scenario if the file is too large to keep in
    memory and e.g. the first pass is to determine the most frequent key and the
    second pass to collect all lines containing that key.

    jue
    Jürgen Exner, Apr 16, 2006
    #8
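Jürgen's two-pass scenario might look roughly like the sketch below. The log format (key as the first whitespace-separated field) and the file name are assumptions made for the example; only the per-key counts stay in memory, not the lines themselves.

```perl
use strict;
use warnings;

# Build a tiny stand-in for the "too large to keep in memory" file.
my $file = 'big.log';
open my $out, '>', $file or die "Can't create $file: $!";
print $out "a one\nb two\na three\n";
close $out;

open my $fh, '<', $file or die "Can't open $file: $!";

# Pass 1: tally the keys (assumed to be the first field of each line).
my %count;
while (<$fh>) {
    my ($key) = split ' ', $_;
    $count{$key}++;
}
my ($top) = sort { $count{$b} <=> $count{$a} } keys %count;

# Pass 2: rewind the same handle and collect lines carrying the top key.
seek $fh, 0, 0 or die "Can't seek: $!";
my @hits;
while (<$fh>) {
    my ($key) = split ' ', $_;
    push @hits, $_ if $key eq $top;
}
close $fh;

print "top=$top hits=", scalar(@hits), "\n";  # top=a hits=2
```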
  9. martin wrote:
    > Hi, I have a question regarding reading a file.
    >
    > I like to read a file (let's say FILE1) line by line and print the
    > lines that match a certain criteria to another file.
    >
    > The question is, on second pass do I need to first close FILE1 and then
    > reopen it (with read access). Or is there a better way to do it, to
    > basically go back to the beginning of the file as if I am opening it
    > for reading for the second time.
    >


    Tie::File could help since you'd have an array containing
    all the lines of the file without slurping the whole file
    into memory.

    If it's a huge file and you're doing lots of searching,
    Tie::File might be a bit slow. If that's an issue, you
    might try File::Slurp which, despite its name, does try to
    avoid over-gulping memory, if I remember correctly.

    hth,
    --
    Charles DeRykus
    Charles DeRykus, Apr 16, 2006
    #9
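A minimal sketch of the Tie::File route suggested above; the file name and contents are invented. One caveat not mentioned in the post: Tie::File opens the file read-write by default (read-only needs a mode flag from Fcntl), so treat the tied array carefully if the file must not change.

```perl
use strict;
use warnings;
use Tie::File;

# Sample file for illustration.
my $file = 'tied.txt';
open my $out, '>', $file or die "Can't create $file: $!";
print $out "red\ngreen\nred again\n";
close $out;

# Tie the file to an array; lines are fetched lazily, not slurped.
tie my @lines, 'Tie::File', $file or die "Can't tie $file: $!";

# "Two passes" become two plain loops over the same array.
my $total = @lines;                 # pass 1: line count
my @reds  = grep { /red/ } @lines;  # pass 2: matching lines

untie @lines;
print "total=$total reds=", scalar(@reds), "\n";  # total=3 reds=2
```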
  10. Jürgen Exner wrote:
    > it_says_BALLS_on_your_forehead wrote:
    > > martin wrote:
    > >> Hi, I have a question regarding reading a file.
    > >>
    > >> I like to read a file (let's say FILE1) line by line and print the
    > >> lines that match a certain criteria to another file.
    > >>
    > >> The question is, on second pass do I need to first close FILE1 and
    > >> then reopen it (with read access). Or is there a better way to do
    > >> it, to basically go back to the beginning of the file as if I am
    > >> opening it for reading for the second time.
    > >>

    > >
    > > i don't understand why you need a second pass at all...

    >
    > Yes, we heard you the first time already.
    >
    > This can be a totally valid scenario if the file is too large to keep in
    > memory and e.g. the first pass is to determine the most frequent key and the
    > second pass to collect all lines containing that key.


    sorry, google groups was acting up.
    it_says_BALLS_on_your_forehead, Apr 16, 2006
    #10
  11. martin

    Xicheng Jia Guest

    Charles DeRykus wrote:
    > martin wrote:
    > > Hi, I have a question regarding reading a file.
    > >
    > > I like to read a file (let's say FILE1) line by line and print the
    > > lines that match a certain criteria to another file.
    > >
    > > The question is, on second pass do I need to first close FILE1 and then
    > > reopen it (with read access). Or is there a better way to do it, to
    > > basically go back to the beginning of the file as if I am opening it
    > > for reading for the second time.
    > >

    >

    => Tie::File could help since you'd have an array containing
    => all the lines of the file without slurping the whole file
    => into memory.

    I don't think this could be much better than "seek"; tied things introduce
    extra implementation overhead, while by using seek you can keep
    handling the file in line mode or whatever you previously set.

    Xicheng

    > If it's a huge file and you're doing lots of searching,
    > Tie::File might be a bit slow. If that's an issue, you
    > might try File::Slurp which, despite its name, does try to
    > avoid over-gulping memory, if I remember correctly.
    >
    > hth,
    > --
    > Charles DeRykus
    Xicheng Jia, Apr 16, 2006
    #11
  12. [Rearranged and trimmed quoting for better readability. Humans are used
    to read from top to bottom, so please quote relevant context first and
    add your comments after that.]

    martin wrote:
    > John W. Krahn wrote:
    >> martin wrote:
    >> > The question is, on second pass do I need to first close FILE1 and
    >> > then reopen it (with read access). Or is there a better way to do
    >> > it, to basically go back to the beginning of the file as if I am
    >> > opening it for reading for the second time.

    >>
    >> perldoc -f seek

    >
    > Thanks, Actually I had read about
    >
    > seek(MYFILEHANDLE, 0, 0)
    >
    > but I was wondering if "seek" is safer than opening and closing and
    > re-opening.


    (If you were wondering, why didn't you ask that?)

    Depends on what you mean by "safe". For a regular file, seek always
    rewinds to the beginning of the same file, while reopening the file may
    not do that. OTOH, seek doesn't work on some special files (like pipes
    or sockets).

    > Or if there was any other way.


    I can't think of a third way at the moment.

    > In one of the posts there was a suggestion that opening twice could
    > potentially modify the content of a file and is not considered safe or
    > reliable practice.


    Closing and reopening the file doesn't modify the content of the file.
    But when you open a file with the same name twice you are not guaranteed
    to open the same file. Consider the following scenario:

    1) You open file "foo" and start reading it.

    2) Some other process renames "foo" to "foo.old" and creates a new file
    "foo".

    3) You continue to read from the file you have opened (which is now
    called "foo.old").

    4) You close the file.

    5) You open the file "foo". Oops! This is now a different file than you
    read in steps 1, 3 and 4.

    hp

    --
    _ | Peter J. Holzer | Löschung von at.usenet.schmankerl?
    |_|_) | Sysadmin WSR/LUGA |
    | | | | Diskussion derzeit in at.usenet.gruppen
    __/ | http://www.hjp.at/ |
    Peter J. Holzer, Apr 16, 2006
    #12
  13. Xicheng Jia wrote:
    > Charles DeRykus wrote:
    >> martin wrote:
    >>> Hi, I have a question regarding reading a file.
    >>>
    >>> I like to read a file (let's say FILE1) line by line and print the
    >>> lines that match a certain criteria to another file.
    >>>
    >>> The question is, on second pass do I need to first close FILE1 and then
    >>> reopen it (with read access). Or is there a better way to do it, to
    >>> basically go back to the beginning of the file as if I am opening it
    >>> for reading for the second time.
    >>>

    > => Tie::File could help since you'd have an array containing
    > => all the lines of the file without slurping the whole file
    > => into memory.
    >
    > I don't think this could be much better than "seek"; tied things introduce
    > extra implementation overhead, while by using seek you can keep
    > handling the file in line mode or whatever you previously set.
    >


    Yes, that's the inference I expected to be drawn when I said "if it's
    a huge file and you're doing lots of searching... might be a bit slow."

    For convenience and ease of use though, it'd be much easier to make a
    2nd pass by looping through an array instead of a seek to rewind and
    re-reading...

    >
    >> If it's a huge file and you're doing lots of searching,
    >> Tie::File might be a bit slow. If that's an issue, you
    >> might try File::Slurp which, despite its name, does try to
    >> avoid over-gulping memory, if I remember correctly.
    >>


    --
    Charles DeRykus
    Charles DeRykus, Apr 16, 2006
    #13
  14. martin

    martin Guest

    But this could be avoided by locking the file before reading (the first
    pass). Can't one do that?

    Martin




    Peter J. Holzer wrote:
    > [Rearranged and trimmed quoting for better readability. Humans are used
    > to read from top to bottom, so please quote relevant context first and
    > add your comments after that.]
    >
    > martin wrote:
    > > John W. Krahn wrote:
    > >> martin wrote:
    > >> > The question is, on second pass do I need to first close FILE1 and
    > >> > then reopen it (with read access). Or is there a better way to do
    > >> > it, to basically go back to the beginning of the file as if I am
    > >> > opening it for reading for the second time.
    > >>
    > >> perldoc -f seek

    > >
    > > Thanks, Actually I had read about
    > >
    > > seek(MYFILEHANDLE, 0, 0)
    > >
    > > but I was wondering if "seek" is safer than opening and closing and
    > > re-opening.

    >
    > (If you were wondering, why didn't you ask that?)
    >
    > Depends on what you mean by "safe". For a regular file, seek always
    > rewinds to the beginning of the same file, while reopening the file may
    > not do that. OTOH, seek doesn't work on some special files (like pipes
    > or sockets).
    >
    > > Or if there was any other way.

    >
    > I can't think of a third way at the moment.
    >
    > > In one of the posts there was a suggestion that opening twice could
    > > potentially modify the content of a file and is not considered safe or
    > > reliable practice.

    >
    > Closing and reopening the file doesn't modify the content of the file.
    > But when you open a file with the same name twice you are not guaranteed
    > to open the same file. Consider the following scenario:
    >
    > 1) You open file "foo" and start reading it.
    >
    > 2) Some other process renames "foo" to "foo.old" and creates a new file
    > "foo".
    >
    > 3) You continue to read from the file you have opened (which is now
    > called "foo.old").
    >
    > 4) You close the file.
    >
    > 5) You open the file "foo". Oops! This is now a different file than you
    > read in steps 1, 3 and 4.
    >
    > hp
    >
    > --
    > _ | Peter J. Holzer | Löschung von at.usenet.schmankerl?
    > |_|_) | Sysadmin WSR/LUGA |
    > | | | | Diskussion derzeit in at.usenet.gruppen
    > __/ | http://www.hjp.at/ |
    martin, Apr 16, 2006
    #14
  15. martin <> wrote:
    > But this could be avoided by locking the file before reading (the first
    > pass). Can't one do that?



    > Peter J. Holzer wrote:
    >> [Rearranged and trimmed quoting for better readability. Humans are used
    >> to read from top to bottom, so please quote relevant context first and
    >> add your comments after that.]



    [ snip TOFU]


    Your rudeness is now seen as being intentional.

    Off to the killfile you go.


    --
    Tad McClellan SGML consulting
    Perl programming
    Fort Worth, Texas
    Tad McClellan, Apr 16, 2006
    #15
  16. martin

    Joe Smith Guest

    martin wrote:
    > But this could be avoided by locking the file before reading (the first
    > pass). Can't one do that?


    You're supposed to put your question *AFTER* the text you are referring
    to, and should cut the quoted text to the bare essentials.

    >> 1) You open file "foo" and start reading it.
    >>
    >> 2) Some other process renames "foo" to "foo.old" and creates a new file
    >> "foo".
    >>
    >> 3) You continue to read from the file you have opened (which is now
    >> called "foo.old").
    >>
    >> 4) You close the file.
    >>
    >> 5) You open the file "foo". Oops! This is now a different file than you
    >> read in steps 1, 3 and 4.


    martin wrote:
    > But this could be avoided by locking the file before reading (the first
    > pass). Can't one do that?


    No, it can't be avoided by locking the file. That's not the sort of
    thing that locking guards against.
    -Joe
    Joe Smith, Apr 18, 2006
    #16
  17. martin

    Guest

    Joe Smith <> writes:
    > martin wrote:
    > > But this could be avoided by locking the file before reading (the first
    > > pass). Can't one do that?

    >
    > You're supposed to put your question *AFTER* the text you are referring
    > to, and should cut the quoted text to the bare essentials.
    >
    > >> 1) You open file "foo" and start reading it.
    > >>
    > >> 2) Some other process renames "foo" to "foo.old" and creates a new file
    > >> "foo".
    > >>
    > >> 3) You continue to read from the file you have opened (which is now
    > >> called "foo.old").
    > >>
    > >> 4) You close the file.
    > >>
    > >> 5) You open the file "foo". Oops! This is now a different file than you
    > >> read in steps 1, 3 and 4.

    >
    > martin wrote:
    > > But this could be avoided by locking the file before reading (the first
    > > pass). Can't one do that?

    >
    > No, it can't be avoided by locking the file. That's not the sort of
    > thing that locking guards against.
    > -Joe


    You can limit the window of vulnerability
    by opening the same file twice immediately,
    then reading one filehandle through,
    then starting over with the second filehandle.

    I thought about using the "<&" version of open
    to dup the first filehandle, but that won't keep the
    second file pointer independent.

    # tested on Linux with the system commands echo and mv
    use strict; use warnings;
    system("echo 'file1' > file1");  # create file1
    open( my $FH1, '<', 'file1' ) or die "Can't open FH1 file1: $!\n";
    open( my $FH2, '<', 'file1' ) or die "Can't open FH2 file1: $!\n"; # dual open
    #open( my $FH2, '<&', $FH1 ) or die "Can't open FH2 dup FH1: $!\n"; # dup instead
    system("mv file1 file1.old");    # some other process
    system("echo 'file2' > file1");  # some other process
    while (<$FH1>) { print "FH1 ", $_; }
    while (<$FH2>) { print "FH2 ", $_; }

    output:
    FH1 file1
    FH2 file1

    If you move the # some other process
    lines in between the two opens,
    you will see the vulnerability as output
    FH1 file1
    FH2 file2

    --
    Joel
    , Apr 18, 2006
    #17
