Read and re-write file with one open?

Discussion in 'Ruby' started by Adam Bender, Apr 30, 2009.

  1. Adam Bender

    Adam Bender Guest

    I would like to write a Ruby script that opens a text file, performs a gsub
    on each line, and then overwrites the file with the updated contents. Right
    now I open the file twice: once to read and once to write. The reason for
    this is that if I try to perform both operations on the same IO object by
    calling io.rewind and writing from the beginning, and the substituted word
    is shorter than what it is replacing, some portion of the end of the
    original file remains. Is there an idiom for "clearing" the contents of the
    file before writing?

    Thanks,

    Adam
     
    Adam Bender, Apr 30, 2009
    #1

  2. Instead of using the rewind method, you can use the reopen method to
    open the same file in write mode.
    Example:
    file = File.open( "filename", "r" )
    file.reopen( "filename", "w" )
    --
    Posted via http://www.ruby-forum.com/.
     
    Siddick Ebramsha, Apr 30, 2009
    #2

  3. On Thu, Apr 30, 2009 at 12:27 AM, Adam Bender <> wrote:
    > I would like to write a Ruby script that opens a text file, performs a gsub
    > on each line, and then overwrites the file with the updated contents. Right
    > now I open the file twice: once to read and once to write. The reason for
    > this is that if I try to perform both operations on the same IO object by
    > calling io.rewind and writing from the beginning, and the substituted word
    > is shorter than what it is replacing, some portion of the end of the
    > original file remains. Is there an idiom for "clearing" the contents of the
    > file before writing?


    Yes, but typically this is done by creating a new file and then
    renaming it to replace the old one.

    Here's an example from my upcoming book "Ruby Best Practices"[0]. It
    naively strips comments from source files.
    You can modify it to fit your needs.

    ----------

    require "tempfile"
    require "fileutils"

    temp = Tempfile.new("without_comments")
    File.foreach(ARGV[0]) do |line|
      temp << line unless line =~ /^\s*#/
    end
    temp.close

    FileUtils.cp(ARGV[0], "#{ARGV[0]}.bak") # comment out if you don't want backups.
    FileUtils.mv(temp.path, ARGV[0])

    ----------
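
As a hedged sketch of how the example above might be adapted to the original gsub question: the helper name `rewrite_with_tempfile` and its parameters are hypothetical and come from neither the thread nor the book. Two further assumptions are mine: the temp file is created in the target's own directory so that the final rename does not cross filesystems, and the original permission bits are copied over because Tempfile creates its files with mode 0600.

```ruby
require "tempfile"
require "fileutils"

# Hypothetical helper: rewrite `path` line by line through a temp file,
# replacing every match of `pattern` with `replacement`.
def rewrite_with_tempfile(path, pattern, replacement, backup: true)
  # Assumption: keep the temp file beside the target so the rename below
  # stays on a single filesystem.
  temp = Tempfile.new(File.basename(path), File.dirname(path))
  begin
    File.foreach(path) { |line| temp << line.gsub(pattern, replacement) }
    temp.close
    # Tempfile creates files with mode 0600; carry over the original bits.
    File.chmod(File.stat(path).mode & 0o7777, temp.path)
    FileUtils.cp(path, "#{path}.bak") if backup
    File.rename(temp.path, path)
  ensure
    # Closes and unlinks the temp file; after a successful rename the temp
    # path no longer exists and the unlink is silently skipped.
    temp.close!
  end
end
```

A call such as `rewrite_with_tempfile("notes.txt", /colour/, "color")` would leave the old contents in `notes.txt.bak`; pass `backup: false` to skip the copy.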

    -greg

    PS: sorry for the shameless book plug, but it really might be helpful
    for questions like these. :)


    [0] http://rubybestpractices.com
     
    Gregory Brown, Apr 30, 2009
    #3
  4. Adam Bender

    7stud -- Guest

    Adam Bender wrote:
    > I would like to write a Ruby script that opens a text file, performs a gsub
    > on each line, and then overwrites the file with the updated contents. Right
    > now I open the file twice: once to read and once to write. The reason for
    > this is that if I try to perform both operations on the same IO object by
    > calling io.rewind and writing from the beginning, and the substituted word
    > is shorter than what it is replacing, some portion of the end of the
    > original file remains. Is there an idiom for "clearing" the contents of the
    > file before writing?
    >


    Yes, opening the file in write mode! However, you are going to expose
    yourself to this catastrophe. Suppose you read the contents of the
    file into a variable, then open the file for writing, which then erases
    the file, but immediately thereafter your program crashes or the power
    goes out in your city. What are you left with? You will be left with
    an empty file, and the variable that contained the contents of the file
    will have evaporated into the ether. In other words, you will lose all
    your data!

    So the idiom for rewriting a file is:

    1) Open the file for *reading*.
    2) Open another file for writing with a name like origName-edited.txt
    3) Read the original file line by line (saves memory, but is slower)
    4) Write each altered line to the file origName-edited.txt
    5) Delete the original file.
    6) Change the name of the new file (origName-edited.txt) to origName.txt
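
The six steps can be sketched roughly as follows. `rewrite_via_sibling` and the `-edited` suffix are illustrative names only. Two assumptions are worth flagging: the sibling file is opened with `File::EXCL`, so an existing file of that name raises an error instead of being overwritten, and steps 5 and 6 are collapsed into a single `File.rename`, which on POSIX systems replaces the destination in one step.

```ruby
# Hypothetical helper: yields each line of `path` and writes whatever the
# block returns to a sibling file, then renames the sibling over the original.
def rewrite_via_sibling(path)
  edited = "#{path}-edited"
  # Steps 1-4: read the original line by line and write the altered lines.
  # File::EXCL raises Errno::EEXIST if the sibling name is already taken.
  File.open(edited, File::WRONLY | File::CREAT | File::EXCL) do |out|
    File.foreach(path) { |line| out.write(yield(line)) }
  end
  # Steps 5-6 (assumption: combined into one rename over the original).
  File.rename(edited, path)
end
```

For example, `rewrite_via_sibling("origName.txt") { |line| line.gsub("old", "new") }`. If the block raises partway through, the partially written sibling file is left behind for inspection and the original is untouched.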
     
    7stud --, Apr 30, 2009
    #4
  5. Adam Bender

    Adam Bender Guest

    On Thu, Apr 30, 2009 at 1:26 AM, 7stud -- <> wrote:

    > Yes, opening the file in write mode! However, you are going to expose
    > yourself to this catastrophe. Suppose you read the contents of the
    > file into a variable, then open the file for writing, which then erases
    > the file, but immediately thereafter your program crashes or the power
    > goes out in your city.
    >
    > So the idiom for rewriting a file is:
    >
    > 1) Open the file for *reading*.
    > 2) Open another file for writing with a name like origName-edited.txt
    > 3) Read the original file line by line(saves memory, but is slower)
    > 4) Write each altered line to the file origName-edited.txt
    > 5) Delete the original file.
    > 6) Change the name of the new file (origName-edited.txt) to origName.txt



    I see the potential for catastrophe with the way I suggested; however, there
    is potential for catastrophe here if there is already an
    "origName-edited.txt" file (I know, slim chance, but you never know). You
    could get around this by generating new file names until you found one that
    didn't exist, or writing to /tmp, of course. I think I'll switch to Greg
    Brown's suggestion. Does Tempfile guarantee that it won't overwrite an
    existing file?

    Adam
     
    Adam Bender, Apr 30, 2009
    #5
  6. Adam Bender

    7stud -- Guest

    Adam Bender wrote:
    > On Thu, Apr 30, 2009 at 1:26 AM, 7stud -- <>
    > wrote:
    >
    >> 3) Read the original file line by line(saves memory, but is slower)
    >> 4) Write each altered line to the file origName-edited.txt
    >> 5) Delete the original file.
    >> 6) Change the name of the new file (origName-edited.txt) to origName.txt

    >
    >
    > I see the potential for catastrophe with the way I suggested, however,
    > there
    > is potential for catastrophe here if there is already an
    > "origName-edited.txt" file (I know, slim chance, but you never know).
    >


    Of course, if that was a possibility then you would take extra measures,
    like creating a new file name with rand and then checking it with
    File.exist?, which is probably what Tempfile does.

    > Does Tempfile guarantee that it won't overwrite an
    > existing file?

    What, the standard library docs aren't clear enough for you?

    ----
    tempfile - manipulates temporary files
    ----

    ???!! lol. pathetic. But once in a great while you can actually find
    some information on a standard library module using google:

    http://www.rubytips.org/2008/01/11/using-temporary-files-in-ruby-tempfilenew/

     
    7stud --, Apr 30, 2009
    #6
  7. 2009/4/30 Gregory Brown <>:
    > On Thu, Apr 30, 2009 at 12:27 AM, Adam Bender <> wrote:
    >> I would like to write a Ruby script that opens a text file, performs a gsub
    >> on each line, and then overwrites the file with the updated contents. Right
    >> now I open the file twice: once to read and once to write. The reason for
    >> this is that if I try to perform both operations on the same IO object by
    >> calling io.rewind and writing from the beginning, and the substituted word
    >> is shorter than what it is replacing, some portion of the end of the
    >> original file remains. Is there an idiom for "clearing" the contents of the
    >> file before writing?

    >
    > Yes, but typically this is done by creating a new file and then
    > renaming it to replace the old one.
    >
    > Here's an example from my upcoming book "Ruby Best Practices"[0]. It
    > naively strips comments from source files.


    A variant exploiting Ruby's command line parameters:

    11:05:08 Temp$ ruby -e '10.times {|i| puts i}' >| x
    11:05:22 Temp$ cat x
    0
    1
    2
    3
    4
    5
    6
    7
    8
    9
    11:05:23 Temp$ ./x.rb x
    11:05:29 Temp$ cat x
    <<<0>>>
    <<<1>>>
    <<<2>>>
    <<<3>>>
    <<<4>>>
    <<<5>>>
    <<<6>>>
    <<<7>>>
    <<<8>>>
    <<<9>>>
    11:05:34 Temp$ cat x.bak
    0
    1
    2
    3
    4
    5
    6
    7
    8
    9
    11:05:37 Temp$ cat x.rb
    #!/opt/bin/ruby19 -pi.bak

    $_.sub! /^/, '<<<'
    $_.sub! /$/, '>>>'
    11:05:40 Temp$

    Kind regards

    robert

    --
    remember.guy do |as, often| as.you_can - without end
    http://blog.rubybestpractices.com/
     
    Robert Klemme, Apr 30, 2009
    #7
  8. On Thu, Apr 30, 2009 at 3:11 AM, Adam Bender <> wrote:

    > I see the potential for catastrophe with the way I suggested, however, there
    > is potential for catastrophe here if there is already an
    > "origName-edited.txt" file (I know, slim chance, but you never know). You
    > could get around this by generating new file names until you found one that
    > didn't exist, or writing to /tmp, of course. I think I'll switch to Greg
    > Brown's suggestion. Does Tempfile guarantee that it won't overwrite an
    > existing file?


    Yes, Tempfile avoids file collisions.

    -greg
     
    Gregory Brown, Apr 30, 2009
    #8
  9. Adam Bender

    James Dinkel Guest

    Gregory Brown wrote:
    >
    > Here's an example from my upcoming book "Ruby Best Practices"[0]. It
    > naively strips comments from source files.
    > You can modify it to fit your needs.
    >
    > ----------
    >
    > require "tempfile"
    > require "fileutils"
    >
    > temp = Tempfile.new("without_comments")
    > File.foreach(ARGV[0]) do |line|
    > temp << line unless line =~ /^\s*#/
    > end
    > temp.close
    >
    > FileUtils.cp(ARGV[0],"#{ARGV[0]}.bak") # comment out if you don't want backups.
    > FileUtils.mv(temp.path,ARGV[0])
    >
    > ----------
    >
    > -greg


    What about if another program opens the file after it has been read by
    the Ruby program but before the Ruby program has copied the temp file?
    If the second program makes a change and saves it, those changes will be
    lost when the Ruby program copies the temp file over it. Or if the
    second program still has it open and the Ruby program finishes, then
    when the second program saves its open file, it will overwrite the Ruby
    program's changes.
     
    James Dinkel, Apr 30, 2009
    #9
  10. On Thu, Apr 30, 2009 at 10:41 AM, James Dinkel <> wrote:

    > What about if another program opens the file after it has been read by
    > the Ruby program but before the Ruby program has copied the temp file?
    > If the second program makes a change and saves it, those changes will be
    > lost when the Ruby program copies the temp file over it. Or if the
    > second program still has it open and the Ruby program finishes, then
    > when the second program saves it's open file, it will overwrite the Ruby
    > program's changes.


    Is this related to the OP's concerns? In this case, you'd need file
    locking (see the Ruby API).
    But I didn't see any mention of these sorts of issues in Adam's original post.

    If you need this feature, read the API docs for File#flock
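
A minimal sketch of what that could look like, assuming every program that touches the file cooperates by taking the same lock (`flock` is advisory, so a program that ignores it is not stopped). `with_exclusive_lock` is a hypothetical helper name.

```ruby
# Hypothetical helper: run the block while holding an exclusive advisory
# lock on the file at `path`, yielding the open File object.
def with_exclusive_lock(path)
  File.open(path, "r+") do |file|
    file.flock(File::LOCK_EX) # blocks until other holders release the lock
    yield file
  end                         # closing the file releases the lock
end
```

Inside the block the caller would read, rewind, write, and truncate through the yielded `file`. Note that a lock taken this way protects an in-place rewrite; replacing the file by renaming a temp file over it swaps in a different file, which the lock on the old one does not cover.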

    -greg
     
    Gregory Brown, Apr 30, 2009
    #10
  11. Adam Bender

    timr Guest

    On Apr 29, 9:27 pm, Adam Bender <> wrote:
    > [Note:  parts of this message were removed to make it a legal post.]
    >
    > I would like to write a Ruby script that opens a text file, performs a gsub
    > on each line, and then overwrites the file with the updated contents.  Right
    > now I open the file twice: once to read and once to write.  The reason for
    > this is that if I try to perform both operations on the same IO object by
    > calling io.rewind and writing from the beginning, and the substituted word
    > is shorter than what it is replacing, some portion of the end of the
    > original file remains.  Is there an idiom for "clearing" the contents of the
    > file before writing?
    >
    > Thanks,
    >
    > Adam
     
    timr, May 1, 2009
    #11
  12. Adam Bender

    timr Guest

    On Apr 29, 9:27 pm, Adam Bender <> wrote:
    > [Note:  parts of this message were removed to make it a legal post.]
    >
    > I would like to write a Ruby script that opens a text file, performs a gsub
    > on each line, and then overwrites the file with the updated contents.  Right
    > now I open the file twice: once to read and once to write.  The reason for
    > this is that if I try to perform both operations on the same IO object by
    > calling io.rewind and writing from the beginning, and the substituted word
    > is shorter than what it is replacing, some portion of the end of the
    > original file remains.  Is there an idiom for "clearing" the contents of the
    > file before writing?
    >
    > Thanks,
    >
    > Adam


    I think File objects have a truncate method you can use after reading
    (f.truncate(0) #the file is now blank) to delete the contents.

    file.truncate(integer) => 0

    Truncates file to at most integer bytes. The file must be opened for
    writing. Not available on all platforms.

    f = File.new("out", "w")
    f.syswrite("1234567890") #=> 10
    f.truncate(5) #=> 0
    f.close() #=> nil
    File.size("out") #=> 5
     
    timr, May 1, 2009
    #12
  13. On Thu, Apr 30, 2009 at 1:24 PM, 7stud -- <> wrote:
    > What the standard library docs aren't clear enough for you:
    >
    > ----
    > tempfile - manipulates temporary files
    > ----
    >
    > ???!! lol. pathetic.


    Don't just scoff, send in a patch!

    martin
     
    Martin DeMello, May 1, 2009
    #13
  14. Adam Bender

    James Dinkel Guest

    timr wrote:
    > On Apr 29, 9:27 pm, Adam Bender <> wrote:
    >
    > Truncates file to at most integer bytes. The file must be opened for
    > writing. Not available on all platforms.
    >
    > f = File.new("out", "w")
    > f.syswrite("1234567890") #=> 10
    > f.truncate(5) #=> 0
    > f.close() #=> nil
    > File.size("out") #=> 5


    That's pretty close to how I've been modifying files:

    File.open(filename, 'r+') do |file|
      lines = file.readlines

      # modify data in the lines array

      file.pos = 0
      file.write lines.join # join first: on Ruby 1.9+ printing a bare array writes its inspect form
      file.truncate(file.pos)
    end

    This opens a file for reading and writing, reads the file into an array,
    then you modify the array how you need to. Then the block returns to
    the beginning of the file, writes out your changes over the existing
    file, then chops off what's left. Then, of course, the file is closed
    when the block exits.
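
A self-contained sketch of this approach applied to the original gsub question. `rewrite_in_place` and its block-based interface are hypothetical, chosen for illustration. The lines are joined into one string before writing because, on Ruby 1.9 and later, printing an array writes its inspect form rather than the concatenated elements.

```ruby
# Hypothetical helper: read every line, pass each through the block,
# overwrite the file from the start, then cut off any leftover tail.
def rewrite_in_place(path)
  File.open(path, "r+") do |file|
    lines = file.readlines.map { |line| yield line }
    file.rewind             # same effect as file.pos = 0
    file.write(lines.join)
    file.truncate(file.pos) # drops old bytes left over when the text shrank
  end
end
```

For example, `rewrite_in_place("notes.txt") { |line| line.gsub("substitution", "sub") }` shortens every matching line and leaves no remnant of the old, longer content at the end of the file.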

    To get a little more complicated, I wrapped this in a class method:

    class File

      def self.change!(filename, create = false)
        ### method to make it easy to open a file and make changes to it.
        ### usage example
        ##   File.change!(myfile) do |contents|
        ##     contents.each { |line| line.gsub!(/this/, "that") }
        ##   end
        ### I can also use "throw(:nochanges)" anywhere in the block
        ### to prevent the file from being written.
        ### Make sure 'contents' does not get pointed to a new object;
        ### for example, an assignment "contents = 'new data'" will break the method

        # if create is true, create the file if it does not exist
        if create == true
          File.open(filename, 'w') { |blank| blank.write '' } unless File.exist?(filename)
        end

        # read the file, execute a block, then write the file
        if File.exist?(filename)
          File.open(filename, 'r+') do |file|
            lines = file.readlines
            # do not write the file if it did not change (block must "throw :nochanges")
            catch(:nochanges) do
              yield lines
              file.pos = 0
              file.write lines.join # join first: on Ruby 1.9+ printing a bare array writes its inspect form
              file.truncate(file.pos)
            end
          end
        end
      end
    end
     
    James Dinkel, May 1, 2009
    #14
  15. Adam Bender

    James Dinkel Guest

    Aw, word wrapping broke some of my lines...

    these should be single lines:

    > ### for example, an assignment "contents = 'new data'" will break the method



    > File.open(filename, 'w') { |blank| blank.write '' } unless File.exist?(filename)



    > # do not write the file if it did not change (block must "throw :nochanges")



     
    James Dinkel, May 1, 2009
    #15
  16. Adam Bender

    Adam Bender Guest

    On Fri, May 1, 2009 at 4:33 PM, James Dinkel <> wrote:

    > That's pretty close to how I've been modifying files:
    >
    > File.open(filename, 'r+') do |file|
    > lines = file.readlines
    >
    > # modify data in the lines array
    >
    > file.pos = 0
    > file.print lines # will not put \$ between array elements
    > file.truncate(file.pos)
    > end
    >
    > This opens a file for reading and writing, reads the file into an array,
    > then you modify the array how you need to. Then the block returns to
    > the beginning of the file, writes out your changes over the existing
    > file, then chops off what's left. Then, of course, the file is closed
    > when the block exits.
    >


    This is exactly what I wanted. Thanks everyone.

    Adam
     
    Adam Bender, May 2, 2009
    #16
  17. Adam Bender

    7stud -- Guest

    Adam Bender wrote:
    > On Fri, May 1, 2009 at 4:33 PM, James Dinkel <> wrote:
    >
    >> end
    >>
    >> This opens a file for reading and writing, reads the file into an array,
    >> then you modify the array how you need to. Then the block returns to
    >> the beginning of the file, writes out your changes over the existing
    >> file, then chops off what's left. Then, of course, the file is closed
    >> when the block exits.
    >>

    >
    > This is exactly what I wanted. Thanks everyone.
    >
    > Adam


    I think you guys are missing the point. There are lots of ways to
    rewrite a file that 'work'. For instance, simply reading a file into an
    array, closing the file, then opening the file for writing (which erases
    the file), and then writing the altered lines back to the file 'works'.
    You can test it yourself and see that it works. You can rewrite the
    file like that 1,000 times and it will 'work'.

    However, if your data is important you need to ask yourself the
    question: what happens if my program crashes while I am writing the data
    back out to the file?

    So let's ask that question about the solution you've decided to adopt.
    Suppose your program is at the point where it has written half of the
    altered data back to the file, and the file contains half altered data
    and half original data. Then your program crashes. What are you left
    with?

    The "rewriting a file" issue has been hashed out by many programmers for
    decades. You can either try to come up with your own screwy method, or
    you can adopt an accepted idiom. Within the accepted idiom, modules
    like Tempfile were created to deal with the problem of overwriting an
    existing file name, and it provides a shortcut.

    >truncate: Not available on all platforms.


    That should also be a red flag. Generally, you should strive to write
    cross-platform programs.
     
    7stud --, May 2, 2009
    #17
  18. Adam Bender

    7stud -- Guest

    7stud -- wrote:
    > Adam Bender wrote:
    >> On Fri, May 1, 2009 at 4:33 PM, James Dinkel <> wrote:
    >>
    >>> end
    >>>
    >>> This opens a file for reading and writing, reads the file into an array,
    >>> then you modify the array how you need to. Then the block returns to
    >>> the beginning of the file, writes out your changes over the existing
    >>> file, then chops off what's left. Then, of course, the file is closed
    >>> when the block exits.
    >>>

    >>
    >> This is exactly what I wanted. Thanks everyone.
    >>
    >> Adam

    >
    > I think you guys are missing the point. There are lots of ways to
    > rewrite a file that 'work'. For instance, simply reading a file into an
    > array, closing the file, then opening the file for writing(which erases
    > the file), and then writing the altered lines back to the file 'works'.
    > You can test it yourself and see that it works. You can rewrite the
    > file like that 1,000 times and it will 'work'.
    >
    > However, if your data is important you need to ask yourself the
    > question: what happens if my program crashes while I am writing the data
    > back out to the file?
    >
    > So let's ask that question about the solution you've decided to adopt.
    > Suppose your program is at the point where it has written half of the
    > altered data back to the file, and the file contains half altered data
    > and half original data. Then your program crashes. What are you left
    > with?


    Hmmm...I guess you are left with a file that you can examine and then
    locate the spot where you should begin gsub'ing again.

    You may however run into a problem if the program crashes in the middle
    of a line. Input and output to/from files gets buffered to make things
    more efficient because repeatedly accessing files is relatively slow.
    For instance, when you tell your program to write a line to a file, it
    doesn't actually do that. Instead, programs store the line in a buffer.
    Then when the buffer fills up, the contents of the buffer get written to
    the file in one big chunk. That cuts down on the number of file
    accesses. The same thing happens when you read from a file. You may
    tell your program to read one line from a file, but your program will
    ignore you. Instead, your program will read a chunk of the file and
    store it in a buffer. Then if you request more lines from the file,
    your program will retrieve them from the buffer. That cuts down on the
    number of times your program has to access the file.

    As a result, I think even though you may write one line at a time to the
    file, it's possible the buffer may get written to the file where the
    last thing in the buffer is half a line. Then if your program crashes
    you are going to have a corrupted line in your file.
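
To make the buffering point concrete, here is a small sketch with a hypothetical helper name, `write_lines_durably`. `IO#flush` hands Ruby's own buffer to the operating system, and `IO#fsync` additionally asks the operating system to commit the data to disk; `fsync` raises NotImplementedError on platforms that lack it. Neither call makes a multi-line rewrite atomic. They only narrow the window in which a crash leaves partial data behind.

```ruby
# Hypothetical helper: write the given lines, then push them out of Ruby's
# buffer and ask the operating system to commit them to storage.
def write_lines_durably(path, lines)
  File.open(path, "w") do |file|
    lines.each { |line| file.write(line) } # lands in Ruby's buffer first
    file.flush # hand the buffered bytes to the operating system
    file.fsync # ask the OS to commit them to the storage device
  end
end
```

Setting `file.sync = true` instead would make each write bypass Ruby's buffer, at the cost of more system calls.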
     
    7stud --, May 2, 2009
    #18
  19. On Sat, May 2, 2009 at 4:05 AM, 7stud -- <> wrote:

    > So let's ask that question about the solution you've decided to adopt.
    > Suppose your program is at the point where it has written half of the
    > altered data back to the file, and the file contains half altered data
    > and half original data. Then your program crashes. What are you left
    > with?
    >
    > The "rewriting a file" issue has been hashed out by many programmers for
    > decades. You can either try to come up with your own screwy method, or
    > you can adopt an accepted idiom. Within the accepted idiom, modules
    > like Tempfile were created to deal with the problem of overwriting an
    > existing file name, and it provides a shortcut.


    Thanks for going into the detail here. I threw the atomic save
    solution out there, but didn't give much background, and this
    establishes a much better motivation for it.

    -greg
     
    Gregory Brown, May 2, 2009
    #19