Sorting a logfile, how would you write it?

Discussion in 'Ruby' started by Frank Meyer, Aug 10, 2007.

  1. Frank Meyer

    I've written a little Ruby program that sorts log files in the
    following format:

    4.text text text
    1.text text text
    2.text text text
    10.text text text
    2.text2 text2 text2

    The file is given as a command line parameter and after sorting the
    entries it writes them back into this file.

    The program is in the attachment.


    What I want to know is: how would you write such a tool in Ruby? I'm
    asking because I'm still learning Ruby and I want to learn how to
    do it in Ruby (and its design principles).


    Thank you!



    Turing

    Attachments:
    http://www.ruby-forum.com/attachment/86/test.rb

    --
    Posted via http://www.ruby-forum.com/.
    Frank Meyer, Aug 10, 2007
    #1

  2. On Aug 10, 1:29 pm, Frank Meyer <> wrote:
    > I've written a little ruby program which can sort logfiles with the
    > following format:
    >
    > 4.text text text
    > 1.text text text
    > 2.text text text
    > 10.text text text
    > 2.text2 text2 text2
    >
    > The file is given as a command line parameter and after sorting the
    > entries it writes them back into this file.



    File.open( ARGV.first, "r+" ){|file|
      array = file.readlines
      file.rewind
      file.truncate(0)
      file.puts array.sort_by{|s| s[/^\d+/].to_i }
    }
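The core of William's version is the sort key. As a minimal sketch, here it is pulled out into a method (the name `sort_log_lines` is hypothetical) so it can be tried on an array without touching a file. One detail worth knowing: a line that does not start with digits gets key 0, because `String#[]` returns `nil` when the regexp does not match and `nil.to_i` is 0.

```ruby
# Hypothetical helper: order log lines by the integer at the very start
# of each line (\A anchors at the start of the string).
# Lines without a leading number get key 0 (nil.to_i == 0), so they sort first.
def sort_log_lines(lines)
  lines.sort_by { |line| line[/\A\d+/].to_i }
end

lines = ["4.text text text\n", "1.text text text\n", "10.text text text\n"]
puts sort_log_lines(lines)
```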
    William James, Aug 10, 2007
    #2

  3. Ryan Davis

    On Aug 10, 2007, at 13:54 , William James wrote:

    > On Aug 10, 1:29 pm, Frank Meyer <> wrote:
    >> I've written a little ruby program which can sort logfiles with the
    >> following format:
    >>
    >> 4.text text text
    >> 1.text text text
    >> 2.text text text
    >> 10.text text text
    >> 2.text2 text2 text2

    > ...
    > File.open( ARGV.first, "r+" ){|file|
    > array = file.readlines
    > file.rewind
    > file.truncate(0)
    > file.puts array.sort_by{|s| s[/^\d+/].to_i }
    > }


    Your version takes a lot of memory, is slow, and doesn't properly
    sort on the content of the line, just on the number. Swap the two "2."
    lines and you'll see what I mean. Using the right tool for the job
    (`sort`) does wonders:

    % ruby -e 'n = 1_000_000; File.open("blah.txt", "w") { |f| n.times { m = rand 5; f.puts "#{rand n}. file#{m} file#{m} file#{m}" } }'
    % cp blah.txt blah2.txt
    % time ruby -e 'File.open( ARGV.first, "r+" ) { |file| array = file.readlines; file.rewind; file.truncate(0); file.puts array.sort_by {|s| s[/^\d+/].to_i } }' blah.txt
    real 0m8.182s ...
    % time ruby -e 'path = ARGV.shift; system %(sort -n "#{path}" > "#{path}.tmp"); File.rename "#{path}.tmp", path' blah2.txt
    real 0m3.175s ...
    % cmp blah.txt blah2.txt
    blah.txt blah2.txt differ: char 50, line 3
    % head blah.txt blah2.txt
    ==> blah.txt <==
    3. file4 file4 file4
    4. file4 file4 file4
    6. file3 file3 file3
    6. file1 file1 file1
    6. file0 file0 file0
    7. file0 file0 file0
    7. file4 file4 file4
    8. file1 file1 file1
    8. file3 file3 file3
    8. file3 file3 file3

    ==> blah2.txt <==
    3. file4 file4 file4
    4. file4 file4 file4
    6. file0 file0 file0
    6. file1 file1 file1
    6. file3 file3 file3
    7. file0 file0 file0
    7. file4 file4 file4
    8. file1 file1 file1
    8. file3 file3 file3
    8. file3 file3 file3
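If equal numbers should be ordered by the rest of the line, as in the `sort -n` output above (GNU `sort` compares whole lines as a last resort when keys are equal), one possible sketch is a two-element sort key. This assumes byte-wise comparison, which matches `sort` only under the C locale; the method name `sort_like_sort_n` is hypothetical.

```ruby
# Sort by the leading number first; when two numbers are equal, compare
# the whole line. Arrays compare element by element, so
# [6, "6. file0 ..."] sorts before [6, "6. file3 ..."].
def sort_like_sort_n(lines)
  lines.sort_by { |line| [line[/\A\d+/].to_i, line] }
end

lines = [
  "6. file3 file3 file3\n",
  "6. file1 file1 file1\n",
  "6. file0 file0 file0\n",
  "3. file4 file4 file4\n"
]
puts sort_like_sort_n(lines)
```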
    Ryan Davis, Aug 11, 2007
    #3
  4. On 11.08.2007 06:19, Ryan Davis wrote:
    > your version takes a lot of memory, is slow, and doesn't properly sort
    > the content of the line, just the number. swap the two "2." lines and
    > you'll see what I mean. Using the right tool for the job (`sort`) does
    > wonders:


    It's a one-liner:

    ruby -i.bak -e 'puts ARGF.readlines.sort_by {|l| l[/^\d+/].to_i}' file

    Less memory usage:

    ruby -i.bak -e 'puts ARGF.readlines.sort! {|a,b| a[/^\d+/].to_i <=> b[/^\d+/].to_i}' file
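To make the difference between the two one-liners concrete, here is a sketch on an in-memory array (the helper `leading_number` is a hypothetical name). `sort_by` computes each line's key once and keeps all of the keys around for the whole sort, returning a new array; `sort!` with a block recomputes both keys inside every comparison and reorders the receiver itself.

```ruby
# Hypothetical helper: the integer at the start of a log line (0 if none).
def leading_number(line)
  line[/\A\d+/].to_i
end

lines = ["4.text text text\n", "1.text text text\n", "2.text text text\n", "10.text text text\n"]

# sort_by: each key is computed once and held until the sort finishes;
# the result is a new array.
by_key = lines.sort_by { |l| leading_number(l) }

# sort! with a block: both keys are recomputed on every comparison, and
# the receiver is sorted in place instead of being copied.
in_place = lines.dup
in_place.sort! { |a, b| leading_number(a) <=> leading_number(b) }

puts by_key == in_place
```

With distinct keys both give the same order. With equal keys, Ruby documents neither `sort` nor `sort_by` as stable, so the relative order of equal-numbered lines may differ between the two.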

    Kind regards

    robert
    Robert Klemme, Aug 11, 2007
    #4
  5. On Aug 10, 11:19 pm, Ryan Davis <> wrote:
    > On Aug 10, 2007, at 13:54 , William James wrote:
    > > On Aug 10, 1:29 pm, Frank Meyer <> wrote:
    > >> I've written a little ruby program which can sort logfiles with the
    > >> following format:

    >
    > >> 4.text text text
    > >> 1.text text text
    > >> 2.text text text
    > >> 10.text text text
    > >> 2.text2 text2 text2

    > > ...
    > > File.open( ARGV.first, "r+" ){|file|
    > > array = file.readlines
    > > file.rewind
    > > file.truncate(0)
    > > file.puts array.sort_by{|s| s[/^\d+/].to_i }
    > > }

    >
    > your version takes a lot of memory,


    Wrong.

    When the number of lines to sort is small,
    it uses a small amount of memory.
    When the number of lines to sort is medium,
    it uses a medium amount of memory.
    When the number of lines to sort is large,
    it uses a large amount of memory.

    > is slow,


    Everything is relative. If its speed is compared to the
    speed of other versions written in scripting languages, it
    is not slow.

    > and doesn't properly
    > sort the content of the line,


    Wrong.

    Looking at the source code of the original poster immediately
    reveals that he wants to sort only on the number at the
    beginning of the line.
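If, as William reads it, only the number matters, the open question is what happens to lines with the same number, since Ruby's `sort_by` is not guaranteed to be stable. A minimal sketch that keeps such lines in their original input order uses each line's index as a secondary key; `stable_sort_log_lines` is a hypothetical name.

```ruby
# Sort on the leading number only. Lines with equal numbers keep their
# original input order (a stable sort) because the line's index is used
# as a secondary key.
def stable_sort_log_lines(lines)
  lines.each_with_index
       .sort_by { |line, index| [line[/\A\d+/].to_i, index] }
       .map(&:first)
end

lines = [
  "4.text text text\n",
  "2.text2 text2 text2\n",
  "1.text text text\n",
  "2.text text text\n"
]
puts stable_sort_log_lines(lines)
```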


    > just the number. swap the two "2."
    > lines and you'll see what I mean. Using the right tool for the job
    > (`sort`) does wonders:
    >
    > % ruby -e 'n = 1_000_000; File.open("blah.txt", "w") { |f| n.times { m = rand 5; f.puts "#{rand n}. file#{m} file#{m} file#{m}" } }'
    > % cp blah.txt blah2.txt
    > % time ruby -e 'File.open( ARGV.first, "r+" ) { |file| array = file.readlines; file.rewind; file.truncate(0); file.puts array.sort_by {|s| s[/^\d+/].to_i } }' blah.txt
    > real 0m8.182s ...
    > % time ruby -e 'path = ARGV.shift; system %(sort -n "#{path}" > "#{path}.tmp"); File.rename "#{path}.tmp", path' blah2.txt


    Wrong.

    The original poster stated:

    > The file is given as a command line parameter and after sorting the
    > entries it writes them back into this file.


    Your code makes no attempt to write to the original file; it uses
    a temporary file.

    Furthermore, your solution won't even run:

    E:\Ruby>ruby -e 'path = ARGV.shift; system %(sort -n "#{path}" > "#{path}.tmp"); File.rename "#{path}.tmp", path' data

    -e:1: unterminated string meets end of file

    If your code is put in a file ...

    E:\Ruby>ruby try.rb data
    Input file specified two times.

    ... it still won't work.

    Perhaps your attempt at a solution requires Unix, and you,
    in your ignorance, or your thoughtlessness, or your
    ignorance and your thoughtlessness, assumed that every
    user of Ruby is a user of Unix.
    William James, Aug 11, 2007
    #5
  6. On Aug 11, 7:15 am, Robert Klemme <> wrote:
    > It's a one liner:
    >
    > ruby -i.bak -e 'puts ARGF.readlines.sort_by {|l| l[/^\d+/].to_i}' file


    It's my understanding that when you use -i, a temporary file
    is created, the original file is deleted, and the temporary
    file is renamed. Doesn't this cause unnecessary disk
    fragmentation?

    >
    > Less memory usage:
    >
    > ruby -i.bak -e 'puts ARGF.readlines.sort! {|a,b| a[/^\d+/].to_i <=>
    > b[/^\d+/].to_i}' file


    Of course, you're trading speed for memory.

    >
    > Kind regards
    >
    > robert
    William James, Aug 11, 2007
    #6
  7. Eric Hodel

    On Aug 11, 2007, at 15:25, William James wrote:
    > On Aug 10, 11:19 pm, Ryan Davis <> wrote:
    >> On Aug 10, 2007, at 13:54 , William James wrote:
    >>> File.open( ARGV.first, "r+" ){|file|
    >>> array = file.readlines
    >>> file.rewind
    >>> file.truncate(0)
    >>> file.puts array.sort_by{|s| s[/^\d+/].to_i }
    >>> }

    >>
    >> your version takes a lot of memory,

    >
    > Wrong.


    This method uses at least 2x the file size worth of memory. That's a
    lot.

    > E:\Ruby>ruby -e 'path = ARGV.shift; system %(sort -n "#{path}" > "#{path}.tmp"); File.rename "#{path}.tmp", path' data

    > -e:1: unterminated string meets end of file


    I just ran it; it worked fine.

    You'll probably have to redo the quoting for a non-Bourne-compatible
    shell.
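One way to avoid shell quoting differences altogether is to hand `sort` its arguments as separate strings; `Kernel#system` then runs the program directly without a shell. This sketch assumes a POSIX `sort` on the PATH (on Windows that means something like the MSYS tools, not the native `sort.exe`, which takes different options) and relies on `-o`, which POSIX allows to name one of the input files. The method name `external_sort_n!` is hypothetical.

```ruby
# Sort a file numerically in place with the external `sort` utility.
# Passing the command and each argument as separate strings makes
# Kernel#system run `sort` directly, with no shell, so nothing needs
# quoting even if the path contains spaces or quotes.
# LC_ALL=C makes the tie-breaking order byte-wise and locale independent.
def external_sort_n!(path)
  ok = system({ "LC_ALL" => "C" }, "sort", "-n", "-o", path, path)
  raise "sort failed for #{path}" unless ok
end

external_sort_n!(ARGV.first) if ARGV.first
```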

    > Perhaps your attempt at a solution requires Unix, and you,
    > in your ignorance, or your thoughtlessness, or your
    > ignorance and your thoughtlessness, assumed that every
    > user of Ruby is a user of Unix.


    Please try to flame harder. This one just made me chuckle.

    --
    Poor workers blame their tools. Good workers build better tools. The
    best workers get their tools to do the work for them. -- Syndicate Wars
    Eric Hodel, Aug 12, 2007
    #7
  8. Eric Hodel

    On Aug 11, 2007, at 15:30, William James wrote:
    > On Aug 11, 7:15 am, Robert Klemme <> wrote:
    >> It's a one liner:
    >>
    >> ruby -i.bak -e 'puts ARGF.readlines.sort_by {|l| l[/^\d+/].to_i}'
    >> file

    >
    > It's my understanding that when you use -i, a temporary file
    > is created, the original file is deleted, and the temporary
    > file is renamed. Doesn't this cause unnecessary disk
    > fragmentation?


    If I had a filesystem where I had to worry about fragmentation, I
    wouldn't care. The amount of time spent figuring out some best way
    to "fix" it is going to be more than the time running a defragmenter
    will take.

    Eric Hodel, Aug 12, 2007
    #8
  9. On 8/11/07, Eric Hodel <> wrote:

    > > Perhaps your attempt at a solution requires Unix, and you,
    > > in your ignorance, or your thoughtlessness, or your
    > > ignorance and your thoughtlessness, assumed that every
    > > user of Ruby is a user of Unix.

    >
    > Please try to flame harder. This one just made me chuckle.


    Me too. Besides, `sort` is still the right tool :)

    http://www.mingw.org/msys.shtml
    Gregory Brown, Aug 12, 2007
    #9
  10. On Aug 11, 6:16 pm, Andrew Savige <> wrote:
    > --- William James <> wrote:
    >
    > > It's my understanding that when you use -i, a temporary file
    > > is created, the original file is deleted, and the temporary
    > > file is renamed. Doesn't this cause unnecessary disk
    > > fragmentation?

    >
    > To do this safely you'll need a temporary file.
    > Slurping a file into memory, sorting it, then writing it back to the same
    > file is an unsound practice, i.e. not "rerunnable-safe". Suppose, for
    > example, you suffer a power failure half-way through writing back the file,
    > or the write fails due to "disk full" or "user disk quota exceeded" or for
    > any other reason. Oops, you've just corrupted your input file.


    Of course. But I'm willing to take that minuscule chance when
    I'm doing a write to a small file that takes a fraction of a
    second.

    The question remains: doesn't using a temp file cause more
    disk fragmentation than writing directly to the original file?
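Andrew's concern is usually handled with write-then-rename: write the sorted lines to a temporary file next to the original and replace the original only after that write has completely succeeded, so an interruption leaves the original intact until the final rename. A minimal sketch, assuming a POSIX filesystem (where `rename` within one filesystem replaces the target atomically); the method name `sort_log_file_safely` and the `.tmp` suffix are hypothetical.

```ruby
# Sort a log file by leading number without ever leaving a half-written
# original: the sorted lines go to a temporary sibling file first, and the
# original is replaced only after that write has succeeded.
def sort_log_file_safely(path)
  sorted = File.readlines(path).sort_by { |line| line[/\A\d+/].to_i }

  tmp_path = "#{path}.tmp" # hypothetical name; same directory => same filesystem
  File.open(tmp_path, "w") do |tmp|
    sorted.each { |line| tmp.puts(line) } # puts adds a newline only if missing
    tmp.fsync # flush Ruby's buffer and ask the OS to write the data to disk
  end
  File.rename(tmp_path, path) # atomic replacement on POSIX filesystems
rescue StandardError
  # The original is still untouched here; clean up the partial temp file.
  File.delete(tmp_path) if tmp_path && File.exist?(tmp_path)
  raise
end

sort_log_file_safely(ARGV.first) if ARGV.first
```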
    William James, Aug 12, 2007
    #10
  11. On 12.08.2007 00:27, William James wrote:
    > On Aug 11, 7:15 am, Robert Klemme <> wrote:
    >> It's a one liner:
    >>
    >> ruby -i.bak -e 'puts ARGF.readlines.sort_by {|l| l[/^\d+/].to_i}' file

    >
    > It's my understanding that when you use -i, a temporary file
    > is created, the original file is deleted, and the temporary
    > file is renamed.


    Correct.

    > Doesn't this cause unnecessary disk
    > fragmentation?


    Huh? Are you still on MS-DOS? I haven't heard anyone worry about disk
    fragmentation in ages. I don't think this is an issue for any
    modern file system.

    >> Less memory usage:
    >>
    >> ruby -i.bak -e 'puts ARGF.readlines.sort! {|a,b| a[/^\d+/].to_i <=>
    >> b[/^\d+/].to_i}' file

    >
    > Of course, you're trading speed for memory.


    Where exactly do you see that trade-off? I was trading elegance for
    memory. Sure, there are effects that could make one or the other
    solution faster, but if I were really worried about speed, I'd
    use "sort" anyway.
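Whether the `sort!` version really trades speed for memory can be measured rather than argued. A sketch using the standard `benchmark` library on synthetic lines shaped like Ryan's test file; the line count is arbitrary and no particular timings are implied.

```ruby
require "benchmark"

# Synthetic lines shaped like Ryan's test data: "<number>. fileN fileN fileN".
n = 50_000
lines = Array.new(n) { m = rand(5); "#{rand(n)}. file#{m} file#{m} file#{m}\n" }

# The sort key used throughout the thread: the leading integer.
key = ->(line) { line[/\A\d+/].to_i }

by_key = nil
in_place = lines.dup
Benchmark.bm(10) do |bm|
  bm.report("sort_by") { by_key = lines.sort_by(&key) }
  bm.report("sort!") { in_place.sort! { |a, b| key.(a) <=> key.(b) } }
end

# Both results agree on the key sequence, even if equal-keyed lines differ.
puts by_key.map(&key) == in_place.map(&key)
```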

    Kind regards

    robert
    Robert Klemme, Aug 12, 2007
    #11
  12. Frank Meyer

    Thanks for all your suggestions; they helped me learn a lot more about
    Ruby's library. I didn't know there were so many handy functions :)


    As for the temporary file: I'm using the tool only for private purposes, and
    I didn't want to bother with creating a temporary file in my first
    attempt at writing a Ruby program to sort these log files.


    Thank you all!



    Turing
    Frank Meyer, Aug 12, 2007
    #12
  13. On 8/12/07, Andrew Savige <> wrote:
    > --- William James <> wrote:
    > > Of course. But I'm willing to take that minuscule chance when
    > > I'm doing a write to a small file that takes a fraction of a
    > > second.

    >
    > That may be an acceptable risk for a program written for private use.
    > Not so for a production program. After all, impatient users often
    > press [CTRL-C] in my experience, and that could cause corruption
    > if it occurred while the file was being rewritten.


    You can of course capture that, but you're right that it's creating
    additional, unnecessary work.
    Gregory Brown, Aug 12, 2007
    #13