get first and last line from txt file - how?

Discussion in 'Ruby' started by Mmcolli00 Mom, Dec 20, 2008.

  1. I have a txt file with date/time stamps only. I want to grab the first
    date/time and the last date/time. For instance, I need
    08/09/08 3:00 and 08/24/08 3:00 from the queued.txt below. Do you know
    how I can pull these out? Thanks in advance.

    queued.txt
    8/09/08 3:00
    8/10/08 5:00
    8/23/08 22:00
    8/24/08 3:00

    firstDate = ""
    lastDate = ""

    File.open('queued.txt', 'r') do |f1|
      while line = f1.gets
        if f1.lineno == 1 then #<-this would only give me 8/09/08 3:00
          firstDate = line
        end
      end
    end
    --
    Posted via http://www.ruby-forum.com/.
    Mmcolli00 Mom, Dec 20, 2008
    #1
  2. Tim Hunter

    Mmcolli00 Mom wrote:
    > I have txt file with date/time stamps only. I want to grab the first
    > date/time and the last date/time. For instance, I will be needing
    > 08/09/08 3:00 and 08/24/08 3:00 from the below queued.txt. Do you know
    > how I can pull these out? Thanks in advance.
    >
    > queued.txt
    > 8/09/08 3:00
    > 8/10/08 5:00
    > 8/23/08 22:00
    > 8/24/08 3:00
    >
    > firstDate = ""
    > lastDate = ""
    >
    > File.open('queued.txt', 'r') do |f1|
    > while line = f1.gets
    > if f1.lineno == 1 then #<-this would only give me 8/09/08 3:00
    > @@fistDate = f1
    > end
    > end


    lines = IO.readlines("queued.txt")
    first = lines.first
    last = lines.last

    puts first
    puts last


    --
    RMagick: http://rmagick.rubyforge.org/
    Tim Hunter, Dec 20, 2008
    #2
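A streaming variant of the readlines answer above, as a minimal sketch. File.foreach walks the file one line at a time, so only the first line and the current line are held in memory. The method name first_and_last_line is made up for illustration, and the file is still read once from start to end.

```ruby
# Return [first_line, last_line] of the file at path, with trailing
# newlines removed. Returns [nil, nil] for an empty file.
# (Hypothetical helper name, not part of any library.)
def first_and_last_line(path)
  first = nil
  last = nil
  File.foreach(path) do |line|
    first ||= line.chomp   # set only once, on the first line
    last = line.chomp      # overwritten on every line
  end
  [first, last]
end
```

Usage would look like `first_date, last_date = first_and_last_line("queued.txt")`.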
  3. [Note: parts of this message were removed to make it a legal post.]

    I'm just wondering..
    Let's say that we only need to read the last line. Can we do that without
    reading the other lines?

    Regards,
    Yaser Sulaiman
    Yaser Sulaiman, Dec 20, 2008
    #3
  4. Ch Ba

    Yaser Sulaiman wrote:
    > I'm just wondering..
    > Let's say that we only need to read the last line. Can we do that
    > without
    > reading the other lines?
    >
    > Regards,
    > Yaser Sulaiman



    It would work the same? Or do you mean without loading up the entire
    file?

    lines = IO.readlines("foo.bar")

    puts lines.last
    --
    Posted via http://www.ruby-forum.com/.
    Ch Ba, Dec 20, 2008
    #4
  5. [Note: parts of this message were removed to make it a legal post.]

    On Sat, Dec 20, 2008 at 6:54 PM, Ch Ba <> wrote:
    >
    > It would work the same? Or do you mean without loading up the entire
    > file?


    Yep, that is exactly what I mean.
    Yaser Sulaiman, Dec 20, 2008
    #5
  6. Tim Hunter

    Yaser Sulaiman wrote:
    > On Sat, Dec 20, 2008 at 6:54 PM, Ch Ba <> wrote:
    >> It would work the same? Or do you mean without loading up the entire
    >> file?

    >
    > Yep, that is exactly what I mean.
    >


    If you know where the last line starts (that is, the byte offset of the
    first character in the last line) then you could use IO#seek to seek to
    that offset and then read.

    How do you know where the last line starts? When you write the file,
    call IO#tell to get the current byte offset before you write the last line.

    --
    RMagick: http://rmagick.rubyforge.org/
    Tim Hunter, Dec 20, 2008
    #6
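The seek/tell suggestion above could be sketched roughly like this, assuming you control the code that writes the file. Both method names are hypothetical, and IO#pos is used as the equivalent of IO#tell.

```ruby
# Write each entry on its own line and return the byte offset at which
# the last line starts. (Hypothetical helper, for illustration only.)
def write_lines_recording_last_offset(path, lines)
  last_offset = 0
  File.open(path, "wb") do |f|
    lines.each do |line|
      last_offset = f.pos        # same value IO#tell would return
      f.write("#{line}\n")
    end
  end
  last_offset
end

# Jump straight to a known byte offset and read one line from there.
def read_line_at(path, offset)
  File.open(path, "rb") do |f|
    f.seek(offset, IO::SEEK_SET)
    f.gets                       # nil if offset is at end of file
  end
end
```

The offset has to be stored somewhere between the write and the read, which is why this only helps when the writer and the reader cooperate.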
  7. 2008/12/20 Yaser Sulaiman <>:
    > I'm just wondering..
    > Let's say that we only need to read the last line. Can we do that without
    > reading the other lines?


    Yes. Position your file pointer to the last byte in a file, read and
    collect backwards each byte until you find a newline character (or the
    first byte of the file). This is the last line.

    -Thomas




    --
    Thomas Preymesser

    http://thopre.googlepages.com/
    http://thopre.wordpress.com/
    Thomas Preymesser, Dec 20, 2008
    #7
  8. On 20.12.2008 17:46, Thomas Preymesser wrote:
    > 2008/12/20 Yaser Sulaiman <>:
    >> I'm just wondering..
    >> Let's say that we only need to read the last line. Can we do that without
    >> reading the other lines?

    >
    > Yes. Position your file pointer to the last byte in a file, read and
    > collect backwards each byte until you find a newline character (or the
    > first byte of the file). This is the last line.


    You have to admit that this approach is rather inefficient. Here's a
    more efficient variant - especially for large files:

    $ cat r.rb
    #!/usr/bin/env ruby

    OFFSET = 512 # > 2 * assumed avg line length

    file = ARGV.shift or abort "ERROR: need a file name"

    File.open file do |io|
      first = io.gets
      break unless first
      puts first

      limit = io.stat.size
      offset = OFFSET
      lines = []

      while lines.size < 2 && offset <= limit
        io.seek(-offset, IO::SEEK_END)
        lines = io.readlines
        offset += OFFSET
      end # while lines.size < 2

      puts lines.last unless lines.empty?
    end

    Cheers

    robert
    Robert Klemme, Dec 21, 2008
    #8
  9. 2008/12/21 Robert Klemme <>:
    > On 20.12.2008 17:46, Thomas Preymesser wrote:
    >>
    >> 2008/12/20 Yaser Sulaiman <>:
    >>>
    >>> I'm just wondering..
    >>> Let's say that we only need to read the last line. Can we do that without
    >>> reading the other lines?

    >>
    >> Yes. Position your file pointer to the last byte in a file, read and
    >> collect backwards each byte until you find a newline character (or the
    >> first byte of the file). This is the last line.

    >
    > You have to admit that this approach is rather inefficient.


    Really?

    I did a comparison of your code and my idea:

    $ time ruby r.rb input
    111111111111111111111111111111
    999999999999999999999999999999

    real 0m0.053s
    user 0m0.000s
    sys 0m0.004s

    $ time ruby t.rb input
    999999999999999999999999999999

    real 0m0.043s
    user 0m0.004s
    sys 0m0.000s

    The first result is your code, the second is mine.

    I did the tests with a test file with almost 8,000,000 lines.

    My q&d code:

    f = File.open("input")
    pos = 2
    f.seek(-pos, File::SEEK_END)
    c = f.getc
    result = ''
    while c.chr != "\n"
      result.insert(0, c.chr)
      pos += 1
      f.seek(-pos, File::SEEK_END)
      c = f.getc
    end
    f.close

    puts result

    -Thomas

    --
    Thomas Preymesser

    http://thopre.googlepages.com/
    http://thopre.wordpress.com/
    Thomas Preymesser, Dec 21, 2008
    #9
  10. * Yaser Sulaiman <> (2008-12-20) schrieb:

    > I'm just wondering..
    > Let's say that we only need to read the last line. Can we do that without
    > reading the other lines?


    Yes, of course. It's exactly the same problem as reading the first line.
    The only difference is that there is a standard function for the first
    line: gets.

    For the last line you need to implement it yourself.

    If I had mmap in Ruby, I'd just map the file into memory and do
    mapped_file[/^.*\z/].

    mfg, simon .... l
    Simon Krahnke, Dec 21, 2008
    #10
  11. Brian Candler, Dec 21, 2008
    #11
  12. * Brian Candler <> (22:56) schrieb:

    > Why re-invent the wheel?
    >
    > lastline = `tail -1 queued.txt`


    Cause there's not always a tool out there to do the job.

    New programming languages are always reinventing wheels.

    mfg, simon .... l
    Simon Krahnke, Dec 21, 2008
    #12
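For anyone who does want to shell out to tail as suggested above, here is a sketch using Open3 from the standard library. It passes the file name as a separate argument, so the name never goes through a shell. It assumes a POSIX tail is on the PATH; the method name is hypothetical.

```ruby
require "open3"

# Return the last line of the file at path by running `tail -n 1`.
# Assumes a POSIX `tail` binary is available (so not plain Windows).
# (Hypothetical helper name.)
def last_line_via_tail(path)
  # Separate arguments avoid shell interpolation of the file name.
  out, status = Open3.capture2("tail", "-n", "1", path)
  raise "tail failed for #{path}" unless status.success?
  out
end
```

The result keeps its trailing newline, the same as the backtick version; call chomp on it if that is not wanted.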
  13. Chris Shea

    On Dec 20, 7:16 am, Mmcolli00 Mom <> wrote:
    > I have txt file with date/time stamps only. I want to grab the first
    > date/time and the last date/time. For instance, I will be needing
    > 08/09/08 3:00 and 08/24/08 3:00 from the below queued.txt. Do you know
    > how I can pull these out? Thanks in advance.
    >
    > queued.txt
    > 8/09/08 3:00
    > 8/10/08 5:00
    > 8/23/08 22:00
    > 8/24/08 3:00
    >
    > firstDate = ""
    > lastDate = ""
    >
    > File.open('queued.txt', 'r') do |f1|
    >  while line = f1.gets
    >    if f1.lineno ==  1 then #<-this would only give me 8/09/08 3:00
    >     @@fistDate = f1
    >    end
    > end
    > --
    > Posted via http://www.ruby-forum.com/.


    Aside from the suggestions already made for getting just the last
    line, there's also James Gray's Elif: http://elif.rubyforge.org/

    HTH,
    Chris
    Chris Shea, Dec 22, 2008
    #13
  14. Peña, Botp

    From: Thomas Preymesser [mailto:]
    # Really?
    # ....
    # I did the tests with a test file with almost 8,000,000 lines.

    Test with zero or one line first.

    # My q&d code:
    #
    # f=File.open("input")
    # pos = 2
    # f.seek(-pos, File::SEEK_END)
    # c = f.getc
    # result = ''
    # while c.chr != "\n"

    Quick reaction: this is sure to fail on zero-or-one-liners that do
    not end with a newline, no?

    # result.insert(0,c.chr)
    # pos += 1
    # f.seek(-pos, File::SEEK_END)
    # c = f.getc
    # end
    # f.close
    Peña, Botp, Dec 22, 2008
    #14
  15. 2008/12/22 Peña, Botp <>:
    > quick reaction: this would sure to fail on zero-or-one-liners that do
    > not end w a newline, no?

    Maybe, but this was only code to illustrate my idea. In a real
    implementation it would be necessary to consider these circumstances.

    -Thomas

    --
    Thomas Preymesser

    http://thopre.googlepages.com/
    http://thopre.wordpress.com/
    Thomas Preymesser, Dec 22, 2008
    #15
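A sketch of how the byte-by-byte backward scan might handle the edge cases raised above: an empty file, a single line, and a missing trailing newline. This is one possible hardening under those assumptions, not the code from the earlier post, and the method name is made up.

```ruby
# Return the last line of the file at path without its trailing "\n",
# scanning backwards one byte at a time. Returns nil for an empty file.
# Works on raw bytes, so a "\r" from CRLF line endings is left in place.
# (Hypothetical helper name.)
def last_line_bytewise(path)
  newline = 10 # byte value of "\n"
  File.open(path, "rb") do |f|
    pos = f.size
    return nil if pos.zero?

    # Ignore a single trailing newline, if there is one.
    f.seek(pos - 1)
    pos -= 1 if f.getbyte == newline
    line_end = pos

    # Walk backwards until the previous newline or the start of the file.
    while pos > 0
      f.seek(pos - 1)
      break if f.getbyte == newline
      pos -= 1
    end

    f.seek(pos)
    f.read(line_end - pos)
  end
end
```

Seeking before every single byte keeps the logic easy to follow but costs one seek per character of the last line, which is the inefficiency discussed earlier in the thread.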
  16. On 21.12.2008 19:16, Thomas Preymesser wrote:
    > 2008/12/21 Robert Klemme <>:
    >> On 20.12.2008 17:46, Thomas Preymesser wrote:
    >>> 2008/12/20 Yaser Sulaiman <>:
    >>>> I'm just wondering..
    >>>> Let's say that we only need to read the last line. Can we do that without
    >>>> reading the other lines?
    >>> Yes. Position your file pointer to the last byte in a file, read and
    >>> collect backwards each byte until you find a newline character (or the
    >>> first byte of the file). This is the last line.

    >> You have to admit that this approach is rather inefficient.

    >
    > Really?
    >
    > I did a comparision of your code and my idea:
    >
    > $ time ruby r.rb input
    > 111111111111111111111111111111
    > 999999999999999999999999999999
    >
    > real 0m0.053s
    > user 0m0.000s
    > sys 0m0.004s
    >
    > $ time ruby t.rb input
    > 999999999999999999999999999999
    >
    > real 0m0.043s
    > user 0m0.004s
    > sys 0m0.000s
    >
    > the first result is your code, the second is mine.


    Did you make sure that no OS disk buffering distorts this result? I
    suggest including both variants in a single script, executing each
    variant multiple times in a loop and using Benchmark.bmbm.

    > I did the tests with a test file with almost 8,000,000 lines.
    >
    > My q&d code:
    >
    > f=File.open("input")
    > pos = 2


    Starting at the second-to-last byte assumes the file ends with a
    newline. If it does not, the final character of the last line is lost.

    > f.seek(-pos, File::SEEK_END)
    > c = f.getc
    > result = ''
    > while c.chr != "\n"
    > result.insert(0,c.chr)
    > pos += 1
    > f.seek(-pos, File::SEEK_END)
    > c = f.getc
    > end
    > f.close
    >
    > puts result


    Also, this code is not equivalent to mine as it does not output the
    first line - which you can nicely see from the console output shown above.

    Please also keep in mind that my code tries to do some error checking
    which avoids printing the line from a single-line file twice (although
    that bit is slightly flawed, I'll leave that debugging task as an
    exercise for the reader).

    A final remark: using the block form of File.open is always safer.

    Cheers

    robert


    --
    remember.guy do |as, often| as.you_can - without end
    Robert Klemme, Dec 22, 2008
    #16
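The benchmarking advice above could be sketched along these lines: both variants live in one script, each runs many times, and Benchmark.bmbm performs a rehearsal pass before the measured pass so both are timed under similar caching conditions. The two variants here are simplified stand-ins, not the r.rb and t.rb scripts from the thread, and all method names are hypothetical.

```ruby
require "benchmark"

# Variant 1: load every line into memory and keep the last one.
def last_line_readlines(path)
  File.readlines(path).last
end

# Variant 2: read only a fixed-size chunk from the end of the file.
# Assumes the last line is shorter than `chunk` bytes.
def last_line_tail_chunk(path, chunk = 512)
  File.open(path, "rb") do |io|
    io.seek(-[chunk, io.size].min, IO::SEEK_END)
    io.readlines.last
  end
end

# Time both variants `runs` times each; bmbm prints a rehearsal
# table followed by the real measurements.
def compare_variants(path, runs = 100)
  Benchmark.bmbm do |bm|
    bm.report("readlines")  { runs.times { last_line_readlines(path) } }
    bm.report("tail chunk") { runs.times { last_line_tail_chunk(path) } }
  end
end
```

Calling `compare_variants("input")` on a large test file should give more stable numbers than timing two separate `ruby` invocations, since interpreter startup is no longer part of the measurement.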
  17. Marc Heiler

    > Why re-invent the wheel?

    Because your wheel will not work on e.g. Windows without the "tail"
    binary, but the ruby wheel will work wherever ruby works. And I honestly
    think that everything that is possible in ruby, should be done as well.
    The whole modularity of Unix tools has also led to shell scripts, which
    are just plain UGLY and a mess to maintain, especially the more
    complicated they grow (which is only less true for ruby scripts, because
    maintaining even complicated ruby scripts is a lot easier IMHO)

    I personally would rather maintain a collection of ruby or python files,
    than countless shell scripts that use various tools with various
    different syntax rules (awk, sed, grep and so on) to cope with.

    No one will use a wooden wheel to drive in the 24 Hours of Le Mans.

    Use the better wheel.

    Use Ruby.
    --
    Posted via http://www.ruby-forum.com/.
    Marc Heiler, Dec 22, 2008
    #17
  18. James Gray

    On Dec 22, 2008, at 4:11 AM, Marc Heiler wrote:

    >> Why re-invent the wheel?

    >
    > Because your wheel will not work on i.e. Windows without the "tail"
    > binary, but the ruby wheel will work wherever ruby works.


    Definitely have a look at Elif then. It's a tail-like algorithm in
    pure Ruby.

    James Edward Gray II
    James Gray, Dec 22, 2008
    #18
  19. Marc Heiler wrote:
    >> Why re-invent the wheel?

    >
    > Because your wheel will not work on i.e. Windows without the "tail"
    > binary, but the ruby wheel will work wherever ruby works.


    Sure. But if this particular poster is running under Linux, or cygwin,
    or MacOS X, then the `tail` solution is (a) dead quick to write, and (b)
    already highly optimised. As has been pointed out, the algorithm for
    doing tail efficiently is not as easy as it might first appear.

    If the OP is not writing this code for his personal use, but in a
    library which must be as widely portable as possible, then of course a
    pure Ruby solution is going to be beneficial. But in that case, he may
    wish to consider releasing the tail algorithm as a standalone library.

    > The whole modularity of Unix tools has also led to shell scripts, which
    > are just plain UGLY and a mess to maintain, especially the more
    > complicated they grow (which is only less true for ruby scripts, because
    > maintaining even complicated ruby scripts is a lot easier IMHO)


    I agree: fork/exec, argv, env and stdin/stdout are a fairly lousy API,
    but:

    > Use the better wheel.


    The tail wheel in gnu coreutils is a highly polished, aerodynamic and
    tested one.
    --
    Posted via http://www.ruby-forum.com/.
    Brian Candler, Dec 22, 2008
    #19