get first and last line from txt file - how?


Mmcolli00 Mom

I have a text file that contains only date/time stamps. I want to grab
the first date/time and the last date/time. For instance, I need
8/09/08 3:00 and 8/24/08 3:00 from the queued.txt below. Do you know
how I can pull these out? Thanks in advance.

queued.txt
8/09/08 3:00
8/10/08 5:00
8/23/08 22:00
8/24/08 3:00

firstDate = ""
lastDate = ""

File.open('queued.txt', 'r') do |f1|
  while line = f1.gets
    if f1.lineno == 1 then # <- this would only give me 8/09/08 3:00
      firstDate = line
    end
  end
end
 

Tim Hunter


lines = IO.readlines("queued.txt")
first = lines.first
last = lines.last

puts first
puts last
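
For a file too large to slurp comfortably, the same result can be had in one streaming pass. This is only a sketch; the helper name first_and_last is made up here, and it strips the trailing newline from both lines.

```ruby
# Hypothetical helper: returns [first_line, last_line] without holding
# the whole file in memory. Both values are nil for an empty file.
def first_and_last(path)
  first = last = nil
  File.foreach(path) do |line|
    first ||= line.chomp   # set once, on the first line
    last = line.chomp      # overwritten on every line
  end
  [first, last]
end

# Usage (assuming queued.txt exists in the current directory):
#   first, last = first_and_last("queued.txt")
#   puts first
#   puts last
```

It still reads every line, so it saves memory rather than I/O.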
 

Yaser Sulaiman


I'm just wondering..
Let's say that we only need to read the last line. Can we do that without
reading the other lines?

Regards,
Yaser Sulaiman
 

Ch Ba

Yaser said:
I'm just wondering..
Let's say that we only need to read the last line. Can we do that
without
reading the other lines?

Regards,
Yaser Sulaiman


It would work the same? Or do you mean without loading up the entire
file?

lines = IO.readlines("foo.bar")

puts lines.last
 

Yaser Sulaiman


It would work the same? Or do you mean without loading up the entire
file?

Yep, that is exactly what I mean.
 

Tim Hunter

Yaser said:
Yep, that is exactly what I mean.

If you know where the last line starts (that is, the byte offset of the
first character in the last line) then you could use IO#seek to seek to
that offset and then read.

How do you know where the last line starts? When you write the file,
call IO#tell to get the current byte offset before you write the last line.
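
A rough sketch of that idea, assuming you control the code that writes the file. The helper names are invented for illustration; the only library calls involved are IO#tell and IO#seek.

```ruby
# Hypothetical writer: writes each line and remembers the byte offset
# at which the last line begins.
def write_with_last_offset(path, lines)
  last_offset = 0
  File.open(path, "w") do |f|
    lines.each do |line|
      last_offset = f.tell   # byte offset before this line is written
      f.puts line
    end
  end
  last_offset
end

# Hypothetical reader: jumps straight to a recorded offset and reads one line.
def read_line_at(path, offset)
  File.open(path, "r") do |f|
    f.seek(offset, IO::SEEK_SET)
    f.gets
  end
end

# Usage:
#   offset = write_with_last_offset("queued.txt", ["8/09/08 3:00", "8/24/08 3:00"])
#   puts read_line_at("queued.txt", offset)
```

The offset has to be stored somewhere between the write and the read, which is the part this sketch leaves out.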
 

Thomas Preymesser

2008/12/20 Yaser Sulaiman said:
I'm just wondering..
Let's say that we only need to read the last line. Can we do that without
reading the other lines?

Yes. Position your file pointer to the last byte in a file, read and
collect backwards each byte until you find a newline character (or the
first byte of the file). This is the last line.
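
As a sketch of those steps, assuming "\n" line endings and a made-up helper name (last_line_backwards). It skips one trailing newline so that a file ending in "\n" does not report an empty last line:

```ruby
# Hypothetical helper: returns the last line of the file at +path+
# (without its newline), or nil for an empty file.
def last_line_backwards(path)
  File.open(path, "rb") do |f|
    pos = f.size
    return nil if pos.zero?

    # Ignore a single trailing newline, if there is one.
    f.seek(pos - 1)
    pos -= 1 if f.read(1) == "\n"

    bytes = []
    while pos > 0
      f.seek(pos - 1)          # step back one byte
      byte = f.read(1)
      break if byte == "\n"    # newline marks the start of the last line
      bytes.unshift(byte)
      pos -= 1
    end
    bytes.join
  end
end
```

The loop stops either at a newline or at the first byte of the file, so a one-line file works as well.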

-Thomas
 

Robert Klemme

Yes. Position your file pointer to the last byte in a file, read and
collect backwards each byte until you find a newline character (or the
first byte of the file). This is the last line.

You have to admit that this approach is rather inefficient. Here's a
more efficient variant - especially for large files:

$ cat r.rb
#!/bin/env ruby

OFFSET = 512 # > 2 * assumed avg line length

file = ARGV.shift or abort "ERROR: need a file name"

File.open file do |io|
  first = io.gets
  break unless first
  puts first

  limit = io.stat.size
  offset = OFFSET
  lines = []

  while lines.size < 2 && offset <= limit
    io.seek(-offset, IO::SEEK_END)
    lines = io.readlines
    offset += OFFSET
  end # while lines.size < 2

  puts lines.last unless lines.empty?
end

Cheers

robert
 

Thomas Preymesser

2008/12/21 Robert Klemme said:
You have to admit that this approach is rather inefficient.

Really?

I did a comparison of your code and my idea:

$ time ruby r.rb input
111111111111111111111111111111
999999999999999999999999999999

real 0m0.053s
user 0m0.000s
sys 0m0.004s

$ time ruby t.rb input
999999999999999999999999999999

real 0m0.043s
user 0m0.004s
sys 0m0.000s

The first result is your code, the second is mine.

I did the tests with a test file with almost 8,000,000 lines.

My q&d code:

f = File.open("input")
pos = 2
f.seek(-pos, File::SEEK_END)
c = f.getc
result = ''
while c.chr != "\n"
  result.insert(0, c.chr)
  pos += 1
  f.seek(-pos, File::SEEK_END)
  c = f.getc
end
f.close

puts result

-Thomas
 

Simon Krahnke

* Yaser Sulaiman said:
I'm just wondering..
Let's say that we only need to read the last line. Can we do that without
reading the other lines?

Yes, of course. It's exactly the same problem as reading the first line.
The only difference is that there is a standard function for the first
line: gets.

For the last line you need to implement it yourself.

If I had mmap in Ruby, I'd just map the file into memory and do
mapped_file[/^.*\z/].

mfg, simon .... l
 

Simon Krahnke

* Brian Candler said:
Why re-invent the wheel?

lastline = `tail -1 queued.txt`

Cause there's not always a tool out there to do the job.

New programming languages are always reinventing wheels.

mfg, simon .... l
 

Chris Shea

Aside from the suggestions already made for getting just the last
line, there's also James Gray's Elif: http://elif.rubyforge.org/

HTH,
Chris
 

Peña, Botp

From: Thomas Preymesser
# Really?
# ....
# I did the tests with a test file with almost 8,000,000 lines.

Test with a zero- or one-line file first.

# My q&d code:
#
# f=File.open("input")
# pos = 2
# f.seek(-pos, File::SEEK_END)
# c = f.getc
# result = ''
# while c.chr != "\n"

Quick reaction: this is sure to fail on zero- or one-line files that do
not end with a newline, no?

# result.insert(0,c.chr)
# pos += 1
# f.seek(-pos, File::SEEK_END)
# c = f.getc
# end
# f.close
 

Robert Klemme

Really?

I did a comparison of your code and my idea:

$ time ruby r.rb input
111111111111111111111111111111
999999999999999999999999999999

real 0m0.053s
user 0m0.000s
sys 0m0.004s

$ time ruby t.rb input
999999999999999999999999999999

real 0m0.043s
user 0m0.004s
sys 0m0.000s

the first result is your code, the second is mine.

Did you make sure that no OS disk buffering distorts this result? I
suggest including both variants in a single script, executing each
variant multiple times in a loop, and using Benchmark#bmbm.
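
A sketch of such a script, under stated assumptions: the two methods are simplified stand-ins rather than the code posted in this thread, the test file is generated on the fly, and the line and repetition counts are arbitrary. Benchmark.bmbm runs a rehearsal pass before the measured pass, which helps keep caching effects from favoring whichever variant runs second.

```ruby
require "benchmark"
require "tempfile"

# Stand-in for the whole-file approach.
def last_via_readlines(path)
  File.readlines(path).last
end

# Stand-in for a seek-based approach: read only the final 512 bytes
# (or the whole file if it is shorter than that).
def last_via_seek(path)
  File.open(path, "rb") do |f|
    f.seek([f.size - 512, 0].max)
    f.read.lines.last
  end
end

Tempfile.create("input") do |tmp|
  20_000.times { |i| tmp.puts format("%030d", i) }
  tmp.flush

  Benchmark.bmbm do |x|
    x.report("readlines") { 20.times { last_via_readlines(tmp.path) } }
    x.report("seek tail") { 20.times { last_via_seek(tmp.path) } }
  end
end
```
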
I did the tests with a test file with almost 8,000,000 lines.

My q&d code:

f=File.open("input")
pos = 2

This opens the door for character loss: starting at pos = 2 skips the
final byte, so if the file does not end with a newline, the last
character of the last line is dropped.
f.seek(-pos, File::SEEK_END)
c = f.getc
result = ''
while c.chr != "\n"
result.insert(0,c.chr)
pos += 1
f.seek(-pos, File::SEEK_END)
c = f.getc
end
f.close

puts result

Also, this code is not equivalent to mine as it does not output the
first line - which you can nicely see from the console output shown above.

Please also keep in mind that my code tries to do some error checking
which avoids printing the line from a single-line file twice (although
that bit is slightly flawed; I'll leave that debugging task as an
exercise for the reader).
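
One possible take on that exercise, as a sketch rather than the fix the author had in mind: clamp the offset to the file size so that short files are still read, and report no last line when the file has only one. TAIL_CHUNK and first_and_last_lines are names invented here.

```ruby
TAIL_CHUNK = 512 # bytes to step back per attempt; an arbitrary guess

# Hypothetical helper: returns [first_line, last_line]. last_line is nil
# when the file has fewer than two lines, so nothing gets printed twice.
def first_and_last_lines(path)
  File.open(path, "rb") do |io|
    first = io.gets
    return [nil, nil] unless first

    size = io.size
    offset = 0
    lines = []
    # Widen the window until it holds two line starts or covers the file.
    until lines.size >= 2 || offset >= size
      offset = [offset + TAIL_CHUNK, size].min  # clamp for small files
      io.seek(-offset, IO::SEEK_END)
      lines = io.readlines
    end

    single_line = (offset >= size && lines.size < 2)
    [first, single_line ? nil : lines.last]
  end
end

# Usage:
#   first, last = first_and_last_lines(ARGV.first)
#   puts first if first
#   puts last if last
```

Once the window holds at least two entries, the final one is guaranteed to be a complete line even if the first entry is a fragment.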

A final remark: using the block form of File.open is always safer.

Cheers

robert
 

Marc Heiler

Why re-invent the wheel?

Because your wheel will not work on, e.g., Windows without the "tail"
binary, but the Ruby wheel will work wherever Ruby works. And I honestly
think that everything that is possible in Ruby should be done in Ruby.
The whole modularity of Unix tools has also led to shell scripts, which
are just plain UGLY and a mess to maintain, especially the more
complicated they grow (which is less true for Ruby scripts, because
maintaining even complicated Ruby scripts is a lot easier IMHO).

I personally would rather maintain a collection of ruby or python files,
than countless shell scripts that use various tools with various
different syntax rules (awk, sed, grep and so on) to cope with.

No one will use a wooden wheel to drive in the 24 Hours of Le Mans.

Use the better wheel.

Use Ruby.
 

James Gray

Because your wheel will not work on, e.g., Windows without the "tail"
binary, but the Ruby wheel will work wherever Ruby works.

Definitely have a look at Elif then. It's a tail-like algorithm in
pure Ruby.

James Edward Gray II
 

Brian Candler

Marc said:
Because your wheel will not work on, e.g., Windows without the "tail"
binary, but the Ruby wheel will work wherever Ruby works.

Sure. But if this particular poster is running under Linux, or cygwin,
or MacOS X, then the `tail` solution is (a) dead quick to write, and (b)
already highly optimised. As has been pointed out, the algorithm for
doing tail efficiently is not as easy as it might first appear.

If the OP is not writing this code for his personal use, but in a
library which must be as widely portable as possible, then of course a
pure Ruby solution is going to be beneficial. But in that case, he may
wish to consider releasing the tail algorithm as a standalone library.
The whole modularity of Unix tools has also led to shell scripts, which
are just plain UGLY and a mess to maintain, especially the more
complicated they grow (which is only less true for ruby scripts, because
maintaining even complicated ruby scripts is a lot easier IMHO)

I agree: fork/exec, argv, env and stdin/stdout are a fairly lousy API,
but:
Use the better wheel.

The tail wheel in gnu coreutils is a highly polished, aerodynamic and
tested one.
 
