Parsing a file with look ahead

S. Robert James

I need to parse a file line by line, and output the results line by
line (too big to fit into memory). So far, simple enough:
file.each_line.

However, the parser needs the ability to peek ahead to the next line,
in order to parse this line. What's the right way to do this? Again,
I really don't want to try to slurp the whole file into memory and
split on newlines.

Here's an example:
Line1: Hi
Line2: How
Line3: Are
Line4: you?

I'd like to:
parse('Hi', 'How')
parse('How', 'Are')
parse('Are', 'you?')
parse('you?', false)
# hey, this is practically a unit test!

Any ideas?
 
Carl Lerche

The first thing I can think of is to use file.each_line and store each
line in a previous_line variable at the end of the block. Then you have
access to both the line that was read beforehand and the current one.
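
A minimal sketch of that idea, assuming the parser can run one line behind the reader. The helper name each_with_next is made up for illustration, and a StringIO stands in for the real file:

```ruby
require 'stringio'

# Hypothetical helper: yields (line, next_line) pairs and passes false
# as next_line once the last line has been reached.
def each_with_next(io)
  previous = nil
  io.each_line do |line|
    line = line.chomp
    # The previous line can only be handled now that its successor is known.
    yield previous, line unless previous.nil?
    previous = line
  end
  # Flush the final line, which has no successor.
  yield previous, false unless previous.nil?
end

# StringIO stands in for File.open(filename) here.
input = StringIO.new("Hi\nHow\nAre\nyou?\n")
pairs = []
each_with_next(input) { |current, following| pairs << [current, following] }
p pairs
```

Only two lines are held at any time, so the file is never slurped into memory.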
 
Florian Frank

S. Robert James said:
I'd like to:
parse('Hi', 'How')
parse('How', 'Are')
parse('Are', 'you?')
parse('you?', false)
# hey, this is practically a unit test!

Any ideas?

require 'enumerator'

# Read the file two lines at a time; second is nil if the line count is odd.
File.new(filename).enum_slice(2).each do |first, second|
  p [ first, second ? second : false ]
end
 
S. Robert James

Thanks! BTW, looking at the Rdoc, it seems each_cons is what I want,
no?

 
Gregory Brown

S. Robert James said:
Thanks! BTW, looking at the Rdoc, it seems each_cons is what I want,
no?

If you are dealing with paired lines, use enum_slice(2).

If you are dealing with data dependent on the current and previous
line, use each_cons, yes.
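
To make the difference concrete, here is a small sketch on the four example lines. It assumes a current Ruby, where each_slice (called without a block) plays the role of enum_slice and no require is needed:

```ruby
lines = %w[Hi How Are you?]

# Non-overlapping pairs: each line shows up in exactly one pair.
slices = lines.each_slice(2).to_a

# Sliding window: each pair overlaps the previous one by one line.
cons = lines.each_cons(2).to_a

p slices
p cons
```

Here slices holds the two pairs Hi/How and Are/you?, while cons holds the three overlapping pairs Hi/How, How/Are and Are/you?.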
 
Daniel DeLorme

Gregory said:
If you are dealing with paired lines, use enum_slice(2)

if you are dealing with data dependent on the current and previous
line, use each_cons, yes.

Except each_cons(2) will iterate only 9 times if you have 10 lines, so the
last line never comes up as the current line.

Maybe something simple like this?

line = f.gets
while line
  nextline = f.gets   # nil once the end of the file is reached
  # do stuff with line and nextline...
  line = nextline
end

Daniel
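
The each_cons shortfall can be checked directly, and one possible workaround (not something proposed in the thread) is to append a sentinel after the last line so that each_cons(2) yields one pair per line. In this sketch the names with_sentinel and pairs are illustrative, and a StringIO stands in for the file:

```ruby
require 'stringio'

# Four lines give only three windows of size two.
short = %w[Hi How Are you?].each_cons(2).count

io = StringIO.new("Hi\nHow\nAre\nyou?\n")   # stands in for File.open(filename)

# Wrap the line iteration in an Enumerator that emits a trailing false,
# so the final line also gets a window of its own.
with_sentinel = Enumerator.new do |yielder|
  io.each_line { |line| yielder << line.chomp }
  yielder << false
end

pairs = []
with_sentinel.each_cons(2) { |current, following| pairs << [current, following] }

p short
p pairs
```

Lines are still read one at a time; only the current two-element window is kept.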
 
Thomas Hafner

S. Robert James said:
I need to parse a file line by line, and output the results line by
line (too big to fit into memory). So far, simple enough:
file.each_line.

However, the parser needs the ability to peek ahead to the next line,
in order to parse this line. What's the right way to do this? Again,
I really don't want to try to slurp the whole file into memory and
split on newlines.

Sounds to me like it could be solved elegantly with a lazy stream of
input lines. For lazy streams see the Usenet thread starting with
article <[email protected]>, for instance.

The file will be split into lines, but lazily, and for that reason all
the lines don't need to be held in memory at the same time. Old, i.e.
already consumed, lines will be garbage collected soon, because the
application no longer references them. You can have as many
lookahead lines as you want (tradeoff: needs more memory, of course).

Regards
Thomas
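
A rough sketch of lazy lookahead, using Ruby's built-in Enumerator with next and peek rather than the stream implementation from the Usenet thread mentioned above. It assumes Ruby 1.9 or later, and a StringIO stands in for the file:

```ruby
require 'stringio'

io = StringIO.new("Hi\nHow\nAre\nyou?\n")   # stands in for File.open(filename)
lines = io.each_line   # an Enumerator; lines are pulled on demand by next/peek

pairs = []
loop do
  # next raises StopIteration at end of input, which loop rescues to finish.
  current = lines.next.chomp
  following = begin
    lines.peek.chomp   # look at the upcoming line without consuming it
  rescue StopIteration
    false              # no line follows the last one
  end
  pairs << [current, following]
end

p pairs
```

Only one line of lookahead is available through peek; deeper lookahead would need a small buffer on top of this.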
 
