Text file parsing in Ruby

Paul van Delst

Hello,

As I use Ruby more and more for things, I find myself creating "Config" classes, filling
them with data read from a simple text file, and then passing instances of config around
to do all the work. What I would like to get some advice on, or links to, is Ruby-ish
methods of reading/parsing text files.

A lot of text files have, for example, some sort of header that says how much data is
coming, followed by the data itself, e.g.

Number of data points: 5
1 2
3 4
5 6
7 8
9 0
Number of data points: 2
10 20
11 21
Number of data points: 20
1 2
2 3
...etc..

Or, svn log output where the header line says how many lines of log message follow.

I find I'm struggling to figure out a tidy way to read these sorts of files. If, for
example, I iterate over the lines,

IO.readlines(file_name).each do |line|
  # ...parse the line
end

How do I take advantage of the fact that the "header" line tells me how much actual data
follows before the next header? I.e. I discover that I need to read 5 points, so I read 5
points, and the next line that is parsed in the above iteration is the next header line.
Sort of short-circuiting the iteration.

The solution I've come up with so far is to use "sentinel" values that flag what is to
come, but it's yuckily kludgy. Any tips from the 'sperts?

Apologies if this is a CS101 type of question.

cheers,

paulv
 
ara.t.howard

YAML is your good friend:

harp:~ > cat a.rb
require 'yaml'
points = YAML.load(IO.read('points.yml'))
p points.size
points.each{|point| p point}

harp:~ > cat points.yml
---
- [10, 20]
- [11, 21]

harp:~ > ruby a.rb
2
[10, 20]
[11, 21]
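
The same idea stretches to several data sets in one file. As a rough sketch (the nested layout below is an assumption made up for illustration, not something from the original files), each set becomes a list of pairs, and the "number of data points" comes for free from the array size:

```ruby
require 'yaml'

# Hypothetical layout: a top-level list of data sets, each a list of pairs.
doc = <<~YAML_TEXT
  ---
  - - [1, 2]
    - [3, 4]
  - - [10, 20]
    - [11, 21]
    - [12, 22]
YAML_TEXT

sets = YAML.load(doc)

# No header line needed: the size of each set is just the array length.
sets.each { |points| puts "#{points.size} points: #{points.inspect}" }
```

This only helps when you control how the file is written, of course.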


and so much more...

regards.

-a
 
William James

# Read a header line, then exactly as many data lines as it announces.
File.open('data1') do |handle|
  while (header = handle.gets)
    header[/\d+/].to_i.times do
      p handle.gets
    end
  end
end
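
A variation on the loop above, as a minimal sketch: it collects each block into an array of integer rows instead of printing the raw lines, and complains if the file ends in the middle of a block. The method name `read_point_sets` and the StringIO demo data are assumptions for illustration only.

```ruby
require 'stringio'

# Hypothetical helper: reads "header + N data lines" blocks from any IO-like
# object and returns an array of data sets (each an array of integer rows).
def read_point_sets(io)
  sets = []
  while (header = io.gets)
    count = header[/\d+/].to_i   # the header says how many rows follow
    rows = Array.new(count) do
      line = io.gets or raise EOFError, "file ended inside a data block"
      line.split.map { |field| Integer(field, 10) }
    end
    sets << rows
  end
  sets
end

# Demo on an in-memory "file"; File.open(name) { |f| read_point_sets(f) }
# should work the same way on a real one.
sample = StringIO.new(<<~DATA)
  Number of data points: 2
  10 20
  11 21
  Number of data points: 1
  1 2
DATA

p read_point_sets(sample)
```

Because the inner `gets` calls consume the data lines, the outer `while` only ever sees header lines, which is the short-circuiting the question asks about.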
 
Robert Klemme

Or test after the fact:

# untested
sets = []
current = nil
items = nil

File.foreach('data1') do |line|
  case line
  when /Number of data points: (\d+)/
    # A new header: check that the previous set had the promised size.
    raise "Wrong amount" if current && current.size != items
    items = $1.to_i
    current = []
    sets << current
  else
    current << line.scan(/\d+/).map { |x| x.to_i }
  end
end

# Check the last set as well.
raise "Wrong amount" if current && current.size != items
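
A third option for the short-circuiting part of the question is an external enumerator: `each_line` without a block returns an Enumerator, and calling `next` on it pulls one line at a time, so the data rows can be consumed right where the header is handled. A minimal sketch, not a definitive implementation; the method name `each_point_set` and the in-memory sample are made up for illustration.

```ruby
require 'stringio'

# Hypothetical helper: yields the rows of each "header + N data lines" block.
def each_point_set(io)
  lines = io.each_line               # no block given => external Enumerator
  loop do
    header = lines.next              # StopIteration at EOF ends the loop
    count = header[/Number of data points:\s*(\d+)/, 1] or
      raise ArgumentError, "expected a header, got #{header.inspect}"
    rows = Array.new(count.to_i) do
      begin
        lines.next.split.map { |field| Integer(field, 10) }
      rescue StopIteration
        # Do not let `loop` swallow this: the block was cut short.
        raise EOFError, "file ended inside a data block"
      end
    end
    yield rows
  end
end

sample = StringIO.new("Number of data points: 2\n10 20\n11 21\n" \
                      "Number of data points: 1\n1 2\n")
each_point_set(sample) { |rows| p rows }
```

Unlike the check after the fact, this version insists that every block starts with a header line and raises as soon as one is missing.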

Regards

robert
 
Paul van Delst

To all responders, as always, thanks very much. You guys are great. One day I will grok
this much better (but I have some unlearning to do...)

cheers,

paulv

p.s. Ara, I do use YAML for some things, but I don't always (actually, quite rarely) have
control of how the file is created. :o(
 
