How do you get the rows out of FasterCSV?

Gary

Hi. I want to add a normalized column to a csv file. That is, I want to
read the file, sum all of a column X, then add another column in which
each X is divided by the sum. So how do I use the CSV rows twice without
reading the file twice?

The examples in the FasterCSV documentation at
http://fastercsv.rubyforge.org/ show class methods that provide
file-like operations. I want the file read once while the data is read,
then closed. While reading the first time, the column sum is calculated.
But then I want to go through the csv rows again, this time writing out
the rows with their new column.

Here's a sketch of what I had in mind. It doesn't work as intended...

# read csv file, summing the values
sum = 0
csv = FasterCSV.new(open(csv_filename), {:headers=>true}).each_with_index do |row, c|
  sum += row["VAL"].to_f
end

# now write
FasterCSV.open("test_csv_file.csv", "w", {:headers=>true}) do |csvout|
  csv.each_with_index do |row, c|
    row["NORMED"] = row["VAL"].to_f / sum
    csvout << row.headers if c==0
    csvout << row
  end
end

Also, is there a more graceful way to have the headers written out?
 
ChrisH

# Read all data into memory as a FasterCSV::Table of FasterCSV::Row objects
# (may run into memory issues on very large files)
csvData = FasterCSV.read('/path/to/infile.csv', :headers=>true)

# First pass: sum the column (fields are read as Strings, so convert)
sum = 0.0
csvData.each { |row| sum += row['ColX'].to_f }

# Second pass: calc and append the norm value, then output the row
FasterCSV.open("path/to/outfile.csv", "w") do |csv|
  csvData.each { |row|
    csv << (row.fields << row['ColX'].to_f / sum)
  }
end


Not tested, or even executed, but I think it's close 9^)

Cheers
Chris
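
For later readers, here is a minimal sketch of the same two-pass idea that also writes the header row. It assumes a modern Ruby, where FasterCSV lives on as the standard csv library, so it uses CSV rather than the fastercsv gem. The helper name add_normalized_column and the column names are made up for illustration, and the :write_headers option is used as it exists in that standard library.

```ruby
require 'csv' # assumption: modern Ruby, where FasterCSV became the standard CSV library

# Hypothetical helper: parse the data once, sum one column, then emit every
# row again with an extra normalized column and a header line.
def add_normalized_column(csv_text, column, new_column)
  table = CSV.parse(csv_text, :headers => true)                  # rows stay in memory
  sum   = table[column].inject(0.0) { |acc, v| acc + v.to_f }    # first pass
  CSV.generate(:headers => table.headers + [new_column],
               :write_headers => true) do |out|
    table.each do |row|                                          # second pass, no re-read
      out << (row.fields << row[column].to_f / sum)
    end
  end
end

puts add_normalized_column("f,VAL\na,1\nb,3\n", "VAL", "NORMED")
```

Because the parsed rows are kept in a table, the input is only read once; the trade-off is that the whole file has to fit in memory.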
 
ara.t.howard


harp:~ > cat a.rb
require 'rubygems'
require 'fastercsv'

csv = <<-csv
f,x
a,0
b,1
c,2
d,3
csv

fcsv = FasterCSV.new csv, :headers => true
table = fcsv.read

xs = table['x'].map{|x| x.to_i}
sum = Float xs.inject{|sum,i| sum += i}
norm = xs.map{|x| x / sum}

table['n'] = norm
puts table


harp:~ > ruby a.rb
f,x,n
a,0,0.0
b,1,0.166666666666667
c,2,0.333333333333333
d,3,0.5



-a
 
James Edward Gray II

require 'rubygems'
require 'fastercsv'

csv = <<-csv
f,x
a,0
b,1
c,2
d,3
csv

fcsv = FasterCSV.new csv, :headers => true
table = fcsv.read

Or just:

table = FCSV.parse(csv, :headers => true)

xs = table['x'].map{|x| x.to_i}

When you want to convert a field, ask FasterCSV to do it for you
while reading. This changes the above to:

table = FCSV.parse(
csv,
:headers => true,
:converters => lambda { |f, info| info.header == "x" ? f.to_i : f }
)

Or using a built-in converter:

table = FCSV.parse(csv, :headers => true, :converters => :integer)

sum = Float xs.inject{|sum,i| sum += i}

This is now simplified to:

sum = Float table['x'].inject{|sum,i| sum += i}

norm = xs.map{|x| x / sum}

table['n'] = norm
puts table
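
Put together, the converter approach might look like the sketch below. It is written against the standard csv library of a modern Ruby (an assumption; with the gem the calls would go through FCSV instead), and the helper names parse_with_x_as_integer and normalized are hypothetical.

```ruby
require 'csv' # assumption: modern Ruby, where FasterCSV became the standard CSV library

# Hypothetical helper: convert only the "x" column to Integer while parsing.
def parse_with_x_as_integer(csv_text)
  CSV.parse(
    csv_text,
    :headers    => true,
    :converters => lambda { |f, info| info.header == "x" ? f.to_i : f }
  )
end

# Hypothetical helper: divide each value by the column total.
def normalized(values)
  sum = Float(values.inject { |acc, i| acc + i })
  values.map { |x| x / sum }
end

table = parse_with_x_as_integer("f,x\na,0\nb,1\nc,2\nd,3\n")
table['n'] = normalized(table['x']) # adds a new column named "n"
puts table
```

Since the fields arrive already converted, the separate map with to_i is no longer needed.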

James Edward Gray II
 
Guest

Thanks, works great!!

If I want to parse some columns as floats, some as ints, and leave the
rest alone, is this the best way to do it?

table = FCSV.parse(open(csv_filename),
:headers => true,
:converters => lambda { |f, info|
case info.header
when "NUM"
f.to_i
when "RATE"
f.to_f
else
f
end
})

How do I write the finished file? My attempts appear without headers in
the csv file?
 
Guest

ThisForum said:
How do I write the finished file? My attempts appear without headers in
the csv file?

open("test_csv_file.csv", "w") << table

worked and included the headers. Is there a faster way for large csv
tables?

Thanks
 
James Edward Gray II

Instead of:

FCSV.parse(open(csv_filename) ... )

Use:

FCSV.read(csv_filename ... )

Your converters work fine though, yes. If your numbers can be
recognized by some built-in converters, you might even be able to get
away with

FCSV.read(csv_filename, :headers => true, :converters => :numeric)

How do I write the finished file? My attempts appear without headers in
the csv file?

I would use:

File.open("path/to/file", "w") { |f| f.puts table }
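
A small sketch of why this keeps the headers: puts calls to_s on the table, and the table's string form starts with the header row. The example uses the standard csv library of a modern Ruby (an assumption, in place of the fastercsv gem); write_table and the temp file are just for illustration.

```ruby
require 'csv'      # assumption: modern Ruby, where FasterCSV became the standard CSV library
require 'tempfile'

# Hypothetical helper wrapping the one-liner; the block form closes the file.
def write_table(table, path)
  File.open(path, "w") { |f| f.puts table }
end

table = CSV.parse("f,x\na,0\nb,1\n", :headers => true)
Tempfile.create("table_demo") do |tmp|
  write_table(table, tmp.path)
  puts File.read(tmp.path) # header line first, then the data rows
end
```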

Hope that helps.

James Edward Gray II
 
ara.t.howard

Instead of:

FCSV.parse(open(csv_filename) ... )

Use:

FCSV.read(csv_filename ... )

Your converters work fine though, yes. If your numbers can be recognized by
some built-in converters, you might even be able to get away with

FCSV.read(csv_filename, :headers => true, :converters => :numeric)

alias to FCSV.table

perhaps?

-a
 
ara.t.howard

Alias read() to table() or read() with those options?

the latter. i hate typing! ;-)

actually, read with those __default__ options. so

def FCSV.table opts = {}

....

headers = opts[:headers] || opts['headers'] || true
converters = opts[:converters] || opts['converters'] || :converters

....

end


thoughts??

-a
 
James Edward Gray II

Yes, FasterCSV doesn't support goofy Rails-like Hashes. :D

Beyond that though, I released FasterCSV 1.2.0 today with the
addition of:

def self.table(path, options = Hash.new)
read( path, { :headers => true,
:converters => :numeric,
:header_converters => :symbol }.merge(options) )
end
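
A quick usage sketch of what those defaults give you: headers become symbols and numeric-looking fields become numbers. It is written against the standard csv library of a modern Ruby, which kept this table method (using it instead of the gem is an assumption here); load_table and the sample data are made up.

```ruby
require 'csv'      # assumption: modern Ruby, where FasterCSV became the standard CSV library
require 'tempfile'

# Hypothetical helper: table() takes a path, so write the sample to a temp file.
def load_table(csv_text)
  table = nil
  Tempfile.create(["table_demo", ".csv"]) do |tmp|
    tmp.write(csv_text)
    tmp.flush
    table = CSV.table(tmp.path)
  end
  table
end

table = load_table("Name,VAL\na,1\nb,2.5\n")
p table.headers # => [:name, :val]
p table[:val]   # => [1, 2.5]
```

Any option passed explicitly still wins, because the defaults are merged with the caller's options.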

Enjoy.

James Edward Gray II
 
William James

harp:~ > cat a.rb
require 'rubygems'
require 'fastercsv'

csv = <<-csv
f,x
a,0
b,1
c,2
d,3
csv

fcsv = FasterCSV.new csv, :headers => true
table = fcsv.read

xs = table['x'].map{|x| x.to_i}
sum = Float xs.inject{|sum,i| sum += i}

Should be sum + i.

norm = xs.map{|x| x / sum}

table['n'] = norm
puts table

harp:~ > ruby a.rb
f,x,n
a,0,0.0
b,1,0.166666666666667
c,2,0.333333333333333
d,3,0.5

array = "\
f,x
a,0
b,1
c,2
d,3
".split.map{|s| s.split ","}

headers = array.shift << 'n'

array = array.transpose
sum = array[1].inject{|a,b| a.to_i + b.to_i}.to_f
array << array[1].map{|x| x.to_i / sum}
array = array.transpose
([headers] + array).each{|x| puts x.join(',')}

--- output -----
f,x,n
a,0,0.0
b,1,0.166666666666667
c,2,0.333333333333333
d,3,0.5
 
