FasterCSV - varying headers

Sean Mcknew

Hello,

Quick warning: I am very much a Ruby newbie and am extremely new to
programming in general.

I'm attempting to build a little program that operates on a large csv
file (potentially 100,000+ lines), but the challenge is that while I
will have a couple required columns, I must provide some naming
flexibility as it is unlikely that the user will be able to match my
headers word for word in every case. As such, my goal is to provide an
interface that asks what each header should represent and then treat the
user's headers as if they followed my original specifications exactly.

For example, let's say that I require the following columns: Product
Title, Product Price. If the user were to provide me with the headers
worded as Product Name and Product Pricing, I would want to assign
'Product Name' to represent 'Product Title.'
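The renaming described above amounts to a lookup table from the user's wording to the expected names. Here is a minimal sketch. It assumes Ruby 1.9 or later, where FasterCSV became the standard `CSV` library, and it uses a hypothetical `HEADER_MAP` hash. A lambda passed as a header converter renames each header while the file is parsed, so the rest of the program only sees "Product Title" and "Product Price":

```ruby
require "csv"

# Hypothetical mapping from the user's header wording to the names the
# program expects. In practice this would come from asking the user.
HEADER_MAP = {
  "Product Name"    => "Product Title",
  "Product Pricing" => "Product Price"
}.freeze

# Rename each header as it is parsed; unknown headers pass through unchanged.
rename = ->(header) { HEADER_MAP.fetch(header, header) }

data = <<~TEXT
  Product Name,Product Pricing
  Agricola,$55.99
  Dominion,$35.99
TEXT

table = CSV.parse(data, headers: true, header_converters: rename)
p table.headers # prints ["Product Title", "Product Price"]
table.each do |row|
  puts "#{row['Product Title']}: #{row['Product Price']}"
end
```

Headers not listed in the hash pass through unchanged, so any optional columns survive the renaming.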

I suspect that throwing the headers into a hash would be ideal, but
I'm not entirely sure how to approach it. Here is an excerpt from my
attempt thus far...


require "rubygems"
require "fastercsv"

class HeaderProcessing
  attr_accessor :file
  attr_accessor :headers
  attr_accessor :clientid
  attr_accessor :product_title_header, :product_price_header

  def initialize
    puts "What is the client ID?"
    @clientid = gets.chomp
    open_file
  end

  def open_file
    infile = "tobeprocessed/#{@clientid}.csv"
    outfile = "tobeprocessed/#{@clientid}_out.csv"
    # Note: with :header_converters => :symbol, a header such as
    # "Product Title" becomes the symbol :product_title.
    csv = FasterCSV.read(infile, :headers => true, :return_headers => true,
                         :header_converters => :symbol)
    # Not sure if read is the best approach here, since some files
    # could get quite large.
    puts "The user's headers are "
    puts csv.headers.inspect
    puts "\n \n Please enter the user supplied Product Title header"
    @product_title_header = gets.chomp
    puts "\n \n Please enter the user supplied Product Price"
    @product_price_header = gets.chomp
    # I do this with each required and optional header. Not very DRY for
    # now...
    # I now have each of the user's headers I intend to use in a number of
    # instance variables.
    # placeholder for user product data clean up
    File.open(outfile, "w") { |f| f.puts csv }
  end
end
queued = HeaderProcessing.new
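The repeated prompt in the code above, which its own comment calls not very DRY, could instead be driven by a list of required names. This is a minimal sketch. `REQUIRED` and `ask_for_headers` are hypothetical names. The user's headers are assumed to be plain strings, meaning the `:symbol` header converter is left off:

```ruby
require "stringio"

# Hypothetical list of the columns the program requires.
REQUIRED = ["Product Title", "Product Price"].freeze

# Ask, once per required column, which of the user's headers supplies it.
# Returns a hash such as {"Product Title" => "Product Name"}.
# input/output default to the terminal; passing StringIO objects lets the
# method run without typing.
def ask_for_headers(required, user_headers, input: $stdin, output: $stdout)
  output.puts "The user's headers are: #{user_headers.join(', ')}"
  required.each_with_object({}) do |name, mapping|
    loop do
      output.print "Which header holds #{name}? "
      answer = input.gets or abort "No answer given for #{name}."
      answer = answer.strip
      if user_headers.include?(answer)
        mapping[name] = answer
        break
      end
      output.puts "#{answer.inspect} is not one of the headers; try again."
    end
  end
end

# Example run with canned answers; the second answer is invalid and is re-asked.
answers = StringIO.new("Product Name\nPrice?\nProduct Pricing\n")
mapping = ask_for_headers(REQUIRED, ["Product Name", "Product Pricing"],
                          input: answers, output: StringIO.new)
p mapping
```

Adding an optional column then means adding one string to the list rather than another prompt and instance variable.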


If I understand FasterCSV correctly, by setting :headers to true, the
CSV file was read as a table object. Is it possible to turn the table's
headers into a hash and then set each key/value to the appropriate
variable (as per @product_title_header etc.)? If so, how? I've been
rummaging through the FasterCSV docs that I believe pertain to the
question, but I'm a bit lost on the actual implementation.
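One way to approach the hash question is to keep a single hash that maps the program's column names to the user's headers, in place of one instance variable per header, and to look each field up through that hash. The sketch below assumes Ruby's standard `CSV` library (FasterCSV in Ruby 1.9 and later). It hard-codes a hypothetical `mapping` in place of the answers the prompts would collect:

```ruby
require "csv"

# Hypothetical answers from the user: the program's name => the user's header.
mapping = { "Product Title" => "Product Name", "Product Price" => "Product Pricing" }

table = CSV.parse(<<~TEXT, headers: true)
  Product Name,Product Pricing,Notes
  Agricola,$55.99,heavy
  Pandemic,$27.99,co-op
TEXT

# Stop early if the user named a header that the file does not contain.
missing = mapping.values - table.headers
abort "Unknown headers: #{missing.join(', ')}" unless missing.empty?

# One hash per row, keyed by the program's own column names.
products = table.map { |row| mapping.transform_values { |header| row[header] } }
products.each { |product| puts "#{product['Product Title']} costs #{product['Product Price']}" }
```

Running it prints `Agricola costs $55.99` and `Pandemic costs $27.99`; the unused Notes column is simply ignored.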

Is it also feasible to save these hash definitions to a separate file so
that I won't have to go through the same process when/if the user
provides a new file with updated prices? Alternatively, if there's a
more appropriate way to tackle this, I'm all ears.
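Since the mapping is just a hash of strings, it could be saved as YAML and reused for the next file from the same client. Here is a minimal sketch using Ruby's standard `yaml` library. The `mappings` directory and the helper names are hypothetical:

```ruby
require "yaml"
require "fileutils"

# Hypothetical directory for saved mappings, one YAML file per client ID.
MAPPING_DIR = "mappings"

def mapping_path(client_id)
  File.join(MAPPING_DIR, "#{client_id}.yml")
end

# Write the header mapping (a hash of strings) to disk as YAML.
def save_mapping(client_id, mapping)
  FileUtils.mkdir_p(MAPPING_DIR)
  File.write(mapping_path(client_id), mapping.to_yaml)
end

# Read a saved mapping back, or return nil if this client has none yet.
def load_mapping(client_id)
  path = mapping_path(client_id)
  File.exist?(path) ? YAML.safe_load(File.read(path)) : nil
end
```

The calling code could call `load_mapping` first and only prompt the user when it returns `nil`, then save the answers for next time.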

Thanks in advance!
Inf
 
Andrew Timberlake

I wrote a Rails plugin which does this type of translation between
user-supplied columns and expected columns. It is specific to Rails,
but you might be able to get some ideas from it.
http://github.com/internuity/map-fields

Andrew Timberlake
http://ramblingsonrails.com

 
James Edward Gray II

> Hello,

Hello.

> I'm attempting to build a little program that operates on a large csv
> file (potentially 100,000+ lines), but the challenge is that while I
> will have a couple required columns, I must provide some naming
> flexibility as it is unlikely that the user will be able to match my
> headers word for word in every case. As such, my goal is to provide an
> interface that asks what each header should represent and then treat the
> user's headers as if they followed my original specifications exactly.
> Alternatively, if there's a more appropriate way to tackle this, I'm
> all ears.

I have some ideas.

First, let's talk about the matching headers problem. Coming up with
everything a user might think of to type in sounds hard to me. What
if we showed the user which headers are available instead and had them
pick from a list? It seems like that would be easier and more accurate.

My other thought is that it looks like you are slurping the whole file
into memory just to write it all back out. Why don't we just read a
line, fix it, write it out, and move on to the next line? That should
take less memory.
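The read-a-row, write-a-row idea can also be applied to the header renaming from the original question. Here is a minimal sketch that assumes Ruby's standard `CSV` library (FasterCSV in Ruby 1.9 and later). `stream_with_renamed_headers` and its `header_map` argument are hypothetical:

```ruby
require "csv"

# Copy in_path to out_path one row at a time, renaming headers on the way.
# header_map (user's header => expected name) is a hypothetical argument;
# headers missing from it are copied unchanged. Rows are processed one by
# one instead of reading the whole file into a table first.
def stream_with_renamed_headers(in_path, out_path, header_map)
  CSV.open(out_path, "w") do |out|
    # return_headers: true yields the header line as the first row,
    # so it is written even if the file has no data rows.
    CSV.foreach(in_path, headers: true, return_headers: true) do |row|
      if row.header_row?
        out << row.fields.map { |header| header_map.fetch(header, header) }
      else
        out << row.fields
      end
    end
  end
end
```

For example, `stream_with_renamed_headers("products.csv", "products_new.csv", { "Product Name" => "Product Title" })` would write a copy whose first header reads Product Title.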

Here's some example code combining these thoughts:

$ cat products.csv
Product Title,Product Price,Product Rating
Agricola,$55.99,4.5
Dominion,$35.99,5
Pandemic,$27.99,4.75
$ ruby csv_transfer.rb products.csv
1: Product Title
2: Product Price
3: Product Rating
d: Done

Column to include: 1
Added Product Title.
2: Product Price
3: Product Rating
d: Done

Column to include: 2
Added Product Price.
3: Product Rating
d: Done

Column to include: d
$ cat products_new.csv
Product Title,Product Price
Agricola,$55.99
Dominion,$35.99
Pandemic,$27.99
$ cat csv_transfer.rb
#!/usr/bin/env ruby -wKU

require "rubygems"
require "faster_csv"

file = ARGV.shift or abort "USAGE: #{$PROGRAM_NAME} CSV_FILE"
columns = [ ]
FCSV.open("#{File.basename(file, '.csv')}_new.csv", "w") do |csv|
  FCSV.foreach(file, :headers => true) do |row|
    # The following is a simple menu selection for columns.
    if columns.empty?
      loop do
        choices = { }
        row.headers.each_with_index do |column, i|
          unless columns.include? column
            n = i + 1
            puts "#{n}: #{column}"
            choices[n] = column
          end
        end
        puts "d: Done"
        puts
        print "Column to include: "
        choice = gets or break
        if column = choices[choice.strip.to_i]
          columns << column
          puts "Added #{column}."
        elsif choice =~ /\Ad(?:one)?\Z/i
          break
        else
          puts "Invalid column selection."
        end
      end
      if columns.empty?
        puts "No columns selected."
        exit
      end
      csv << columns
    end

    # Copy only the selected columns.
    csv << columns.map { |column| row[column] }
  end
end

__END__

Hope that helps.

James Edward Gray II
 
Jesús Gabriel y Galán

> $ cat products.csv
> Product Title,Product Price,Product Rating
> Agricola,$55.99,4.5
> Dominion,$35.99,5
> Pandemic,$27.99,4.75

Good choices for the example (the ratings are over 5, right?) :)

Jesus.
 
James Edward Gray II

> Good choices for the example (the ratings are over 5, right?) :)

Absolutely. I'm glad someone appreciated the examples. ;)

James Edward Gray II
 
Sean Em

Andrew: A Rails version was definitely in the pipeline on this end, so
you will have saved me quite a bit of time. Thanks for sharing the
plugin!

James: I very much appreciate the assistance. I suspect I'll learn
quite a bit as I experiment with the example code you've posted.
Thanks!

Regards,
S
 
