Flat CSV to Relational DB

M

Michael Guterl

Anyone have any suggestions on pushing a flat CSV file into a
relational db? I am currently thinking of a solution working with
FasterCSV and ActiveRecord. I picture something like the following,
although I'm sure someone can make a better suggestion...

FasterCSV.read("data.csv", :headers => true).each do |row|
Product.create(
:name => row["Product Name"],
:category => Category.find_or_create_by_name(row["Product Category"]),
:shipping_category =>
ShippingCategory.find_or_create_by_weight(row["Weight"]),
:color => row["Product Color"],
:description => row["Product Description"]
)
end

My Product.create method call is HUGE, it keeps going and going...

Any suggestions?

TIA,
Michael Guterl
 
J

James Edward Gray II

Anyone have any suggestions on pushing a flat CSV file into a
relational db? I am currently thinking of a solution working with
FasterCSV and ActiveRecord. I picture something like the following,
although I'm sure someone can make a better suggestion...

FasterCSV.read("data.csv", :headers => true).each do |row|
Product.create(
:name => row["Product Name"],
:category => Category.find_or_create_by_name(row["Product
Category"]),
:shipping_category =>
ShippingCategory.find_or_create_by_weight(row["Weight"]),
:color => row["Product Color"],
:description => row["Product Description"]
)
end

My Product.create method call is HUGE, it keeps going and going...

Any suggestions?

I have the following example in FasterCSV's example directory:

#!/usr/local/bin/ruby -w

# csv_rails_import.task
#
# Created by James Edward Gray II on 2006-11-05.
# Copyright 2006 Gray Productions. All rights reserved.

namespace :my_app_name do
desc "Injects purchase.csv into the database."
task :load_purchase => [:environment] do
require "#{RAILS_ROOT}/vendor/faster_csv/lib/faster_csv"

purchase = Purchase.create!

FCSV.foreach( "#{RAILS_ROOT}/db/questions.csv",
:headers => true,
:header_converters => :symbol ) do |line|
purchase.line_items.create!(line.to_hash)
end
end
end

You're needs are slightly more complex, but this makes a decent
starting point:

1. Use :header_converters where you can
2. Hash the line
3. Edit the Hash as needed to create associated objects
4. Hand-off to create

Hope that helps.

James Edward Gray II
 
J

Jeremy Hinegardner

Anyone have any suggestions on pushing a flat CSV file into a
relational db? I am currently thinking of a solution working with
FasterCSV and ActiveRecord. I picture something like the following,
although I'm sure someone can make a better suggestion...

Depending on the quantity of data you may want to pick something other
than ruby to do this. Most databases (at least I know sqlite,
postgresql, mysql and db2 do) have a generic IMPORT or LOAD statement to
import from a delimited file; the most common of which being a CSV.

Many times I've wanted use ruby to parse and load CSV files into
databases, but for me there wass a data volume threshold. Many times
I've had to load hundreds of thousands to hundreds of millions of rows
into a db, and these days it is (for me) much easier to just have the
ruby script invoke the appropriate IMPORT or LOAD or whatever SQL
statement from the DB interface and pass it the CSV file to load.

Sometimes I've had to have ruby reprocess the csv file to get into the
quoted format that the db import command format, but in the overall
time, it was actually faster to process the CSV file and have the db
load it natively than to have ruby process the CSV file and do the
inserts via ActiveRecord or the DBI interface.

Just my opinion though :).

enjoy,

-jeremy
 
M

Michael Guterl

Anyone have any suggestions on pushing a flat CSV file into a
relational db? I am currently thinking of a solution working with
FasterCSV and ActiveRecord. I picture something like the following,
although I'm sure someone can make a better suggestion...

FasterCSV.read("data.csv", :headers => true).each do |row|
Product.create(
:name => row["Product Name"],
:category => Category.find_or_create_by_name(row["Product
Category"]),
:shipping_category =>
ShippingCategory.find_or_create_by_weight(row["Weight"]),
:color => row["Product Color"],
:description => row["Product Description"]
)
end

My Product.create method call is HUGE, it keeps going and going...

Any suggestions?

I have the following example in FasterCSV's example directory:

#!/usr/local/bin/ruby -w

# csv_rails_import.task
#
# Created by James Edward Gray II on 2006-11-05.
# Copyright 2006 Gray Productions. All rights reserved.

namespace :my_app_name do
desc "Injects purchase.csv into the database."
task :load_purchase => [:environment] do
require "#{RAILS_ROOT}/vendor/faster_csv/lib/faster_csv"

purchase = Purchase.create!

FCSV.foreach( "#{RAILS_ROOT}/db/questions.csv",
:headers => true,
:header_converters => :symbol ) do |line|
purchase.line_items.create!(line.to_hash)
end
end
end

You're needs are slightly more complex, but this makes a decent
starting point:

1. Use :header_converters where you can
2. Hash the line
3. Edit the Hash as needed to create associated objects
4. Hand-off to create

Hope that helps.
Thanks, I was unaware of :header_converters, those look like they
could be very useful. I also was unaware of the ease at which I could
hash a row, very cool. Thanks for this great library!
James Edward Gray II

Michael Guterl
 
M

Michael Guterl

Depending on the quantity of data you may want to pick something other
than ruby to do this. Most databases (at least I know sqlite,
postgresql, mysql and db2 do) have a generic IMPORT or LOAD statement to
import from a delimited file; the most common of which being a CSV.
In this particular case I do not have a large quantity of data, though
I do appreciate the suggestions.
Many times I've wanted use ruby to parse and load CSV files into
databases, but for me there wass a data volume threshold. Many times
I've had to load hundreds of thousands to hundreds of millions of rows
into a db, and these days it is (for me) much easier to just have the
ruby script invoke the appropriate IMPORT or LOAD or whatever SQL
statement from the DB interface and pass it the CSV file to load.
Yeah, my particular liking of AR is that I can easily move pieces of
the data to the correct tables, setup all the relationships,
validations, and manipulations in Ruby.
Sometimes I've had to have ruby reprocess the csv file to get into the
quoted format that the db import command format, but in the overall
time, it was actually faster to process the CSV file and have the db
load it natively than to have ruby process the CSV file and do the
inserts via ActiveRecord or the DBI interface.
I do like this suggestion, do all the CSV manipulations, then pass off
to the appropriate DB command. I will have to investigate these
options further.
Just my opinion though :).

enjoy,

-jeremy
Thanks,
Michael Guterl
 
J

Jeremy Hinegardner

[...]
Sometimes I've had to have ruby reprocess the csv file to get into the
quoted format that the db import command format, but in the overall
time, it was actually faster to process the CSV file and have the db
load it natively than to have ruby process the CSV file and do the
inserts via ActiveRecord or the DBI interface.
I do like this suggestion, do all the CSV manipulations, then pass off
to the appropriate DB command. I will have to investigate these
options further.

Glad to help. Let us know how it turns out.

enjo,

-jeremy
 
A

Alice Gheorghiu

Jeremy said:
[...]
Sometimes I've had to have ruby reprocess the csv file to get into the
quoted format that the db import command format, but in the overall
time, it was actually faster to process the CSV file and have the db
load it natively than to have ruby process the CSV file and do the
inserts via ActiveRecord or the DBI interface.
I do like this suggestion, do all the CSV manipulations, then pass off
to the appropriate DB command. I will have to investigate these
options further.

Glad to help. Let us know how it turns out.

enjo,

-jeremy

Hi Jeremy:
Can you please elaborate on how you call natively load data infile from
a ruby file ? I really need good performance and I cannot get it as you
said by using active records or DBI.
The piece of code that is slow is this:
<code>
FCSV.foreach(ARGV[0], :headers => true) do |row|
table.insert:)call_date => row[0],
:caller_Id => row[1],
:phone_no=> row[2],
:destination => row[3],
:duration => row[4],
:call_cost => row[5])

</code>
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,776
Messages
2,569,603
Members
45,197
Latest member
ScottChare

Latest Threads

Top