Parsing excel CVS data on a mac OSX to extract blocks of cells

A

anne001

I would like to parse some excel CVS data which has a repetitive block
pattern

"Experiment ID: 1",,,,,,,,,,,,
"Subject ID: 1013938829432171e868c340.
Trial,stimulus,time,type,field1,field2,text_response,Abs. time of
response,,,,,
26,undefined,14828,KEY,RETURN,UNUSED,DCS,Sat Oct 15 17:48:04 GMT-0400
2005,,,,,
23,undefined,15078,KEY,RETURN,UNUSED,244,Sat Oct 15 17:48:19 GMT-0400
2005,,,,,
7,nixontrialleft copy.pct [TAG: 1],5953,KEY,1,UNUSED,,Sat Oct 15
17:49:24 GMT-0400 2005,,,,,
8,nixontrialfront copy.pct [TAG: 3],6250,KEY,3,UNUSED,,Sat Oct 15
17:49:31 GMT-0400 2005,,,,,
9,nixontrialright copy.pct [TAG: 2],2469,KEY,2,UNUSED,,Sat Oct 15
17:49:34 GMT-0400 2005,,,,,
#####
more data
######
,,,,,,,,,,4374.347222,,
,,,,,,,,,,,,1.00
,,,,,,,,,,,,0.93
### and a new block starts
"Experiment ID: 3",,,,,,,,,,,,0.92
....

Question 1:
------------------
Arr = IO.readlines(File.expand_path("~/Desktop/FaceRetest.cvs"))
has a length of 1, why?
I noticed that the puts has ^M everywhere
Arr =
IO.readlines(File.expand_path("~/Desktop/FaceRetest.cvs"),sep_string="^M")
seems to split the array into lines.
Arr = Arr.split(",")
gives me a message, private method `split' called
puts Arr.length gets me no response. I expected split to take the 1D
array and transform it into a 2D array.

What is the best way to get the excel CVS data into a Ruby 2D array for
further analysis?

Question 2:
-----------------
Once I have a 2D array, what is the best way to find the index of the
key word Trial (which starts the data I want).

In matlab, I would do something like,
Find the index pair (I,J) at which Array = Trial
If J==1
Extract the array starting at row I+1 to I+82, for columns [2, 3, and
5]
and I would love some pointers at Ruby structures, methods, etc I need
to look at.
- First ruby code -- Thanks for your help
 
A

Ara.T.Howard

Question 1:

require "csv"

path = File::expand_path "~/Desktop/FaceRetest.cvs"

table = [] # 2d array
CSV::eek:pen(path, 'r'){|row| table << row}


that ought to get you going.


-a
--
===============================================================================
| ara [dot] t [dot] howard [at] gmail [dot] com
| all happiness comes from the desire for others to be happy. all misery
| comes from the desire for oneself to be happy.
| -- bodhicaryavatara
===============================================================================
 
J

James Edward Gray II

I would like to parse some excel CVS data which has a repetitive block
pattern

"Experiment ID: 1",,,,,,,,,,,,
"Subject ID: 1013938829432171e868c340.
Trial,stimulus,time,type,field1,field2,text_response,Abs. time of
response,,,,,
26,undefined,14828,KEY,RETURN,UNUSED,DCS,Sat Oct 15 17:48:04 GMT-0400
2005,,,,,
23,undefined,15078,KEY,RETURN,UNUSED,244,Sat Oct 15 17:48:19 GMT-0400
2005,,,,,
7,nixontrialleft copy.pct [TAG: 1],5953,KEY,1,UNUSED,,Sat Oct 15
17:49:24 GMT-0400 2005,,,,,
8,nixontrialfront copy.pct [TAG: 3],6250,KEY,3,UNUSED,,Sat Oct 15
17:49:31 GMT-0400 2005,,,,,
9,nixontrialright copy.pct [TAG: 2],2469,KEY,2,UNUSED,,Sat Oct 15
17:49:34 GMT-0400 2005,,,,,
#####
more data
######
,,,,,,,,,,4374.347222,,
,,,,,,,,,,,,1.00
,,,,,,,,,,,,0.93
### and a new block starts
"Experiment ID: 3",,,,,,,,,,,,0.92
....

Question 1:

Well, if you're on Windows, Ruby is looking for a \015\012 sequence
to end the line. On Unix it would be looking for a \012...
I noticed that the puts has ^M everywhere

^M (control-M) is just \015. So it's not seeing the line endings.
Some OSes actually used this line ending, like Mac OS 9 and lower,
but it's pretty rare.
Arr =
IO.readlines(File.expand_path("~/Desktop/
FaceRetest.cvs"),sep_string="^M")

That's a fine fix. You should be able to replace sep_string="^M"
with just "\r", I think.
seems to split the array into lines.
Arr = Arr.split(",")
gives me a message, private method `split' called

First, don't start Ruby variable names with a capital letter. This
isn't your problem here, but it's still not a habit to get into. A
capital variable is a constant in Ruby.

The real problem here is that Arr is an Array, and you are calling a
String function on it, split(). Try:

Arr.first.split(",")

# ... or ...

Arr.map { |row| row.split(",") }

But it's better to use a real parser as Ara suggested.
puts Arr.length gets me no response. I expected split to take the 1D
array and transform it into a 2D array.

What is the best way to get the excel CVS data into a Ruby 2D array
for
further analysis?

Try:

require "csv"
arr = CSV.read(File.expand_path("~/Desktop/FaceRetest.cvs"))

It gives two-dimensional arrays:

Neo:~/Desktop$ cat data.csv
1,2,3
4,,5
Neo:~/Desktop$ ruby -r pp -r csv -e 'pp CSV.read("data.csv")'
[["1", "2", "3"], ["4", nil, "5"]]
Question 2:

Hmm, what about something like:

in_section = false

csv.each do |row|
if row.first == "Trial"
in_section = true
next
elsif in_section
# process row here...
end
end

Hope that gets you going.

James Edward Gray II
 
G

Gregory Brown

What is the best way to get the excel CVS data into a Ruby 2D array for
further analysis?

This might be overkill and more useful when it is more complete, but
Ruby Reports has basic support for stuff like this, via DataSets.
---
require "rubygems"
require "ruportlib"

my_data =3D Report::DataSet.load("foo.csv")
---
this will create a DataSet filled with DataRows which can be indexed
ordinally or by field name. (The first row of the CSV is used to
define field names)

example:
---
my_data.each do |row|
puts row["name"]
end
---
would print the value of the "name" column if it existed.

both DataSet and DataRow are Enumerable, which might make finding your
data easier.

if you want to try this approach,

sudo gem install ruport

should get you the latest version

The rather sparse documentation is housed at:
http://ruport.rubyforge.org/docs/

You'd want to look at (and inspect the source on) Report::DataSet and
Report::DataRow

HTH,
Greg
 
G

Gregory Brown

This might be overkill and more useful when it is more complete, but
Ruby Reports has basic support for stuff like this, via DataSets.

As a side not, you do not need to install DBI or any database related
dependencies to use this set of features.
 
A

anne001

Thank you so much for your help
The data was collected by another lab. I don't know what they use. I
saved their excel file data into a csv file. I don't know if there is a
way of telling excel to do a better job. I am on panther, with
office 2004 for mac.

require "csv"
arr = CSV.read(File.expand_path("~/Desktop/FaceRetest.cvs"))

I get an error:
/usr/local/lib/ruby/1.8/csv.rb:607:in `get_row':
CSV::IllegalFormatError (CSV::IllegalFormatError)

Ara's formulation does the same thing.

I think there is a way to replace special characters in emacs but what
I tried so far does not work.

I thought I could use readlines and a puts and then copy the ruby
output, which would not longer have the ^M, but somehow, ruby gets
mixed up, and suddenly stop parsing the lines, so I can't relie on ruby
readline either. It sounds like I need to figure out how to format the
input file better. And then I will be able to use your suggestions.

thank you
 
J

James Edward Gray II

Thank you so much for your help
The data was collected by another lab. I don't know what they use. I
saved their excel file data into a csv file. I don't know if there
is a
way of telling excel to do a better job. I am on panther, with
office 2004 for mac.

Excel might be legacy enough that it's still using \r as a return
here, for the old Mac OS. Try this:

ruby -p -i.bak -e 'sub("\r", "\n")' your_csv_file_here.csv
require "csv"
arr = CSV.read(File.expand_path("~/Desktop/FaceRetest.cvs"))

I get an error:
/usr/local/lib/ruby/1.8/csv.rb:607:in `get_row':
CSV::IllegalFormatError (CSV::IllegalFormatError)

Ara's formulation does the same thing.

I think this is a line ending issue. The CSV format does not allow a
naked \r outside of a quoted field. Try my fix above and see if it
gets you going, or you can switch the line ending for CSV's readlines():

require "csv"
arr = CSV.readlines(File.expand_path("~/Desktop/FaceRetest.cvs"), "\r")

If you're still having trouble and it's not a privacy concern, you
are welcome to email me the CSV file off list, and I will try to
figure out how to read it.

James Edward Gray II
 
G

Gene Tani

anne001 said:
Thank you so much for your help
The data was collected by another lab. I don't know what they use. I
saved their excel file data into a csv file. I don't know if there is a
way of telling excel to do a better job. I am on panther, with
office 2004 for mac.

require "csv"
arr = CSV.read(File.expand_path("~/Desktop/FaceRetest.cvs"))

I get an error:
/usr/local/lib/ruby/1.8/csv.rb:607:in `get_row':
CSV::IllegalFormatError (CSV::IllegalFormatError)

Ara's formulation does the same thing.

I think there is a way to replace special characters in emacs but what
I tried so far does not work.

I thought I could use readlines and a puts and then copy the ruby
output, which would not longer have the ^M, but somehow, ruby gets
mixed up, and suddenly stop parsing the lines, so I can't relie on ruby
readline either. It sounds like I need to figure out how to format the
input file better. And then I will be able to use your suggestions.

thank you

There's a few ways to handle converting newlines between O/S:

- require 'ptools'; File#nl_convert here:
http://ruby-miscutils.sourceforge.net/ptools.html

- specify $/ input record separator and $\ output separators for
IO#gets, readline, each_line etc.

- Komodo has a Code / Clean Line Endings which will convert to the O/S
format you're
running it on

but there's no automatic conversion like python
open("filename.ext","U"), I think
 
A

Ara.T.Howard

Thank you so much for your help
The data was collected by another lab. I don't know what they use. I
saved their excel file data into a csv file. I don't know if there is a
way of telling excel to do a better job. I am on panther, with
office 2004 for mac.

require "csv"
arr = CSV.read(File.expand_path("~/Desktop/FaceRetest.cvs"))

I get an error:
/usr/local/lib/ruby/1.8/csv.rb:607:in `get_row':
CSV::IllegalFormatError (CSV::IllegalFormatError)

Ara's formulation does the same thing.

I think there is a way to replace special characters in emacs but what
I tried so far does not work.

I thought I could use readlines and a puts and then copy the ruby
output, which would not longer have the ^M, but somehow, ruby gets
mixed up, and suddenly stop parsing the lines, so I can't relie on ruby
readline either. It sounds like I need to figure out how to format the
input file better. And then I will be able to use your suggestions.

run dos2unix on it. if it still fails you'll need see why the file is badly
formatted.

regards.


-a
--
===============================================================================
| ara [dot] t [dot] howard [at] gmail [dot] com
| all happiness comes from the desire for others to be happy. all misery
| comes from the desire for oneself to be happy.
| -- bodhicaryavatara
===============================================================================
 
A

anne001

I have moved onto Tiger. Still much installing to do. Sorry in the
delay in processing your responses.

The line ending is extremely simple,
http://www.delorie.com/gnu/docs/emacs/emacs_220.html

....-unix newline to separate lines.
....-dos carriage-return + linefeed to separate lines
....-mac carriage-return to separate lines

But the editors show what they want. Text wrangler lets you replace \r
with \n
but when you look, it still looks as a \r. Did it make the
substitution?
Emacs has a command to change the file to unix. But the ^M are not
changed. Did something change? hard to say. So I gave up on that.

I like the command, thank you
ruby -p -i.bak -e 'sub("\r", "\n")' your_csv_file_here.csv

It seems to work, it creates your_cvs_file_here.csv.bak and csv read no
longer gives an error, but... It seems to parse each cell, with realms
of NIL in between bits of text.
Conceptually I think it is easier to process line by line. But I will
look into CVS to see what it can do.

In any case, I can combine the ruby line command with the readlines
ruby function. So I should be OK.
 
A

Adam Sanderson

I've had a similar problem. It sounds like you used Excel somewhere.
My experience has been that Excel for the mac likes to save CSV files
with MacOS 9 line endings which Ruby doesn't like. In the future, you
can choose 'Windows CSV' or something to that extent. It's really
silly, but it works.

Ruby's CSV library will you give you back arrays if you open the file
with the CSV module.
I did notice that your file had this in it:
"Subject ID: 1013938829432171e868c340.
Trial,stimulus,time,type,field1,field2,text_response,Abs. time of
response,,,,,

with the " not closing anywhere. You might want to look at that, Excel
might be generating invalid CSV.

Anyways, goodluck.
.adam
 
M

Martin DeMello

anne001 said:
Question 2:
-----------------
Once I have a 2D array, what is the best way to find the index of the
key word Trial (which starts the data I want).

In matlab, I would do something like,
Find the index pair (I,J) at which Array = Trial
If J==1
Extract the array starting at row I+1 to I+82, for columns [2, 3, and
5]
and I would love some pointers at Ruby structures, methods, etc I need
to look at.

This should get you started:

#------------------------------------------------

ary = [
["this", "is", "some", "code"],
["test", "please", "ignore"],
["Trial", "section", "begins", "here"],
["foo", "bar", "baz", "quux"]
]

index = nil
searchterm = "Trial"
catch:)done) do
ary.each_with_index {|row, i|
row.each_with_index {|cell, j|
puts "testing [#{i},#{j}]"
if cell =~ /^#{searchterm}/
index = [i,j]
throw :done
end
}
}
end

puts "-----------------"

puts "found #{searchterm} at #{index.inspect}"

#------------------------------------------------

martin
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,743
Messages
2,569,478
Members
44,899
Latest member
RodneyMcAu

Latest Threads

Top