Parsing excel CVS data on a mac OSX to extract blocks of cells

Discussion in 'Ruby' started by anne001, Nov 21, 2005.

  1. anne001

    anne001 Guest

    I would like to parse some excel CVS data which has a repetitive block
    pattern

    "Experiment ID: 1",,,,,,,,,,,,
    "Subject ID: 1013938829432171e868c340.
    Trial,stimulus,time,type,field1,field2,text_response,Abs. time of
    response,,,,,
    26,undefined,14828,KEY,RETURN,UNUSED,DCS,Sat Oct 15 17:48:04 GMT-0400
    2005,,,,,
    23,undefined,15078,KEY,RETURN,UNUSED,244,Sat Oct 15 17:48:19 GMT-0400
    2005,,,,,
    7,nixontrialleft copy.pct [TAG: 1],5953,KEY,1,UNUSED,,Sat Oct 15
    17:49:24 GMT-0400 2005,,,,,
    8,nixontrialfront copy.pct [TAG: 3],6250,KEY,3,UNUSED,,Sat Oct 15
    17:49:31 GMT-0400 2005,,,,,
    9,nixontrialright copy.pct [TAG: 2],2469,KEY,2,UNUSED,,Sat Oct 15
    17:49:34 GMT-0400 2005,,,,,
    #####
    more data
    ######
    ,,,,,,,,,,4374.347222,,
    ,,,,,,,,,,,,1.00
    ,,,,,,,,,,,,0.93
    ### and a new block starts
    "Experiment ID: 3",,,,,,,,,,,,0.92
    ....

    Question 1:
    ------------------
    Arr = IO.readlines(File.expand_path("~/Desktop/FaceRetest.cvs"))
    has a length of 1, why?
    I noticed that the puts has ^M everywhere
    Arr =
    IO.readlines(File.expand_path("~/Desktop/FaceRetest.cvs"),sep_string="^M")
    seems to split the array into lines.
    Arr = Arr.split(",")
    gives me a message, private method `split' called
    puts Arr.length gets me no response. I expected split to take the 1D
    array and transform it into a 2D array.

    What is the best way to get the excel CVS data into a Ruby 2D array for
    further analysis?

    Question 2:
    -----------------
    Once I have a 2D array, what is the best way to find the index of the
    key word Trial (which starts the data I want).

    In matlab, I would do something like,
    Find the index pair (I,J) at which Array = Trial
    If J==1
    Extract the array starting at row I+1 to I+82, for columns [2, 3, and
    5]
    and I would love some pointers at Ruby structures, methods, etc I need
    to look at.
    - First ruby code -- Thanks for your help
    anne001, Nov 21, 2005
    #1
    1. Advertising

  2. anne001

    anne001 Guest

    anne001, Nov 21, 2005
    #2
    1. Advertising

  3. anne001

    Ara.T.Howard Guest

    On Mon, 21 Nov 2005, anne001 wrote:

    > Question 1:


    require "csv"

    path = File::expand_path "~/Desktop/FaceRetest.cvs"

    table = [] # 2d array
    CSV::eek:pen(path, 'r'){|row| table << row}


    that ought to get you going.


    -a
    --
    ===============================================================================
    | ara [dot] t [dot] howard [at] gmail [dot] com
    | all happiness comes from the desire for others to be happy. all misery
    | comes from the desire for oneself to be happy.
    | -- bodhicaryavatara
    ===============================================================================
    Ara.T.Howard, Nov 21, 2005
    #3
  4. On Nov 20, 2005, at 7:37 PM, anne001 wrote:

    > I would like to parse some excel CVS data which has a repetitive block
    > pattern
    >
    > "Experiment ID: 1",,,,,,,,,,,,
    > "Subject ID: 1013938829432171e868c340.
    > Trial,stimulus,time,type,field1,field2,text_response,Abs. time of
    > response,,,,,
    > 26,undefined,14828,KEY,RETURN,UNUSED,DCS,Sat Oct 15 17:48:04 GMT-0400
    > 2005,,,,,
    > 23,undefined,15078,KEY,RETURN,UNUSED,244,Sat Oct 15 17:48:19 GMT-0400
    > 2005,,,,,
    > 7,nixontrialleft copy.pct [TAG: 1],5953,KEY,1,UNUSED,,Sat Oct 15
    > 17:49:24 GMT-0400 2005,,,,,
    > 8,nixontrialfront copy.pct [TAG: 3],6250,KEY,3,UNUSED,,Sat Oct 15
    > 17:49:31 GMT-0400 2005,,,,,
    > 9,nixontrialright copy.pct [TAG: 2],2469,KEY,2,UNUSED,,Sat Oct 15
    > 17:49:34 GMT-0400 2005,,,,,
    > #####
    > more data
    > ######
    > ,,,,,,,,,,4374.347222,,
    > ,,,,,,,,,,,,1.00
    > ,,,,,,,,,,,,0.93
    > ### and a new block starts
    > "Experiment ID: 3",,,,,,,,,,,,0.92
    > ....
    >
    > Question 1:
    > ------------------
    > Arr = IO.readlines(File.expand_path("~/Desktop/FaceRetest.cvs"))
    > has a length of 1, why?


    Well, if you're on Windows, Ruby is looking for a \015\012 sequence
    to end the line. On Unix it would be looking for a \012...

    > I noticed that the puts has ^M everywhere


    ^M (control-M) is just \015. So it's not seeing the line endings.
    Some OSes actually used this line ending, like Mac OS 9 and lower,
    but it's pretty rare.

    > Arr =
    > IO.readlines(File.expand_path("~/Desktop/
    > FaceRetest.cvs"),sep_string="^M")


    That's a fine fix. You should be able to replace sep_string="^M"
    with just "\r", I think.

    > seems to split the array into lines.
    > Arr = Arr.split(",")
    > gives me a message, private method `split' called


    First, don't start Ruby variable names with a capital letter. This
    isn't your problem here, but it's still not a habit to get into. A
    capital variable is a constant in Ruby.

    The real problem here is that Arr is an Array, and you are calling a
    String function on it, split(). Try:

    Arr.first.split(",")

    # ... or ...

    Arr.map { |row| row.split(",") }

    But it's better to use a real parser as Ara suggested.

    > puts Arr.length gets me no response. I expected split to take the 1D
    > array and transform it into a 2D array.
    >
    > What is the best way to get the excel CVS data into a Ruby 2D array
    > for
    > further analysis?


    Try:

    require "csv"
    arr = CSV.read(File.expand_path("~/Desktop/FaceRetest.cvs"))

    It gives two-dimensional arrays:

    Neo:~/Desktop$ cat data.csv
    1,2,3
    4,,5
    Neo:~/Desktop$ ruby -r pp -r csv -e 'pp CSV.read("data.csv")'
    [["1", "2", "3"], ["4", nil, "5"]]

    > Question 2:
    > -----------------
    > Once I have a 2D array, what is the best way to find the index of the
    > key word Trial (which starts the data I want).


    Hmm, what about something like:

    in_section = false

    csv.each do |row|
    if row.first == "Trial"
    in_section = true
    next
    elsif in_section
    # process row here...
    end
    end

    Hope that gets you going.

    James Edward Gray II
    James Edward Gray II, Nov 21, 2005
    #4
  5. On 11/20/05, anne001 <> wrote:
    > What is the best way to get the excel CVS data into a Ruby 2D array for
    > further analysis?


    This might be overkill and more useful when it is more complete, but
    Ruby Reports has basic support for stuff like this, via DataSets.
    ---
    require "rubygems"
    require "ruportlib"

    my_data =3D Report::DataSet.load("foo.csv")
    ---
    this will create a DataSet filled with DataRows which can be indexed
    ordinally or by field name. (The first row of the CSV is used to
    define field names)

    example:
    ---
    my_data.each do |row|
    puts row["name"]
    end
    ---
    would print the value of the "name" column if it existed.

    both DataSet and DataRow are Enumerable, which might make finding your
    data easier.

    if you want to try this approach,

    sudo gem install ruport

    should get you the latest version

    The rather sparse documentation is housed at:
    http://ruport.rubyforge.org/docs/

    You'd want to look at (and inspect the source on) Report::DataSet and
    Report::DataRow

    HTH,
    Greg
    Gregory Brown, Nov 21, 2005
    #5
  6. On 11/21/05, Gregory Brown <> wrote:
    > On 11/20/05, anne001 <> wrote:
    > > What is the best way to get the excel CVS data into a Ruby 2D array for
    > > further analysis?

    >
    > This might be overkill and more useful when it is more complete, but
    > Ruby Reports has basic support for stuff like this, via DataSets.


    As a side not, you do not need to install DBI or any database related
    dependencies to use this set of features.
    Gregory Brown, Nov 21, 2005
    #6
  7. anne001

    anne001 Guest

    Thank you so much for your help
    The data was collected by another lab. I don't know what they use. I
    saved their excel file data into a csv file. I don't know if there is a
    way of telling excel to do a better job. I am on panther, with
    office 2004 for mac.

    require "csv"
    arr = CSV.read(File.expand_path("~/Desktop/FaceRetest.cvs"))

    I get an error:
    /usr/local/lib/ruby/1.8/csv.rb:607:in `get_row':
    CSV::IllegalFormatError (CSV::IllegalFormatError)

    Ara's formulation does the same thing.

    I think there is a way to replace special characters in emacs but what
    I tried so far does not work.

    I thought I could use readlines and a puts and then copy the ruby
    output, which would not longer have the ^M, but somehow, ruby gets
    mixed up, and suddenly stop parsing the lines, so I can't relie on ruby
    readline either. It sounds like I need to figure out how to format the
    input file better. And then I will be able to use your suggestions.

    thank you
    anne001, Nov 21, 2005
    #7
  8. On Nov 21, 2005, at 6:47 AM, anne001 wrote:

    > Thank you so much for your help
    > The data was collected by another lab. I don't know what they use. I
    > saved their excel file data into a csv file. I don't know if there
    > is a
    > way of telling excel to do a better job. I am on panther, with
    > office 2004 for mac.


    Excel might be legacy enough that it's still using \r as a return
    here, for the old Mac OS. Try this:

    ruby -p -i.bak -e 'sub("\r", "\n")' your_csv_file_here.csv

    > require "csv"
    > arr = CSV.read(File.expand_path("~/Desktop/FaceRetest.cvs"))
    >
    > I get an error:
    > /usr/local/lib/ruby/1.8/csv.rb:607:in `get_row':
    > CSV::IllegalFormatError (CSV::IllegalFormatError)
    >
    > Ara's formulation does the same thing.


    I think this is a line ending issue. The CSV format does not allow a
    naked \r outside of a quoted field. Try my fix above and see if it
    gets you going, or you can switch the line ending for CSV's readlines():

    require "csv"
    arr = CSV.readlines(File.expand_path("~/Desktop/FaceRetest.cvs"), "\r")

    If you're still having trouble and it's not a privacy concern, you
    are welcome to email me the CSV file off list, and I will try to
    figure out how to read it.

    James Edward Gray II
    James Edward Gray II, Nov 21, 2005
    #8
  9. anne001

    Gene Tani Guest

    anne001 wrote:
    > Thank you so much for your help
    > The data was collected by another lab. I don't know what they use. I
    > saved their excel file data into a csv file. I don't know if there is a
    > way of telling excel to do a better job. I am on panther, with
    > office 2004 for mac.
    >
    > require "csv"
    > arr = CSV.read(File.expand_path("~/Desktop/FaceRetest.cvs"))
    >
    > I get an error:
    > /usr/local/lib/ruby/1.8/csv.rb:607:in `get_row':
    > CSV::IllegalFormatError (CSV::IllegalFormatError)
    >
    > Ara's formulation does the same thing.
    >
    > I think there is a way to replace special characters in emacs but what
    > I tried so far does not work.
    >
    > I thought I could use readlines and a puts and then copy the ruby
    > output, which would not longer have the ^M, but somehow, ruby gets
    > mixed up, and suddenly stop parsing the lines, so I can't relie on ruby
    > readline either. It sounds like I need to figure out how to format the
    > input file better. And then I will be able to use your suggestions.
    >
    > thank you


    There's a few ways to handle converting newlines between O/S:

    - require 'ptools'; File#nl_convert here:
    http://ruby-miscutils.sourceforge.net/ptools.html

    - specify $/ input record separator and $\ output separators for
    IO#gets, readline, each_line etc.

    - Komodo has a Code / Clean Line Endings which will convert to the O/S
    format you're
    running it on

    but there's no automatic conversion like python
    open("filename.ext","U"), I think
    Gene Tani, Nov 21, 2005
    #9
  10. anne001

    Ara.T.Howard Guest

    On Mon, 21 Nov 2005, anne001 wrote:

    > Thank you so much for your help
    > The data was collected by another lab. I don't know what they use. I
    > saved their excel file data into a csv file. I don't know if there is a
    > way of telling excel to do a better job. I am on panther, with
    > office 2004 for mac.
    >
    > require "csv"
    > arr = CSV.read(File.expand_path("~/Desktop/FaceRetest.cvs"))
    >
    > I get an error:
    > /usr/local/lib/ruby/1.8/csv.rb:607:in `get_row':
    > CSV::IllegalFormatError (CSV::IllegalFormatError)
    >
    > Ara's formulation does the same thing.
    >
    > I think there is a way to replace special characters in emacs but what
    > I tried so far does not work.
    >
    > I thought I could use readlines and a puts and then copy the ruby
    > output, which would not longer have the ^M, but somehow, ruby gets
    > mixed up, and suddenly stop parsing the lines, so I can't relie on ruby
    > readline either. It sounds like I need to figure out how to format the
    > input file better. And then I will be able to use your suggestions.


    run dos2unix on it. if it still fails you'll need see why the file is badly
    formatted.

    regards.


    -a
    --
    ===============================================================================
    | ara [dot] t [dot] howard [at] gmail [dot] com
    | all happiness comes from the desire for others to be happy. all misery
    | comes from the desire for oneself to be happy.
    | -- bodhicaryavatara
    ===============================================================================
    Ara.T.Howard, Nov 21, 2005
    #10
  11. anne001

    anne001 Guest

    I have moved onto Tiger. Still much installing to do. Sorry in the
    delay in processing your responses.

    The line ending is extremely simple,
    http://www.delorie.com/gnu/docs/emacs/emacs_220.html

    ....-unix newline to separate lines.
    ....-dos carriage-return + linefeed to separate lines
    ....-mac carriage-return to separate lines

    But the editors show what they want. Text wrangler lets you replace \r
    with \n
    but when you look, it still looks as a \r. Did it make the
    substitution?
    Emacs has a command to change the file to unix. But the ^M are not
    changed. Did something change? hard to say. So I gave up on that.

    I like the command, thank you
    ruby -p -i.bak -e 'sub("\r", "\n")' your_csv_file_here.csv

    It seems to work, it creates your_cvs_file_here.csv.bak and csv read no
    longer gives an error, but... It seems to parse each cell, with realms
    of NIL in between bits of text.
    Conceptually I think it is easier to process line by line. But I will
    look into CVS to see what it can do.

    In any case, I can combine the ruby line command with the readlines
    ruby function. So I should be OK.
    anne001, Nov 23, 2005
    #11
  12. I've had a similar problem. It sounds like you used Excel somewhere.
    My experience has been that Excel for the mac likes to save CSV files
    with MacOS 9 line endings which Ruby doesn't like. In the future, you
    can choose 'Windows CSV' or something to that extent. It's really
    silly, but it works.

    Ruby's CSV library will you give you back arrays if you open the file
    with the CSV module.
    I did notice that your file had this in it:
    "Subject ID: 1013938829432171e868c340.
    Trial,stimulus,time,type,field1,field2,text_response,Abs. time of
    response,,,,,

    with the " not closing anywhere. You might want to look at that, Excel
    might be generating invalid CSV.

    Anyways, goodluck.
    .adam
    Adam Sanderson, Nov 24, 2005
    #12
  13. anne001 <> wrote:
    >
    > Question 2:
    > -----------------
    > Once I have a 2D array, what is the best way to find the index of the
    > key word Trial (which starts the data I want).
    >
    > In matlab, I would do something like,
    > Find the index pair (I,J) at which Array = Trial
    > If J==1
    > Extract the array starting at row I+1 to I+82, for columns [2, 3, and
    > 5]
    > and I would love some pointers at Ruby structures, methods, etc I need
    > to look at.


    This should get you started:

    #------------------------------------------------

    ary = [
    ["this", "is", "some", "code"],
    ["test", "please", "ignore"],
    ["Trial", "section", "begins", "here"],
    ["foo", "bar", "baz", "quux"]
    ]

    index = nil
    searchterm = "Trial"
    catch:)done) do
    ary.each_with_index {|row, i|
    row.each_with_index {|cell, j|
    puts "testing [#{i},#{j}]"
    if cell =~ /^#{searchterm}/
    index = [i,j]
    throw :done
    end
    }
    }
    end

    puts "-----------------"

    puts "found #{searchterm} at #{index.inspect}"

    #------------------------------------------------

    martin
    Martin DeMello, Nov 28, 2005
    #13
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. ddog
    Replies:
    3
    Views:
    587
    Jason Whaley
    Aug 4, 2007
  2. Replies:
    1
    Views:
    593
    GArlington
    Aug 31, 2007
  3. matt
    Replies:
    1
    Views:
    255
    George Ogata
    Aug 6, 2004
  4. David Ross
    Replies:
    5
    Views:
    153
    Nicholas Van Weerdenburg
    Dec 5, 2004
  5. Geeta Sachdeva
    Replies:
    4
    Views:
    133
    bbiker
    Oct 16, 2007
Loading...

Share This Page