Parsing CSV

Discussion in 'Ruby' started by Rafael George, Feb 26, 2007.

  1. Hi guys, I'm a newbie in Ruby. I have to parse two CSV files and compare
    2 columns of the given files. My problem is that I have tried a lot of
    different methods to handle this. I tried putting the entire column of
    one file in an array and the other file's column in a second array, then
    testing for the bigger array so I could loop through it and compare both
    files like that. It did not work. I was thinking of using CSV, but it is
    limited, and then I came across FasterCSV, which is the module I'm stuck
    on right now. If somebody can make a suggestion I would really
    appreciate it.

    Thanks in advance.

    PS: I was told to make this tool in Java, but AFAIK Ruby is better for
    handling text files.

    --
    Grimoire Guru
    SourceMage GNU/Linux
    Rafael George, Feb 26, 2007
    #1

  2. On Mon, Feb 26, 2007 at 10:50:22PM +0900, Rafael George wrote:
    > Hi guys, I'm a newbie in Ruby. I have to parse two CSV files and compare
    > 2 columns of the given files. My problem is that I have tried a lot of
    > different methods to handle this. I tried putting the entire column of
    > one file in an array and the other file's column in a second array, then
    > testing for the bigger array so I could loop through it and compare both
    > files like that. It did not work


    Well, posting your code might allow someone to help you spot what's wrong.

    I'd suggest first you check that the two arrays are being read in properly -
    if they are called a1 and a2, then "puts a1.inspect" and "puts a2.inspect"
    will print them to the screen. Then you know whether the problem is in
    reading them, or in comparing them.
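
    As a small illustration of why inspect is worth using here, the sketch
    below prints an array whose contents are made up for the example (a
    trailing space and a nil, two things that are easy to miss with a plain
    puts). Only the name a1 comes from the suggestion above.

```ruby
# Hypothetical column data; in practice a1 would come from the CSV file.
a1 = ["apple", "pear ", nil]

# puts a1 would print one bare value per line and hide the details.
# inspect shows the quotes, the trailing space and the nil explicitly.
puts a1.inspect
```

    If the printed arrays already look wrong, the problem is in the reading
    step rather than in the comparison.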

    Posting a more precise description of what you're trying to do, along with
    some sample data and what output you expect, would also make it easier for
    someone to help you.

    > PS: I was told to make this tool in Java, but AFAIK Ruby is better for
    > handling text files.


    The better language is the one which you can actually use to get the job
    done :)

    How you do this in Ruby depends on what exactly you mean by 'compare', since
    you didn't define exactly what you're trying to do. I'm guessing you mean
    check for values which are in the first file but not in the second, or vice
    versa. For a simple solution, have a look at Array#include?
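
    A minimal sketch of the Array#include? approach, assuming "compare" does
    mean finding values present in one column but not the other. The sample
    values and the names only_in_a1 / only_in_a2 are invented for
    illustration.

```ruby
# Hypothetical column contents from the two files.
a1 = %w[alice bob carol]
a2 = %w[bob carol dave]

# Values in the first list that the second list lacks, and vice versa.
only_in_a1 = a1.reject { |v| a2.include?(v) }
only_in_a2 = a2.reject { |v| a1.include?(v) }

p only_in_a1
p only_in_a2
```

    Each include? call scans the whole array, so this is fine for small
    files but slows down as both columns grow. For unique values, a1 - a2
    gives the same result as the first reject.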

    For a more efficient solution, you could first sort the two arrays and then
    walk down them with two pointers i and j. When a1[i] == a2[j] then you
    increment both i and j. When a1[i] < a2[j] then you know an item is missing
    in a2, and just increment i. When a1[i] > a2[j] then you know an item is
    missing in a1, and just increment j.
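
    The sorted two-pointer walk can be sketched roughly as follows. The
    method name sorted_diff and the sample values are hypothetical, and the
    sketch assumes the values are mutually comparable (no nil entries).

```ruby
# Hypothetical helper: returns [missing_in_a2, missing_in_a1].
def sorted_diff(a1, a2)
  a1 = a1.sort
  a2 = a2.sort
  missing_in_a2 = []
  missing_in_a1 = []
  i = j = 0

  while i < a1.size && j < a2.size
    case a1[i] <=> a2[j]
    when 0    # same value in both arrays: advance both pointers
      i += 1
      j += 1
    when -1   # a1[i] is smaller, so a2 has no such item
      missing_in_a2 << a1[i]
      i += 1
    else      # a1[i] is larger, so a1 has no item equal to a2[j]
      missing_in_a1 << a2[j]
      j += 1
    end
  end

  # Whatever is left over in either array has no partner in the other.
  missing_in_a2.concat(a1[i..-1])
  missing_in_a1.concat(a2[j..-1])
  [missing_in_a2, missing_in_a1]
end

p sorted_diff(%w[pear apple fig], %w[fig kiwi apple])
```

    After the two sorts, the walk itself visits each element once, which is
    where the gain over repeated include? calls comes from.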

    Incidentally, you don't even need Ruby to do this; the shell command 'join'
    can do this for you (as long as you use 'sort' to pre-sort your input).

    HTH,

    Brian.
    Brian Candler, Feb 26, 2007
    #2

  3. This code might get you started:

    require 'faster_csv'

    # Read a whole CSV file and return it as a table in column mode.
    def read_csv(filename)
      return FasterCSV::Table.new( FasterCSV.read(filename) ).by_col
    end

    data1 = read_csv("data1.csv")
    data2 = read_csv("data2.csv")

    compare_column_idx = 1
    unless data1[compare_column_idx] == data2[compare_column_idx]
      puts "column #{compare_column_idx} is different"
    end
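
    For readers on Ruby 1.9 or later, where FasterCSV became the standard
    csv library, the same idea might look like the sketch below. It assumes
    both inputs have a header row and compares by header name rather than by
    index; the inline sample data and the "name" column are made up so the
    example runs without any files.

```ruby
require "csv"

# Hypothetical sample data standing in for data1.csv and data2.csv.
text1 = "id,name\n1,apple\n2,pear\n"
text2 = "id,name\n1,apple\n2,plum\n"

# With headers: true, CSV.parse returns a CSV::Table.
table1 = CSV.parse(text1, headers: true)
table2 = CSV.parse(text2, headers: true)

# Indexing a table by header name returns that column as an array.
compare_column = "name"
same = table1[compare_column] == table2[compare_column]
puts "column #{compare_column} is different" unless same
```

    Note that this is an order-sensitive, all-or-nothing check: it reports
    whether the two columns are identical, not which values differ.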

    Regards,
    Stephane

    --
    Posted via http://www.ruby-forum.com/.
    Stephane Elie, Feb 26, 2007
    #3
  4. Here is what I have so far:

    passvalues = []
    i = 0
    IO.foreach(fsource) do |line|
      cols = CSV::parse_line line.chomp
      sourceval = cols[scomp_args[0]] + " " + cols[scomp_args[1]]
      IO.foreach(tdest) do |line|
        tcols = CSV::parse_line line.chomp
        testval = tcols[tcomp_args[0]] + " " + tcols[tcomp_args[1]]
        if sourceval == testval
          passvalues << sourceval  # collect every matching value
          i += 1
        end
      end
    end

    --
    Grimoire Guru
    SourceMage GNU/Linux
    Rafael George, Feb 26, 2007
    #4
  5. On Feb 26, 2007, at 8:45 AM, Rafael George wrote:

    > passvalues = []
    > i = 0
    > IO.foreach(fsource) do |line|
    >   cols = CSV::parse_line line.chomp
    >   sourceval = cols[scomp_args[0]] + " " + cols[scomp_args[1]]
    >   IO.foreach(tdest) do |line|
    >     tcols = CSV::parse_line line.chomp
    >     testval = tcols[tcomp_args[0]] + " " + tcols[tcomp_args[1]]
    >     if sourceval == testval
    >       passvalues << sourceval  # collect every matching value
    >       i += 1
    >     end
    >   end
    > end


    The direct translation of this code to FasterCSV is:

    passvalues = Array.new
    FCSV.foreach(fsource) do |s_row|
      # Join the columns from scomp_args[0] through scomp_args[1] into one key.
      source = s_row[scomp_args[0]..scomp_args[1]].join(" ")
      FCSV.foreach(tdest) do |t_row|
        if source == t_row[tcomp_args[0]..tcomp_args[1]].join(" ")
          passvalues << source
        end
      end
    end

    If you can afford to read one of the files into memory because it's
    not too large, you can probably speed that up quite a bit:

    require "set"

    # Collect every comparison key from the second file once.
    allowed = Set.new
    FCSV.foreach(tdest) do |row|
      allowed.add(row[tcomp_args[0]..tcomp_args[1]].join(" "))
    end

    # Keep only the source rows whose key appears in the set.
    passvalues = FCSV.open(fsource) do |source|
      source.select do |row|
        allowed.include? row[scomp_args[0]..scomp_args[1]].join(" ")
      end
    end
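
    The same set-lookup idea can be tried without any files using the
    standard csv library (FasterCSV's successor since Ruby 1.9). This is
    only a sketch: the inline data is invented, and it uses values_at to
    pick exactly the two compared columns, which is an assumption about what
    scomp_args and tcomp_args hold.

```ruby
require "csv"
require "set"

# Hypothetical stand-ins for the contents of fsource and tdest.
source_text = "1,Ann,Lee\n2,Bob,Ray\n3,Cy,Fox\n"
dest_text   = "x,Bob,Ray\ny,Ann,Lee\n"

scomp_args = [1, 2]  # columns to compare in the source data
tcomp_args = [1, 2]  # columns to compare in the destination data

# Build the lookup set once from the destination rows.
allowed = Set.new
CSV.parse(dest_text) do |row|
  allowed.add(row.values_at(*tcomp_args).join(" "))
end

# Keep only the source rows whose key appears in the set.
passvalues = CSV.parse(source_text).select do |row|
  allowed.include?(row.values_at(*scomp_args).join(" "))
end

p passvalues
```

    Each file is parsed once and every membership test is a hash lookup,
    instead of re-reading the second file for every row of the first.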

    Hope that gives you some fresh ideas.

    James Edward Gray II
    James Edward Gray II, Feb 26, 2007
    #5
  6. On Feb 26, 2007, at 11:48 AM, James Edward Gray II wrote:

    > If you can afford to read one of the files into memory because it's
    > not too large, you can probably speed that up quite a bit:
    >
    > require "set"
    >
    > allowed = Set.new
    > FCSV.foreach(tdest) do |row|
    >   allowed.add(row[tcomp_args[0]..tcomp_args[1]].join(" "))
    > end
    >
    > passvalues = FCSV.open(fsource) do |source|
    >   source.select do |row|
    >     allowed.include? row[scomp_args[0]..scomp_args[1]].join(" ")
    >   end
    > end


    The above destroys the field order. If you need to keep the order,
    use an Array instead:

    allowed = Array.new
    FCSV.foreach(tdest) do |row|
      allowed << row[tcomp_args[0]..tcomp_args[1]].join(" ")
    end

    # ...

    James Edward Gray II
    James Edward Gray II, Feb 26, 2007
    #6
  7. On Feb 26, 2007, at 12:54 PM, James Edward Gray II wrote:

    > On Feb 26, 2007, at 11:48 AM, James Edward Gray II wrote:
    >
    > The above destroys the field order.


    Sorry, I meant row order.

    James Edward Gray II
    James Edward Gray II, Feb 26, 2007
    #7
  8. On Feb 26, 2007, at 11:48 AM, James Edward Gray II wrote:

    > The direct translation of this code to FasterCSV is:
    >
    > passvalues = Array.new
    > FCSV.foreach(fsource) do |s_row|
    >   source = s_row[scomp_args[0]..scomp_args[1]].join(" ")
    >   FCSV.foreach(tdest) do |t_row|
    >     if source == t_row[tcomp_args[0]..tcomp_args[1]].join(" ")
    >       passvalues << source

            break # performance enhancement: stop scanning tdest after the first match

    >     end
    >   end
    > end


    James Edward Gray II
    James Edward Gray II, Feb 26, 2007
    #8
  9. Thanks, James and the other guys. I think I found the solution to my problem :)

    --
    Grimoire Guru
    SourceMage GNU/Linux
    Rafael George, Feb 26, 2007
    #9
