Parsing a comma-separated file

Discussion in 'Ruby' started by Justin To, Jun 9, 2008.

  1. Justin To

    Justin To Guest

    Hi, I had a question earlier about parsing just one line at a time,
    and now I'm working on a program to parse multiple items on each
    line, something like the following:

    name, age, gender
    Bob, 32, M
    Stacy, 14, F
    ...
    ...

    How do I parse 'Bob', knowing it's the first element on the line, '32'
    is the second, 'M' is the last...I've been reading about regular
    expressions. Is this the best way to solve this problem? And how exactly
    do you use them?

    Thanks!!
    --
    Posted via http://www.ruby-forum.com/.
     
    Justin To, Jun 9, 2008
    #1

  2. Justin To

    ThoML Guest

    ThoML, Jun 9, 2008
    #2

  3. Justin To

    Justin To Guest

    ThoML wrote:
    > Are you looking for this?
    > http://fastercsv.rubyforge.org/
    >
    > Ruby also has the csv standard library.
    >
    > Regards,
    > Thomas.


    That is great Thomas! Although, I'd like to know how to do it with the
    regular expressions as well.

    Thanks!
     
    Justin To, Jun 9, 2008
    #3
  4. Justin To

    Avdi Grimm Guest

    On Mon, Jun 9, 2008 at 12:46 PM, Justin To <> wrote:
    > That is great Thomas! Although, I'd like to know how to do it with the
    > regular expressions as well.


    I'd recommend using String#split. In the simplest case you could just
    specify line.split(','); no regular expressions needed. If you wanted
    you could use a regular expression argument to #split in order to skip
    whitespace:

    line.split(/\s*,\s*/)

    but you could just as easily trim the values after the fact too:

    line.split(',').map{|v| v.strip}


    Regular expressions are not the best solution for parsing CSV,
    especially once you start dealing with quoted values.
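
    A minimal sketch of the split approach described above, and of how it
    breaks on a quoted value. The helper name split_fields and the quoted
    sample line are made up for illustration; they are not from the thread.

```ruby
# Split one line on commas, swallowing whitespace around each comma.
# (split_fields is a hypothetical helper name.)
def split_fields(line)
  line.strip.split(/\s*,\s*/)
end

name, age, gender = split_fields("Bob, 32, M")
puts name    # Bob
puts age     # 32 (still a String; use age.to_i for an Integer)
puts gender  # M

# A quoted field that contains a comma is cut in two, so this line
# comes back as four fields instead of three.
p split_fields('"Smith, Bob", 32, M')
```

    The second call is the case where a real CSV parser earns its keep.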

    --
    Avdi

    Home: http://avdi.org
    Developer Blog: http://avdi.org/devblog/
    Twitter: http://twitter.com/avdi
    Journal: http://avdi.livejournal.com
     
    Avdi Grimm, Jun 9, 2008
    #4
  5. Justin To

    Justin To Guest

    Justin To, Jun 9, 2008
    #5
  6. Justin To

    Avdi Grimm Guest

    Avdi Grimm, Jun 9, 2008
    #6
  7. Justin To

    Charles Walden Guest

    My experience (at least a year ago) was that fastercsv was a great way
    to go if you had very clean files without errors, odd characters,
    etc. Unfortunately, I had files that were a bit more problematic, so
    I ended up either parsing them myself (split, regexes, etc.) and
    catching and handling all the errors, or using the parse_line method
    in the standard csv library.

    On Jun 9, 2008, at 2:09 PM, Avdi Grimm wrote:

    > On Mon, Jun 9, 2008 at 2:08 PM, Justin To <> wrote:
    >> So is the fasterCSV the most effective way of parsing a
    >> comma-separated file?
    >
    > It is the fastest and most robust way.
     
    Charles Walden, Jun 9, 2008
    #7
  8. Justin To

    Justin To Guest

    Justin To, Jun 9, 2008
    #8
  9. Justin To

    Greg Willits Guest

    > name, age, gender
    > Bob, 32, M
    > Stacy, 14, F
    > ...
    > How do I parse 'Bob', knowing it's the first element on the line, '32'
    > is the second, 'M' is the last...I've been reading about regular
    > expressions. Is this the best way to solve this problem? And how exactly
    > do you use them?


    This doesn't handle all CSV specs, but if you know you have pure data
    like you show above, these are the rudimentary steps without the
    one-liner tricks, so it should be pretty straightforward to understand
    each step. Arranging them as methods of a class would be good.


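    # read the whole file into one string
    # (IO.readlines returns an array, which has no strip! or gsub!)

    file_lines = ''
    if File.exist?(file_name)
      file_lines = File.read(file_name)
    end

    # normalize line endings so it doesn't matter what they are
    # (double quotes make "\n" a real newline; '\n' is a backslash and an n)

    file_lines.strip!
    file_lines.gsub!(/\r\n/, "\n")
    file_lines.gsub!(/\r/, "\n")

    # normalize comma delimiters so it doesn't matter
    # if you have one, two or one,two or one , two etc...
    # (only spaces and tabs, so a trailing comma can't swallow a newline)

    file_lines.gsub!(/[ \t]*,[ \t]*/, ',')

    # split lines into a single array of lines

    lines_array = file_lines.split("\n")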

    # split each line into an array

    final_data = []

    lines_array.each do |this_line|
    final_data << this_line.split(',')
    end

    # final_data is now an array of arrays that looks like this:

    [
    ['name', 'age', 'gender'],
    ['Bob', '32', 'M'],
    ['Stacy', '14', 'F']
    ]

    So, to get Bob, you'd have to know his line number, and index into the
    record array:

    final_data[1][0] # Bob
    final_data[2][3] # F
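
    One way the steps above might be arranged as methods of a class. This
    is only a sketch under the same "pure data" assumption (no quoted
    fields); the class name SimpleTable and its method names are
    hypothetical, not part of any library.

```ruby
# Hypothetical wrapper around the split-based steps; only suitable for
# plain data with no quoted fields or embedded commas.
class SimpleTable
  attr_reader :rows

  # Build a table from a file on disk.
  def self.from_file(file_name)
    new(File.read(file_name))
  end

  # Build a table from text that is already in memory.
  def initialize(text)
    normalized = text.strip.gsub(/\r\n?/, "\n")   # unify line endings
    @rows = normalized.split("\n").map do |line|
      line.strip.split(/\s*,\s*/)                 # one array per line
    end
  end

  # Look up a cell by row number and by column name from the header row.
  def fetch(row_number, column_name)
    column = rows.first.index(column_name)
    rows[row_number][column]
  end
end

table = SimpleTable.new("name, age, gender\r\nBob, 32, M\r\nStacy, 14, F\n")
puts table.fetch(1, "name")    # Bob
puts table.fetch(2, "gender")  # F
```

    Looking a column up by its header name avoids having to remember
    which index each field lives at.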


    -- greg willits

     
    Greg Willits, Jun 9, 2008
    #9
  10. Justin To

    James Gray Guest

    On Jun 9, 2008, at 4:52 PM, Charles Walden wrote:

    > My experience (at least a year ago) was that fastercsv was a great
    > way to go if you had very clean files without errors, odd
    > characters, etc. Unfortunately, I had files that were a bit more
    > problematic and so I ended up using a combination of either parsing
    > it myself (split, regexs. etc) and catching all the errors and
    > handling them or using the parse_line method in the standard csv
    > library.


    FasterCSV has a parse_line() method as well, just FYI.
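
    A small sketch of parsing line by line and catching errors, as
    described in the quoted post. It uses CSV.parse_line from Ruby's
    standard csv library (FasterCSV became that library in Ruby 1.9) and
    assumes require 'csv' is available in your Ruby. The helper name
    safe_parse_line and the sample lines are made up for illustration.

```ruby
require 'csv'

# Parse one line of CSV, returning nil instead of raising when the
# line is malformed. (safe_parse_line is a hypothetical helper name.)
def safe_parse_line(line)
  CSV.parse_line(line)
rescue CSV::MalformedCSVError
  nil
end

# A quoted field containing a comma stays in one piece.
p safe_parse_line('"Smith, Bob",32,M')

# An unclosed quote is malformed, so the helper returns nil here.
p safe_parse_line('Bob,"32,M')
```

    Returning nil lets the caller decide whether to skip, log, or
    hand-repair each bad line.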

    James Edward Gray II
     
    James Gray, Jun 10, 2008
    #10
  11. Justin To

    Justin To Guest


    >
    > final_data[1][0] # Bob
    > final_data[2][3] # F
    >


    should the last one be:
    final_data[2][2] # F
    ??

    Thanks! Also, is this an effective way to parse a large file? What if I
    had to read a million lines with multiple columns? Would this solution
    still be practical?

    Thanks again!
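
    For a large file, one option is to avoid loading everything at once.
    The sketch below reads with File.foreach, which yields one line at a
    time, so only the current line is held in memory. It makes no claim
    about speed, it assumes plain data with no quoted fields, and the
    helper name each_row is hypothetical.

```ruby
require 'tempfile'

# Yield each row as an array of stripped fields, one line at a time.
# (each_row is a hypothetical helper; plain split, so no quoted fields.)
def each_row(file_name)
  File.foreach(file_name) do |line|
    yield line.chomp.split(',').map { |v| v.strip }
  end
end

# Demo on a small temporary file holding the sample data.
Tempfile.create('people') do |f|
  f.write("name, age, gender\nBob, 32, M\nStacy, 14, F\n")
  f.flush
  each_row(f.path) { |row| p row }
end
```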

     
    Justin To, Jun 10, 2008
    #11
