1.9 CSV Parsing Issues

Kenny Lam

I'm currently porting a script to 1.9 and I'm having problems getting
CSV parsing to work. This script worked fine in 1.8.7 and used the
FasterCSV library for parsing. After playing around in IRB, I have
determined that the current parser seems incapable of handling newlines
as row separators (a rather basic and important feature).

I tested with a simple file whose contents are:
field1,field2
field3,field4

This file was created using a basic text editor and does not contain any
unorthodox newline characters. Attempting to parse this file results in
the following error:

C:/Ruby192/lib/ruby/1.9.1/csv.rb:1885:in `block (2 levels) in shift':
Unquoted fields do not allow \r or \n (line 1). (CSV::MalformedCSVError)
from C:/Ruby192/lib/ruby/1.9.1/csv.rb:1856:in `each'
from C:/Ruby192/lib/ruby/1.9.1/csv.rb:1856:in `block in shift'
from C:/Ruby192/lib/ruby/1.9.1/csv.rb:1818:in `loop'
from C:/Ruby192/lib/ruby/1.9.1/csv.rb:1818:in `shift'
from C:/Ruby192/lib/ruby/1.9.1/csv.rb:1760:in `each'

The return value of the opened CSV file shows row_sep to be "\r\n", which
seems correct. I have tried manually setting the value of row_sep when
calling CSV::open, but I get the same issue.

Once again, I do not have this problem with FasterCSV under 1.8.7 (which
as I understand, is the same code used in 1.9's csv library). I'm using
Ruby 1.9.2p0 on Windows XP. I would greatly appreciate any help.
 
James Edward Gray II

> I'm currently porting a script to 1.9 and I'm having problems getting
> CSV parsing to work.
> I tested with a simple file whose contents are:
> field1,field2
> field3,field4

CSV should definitely handle that data. Indeed it does for me:

$ ruby -v -r csv -e 'p CSV.parse("field1,field2\r\nfield3,field4\r\n")'
ruby 1.9.2dev (2010-04-28 trunk 27536) [x86_64-darwin10.3.0]
[["field1", "field2"], ["field3", "field4"]]

> This file was created using a basic text editor and does not contain any
> unorthodox newline characters.

Can we see exactly what the file does contain, with code like:

$ ruby -e 'p File.read("path/to/file.csv")'

?
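Because Ruby may translate line endings when it reads a file in text mode, `File.binread` is a useful companion to the `File.read` check above: it never translates, so it shows the bytes actually on disk. A minimal sketch; the temporary file and its CRLF contents are hypothetical stand-ins for the file in question:

```ruby
require "tmpdir"

raw = nil
text = nil

Dir.mktmpdir do |dir|
  # Hypothetical sample file, written with explicit CRLF line endings
  # the way a Windows editor such as Notepad saves it.
  path = File.join(dir, "test.csv")
  File.binwrite(path, "field1,field2\r\nfield3,field4\r\n")

  # File.binread never translates line endings: these are the raw bytes.
  raw = File.binread(path)
  # File.read uses text mode; on Windows "\r\n" may come back as "\n".
  text = File.read(path)
end

p raw   # prints "field1,field2\r\nfield3,field4\r\n"
p text  # platform-dependent, see the comment above
```

If the two results differ, Ruby's newline translation is at work rather than the file containing unusual characters.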

James Edward Gray II
 
Kenny Lam

File.read shows "field1,field2\nfield3,field4\n".
I have played around with some of the other methods and have
determined that this problem only seems to occur when using CSV::open
and then looping through with CSV::each. CSV::foreach and CSV::parse
seem fine. Unfortunately, I need to use CSV::open because I need a
reference to the opened file object in order to do some file cursor
manipulation.

Another thing I have noticed is that running CSV.open('file','r') shows
this result:
<#CSV io_type:File io_path:"/log/test.log" encoding:CP850 lineno:0
col_sep:"," row_sep:"\r\n" quote_char:"\"">

While CSV.open('test.log','r',:row_sep => '\r\n') shows this result:
<#CSV io_type:File io_path:"/log/test.log" encoding:CP850 lineno:0
col_sep:"," row_sep:"\\r\\n" quote_char:"\"">

The double backslashes make me question whether the escape character is being
processed correctly. I am relatively new to Ruby; am I using the
language incorrectly, or is this a bug?
 
James Edward Gray II

> File.read shows "field1,field2\nfield3,field4\n"

Great. That's what we expected to see. You are right about the
content.

> I have played around with some of the other methods and have
> determined that this problem only seems to occur when using CSV::open
> and then looped through with CSV::each. CSV::foreach and CSV::parse
> seem fine.

Ah, and let me guess, you always pass a read mode of 'r' to open(),
right? CSV is clever: if you don't specify a mode, it shuts off Ruby's
line ending translation on Windows by opening the file with 'rb'. By
specifying a mode, you leave that translation on, which allows Ruby to
switch \r\n to \n as it did with the read above.
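The failure appears to come from a mismatch: CSV settled on "\r\n" as row_sep, but the translated lines it then read ended in a bare "\n". The sketch below reproduces that mismatch in memory. It is an assumption-laden simulation: `String#gsub` stands in for Windows text-mode translation, so no file or platform behavior is involved, and the exact error message differs between Ruby versions.

```ruby
require "csv"

# Bytes as a Windows editor writes them: CRLF line endings.
on_disk = "field1,field2\r\nfield3,field4\r\n"

# Stand-in for Windows text-mode reading, which turns "\r\n" into "\n".
# (Assumption: gsub simulates the translation; no file is opened.)
translated = on_disk.gsub("\r\n", "\n")

# Separator and data agree: both rows parse.
good = CSV.parse(on_disk, row_sep: "\r\n")
p good  # [["field1", "field2"], ["field3", "field4"]]

# Letting CSV auto-detect the separator on the translated text also works.
auto = CSV.parse(translated)
p auto  # [["field1", "field2"], ["field3", "field4"]]

# Mismatch: row_sep says "\r\n" but the lines end in "\n", so the "\n"
# lands inside an unquoted field and CSV rejects the data.
begin
  CSV.parse(translated, row_sep: "\r\n")
rescue CSV::MalformedCSVError => e
  puts "MalformedCSVError: #{e.message}"
end
```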

> Unfortunately, I need to use CSV::open because I need a
> reference to the opened file object in order to do some file cursor
> manipulation.

No worries, open() is going to work for you.

> Another thing I have noticed is that running CSV.open('file','r') shows
> this result:
> <#CSV io_type:File io_path:"/log/test.log" encoding:CP850 lineno:0
> col_sep:"," row_sep:"\r\n" quote_char:"\"">
>
> While CSV.open('test.log','r',:row_sep => '\r\n') shows this result:
> <#CSV io_type:File io_path:"/log/test.log" encoding:CP850 lineno:0
> col_sep:"," row_sep:"\\r\\n" quote_char:"\"">
>
> The double backslashes make me question whether the escape character is
> being processed correctly. I am relatively new to Ruby; am I using the
> language incorrectly, or is this a bug?

You have a misunderstanding of Ruby Strings. Double quotes allow for
escapes like \r or \n, but single quotes do not. You've set the
:row_sep to literally backslash, r, backslash, and n.
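The difference between the two literals is easy to check directly. This small sketch compares their lengths and their inspect output (which is where the doubled backslashes in the question come from), then parses the sample data with the double-quoted separator:

```ruby
require "csv"

literal = '\r\n'  # single quotes: backslash, r, backslash, n
crlf    = "\r\n"  # double quotes: carriage return, line feed

p literal.length  # 4
p crlf.length     # 2

# inspect escapes each backslash, hence the doubled backslashes
# in the row_sep shown by CSV.open in the question.
p literal               # prints "\\r\\n"
p literal == "\\r\\n"   # true

# With the double-quoted separator, the sample data parses as expected.
rows = CSV.parse("field1,field2\r\nfield3,field4\r\n", row_sep: crlf)
p rows  # [["field1", "field2"], ["field3", "field4"]]
```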

I imagine all you need to do is switch your open() call to:

CSV.open('path/to/file')

The library should take it from there.
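A minimal sketch of that suggestion, assuming a throwaway CRLF file in a temporary directory (the path and contents are hypothetical). It opens the file without a mode, reads every row, then uses `CSV#rewind` and `CSV#shift` to show that the object returned by `CSV.open` still allows moving around in the file, which was the reason for needing `open()`. How `CSV.open` avoids newline translation internally may differ between Ruby versions, so treat this as a sketch rather than a statement about 1.9.2 specifically.

```ruby
require "csv"
require "tmpdir"

rows = []
first_again = nil

Dir.mktmpdir do |dir|
  # Hypothetical file with CRLF line endings, as a Windows editor writes it.
  path = File.join(dir, "test.csv")
  File.binwrite(path, "field1,field2\r\nfield3,field4\r\n")

  # No mode argument: CSV decides how to open the file itself.
  CSV.open(path) do |csv|
    csv.each { |row| rows << row }

    # Cursor work through the CSV object: rewind returns to the start
    # of the file, and shift reads the first row again.
    csv.rewind
    first_again = csv.shift
  end
end

p rows         # [["field1", "field2"], ["field3", "field4"]]
p first_again  # ["field1", "field2"]
```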

Hope that helps.

James Edward Gray II
 
