using FasterCSV to clean CSV file

J

John Mcleod

Hello all,
I was wondering, can FasterCSV be used to clean a csv file of unwanted
characters, endline spaces, tab spaces and other characters?
I have a csv file 6,000 lines long and it comes from another department.
I have been working to read this file into a database via FasterCSV.
Went I start to read the file in, I get the dreaded
"FasterCSV::MalformedCSVError" error message. I've worked with this in
the past and the solution was to "clean" the data. EOL spaces and other
undesired characters were scattered within the file.

Is it possible to use FasterCSV for this task?

JohnM
 
J

James Edward Gray II

I was wondering, can FasterCSV be used to clean a csv file of unwanted
characters, endline spaces, tab spaces and other characters?

I believe the answer to this question is yes. I've already shown you =
how to modify headers as they are read. There's a similar system for =
normal fields.

Now, this does not modify the file on disk, obviously. If you want to =
do that, it would be best to write out a new and corrected file. You =
could then rename the new file to replace the old, if needed.
I have a csv file 6,000 lines long and it comes from another = department.
I have been working to read this file into a database via FasterCSV.
Went I start to read the file in, I get the dreaded
"FasterCSV::MalformedCSVError" error message. I've worked with this = in
the past and the solution was to "clean" the data. EOL spaces and = other
undesired characters were scattered within the file.

FasterCSV does need valid CSV data, yes. If you have non-valid CSV =
data, it will need to be cleaned before FasterCSV can read it.
Is it possible to use FasterCSV for this task?

If your data is valid CSV, FasterCSV can read and transform it. If your =
data is not valid CSV, you will need to fix it before FasterCSV can read =
it.

Hope that helps.

James Edward Gray II=
 
J

John Mcleod

Thanks for the reply James.

I have read and imported into DB simple CSV data in the past and never
had problems like this file.
FasterCSV does need valid CSV data, yes. If you have non-valid CSV
data, it will need to be cleaned before FasterCSV can read it.

The question of validity is confusing. When a file is saved via
MS-Excel isn't it valid?
I understand that there are several different formats (MS-DOS, Windows,
MAC) when considering saving a CSV. Which one should be chosen?

John
 
M

Marnen Laibow-Koser

John said:
Thanks for the reply James.

I have read and imported into DB simple CSV data in the past and never
had problems like this file.


The question of validity is confusing. When a file is saved via
MS-Excel isn't it valid?

I wouldn't trust any Microsoft program to produce valid data in any
interchange format.
I understand that there are several different formats (MS-DOS, Windows,
MAC) when considering saving a CSV. Which one should be chosen?

Those have to do with text encodings and line breaks. They're probably
beside the point. (And you mean "Mac", not "MAC" [= Media Access
Card].)

Best,
 
J

James Edward Gray II

=20
The question of validity is confusing. When a file is saved via=20
MS-Excel isn't it valid?

I expect assume FasterCSV would read the files Excel saves, yes. (Hope =
I don't end up regretting that.)
I understand that there are several different formats (MS-DOS, = Windows,=20
MAC) when considering saving a CSV. Which one should be chosen?

I doubt it matters. The only different should be line endings, which =
FasterCSV is going to try and guess anyway.

I'm betting your problem is one of encodings. Do you have some =
non-ASCII data in the spreadsheet? If that data is not UTF-8, it could =
be tripping up FasterCSV's parser.

James Edward Gray II=
 
J

John Mcleod

Well,
As you suggested below, it was a problem with encoding.
I'm betting your problem is one of encodings. Do you have some
non-ASCII data in the spreadsheet? If that data is not UTF-8, it could
be tripping up FasterCSV's parser.

What I did was I took the file and opened it in 'TextEdit' a Mac version
of MS-Notepad. Saved the file in UTF-8 and everything took off.
I'm going to try the same task in MS-Notepad (when I fire up the Windows
VM).

Thank you again for all your support.

John
 
M

Marnen Laibow-Koser

John said:
Well,
As you suggested below, it was a problem with encoding.


What I did was I took the file and opened it in 'TextEdit' a Mac version
of MS-Notepad.

TextEdit isn't a Mac version of Notepad. It's a text editor that comes
with the OS, but the similarities end there.
Saved the file in UTF-8 and everything took off.
I'm going to try the same task in MS-Notepad (when I fire up the Windows
VM).

It may not work. Notepad is ridiculously underpowered compared to
TextEdit.
Thank you again for all your support.

John

Best,
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,769
Messages
2,569,578
Members
45,052
Latest member
LucyCarper

Latest Threads

Top