Changing seperator in large CSV files?

D

dumbledad

Hi All,

I have a series of large CSV files (the largest is several GB) that I
need to load into SQL. To perform the load I'm hoping to use the SQL
command BULK INSERT. But I have a problem, I have lines in the CSV file
like the second one here:

12,some text,34
56,"some text, with a comma in",78

The BULK INSERT command fails to understand the significance of the
quotation marks and treats the second comma in the second line as a
delimiter.

Thus I would like to replace each occurrence of a comma in the files,
when and only when used as a separator, with a different separator ("/"
for example). Does anyone have a Perl script they can share with me
that will do this, i.e. that will find and replace commas which are not
enclosed in text within quote marks?

Cheers,

Tim.
 
U

usenet

Thus I would like to replace each occurrence of a comma in the files...

You might benefit from a recent thread in this group:

http://tinyurl.com/9ut7w

The thread discussed an aspect of Damian Conway's (outstanding) book,
"Perl Best Practices," but it just happens to also address your
question.
 
X

xhoster

Hi All,

I have a series of large CSV files (the largest is several GB) that I
need to load into SQL.

SQL is a language. You cannot insert data into a language. What is the
SQL-processing database server you are trying to use?

To perform the load I'm hoping to use the SQL
command BULK INSERT. But I have a problem, I have lines in the CSV file
like the second one here:

12,some text,34
56,"some text, with a comma in",78

The BULK INSERT command fails to understand the significance of the
quotation marks and treats the second comma in the second line as a
delimiter.

I don't know "BULK INSERT", but I do know Oracles sqlldr and MySQL's
"LOAD DATA" or mysqlimport. Both of them allow you specify a "Fields
enclosed by" character, such they can take your large data files just as
they are. I would be very surprised if "BULK INSERT" is not similarly
configurable.
Thus I would like to replace each occurrence of a comma in the files,
when and only when used as a separator, with a different separator ("/"
for example). Does anyone have a Perl script they can share with me
that will do this, i.e. that will find and replace commas which are not
enclosed in text within quote marks?

Text::CSV_XS

Xho
 
T

Tad McClellan

i.e. that will find and replace commas which are not
enclosed in text within quote marks?


Your Question is Asked Frequently, though it's not as easy to
find as it ought to be:

How can I split a [character] delimited string except when inside
[character]?
 
D

dumbledad

Hi All,

Thanks all for your help, that certainly gives me scripts to try, a
library to pick over, new search terms to try, and I've ordered the
recommended book by Conway.

Xho asked:
I'm using Microsoft's SQL Server 2005. It has a tool for doing this
called SSIS, but because of problems with my installation I wanted to
get back to basics and do it in SQL using BULK INSERT. Unfortunately it
looks like the tantalisingly useful "fields enclosed by" character that
Xho refers to is not configurable:
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/tsqlref/ts_ba-bz_4fec.asp
( http://tinyurl.com/55ysl )

Cheers,

Tim.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,744
Messages
2,569,484
Members
44,903
Latest member
orderPeak8CBDGummies

Latest Threads

Top