Changing seperator in large CSV files?

Discussion in 'Perl Misc' started by dumbledad@gmail.com, Jan 18, 2006.

  1. Guest

    Hi All,

    I have a series of large CSV files (the largest is several GB) that I
    need to load into SQL. To perform the load I'm hoping to use the SQL
    command BULK INSERT. But I have a problem, I have lines in the CSV file
    like the second one here:

    12,some text,34
    56,"some text, with a comma in",78

    The BULK INSERT command fails to understand the significance of the
    quotation marks and treats the second comma in the second line as a
    delimiter.

    Thus I would like to replace each occurrence of a comma in the files,
    when and only when used as a separator, with a different separator ("/"
    for example). Does anyone have a Perl script they can share with me
    that will do this, i.e. that will find and replace commas which are not
    enclosed in text within quote marks?

    Cheers,

    Tim.
    , Jan 18, 2006
    #1
    1. Advertising

  2. Guest

    wrote:
    > Thus I would like to replace each occurrence of a comma in the files...


    You might benefit from a recent thread in this group:

    http://tinyurl.com/9ut7w

    The thread discussed an aspect of Damian Conway's (outstanding) book,
    "Perl Best Practices," but it just happens to also address your
    question.
    , Jan 18, 2006
    #2
    1. Advertising

  3. Dr.Ruud Guest

    schreef:

    > The BULK INSERT command fails to understand the significance of the
    > quotation marks and treats the second comma in the second line as a
    > delimiter.


    Which SQL-environment?

    http://dev.mysql.com/doc/refman/5.0/en/load-data.html

    I prefer Tab as the field delimiter.

    --
    Affijn, Ruud

    "Gewoon is een tijger."
    Dr.Ruud, Jan 18, 2006
    #3
  4. Guest

    "" <> wrote:
    > Hi All,
    >
    > I have a series of large CSV files (the largest is several GB) that I
    > need to load into SQL.


    SQL is a language. You cannot insert data into a language. What is the
    SQL-processing database server you are trying to use?


    > To perform the load I'm hoping to use the SQL
    > command BULK INSERT. But I have a problem, I have lines in the CSV file
    > like the second one here:
    >
    > 12,some text,34
    > 56,"some text, with a comma in",78
    >
    > The BULK INSERT command fails to understand the significance of the
    > quotation marks and treats the second comma in the second line as a
    > delimiter.


    I don't know "BULK INSERT", but I do know Oracles sqlldr and MySQL's
    "LOAD DATA" or mysqlimport. Both of them allow you specify a "Fields
    enclosed by" character, such they can take your large data files just as
    they are. I would be very surprised if "BULK INSERT" is not similarly
    configurable.

    > Thus I would like to replace each occurrence of a comma in the files,
    > when and only when used as a separator, with a different separator ("/"
    > for example). Does anyone have a Perl script they can share with me
    > that will do this, i.e. that will find and replace commas which are not
    > enclosed in text within quote marks?


    Text::CSV_XS

    Xho

    --
    -------------------- http://NewsReader.Com/ --------------------
    Usenet Newsgroup Service $9.95/Month 30GB
    , Jan 18, 2006
    #4
  5. <> wrote:


    > i.e. that will find and replace commas which are not
    > enclosed in text within quote marks?



    Your Question is Asked Frequently, though it's not as easy to
    find as it ought to be:

    How can I split a [character] delimited string except when inside
    [character]?


    --
    Tad McClellan SGML consulting
    Perl programming
    Fort Worth, Texas
    Tad McClellan, Jan 18, 2006
    #5
  6. Guest

    Hi All,

    Thanks all for your help, that certainly gives me scripts to try, a
    library to pick over, new search terms to try, and I've ordered the
    recommended book by Conway.

    Xho asked:
    >>> I don't know "BULK INSERT", but I do know Oracles sqlldr and MySQL's "LOAD DATA" or mysqlimport. Both of them allow you specify a "Fields enclosed by" character, such they can take your large data files just as they are. I would be very surprised if "BULK INSERT" is not similarly configurable <<<


    I'm using Microsoft's SQL Server 2005. It has a tool for doing this
    called SSIS, but because of problems with my installation I wanted to
    get back to basics and do it in SQL using BULK INSERT. Unfortunately it
    looks like the tantalisingly useful "fields enclosed by" character that
    Xho refers to is not configurable:
    http://msdn.microsoft.com/library/default.asp?url=/library/en-us/tsqlref/ts_ba-bz_4fec.asp
    ( http://tinyurl.com/55ysl )

    Cheers,

    Tim.
    , Jan 18, 2006
    #6
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. xmldig
    Replies:
    0
    Views:
    522
    xmldig
    Nov 30, 2005
  2. Replies:
    35
    Views:
    1,293
    Steve Holden
    Jan 4, 2005
  3. Replies:
    3
    Views:
    3,935
    barryman9000
    Jul 16, 2008
  4. Thousand Seperator

    , Mar 14, 2008, in forum: Python
    Replies:
    5
    Views:
    344
    Jeroen Ruigrok van der Werven
    Mar 14, 2008
  5. greymaus

    Record seperator

    greymaus, Aug 26, 2011, in forum: Python
    Replies:
    10
    Views:
    371
    greymaus
    Aug 28, 2011
Loading...

Share This Page