Regex to extract CSV file

Discussion in 'Perl Misc' started by Vito Corleone, Jul 5, 2004.

  1. Hi,

    I have CSV file that looks like this:
    1,this is the title,2004/03/05,this is details
    2,another title,2004/05/05,another details

    And I extract it using split like this:
    @row = split(",", $line);

    The problem is, if there is coma in text, it will turn to this:
    1,"title , with coma",2004/03/05,and the details
    2,title without coma,2004/05/09,"but details, has coma"

    Is there any efficient way to extract this?
    Vito Corleone, Jul 5, 2004
    #1
    1. Advertising

  2. Vito Corleone

    Anno Siegel Guest

    Vito Corleone <> wrote in comp.lang.perl.misc:
    > Hi,
    >
    > I have CSV file that looks like this:
    > 1,this is the title,2004/03/05,this is details
    > 2,another title,2004/05/05,another details
    >
    > And I extract it using split like this:
    > @row = split(",", $line);
    >
    > The problem is, if there is coma in text, it will turn to this:
    > 1,"title , with coma",2004/03/05,and the details
    > 2,title without coma,2004/05/09,"but details, has coma"


    That is a FAQ: "How can I split a [character] delimited string except
    when inside [character]?"

    > Is there any efficient way to extract this?


    A search on CPAN for "CSV" would have shown you a handful of modules
    for that purpose.

    Anno
    Anno Siegel, Jul 5, 2004
    #2
    1. Advertising

  3. Vito Corleone wrote:
    > Hi,
    >
    > I have CSV file that looks like this:
    > 1,this is the title,2004/03/05,this is details
    > 2,another title,2004/05/05,another details
    >
    > And I extract it using split like this:
    > @row = split(",", $line);
    >
    > The problem is, if there is coma in text, it will turn to this:
    > 1,"title , with coma",2004/03/05,and the details
    > 2,title without coma,2004/05/09,"but details, has coma"
    >
    > Is there any efficient way to extract this?


    Use Text::CSV from CPAN.

    #! /usr/bin/perl -w

    use Text::CSV;

    $line = '1,"title , with coma",2004/03/05,and the details';
    $csv = Text::CSV->new();
    $status = $csv->parse($line);
    @columns = $csv->fields();
    print $columns[1], "\n";

    exit 0;

    prints

    title , with coma

    --
    Josef Möllers (Pinguinpfleger bei FSC)
    If failure had no penalty success would not be a prize
    -- T. Pratchett
    Josef Moellers, Jul 5, 2004
    #3
  4. On Mon, 05 Jul 2004 09:05:21 +0200
    Josef Moellers <> wrote:

    > Use Text::CSV from CPAN.


    Thank you very much :)
    Vito Corleone, Jul 5, 2004
    #4
  5. Vito Corleone

    John Bokma Guest

    John Bokma, Jul 5, 2004
    #5
  6. John Bokma wrote:
    > Josef Moellers wrote:
    >
    >> #! /usr/bin/perl -w

    >
    >
    > remove -w and:
    >
    > use strict;
    > use warnings;


    Yes, and a mailer and a news reader were also missing.

    --
    Josef Möllers (Pinguinpfleger bei FSC)
    If failure had no penalty success would not be a prize
    -- T. Pratchett
    Josef Moellers, Jul 5, 2004
    #6
  7. John Bokma wrote:
    > Josef Moellers wrote:
    >
    >> #! /usr/bin/perl -w

    >
    >
    > remove -w and:


    From the perl manpage on my system:

    "Did we mention that you should definitely consider using the -w switch?"

    --
    Josef Möllers (Pinguinpfleger bei FSC)
    If failure had no penalty success would not be a prize
    -- T. Pratchett
    Josef Moellers, Jul 5, 2004
    #7
  8. Vito Corleone

    John Bokma Guest

    Josef Moellers wrote:

    > John Bokma wrote:
    >
    >> Josef Moellers wrote:
    >>
    >>> #! /usr/bin/perl -w

    >>
    >> remove -w and:

    >
    > From the perl manpage on my system:
    >
    > "Did we mention that you should definitely consider using the -w switch?"


    " The "warnings" pragma is a replacement for the command line flag "-w",

    perldoc warnings
    perldoc perllexwarn

    --
    John MexIT: http://johnbokma.com/mexit/
    personal page: http://johnbokma.com/
    Experienced Perl programmer available: http://castleamber.com/
    Happy Customers: http://castleamber.com/testimonials.html
    John Bokma, Jul 5, 2004
    #8
  9. Vito Corleone

    John Bokma Guest

    John Bokma, Jul 5, 2004
    #9
  10. Josef Moellers <> wrote:

    > John Bokma wrote:
    >
    > > Josef Moellers wrote:
    > >
    > > > #! /usr/bin/perl -w

    > >
    > > remove -w and:

    >
    > From the perl manpage on my system:
    >
    > "Did we mention that you should definitely consider using the -w
    > switch?"


    It is, in a way, cleaner to "use warnings", but since it was
    introduced very recently it can't be used in scripts that are to
    be processed by older versions of perl.

    I only use "-w" in one-liners, otherwise I use

    #!/usr/bin/env perl
    ...
    BEGIN { $^W = 1 } # equivalent to "-w" option

    Peter

    --
    #!/local/bin/perl5 -wp -*- mode: cperl; coding: iso-8859-1; -*-
    # matlab comment stripper (strips comments from Matlab m-files)
    s/^((?:(?:[])}\w.]'+|[^'%])+|'[^'\n]*(?:''[^'\n]*)*')*).*/$1/x;
    Peter J. Acklam, Jul 5, 2004
    #10
  11. Vito Corleone

    Joe Smith Guest

    Josef Moellers wrote:

    > John Bokma wrote:
    >
    >> Josef Moellers wrote:
    >>
    >>> #! /usr/bin/perl -w

    >>
    >> remove -w and:
    >> use warnings;

    >
    > From the perl manpage on my system:
    > "Did we mention that you should definitely consider using the -w switch?"


    How old is your version of perl? If your version of perl understands
    use warnings;
    then that is recommended in place of the -w switch.
    Joe Smith, Jul 5, 2004
    #11
  12. Joe Smith wrote:
    > Josef Moellers wrote:
    >
    >> John Bokma wrote:
    >>
    >>> Josef Moellers wrote:
    >>>
    >>>> #! /usr/bin/perl -w
    >>>
    >>>
    >>> remove -w and:
    >>> use warnings;

    >>
    >>
    >> From the perl manpage on my system:
    >> "Did we mention that you should definitely consider using the -w switch?"

    >
    >
    > How old is your version of perl? If your version of perl understands
    > use warnings;
    > then that is recommended in place of the -w switch.


    It's perl v5.8.1. Maybe the version of my brain is quite old, so go figure.

    --
    Josef Möllers (Pinguinpfleger bei FSC)
    If failure had no penalty success would not be a prize
    -- T. Pratchett
    Josef Moellers, Jul 5, 2004
    #12
  13. Vito Corleone

    Alf Timms Guest

    -berlin.de (Anno Siegel) wrote in message news:<ccau8v$3q8$-Berlin.DE>...
    > Vito Corleone <> wrote in comp.lang.perl.misc:
    > > Hi,
    > >
    > > I have CSV file that looks like this:
    > > 1,this is the title,2004/03/05,this is details
    > > 2,another title,2004/05/05,another details
    > >
    > > And I extract it using split like this:
    > > @row = split(",", $line);
    > >
    > > The problem is, if there is coma in text, it will turn to this:
    > > 1,"title , with coma",2004/03/05,and the details
    > > 2,title without coma,2004/05/09,"but details, has coma"

    >
    > That is a FAQ: "How can I split a [character] delimited string except
    > when inside [character]?"
    >
    > > Is there any efficient way to extract this?

    >
    > A search on CPAN for "CSV" would have shown you a handful of modules
    > for that purpose.
    >
    > Anno


    vito,

    these things are best done quickly with a regular expression. no need for cpan here:

    @row = $line =~ /([^",]+|"[^"]+")/g;

    still one problem. double-quoted elements are still double-quoted. easily fixed:

    foreach( @row ) { s/^"(.*)"$/$1/ }

    alf
    Alf Timms, Jul 5, 2004
    #13
  14. Vito Corleone

    Uri Guttman Guest

    >>>>> "AT" == Alf Timms <> writes:

    AT> these things are best done quickly with a regular expression. no need for cpan here:

    AT> @row = $line =~ /([^",]+|"[^"]+")/g;

    and what about embedded "'s in a field?

    AT> still one problem. double-quoted elements are still double-quoted. easily fixed:

    AT> foreach( @row ) { s/^"(.*)"$/$1/ }

    more than one problem left.

    these things are best done quickly and CORRECTLY by a module.

    uri

    --
    Uri Guttman ------ -------- http://www.stemsystems.com
    --Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
    Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org
    Uri Guttman, Jul 5, 2004
    #14
  15. >>>>> "VC" == Vito Corleone <> writes:

    VC> Hi, I have CSV file that looks like this: 1,this is the
    VC> title,2004/03/05,this is details 2,another
    VC> title,2004/05/05,another details

    VC> And I extract it using split like this: @row = split(",",
    VC> $line);

    VC> The problem is, if there is coma in text, it will turn to
    VC> this: 1,"title , with coma",2004/03/05,and the details 2,title
    VC> without coma,2004/05/09,"but details, has coma"

    VC> Is there any efficient way to extract this?


    This is ironic. Just the other night I was flipping through
    "Mastering Regular Expressions" 1st ed and found this example

    @fields=();
    while ($text =~ m/"([^"\\]*(\\.[^"\\]*)*)",?|([^,]+),?|,/g){
    push (@fields, defined($1)?$1:$3);
    }
    push (@fields,undef) if $text =~ m/,$/;


    Of course the text doesn't say anything about its efficiency. The
    most efficient solution is probably to use Text::CSV as others
    have pointed out.

    If you're interested in a super overkill solution, you can use
    the DBI module with the csv DBD (can't remember what its called)
    and access your csv file like a database. :)


    --
    Dale Henderson

    "Imaginary universes are so much more beautiful than this stupidly-
    constructed 'real' one..." -- G. H. Hardy
    Dale Henderson, Jul 7, 2004
    #15
  16. Vito Corleone

    Eric Bohlman Guest

    Dale Henderson <> wrote in news:-
    commerce.edu:

    > If you're interested in a super overkill solution, you can use
    > the DBI module with the csv DBD (can't remember what its called)


    Would you believe DBD::CSV?

    > and access your csv file like a database. :)
    Eric Bohlman, Jul 8, 2004
    #16
  17. Vito Corleone

    Clyde Ingram Guest

    Vito,

    > VC> The problem is, if there is coma in text, it will turn to
    > VC> this: 1,"title , with coma",2004/03/05,and the details 2,title
    > VC> without coma,2004/05/09,"but details, has coma"
    >
    > VC> Is there any efficient way to extract this?


    perldoc Text::parseWords

    Perhaps:
    @words = quotewords( ',', 0, $line );

    Regards,
    Clyde
    Clyde Ingram, Jul 8, 2004
    #17
  18. Vito Corleone

    Clyde Ingram Guest

    Eric,

    "Eric Bohlman" <> wrote in message
    news:Xns951FD4A541E8Cebohlmanomsdevcom@130.133.1.4...
    > Dale Henderson <> wrote in

    news:-
    > commerce.edu:
    >
    > > If you're interested in a super overkill solution, you can use
    > > the DBI module with the csv DBD (can't remember what its called)

    >
    > Would you believe DBD::CSV?


    Interesting ... Isn't DBD::CSV trustworthy?

    It may be a bit heavy handed, of course.
    But for fun, I once knocked up this (on Windoze XPee):

    #!e:/bin/perl.exe -w

    use strict;
    use Data::Dumper;
    local $Data::Dumper::Terse = 0;
    local $Data::Dumper::Indent = 1;

    use DBI;

    my $CSV_DIR="D:/Clyde/perldev/Trial";

    my $dbh = DBI->connect("DBI:CSV:f_dir=$CSV_DIR")
    or die "Cannot connect: " . $DBI::errstr;
    $dbh->{'csv_tables'}->{'fractions'} = { 'file' => 'fractions.csv'};

    $dbh->{'RaiseError'} = 1;
    $@ = '';
    eval {
    my $sth = $dbh->prepare("SELECT * FROM fractions")
    or die "Cannot prepare: " . $dbh->errstr();
    $sth->execute() or die "Cannot execute: " . $sth->errstr();

    while (my $row = $sth->fetchrow_hashref) {
    print("Found result row:\n" . Data::Dumper->Dump( [$row] ) . "\n");
    }
    $sth->finish();
    $dbh->disconnect();

    };
    if ($@) { die "SQL database error: $@"; }

    The data file "fractions.csv" starts:

    Numerator,Denominator,Decimal,Percentage,Total
    5,6,0.833333333,83.33333333,84.16666667
    8,3,2.666666667,266.6666667,269.3333333


    And the output starts:
    Found result row:
    $VAR1 = {
    'Total' => '84.16666667',
    'Numerator' => '5',
    'Denominator' => '6',
    'Percentage' => '83.33333333',
    'Decimal' => '0.833333333'
    };

    Found result row:
    $VAR1 = {
    'Total' => '269.3333333',
    'Numerator' => '8',
    'Denominator' => '3',
    'Percentage' => '266.6666667',
    'Decimal' => '2.666666667'
    };

    (I don't recall having to register the database with XP)

    Regards,
    Clyde
    Clyde Ingram, Jul 8, 2004
    #18
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. MUK
    Replies:
    2
    Views:
    1,050
    Roedy Green
    Oct 22, 2005
  2. Tintin92
    Replies:
    1
    Views:
    1,672
    Andrew Thompson
    Feb 14, 2007
  3. jliu66
    Replies:
    0
    Views:
    492
    jliu66
    Oct 19, 2007
  4. Replies:
    3
    Views:
    728
    Reedick, Andrew
    Jul 1, 2008
  5. Replies:
    6
    Views:
    161
    Luca Cerone
    Sep 28, 2013
Loading...

Share This Page