Regex to extract CSV file

Vito Corleone · Jul 5, 2004

Hi,

I have a CSV file that looks like this:
1,this is the title,2004/03/05,this is details
2,another title,2004/05/05,another details

And I extract the fields using split, like this:
@row = split(",", $line);

The problem is that when the text contains a comma, the line looks like this instead:
1,"title , with coma",2004/03/05,and the details
2,title without coma,2004/05/09,"but details, has coma"

Is there any efficient way to extract this?
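Here is a minimal sketch of what that split does to the first quoted line. It uses core Perl only, and the variable names are illustrative:

```perl
use strict;
use warnings;

my $line = '1,"title , with coma",2004/03/05,and the details';

# split() treats its first argument as a pattern and knows nothing about
# quotes, so the comma inside "title , with coma" cuts that field in two.
my @row = split(",", $line);

print scalar(@row), " fields\n";    # 5 fields (the record really has 4)
print "[$_]\n" for @row;
```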

Anno Siegel · Jul 5, 2004

Vito Corleone said:
> Hi,
>
> I have a CSV file that looks like this:
> 1,this is the title,2004/03/05,this is details
> 2,another title,2004/05/05,another details
>
> And I extract the fields using split, like this:
> @row = split(",", $line);
>
> The problem is that when the text contains a comma, the line looks like this instead:
> 1,"title , with coma",2004/03/05,and the details
> 2,title without coma,2004/05/09,"but details, has coma"

That is a FAQ: "How can I split a [character]-delimited string except
when inside [character]?"

> Is there any efficient way to extract this?

A search on CPAN for "CSV" would have shown you a handful of modules
for that purpose.
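If installing a module is not an option, the regex approach that FAQ entry covers can be sketched with core Perl alone. The helper below, `split_csv_line`, is a hypothetical name. It is not the code from perlfaq or from any CPAN module. It assumes the usual CSV conventions: fields may be double-quoted, and a literal quote inside a quoted field is doubled (`""`). Treat it as a sketch, not a replacement for a real CSV module:

```perl
use strict;
use warnings;

# Hypothetical helper: split one CSV line into fields using only core Perl.
# Assumptions: comma separators; a field may be wrapped in double quotes;
# a literal double quote inside a quoted field is written twice ("").
# Quoted fields spanning several lines are NOT handled, and input lines
# should be chomp()ed first.
sub split_csv_line {
    my ($line) = @_;
    my @fields;
    # Each match takes one field (quoted or bare) plus the comma or the end
    # of the string after it; \G starts each match where the last one ended.
    while ($line =~ /\G(?:"((?:[^"]|"")*)"|([^",]*))(,|\z)/g) {
        # Copy the captures now: a later successful s/// would reset them.
        my ($quoted, $bare, $sep) = ($1, $2, $3);
        my $field;
        if (defined $quoted) {
            ($field = $quoted) =~ s/""/"/g;   # un-double embedded quotes
        }
        else {
            $field = $bare;
        }
        push @fields, $field;
        last if $sep eq '';                   # matched end of string: done
    }
    # On malformed quoting the loop just stops early, returning fewer fields.
    return @fields;
}

for my $line ('1,"title , with coma",2004/03/05,and the details',
              '2,title without coma,2004/05/09,"but details, has coma"') {
    print join(' | ', split_csv_line($line)), "\n";
}
# 1 | title , with coma | 2004/03/05 | and the details
# 2 | title without coma | 2004/05/09 | but details, has coma
```

The design choice here is to anchor with `\G` and require a comma or end of string after every field. That keeps empty fields such as the middle one in `1,,3` from silently disappearing.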

Anno

Josef Moellers · Jul 5, 2004

Vito said:
> Hi,
>
> I have a CSV file that looks like this:
> 1,this is the title,2004/03/05,this is details
> 2,another title,2004/05/05,another details
>
> And I extract the fields using split, like this:
> @row = split(",", $line);
>
> The problem is that when the text contains a comma, the line looks like this instead:
> 1,"title , with coma",2004/03/05,and the details
> 2,title without coma,2004/05/09,"but details, has coma"
>
> Is there any efficient way to extract this?

Use Text::CSV from CPAN.

#! /usr/bin/perl -w

use Text::CSV;

$line = '1,"title , with coma",2004/03/05,and the details';
$csv = Text::CSV->new();
$status = $csv->parse($line);
@columns = $csv->fields();
print $columns[1], "\n";

exit 0;

prints

title , with coma

Vito Corleone · Jul 5, 2004

> Use Text::CSV from CPAN.

Thank you very much.

John Bokma · Jul 5, 2004

Josef said:
> #! /usr/bin/perl -w

Remove -w and add:

use strict;
use warnings;

Josef Moellers · Jul 5, 2004

John said:
> Remove -w and add:
>
> use strict;
> use warnings;

Yes, and a mailer and a news reader were also missing.

Josef Moellers · Jul 5, 2004

John said:
> Remove -w and add:

From the perl manpage on my system:

"Did we mention that you should definitely consider using the -w switch?"

John Bokma · Jul 5, 2004

Josef said:
> From the perl manpage on my system:
>
> "Did we mention that you should definitely consider using the -w switch?"

"The "warnings" pragma is a replacement for the command line flag "-w", ..."

perldoc warnings
perldoc perllexwarn
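The practical difference behind that quote is scope. `-w` (like `$^W`) turns warnings on for the whole program. `use warnings` is lexical, so a single block can switch off one category without affecting the rest. Here is a small sketch; the sub names `noisy` and `quiet` are made up for illustration:

```perl
use strict;
use warnings;   # lexically scoped: covers this file, but can be
                # switched off again inside a single block

# Collect warnings instead of printing them, so they can be counted.
my @caught;
$SIG{__WARN__} = sub { push @caught, $_[0] };

my $undef;   # deliberately left undefined

sub noisy {
    # "use warnings" from the top of the file applies here, so
    # concatenating an undefined value produces a warning.
    my $s = "value: " . $undef;
    return $s;
}

sub quiet {
    # Switch off just the 'uninitialized' category, only in this block.
    no warnings 'uninitialized';
    my $s = "value: " . $undef;
    return $s;
}

my $first  = noisy();
my $second = quiet();
print scalar(@caught), " warning(s) caught\n";    # 1 warning(s) caught
```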

John Bokma · Jul 5, 2004

Josef said:
> Yes, and a mailer and a news reader were also missing.

Didn't notice that. I did miss use strict, and several my's, however :-D

Peter J. Acklam · Jul 5, 2004

Josef Moellers said:
> From the perl manpage on my system:
>
> "Did we mention that you should definitely consider using the -w
> switch?"

It is, in a way, cleaner to "use warnings", but since it was
introduced fairly recently (in perl 5.6.0), it can't be used in
scripts that must also run under older versions of perl.

I only use "-w" in one-liners, otherwise I use

#!/usr/bin/env perl
...
BEGIN { $^W = 1 } # equivalent to "-w" option

Peter

Joe Smith · Jul 5, 2004

Josef said:
> From the perl manpage on my system:
> "Did we mention that you should definitely consider using the -w switch?"

How old is your version of perl? If your version of perl understands
use warnings;
then that is recommended in place of the -w switch.

Josef Moellers · Jul 5, 2004

Joe said:
> How old is your version of perl? If your version of perl understands
> use warnings;
> then that is recommended in place of the -w switch.

It's perl v5.8.1. Maybe the version of my brain is quite old, so go figure.

Alf Timms · Jul 5, 2004

> Vito Corleone said:
>> Hi,
>>
>> I have a CSV file that looks like this:
>> 1,this is the title,2004/03/05,this is details
>> 2,another title,2004/05/05,another details
>>
>> And I extract the fields using split, like this:
>> @row = split(",", $line);
>>
>> The problem is that when the text contains a comma, the line looks like this instead:
>> 1,"title , with coma",2004/03/05,and the details
>> 2,title without coma,2004/05/09,"but details, has coma"
>
> That is a FAQ: "How can I split a [character]-delimited string except
> when inside [character]?"
>
>> Is there any efficient way to extract this?
>
> A search on CPAN for "CSV" would have shown you a handful of modules
> for that purpose.
>
> Anno

vito,

these things are best done quickly with a regular expression. no need for cpan here:

@row = $line =~ /([^",]+|"[^"]+")/g;

still one problem. double-quoted elements are still double-quoted. easily fixed:

foreach( @row ) { s/^"(.*)"$/$1/ }

alf

Uri Guttman · Jul 5, 2004

AT> these things are best done quickly with a regular expression. no need for cpan here:

AT> @row = $line =~ /([^",]+|"[^"]+")/g;

and what about embedded "'s in a field?

AT> still one problem. double-quoted elements are still double-quoted. easily fixed:

AT> foreach( @row ) { s/^"(.*)"$/$1/ }

more than one problem left.

these things are best done quickly and CORRECTLY by a module.

uri
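To make Uri's point concrete, here is a sketch that wraps Alf's regex and quote-stripping loop in a sub. The name `alf_split` is invented for this example. It feeds the sub an empty field and a field with embedded, doubled quotes:

```perl
use strict;
use warnings;

# Alf's one-liner and clean-up loop, unchanged, inside a hypothetical sub.
sub alf_split {
    my ($line) = @_;
    my @row = $line =~ /([^",]+|"[^"]+")/g;
    foreach (@row) { s/^"(.*)"$/$1/ }
    return @row;
}

# Empty field: the middle field vanishes instead of coming back as "".
my @empty = alf_split('1,,3');
print scalar(@empty), " fields: @empty\n";     # 2 fields: 1 3

# Embedded (doubled) quotes: one field comes back as two pieces.
my @quoted = alf_split('a,"say ""hi""",b');
print scalar(@quoted), " fields\n";            # 4 fields
print join('|', @quoted), "\n";                # a|say |hi|b
```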

Dale Henderson · Jul 7, 2004

VC> Hi,
VC>
VC> I have a CSV file that looks like this:
VC> 1,this is the title,2004/03/05,this is details
VC> 2,another title,2004/05/05,another details

VC> And I extract the fields using split, like this:
VC> @row = split(",", $line);

VC> The problem is that when the text contains a comma, the line looks like this instead:
VC> 1,"title , with coma",2004/03/05,and the details
VC> 2,title without coma,2004/05/09,"but details, has coma"

VC> Is there any efficient way to extract this?

This is ironic. Just the other night I was flipping through
"Mastering Regular Expressions" 1st ed. and found this example:

@fields = ();
while ($text =~ m/"([^"\\]*(\\.[^"\\]*)*)",?|([^,]+),?|,/g) {
    push(@fields, defined($1) ? $1 : $3);
}
push (@fields,undef) if $text =~ m/,$/;

Of course the text doesn't say anything about its efficiency. The
most efficient solution is probably to use Text::CSV as others
have pointed out.
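The loop above can be wrapped in a sub and tried on the thread's sample data. In this minimal sketch, the sub name `mre_split` is made up. Note that this pattern accepts backslash-escaped quotes (`\"`) inside quoted fields but leaves the backslashes in the result. It does not understand the `""` doubling that many CSV producers use:

```perl
use strict;
use warnings;

# The loop from the post, unchanged, inside a hypothetical sub.
# Empty fields come back as undef.
sub mre_split {
    my ($text) = @_;
    my @fields = ();
    while ($text =~ m/"([^"\\]*(\\.[^"\\]*)*)",?|([^,]+),?|,/g) {
        # $1 is set for a quoted field, $3 for a bare one; a lone comma
        # (an empty field) sets neither, so undef is pushed.
        push(@fields, defined($1) ? $1 : $3);
    }
    # A trailing comma means one more (empty) field at the end.
    push(@fields, undef) if $text =~ m/,$/;
    return @fields;
}

my @f = mre_split('2,title without coma,2004/05/09,"but details, has coma"');
print join(' | ', @f), "\n";
# 2 | title without coma | 2004/05/09 | but details, has coma
```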

If you're interested in a super overkill solution, you can use
the DBI module with the csv DBD (can't remember what it's called)
and access your csv file like a database.

Eric Bohlman · Jul 8, 2004

> If you're interested in a super overkill solution, you can use
> the DBI module with the csv DBD (can't remember what it's called)

Would you believe DBD::CSV?

Clyde Ingram · Jul 8, 2004

Vito,

VC> The problem is that when the text contains a comma, the line looks like this instead:
VC> 1,"title , with coma",2004/03/05,and the details
VC> 2,title without coma,2004/05/09,"but details, has coma"

VC> Is there any efficient way to extract this?

perldoc Text::ParseWords

Perhaps:
use Text::ParseWords;
@words = quotewords( ',', 0, $line );
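Here is a minimal sketch of that suggestion using the core module Text::ParseWords. Keep in mind that the module was written for shell-style words. It also treats single quotes (apostrophes) as quote characters and backslashes as escapes, and it does not read a doubled `""` as an embedded quote. So it only suits CSV data that avoids those characters:

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line quotewords);

my $line = '2,title without coma,2004/05/09,"but details, has coma"';

# parse_line(DELIM, KEEP, LINE): DELIM is a regex; KEEP = 0 strips the
# surrounding quotes (and backslash escapes) from each field.
my @fields = parse_line(',', 0, $line);
print join(' | ', @fields), "\n";
# 2 | title without coma | 2004/05/09 | but details, has coma

# quotewords() runs parse_line over a list of lines and returns one flat
# list, so for a single line it gives the same fields.
my @words = quotewords(',', 0, $line);
```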

Regards,
Clyde

Clyde Ingram · Jul 8, 2004

Eric,

Eric Bohlman said:
> Would you believe DBD::CSV?

Interesting ... Isn't DBD::CSV trustworthy?

It may be a bit heavy-handed, of course.
But for fun, I once knocked up this (on Windoze XPee):

#!e:/bin/perl.exe -w

use strict;
use Data::Dumper;
local $Data::Dumper::Terse = 0;
local $Data::Dumper::Indent = 1;

use DBI;

my $CSV_DIR="D:/Clyde/perldev/Trial";

my $dbh = DBI->connect("DBI:CSV:f_dir=$CSV_DIR")
    or die "Cannot connect: " . $DBI::errstr;
$dbh->{'csv_tables'}->{'fractions'} = { 'file' => 'fractions.csv'};

$dbh->{'RaiseError'} = 1;
$@ = '';
eval {
    my $sth = $dbh->prepare("SELECT * FROM fractions")
        or die "Cannot prepare: " . $dbh->errstr();
    $sth->execute() or die "Cannot execute: " . $sth->errstr();

    while (my $row = $sth->fetchrow_hashref) {
        print("Found result row:\n" . Data::Dumper->Dump( [$row] ) . "\n");
    }
    $sth->finish();
    $dbh->disconnect();

};
if ($@) { die "SQL database error: $@"; }

The data file "fractions.csv" starts:

Numerator,Denominator,Decimal,Percentage,Total
5,6,0.833333333,83.33333333,84.16666667
8,3,2.666666667,266.6666667,269.3333333

And the output starts:
Found result row:
$VAR1 = {
  'Total' => '84.16666667',
  'Numerator' => '5',
  'Denominator' => '6',
  'Percentage' => '83.33333333',
  'Decimal' => '0.833333333'
};

Found result row:
$VAR1 = {
  'Total' => '269.3333333',
  'Numerator' => '8',
  'Denominator' => '3',
  'Percentage' => '266.6666667',
  'Decimal' => '2.666666667'
};

(I don't recall having to register the database with XP)

Regards,
Clyde

Similar Threads

How to put loop result in csv file	1	Jan 3, 2023
How to convert CSV to parquet file without RLE_DICTIONARY encoding?	0	Sep 2, 2022
Php combine identical lines in text file	4	Oct 11, 2023
KML to CSV file conversion using Python and Windows Powershell	0	Oct 14, 2022
How to sort a CSV file with merge sort JAVA	7	May 6, 2021
Errors When Pulling Information from CSV File to Python	0	Dec 10, 2020
vi regex to preserve interior commas in CSV string	15	Dec 13, 2011
Working on mobile css menu with plenty of frustration!	2	Dec 29, 2022
