How to delimit string except when inside quotes?


Mike G.

I am trying to split a string using "|" as the record separator, but I
don't want to split on any separators that are enclosed in quotes.

I found this article in the Perl FAQ:
How can I split a [character] delimited string except when inside
[character]? (Comma-separated files)

http://www.perldoc.com/perl5.8.0/po...n-inside [character]--(Comma-separated-files)

*****************************
@new = ();
push(@new, $+) while $line =~ m{
"([^\"\\]*(?:\\.[^\"\\]*)*)",? # groups the phrase inside the quotes
| ([^,]+),?
| ,
}gx;
push(@new, undef) if substr($line,-1,1) eq ',';
*****************************

I have two questions about this. I am trying to convert this regular
expression so that it uses "|" (pipes) as the record separator, and I also
want the code to recognize empty fields.

I'm not sure where to substitute pipes ("|") for the commas (","). I tried
this bit of code, which seems to work, except that it does not recognize
empty fields; it just skips over them. (I'm using $recSep to hold the record
separator.)

*****************************
$line = "F1|\"Hello|This is|Field2\"||Field4";
$recSep = "|";
@new = ();
push(@new, $+) while $line =~ m{
"([^\"\\]*(?:\\.[^\"\\]*)*)",? # groups the phrase inside the quotes
| ([^$recSep]+),?
| ,
}gx;
push(@new, undef) if substr($line,-1,1) eq $recSep;
foreach $var (@new) { print "$var\n"; }
*****************************

If you run this code, you can see that the empty field, which I would
consider the third field, does not get captured.

But if I change the $recSep and $line to:

$line = 'F1,"Hello,This is,Field2",,Field4';
$recSep = ",";

It does recognize the empty field. So when it prints out, you can see an
empty string where the third field is.

I'm not too familiar with complicated regular expressions, and the FAQ that
I got the code from does not explain what is going on.

Could someone help me?

Thanks,

-Mike
(I hope I explained this correctly)
 

Tad McClellan

Mike G. said:
I found this article in the Perl FAQ:
How can I split a [character] delimited string except when inside
[character]? (Comma-separated files)


Did you read it all the way until the end?

I am trying to convert this regular
expression


Why not use one of the other methods mentioned in the FAQ answer?

Could someone help me.


--------------------------------
#!/usr/bin/perl
use strict;
use warnings;
use Text::ParseWords;

my $line = "F1|\"Hello|This is|Field2\"||Field4";
my $recSep = '\|';

my @new = quotewords($recSep, 0, $line);

foreach my $var (@new) { print "$var\n"; }
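
For comparison, here is a sketch of how the FAQ regex itself might be adapted to pipes. In the attempt from the original post, the `,?` and the lone `,` alternative still match commas, so the pipes are never consumed and the empty field is skipped over. The version below is my assumption about what was intended, not the FAQ's own code: it replaces those commas with `\|`, and instead of relying on `$+` it picks the capture group explicitly so that an empty field becomes an empty string. The name @fields is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $line = "F1|\"Hello|This is|Field2\"||Field4";

my @fields;    # hypothetical name, not from the FAQ
while ($line =~ m{
        "([^\"\\]*(?:\\.[^\"\\]*)*)"\|?   # quoted field, optional pipe after it
      | ([^|]+)\|?                        # unquoted, non-empty field
      | \|                                # a lone pipe marks an empty field
    }gx) {
    # Exactly one alternative matched; the groups it did not use are undef.
    push @fields, defined $1 ? $1 : defined $2 ? $2 : '';
}
# As in the FAQ, a trailing separator means one more field;
# here it is stored as an empty string rather than undef.
push @fields, '' if substr($line, -1, 1) eq '|';

foreach my $var (@fields) { print "$var\n"; }
```

Unlike quotewords with a false $keep argument, this sketch leaves any backslash escapes inside a quoted field untouched.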
 
