Mike G.
I am trying to split a string using "|" as the record separator, but I
don't want to split on separators that are enclosed in quotes.
I found this article in the Perl FAQ:
How can I split a [character] delimited string except when inside
[character]? (Comma-separated files)
http://www.perldoc.com/perl5.8.0/po...n-inside [character]--(Comma-separated-files)
*****************************
@new = ();
push(@new, $+) while $line =~ m{
"([^\"\\]*(?:\\.[^\"\\]*)*)",? # groups the phrase inside the quotes
| ([^,]+),?
| ,
}gx;
push(@new, undef) if substr($line,-1,1) eq ',';
*****************************
I have two questions about this. I am trying to convert this regular
expression so that it uses "|" (pipe) as the record separator, and I also
want the code to recognize empty fields.
I'm not sure where to substitute pipes ("|") for the commas (","). I tried
this bit of code, which seems to work, except that it does not recognize
empty fields; it just skips over them. (I'm using $recSep to hold the record
separator.)
*****************************
$line = "F1|\"Hello|This is|Field2\"||Field4";
$recSep = "|";
@new = ();
push(@new, $+) while $line =~ m{
"([^\"\\]*(?:\\.[^\"\\]*)*)",? # groups the phrase inside the quotes
| ([^$recSep]+),?
| ,
}gx;
push(@new, undef) if substr($line,-1,1) eq $recSep;
foreach $var (@new) { print "$var\n"; }
*****************************
If you run this code, you can see that the empty field, which I would
consider the third field, does not get captured.
But if I change $line and $recSep to:
$line = 'F1,"Hello,This is,Field2",,Field4';
$recSep = ",";
It does recognize the empty field. So when it prints out, you can see an
empty string where the third field is.
I'm not too familiar with complicated regular expressions, and the FAQ that
I got the code from does not explain what is going on.
Could someone help me?
Thanks,
-Mike
(I hope I explained this correctly)
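
For reference, here is a minimal sketch of what a pipe-aware version of the FAQ code might look like. It assumes the separator is a single character and that the empty fields are being skipped because the pattern above still contains three literal commas, so a "|" in the data is never matched as a separator. The names split_quoted and $sep_re are hypothetical and are not from the FAQ. The separator goes through quotemeta because "|" on its own means alternation in a regular expression.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the FAQ): split $line on a single-character
# separator $sep, ignoring separators that appear inside double quotes.
sub split_quoted {
    my ($line, $sep) = @_;
    my $sep_re = quotemeta $sep;   # escape "|" so it is matched literally
    my @fields;
    push(@fields, $+) while $line =~ m{
        "([^\"\\]*(?:\\.[^\"\\]*)*)"$sep_re?   # quoted field, then an optional separator
      | ([^$sep_re]+)$sep_re?                  # unquoted, non-empty field
      | $sep_re                                # bare separator, meaning an empty field
    }gx;
    # A trailing separator means the last field is empty as well.
    push(@fields, undef) if substr($line, -1, 1) eq $sep;
    return @fields;
}

my $line = "F1|\"Hello|This is|Field2\"||Field4";
my @new  = split_quoted($line, "|");

# Empty fields come back as undef, so print them as empty strings.
print defined $_ ? $_ : '', "\n" for @new;
```

On the sample line this should give four fields, with the third one undef (printed as an empty line). For anything beyond simple input, a CSV module from CPAN that accepts a custom separator is probably a safer choice than a hand-rolled pattern.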