Michael O'Connor
I have a string containing pipe-delimited fields, but the string can
contain binary data which may itself include pipe characters - in
which case the pipe is escaped with a backslash. Backslash characters
are also escaped in the string (with another backslash) to distinguish
them from other escaped characters.
I'm trying to split this string into its constituent fields. Based on
the above, the delimiter is defined as a pipe character that is not
preceded by an odd number of backslash characters (i.e. it's
preceded by zero or an even number of backslash characters).
As Perl doesn't support variable-length look-behind, I'm reversing the
string and using look-ahead instead. The nearest I've got this to
working is:
my $escape = '\\';
my $separator = '|';
my @fields = split(/\Q$separator\E(?=(\Q$escape$escape\E)*[^\Q$escape\E])/,
reverse $value);
At this point I'm expecting @fields to be a reversed list of
individually reversed fields, but I'm finding it contains extra fields
(either empty or containing 2 escape characters).
E.g. when trying to split string 'A|B\|C|D\\|E' (reverse =
'E|\\D|C|\B|A'), I'm hoping to get @fields = ('E', '\\D', 'C|\B',
'A'), but instead I get ('E', '\\', '\\D', '', 'C|\B', '', 'A').
Can anyone tell me how I avoid getting these extra fields in the
result?
Thanks,
Michael
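
A possible explanation, offered as a sketch rather than a confirmed fix: when the pattern given to split contains capturing parentheses, Perl returns the text captured by each group as an additional field (undef, which prints as an empty string, when the group did not take part in the match). The group around the doubled escape in the look-ahead is a capturing group, which would account for both the '\\' and the '' entries. The version below switches to a non-capturing group (?:...). The helper name split_reversed is hypothetical, and the negative look-ahead (?!...) is an assumed substitute for the original character class so that a separator at the very end of the reversed string is also accepted.

```perl
use strict;
use warnings;

my $escape    = '\\';
my $separator = '|';

# Hypothetical helper: split the reversed string on unescaped separators.
sub split_reversed {
    my ($value) = @_;
    # (?:...) groups without capturing, so split adds no extra fields.
    # (?!...) rejects a separator followed by an odd number of escapes,
    # and also accepts a separator that ends the reversed string.
    return split(
        /\Q$separator\E(?=(?:\Q$escape$escape\E)*(?!\Q$escape\E))/,
        scalar reverse($value)
    );
}

my @fields = split_reversed('A|B\\|C|D\\\\|E');   # the string A|B\|C|D\\|E
print join(', ', map { "'$_'" } @fields), "\n";
# prints: 'E', '\\D', 'C|\B', 'A'
```

To get the fields back in their original order and orientation, the result can be passed through reverse map { scalar reverse $_ } @fields, which for the example string should give ('A', 'B\|C', 'D\\', 'E').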