split problem

Michael O'Connor

I have a string containing pipe-delimited fields, but the string can
contain binary data which may itself include pipe characters - in
which case the pipe is escaped with a backslash. Backslash characters
are also escaped in the string (with another backslash) to distinguish
them from other escaped characters.
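
For concreteness, here is a minimal sketch of the encoding side of the scheme just described. It assumes fields are ordinary Perl strings and that `\` and `|` are the only characters that get escaped. `escape_fields` is a hypothetical helper, not something from this thread.

```perl
use strict;
use warnings;

# Hypothetical encoder for the scheme described above: each literal
# backslash becomes \\, each literal pipe becomes \|, and the escaped
# fields are joined with bare pipes.
sub escape_fields {
    my @fields = @_;    # copy, so the caller's strings are not modified
    # One pass over both characters, so the backslash added in front of
    # a pipe is never itself escaped a second time.
    s/([\\|])/\\$1/g for @fields;
    return join '|', @fields;
}

# 'D\\' is Perl source for the two-character string D\
my $encoded = escape_fields('A', 'B|C', 'D\\', 'E');
print "$encoded\n";    # prints A|B\|C|D\\|E
```

Escaping both characters in a single substitution matters. Escaping pipes first and backslashes second would double the backslashes that were just added in front of the pipes.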

I'm trying to split this string into its constituent fields. Based on
the above, the delimiter is defined as a pipe character that is not
preceded by an odd number of backslash characters (i.e. it's
preceded by zero or an even number of backslash characters).
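
The rule above can be read almost literally as a character-by-character scan that tracks the length of the current run of backslashes. This is a sketch that avoids regexes and reversal altogether. `split_escaped` is a hypothetical name, and the sketch leaves escapes in the fields (so `B\|C` stays `B\|C`).

```perl
use strict;
use warnings;

# Hypothetical splitter that applies the rule literally: a pipe ends a
# field only if the run of backslashes right before it has even length.
# Escapes are left in place; unescaping is a separate step.
sub split_escaped {
    my ($value) = @_;
    my @fields = ('');
    my $run = 0;    # length of the backslash run ending at the previous char
    for my $ch (split //, $value) {
        if ($ch eq '|' && $run % 2 == 0) {
            push @fields, '';          # unescaped pipe: start a new field
        } else {
            $fields[-1] .= $ch;        # ordinary or escaped character
        }
        $run = $ch eq '\\' ? $run + 1 : 0;
    }
    return @fields;
}

# Perl source for the string A|B\|C|D\\|E used in the example below
my @fields = split_escaped('A|B\\|C|D\\\\|E');
print join(', ', map { "'$_'" } @fields), "\n";   # prints 'A', 'B\|C', 'D\\', 'E'
```

Unlike split(), this keeps trailing empty fields, so `A|` yields `('A', '')`.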

As Perl doesn't support variable-length look-behind, I'm reversing the
string and using look-ahead instead. The closest I've come to getting
this working is:

my $escape = '\\';
my $separator = '|';

my @fields = split(/\Q$separator\E(?=(\Q$escape$escape\E)*[^\Q$escape\E])/,
reverse $value);

At this point I'm expecting @fields to be a reversed list of
individually reversed fields, but I'm finding it contains extra fields
(either empty or containing 2 escape characters).

E.g. when trying to split string 'A|B\|C|D\\|E' (reverse =
'E|\\D|C|\B|A'), I'm hoping to get @fields = ('E', '\\D', 'C|\B',
'A'), but instead I get ('E', '\\', '\\D', '', 'C|\B', '', 'A').

Can anyone tell me how I avoid getting these extra fields in the
result?

Thanks,

Michael
 
Anno Siegel

Michael O'Connor said:
> I have a string containing pipe-delimited fields, but the string can
> contain binary data which may itself include pipe characters - in
> which case the pipe is escaped with a backslash. Backslash characters
> are also escaped in the string (with another backslash) to distinguish
> them from other escaped characters.

[...]

That's a FAQ (perlfaq4):

How can I split a [character]-delimited string except when inside
[character]?

The answer points (mostly) to a number of modules that handle this
type of data.

Anno
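
A different technique from the reversed split discussed in this thread, sketched here as an alternative to both the reversal trick and the modules the FAQ points to: match each field going forwards with a `\G`-anchored pattern in which an escape pair (a backslash plus any character) is consumed as a unit. A pipe can then only be reached as a terminator when it is unescaped. `tokenize_escaped` is a hypothetical name, and unescaping the fields inside the loop is an assumption about what the caller wants.

```perl
use strict;
use warnings;

# Hypothetical forward tokenizer.  Each field is a run of escape pairs
# (backslash plus any character) or characters other than \ and |, and
# it ends at a bare pipe or at the end of the string.
sub tokenize_escaped {
    my ($value) = @_;
    my (@fields, $finished);
    # \G makes each match start where the previous one stopped; /s lets
    # \\. consume an escaped newline, which matters for binary data.
    while ($value =~ /\G((?:\\.|[^\\|])*)(\||\z)/gs) {
        my ($field, $sep) = ($1, $2);    # copy before s/// resets $1 and $2
        $field =~ s/\\(.)/$1/gs;         # unescape: \| -> |, \\ -> \
        push @fields, $field;
        if ($sep eq '') { $finished = 1; last }    # reached end of string
    }
    # The only input the pattern rejects is an unpaired backslash at the end.
    die "unterminated escape at end of input\n" unless $finished;
    return @fields;
}

print join(', ', map { "'$_'" } tokenize_escaped('A|B\\|C|D\\\\|E')), "\n";
# prints 'A', 'B|C', 'D\', 'E'
```

The explicit loop with `last` is used instead of a list-context `//g` match because the latter would also return an extra empty field from a zero-length match at the end of the string.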
 
gnari

Michael O'Connor said:
> I have a string containing pipe-delimited fields, but the string can
> contain binary data which may itself include pipe characters - in
> which case the pipe is escaped with a backslash. Backslash characters
> are also escaped in the string (with another backslash) to distinguish
> them from other escaped characters.
>
> I'm trying to split this string into its constituent fields. Based on
> the above, the delimiter is defined as a pipe character that is not
> preceded by an odd number of backslash characters (i.e. it's
> preceded by zero or an even number of backslash characters).

this is a FAQ, of course.

but in your specific case,
> my $escape = '\\';
> my $separator = '|';
>
> my @fields = split(/\Q$separator\E(?=(\Q$escape$escape\E)*[^\Q$escape\E])/,
> reverse $value);
>
> E.g. when trying to split string 'A|B\|C|D\\|E' (reverse =
> 'E|\\D|C|\B|A'), I'm hoping to get @fields = ('E', '\\D', 'C|\B',
> 'A'), but instead I get ('E', '\\', '\\D', '', 'C|\B', '', 'A').
>
> Can anyone tell me how I avoid getting these extra fields in the
> result?

your inner () group is a capturing group, and split() interleaves
whatever it captures (or undef, when it doesn't match) with the fields.

try:
my @fields =
split(/\Q$separator\E(?=(?:\Q$escape$escape\E)*[^\Q$escape\E])/,
reverse $value);

gnari
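
Putting gnari's fix together with the two reversals the original post implies, a sketch of the whole round trip might look like this. The split pattern is gnari's non-capturing version with two additions that are assumptions, not from the thread. The first is `|\z` in the look-ahead, so a pipe is still treated as a delimiter when only backslash pairs (or nothing) follow it in the reversed string, i.e. when the original's first field is empty or consists only of escaped backslashes. The second is a limit of -1, so split() keeps trailing empty fields. `split_reversed` is a hypothetical name, and `scalar` just makes the string reversal explicit.

```perl
use strict;
use warnings;

my $escape    = '\\';
my $separator = '|';

# Hypothetical wrapper around the reversed split.  The pattern is gnari's
# non-capturing version plus |\z in the look-ahead (an addition), and
# split() gets a limit of -1 (also an addition) to keep empty fields.
sub split_reversed {
    my ($value) = @_;
    my @reversed = split
        /\Q$separator\E(?=(?:\Q$escape$escape\E)*(?:[^\Q$escape\E]|\z))/,
        scalar reverse($value), -1;
    # Undo both reversals: the characters of each field, then the list order.
    my @fields = reverse map { scalar reverse $_ } @reversed;
    s/\Q$escape\E(.)/$1/gs for @fields;    # unescape \| and \\
    return @fields;
}

my @fields = split_reversed('A|B\\|C|D\\\\|E');
print join(', ', map { "'$_'" } @fields), "\n";   # prints 'A', 'B|C', 'D\', 'E'
```

Without the `|\z` addition, an input such as `\\|A` (a first field holding one escaped backslash) would come back as a single field, because the look-ahead would demand a non-backslash character after the pair.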
 
Jeff 'japhy' Pinyan

[posted & mailed]

> I'm trying to split this string into its constituent fields. Based on
> the above, the delimiter is defined as a pipe character that is not
> preceded by an odd number of backslash characters (i.e. it's
> preceded by zero or an even number of backslash characters).
>
> As Perl doesn't support variable-length look-behind, I'm reversing the
> string and using look-ahead instead. The closest I've come to getting
> this working is:

Oooh. I call this "sexeger".
> my $escape = '\\';
> my $separator = '|';
>
> my @fields = split(/\Q$separator\E(?=(\Q$escape$escape\E)*[^\Q$escape\E])/,
> reverse $value);
>
> At this point I'm expecting @fields to be a reversed list of
> individually reversed fields, but I'm finding it contains extra fields
> (either empty or containing 2 escape characters).

Yeah, because your split() regex has capturing parentheses in it, in the
look-ahead. Change (\Q$escape$escape\E)* to (?:\Q$escape$escape\E)* to
get rid of the problem.
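
To see the effect japhy describes, this sketch runs the original pattern and the non-capturing one over the reversed example string. split() adds each capture group's contents to the result list after every separator, and a group that did not take part in the match adds undef, which would print as an empty string. `show` is a hypothetical helper used only to make undef visible.

```perl
use strict;
use warnings;

my $escape    = '\\';
my $separator = '|';
my $reversed  = scalar reverse 'A|B\\|C|D\\\\|E';    # E|\\D|C|\B|A

# Original pattern: (...)* is a capture group, so split() returns its
# contents (or undef) after every separator.
my @capturing = split /\Q$separator\E(?=(\Q$escape$escape\E)*[^\Q$escape\E])/, $reversed;
# The fix from the replies: (?:...)* only groups, so nothing extra is returned.
my @grouping  = split /\Q$separator\E(?=(?:\Q$escape$escape\E)*[^\Q$escape\E])/, $reversed;

# Hypothetical helper that shows undef explicitly instead of as ''.
sub show { join ', ', map { defined $_ ? "'$_'" : 'undef' } @_ }

print show(@capturing), "\n";   # prints 'E', '\\', '\\D', undef, 'C|\B', undef, 'A'
print show(@grouping),  "\n";   # prints 'E', '\\D', 'C|\B', 'A'
```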
 
