split problem

Michael O'Connor

I have a string containing pipe-delimited fields, but the string can
contain binary data which may itself include pipe characters - in
which case the pipe is escaped with a backslash. Backslash characters
are also escaped in the string (with another backslash) to distinguish
them from other escaped characters.

I'm trying to split this string into its constituent fields. Based on
the above, the delimiter is defined as a pipe-character that is not
preceded by an odd number of backslash characters (i.e. it's
preceded by zero or an even number of backslash characters).

As Perl doesn't support variable-length look-behind, I'm reversing the
string and using look-ahead instead. The nearest I've got this to
working is:

my $escape = '\\';
my $separator = '|';

my @fields = split(/\Q$separator\E(?=(\Q$escape$escape\E)*[^\Q$escape\E])/,
reverse $value);

At this point I'm expecting @fields to be a reversed list of
individually reversed fields, but I'm finding it contains extra fields
(either empty or containing 2 escape characters).

E.g. when trying to split string 'A|B\|C|D\\|E' (reverse =
'E|\\D|C|\B|A'), I'm hoping to get @fields = ('E', '\\D', 'C|\B',
'A'), but instead I get ('E', '\\', '\\D', '', 'C|\B', '', 'A').

Can anyone tell me how I avoid getting these extra fields in the
result?

Thanks,

Michael
 
Anno Siegel

Michael O'Connor said:
I have a string containing pipe-delimited fields, but the string can
contain binary data which may itself include pipe characters - in
which case the pipe is escaped with a backslash. Backslash characters
are also escaped in the string (with another backslash) to distinguish
them from other escaped characters.

[...]

That's a FAQ:

How can I split a [character] delimited string except when inside
[character]?

The answer points (mostly) to a number of modules that handle this
type of data.

Anno
 
gnari

Michael O'Connor said:
I have a string containing pipe-delimited fields, but the string can
contain binary data which may itself include pipe characters - in
which case the pipe is escaped with a backslash. Backslash characters
are also escaped in the string (with another backslash) to distinguish
them from other escaped characters.

I'm trying to split this string into its constituent fields. Based on
the above, the delimiter is defined as a pipe-character that is not
preceded by an odd number of backslash characters (i.e. it's
preceded by zero or an even number of backslash characters).

this is a FAQ, of course.

but in your specific case,
my $escape = '\\';
my $separator = '|';

my @fields = split(/\Q$separator\E(?=(\Q$escape$escape\E)*[^\Q$escape\E])/,
reverse $value);
E.g. when trying to split string 'A|B\|C|D\\|E' (reverse =
'E|\\D|C|\B|A'), I'm hoping to get @fields = ('E', '\\D', 'C|\B',
'A'), but instead I get ('E', '\\', '\\D', '', 'C|\B', '', 'A').

Can anyone tell me how I avoid getting these extra fields in the
result?

your inner () group is a capturing group, so split interleaves
whatever it captures with the fields it returns.

try:
my @fields =
split(/\Q$separator\E(?=(?:\Q$escape$escape\E)*[^\Q$escape\E])/,
reverse $value);

gnari
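
To see the difference side by side, here is a minimal sketch that runs both patterns over the example string from the question. It assumes the example contains literal backslashes exactly as written (A|B\|C|D\\|E); the variable names $capturing, $non_capturing, @with_extras and @clean are made up for illustration.

```perl
use strict;
use warnings;

my $escape    = '\\';
my $separator = '|';

# In single quotes '\\' is one backslash, so this is the text A|B\|C|D\\|E
my $value    = 'A|B\\|C|D\\\\|E';
my $reversed = reverse $value;    # scalar context: reverses the characters

# Original pattern: the (...) inside the look-ahead is a capturing group
my $capturing =
    qr/\Q$separator\E(?=(\Q$escape$escape\E)*[^\Q$escape\E])/;

# Suggested pattern: (?:...) groups without capturing
my $non_capturing =
    qr/\Q$separator\E(?=(?:\Q$escape$escape\E)*[^\Q$escape\E])/;

my @with_extras = split $capturing,     $reversed;
my @clean       = split $non_capturing, $reversed;

# The capturing version yields undef where the group matched zero times,
# so make those visible instead of triggering warnings when printing.
print "capturing:     ",
    join(', ', map { defined $_ ? "'$_'" : 'undef' } @with_extras), "\n";
print "non-capturing: ", join(', ', map { "'$_'" } @clean), "\n";
```

With the capturing group, split returns seven elements for this input (the four fields plus one extra element per delimiter); with the non-capturing group it returns only the four reversed fields.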
 
Jeff 'japhy' Pinyan

[posted & mailed]

I'm trying to split this string into its constituent fields. Based on
the above, the delimiter is defined as a pipe-character that is not
preceded by an odd number of backslash characters (i.e. it's
preceded by zero or an even number of backslash characters).

As perl doesn't support variable length look-behind, I'm reversing the
string and using look-ahead instead. The nearest I've got this to
working is:

Oooh. I call this "sexeger" ("regexes" spelled backwards).
my $escape = '\\';
my $separator = '|';

my @fields = split(/\Q$separator\E(?=(\Q$escape$escape\E)*[^\Q$escape\E])/,
reverse $value);

At this point I'm expecting @fields to be a reversed list of
individually reversed fields, but I'm finding it contains extra fields
(either empty or containing 2 escape characters).

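Yeah, because your split() regex has capturing parentheses in it, in the
look-ahead, and split() returns whatever they capture as extra list
elements (undef where the group didn't match). Change
(\Q$escape$escape\E)* to (?:\Q$escape$escape\E)* to get rid of the problem.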
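
For completeness, the whole round trip (reverse, split, un-reverse each field, restore the field order) might look something like the sketch below. The helper name split_escaped is hypothetical. The sketch also makes two changes that nobody in this thread proposed, so treat them as assumptions: it uses (?!...) in place of [^...] so that a separator at the very end of the reversed string (an empty first field in the original) still counts, and it passes a limit of -1 to split so that empty fields are not silently dropped.

```perl
use strict;
use warnings;

# Hypothetical helper: split $value on separators that are not escaped,
# using the reverse-the-string-and-look-ahead approach from this thread.
sub split_escaped {
    my ($value, $separator, $escape) = @_;
    my $sep = quotemeta $separator;
    my $esc = quotemeta $escape;

    # In the reversed string, a separator is a real delimiter when it is
    # followed by an even number (possibly zero) of escape characters.
    # (?:...) keeps the group non-capturing so split adds no extra fields;
    # (?!$esc) also succeeds at end of string, unlike a [^...] class.
    my $delimiter = qr/$sep(?=(?:$esc$esc)*(?!$esc))/;

    # Limit -1 keeps trailing empty fields of the reversed string,
    # which are the leading empty fields of the original.
    my @reversed_fields = split $delimiter, scalar reverse($value), -1;

    # Un-reverse each field, then put the fields back in original order.
    my @fields = reverse map { scalar reverse $_ } @reversed_fields;
    return @fields;
}

# In single quotes '\\' is one backslash: this is the text A|B\|C|D\\|E
my @fields = split_escaped('A|B\\|C|D\\\\|E', '|', '\\');
print "$_\n" for @fields;
```

For the example string this prints the four fields A, B\|C, D\\ and E, one per line. The escapes are left in place; unescaping each field would be a separate step.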
 
