Regexp for variable length tags

Jon Burroughs · Jul 18, 2005

I am processing some data that has up to three key-value pairs
concatenated together. The keys can be "ADD", "REM", or "EQD". Values are
variable length.

There will always be an "ADD" section, followed by 0 to 1 "REM"
sections, followed by 0 to 1 "EQD" sections. For example:
ADDxxxxxxxxREMyyyyyEQDzzzzz

I'm trying to find a regular expression that will split this apart into
separate sections in one step.

So far, I have this:

$rec =~ /(ADD.+)(REM.+)(EQD.+)/;

But this only works if I know the record has all three tokens.

This gobbles too much:
$rec =~ /(ADD.+)(REM.+)?(EQD.+)?/;

Any ideas?

-Jon

John W. Krahn · Jul 18, 2005

Jon said:
I am processing some data that has up to three key-value pairs
concatenated together. The keys can be "ADD", "REM", or "EQD". Values are
variable length.

There will always be an "ADD" section, followed by 0 to 1 "REM"
sections, followed by 0 to 1 "EQD" sections. For example:
ADDxxxxxxxxREMyyyyyEQDzzzzz

I'm trying to find a regular expression that will split this apart into
separate sections in one step.

So far, I have this:

$rec =~ /(ADD.+)(REM.+)(EQD.+)/;

But, this only works if I know the record has all three tokens.

This gobbles too much:
$rec =~ /(ADD.+)(REM.+)?(EQD.+)?/;

Any ideas?

Try using non-greedy quantifiers.

perldoc perlre
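
A minimal sketch of what the non-greedy suggestion could look like, not necessarily the exact pattern John had in mind. It keeps Jon's three capture groups, makes the first two quantifiers non-greedy, and adds ^ and $ anchors so the whole record has to be consumed. The helper name split_record is hypothetical, and the sketch assumes that no value contains the text "REM" or "EQD" and that every value is at least one character long.

```perl
use strict;
use warnings;

# Hypothetical helper: split one record into its ADD/REM/EQD sections.
# Returns the sections that are present, or an empty list if the record
# does not start with ADD.
sub split_record {
    my ($rec) = @_;

    # .+? takes as few characters as possible, so the ADD value stops as
    # soon as an optional REM or EQD section (or the end of the record)
    # can match. The last .+ can stay greedy because nothing follows it.
    my @parts = $rec =~ /^(ADD.+?)(REM.+?)?(EQD.+)?$/
        or return;

    # Optional groups that did not take part in the match come back undef.
    return grep { defined $_ } @parts;
}

for my $rec (qw/ADDxxxxxxxxREMyyyyyEQDzzzzz ADD2222REM666666 ADD7777777EQD8888/) {
    print join(' | ', split_record($rec)), "\n";
}
# Prints:
# ADDxxxxxxxx | REMyyyyy | EQDzzzzz
# ADD2222 | REM666666
# ADD7777777 | EQD8888
```

Without the $ anchor, the non-greedy ADD group would stop after a single character and the optional groups would match nothing, so the anchors matter as much as the quantifiers here.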

John

Gunnar Hjalmarsson · Jul 18, 2005

Jon said:
There will always be an "ADD" section, followed by 0 to 1 "REM"
sections, followed by 0 to 1 "EQD" sections. For example:
ADDxxxxxxxxREMyyyyyEQDzzzzz

I'm trying to find a regular expression that will split this apart into
separate sections in one step.

Why regex?

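use strict;
use warnings;
use Data::Dumper;

my @rec;
while (<DATA>) {
    chomp;
    # Work from the last key backwards so each value runs to the end of the line.
    for my $key ( qw/EQD REM ADD/ ) {
        if ( (my $pos = index $_, $key) >= 0 ) {
            $rec[$.-1]{$key} = substr $_, $pos+3;
            # Cut this section off the line (removes at most 100 characters).
            substr $_, $pos, 100, '';
        }
    }
}
print Dumper \@rec;

__DATA__
ADDxxxxxxREMyyyyyEQDzzzzz
ADD2222REM666666
ADD7777777EQD8888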
