Regexp for variable length tags

Discussion in 'Perl Misc' started by Jon Burroughs, Jul 18, 2005.

  1. I am processing some data that has up to three key-value pairs
    concatenated together. The keys can be "ADD", "REM", or "EQD". Values
    are variable length.

    There will always be an "ADD" section, followed by 0 to 1 "REM"
    sections, followed by 0 to 1 "EQD" sections. For example:
    ADDxxxxxxxxREMyyyyyEQDzzzzz

    I'm trying to find a regular expression that will split this apart into
    separate sections in one step.

    So far, I have this:

    $rec =~ /(ADD.+)(REM.+)(EQD.+)/;

    But this only works if I know the record has all three tokens.

    This gobbles too much:
    $rec =~ /(ADD.+)(REM.+)?(EQD.+)?/;

    Any ideas?

    -Jon
     
    Jon Burroughs, Jul 18, 2005
    #1

  2. Jon Burroughs wrote:
    > I am processing some data that has up to three key-value pairs
    > concatenated together. The keys can be "ADD", "REM", or "EQD". Values
    > are variable length.
    >
    > There will always be an "ADD" section, followed by 0 to 1 "REM"
    > sections, followed by 0 to 1 "EQD" sections. For example:
    > ADDxxxxxxxxREMyyyyyEQDzzzzz
    >
    > I'm trying to find a regular expression that will split this apart into
    > separate sections in one step.
    >
    > So far, I have this:
    >
    > $rec =~ /(ADD.+)(REM.+)(EQD.+)/;
    >
    > But this only works if I know the record has all three tokens.
    >
    > This gobbles too much:
    > $rec =~ /(ADD.+)(REM.+)?(EQD.+)?/;
    >
    > Any ideas?


    Try using non-greedy quantifiers.

    perldoc perlre
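
    A minimal sketch of that suggestion, not code posted in the thread. In
    the second pattern from the question, the greedy ADD.+ consumes the whole
    record and both optional groups then match nothing. Making the
    quantifiers non-greedy and anchoring the pattern at both ends forces each
    section to stop where the next key begins. The helper name split_record
    is hypothetical, and the sketch assumes no value ever contains the
    strings "REM" or "EQD".

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): split one record into its
# ADD, REM and EQD sections. Assumes no value contains "REM" or "EQD".
sub split_record {
    my ($rec) = @_;
    # .+? is non-greedy: each section grows one character at a time and
    # stops as soon as the rest of the pattern can match up to the end.
    # The ^ and $ anchors keep the lazy groups from stopping too early.
    my @parts = $rec =~ /^(ADD.+?)(REM.+?)?(EQD.+)?$/
        or return;    # empty list when the record does not start with ADD
    return @parts;    # missing sections come back as undef
}

# Usage: print each record's sections, marking the absent ones.
for my $rec (qw/ADDxxxxxxREMyyyyyEQDzzzzz ADD2222REM666666 ADD7777777EQD8888/) {
    my @parts = split_record($rec);
    print join(' | ', map { defined $_ ? $_ : '(none)' } @parts), "\n";
}
```

    Each captured section still includes its key, as in the patterns from
    the question. To capture only the values, the keys could be moved outside
    the capturing parentheses.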


    John
     
    John W. Krahn, Jul 18, 2005
    #2

  3. Jon Burroughs wrote:
    > There will always be an "ADD" section, followed by 0 to 1 "REM"
    > sections, followed by 0 to 1 "EQD" sections. For example:
    > ADDxxxxxxxxREMyyyyyEQDzzzzz
    >
    > I'm trying to find a regular expression that will split this apart into
    > separate sections in one step.


    Why regex?

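    use strict;
    use warnings;
    use Data::Dumper;

    my @rec;
    while (<DATA>) {
        chomp;
        # Work from the last key back to the first, cutting each section
        # off the end of the line once it has been stored.
        for my $key ( qw/EQD REM ADD/ ) {
            if ( (my $pos = index $_, $key) >= 0 ) {
                $rec[$.-1]{$key} = substr $_, $pos+3;
                substr($_, $pos) = '';    # drop the tail, whatever its length
            }
        }
    }
    print Dumper \@rec;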

    __DATA__
    ADDxxxxxxREMyyyyyEQDzzzzz
    ADD2222REM666666
    ADD7777777EQD8888

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
     
    Gunnar Hjalmarsson, Jul 18, 2005
    #3
