Can this regex be simplified ?

Discussion in 'Perl Misc' started by Niall, Jun 27, 2005.

  1. Niall


    I am processing some data where each line normally has the same number
    of tokens, but occasionally one value may be missing. In the example
    below there are normally 4 values per line, but the second line has
    field 3 missing. I think I could make field 3 optional with
    (?:\s+(\d+))?, which would work here, but this would not work if the
    data in column 4 also happened to be numeric.

    I would be grateful for any suggestions as to how the two regexes could
    be combined, if this is possible.

    use strict;
    use warnings;

    while (<DATA>)
    {
        chomp;
        if (/(\S+)\s+(\d+)\s+(\d+)\s+(\w+)/)
        {
            print("\nMatch 1 Got[$1][$2][$3][$4]");
        }
        elsif (/(\S+)\s+(\d+)\s+(\w+)/)
        {
            print("\nMatch 2 Got[$1][$2][$3]");
        }
        else
        {
            print("\nNo match");
        }
    }
    ################################
    __END__
    ABC 1233 456 XYZ
    ZZZ 66555 JKL
    YYY 1717 284 MNOP
     
    Niall, Jun 27, 2005
    #1
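    A minimal sketch of the single combined regex asked about above, assuming
    that a line with only three tokens is always missing field 3 (never
    field 4). The subroutine name parse_line is hypothetical. The optional
    part is written as a non-capturing group, (?:...)?, rather than a
    character class; thanks to backtracking, a three-token line whose last
    token is numeric still ends up in $4, although such a line remains
    inherently ambiguous.

```perl
use strict;
use warnings;

# One anchored regex: field 3 is an optional non-capturing group.
# Assumption: a line with three tokens is missing field 3, not field 4.
sub parse_line {
    my ($line) = @_;
    return unless $line =~ /^(\S+)\s+(\d+)(?:\s+(\d+))?\s+(\w+)\s*$/;
    return ($1, $2, $3, $4);    # $3 is undef when field 3 is absent
}

for my $line ("ABC 1233 456 XYZ", "ZZZ 66555 JKL", "YYY 1717 284 MNOP") {
    my @f = parse_line($line);
    if (!@f) {
        print "No match\n";
        next;
    }
    # Show a missing field as an empty pair of brackets
    my @shown = map { defined $_ ? $_ : '' } @f;
    print "Got[", join('][', @shown), "]\n";
}
```

    With the example data this prints Got[ZZZ][66555][][JKL] for the second
    line, so one regex and one branch handle both cases.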

  2. Niall <> wrote:

    > I am processing some data



    Can there be space characters in the field values?

    Are the fields at fixed positions, and you typo'd one too many
    spaces in the last line?


    > where normally there are the same number of
    > tokens in each line but occasionally one value may be missing . In the
    > attached example there are normally 4 values per line but the second
    > line has field 3 missing.


    > I would be grateful for any suggestion as to how the 2 regexes could be
    > combined if this is possible.



    At this point, I'm not convinced that regexes are even the
    Right Tool for the job.

    If the fields don't contain spaces:

    my @f = split;

    (but you won't know which is the missing one.)
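    If you are willing to decide which field is missing by convention, the
    token count from split can do it. This is a sketch under the assumption,
    taken only from the example data, that a short line has lost field 3;
    split_fields is a hypothetical name.

```perl
use strict;
use warnings;

# Split on whitespace and let the token count decide which field is missing.
# Assumption (from the example data): a normal line has 4 tokens and a short
# line has lost field 3, so the remaining tokens shifted left by one.
sub split_fields {
    my ($line) = @_;
    my @f = split ' ', $line;     # same as a bare split on $_: skips leading whitespace
    if (@f == 3) {
        splice @f, 2, 0, '';      # re-insert an empty field 3
    }
    return @f == 4 ? @f : ();     # any other token count is rejected
}

for my $line ("ABC 1233 456 XYZ", "ZZZ 66555 JKL") {
    print join('|', split_fields($line)), "\n";
}
```

    The second line comes out as ZZZ|66555||JKL. The convention is the weak
    point: split itself cannot tell which field was dropped.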

    If the fields are in fixed positions, then unpack() or substr()
    is the right tool, and they will be able to indicate the missing one.
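    To illustrate why fixed positions reveal the missing field, here is a
    sketch with a made-up column layout (4, 6 and 4 characters, then the
    rest); the widths and the name fixed_fields are assumptions, not the
    real file format. The A template letter strips trailing spaces, so a
    blank column range comes back as an empty string.

```perl
use strict;
use warnings;

# Hypothetical layout for the test data (not taken from the real file):
#   field 1: columns 0-3, field 2: 4-9, field 3: 10-13, field 4: the rest
my $layout = 'A4 A6 A4 A*';

sub fixed_fields {
    my ($line) = @_;
    # 'A' strips trailing spaces, so an all-blank field becomes ''
    return unpack $layout, $line;
}

# Build padded sample lines; the second one has field 3 left blank
my @lines = (
    sprintf('%-4s%-6s%-4s%s', 'ABC', '1233',  '456', 'XYZ'),
    sprintf('%-4s%-6s%-4s%s', 'ZZZ', '66555', '',    'JKL'),
);

for my $line (@lines) {
    print join('|', map { "[$_]" } fixed_fields($line)), "\n";
}
```

    The second line prints as [ZZZ]|[66555]|[]|[JKL]: the position, not the
    token count, says which field is empty.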


    > __END__
    > ABC 1233 456 XYZ
    > ZZZ 66555 JKL
    > YYY 1717 284 MNOP



    --
    Tad McClellan SGML consulting
    Perl programming
    Fort Worth, Texas
     
    Tad McClellan, Jun 27, 2005
    #2

  3. Niall


    Tad McClellan wrote:
    >
    > Can there be space characters in the field values?
    >
    > Are the fields at fixed positions, and you typo'd one too many
    > spaces in the last line?
    >
    >

    Thanks for the suggestions, Tad.

    The data given in the example was just a test program. In the real data
    I am dealing with, it looks as if the fields are actually in fixed
    positions, so I guess my code should be:

    my @fields = ();
    $fields[0] = substr($line, 0, 8);
    $fields[1] = substr($line, 10, 3);
    ......
    $fields[8] = substr($line, 60, 15);

    (My real data has 9 fields)

    However, this seems to be quite long-winded and doesn't do the sanity
    checking (i.e. checking that certain fields are numeric) that I get
    from using the regexp.

    I guess it might be better to use a single regexp (going back to the
    test data):

    (/(\S+)\s+(\d+)\s+(.*)/)

    which will match the first 2 fields and slurp the rest of the string
    into a single variable; I can then split that string to see whether it
    contains one or two values.

    my ($thirdvar, $fourthvar) = split(/\s+/, $3);
    if (!defined $fourthvar)
    {
        # Only one value followed field 2, so field 3 is the missing one
        $fourthvar = $thirdvar;
        $thirdvar = "";
    }

    Still seems very messy though :(
     
    Niall, Jun 27, 2005
    #3
  4. Niall <> wrote:

    > the fields are actually in fixed
    > positions, so I guess my code should be;
    >
    > my @fields = ();
    > $fields[0] = substr($line, 0, 8)
    > $fields[1] = substr($line, 10, 3)
    > .....
    > $fields[8] = substr($line, 60, 15)


    > However this seems to be quite long winded



    A single call to unpack() will be much prettier.
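    As a sketch of what that single call looks like, the first two substr()
    calls quoted above map onto one unpack() template; the remaining fields
    are elided in the post, so only those two are shown, and the name
    first_two_fields is hypothetical. The lowercase a keeps the bytes
    exactly as substr() would; A would also strip trailing spaces.

```perl
use strict;
use warnings;

# The first two substr() calls from the post, as one unpack():
#   substr($line, 0, 8)   ->  a8     (8 bytes from offset 0)
#   substr($line, 10, 3)  ->  x2 a3  (skip 2 bytes, then take 3)
sub first_two_fields {
    my ($line) = @_;
    return unpack 'a8 x2 a3', $line;
}

my $line = 'ABCDEFGH..123 rest of record';
my @via_unpack = first_two_fields($line);
my @via_substr = (substr($line, 0, 8), substr($line, 10, 3));
print "unpack: [@via_unpack]\nsubstr: [@via_substr]\n";
```

    Extending the template field by field (with x for any gaps) replaces the
    whole list of assignments with one line.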


    > and doesn't do the sanity
    > checking (i.e check that certain fields are numeric)



    But it still won't do that part.


    --
    Tad McClellan SGML consulting
    Perl programming
    Fort Worth, Texas
     
    Tad McClellan, Jun 27, 2005
    #4
  5. Tad McClellan <> wrote on 27.06.2005:
    > Niall <> wrote:
    >
    >> the fields are actually in fixed
    >> positions, so I guess my code should be;

    [snip]
    >
    > A single call to unpack() will be much prettier.
    >
    >> and doesn't do the sanity
    >> checking (i.e check that certain fields are numeric)

    >
    > But it still won't do that part.


    ....which is why you do that _after_ unpacking:

    my @fields = unpack "A10 A3 ...whatever... A15", $_;

    die "Error on input line $.\n" unless
        $fields[0] =~ /^\d+$/ and
        $fields[1] =~ /^whatever$/ and
        ...
        $fields[8] =~ /^[A-Z]+$/;
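    One way to keep that check from growing into a long chain of "and"s is a
    table of per-field patterns. This is a sketch only: the layout, the
    patterns and the name parse_checked are assumptions for the test data,
    and the line number is passed in explicitly (inside a while (<FH>) loop
    you would pass $.).

```perl
use strict;
use warnings;

# Hypothetical layout and per-field checks; the real ones depend on the data.
my $layout = 'A4 A6 A4 A*';
my @checks = (
    qr/^[A-Z]+$/,    # field 1: upper-case code
    qr/^\d+$/,       # field 2: numeric
    qr/^\d*$/,       # field 3: numeric, or empty when missing
    qr/^[A-Z]+$/,    # field 4: upper-case code
);

# Unpack first, then validate each field against its pattern
sub parse_checked {
    my ($line, $lineno) = @_;
    my @fields = unpack $layout, $line;
    for my $i (0 .. $#checks) {
        my $value = defined $fields[$i] ? $fields[$i] : '';
        die "Error on input line $lineno, field ", $i + 1, ": '$value'\n"
            unless $value =~ $checks[$i];
    }
    return @fields;
}

my @input = (
    sprintf('%-4s%-6s%-4s%s', 'ABC', '1233',  '456', 'XYZ'),
    sprintf('%-4s%-6s%-4s%s', 'ZZZ', '66555', '',    'JKL'),
);
my $n = 0;
for my $line (@input) {
    $n++;
    print join('|', parse_checked($line, $n)), "\n";
}
```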

    Finally, I'd advise the OP to first find out what format his data is
    really in. For example, the fields might actually be tab-delimited,
    not fixed-length. In that case, split /\t/ should be used instead of
    unpack.
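    If the data does turn out to be tab-delimited, a missing value is just
    two adjacent tabs, so split /\t/ reports it as an empty field. A small
    sketch (tab_fields is a hypothetical name): the negative limit matters,
    because without it split drops trailing empty fields.

```perl
use strict;
use warnings;

# For tab-delimited data an empty field is simply two adjacent tabs.
# A negative limit keeps trailing empty fields that split would otherwise drop.
sub tab_fields {
    my ($line) = @_;
    return split /\t/, $line, -1;
}

my @full    = tab_fields("ABC\t1233\t456\tXYZ");
my @missing = tab_fields("ZZZ\t66555\t\tJKL");
my @trail   = tab_fields("YYY\t1717\t284\t");
print scalar(@full), ' ', scalar(@missing), ' ', scalar(@trail), "\n";    # 4 4 4
```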

    --
    Ilmari Karonen
    To reply by e-mail, please replace ".invalid" with ".net" in address.
     
    Ilmari Karonen, Jul 3, 2005
    #5

Similar Threads
  1. Grey

    simplified chinese encoding

    Grey, Apr 20, 2004, in forum: ASP .Net
    Replies:
    0
    Views:
    508
  2. Wandy Tang
    Replies:
    4
    Views:
    4,513
    Roedy Green
    Jul 20, 2004
  3. Eric Lilja

    Can this conversion code be simplified?

    Eric Lilja, Apr 8, 2006, in forum: C Programming
    Replies:
    8
    Views:
    433
    Michael Mair
    Apr 9, 2006
  4. Replies:
    3
    Views:
    773
    Reedick, Andrew
    Jul 1, 2008
  5. fulio pen

    Can the code be simplified?

    fulio pen, Sep 19, 2008, in forum: Javascript
    Replies:
    12
    Views:
    160
    fulio pen
    Sep 20, 2008