Can this regex be simplified?

Niall

I am processing some data where there are normally the same number of
tokens in each line, but occasionally one value may be missing. In the
attached example there are normally 4 values per line, but the second
line has field 3 missing. I think I could use a quantifier
[\s+(\d+)]{0,1}, which would work here, but this would not work if the
data in column 4 happened to also be numeric.

I would be grateful for any suggestions as to how the 2 regexes could be
combined, if this is possible.

use strict;
use warnings;

while(<DATA>)
{
    chomp;
    if(/(\S+)\s+(\d+)\s+(\d+)\s+(\w+)/)
    {
        print("\nMatch 1 Got[$1][$2][$3][$4]");
    }
    elsif(/(\S+)\s+(\d+)\s+(\w+)/)
    {
        print("\nMatch 2 Got[$1][$2][$3]");
    }
    else
    {
        print("\nNo match");
    }
}
################################
__END__
ABC 1233 456 XYZ
ZZZ 66555 JKL
YYY 1717 284 MNOP
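
One way to fold the two patterns into one is to make the third field an optional non-capturing group. Note that square brackets create a character class, so [\s+(\d+)]{0,1} would not group anything, whereas (?:...)? does. The sketch below assumes the fields contain no embedded whitespace and that each line holds only these three or four tokens; parse_line is a name made up for illustration.

```perl
use strict;
use warnings;

# Parse one line: token, number, optional number, word.
# Returns four fields on success; the third is undef when it is absent.
# Returns an empty list when the line does not match.
sub parse_line {
    my ($line) = @_;
    return $line =~ /^(\S+)\s+(\d+)(?:\s+(\d+))?\s+(\w+)\s*$/;
}

for my $line ('ABC 1233 456 XYZ', 'ZZZ 66555 JKL', 'YYY 1717 284 MNOP') {
    my @f = parse_line($line) or next;
    # Show missing fields as empty strings.
    print join('|', map { defined $_ ? $_ : '' } @f), "\n";
}
```

Because the final \s+(\w+) is mandatory and the pattern is anchored, the engine backtracks out of the optional group whenever a numeric last field would otherwise be swallowed by it, so a line such as 'ZZZ 66555 789' should still come back with an undefined third field and 789 in the fourth.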
 
Tad McClellan

Niall said:
> I am processing some data

Can there be space characters in the field values?

Are the fields at fixed positions, and you typo'd one too many
spaces in the last line?

> where normally there are the same number of
> tokens in each line but occasionally one value may be missing. In the
> attached example there are normally 4 values per line but the second
> line has field 3 missing.
> I would be grateful for any suggestion as to how the 2 regexes could be
> combined if this is possible.


At this point, I'm not convinced that regexes are even the
Right Tool for the job.

If the fields don't contain spaces:

my @f = split;

(but you won't know which is the missing one.)

If the fields are in fixed positions, then unpack() or substr()
is the right tool, and they will be able to indicate the missing one.
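
To make the difference concrete, here is a rough sketch comparing the two approaches on a record whose third column is blank. The column widths (4, 6, 4, 4) are invented for illustration and are not taken from the real data.

```perl
use strict;
use warnings;

# Hypothetical layout: four left-justified columns of widths 4, 6, 4 and 4.
my $template = 'A4 A6 A4 A4';

# Build a sample record with an empty third column.
my $line = sprintf '%-4s%-6s%-4s%-4s', 'ZZZ', '66555', '', 'JKL';

# Whitespace split: the gap collapses, so only three fields come back
# and there is no way to tell which column was missing.
my @by_split = split ' ', $line;

# Fixed-width unpack: every column keeps its slot, and "A" strips the padding.
my @by_unpack = unpack $template, $line;

print scalar(@by_split),  " fields from split\n";
print scalar(@by_unpack), " fields from unpack\n";
print "field 3 is empty\n" if $by_unpack[2] eq '';
```

With split the caller has to guess which column dropped out; with unpack the empty string sits at the index of the missing column.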
 
Niall

Tad said:
> Can there be space characters in the field values?
>
> Are the fields at fixed positions, and you typo'd one too many
> spaces in the last line?

Thanks for the suggestions, Tad.

The data given in the example was just a test program. In the real data I
am dealing with, it looks as if the fields are actually in fixed
positions, so I guess my code should be:

my @fields = ();
$fields[0] = substr($line, 0, 8);
$fields[1] = substr($line, 10, 3);
......
$fields[8] = substr($line, 60, 15);

(My real data has 9 fields)

However, this seems quite long-winded and doesn't do the sanity
checking (i.e. checking that certain fields are numeric) that I can get
from using the regexp.

I guess what might be better is to use a single regexp (going back to
the test data) of

(/(\S+)\s+(\d+)\s+(.*)/)

which will match the first 2 fields and slurp the rest of the string into
a single variable, and then split that string to see if it contains
one or two values.

my ($thirdvar, $fourthvar) = split(/\s+/, $3);
if (!defined $fourthvar)
{
    # Only one value was present, so it belongs in the fourth column.
    $fourthvar = $thirdvar;
    $thirdvar = "";
}

Still seems very messy though :(
 
Tad McClellan

Niall said:
> the fields are actually in fixed
> positions, so I guess my code should be:
>
> my @fields = ();
> $fields[0] = substr($line, 0, 8);
> $fields[1] = substr($line, 10, 3);
> .....
> $fields[8] = substr($line, 60, 15);
> However this seems to be quite long-winded

A single call to unpack() will be much prettier.

> and doesn't do the sanity
> checking (i.e. check that certain fields are numeric)

But it still won't do that part.
 
Ilmari Karonen

Tad McClellan said:
> Niall said:
>> the fields are actually in fixed
>> positions, so I guess my code should be:
>> [snip]
>
> A single call to unpack() will be much prettier.
>
>> and doesn't do the sanity
>> checking (i.e. check that certain fields are numeric)
>
> But it still won't do that part.

...which is why you do that _after_ unpacking:

my @fields = unpack "A10 A3 ...whatever... A15", $_;

die "Error on input line $.\n" unless
    $fields[0] =~ /^\d+$/ and
    $fields[1] =~ /^whatever$/ and
    ...
    $fields[8] =~ /^[A-Z]+$/;
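
As a runnable illustration of the same unpack-then-validate idea, here is a small sketch for a four-column record. The template, the per-field rules, and the name parse_record are all assumptions made up for the example, not the real nine-field layout.

```perl
use strict;
use warnings;

# Hypothetical layout and rules for a four-column record.
my $template = 'A4 A6 A4 A4';
my @rules = (
    qr/^\S+$/,    # field 1: any non-blank token, required
    qr/^\d+$/,    # field 2: digits, required
    qr/^\d*$/,    # field 3: digits, may be empty
    qr/^\w+$/,    # field 4: word characters, required
);

# Return the unpacked fields, or an empty list if any field fails its rule.
sub parse_record {
    my ($line) = @_;
    my @fields = unpack $template, $line;
    return unless @fields == @rules;
    for my $i (0 .. $#rules) {
        return unless $fields[$i] =~ $rules[$i];
    }
    return @fields;
}

my @rec = parse_record(sprintf('%-4s%-6s%-4s%-4s', 'ZZZ', '66555', '', 'JKL'));
print @rec ? "ok: @rec\n" : "rejected\n";
```

Keeping the rules in an array parallel to the template means a rule can allow an empty field (as the third one does here) without touching the parsing step.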

Finally, I'd advise the OP to first find out what format the data is
really in. For example, the fields might actually be tab-delimited,
not fixed-length. In that case, split /\t/ should be used instead of
unpack.
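
If the data does turn out to be tab-delimited, a sketch along these lines would keep empty fields in their slots. It assumes exactly one tab between columns; tab_fields is a hypothetical helper name.

```perl
use strict;
use warnings;

# Split a tab-delimited record, keeping empty fields in place.
# The -1 limit stops split from discarding trailing empty fields.
sub tab_fields {
    my ($line) = @_;
    chomp $line;
    return split /\t/, $line, -1;
}

my @f = tab_fields("ZZZ\t66555\t\tJKL\n");
print scalar(@f), " fields, third is ", ($f[2] eq '' ? 'empty' : $f[2]), "\n";
```

Without the -1 limit, a record whose last columns are empty would come back shorter than the others, which hides exactly the kind of missing value being looked for.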
 
