Can this regex be simplified?

Niall · Jun 27, 2005

I am processing some data where there are normally the same number of
tokens on each line, but occasionally one value may be missing. In the
attached example there are normally 4 values per line, but the second
line has field 3 missing. I think I could use an optional group with a
quantifier, (?:\s+(\d+)){0,1}, which would work here, but I suspect this
would not work if the data in column 4 happened to also be numeric.

I would be grateful for any suggestion as to how the 2 regexes could be
combined if this is possible.

use strict;
use warnings;

while (<DATA>)
{
    chomp;
    if (/(\S+)\s+(\d+)\s+(\d+)\s+(\w+)/)
    {
        print("\nMatch 1 Got[$1][$2][$3][$4]");
    }
    elsif (/(\S+)\s+(\d+)\s+(\w+)/)
    {
        print("\nMatch 2 Got[$1][$2][$3]");
    }
    else
    {
        print("\nNo match");
    }
}
################################
__END__
ABC 1233 456 XYZ
ZZZ 66555 JKL
YYY 1717 284 MNOP
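
One way the two patterns might be combined is to wrap the third field in an optional non-capturing group. The sketch below is only an illustration: it assumes whitespace-separated fields with no embedded spaces, anchors the pattern to the whole line, and uses a made-up helper name, parse_line. Because the regex engine backtracks, a numeric last field should still land in field 4 when field 3 is absent.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns the four fields,
# with undef in place of a missing field 3, or an empty list on no match.
sub parse_line {
    my ($line) = @_;
    return $line =~ /^(\S+)\s+(\d+)(?:\s+(\d+))?\s+(\w+)\s*$/;
}

for my $line ('ABC 1233 456 XYZ', 'ZZZ 66555 JKL', 'ZZZ 66555 789') {
    my @f = parse_line($line) or next;
    # Show a missing field as '-' so the gap is visible.
    print join('|', map { defined $_ ? $_ : '-' } @f), "\n";
}
```

The trade-off is that field 3 comes back undefined rather than empty, so any later code has to test it with defined().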

Tad McClellan · Jun 27, 2005

Niall said:
I am processing some data

Can there be space characters in the field values?

Are the fields at fixed positions, and you typo'd one too many
spaces in the last line?

where normally there are the same number of
tokens in each line but occasionally one value may be missing. In the
attached example there are normally 4 values per line but the second
line has field 3 missing.

I would be grateful for any suggestion as to how the 2 regexes could be
combined if this is possible.

At this point, I'm not convinced that regexes are even the
Right Tool for the job.

If the fields don't contain spaces:

my @f = split;

(but you won't know which is the missing one.)

If the fields are in fixed positions, then unpack() or substr()
is the right tool, and either will be able to indicate the missing one.
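
To make the difference concrete, here is a small sketch comparing split with substr(). The column layout (field 3 at offset 10, width 4) and the helper name third_field are assumptions made for this example, not something taken from the real data.

```perl
use strict;
use warnings;

# Assumed layout: field widths 4, 6, 4, then the rest of the line.
my $layout = '%-4s%-6s%-4s%s';
my @lines  = (
    sprintf($layout, 'ABC', '1233',  '456', 'XYZ'),
    sprintf($layout, 'ZZZ', '66555', '',    'JKL'),   # field 3 is blank
);

# Hypothetical helper: pull field 3 from its fixed column and trim the padding.
sub third_field {
    my ($line) = @_;
    my $value = substr($line, 10, 4);
    $value =~ s/\s+$//;
    return $value;
}

for my $line (@lines) {
    my @tokens = split ' ', $line;   # token count changes when a field is blank
    printf "split found %d tokens, column 3 holds [%s]\n",
        scalar(@tokens), third_field($line);
}
```

With split the remaining tokens simply close up, so a three-token line does not say which field was dropped. The fixed-offset read always looks at the same column, so a blank there is unambiguous.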

Niall · Jun 27, 2005

Tad said:
Can there be space characters in the field values?

Are the fields at fixed positions, and you typo'd one too many
spaces in the last line?

Thanks for the suggestions, Tad.

The data given in the example was just a test prog. In the real data I
am dealing with, it looks as if the fields are actually in fixed
positions, so I guess my code should be:

my @fields = ();
$fields[0] = substr($line, 0, 8);
$fields[1] = substr($line, 10, 3);
......
$fields[8] = substr($line, 60, 15);

(My real data has 9 fields)

However, this seems to be quite long-winded and doesn't do the sanity
checking (i.e. check that certain fields are numeric) that I can get
from using the regexp.

I guess what might be better is to use a single regexp (going back to
the test data) of

(/(\S+)\s+(\d+)\s+(.*)/)

which will match the first 2 fields, slurp the rest of the string into
a single variable, and then split that string to see if it contains
one or two values.

my ($thirdvar, $fourthvar) = split(/\s+/, $3);
if (!defined $fourthvar)
{
    $fourthvar = $thirdvar;
    $thirdvar = "";
}

Still seems very messy though.
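
A slightly tidier take on the same idea is sketched below: capture the first two fields, split whatever follows, and treat the last token as field 4. The helper name parse_tail and the rule of rejecting lines with more than two trailing tokens are assumptions added for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (field1, field2, field3, field4), with an
# empty string for a missing field 3, or an empty list if the line is malformed.
sub parse_tail {
    my ($line) = @_;
    my ($name, $num, $rest) = $line =~ /^(\S+)\s+(\d+)\s+(\S.*)$/
        or return;
    my @tail = split ' ', $rest;
    return if @tail > 2;                  # more tokens than the layout allows
    my $fourth = pop @tail;               # the last token is always field 4
    my $third  = @tail ? $tail[0] : '';   # anything left over is field 3
    return ($name, $num, $third, $fourth);
}

for my $line ('ABC 1233 456 XYZ', 'ZZZ 66555 JKL') {
    my @f = parse_tail($line) or next;
    print join('|', @f), "\n";
}
```

Note that this version does not check that field 3 is numeric; that test would have to be added separately.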

Tad McClellan · Jun 27, 2005

Niall said:
the fields are actually in fixed
positions, so I guess my code should be:

my @fields = ();
$fields[0] = substr($line, 0, 8);
$fields[1] = substr($line, 10, 3);
.....
$fields[8] = substr($line, 60, 15);

However, this seems to be quite long-winded

A single call to unpack() will be much prettier.

and doesn't do the sanity
checking (i.e. check that certain fields are numeric)

But it still won't do that part.
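
As a rough illustration of what that single call could look like, the sketch below translates only the three substr() calls quoted above into an unpack template. The six fields elided in the post are left out here too, and the sample record and the helper name first_second_last are made up.

```perl
use strict;
use warnings;

# Template pieces and the substr() call each one stands in for:
#   A8      -> substr($line,  0,  8)
#   x2 A3   -> skip 2 bytes, then substr($line, 10, 3)
#   @60 A15 -> jump to offset 60, then substr($line, 60, 15)
my $template = 'A8 x2 A3 @60 A15';

# Hypothetical helper returning fields 0, 1 and 8 of a record.
sub first_second_last {
    my ($line) = @_;
    return unpack $template, $line;
}

# Made-up 75-character record, for demonstration only.
my $record = sprintf('%-10s%-50s%-15s', 'ABCDEFGH', '123', 'LASTFIELD');
print join('|', first_second_last($record)), "\n";
```

Unlike substr(), the A format strips trailing spaces, so a blank column comes back as an empty string. A record shorter than the template expects may make unpack die on the x or @ steps, so a length check beforehand is probably worthwhile.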

Ilmari Karonen · Jul 3, 2005

Tad McClellan said:
Niall said:

the fields are actually in fixed
positions, so I guess my code should be:

[snip]

A single call to unpack() will be much prettier.

and doesn't do the sanity
checking (i.e. check that certain fields are numeric)

But it still won't do that part.

...which is why you do that _after_ unpacking:

my @fields = unpack "A10 A3 ...whatever... A15", $_;

die "Error on input line $.\n" unless
$fields[0] =~ /^\d+$/ and
$fields[1] =~ /^whatever$/ and
...
$fields[8] =~ /^[A-Z]+$/;

Finally, I'd advise the OP to first find out in what format his data
really is. For example, the fields might actually be tab-delimited,
not fixed-length. In that case, split /\t/ should be used instead of
unpack.
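
Here is a minimal sketch of the unpack-then-validate idea, with a tab-delimited variant alongside it. The four-field template, the per-field patterns, and the helper names are all assumptions chosen for illustration; the real layout would need to be confirmed first.

```perl
use strict;
use warnings;

# Assumed layout and checks (not from the real data); field 3 may be blank.
my $template = 'A4 A6 A4 A*';
my @checks   = (qr/^[A-Z]+$/, qr/^\d+$/, qr/^\d*$/, qr/^\w+$/);

# Returns the index of the first field failing its check, or -1 if all pass.
sub first_bad_field {
    my @fields = @_;
    for my $i (0 .. $#checks) {
        my $value = defined $fields[$i] ? $fields[$i] : '';
        return $i unless $value =~ $checks[$i];
    }
    return -1;
}

# Fixed-width records: split by column position.
sub parse_fixed {
    my ($line) = @_;
    return unpack $template, $line;
}

# Tab-delimited records: the -1 limit keeps trailing empty fields.
sub parse_tabbed {
    my ($line) = @_;
    return split /\t/, $line, -1;
}

my @f = parse_fixed(sprintf('%-4s%-6s%-4s%s', 'ZZZ', '66555', '', 'JKL'));
print 'first bad field: ', first_bad_field(@f), "\n";
```

Keeping the patterns in an array means one loop can report which field failed, instead of a single long condition that only says the line is bad.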
