Why does perl think there's a regex?

Chris

Dear all,

I have a (fairly complicated) script which unmasks a string (or protein
sequence, for those who are interested) where it has previously been
partially masked by Xs. BTW not all lines are masked.

For example this (all one line):

QUERY 438 QDYPRYXXXXXXXXXXXXXXXXXXXXY-RVVCLRDNRRRD-----------K-------T 478

Is converted to this (also all one line):

QUERY 438 QDYPRYVPGFVVTVVTTFVAGVLVFVY-RVVCLRDNRRRD-----------K-------T 478
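
The conversion above can be sketched in Perl roughly as follows. This is not Chris's script: unmask_line is a hypothetical helper, and it assumes that '-' is an alignment gap that consumes no reference residue, that each X stands for exactly one residue of the unmasked reference, and that the reference is available as a plain string. It also leaves out the mismatch check that the real subroutine performs.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): rebuild a masked
# alignment line from the unmasked reference residues.
# Assumptions: '-' is a gap that consumes no reference residue, every
# 'X'/'x' stands for exactly one reference residue, and $start is the
# 1-based reference position of the first residue on the line.
sub unmask_line {
    my ($masked, $reference, $start) = @_;
    my @ref = split //, $reference;
    my $pos = $start - 1;    # 0-based index into @ref
    my $out = '';
    for my $c (split //, $masked) {
        if ($c eq '-') {
            $out .= $c;              # gap: copy it, don't advance
        }
        elsif (lc $c eq 'x') {
            $out .= $ref[$pos++];    # masked: take the reference residue
        }
        else {
            $out .= $c;              # unmasked residue: keep it, advance
            $pos++;
        }
    }
    return $out;
}

print unmask_line('QDYXXY-RV', 'QDYPRYRV', 1), "\n";
```

On this shortened version of the example the call returns QDYPRY-RV: the two Xs are filled with P and R from the reference, and the gap is copied through.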

This has been working well for a while, but today some of the masked
strings also contain '*' characters, e.g.:

szpo03656 61 FYCSFLAAKLPSQLISKKIGPNRWIPTQMVLWSIVYICQYSLSG-T-A-LF*I-MRWLLG 116

When my script comes across the '*' it fails with the following error
(where <BLAST> is the input file being read):

'Quantifier follows nothing before HERE mark in regex m/* << HERE / at unfilter_edit.pl line 214, <BLAST> line 43.'

My question is: why does perl think there is a regex on line 214 when
there isn't one? I still get the error even if I first replace the '*'
with s/\*/-/ !!! What on earth is going on?

Any help appreciated.
Chris.


The script does a lot of pre-processing and parsing of the original
file, but the key subroutine is shown below. It is passed the masked
string, $seq; a reference, $aref, to an array holding the unmasked
string cut into one-letter chunks; and $seq_start, the position at
which to start the comparisons.

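sub unfilt {
    # $seq:       the masked string (may contain X, '-' and '*')
    # $aref:      reference to an array holding the unmasked string,
    #             one residue per element
    # $seq_start: 1-based position in @$aref at which to start comparing
    my ($seq, $aref, $seq_start) = @_ or return;

    print "RAW1: $seq\n";

    # Without /g only the FIRST '*' would be replaced.
    $seq =~ s/\*/-/g;

    print "RAW2: $seq\n";

    my @split_seq = split //, $seq;

    foreach my $code (@split_seq) {
        if ($code eq '-') {                  # <--------- line 214
            print $code;
        }
        elsif ($code =~ /x/i) {
            # Masked residue: restore it from the unmasked string.
            $code = $aref->[$seq_start-1];
            print $code;
            ++$seq_start;
        }
        # Was: $code !~ /$aref->[$seq_start-1]/i, which compiles the
        # unmasked residue as a regex, so a '*' in @$aref gives
        # "Quantifier follows nothing". Compare as plain strings instead.
        elsif (lc $code ne lc $aref->[$seq_start-1]) {
            printf "position %d doesn't agree %s:%s\n",
                $seq_start-1, $code, $aref->[$seq_start-1];
        }
        else {
            print $code;
            ++$seq_start;
        }
    }
}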

BTW using perl 5.6.1 built for i686-linux.
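
One plausible source of exactly this message (whatever the final cause turned out to be here) is the elsif test /$aref->[$seq_start-1]/i: it compiles whatever residue sits in the unmasked array as a regex, and if that residue is '*' the pattern becomes m/*/, which is what the error shows. Replacing '*' in $seq does not touch @$aref, so the s/\*/-/ would not help. Older perls may also attribute a run-time error in an elsif condition to the line of the opening if, which would explain why line 214 gets the blame. The sketch below reproduces the error with hypothetical values and shows two safer comparisons: escaping with \Q...\E, or a plain lc/ne string test.

```perl
use strict;
use warnings;

# Hypothetical values: the unmasked residue happens to be '*'.
my $residue = '*';
my $code    = '-';

# Interpolating the residue builds the pattern m/*/, which cannot compile.
# Because the pattern contains a variable, it is compiled at run time,
# so eval can catch the error.
my $ok = eval { my $r = ($code !~ /$residue/i); 1 };
print "interpolated: ", ($ok ? "no error\n" : "died: $@");

# \Q...\E escapes regex metacharacters, so '*' is matched literally.
my $quoted = ($code !~ /\Q$residue\E/i) ? 1 : 0;
print "quoted mismatch: $quoted\n";

# For single residues a plain string comparison avoids regexes entirely.
my $differs = (lc $code ne lc $residue) ? 1 : 0;
print "string mismatch: $differs\n";
```

The string comparison is usually the better fit for one-character residues, since nothing in the data can be misread as regex syntax.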
 
Brian Harnish

You're supposed to make a sample script that can be run and that shows
your problem. The script you provided doesn't run, because it is only a
subroutine that is never called. Usually, when you try creating the
smaller script, you figure the problem out for yourself.

- Brian
 
Chris

Brian said:
> You're supposed to make a sample script that can be run and that shows
> your problem. The script you provided doesn't run, because it is only a
> subroutine that is never called. Usually, when you try creating the
> smaller script, you figure the problem out for yourself.

Sorry, it's been a hard day. You're right, of course: that snippet of
code was of no use to anyone. By paring the program down to the
subroutine I found that the problem wasn't in the subroutine after all,
doh! Looks like I just needed someone to tell me the obvious ;-)

Thanks,
Chris.
 