loop in regular expression....

BD

My script is :
use warnings;
use strict;
my $rs = $/;
$/='>';
$,="\t",$\="\n";
my $filename ="file.txt";
open my $file1,'<',$filename or die "Cannot open file $filename
\n $!";
while(<$file1>){
chomp;
next unless length $_;
my ($header,$seq)=split"\n",$_,2;
$seq =~s/\n//g;
print "$header\n";
$seq =~ /GO:[0-9]*/mg;
print " $&\n";
}
$/=$rs ;
close $file1;


my input data is:
GO:0009507 chloroplast C TAIR|gene:2133138~84.88~63
GO:0000004 biological_process unknown P TAIR|gene:2133138~84.88~63
GO:0005554 molecular_function unknown F TIGR_Ath1|At4g21010~84.88~63
GO:0004033 aldo-keto reductase activity F TIGR_Ath1|At1g59960~78.66~50
GO:0008536 RAN protein binding F TIGR_Ath1|At3g26100~90.11~61
GO:0004729 protoporphyrinogen oxidase
activity 1.3.3.4 F TAIR|gene:2077669~88.93~52
GO:0008131 amine oxidase activity F TAIR|gene:2077669~88.93~52
GO:0006779 porphyrin biosynthesis P TAIR|gene:2077669~88.93~52
GO:0015036 disulfide oxidoreductase
activity F TAIR|gene:2077669~88.93~52
GO:0007163 establishment and/or maintenance of cell
polarity P TIGR_Ath1|At2g22640~97.78~74
GO:0005554 molecular_function unknown F TIGR_Ath1|At2g22640~97.78~74

Here I am trying to parse this FASTA file, and I am getting the output
I want, except for the regular expression: it prints only once, though
I am trying to print every match. For example, the first record has 2 GO
values, so I want both printed, but I am getting only the first one.
What should I change in my script?
Thanks.
 
jgraber

Perhaps you have read the posting guidelines to have created so nearly perfect
a post.
My script is :
my input data is:

To make it easier for others to run your code,
you can use __DATA__ filehandle instead of an external file,
like this:

#!/usr/local/bin/perl
use warnings;
use strict;
my $rs = $/;   # save old
$/ = '>';      # input rec sep
$, = "\t";     # output field sep
$\ = "\n";     # output rec sep
#my $filename ="file.txt";
#open my $file1,'<',$filename or die "Cannot open file '$filename' : $!\n";
#while(<$file1>){
while (<DATA>) {
    chomp;
    next unless length $_;
    my ($header, $seq) = split "\n", $_, 2;
    # print "header = '$header'\n seq = '$seq'\n"; # added for debugging
    # $seq =~s/\n//g;
    print $header;
    foreach my $go_line (split /\n/, $seq) {
        # print "goline = '$go_line'\n";
        my ($go_only) = $go_line =~ /(GO:[0-9]*)/;
        next unless defined $go_only;   # skip lines without a GO id
        print " $go_only";
    }
    # $seq =~ /GO:[0-9]*/mg;
    # print " $&\n";
}
$/ = $rs;      # restore old
#close $file1 or warn "Error when closing file '$filename' : $!\n"
__DATA__
>TC227001
GO:0009507 chloroplast stuff
GO:0000004 biological_p stuff
GO:0005554 molecular_fu stuff
>TC227002
GO:0004033 aldo-keto re stuff
>TC227004
GO:0008536 RAN protein stuff

Unfortunately, it looks like your data may have wrapped poorly in email.
But it looks like you are only interested in the GO sections,
so I've shortened the rest of the line,
since it isn't important to your problem.
BD said:
Here I am trying to parse this fasta file and I getting the output also
what I want

This would have been a good place to show your actual output.

BD said:
accept that what I am getting from regular expression ,it
prints only once though i am trying to print every time it matches like
in first data it has 2 GO value ..so I want it should print both values
but i am getting only first value.

This would have been a good place to hand-type what you wanted for output.

BD said:
where should I change in my script?

You will probably need another looping construct or split
to parse off all of the GO sections and save or print each one
each time through the loop, as shown above.
Note also minor output format changes for compactness in posting.

From the above script, I get the output:
TC227001
 GO:0009507
 GO:0000004
 GO:0005554
TC227002
 GO:0004033
TC227004
 GO:0008536

If this isn't what you want,
you should hand-type the output you want,
so we can tell what you mean.
 
BD

"BD" writes:

If this isn't what you want,
you should hand-type the output you want,
so we can tell what you mean.
Joel

Thanks, yes, this is what I am trying to get.
 
Tad McClellan

BD said:
$seq =~ /GO:[0-9]*/mg;
print " $&\n";


You should never use the match variables unless you have first
tested that the match _succeeded_, otherwise they will contain
old stale data from a previous match that _did_ succeed.

if ($seq =~ /GO:[0-9]*/g )
{ print " $&\n" }
else
{ die "no GO sections found" }


The m//m modifier only affects the ^ and $ anchors; it is useless
if your pattern does not contain those anchors.
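
A minimal sketch of both points, using made-up strings. The helper name first_go is hypothetical and not part of the original script:

```perl
use strict;
use warnings;

my $first  = 'GO:0009507 chloroplast';
my $second = 'no id on this line';

# Unchecked matches: the second one fails, so $1 keeps the old value.
$first  =~ /(GO:[0-9]+)/;
$second =~ /(GO:[0-9]+)/;
my $stale = $1;    # still 'GO:0009507', left over from $first

# Hypothetical helper: only read $1 after the match has succeeded.
sub first_go {
    my ($text) = @_;
    return $text =~ /(GO:[0-9]+)/ ? $1 : undef;
}

# /m only changes where ^ and $ are allowed to match.
my $two_lines = "GO:0009507 x\nGO:0000004 y";
my @with_m    = $two_lines =~ /^(GO:[0-9]+)/mg;    # ^ matches at each line start
my @without_m = $two_lines =~ /^(GO:[0-9]+)/g;     # ^ matches at string start only

print "stale: $stale\n";
print 'with /m: ', scalar(@with_m), ', without /m: ', scalar(@without_m), "\n";
```

Without an anchor in the pattern, as in the original /GO:[0-9]*/mg, adding or removing /m makes no difference to what is matched.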
 
Brian McCauley

BD said:
$seq =~ /GO:[0-9]*/mg;
print " $&\n";
[...] ,it
prints only once though i am trying to print every time it matches like
in first data it has 2 GO value ..so I want it should print both values
but i am getting only first value.

m//g in a scalar (or void) context finds only one match but records in
a special attribute of the string the position where it left off. When
you do another m//g on the same string it starts looking at the end of
the last search and finds the next match.

You need to put it in a loop. Also, it may be wise to get out of the habit
of using $& (see the manual for details).

while ( $seq =~ /(GO:[0-9]*)/g ) {
    print " $1\n";
}
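
To make the "position where it left off" visible, here is a small sketch on a made-up string. It records pos() after each scalar-context match and also shows the list-context form, which returns every capture in one go:

```perl
use strict;
use warnings;

# Sample string invented for illustration.
my $seq = 'GO:0009507 chloroplast GO:0000004 biological_process';

# Scalar context: each m//g call finds one match and remembers where it stopped.
my @offsets;
while ( $seq =~ /(GO:[0-9]+)/g ) {
    push @offsets, pos($seq);    # offset just past the current match
}

# List context: m//g returns all captured matches at once.
my @ids = $seq =~ /(GO:[0-9]+)/g;

print " $_\n" for @ids;
print "offsets: @offsets\n";
```

When the while loop's final match fails, the stored position is reset, which is why the list-context match afterwards starts again from the beginning of the string.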
 
Dave Weaver

My script is :
....

my $rs = $/;
$/='>';
$,="\t",$\="\n";
....

$/=$rs ;

In addition to everyone else's comments:

If you want to only change the value of a variable temporarily, as
you do with $/ in your example (why preserve $/ but not $, and $\ ?),
use "local" within a block. At the end of the block the original
values will be restored:

{
    local $/ = '>';
    local $, = "\t";
    local $\ = "\n";

    # Your code here

}
# Original values restored here

Not only is this fewer lines of code, and more readable, it's also more
foolproof: whatever route your code takes to exit the block, the
original values of those localised variables will be restored.
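
A sketch of the same idea wrapped in a sub, so the restore can be checked from outside. The sub name read_records and the sample records are made up for illustration, and the input is read from an in-memory filehandle instead of file.txt:

```perl
use strict;
use warnings;

# Hypothetical helper: split a string into '>'-separated records.
sub read_records {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    local $/ = '>';              # restored automatically when the sub exits
    my @records;
    while ( my $rec = <$fh> ) {
        chomp $rec;              # chomp strips the current $/, here '>'
        next unless length $rec;
        push @records, $rec;
    }
    return @records;
}

my $before  = $/;
my @records = read_records(">TC1\nGO:0000001 x\n>TC2\nGO:0000002 y\n");
my $after   = $/;

print scalar(@records), " records\n";
print "separator restored\n" if $after eq $before;
```

Because the change is scoped to the sub, callers never see the modified separator, even if the sub exits early through return or die.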
 
Mumia W.

Tad said:
You should never use the match variables unless you have first
tested that the match _succeeded_, otherwise they will contain
old stale data from a previous match that _did_ succeed.
[...]

Is there a way to reset the match variables to undef?
 
A. Sinan Unur

Tad said:
You should never use the match variables unless you have first
tested that the match _succeeded_, otherwise they will contain
old stale data from a previous match that _did_ succeed.
[...]

Is there a way to reset the match variables to undef?

Why would you want to do that? There are an arbitrary number of match
variables. You could, of course, explicitly undef the ones you are
interested in, but what is the point?

Instead, check if the match succeeded:

if ( $data =~ /^(\d+) - (\w+)/ ) {

# you can use $1 and $2 now

}


or

while ( $data =~ /(\d\d) - (\w):(\w):(\w)/g ) {

# you can use $1, $2, $3, $4 now

}

Sinan
--
A. Sinan Unur <[email protected]>
(remove .invalid and reverse each component for email address)

comp.lang.perl.misc guidelines on the WWW:
http://augustmail.com/~tadmc/clpmisc/clpmisc_guidelines.html
 
Mumia W.

A. Sinan Unur said:
Tad said:
You should never use the match variables unless you have first
tested that the match _succeeded_, otherwise they will contain
old stale data from a previous match that _did_ succeed.
[...]
Is there a way to reset the match variables to undef?

Why would you want to do that? There are an arbitrary number of match
variables. You could, of course, explicitly undef the ones you are
interested in, but what is the point?

The match variables are read-only.
Instead, check if the match succeeded:

if ( $data =~ /^(\d+) - (\w+)/ ) {

# you can use $1 and $2 now

}
[...]

That's good, but when parsing I like to stack several match expressions
and exploit an assumption that, if all the matches failed, the match
variables are undefined. It makes my code more compact.

Oh well, since the match variables are read-only, if I want to unset
them, I'll do a successful match that doesn't capture. Thanks.
 
Tad McClellan

Mumia W. said:
Tad said:
You should never use the match variables unless you have first
tested that the match _succeeded_, otherwise they will contain
old stale data from a previous match that _did_ succeed.
[...]

Is there a way to reset the match variables to undef?


//;
 
Ilya Zakharevich

[A complimentary Cc of this posting was sent to
Tad McClellan

If you enter this in Emacs (at least with hairy CPerl), it would warn
you that the results are not what you expect.

Hope this helps,
Ilya
 
Mumia W.

Tad said:
Mumia W. said:
Tad said:
You should never use the match variables unless you have first
tested that the match _succeeded_, otherwise they will contain
old stale data from a previous match that _did_ succeed.
[...]
Is there a way to reset the match variables to undef?


//;

That didn't work, but "'a' =~ /./;" does, thanks Tad.

PS.
//; simply re-uses that last successful pattern.
 
Tad McClellan

Mumia W. said:
Tad said:
Mumia W. said:
Tad McClellan wrote:
You should never use the match variables unless you have first
tested that the match _succeeded_, otherwise they will contain
old stale data from a previous match that _did_ succeed.
[...]
Is there a way to reset the match variables to undef?


//;

That didn't work
//; simply re-uses that last successful pattern.


Yeah, that was a think-o.

I meant to write this instead:

/^/;
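
A small sketch of what this sub-thread arrived at, assuming straight-line code in a single scope. As a variation on the post, the clearing match is bound to an explicit empty string rather than to $_, so the example runs cleanly under warnings. The sample strings are made up:

```perl
use strict;
use warnings;

my $text = 'GO:0009507 chloroplast';

$text =~ /(GO:[0-9]+)/;                 # succeeds, so $1 is now set
my $had_capture = defined $1;

# An empty pattern reuses the last successful pattern (the GO one above),
# so it fails against a string with no GO id and leaves $1 alone.
my $empty_matched = ( 'abc' =~ // ) ? 1 : 0;
my $still_set     = defined $1;

# A match that succeeds and has no capture groups leaves $1 undefined.
'' =~ /^/;
my $cleared = !defined $1;

print 'empty pattern matched: ', ( $empty_matched ? 'yes' : 'no' ), "\n";
print '$1 cleared: ',            ( $cleared       ? 'yes' : 'no' ), "\n";
```

Testing the result of each match, as suggested earlier in the thread, avoids the need for this kind of reset altogether.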
 
