loop in regular expression....

BD · May 23, 2006

My script is :
use warnings;
use strict;
my $rs = $/;
$/='>';
$,="\t",$\="\n";
my $filename ="file.txt";
open my $file1,'<',$filename or die "Cannot open file $filename\n $!";
while(<$file1>){
    chomp;
    next unless length $_;
    my ($header,$seq)=split"\n",$_,2;
    $seq =~s/\n//g;
    print "$header\n";
    $seq =~ /GO:[0-9]*/mg;
    print " $&\n";
}
$/=$rs ;
close $file1;

my input data is:

>TC227001
GO:0009507 chloroplast C TAIR|gene:2133138~84.88~63
GO:0000004 biological_process unknown P TAIR|gene:2133138~84.88~63
GO:0005554 molecular_function unknown F TIGR_Ath1|At4g21010~84.88~63
>TC227002
GO:0004033 aldo-keto reductase activity F TIGR_Ath1|At1g59960~78.66~50
>TC227004
GO:0008536 RAN protein binding F TIGR_Ath1|At3g26100~90.11~61
>TC227005
GO:0004729 protoporphyrinogen oxidase
activity 1.3.3.4 F TAIR|gene:2077669~88.93~52
GO:0008131 amine oxidase activity F TAIR|gene:2077669~88.93~52
GO:0006779 porphyrin biosynthesis P TAIR|gene:2077669~88.93~52
GO:0015036 disulfide oxidoreductase
activity F TAIR|gene:2077669~88.93~52
>TC219924
GO:0007163 establishment and/or maintenance of cell
polarity P TIGR_Ath1|At2g22640~97.78~74
GO:0005554 molecular_function unknown F TIGR_Ath1|At2g22640~97.78~74

Here I am trying to parse this FASTA file, and I am getting the output
I want, except for the part that comes from the regular expression: it
prints only once, though I am trying to print every time it matches.
For example, the first record has several GO values, so I want it to
print all of them, but I am getting only the first value.
Where should I change my script?
Thanks.

jgraber · May 23, 2006

Perhaps you have read the posting guidelines to have created so nearly perfect
a post.

BD said:
My script is :
my input data is:

To make it easier for others to run your code,
you can use __DATA__ filehandle instead of an external file,
like this:

#!/usr/local/bin/perl
use warnings;
use strict;
my $rs = $/; # save old
$/='>';      # input rec sep
$,="\t";     # output field sep
$\="\n";     # output rec sep
#my $filename ="file.txt";
#open my $file1,'<',$filename or die "Cannot open file '$filename' : $!\n";
#while(<$file1>){
while(<DATA>){
    chomp;
    next unless length $_;
    my ($header,$seq)=split"\n",$_,2;
    # print "header = '$header'\n seq = '$seq'\n"; # added for debugging
    # $seq =~s/\n//g;
    print $header;
    foreach my $go_line (split /\n/,$seq){
        # print "goline = '$go_line'\n";
        my ($go_only) = $go_line =~ /(GO:[0-9]*)/;
        next unless defined $go_only; # skip lines with no GO ID (e.g. wrapped text)
        print " $go_only";
    }
    # $seq =~ /GO:[0-9]*/mg;
    # print " $&\n";
}
$/=$rs ; # restore old
#close $file1 or warn "Error when closing file '$filename' : $!\n"
__DATA__
>TC227001
GO:0009507 chloroplast stuff
GO:0000004 biological_p stuff
GO:0005554 molecular_fu stuff
>TC227002
GO:0004033 aldo-keto re stuff
>TC227004
GO:0008536 RAN protein stuff

Unfortunately, it looks like your data may have wrapped poorly in email.
But it looks like you are only interested in the GO sections,
so I've shortened the rest of the line,
since it isn't important to your problem.

BD said:
Here I am trying to parse this FASTA file, and I am getting the output
I want,

This would have been a good place to show your actual output.

BD said:
except for the part that comes from the regular expression: it
prints only once, though I am trying to print every time it matches.
For example, the first record has several GO values, so I want it to
print all of them, but I am getting only the first value.

This would have been a good place to hand-type what you wanted for output.

BD said:
Where should I change my script?

You will probably need another looping construct or split
to parse off all of the GO sections and save or print each one
each time through the loop, as shown above.
Note also minor output format changes for compactness in posting.

From the above script, I get the output:
TC227001
 GO:0009507
 GO:0000004
 GO:0005554
TC227002
 GO:0004033
TC227004
 GO:0008536

If this isn't what you want,
you should hand-type the output you want,
so we can tell what you mean.

BD · May 23, 2006

jgraber said:
If this isn't what you want,
you should hand-type the output you want,
so we can tell what you mean.
Joel

Thanks, yes, this is what I am trying to get.

Tad McClellan · May 24, 2006

BD said:
$seq =~ /GO:[0-9]*/mg;
print " $&\n";

You should never use the match variables unless you have first
tested that the match _succeeded_, otherwise they will contain
old stale data from a previous match that _did_ succeed.

if ($seq =~ /GO:[0-9]*/g )
{ print " $&\n" }
else
{ die "no GO sections found" }

The m//m modifier only affects the ^ and $ anchors; it is useless
if your pattern does not contain those anchors.
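
To make the stale-data problem concrete, here is a minimal sketch. The two sample strings and the variable names are made up for illustration, and the pattern uses [0-9]+ (at least one digit) rather than the original [0-9]*. Both matches run in the same scope, so the failed match leaves $1 holding the value from the earlier success.

```perl
use strict;
use warnings;

my $first  = 'GO:0009507 chloroplast';   # made-up sample with a GO ID
my $second = 'no annotation here';       # made-up sample without one

$first =~ /(GO:[0-9]+)/;                 # succeeds and sets $1
my $from_first = $1;

$second =~ /(GO:[0-9]+)/;                # fails, so $1 is left untouched
my $stale = $1;                          # still the ID captured from $first

# Checked version: only read $1 when the match succeeded.
my $checked = $second =~ /(GO:[0-9]+)/ ? $1 : undef;

print "unchecked: $stale\n";
print "checked:   ", ( defined $checked ? $checked : 'no match' ), "\n";
```

The unchecked read silently reports an ID that belongs to a different string, which is why testing the match result first is the safer habit.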

Brian McCauley · May 24, 2006

BD said:
$seq =~ /GO:[0-9]*/mg;
print " $&\n";

[...] it
prints only once, though I am trying to print every time it matches.
For example, the first record has several GO values, so I want it to
print all of them, but I am getting only the first value.

m//g in a scalar (or void) context finds only one match but records in
a special attribute of the string the position where it left off. When
you do another m//g on the same string it starts looking at the end of
the last search and finds the next match.

You need to put it in a loop. Also it may be wise to get out of the habit
of using $& (see perlvar for details).

while( $seq =~ /(GO:[0-9]*)/g ) {
print " $1\n";
}
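
As a rough illustration of the context difference described above, the sketch below runs the same kind of pattern in scalar context (one match per evaluation, with pos() reporting where the search stopped) and in list context (all captures returned at once). The sample string is made up, and the pattern uses [0-9]+ instead of [0-9]* so that a bare "GO:" would not count as a match.

```perl
use strict;
use warnings;

# Made-up sample: one record's annotation lines joined into a single string.
my $seq = 'GO:0009507 chloroplast GO:0000004 biological_process GO:0005554 molecular_function';

# Scalar context: each evaluation finds one match and remembers where it stopped.
my @positions;
while ( $seq =~ /GO:[0-9]+/g ) {
    push @positions, pos($seq);   # offset just past the current match
}

# List context: m//g returns every capture in one go, no explicit loop needed.
my @ids = $seq =~ /(GO:[0-9]+)/g;

print "search stopped at offsets: @positions\n";
print "ids: @ids\n";
```

The list-context form is convenient when you only need the matched values; the while loop is the better fit when each match needs its own processing.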

Dave Weaver · May 24, 2006

My script is :
....

my $rs = $/;
$/='>';
$,="\t",$\="\n";
....

$/=$rs ;

In addition to everyone else's comments:

If you want to only change the value of a variable temporarily, as
you do with $/ in your example (why preserve $/ but not $, and $\ ?),
use "local" within a block. At the end of the block the original
values will be restored:

{
local $/ = '>';
local $, = "\t";
local $\ = "\n";

# Your code here

}
# Original values restored here

Not only is this fewer lines of code, and more readable, it's also more
foolproof - whatever route your code takes to exit the block, the
original values of those localised variables will be restored.
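
Here is a small sketch of that idea applied to the record-reading loop. The helper name read_records and the sample text are hypothetical, and the data is read from an in-memory filehandle so the example runs without an external file.

```perl
use strict;
use warnings;

# Hypothetical helper: split '>'-delimited text into records.
sub read_records {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    local $/ = '>';               # restored automatically when the sub exits
    my @records;
    while ( my $rec = <$fh> ) {
        chomp $rec;               # chomp removes the current $/, i.e. the '>'
        next unless length $rec;  # skip the empty chunk before the first '>'
        push @records, $rec;
    }
    return @records;
}

my @recs = read_records(">TC1\nGO:0000001\n>TC2\nGO:0000002\n");
print scalar(@recs), " records read\n";
print "input record separator is a newline again\n" if $/ eq "\n";
```

Because the change to $/ is confined to the sub, callers do not need to save and restore it themselves.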

Mumia W. · May 24, 2006

Tad said:
You should never use the match variables unless you have first
tested that the match _succeeded_, otherwise they will contain
old stale data from a previous match that _did_ succeed.
[...]

Is there a way to reset the match variables to undef?

A. Sinan Unur · May 24, 2006

Mumia W. said:
Tad said:

You should never use the match variables unless you have first
tested that the match _succeeded_, otherwise they will contain
old stale data from a previous match that _did_ succeed.
[...]

Is there a way to reset the match variables to undef?

Why would you want to do that? There are an arbitrary number of match
variables. You could, of course, explicitly undef the ones you are
interested in, but what is the point?

Instead, check if the match succeeded:

if ( $data =~ /^(\d+) - (\w+)/ ) {
    # you can use $1 and $2 now
}

or

while ( $data =~ /(\d\d) - (\w)(\w)(\w)/g ) {
    # you can use $1, $2, $3, $4 now
}
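
A related option, sketched here with a hypothetical helper parse_pair and made-up input, is to assign the captures to lexical variables in list context. A failed match returns an empty list, so the condition is false and no stale match variable is ever read.

```perl
use strict;
use warnings;

# Hypothetical helper: return (number, word) from "NN - word", or an empty list.
sub parse_pair {
    my ($data) = @_;
    # List assignment in a condition yields the number of values returned
    # by the match, which is 0 (false) when the match fails.
    if ( my ($num, $word) = $data =~ /^(\d+) - (\w+)/ ) {
        return ($num, $word);
    }
    return;
}

my @ok  = parse_pair('42 - answer');    # made-up sample that matches
my @bad = parse_pair('nothing here');   # made-up sample that does not

print "matched: @ok\n";
print "second input did not match\n" unless @bad;
```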

Sinan
--
A. Sinan Unur <[email protected]>
(remove .invalid and reverse each component for email address)

comp.lang.perl.misc guidelines on the WWW:
http://augustmail.com/~tadmc/clpmisc/clpmisc_guidelines.html

Mumia W. · May 24, 2006

A. Sinan Unur said:
Tad said:

You should never use the match variables unless you have first
tested that the match _succeeded_, otherwise they will contain
old stale data from a previous match that _did_ succeed.
[...]

Is there a way to reset the match variables to undef?

Why would you want to do that? There are an arbitrary number of match
variables. You could, of course, explicitly undef the ones you are
interested in, but what is the point?

The match variables are read-only.

Instead, check if the match succeeded:

if ( $data =~ /^(\d+) - (\w+)/ ) {
    # you can use $1 and $2 now
}
[...]

That's good, but when parsing I like to stack several match expressions
and exploit an assumption that, if all the matches failed, the match
variables are undefined. It makes my code more compact.

Oh well, since the match variables are read-only, if I want to unset
them, I'll do a successful match that doesn't capture. Thanks.

Tad McClellan · May 24, 2006

Mumia W. said:
Tad said:

You should never use the match variables unless you have first
tested that the match _succeeded_, otherwise they will contain
old stale data from a previous match that _did_ succeed.
[...]

Is there a way to reset the match variables to undef?

//;

Ilya Zakharevich · May 24, 2006

[A complimentary Cc of this posting was sent to Tad McClellan]

Tad said:
//;

If you enter this in Emacs (at least with hairy CPerl), it would warn
you that the results are not what you expect.

Hope this helps,
Ilya

Mumia W. · May 25, 2006

Tad said:
Mumia W. said:

Tad said:

You should never use the match variables unless you have first
tested that the match _succeeded_, otherwise they will contain
old stale data from a previous match that _did_ succeed.
[...]

Is there a way to reset the match variables to undef?

//;

That didn't work, but "'a' =~ /./;" does, thanks Tad.

PS.
//; simply re-uses the last successfully matched pattern.

Tad McClellan · May 25, 2006

Mumia W. said:
Tad said:

Mumia W. said:

Tad McClellan wrote:
You should never use the match variables unless you have first
tested that the match _succeeded_, otherwise they will contain
old stale data from a previous match that _did_ succeed.
[...]
Is there a way to reset the match variables to undef?

//;

That didn't work

//; simply re-uses that last successful pattern.

Yeah, that was a think-o.

I meant to write this instead:

/^/;
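
The behaviour discussed in this sub-thread can be sketched as follows, with made-up strings and variable names. An empty pattern reuses the last successfully matched pattern, so it captures again instead of clearing $1, whereas any successful match with no capture groups leaves $1 undefined.

```perl
use strict;
use warnings;

'GO:0009507 chloroplast' =~ /(GO:[0-9]+)/;   # succeeds and sets $1
my $before = $1;

# Empty pattern: Perl reuses the last successfully matched pattern,
# so this captures again rather than resetting anything.
'GO:0000004 unknown' =~ //;
my $after_empty = $1;

# A successful match with no capture groups leaves $1 undefined.
'a' =~ /./;
my $after_reset = $1;

print "before:      $before\n";
print "after //:    $after_empty\n";
print "after reset: ", ( defined $after_reset ? $after_reset : 'undef' ), "\n";
```

Note that a bare /^/; matches against $_, so under "use warnings" it may warn if $_ is undefined at that point; binding to a literal string as above avoids that.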
