Help: Show specific part

Amy Lee

Hello,

I'm new to Perl and do some work in bioinformatics. I wrote a tiny
script to print the sequences, but I have a problem with the next
processing step.

My output looks like this.
>xxx
IGRRQWASLVTPMAKFDPEIVLEFYANAWPTEEGVRDMRSWVRGQWIPFDADA
IGQLLGYPLVLEEGQECEYGQRRNRSDGFDEEA
>yyy
gaggccatcaagggatggtcgtttctccgggagcaacgcgtccagctcagggacgacgag
tatactgatttccaggaggaaatagggcgccggcagtgggcatcactggttactcccatg
gccaagttcgatccggaaatagtccttgagttttatgccaatgcttggccaacagaggag
>zzz
EGDAHAVSSTPAWVKPQQTPHGTHQYAQHHPSFSAHAGNASSST
PVQPKAPTQREAPQVPTPNTTRPAGNSNTTRNFPPRPLPEFTPLPMTYEDLLPSLIANHL
AVVTPGRVLEPPFPKWYDPNATCKYHGGVPGHSVEKCLALKYKVQHLMDAGWLTFQEDRP
NVRTNPLANHGGGAVNAVESD
>qqq
tggaagccgcagaagaatcgttagaaactgctttccag
tcttttgaggtggtcagcatttcctccgtggactccctctttgggcaaccttgtctgtcc
gatgcagcggtaatgatggcccgagttatgttggggaacggttttgaacccgggatgggt
ttagaaaaaaacaacggcggcataactagc

I would like to save all the protein sequences, together with their
tags (>blahblah), into one file, say "protein", and the DNA sequences
into another file, "dna".

So from that, "protein" would be:
>xxx
IGRRQWASLVTPMAKFDPEIVLEFYANAWPTEEGVRDMRSWVRGQWIPFDADA
IGQLLGYPLVLEEGQECEYGQRRNRSDGFDEEA
>zzz
EGDAHAVSSTPAWVKPQQTPHGTHQYAQHHPSFSAHAGNASSST
PVQPKAPTQREAPQVPTPNTTRPAGNSNTTRNFPPRPLPEFTPLPMTYEDLLPSLIANHL
AVVTPGRVLEPPFPKWYDPNATCKYHGGVPGHSVEKCLALKYKVQHLMDAGWLTFQEDRP
NVRTNPLANHGGGAVNAVESD

and "dna" would be:
>yyy
gaggccatcaagggatggtcgtttctccgggagcaacgcgtccagctcagggacgacgag
tatactgatttccaggaggaaatagggcgccggcagtgggcatcactggttactcccatg
gccaagttcgatccggaaatagtccttgagttttatgccaatgcttggccaacagaggag
>qqq
tggaagccgcagaagaatcgttagaaactgctttccag
tcttttgaggtggtcagcatttcctccgtggactccctctttgggcaaccttgtctgtcc
gatgcagcggtaatgatggcccgagttatgttggggaacggttttgaacccgggatgggt
ttagaaaaaaacaacggcggcataactagc

Since I lack Perl knowledge, could you give me some tips?

Thank you very much~

Regards,

Amy Lee
 
Jürgen Exner

Amy Lee said:
I'm new to Perl and do some work in bioinformatics. I wrote a tiny
script to print the sequences, but I have a problem with the next
processing step.

My output looks like this.
[snip lengthy text]
I would like to save all the protein sequences, together with their
tags (>blahblah), into one file, say "protein", and the DNA sequences
into another file, "dna".

So from that, "protein" would be:
[snip lengthy text]
Since I lack Perl knowledge, could you give me some tips?

How does the text marked as "output" differ from the part marked
as "protein"? They appear to be identical to me. But then again, I did
not compare each and every character in those lengthy sequences.

jue
 
Amy Lee

Well, actually, the protein is in uppercase letters and the DNA is in
lowercase letters. So I suppose I can use that to handle it, but I don't
know how to do that~

Thanks,

Amy
 
Jürgen Exner

Amy Lee said:
Well, actually, the protein is in uppercase letters and the DNA is in
lowercase letters.

What on earth are you talking about? I was asking about what is the
difference between your "output" and your "protein" character sequences,
i.e. how do you want your Perl script to manipulate/change/modify those
character sequences?

So I suppose I can use that to handle it, but I don't
know how to do that~

I have no idea what you are talking about. What "that" are you referring
to?

jue
 
Amy Lee

Hmm, sorry for my poor English. Anyway, I will describe my problem in
detail.

In fact, Perl does not need to modify any characters. As you already know,
the "output" consists of two parts: an uppercase part (protein sequences)
and a lowercase part (DNA sequences). What I want to do is save
the "protein" part into one file and the "dna" part into another file.
I do not need to change any characters.

Furthermore, each sequence is preceded by a tag line like ">xxx". I
would like to keep this tag when I save the "dna" part and the "protein" part.

Thank you very much~

Regards,

Amy
 
Jürgen Exner

Amy Lee said:
In fact, Perl does not need to modify any characters. As you already know,
the "output" consists of two parts: an uppercase part (protein sequences)
and a lowercase part (DNA sequences).

No, I did not know. It may have been obvious to you but I did not notice
that detail in the long complicated character sequences. Thank you for
the explanation.

What I want to do is save
the "protein" part into one file and the "dna" part into another file.

Ok, those four lines of explanation make it quite clear what you want to
do. Posting only samples doesn't help because it leaves too much room
for confusion and misunderstandings.

Furthermore, each sequence is preceded by a tag line like ">xxx". I
would like to keep this tag when I save the "dna" part and the "protein" part.

Here's how I would do it (sketch of code only, details and error
handling omitted):

open() the input file, open() two output files 'dna' and 'protein' with
properly named file handles $DNA and $PROTEIN.

Then

my $isDNA;                              # true while the current record is DNA
while (<$IN>) {                         # loop through input file
    if (substr($_, 0, 1) eq '>') {      # found tag in this line
        my $next = <$IN>;               # get next line for analysis
        last unless defined $next;      # tag at end of file, nothing to classify
        $isDNA = $next eq lc($next);    # set flag: all lowercase means DNA
        # print tag line and the analysed line to
        # either $DNA or $PROTEIN depending on flag
        print { $isDNA ? $DNA : $PROTEIN } $_, $next;
    } else {                            # not a tag line but regular data
        print { $isDNA ? $DNA : $PROTEIN } $_;   # print normal data line
    }
}
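
The sketch above can be fleshed out into a complete script. The following is one possible version, not a tested solution for real data. The helper name split_records and the use of the first command-line argument as the input file are assumptions made for illustration, and it relies on DNA lines being entirely lowercase. Note that Perl wants a block around an expression that yields the file handle, as in print {$out} LIST.

```perl
use strict;
use warnings;

# Hypothetical helper: copy each record from $in to $dna or $protein.
# A record is a '>' tag line plus the sequence lines that follow it.
sub split_records {
    my ($in, $dna, $protein) = @_;
    my $out;                              # output handle for the current record
    while (my $line = <$in>) {
        if ($line =~ /^>/) {              # tag line starts a new record
            my $next = <$in>;
            last unless defined $next;    # tag without any sequence line
            # Assumption: DNA is all lowercase, protein is all uppercase.
            $out = $next eq lc($next) ? $dna : $protein;
            print {$out} $line, $next;
        }
        elsif (defined $out) {            # continuation of the current record
            print {$out} $line;
        }
    }
    return;
}

# Usage with real files: the input file name is taken from the command line.
if (@ARGV) {
    open my $in,      '<', $ARGV[0]  or die "Cannot read $ARGV[0]: $!";
    open my $dna,     '>', 'dna'     or die "Cannot write dna: $!";
    open my $protein, '>', 'protein' or die "Cannot write protein: $!";
    split_records($in, $dna, $protein);
    close $_ or die "Cannot close output file: $!" for $dna, $protein;
}
```

Lines that appear before the first tag are silently skipped here. Whether that is the right choice depends on the real data.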


jue
 
Dr.Ruud

Amy Lee wrote:
I'm new to Perl and do some work in bioinformatics. I wrote a
tiny script to print the sequences, but I have a problem with the
next processing step. [...]
I would like to save all the protein sequences, together with their
tags (>blahblah), into one file, say "protein", and the DNA sequences
into another file, "dna".


The following code expects "good input". It will be fooled by mixed-up
input like
>xxx
IGRRQWASLVTPMAKFDPEIVLEFYANAWPTEEGVRDMRSWVRGQWIPFDADA
tatactgatttccaggaggaaatagggcgccggcagtgggcatcactggttactcccatg
IGQLLGYPLVLEEGQECEYGQRRNRSDGFDEEA
>yyy
gaggccatcaagggatggtcgtttctccgggagcaacgcgtccagctcagggacgacgag
IGRRQWASLVTPMAKFDPEIVLEFYANAWPTEEGVRDMRSWVRGQWIPFDADA
tatactgatttccaggaggaaatagggcgccggcagtgggcatcactggttactcccatg
gccaagttcgatccggaaatagtccttgagttttatgccaatgcttggccaacagaggag


#!/usr/bin/perl
use strict;
use warnings;

my ($fh_dna, $fh_pro) = (\*STDOUT, \*STDERR);

my $tag;

while ( <DATA> ) {
    if ( /^>.+/ ) {
        $tag = $_;
        next; ###
    } elsif ( /^[acgt]+$/ ) {
        select $fh_dna;
    } elsif ( /^[A-Z]+$/ ) {
        select $fh_pro;
    } else {
        die;
    }
    $tag and print $tag and undef $tag;
    print;
}

__DATA__
>xxx
IGRRQWASLVTPMAKFDPEIVLEFYANAWPTEEGVRDMRSWVRGQWIPFDADA
IGQLLGYPLVLEEGQECEYGQRRNRSDGFDEEA
>yyy
gaggccatcaagggatggtcgtttctccgggagcaacgcgtccagctcagggacgacgag
tatactgatttccaggaggaaatagggcgccggcagtgggcatcactggttactcccatg
gccaagttcgatccggaaatagtccttgagttttatgccaatgcttggccaacagaggag
>zzz
EGDAHAVSSTPAWVKPQQTPHGTHQYAQHHPSFSAHAGNASSST
PVQPKAPTQREAPQVPTPNTTRPAGNSNTTRNFPPRPLPEFTPLPMTYEDLLPSLIANHL
AVVTPGRVLEPPFPKWYDPNATCKYHGGVPGHSVEKCLALKYKVQHLMDAGWLTFQEDRP
NVRTNPLANHGGGAVNAVESD
>qqq
tggaagccgcagaagaatcgttagaaactgctttccag
tcttttgaggtggtcagcatttcctccgtggactccctctttgggcaaccttgtctgtcc
gatgcagcggtaatgatggcccgagttatgttggggaacggttttgaacccgggatgggt
ttagaaaaaaacaacggcggcataactagc


"shrunken code" variant of the while-loop:

while ( <DATA> ) {
    /^>.+/ and $tag = $_ and next;
    /^[acgt]+$/ and select($fh_dna) or
    /^[A-Z]+$/ and select($fh_pro) or die;
    print $tag and undef $tag if $tag;
    print;
}
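
To guard against the mixed-up input mentioned earlier, one option is to buffer a whole record and decide where it goes only after all of its lines have been seen. What follows is a minimal sketch, not a definitive implementation. It assumes the same line patterns as the code above (DNA lines contain only a, c, g, t and protein lines contain only uppercase letters), and classify_record and group_records are hypothetical helper names.

```perl
use strict;
use warnings;

# Hypothetical helper: classify the sequence lines of one record.
# Returns 'dna', 'protein', or 'mixed' (also used for empty records).
sub classify_record {
    my @lines = @_;
    return 'mixed' unless @lines;
    my $dna = grep { /^[acgt]+$/ } @lines;   # number of DNA-looking lines
    my $pro = grep { /^[A-Z]+$/ }  @lines;   # number of protein-looking lines
    return 'dna'     if $dna == @lines;
    return 'protein' if $pro == @lines;
    return 'mixed';
}

# Hypothetical helper: split the text into records and group them by type.
sub group_records {
    my ($text) = @_;
    my %out = (dna => '', protein => '', mixed => '');
    # A new record begins at every line that starts with '>'.
    for my $record (split /^(?=>)/m, $text) {
        my ($tag, @seq) = split /\n/, $record;
        next unless defined $tag && $tag =~ /^>/;   # skip text before first tag
        $out{ classify_record(@seq) } .= join("\n", $tag, @seq) . "\n";
    }
    return \%out;
}
```

The returned hash reference holds the text for each group, so the 'dna' and 'protein' entries can be printed to their files and the 'mixed' entry can be reported or rejected. Real DNA data may contain other symbols such as n for unknown bases, and such records would land in 'mixed' with these patterns.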
 
