convert protein fasta stream into harsh table

zhong.huang · Feb 28, 2006

hi,

Can anyone suggest me a simple way to convert multiple sequences fasta
(in Bio::SeqIO object) into harsh table (sequence annotation as key,
sequence as value)?

The fasta file looks like this:

gi|9049352|dbj|BAA99407.1| 3-methylcrotonyl-CoA carboxylase biotin-containing subunit [Homo sapiens] MAAASAVSVLLVAAERNRWHRLPSLLLPPRTWVWRQRTMKYTTATGRNITKVLIANRGEIACRVMRTAKKLGVQTVAVYSEADRNSMHVDMADEAYSIGPAPSQQSYLSMEKIIQVAKTSAAQAIHPGCGFLSENMEFAE

gi|4504067|ref|NP_002070.1| aspartate aminotransferase 1 [Homo sapiens]

MAPPSVFAEVPQAQPVLVFKLTADFREDPDPRKVNLGVGAYRTDDCHPWVLPVVKKVEQKIANDNSLNHEYLPILGLAEFRSCASRLALGD

I want to have the harsh table %seqharsh to hold sequences like this:

# my %seqharsh = ('seq1', MAAASAVSVL......',
# 'seq2', MAPPSVFAEVPQ......,);

My code is like this:

my $seqio = new Bio::SeqIO(-format => $format,
-file => $file);

while ( my $seq = $seqio->next_seq ) {
if( $seq->alphabet ne 'protein' ) {
confess("Skipping non protein sequence...");
next;
}

#write code here to assign each entry into harsh %seqharsh
my $seqharsh{$seq->primary_id} = $seq->seq();
bla bla bla

Thank you very much for your help!

zhong

Samwyse · Feb 28, 2006

hi,

Can anyone suggest me a simple way to convert multiple sequences fasta
(in Bio::SeqIO object) into harsh table (sequence annotation as key,
sequence as value)?

They are called HASH tables, not HARSH tables.

The fasta file looks like this:

gi|9049352|dbj|BAA99407.1| 3-methylcrotonyl-CoA carboxylase biotin-containing subunit [Homo sapiens]
MAAASAVSVLLVAAERNRWHRLPSLLLPPRTWVWRQRTMKYTTATGRNITKVLIANRGEIACRVMRTAKKLGVQTVAVYSEADRNSMHVDMADEAYSIGPAPSQQSYLSMEKIIQVAKTSAAQAIHPGCGFLSENMEFAE

gi|4504067|ref|NP_002070.1| aspartate aminotransferase 1 [Homo sapiens]

Click to expand...

MAPPSVFAEVPQAQPVLVFKLTADFREDPDPRKVNLGVGAYRTDDCHPWVLPVVKKVEQKIANDNSLNHEYLPILGLAEFRSCASRLALGD

I want to have the harsh table %seqharsh to hold sequences like this:

# my %seqharsh = ('seq1', MAAASAVSVL......',
# 'seq2', MAPPSVFAEVPQ......,);

I'm not seeing where the 'seq1' and 'seq2' values are coming from in
your input. If I'm allowed to make up hash keys, the problem is pretty
simple.

My code is like this:

my $seqio = new Bio::SeqIO(-format => $format,
-file => $file);

my %seqharsh; # declare the hash table

while ( my $seq = $seqio->next_seq ) {
if( $seq->alphabet ne 'protein' ) {
confess("Skipping non protein sequence...");
next;
}

#write code here to assign each entry into harsh %seqharsh
my $seqharsh{$seq->primary_id} = $seq->seq();

You neither need nor want the 'my'; this will add items to the hash:
$seqharsh{$seq->primary_id} = $seq->seq;

Keith Keller · Feb 28, 2006

Can anyone suggest me a simple way to convert multiple sequences fasta
(in Bio::SeqIO object) into harsh table (sequence annotation as key,
sequence as value)?

It's a hash, not harsh (at least in English). I suppose it could be
a harsh hash.

I want to have the harsh table %seqharsh to hold sequences like this:

# my %seqharsh = ('seq1', MAAASAVSVL......',
# 'seq2', MAPPSVFAEVPQ......,);

It's not such great style to write it this way, even though
we know what you mean. Try this instead:

# my %seqharsh = ('seq1' => 'MAAASAVSVL......',
# 'seq2' => 'MAPPSVFAEVPQ......',);

Under use strict the quotes are required on the right-hand side.
The => behaves like , in this context, but looks nicer for humans.

My code is like this:

my $seqio = new Bio::SeqIO(-format => $format,
-file => $file);

while ( my $seq = $seqio->next_seq ) {
if( $seq->alphabet ne 'protein' ) {
confess("Skipping non protein sequence...");
next;
}

#write code here to assign each entry into harsh %seqharsh
my $seqharsh{$seq->primary_id} = $seq->seq();

This line will not do what you want. AFAIK you can't even my
a hash key/value. Remove the my and see if you get what you want.
If running under use strict, as you should, you will want to define
my %seqharsh (or my %seqhash) before the while loop begins.

If you have further troubles, you should be sure to post a
short but complete script that contains your difficulty, as
well as the expected and actual results, as described in the
posting guidelines for this newsgroup.

--keith

January Weiner · Feb 28, 2006

hi,

Can anyone suggest me a simple way to convert multiple sequences fasta
(in Bio::SeqIO object) into harsh table (sequence annotation as key,
sequence as value)?

Using Bioperl to read in fasta _is_ an overkill. Bioperl is painfully
slow. Of course, then you can read 1001 formats, check the sequence for
correctness etc etc, but if you need to run a bigger job, forget Bioperl.

The fasta file looks like this:

gi|9049352|dbj|BAA99407.1| 3-methylcrotonyl-CoA carboxylase biotin-containing subunit [Homo sapiens]

Click to expand...

MAAASAVSVLLVAAERNRWHRLPSLLLPPRTWVWRQRTMKYTTATGRNITKVLIANRGEIACRVMRTAKKLGVQTVAVYSEADRNSMHVDMADEAYSIGPAPSQQSYLSMEKIIQVAKTSAAQAIHPGCGFLSENMEFAE

# here is a very simple code; it does not check whether the sequence is
# correct, but it is fast and does not depend on the Bioperl modules

my %seq_hash ;
my $cur_seq ;

while( <$infile> ) { # you need to open() you file earlier
chomp ;
next if( /^(#.*|\s*)$/ ) ; # skip comments and empty lines

if( /^>(.*)/ ) { # new sequence
$cur_seq = $1 ;
next ;
}

next unless( $cur_seq ) ; # skip lines that do not belong to a sequence

$seq_hash{ $cur_seq } .= $_ ;
}

#write code here to assign each entry into harsh %seqharsh
my $seqharsh{$seq->primary_id} = $seq->seq();
bla bla bla

(OK, now I see the problem)

you cannot write "my $blah{foo};". Remove the "my" and it should work
fine.

Please use Perl's warnings and strict pragmas ('use strict; use
warnings;').

j.

--

p.valente · Feb 28, 2006

http://s4.bitefight.it/c.php?uid=19648

p.valente · Feb 28, 2006

http://s4.bitefight.it/c.php?uid=19648

p.valente · Feb 28, 2006

http://s4.bitefight.it/c.php?uid=19648

Anno Siegel · Feb 28, 2006

January Weiner said:
# here is a very simple code; it does not check whether the sequence is ^
# correct, but it is fast and does not depend on the Bioperl modules

Just a language-related note: "Code" in the sense of "program text"
is a mass noun that doesn't take an indefinite article and doesn't form
a plural.

Anno

January Weiner · Mar 1, 2006

Anno Siegel said:
Just a language-related note: "Code" in the sense of "program text"
is a mass noun that doesn't take an indefinite article and doesn't form
a plural.

Thanks for that! Corrected version: a piece of very simple code.

j.

P.S. What does Lublin do at the tu-berlin?

--

Anno Siegel · Mar 1, 2006

January Weiner said:
Thanks for that! Corrected version: a piece of very simple code.

Nicely applied.

P.S. What does Lublin do at the tu-berlin?

Nothing at all. I happen to use Polish place names when I get to name
a machine. There are quite a few short and euphonious ones when you
look on a map, enough for the foreseeable demand. It means I don't have
to come up with something original when I name a machine, but it's less
predictable than flowers.

Anno

split the sequences	25	Jan 30, 2006
Is there any othere way to do it;	1	Jan 23, 2006
problam in nesting loop	1	Nov 18, 2005
I hope this is an easy problem to solve	1	Nov 25, 2003
Pattern matching problem	6	Jun 12, 2004
comp.lang.c FAQ list Table of Contents	0	Jan 12, 2008
comp.lang.c FAQ list Table of Contents	0	Jan 1, 2006
comp.lang.c FAQ list Table of Contents	0	Jul 1, 2003

convert protein fasta stream into harsh table

zhong.huang

Samwyse

Keith Keller

January Weiner

p.valente

p.valente

p.valente

Anno Siegel

January Weiner

Anno Siegel

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads