print given character range.

Jayaprakash Rudraraju · Apr 5, 2004

Most bioinformatics files store their sequences in FASTA format.
FASTA files contain header lines followed by DNA sequence. I have
been using the following shortcut to extract a given range of the
sequence:

perl -ne 'chomp; next if />/; print' FASTA.TXT | cut -c3450-3470

Is there a better and more convenient way to do it?

Andre Majorel · Apr 6, 2004

Other ways to do it would be:

grep -v '>' FASTA.TXT | tr -d '\n' | cut -c3450-3470

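perl -ne '
    chomp;
    next if />/;                   # skip header lines
    $result .= $_;                 # accumulate sequence text
    if (length($result) >= 3470) {
        # columns 3450-3470 inclusive: offset 3449, 21 characters
        print substr($result, 3449, 21), "\n";
        exit 0;                    # stop reading once the range is complete
    }' FASTA.TXT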

Whether they're faster or more convenient than the above, I
don't know. But the solutions involving cut(1) may not do what
you want if FASTA.TXT is too big to be swallowed in one line.
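
The two forms above describe the same range with different conventions: cut -c3450-3470 uses 1-based inclusive columns, while substr takes a 0-based offset and a length. A minimal sketch of that arithmetic, where the function name cut_range_to_substr is hypothetical and used only for illustration:

```shell
# cut_range_to_substr START END
# Convert a 1-based inclusive range (as used by cut -c) into the
# 0-based offset and length expected by Perl substr.
# Hypothetical helper name, not part of any tool mentioned in the thread.
cut_range_to_substr() {
    start=$1
    end=$2
    offset=$((start - 1))          # substr counts from 0, cut counts from 1
    length=$((end - start + 1))    # both endpoints are included
    echo "$offset $length"
}

cut_range_to_substr 3450 3470      # prints: 3449 21
```

This is why the Perl version uses 3449 and 21 for the range that cut expresses as 3450-3470.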

Cognition Peon · Apr 6, 2004

Yesterday, IP packets from Andre Majorel delivered:

Other ways to do it would be:

grep -v '>' FASTA.TXT | tr -d '\n' | cut -c3450-3470

Thanks for the solution. I wanted a simpler way to get a range of
sequence from a FASTA file. The headers in FASTA files always start
with '>', but I was not looking for a faster solution. I will use a
script if the FASTA file is too long.
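
For the long-file case, one possible script is sketched below. It assumes a POSIX shell and awk, and it treats the whole file as a single sequence, as the pipelines in this thread do. The function name fasta_range and its argument order are hypothetical. It walks the file line by line and prints only the requested columns, so the sequence is never joined into one long line:

```shell
# fasta_range FILE START END
# Print characters START..END (1-based, inclusive) of the sequence in FILE,
# skipping header lines that begin with ">". Hypothetical helper name.
fasta_range() {
    awk -v lo="$2" -v hi="$3" '
        /^>/ { next }                  # skip header lines
        {
            len = length($0)
            first = pos + 1            # sequence position of first char on this line
            last = pos + len           # sequence position of last char on this line
            pos = last
            if (last < lo) next        # requested range has not started yet
            from = (lo > first) ? lo - first + 1 : 1
            to = (hi < last) ? hi - first + 1 : len
            printf "%s", substr($0, from, to - from + 1)
            if (last >= hi) exit       # range complete, stop reading
        }
        END { print "" }               # finish the output with a newline
    ' "$1"
}
```

It would be called as fasta_range FASTA.TXT 3450 3470. Because it exits as soon as the range is complete, it reads only as much of the file as it needs.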

Adam Price · Apr 8, 2004

You could try looking at CPAN; try
http://search.cpan.org/~birney/bioperl-1.4/
as a place to start.
It seems to cover a lot to do with FASTA files.
Adam

Kevin Collins · Apr 8, 2004

Try this:

perl -ne 'next if /^>/; chomp; $seq .= $_; END { print substr($seq, 3449, 21), "\n" }' FASTA.TXT

The "^" assumes (as you mentioned in another reply) that header
lines start with '>'; otherwise you can leave it out. However, if
the headers do start with '>', anchoring the RE with '^' is faster
for the regexp engine (especially on long records), because an
unanchored match has to scan each sequence line for a '>'.

This single perl command should generally be faster than 'perl | cut'...

Kevin
