P
PeroMHC
Hi All, So here is the problem... I have a FASTA file (used for DNA
analyses) that looks like this:
....
CGCCTATCCACGTCTGTCACAACCTCATACATCAGACAGTCACACTTACCAACATATCCAAGCACCTCAA
GCAACACATCAT
....
This snippet represents 3 individual DNA sequences. Each sequences is
identified by the line starting with >
The complete file has about 10 million individual sequences.
A simple enough problem, I want to read in this data, and cut out the
last 76 letters (nucleotides) from each individual sequence and send
them to a new txt file with a similar format.
Any help on how to do this would be appreciated.
Thanks!
analyses) that looks like this:
....
NGTCGGGTCTTCGCTATCACTGGACTGCTCCCATCAGCTATAGGTCCTCCCCGCCACACCCCATGCCCACgnl|SRA|SRR019045.10.1 SL-XAY_956090708:2:1:0:1028.1 length=152 NCTTTTTTTATTTTTTGTATAAATGAAGTTTCACTATATCGGACGAGCGGTTCAGCAGTCATTCCGAGAC
CGATATAGTGAAACTTCATTTCTACAAAAANTACCAAACGTCGCTCGGCAGAGCGTCGTGTTGGGCAAGA
GAGTAGCACTCG
gnl|SRA|SRR019045.11.1 SL-XAY_956090708:2:1:0:1151.1 length=152 NGGTNTGGNNNNCNCCNTNCTNCNNCNTCANCCTCCNGTCNCANNCCNCNTNNNNNCNNNNNCNNTNCTT
CTNCNNTCTCCATTCCTTCTTNATAGCCTGCTCCANCGCACGTTGAACCTTCTGCACCACGAACGCACTC
ACACCACTCATC
gnl|SRA|SRR019045.12.1 SL-XAY_956090708:2:1:0:1197.1 length=152
CGCCTATCCACGTCTGTCACAACCTCATACATCAGACAGTCACACTTACCAACATATCCAAGCACCTCAA
GCAACACATCAT
....
This snippet represents 3 individual DNA sequences. Each sequences is
identified by the line starting with >
The complete file has about 10 million individual sequences.
A simple enough problem, I want to read in this data, and cut out the
last 76 letters (nucleotides) from each individual sequence and send
them to a new txt file with a similar format.
Any help on how to do this would be appreciated.
Thanks!