Concatenate FASTA file

Discussion in 'Python' started by PeroMHC, Feb 12, 2010.

  1. PeroMHC

    Hi All, I have a simple problem that I hope somebody can help with. I
    have an input file (a FASTA file) that I need to edit.

    Input file format:

    >name 1
    tactcatacatac
    >name 2
    acggtggcat
    >name 3
    gggtaccacgtt

    I need to concatenate the sequences to make them look like this:

    >concatenated
    tactcatacatacacggtggcatgggtaccacgtt

    Thanks, Matt
     
    PeroMHC, Feb 12, 2010

  2. Roy Smith

    In article <>, PeroMHC <> wrote:

    > Hi All, I have a simple problem that I hope somebody can help with. I
    > have an input file (a FASTA file) that I need to edit.
    >
    > Input file format:
    >
    > >name 1
    > tactcatacatac
    > >name 2
    > acggtggcat
    > >name 3
    > gggtaccacgtt
    >
    > I need to concatenate the sequences to make them look like this:
    >
    > >concatenated
    > tactcatacatacacggtggcatgggtaccacgtt
    >
    > Thanks, Matt


    Some quick ideas. First, try something along the lines of (not tested):

    import sys

    data = []
    for line in sys.stdin:
        # Skip FASTA header lines such as ">name 1"
        if line.startswith('>'):
            continue
        data.append(line.strip())
    print('>concatenated')
    print(''.join(data))

    Second, check out http://biopython.org/wiki/Main_Page. I'm sure somebody
    has solved this problem before.
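    For anyone who would rather not install Biopython, here is a minimal standard-library sketch of what a FASTA reader does: it groups lines into (header, sequence) records and joins sequences that span several lines. The function name read_fasta is hypothetical and is not Biopython's API (Biopython's own reader is Bio.SeqIO.parse(handle, "fasta")). Joining the record sequences then gives the concatenated output requested above.

```python
import io

def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines.

    Hypothetical helper: a sketch, not Biopython's API.
    Sequence lines that appear before the first header are discarded.
    """
    header, parts = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith('>'):
            if header is not None:
                yield header, ''.join(parts)
            header, parts = line[1:], []  # drop the leading '>'
        else:
            parts.append(line)  # a sequence may span several lines
    if header is not None:
        yield header, ''.join(parts)

# Sample input from the question, wrapped in a file-like object
example = io.StringIO(">name 1\ntactcatacatac\n>name 2\nacggtggcat\n>name 3\ngggtaccacgtt\n")
records = list(read_fasta(example))
print(records)
print('>concatenated')
print(''.join(seq for _, seq in records))
```

    Keeping the records separate costs little and leaves room for other edits (renaming headers, filtering by name) before the final join.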
     
    Roy Smith, Feb 12, 2010

  3. Jean-Michel Pichavant

    PeroMHC wrote:
    > Hi All, I have a simple problem that I hope somebody can help with. I
    > have an input file (a FASTA file) that I need to edit.
    >
    > Input file format:
    >
    > >name 1
    > tactcatacatac
    > >name 2
    > acggtggcat
    > >name 3
    > gggtaccacgtt
    >
    > I need to concatenate the sequences to make them look like this:
    >
    > >concatenated
    > tactcatacatacacggtggcatgggtaccacgtt
    >
    > Thanks, Matt

    A solution using regexp:

    import re

    found = []
    with open('seqfile.txt') as f:
        for line in f:
            # Keep only lines made up entirely of a, c, g, t (either case)
            found += re.findall(r'^[acgtACGT]+$', line)

    print(found)
    # ['tactcatacatac', 'acggtggcat', 'gggtaccacgtt']

    print(''.join(found))
    # tactcatacatacacggtggcatgggtaccacgtt
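    One consequence of the regexp approach, compared with simply skipping header lines: any sequence line containing a character outside acgtACGT (for example the ambiguity code N) is silently dropped rather than kept. The sketch below compares the two filters on a small made-up input; the function names by_regex and by_skipping_headers are hypothetical.

```python
import re

# Made-up input: the second sequence contains an 'N' (ambiguous base)
lines = [">name 1\n", "tactcatacatac\n", ">name 2\n", "acgNtggcat\n"]

def by_regex(lines):
    # Keep only lines consisting entirely of a, c, g, t (either case)
    found = []
    for line in lines:
        found += re.findall(r'^[acgtACGT]+$', line)
    return ''.join(found)

def by_skipping_headers(lines):
    # Keep every non-header line, whatever letters it contains
    return ''.join(line.strip() for line in lines if not line.startswith('>'))

print(by_regex(lines))             # tactcatacatac
print(by_skipping_headers(lines))  # tactcatacatacacgNtggcat
```

    Which behaviour is preferable depends on the data: the regexp doubles as a validator, while skipping headers preserves every sequence character.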



    JM
     
    Jean-Michel Pichavant, Feb 12, 2010
  4. Grant Edwards

    On 2010-02-12, PeroMHC <> wrote:
    > Hi All, I have a simple problem that I hope somebody can help with. I
    > have an input file (a FASTA file) that I need to edit.
    >
    > Input file format:
    >
    > >name 1
    > tactcatacatac
    > >name 2
    > acggtggcat
    > >name 3
    > gggtaccacgtt
    >
    > I need to concatenate the sequences to make them look like this:
    >
    > >concatenated
    > tactcatacatacacggtggcatgggtaccacgtt


    (echo ">concatenated"; grep '^[actg]*$' inputfile | tr -d '\n'; echo) > outputfile
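    For comparison, a rough Python equivalent of the shell pipeline above. Unlike the grep filter, it keeps every non-header line regardless of letters or case, and it can optionally wrap the sequence to a fixed line width. The helper name write_concatenated and the width parameter are hypothetical, and the width of 20 in the demo is an arbitrary choice.

```python
import os
import tempfile

def write_concatenated(in_path, out_path, width=None):
    """Write a one-record FASTA file whose sequence is every
    non-header line of in_path joined together.

    Hypothetical helper; width=None keeps the sequence on one line,
    like the shell pipeline.
    """
    with open(in_path) as src:
        seq = ''.join(line.strip() for line in src if not line.startswith('>'))
    if width:
        # Split the sequence into lines of at most `width` characters
        chunks = [seq[i:i + width] for i in range(0, len(seq), width)]
    else:
        chunks = [seq]
    with open(out_path, 'w') as dst:
        dst.write('>concatenated\n')
        for chunk in chunks:
            dst.write(chunk + '\n')

# Demo on the sample input from the question, in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    inp = os.path.join(tmp, 'inputfile')
    out = os.path.join(tmp, 'outputfile')
    with open(inp, 'w') as f:
        f.write(">name 1\ntactcatacatac\n>name 2\nacggtggcat\n>name 3\ngggtaccacgtt\n")
    write_concatenated(inp, out, width=20)
    with open(out) as f:
        print(f.read(), end='')
```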

    --
    Grant
     
    Grant Edwards, Feb 13, 2010

Similar Threads
  1. Chris Lasher: 26 replies, 773 views (Bengt Richter, Jan 16, 2005)
  2. idiotprogrammer: 4 replies, 1,187 views (Joseph Kesselman, Mar 5, 2007)
  3. kgk: 1 reply, 309 views (Marc 'BlackJack' Rintsch, Jul 11, 2007)
  4. 9 replies, 211 views (Anno Siegel, Mar 1, 2006)
  5. Carlos, "Concatenate/De-Concatenate", Oct 12, 2012, in forum: VHDL: 10 replies, 960 views