match, concatenate based on filename


Matt

Hi All,

I am trying to concatenate several hundred files based on their filenames. The filenames look like this:

Q1.HOMOblast.fasta
Q1.mus.blast.fasta
Q1.query.fasta
Q2.HOMOblast.fasta
Q2.mus.blast.fasta
Q2.query.fasta
....
Q1223.HOMOblast.fasta
Q1223.mus.blast.fasta
Q1223.query.fasta

All the Q1 files should be concatenated into a single file, Q1.concat.fasta. The Q2 files go together, then the Q3 files, and so on.

I envision something like

for file in os.listdir("/home/matthew/Desktop/pero.ngs/fasta/final/"):
    if file.startswith("Q%i"):
        concatenate...

But I can't figure out how to iterate this process over Q-numbers 1-1223.

Any help appreciated.
 

Patrick Maupin

I haven't tested this, so it may have a typo or something, but it's often
much cleaner to gather your information, massage it, and then use it
than it is to gather it and use it in one go.


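import os
from collections import defaultdict

filedict = defaultdict(list)

# Gather: group the file names by everything before the first dot.
for fname in sorted(os.listdir(mydir)):
    if fname.startswith('Q') and '.' in fname:
        filedict[fname[:fname.find('.')]].append(fname)

# Use: each prefix now maps to the list of files to concatenate.
for prefix, fnames in filedict.items():
    #print prefix, fnames
    concatenate...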

HTH,
Pat
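
The gather-then-use approach above can be sketched end to end as follows. This is only one possible way to fill in the elided concatenate step, not Pat's own code: the function names `group_by_prefix` and `concat_groups` are made up for illustration, and it assumes that every input file ends with a newline and that files named `Q<number>.concat.fasta` are output from an earlier run and should be skipped.

```python
import os
import shutil
from collections import defaultdict


def group_by_prefix(directory):
    """Map each 'Q<number>' prefix to the sorted list of its file names."""
    groups = defaultdict(list)
    for fname in sorted(os.listdir(directory)):
        prefix, dot, rest = fname.partition('.')
        # Skip names that do not look like 'Q<number>.<something>'.
        if not (prefix.startswith('Q') and prefix[1:].isdigit()):
            continue
        # Skip names without a dot and output files from an earlier run.
        if not dot or rest == 'concat.fasta':
            continue
        groups[prefix].append(fname)
    return groups


def concat_groups(directory):
    """Write one '<prefix>.concat.fasta' per group inside `directory`."""
    for prefix, fnames in group_by_prefix(directory).items():
        target = os.path.join(directory, prefix + '.concat.fasta')
        with open(target, 'wb') as out:
            for fname in fnames:
                with open(os.path.join(directory, fname), 'rb') as src:
                    # Stream the bytes across without loading whole files.
                    shutil.copyfileobj(src, out)
```

Calling `concat_groups("/home/matthew/Desktop/pero.ngs/fasta/final/")` would then write the combined files next to the inputs. Because the grouping is driven by what is actually in the directory, a Q-number with fewer or more than three files is handled without special cases.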
 

John Gordon

Matt said:
But I can't figure out how to iterate this process over Q-numbers 1-1223

for i in range(1, 1224):
    Q = "Q%d" % i
    file1 = "%s.HOMOblast.fasta" % Q
    file2 = "%s.mus.blast.fasta" % Q
    file3 = "%s.query.fasta" % Q
    target = "%s.concat.fasta" % Q
    # concatenate() is not defined here; you need to supply it.
    concatenate(file1, file2, file3, target)

Assuming that there are exactly three files to be concatenated for each
value of i.
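
The loop above calls a `concatenate` function that the post leaves undefined. A minimal sketch of one possible helper follows. It assumes the calling convention used in the loop (source files first, target file last) and that every source file exists and ends with a newline. It is an illustration, not John's implementation.

```python
import shutil


def concatenate(*paths):
    """Copy every path but the last, in order, into the last path."""
    sources, target = paths[:-1], paths[-1]
    with open(target, 'wb') as out:
        for source in sources:
            with open(source, 'rb') as src:
                # Stream the bytes across without loading whole files.
                shutil.copyfileobj(src, out)
```

The file names built in the loop are relative, so the script would have to run from the directory holding the FASTA files, or the names would need the directory joined on with `os.path.join` first. A missing source file raises an error rather than being skipped, which matches the stated assumption of exactly three files per Q-number.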
 
