I used defaultdict to store some variables but the output is blank

  • Thread starter claire morandin

claire morandin

I have the following script, which does not return anything. There is no apparent mistake, but my output file is empty. I am just trying to extract some decimal numbers from a file according to their names, which are in another file.

Code:
from collections import defaultdict
import numpy as np

ercc_contigs = {}
for line in open('Faq_ERCC_contigs_name.txt'):
    gene = line.strip().split()

ercc_rpkm = defaultdict(lambda: np.zeros(1, dtype=float))
output_file = open('out.txt', 'w')

rpkm_file = open('RSEM_Faq_Q1.genes.results.txt')
rpkm_file.readline()
for line in rpkm_file:
    line = line.strip()
    columns = line.strip().split()
    gene = columns[0].strip()
    rpkm_value = float(columns[6].strip())
    if gene in ercc_contigs:
        ercc_rpkm[gene] += rpkm_value

ercc_fh = open('out.txt', 'w')
for gene, rpkm_value in ercc_rpkm.iteritems():
    ercc = '{0}\t{1}\n'.format(gene, rpkm_value)
    ercc_fh.write(ercc)

If someone could help me spot what's wrong, it would be much appreciated. Cheers
 

Peter Otten

claire said:
I have the following script, which does not return anything. There is no
apparent mistake, but my output file is empty. I am just trying to extract
some decimal numbers from a file according to their names, which are in
another file.

Code:
from collections import defaultdict
import numpy as np

ercc_contigs = {}
for line in open('Faq_ERCC_contigs_name.txt'):
    gene = line.strip().split()

You probably planned to use the loop above to populate the ercc_contigs
dict, but there's no code for that.

[QUOTE]
ercc_rpkm = defaultdict(lambda: np.zeros(1, dtype=float))
output_file = open('out.txt', 'w')

rpkm_file = open('RSEM_Faq_Q1.genes.results.txt')
rpkm_file.readline()
for line in rpkm_file:
    line = line.strip()
    columns = line.strip().split()
    gene = columns[0].strip()
    rpkm_value = float(columns[6].strip())
[/QUOTE]

Remember that ercc_contigs is empty; therefore the test
[QUOTE]
    if gene in ercc_contigs:
[/QUOTE]

always fails and the following line is never executed.
[QUOTE]
        ercc_rpkm[gene] += rpkm_value

ercc_fh = open('out.txt', 'w')
for gene, rpkm_value in ercc_rpkm.iteritems():
    ercc = '{0}\t{1}\n'.format(gene, rpkm_value)
    ercc_fh.write(ercc)

If someone could help me spot what's wrong, it would be much appreciated.
Cheers
[/QUOTE]
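
The failure described above can be reproduced in a few lines. This is only a minimal sketch: the gene names and the in-memory list standing in for the names file are made up for illustration.

```python
# Stand-in for the lines of the names file (hypothetical gene names).
name_lines = ['ERCC-00002\n', 'ERCC-00003\n']

ercc_contigs = {}
for line in name_lines:
    # gene is assigned on every pass, but nothing is ever stored
    # in ercc_contigs, so the dict stays empty.
    gene = line.strip().split()

# Every membership test against an empty dict is False,
# so no gene is ever selected.
candidates = ['ERCC-00002', 'other_gene']
found = [g for g in candidates if g in ercc_contigs]

print(len(ercc_contigs), found)
```

Since nothing passes the test, ercc_rpkm never receives a key and the final loop writes no lines.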

By the way: it is unclear to me why you are using a numpy array here:
ercc_rpkm = defaultdict(lambda: np.zeros(1, dtype=float))

I think

ercc_rpkm = defaultdict(float)

should suffice. Also:
line = line.strip()
columns = line.strip().split()
gene = columns[0].strip()
rpkm_value = float(columns[6].strip())

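You can remove all strip() method calls here, as line.split() with no
arguments splits on runs of whitespace and ignores any leading or trailing
whitespace, so the resulting fields are already clean.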
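
Both suggestions can be checked quickly. The following is a small sketch with a made-up input line (the gene name and values are assumptions, not taken from the real files):

```python
from collections import defaultdict

# A made-up line with leading, trailing and mixed whitespace.
line = '  ERCC-00002\t12.5 \n'

# split() with no arguments already discards surrounding whitespace,
# so no strip() calls are needed.
columns = line.split()

# defaultdict(float) starts every missing key at 0.0,
# which is enough for accumulating a running sum.
totals = defaultdict(float)
totals[columns[0]] += float(columns[1])
totals[columns[0]] += 0.5

print(columns, totals[columns[0]])
```

With plain floats the values written to the output file are ordinary numbers rather than one-element arrays.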
 

claire morandin

Thanks Peter. True, I did not realize that ercc_contigs is empty, but I am not sure how to "populate" the dictionary if I only have one column for the value but no key.
 

Peter Otten

claire said:
Thanks Peter. True, I did not realize that ercc_contigs is empty, but I am
not sure how to "populate" the dictionary if I only have one column for
the value but no key.

You could use a "dummy value"

ercc_contigs = {}
for line in open('Faq_ERCC_contigs_name.txt'):
    gene = line.split()[0]
    ercc_contigs[gene] = None

but a better approach is to use a set instead of a dict:

ercc_contigs = set()
for line in open('Faq_ERCC_contigs_name.txt'):
    gene = line.split()[0]
    ercc_contigs.add(gene)
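
Putting the suggestions in this thread together, the whole script might look roughly like the sketch below. It is not the original poster's final code: the function name sum_rpkm and its parameters are hypothetical, it assumes the value of interest sits in column index 6 as in the original script, and it is written for Python 3, where .items() replaces the Python 2 .iteritems().

```python
from collections import defaultdict


def sum_rpkm(names_path, results_path, out_path, value_column=6):
    """Sum the values in value_column for the genes listed in names_path."""
    # Collect the gene names of interest in a set; blank lines are skipped.
    with open(names_path) as names_file:
        ercc_contigs = {line.split()[0] for line in names_file if line.strip()}

    ercc_rpkm = defaultdict(float)
    with open(results_path) as rpkm_file:
        next(rpkm_file, None)  # skip the header line
        for line in rpkm_file:
            columns = line.split()
            if len(columns) <= value_column:
                continue  # skip blank or short lines
            gene = columns[0]
            if gene in ercc_contigs:
                ercc_rpkm[gene] += float(columns[value_column])

    # Write one "gene<TAB>sum" line per gene, sorted for a stable order.
    with open(out_path, 'w') as ercc_fh:
        for gene, rpkm_value in sorted(ercc_rpkm.items()):
            ercc_fh.write('{0}\t{1}\n'.format(gene, rpkm_value))

    return dict(ercc_rpkm)
```

With the file names from the question, it would be called as sum_rpkm('Faq_ERCC_contigs_name.txt', 'RSEM_Faq_Q1.genes.results.txt', 'out.txt'). The with blocks also make sure the output file is closed and flushed, and the file is opened for writing only once.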
 
