tejsupra
Hello Everyone,
I need to read a .csv file that is 2.26 GB in size from a Python
script, but my computer has only 2 GB of RAM. The code is as
follows:
"""
This program has been developed to retrieve all the promoter sequences
for the specified
list of genes in the given cluster
So, this program will act as a substitute to the whole EZRetrieve
system
Input arguments:
1) Cluster.txt or DowRatClust161718bwithDummy.txt
2) TransProCrossReferenceAndSequences.csv -> This is the file that has
all the promoter sequences
3) -2000
4) 500
"""
import time
import csv
import sys
import linecache
import re
from sets import Set
import gc
print time.localtime()
fileInputHandler = open(sys.argv[1],"r")
line = fileInputHandler.readline()
refSeqIDsinTransPro = []
promoterSequencesinTransPro = []
reader2 = csv.reader(open(sys.argv[2],"rb"))
reader2_list = []
reader2_list.extend(reader2)
for data2 in reader2_list:
    refSeqIDsinTransPro.append(data2[3])
for data2 in reader2_list:
    promoterSequencesinTransPro.append(data2[4])
while line:
    l = line.rstrip('\n')
    for j in range(1,len(refSeqIDsinTransPro)):
        found = re.search(l,refSeqIDsinTransPro[j])
        if found:
            """promoterSequencesinTransPro[j] """
            print l
    line = fileInputHandler.readline()
fileInputHandler.close()
The error that I got is given as follows:
Traceback (most recent call last):
  File "RefSeqsToPromoterSequences.py", line 31, in <module>
    reader2_list.extend(reader2)
MemoryError
I understand that this is a memory error caused by the line
reader2_list.extend(reader2), which loads the whole file into a list.
Is there an alternative way to read the .csv file line by line?
Sincerely,
Suprabhath
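
For reference, one possible way to avoid holding the whole file in memory is to iterate over the csv.reader object directly, since it yields one row at a time. The sketch below is only an illustration under several assumptions: it is written for Python 3 (the script above is Python 2), the function names load_wanted_ids and iter_matches are hypothetical, the first row of the .csv file is treated as a header (the original loop starts at index 1), and a plain substring test stands in for re.search. Only the short list of IDs from the cluster file is kept in memory.

```python
import csv


def load_wanted_ids(cluster_path):
    """Read the (small) cluster file into a list of non-empty IDs."""
    with open(cluster_path, "r") as handle:
        return [line.strip() for line in handle if line.strip()]


def iter_matches(csv_path, wanted_ids, id_column=3, seq_column=4):
    """Yield (wanted_id, sequence) pairs while streaming the CSV once.

    id_column and seq_column default to 3 and 4, the indexes used in
    the original script (data2[3] and data2[4]).
    """
    with open(csv_path, "r", newline="") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # assumption: the first row is a header
        for row in reader:  # one row in memory at a time
            if len(row) <= max(id_column, seq_column):
                continue  # skip blank or short rows
            for wanted in wanted_ids:
                # Assumption: substring test instead of re.search
                if wanted in row[id_column]:
                    yield wanted, row[seq_column]
```

With these hypothetical helpers, the main loop would reduce to something like `for wanted, seq in iter_matches(sys.argv[2], load_wanted_ids(sys.argv[1])): print(wanted)`. The large file is read once from start to finish, instead of being loaded into reader2_list and then scanned once per gene.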