Discussion in 'Python' started by tejsupra, Nov 10, 2008.

  tejsupra

    tejsupra Guest

    Hello Everyone,

    I need to read a .csv file that is 2.26 GB in size. I wrote a Python
    script that reads this file, but my computer has only 2 GB of RAM.
    Please see the code below:

    This program has been developed to retrieve all the promoter sequences
    for the specified list of genes in the given cluster.

    So, this program will act as a substitute for the whole EZRetrieve.

    Input arguments:

    1) Cluster.txt or DowRatClust161718bwithDummy.txt
    2) TransProCrossReferenceAndSequences.csv -> This is the file that has
    all the promoter sequences
    3) -2000
    4) 500

    import time
    import csv
    import sys
    import linecache
    import re
    from sets import Set
    import gc

    print time.localtime()

    fileInputHandler = open(sys.argv[1],"r")
    line = fileInputHandler.readline()

    refSeqIDsinTransPro = []
    promoterSequencesinTransPro = []
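    reader2 = csv.reader(open(sys.argv[2],"rb"))
    reader2_list = []
    reader2_list.extend(reader2)  # loads every row of the CSV into memory

    for data2 in reader2_list:
        # (loop body missing from the post; presumably fills refSeqIDsinTransPro)
    for data2 in reader2_list:
        # (loop body missing from the post; presumably fills promoterSequencesinTransPro)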

    while line:
        l = line.rstrip('\n')
        for j in range(1,len(refSeqIDsinTransPro)):
            found = re.search(l,refSeqIDsinTransPro[j])
            if found:
                """promoterSequencesinTransPro[j] """
                print l

        line = fileInputHandler.readline()


    The error I got is as follows:
    Traceback (most recent call last):
    File "", line 31, in <module>

    I understand that this is a MemoryError caused by the line
    reader2_list.extend(reader2). Is there an alternative way to read
    the .csv file line by line?

    tejsupra, Nov 10, 2008
  James Mills

    James Mills Guest

    Without testing, this looks like you're reading the _ENTIRE_
    input stream into memory! Try this:

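    import csv

    def readCSV(file):
        # Accept either a filename or an already open file object.
        if isinstance(file, str):
            fd = open(file, "rU")
        else:
            fd = file

        # Guess the dialect from the first line, then rewind so that
        # line is not dropped from the output.
        sniffer = csv.Sniffer()
        dialect = sniffer.sniff(fd.readline())
        fd.seek(0)

        # Yield one row at a time instead of building a list.
        reader = csv.reader(fd, dialect)
        for line in reader:
            yield line

    for line in readCSV(open("foo.csv", "r")):
        print line  # handle each row here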
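
    For anyone on Python 3, the generator idea above might be sketched as
    follows. This is only an illustration under a few assumptions: the
    name read_csv_rows and the in-memory sample data are made up, the file
    is opened in text mode with newline="" (as the csv documentation asks)
    instead of "rb" or "rU", and the sniffer is limited to a few common
    delimiters so that guessing from a single line stays predictable.

```python
import csv
import io


def read_csv_rows(source):
    """Yield rows from a CSV source one at a time (hypothetical helper)."""
    # Accept either a path or an already open text-mode file object.
    if isinstance(source, str):
        fd = open(source, "r", newline="")
        owns_fd = True
    else:
        fd = source
        owns_fd = False
    try:
        # Guess the dialect from the first line only, then rewind so that
        # line is still returned as a row.
        dialect = csv.Sniffer().sniff(fd.readline(), delimiters=",;\t")
        fd.seek(0)
        for row in csv.reader(fd, dialect):
            yield row
    finally:
        # Only close files this function opened itself.
        if owns_fd:
            fd.close()


# Small in-memory sample standing in for a large file (made-up data).
sample = io.StringIO("id,seq\nNM_1,ACGT\nNM_2,TTGA\n")
rows = read_csv_rows(sample)
first = next(rows)        # only the first row has been parsed at this point
remaining = list(rows)    # the rest is pulled on demand
```

    Because rows are produced on demand, memory use depends on the size of
    one row rather than on the size of the file.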

    James Mills, Nov 10, 2008
  John Machin

    John Machin Guest

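    All you need to do is replace the code that builds reader2_list and
    loops over it with a single loop over the reader itself:

    reader2 = csv.reader(open(sys.argv[2],"rb"))

    for data2 in reader2:
        # handle one row at a time here (loop body missing from the post)
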
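
    Going one step further, the whole lookup could be done in a single
    pass over the big file. The sketch below (Python 3 syntax) is not the
    original script: find_promoters, id_col and seq_col are hypothetical
    names, the column positions are placeholders because the thread does
    not say which columns hold the IDs and sequences, and it uses exact
    matching against a set where the original appears to use a
    regular-expression search.

```python
import csv
import io


def find_promoters(gene_ids, csv_file, id_col=0, seq_col=1):
    """Stream csv_file and yield (id, sequence) for rows whose ID is wanted.

    id_col and seq_col are assumed column positions, not taken from the thread.
    """
    wanted = set(gene_ids)  # the gene list is small, so a set fits in RAM
    for row in csv.reader(csv_file):
        # Skip blank or short rows instead of raising IndexError.
        if len(row) <= max(id_col, seq_col):
            continue
        if row[id_col] in wanted:  # exact match, constant time per row
            yield row[id_col], row[seq_col]


# Tiny in-memory stand-ins for the cluster file and the large CSV (made-up data).
cluster = io.StringIO("NM_2\nNM_9\n")
big_csv = io.StringIO("NM_1,ACGT\nNM_2,TTGA\nNM_3,GGCC\n")

gene_ids = [line.strip() for line in cluster if line.strip()]
matches = list(find_promoters(gene_ids, big_csv))
```

    Only the short gene list is held in memory; the large file is read
    once, row by row, instead of being scanned again for every gene.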
    John Machin, Nov 10, 2008
  tejsupra

    tejsupra Guest

    Thanks a lot, James Mills. It worked.
    tejsupra, Nov 20, 2008
