Memory error due to the huge input file size

Discussion in 'Python' started by tejsupra, Nov 10, 2008.

  1. tejsupra

    tejsupra Guest

    Hello Everyone,

    I need to read a .csv file that is 2.26 GB in size, and I wrote a
    Python script to read it. My computer has 2 GB of RAM. Please see
    the code below:

    This program has been developed to retrieve all the promoter sequences
    for the specified list of genes in the given cluster.

    So, this program will act as a substitute for the whole EZRetrieve.

    Input arguments:

    1) Cluster.txt or DowRatClust161718bwithDummy.txt
    2) TransProCrossReferenceAndSequences.csv -> This is the file that has
    all the promoter sequences
    3) -2000
    4) 500

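    import time
    import csv
    import sys
    import linecache
    import re
    from sets import Set
    import gc

    print time.localtime()

    fileInputHandler = open(sys.argv[1],"r")
    line = fileInputHandler.readline()

    refSeqIDsinTransPro = []
    promoterSequencesinTransPro = []
    reader2 = csv.reader(open(sys.argv[2],"rb"))
    reader2_list = []
    # Reads every row of the CSV into memory at once.
    reader2_list.extend(reader2)

    for data2 in reader2_list:
        pass  # loop body missing from the post; presumably fills refSeqIDsinTransPro
    for data2 in reader2_list:
        pass  # loop body missing from the post; presumably fills promoterSequencesinTransPro

    while line:
        l = line.rstrip('\n')
        for j in range(1,len(refSeqIDsinTransPro)):
            # The call on this line was garbled in the post; re.search(l, ...) is assumed.
            found = re.search(l, refSeqIDsinTransPro[j])
            if found:
                """promoterSequencesinTransPro[j] """
                print l

        line = fileInputHandler.readline()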


    The error that I got is as follows:
    Traceback (most recent call last):
    File "", line 31, in <module>

    I understand that the issue is a MemoryError caused by the line
    reader2_list.extend(reader2). Is there an alternative way to read
    the .csv file line by line?

    tejsupra, Nov 10, 2008

  2. James Mills

    James Mills Guest

    Without testing, this looks like you're reading the _ENTIRE_
    input stream into memory! Try this:

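    def readCSV(file):
        # Accept either a filename or an already-open file object.
        if type(file) == str:
            fd = open(file, "rU")
        else:
            fd = file

        # Guess the CSV dialect from the first line, then rewind so that
        # line is not skipped by the reader.
        sniffer = csv.Sniffer()
        dialect = sniffer.sniff(fd.readline())
        fd.seek(0)

        # Yield one row at a time instead of building a list.
        reader = csv.reader(fd, dialect)
        for line in reader:
            yield line

    for line in readCSV(open("foo.csv", "r")):
        print line  # process each row here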
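
    For readers on Python 3, the same generator idea might look like the
    sketch below. The name read_csv_rows, the sample_size parameter, and the
    restricted delimiter list are illustrative assumptions, not part of the
    reply above. The point is only that rows are yielded one at a time, so
    memory use should stay roughly constant regardless of file size.

```python
import csv


def read_csv_rows(path, sample_size=4096):
    """Yield rows from a CSV file one at a time (hypothetical helper)."""
    # newline="" is what the csv module documentation recommends for files
    # passed to csv.reader in Python 3.
    with open(path, newline="") as fd:
        sample = fd.read(sample_size)
        fd.seek(0)  # rewind so the sampled rows are not skipped
        try:
            # Assumption: the delimiter is one of these common characters.
            dialect = csv.Sniffer().sniff(sample, delimiters=",;\t|")
        except csv.Error:
            dialect = csv.excel  # fall back to plain comma-separated
        for row in csv.reader(fd, dialect):
            yield row
```

    A loop such as "for row in read_csv_rows(path): ..." then handles one
    row per iteration and never holds the whole file in a list.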

    James Mills, Nov 10, 2008

  3. John Machin

    John Machin Guest

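    All you need to do is replace the reader2_list code (the extend()
    call and the two loops over reader2_list) by:

    reader2 = csv.reader(open(sys.argv[2],"rb"))

    for data2 in reader2:
        # Append the fields you need from data2 to refSeqIDsinTransPro and
        # promoterSequencesinTransPro here, one row at a time.
        pass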
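
    Put together for the original task, a Python 3 sketch could stream the
    CSV once and keep only the rows for the genes in the cluster file. The
    function name find_promoters and the column positions id_col and seq_col
    are hypothetical (the thread does not say which columns hold the RefSeq
    ID and the sequence). It also swaps the nested re.search loop for an
    exact-match set lookup, which assumes the cluster file lists one full ID
    per line.

```python
import csv


def find_promoters(cluster_path, csv_path, id_col=0, seq_col=1):
    """Return {gene_id: sequence} for the IDs listed in the cluster file.

    id_col and seq_col are assumed column positions, not taken from the
    original script.
    """
    # The cluster file is small, so its IDs fit comfortably in a set.
    with open(cluster_path) as fh:
        wanted = {line.strip() for line in fh if line.strip()}

    matches = {}
    with open(csv_path, newline="") as fh:
        # Iterating over the reader pulls one row at a time from disk;
        # only the matching rows are kept in memory.
        for row in csv.reader(fh):
            if len(row) > max(id_col, seq_col) and row[id_col] in wanted:
                matches[row[id_col]] = row[seq_col]
    return matches
```

    The set lookup also avoids rescanning every ID for each gene, so the
    large file is read exactly once.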
    John Machin, Nov 10, 2008

  4. tejsupra

    tejsupra Guest

    Thanks a lot, James Mills. It worked.
    tejsupra, Nov 20, 2008