Memory error due to the huge input file size

Discussion in 'Python' started by tejsupra, Nov 10, 2008.

  1. tejsupra

    tejsupra Guest

    Hello Everyone,

    I need to read a .csv file that is 2.26 GB in size, and I wrote a
    Python script to read it. My computer has 2 GB of RAM. Please see
    the code below:

    """
    This program has been developed to retrieve all the promoter sequences
    for the specified
    list of genes in the given cluster

    So, this program will act as a substitute to the whole EZRetrieve
    system

    Input arguments:

    1) Cluster.txt or DowRatClust161718bwithDummy.txt
    2) TransProCrossReferenceAndSequences.csv -> This is the file that has
    all the promoter sequences
    3) -2000
    4) 500
    """

    import time
    import csv
    import sys
    import linecache
    import re
    from sets import Set
    import gc

    print time.localtime()

    fileInputHandler = open(sys.argv[1],"r")
    line = fileInputHandler.readline()

    refSeqIDsinTransPro = []
    promoterSequencesinTransPro = []
    reader2 = csv.reader(open(sys.argv[2],"rb"))
    reader2_list = []
    reader2_list.extend(reader2)

    for data2 in reader2_list:
        refSeqIDsinTransPro.append(data2[3])
    for data2 in reader2_list:
        promoterSequencesinTransPro.append(data2[4])

    while line:
        l = line.rstrip('\n')
        for j in range(1,len(refSeqIDsinTransPro)):
            found = re.search(l,refSeqIDsinTransPro[j])
            if found:
                """promoterSequencesinTransPro[j] """
                print l

        line = fileInputHandler.readline()


    fileInputHandler.close()


    The error that I got is given as follows:
    Traceback (most recent call last):
      File "RefSeqsToPromoterSequences.py", line 31, in <module>
        reader2_list.extend(reader2)
    MemoryError

    I understand that this is a memory error caused by the line
    reader2_list.extend(reader2). Is there an alternative way to read
    the .csv file line by line?

    sincerely,
    Suprabhath
     
    tejsupra, Nov 10, 2008
    #1

  2. James Mills

    James Mills Guest

    Without testing, this looks like you're reading the _ENTIRE_
    input stream into memory! Try this:

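    def readCSV(file):
        # Accept either a filename or an already-open file object.
        if type(file) == str:
            fd = open(file, "rU")
        else:
            fd = file

        # Guess the CSV dialect from the first line, then rewind.
        sniffer = csv.Sniffer()
        dialect = sniffer.sniff(fd.readline())
        fd.seek(0)

        # Yield one row at a time instead of building a list.
        reader = csv.reader(fd, dialect)
        for line in reader:
            yield line

    for line in readCSV(open("foo.csv", "r")):
        ...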
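
    The generator above sniffs the dialect from the first line only. A
    possible variation, sketched here in Python 3 syntax, sniffs a larger
    sample and falls back to the default dialect if sniffing fails. The
    name read_csv_rows and the parameters sample_size and delimiters are
    made up for this illustration, and the demo data is invented.

```python
import csv
import io


def read_csv_rows(fd, sample_size=4096, delimiters=",;\t"):
    """Yield rows from an open, seekable text file one at a time."""
    # Sniff the dialect from the first few KB, then rewind to the start.
    sample = fd.read(sample_size)
    fd.seek(0)
    try:
        dialect = csv.Sniffer().sniff(sample, delimiters=delimiters)
    except csv.Error:
        # Sniffing can fail on unusual input; fall back to the default.
        dialect = csv.excel
    # csv.reader pulls lines lazily, so only one row is in memory at once.
    for row in csv.reader(fd, dialect):
        yield row


# Invented demo data standing in for a real file.
demo = io.StringIO("id;seq\nNM_001;ACGT\nNM_002;TTGA\n")
rows = list(read_csv_rows(demo))
```

    For a real file in Python 3, the file would typically be opened with
    open(path, newline="") before being passed in.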

    --JamesMills
     
    James Mills, Nov 10, 2008
    #2

  3. John Machin

    John Machin Guest


    All you need to do is replace the code that builds reader2_list and
    loops over it twice with:

    reader2 = csv.reader(open(sys.argv[2],"rb"))

    for data2 in reader2:
        refSeqIDsinTransPro.append(data2[3])
        promoterSequencesinTransPro.append(data2[4])
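
    The loop above still keeps two columns of every row in memory. If
    that is still too much, one option is to load the (small) gene list
    into a set and stream the big CSV once, keeping only matching rows.
    This is a minimal sketch in Python 3 syntax, not the original
    script: find_promoters, id_col and seq_col are names invented here,
    the sample data is made up, and it assumes an exact ID match,
    whereas the original re.search call also matches substrings and
    regex patterns.

```python
import csv
import io


def find_promoters(gene_ids, csv_file, id_col=3, seq_col=4):
    """Yield (refseq_id, sequence) for rows whose ID is in gene_ids.

    Rows are read one at a time, so the whole file is never held in memory.
    """
    wanted = set(gene_ids)  # small: one entry per gene in the cluster file
    for row in csv.reader(csv_file):
        # Skip short or malformed rows instead of raising IndexError.
        if len(row) > seq_col and row[id_col] in wanted:
            yield row[id_col], row[seq_col]


# Tiny in-memory stand-in for the 2.26 GB file (made-up data).
sample = io.StringIO(
    "a,b,c,NM_001,ACGT\n"
    "a,b,c,NM_002,TTGA\n"
    "a,b,c,NM_003,GGCC\n"
)
matches = list(find_promoters(["NM_001", "NM_003"], sample))
print(matches)
```

    The set lookup also replaces the inner loop over all IDs for every
    gene, so the big file is scanned once rather than once per gene.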
     
    John Machin, Nov 10, 2008
    #3
  4. tejsupra

    tejsupra Guest

    Thanks a lot, James Mills. It worked.
     
    tejsupra, Nov 20, 2008
    #4