Memory Problem

Christoph Scheit · Sep 18, 2007

Hi,

I have a short script to read binary files produced by a numerical
simulation. These binary files still need some post-processing: summing
up results from different CPUs, filtering out invalid entries, and
bringing the data into a particular order.

Reading the binary data with the struct module works fine: I read
one chunk of data into a tuple and append this tuple to a list.
At the end of reading, I return the list.
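
For readers following along, the reading step described above might look roughly like the sketch below. The record layout (4 ints followed by 1 float, standard sizes) is an assumption taken from the row description later in the thread, and `read_records` is a hypothetical name, not the poster's `BDBReader`.

```python
import io
import struct

# Assumed record layout: 4 ints followed by 1 float, standard sizes ("=").
RECORD = struct.Struct("=4if")

def read_records(stream):
    """Read fixed-size records from a binary stream into a list of tuples."""
    rows = []
    while True:
        chunk = stream.read(RECORD.size)
        if len(chunk) < RECORD.size:
            break  # end of file (or a truncated trailing record)
        rows.append(RECORD.unpack(chunk))
    return rows

# An in-memory file stands in for one simulation output file.
sample = io.BytesIO(RECORD.pack(1, 2, 3, 4, 0.5) + RECORD.pack(5, 6, 7, 8, 1.5))
rows = read_records(sample)
```

Each tuple in `rows` is a separate Python object holding five more objects, so a list of several million of them costs far more than the raw record size in the file.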

Then the data is added to a table, which I use for the actual post-processing.
The table is actually a class with several "columns", each column being
represented internally by an array.
Adding all the data from the simulation results to the table makes the
memory usage explode, so I would like to know where exactly the memory
is wasted.
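
One way to see where memory goes is the standard-library `tracemalloc` module. It only exists in Python 3.4 and later, so it was not an option when this thread was written. The sketch below is a minimal illustration on a stand-in workload, not the poster's table code.

```python
import tracemalloc

tracemalloc.start()

# Stand-in workload: a list of 5-element tuples, like the rows read from one file.
rows = [(i, i, i, i, float(i)) for i in range(100000)]

# Group live allocations by the source line that made them.
snapshot = tracemalloc.take_snapshot()
top = snapshot.statistics("lineno")[:3]
for stat in top:
    print(stat)  # file, line number, total size and allocation count

current, peak = tracemalloc.get_traced_memory()  # bytes traced now / at peak
tracemalloc.stop()
```

Taking one snapshot after reading and another after filling the table, then comparing them with `snapshot.compare_to`, would show which step accounts for the growth.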

Here is the code to add the data of one file (in total I have to add the
data of several files to the same table):

# create reader
breader = BDBReader("<var>", "<type>", "#")

# read data
bData = breader.readDB(dbFileList[0])

# create table
dTab = DBTable(breader.headings, breader.converters, [1,2])
addRows(bData, dTab)

Before I add a new entry to the table, I check if there is already an entry
like this. To do so, I store keys for all the entries with row-number in a
dictionary. What about the memory consumption of the dictionary?

Here is the code for adding a new row to the table:

# check if data already exists
if key in self.keyDict:
    rowIdx = self.keyDict[key]
    for i in self.mutableCols:
        self.cols[i][rowIdx] += rowData[i]
    return

# key is still available - insert row into table
self.keyDict[key] = self.nRows

# insert data into the columns
for i in range(0, self.nCols):
    self.cols[i].add(rowData[i])

# add row i and increment number of rows
self.rows.append(DBRow(self, self.nRows))
self.nRows += 1

Maybe somebody can help me. If you need, I can give more implementation
details.

Thanks in advance,

Christoph
--

============================
M.Sc. Christoph Scheit
Institute of Fluid Mechanics
FAU Erlangen-Nuremberg
Cauerstrasse 4
D-91058 Erlangen
Phone: +49 9131 85 29508
============================

Marc 'BlackJack' Rintsch · Sep 18, 2007

> Then the data is added to a table, which I use for the actual Post-Processing.
> The table is actually a Class with several "Columns", each column internally
> being represented by array.

Array or list?

> Before I add a new entry to the table, I check if there is already an entry
> like this. To do so, I store keys for all the entries with row-number in a
> dictionary. What about the memory consumption of the dictionary?

The more items you put into the dictionary the more memory it uses. ;-)

> Maybe somebody can help me. If you need, I can give more implementation
> details.

IMHO That's not enough code and/or description of the data structure(s).
And you also left out some information like the number of rows/columns and
the size of the data.

Have you already thought about using a database?

Ciao,
Marc 'BlackJack' Rintsch

Christoph Scheit · Sep 18, 2007

> Array or list?

array

More details:

from array import array

class DBTable:
    def __init__(self):
        # the class DBTable has a list, each list entry referencing a
        # DBColumn object
        self.cols = []
        # the dictionary maps each key to its row number and is used to
        # look up whether an entry already exists
        self.dict = {}

class DBColumn:
    def __init__(self):
        # has a name (string) and a datatype (e.g. int, float) as
        # attributes, plus:
        self.data = array('f')  # an array of type float

I have to deal with several million entries. At the moment I'm trying an
example with 360 grid points and 10000 time steps, i.e. 3 600 000 entries
(each row consists of 4 ints and one float).

Of course, the more keys, the bigger the dictionary, but is there a way to
evaluate the actual size of the dictionary?

Greets and Thanks,

Chris


Gabriel Genellina · Sep 18, 2007

On Tue, 18 Sep 2007 10:58:42 -0300, Christoph Scheit wrote:

> I have to deal with several millions of data, actually I'm trying an
> example with 360 grid points and 10000 time steps, i.e. 3 600 000 entries
> (and each row consists of 4 int and one float)
> Of course, the more keys the bigger is the dictionary, but is there a
> way to evaluate the actual size of the dictionary?

Yes, but you probably should not worry about it; it's just a few bytes
per entry.
Why don't you use an actual database? SQLite is fast, lightweight, and
comes with Python 2.5 (as the sqlite3 module).
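
As a rough illustration of the database suggestion, the sketch below loads rows into SQLite with the standard `sqlite3` module and lets SQL sum the contributions that share a key. The table name, the column names, and the choice of (grid point, time step) as the key are assumptions for illustration, not the poster's actual schema.

```python
import sqlite3

# Hypothetical rows: (grid_point, time_step, value); two CPUs report the same key.
rows = [(1, 0, 0.5), (2, 0, 1.0), (1, 0, 0.25)]

conn = sqlite3.connect(":memory:")  # pass a file name instead to keep the data on disk
conn.execute(
    "CREATE TABLE results (grid_point INTEGER, time_step INTEGER, value REAL)"
)
conn.executemany("INSERT INTO results VALUES (?, ?, ?)", rows)

# Sum contributions that share a key, then return them in a fixed order.
summed = conn.execute(
    "SELECT grid_point, time_step, SUM(value) FROM results "
    "GROUP BY grid_point, time_step ORDER BY grid_point, time_step"
).fetchall()
conn.close()
```

With a file-backed database the summing, filtering and ordering happen inside SQLite, so the whole data set never has to sit in Python objects at once.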

> self.rows.append(DBRow(self, self.nRows))

This looks suspicious, and may indicate that your structure contains
cycles. Python cannot always reclaim memory from those cycles, so you
may end up using much more memory than needed.

Bruno Desthuilliers · Sep 18, 2007

Christoph Scheit wrote:

(snip)

> I have to deal with several millions of data, actually I'm trying an example
> with 360 grid points and 10000 time steps, i.e. 3 600 000 entries (and each
> row consists of 4 int and one float)

Hem... May I suggest that you use a database then? If you don't want to
bother with a full-blown RDBMS, then have a look at SQLite: it's
lightweight, works mostly fine and is a no-brainer to use.

> Of course, the more keys the bigger is the dictionary, but is there a way to
> evaluate the actual size of the dictionary?

You can refer to the thread "creating really big lists" for a
quick-and-dirty, rough approximation of such an evaluation. But it's way
too big anyway to even consider storing all this in RAM.
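
A rough estimate of this kind can be sketched with `sys.getsizeof`, which was added in Python 2.6 (after this thread). It reports only the shallow size of one object, so the hypothetical helper `approx_size` below adds up the dictionary, its key tuples and their elements, and the values. The (grid point, time step) key layout is an assumption for illustration.

```python
import sys

def approx_size(d):
    """Rough size in bytes of a dict with tuple keys, contents included."""
    total = sys.getsizeof(d)              # hash table only, not the contents
    for key, value in d.items():
        total += sys.getsizeof(key)       # the key tuple, without its elements
        total += sum(sys.getsizeof(part) for part in key)
        total += sys.getsizeof(value)
    return total

# Scaled-down example: 360 grid points x 100 time steps.
key_dict = {(point, step): point * 100 + step
            for point in range(360) for step in range(100)}
size = approx_size(key_dict)
per_entry = size / len(key_dict)
print("about %d bytes per entry" % per_entry)
```

The figure is only approximate: objects shared between entries are counted more than once, and allocator overhead is not counted at all.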

Christoph Scheit · Sep 18, 2007

Hi, thank you all very much,

so I will consider using a database. Anyway, I would like to know
how to detect cycles, if there are any.

> This looks suspicious, and may indicate that your structure contains
> cycles. Python cannot always reclaim memory from those cycles, so you
> may end up using much more memory than needed.

How can I detect if there are cycles?

self.rows is a list containing DBRow objects,
each of which is an integer pointer (index) to the i-th row.
I'm using this list to sort the table by sorting the index list
instead of really sorting the entries (or to filter).


Gabriel Genellina · Sep 20, 2007

On Tue, 18 Sep 2007 12:24:46 -0300, Christoph Scheit wrote:

> How can I detect if there are cycles?

Analyzing your code, or maybe inspecting gc.garbage, or looking at
sys.getrefcount(x)
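
A minimal sketch of those two checks, using stand-in `Table` and `Row` classes (hypothetical, not the poster's `DBTable` and `DBRow`). The `gc.DEBUG_SAVEALL` flag makes the collector keep everything it finds unreachable in `gc.garbage`, so cyclic objects can be inspected there.

```python
import gc
import sys

class Table:
    def __init__(self):
        self.rows = []

class Row:
    def __init__(self, table, index):
        self.table = table   # back-reference: table -> rows -> row -> table
        self.index = index

gc.collect()                    # start from a clean state
gc.set_debug(gc.DEBUG_SAVEALL)  # keep collected objects in gc.garbage

table = Table()
table.rows.append(Row(table, 0))
refs_before = sys.getrefcount(table)  # counts the row's back-reference too

del table                       # reference counting alone cannot free the cycle
unreachable = gc.collect()      # the cycle collector finds it
found = [obj for obj in gc.garbage if isinstance(obj, (Table, Row))]
print(len(found), "objects were only reachable through a cycle")

gc.set_debug(0)                 # restore normal collection
gc.garbage.clear()
```

If `found` is non-empty after deleting the last outside reference, those objects were being kept alive by references to each other.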

> self.rows is a list containing DBRow-objects,
> each itself being an integer pointer (index) to the i-th row.
> Im using this list in order to sort the table by sorting the index-list
> instead of realy sorting the entries. (or to filter).

What looks strange is the "self" argument to DBRow, since the items are
already contained in self.rows.
But it's hard to tell anything more without looking at your code.
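
One possible way to avoid the back-reference altogether, assuming the row objects are only needed as sortable indices, is to keep plain integers in the index list. This is a sketch of that idea with made-up data, not a reconstruction of the poster's DBRow.

```python
from array import array

values = array("f", [3.0, 1.0, 2.0])   # one column of the table
order = list(range(len(values)))       # row indices as plain ints, no back-reference

order.sort(key=lambda i: values[i])    # sort the view, not the data
sorted_values = [values[i] for i in order]
```

The column data stays where it is, the index list carries the ordering, and no object in the list points back at the table.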
