determining available space for Float32, for instance

David Socha · May 23, 2006

I am looking for a way to determine the maximum array size I can allocate
for arrays of Float32 values (or Int32, or Int8, ...) at an arbitrary
point in the program's execution. This is needed because Python cannot
allocate enough memory for all of the data we need to process, so we
need to "chunk" the processing, as described below.

Python's memory management process makes this more complicated, since
once memory is allocated for Float32, it cannot be used for any other
data type, such as Int32. I'd like a solution that includes either
memory that is not yet allocated, or memory that used to be allocated
for that type, but is no longer used.

We do not want a solution that requires recompiling Python, since we
cannot expect our end users to do that.

Does anyone know how to do this?

The following describes our application context in more detail.

Our application is UrbanSim (www.urbansim.org), a micro-simulation
application for urban planning. It uses "datasets," where each dataset
may have millions of entities (e.g. households), and each entity (e.g.
household) may have dozens of attributes (e.g. number_of_cars, income,
etc.). Attributes can be any of the standard Python "base" types,
though most attributes are Float32 or Int32 values. Our models often
create a set of 2D arrays with one dimension being agents, and the
second dimension being choices from another dataset. For instance, the
agents may be households that choose a new gridcell to live in. For our
Puget Sound application, there are 1 to 2 million households, and 800K
gridcells. Each attribute of a dataset has such a 2D array. Given that
we may have dozens of attributes, they can eat up a lot of memory,
quickly.
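
To put rough numbers on this, here is a minimal sketch of the arithmetic.
It assumes one dense array per attribute covering every agent and every
choice, which is a simplification. The names ITEM_SIZE and array_bytes are
made up for illustration. The counts are the Puget Sound figures above,
taking the low end of 1 million households.

```python
# Bytes per element for the array types mentioned in this post.
ITEM_SIZE = {"Float32": 4, "Int32": 4, "Int8": 1}


def array_bytes(n_agents, n_choices, type_name):
    """Bytes needed for one dense n_agents x n_choices array (assumed layout)."""
    return n_agents * n_choices * ITEM_SIZE[type_name]


# 1 million households x 800K gridcells, one Float32 attribute.
one_attribute = array_bytes(1_000_000, 800_000, "Float32")
print(one_attribute, "bytes for a single Float32 attribute")
```

Even a single attribute at that size is far beyond what one process can
hold, which is why the agents dimension has to be split up.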

Given the sizes of these arrays, and Python's limited address space,
Python usually cannot allocate enough memory for us to create the entire
set of 2D arrays at once. Instead, we "chunk" the model along the
agents dimension, processing a chunk of agents at a time. Some of our
models can do their work in a single chunk. Others require hundreds of
chunks. It depends upon the number of agents, the number of locations,
the number of agent attributes, and the number of location attributes
used by that particular model.
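
As an illustration of what chunking along the agents dimension looks like,
here is a minimal sketch. It is not the UrbanSim code; agent_chunks is a
hypothetical helper, and the chunk size is simply passed in.

```python
def agent_chunks(n_agents, chunk_size):
    """Yield (start, stop) index pairs that together cover range(n_agents)."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, n_agents, chunk_size):
        # The last chunk may be shorter than chunk_size.
        yield start, min(start + chunk_size, n_agents)


# Each (start, stop) pair selects the rows of agents handled in one pass,
# so only (stop - start) x n_choices elements per attribute exist at a time.
for start, stop in agent_chunks(10, 4):
    print("processing agents", start, "to", stop - 1)
```

The open question is how to pick chunk_size automatically rather than by
hand.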

What we would like is for the code to be able to automatically determine
the number of agents that can be in a single chunk. This requires we
solve two sub-problems.

First, we need to know how many attributes of each type (Float32, Int32,
etc.) will be used by this model. We can do that.
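
For example, once the per-type attribute counts are known, the chunk size
follows from a per-type memory budget. This is only a sketch: it assumes a
separate budget in bytes for each type (since memory is not shared between
types), one dense agents x choices array per attribute, and hypothetical
names throughout. Where the budget numbers come from is the second
sub-problem.

```python
# Bytes per element for the array types mentioned in this post.
ITEM_SIZE = {"Float32": 4, "Int32": 4, "Int8": 1}


def max_agents_per_chunk(attr_counts, n_choices, budget_by_type):
    """Largest number of agents whose arrays fit in every per-type budget.

    attr_counts:    {type name: number of attributes of that type}
    n_choices:      size of the second array dimension
    budget_by_type: {type name: bytes assumed available for that type}
    """
    limits = []
    for type_name, count in attr_counts.items():
        if count == 0:
            continue  # this type places no constraint on the chunk size
        bytes_per_agent = count * n_choices * ITEM_SIZE[type_name]
        limits.append(budget_by_type[type_name] // bytes_per_agent)
    if not limits:
        raise ValueError("no attributes given")
    # The most constrained type decides the chunk size.
    return min(limits)


# Made-up budgets, purely to show the call.
chunk = max_agents_per_chunk(
    {"Float32": 10, "Int32": 5},
    n_choices=800_000,
    budget_by_type={"Float32": 1_000_000_000, "Int32": 400_000_000},
)
print("agents per chunk:", chunk)
```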

Second, we need to know how much space is available for an array of a
particular type of values, e.g. for Float32 values. Is there a way to
get this information for Python?
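
The only approach I can sketch myself is to probe: try to allocate, catch
MemoryError, and binary-search for the largest size that succeeds. The
sketch below is a guess at what that would look like, not a known-good
answer. It assumes that if n elements can be allocated then any smaller
count can too, and largest_allocatable and try_alloc are hypothetical
names. On systems that overcommit memory, a successful allocation may not
mean the memory is really usable, and the result is only a snapshot of one
moment in the program's execution.

```python
def largest_allocatable(try_alloc, upper):
    """Binary-search the largest n in [0, upper] for which try_alloc(n) succeeds.

    try_alloc(n) must return a block of n elements or raise MemoryError.
    """
    lo, hi = 0, upper
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the loop always makes progress
        try:
            block = try_alloc(mid)
        except MemoryError:
            hi = mid - 1  # mid is too large
        else:
            del block  # release the probe allocation before trying again
            lo = mid  # mid fits; look for something larger
    return lo


# A real probe for 4-byte elements might pass an allocator such as
#     lambda n: bytearray(4 * n)
# or the array library's own constructor for Float32.
```

Probing like this is slow and can disturb the rest of the process, so a
way to ask Python directly would be much better.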

Cheers,

David Socha
Center for Urban Simulation and Policy Analysis
University of Washington
www.urbansim.org
