determining available space for Float32, for instance

David Socha

I am looking for a way to determine the maximum array size I can allocate
for arrays of Float32 values (or Int32, or Int8, ...) at an arbitrary
point in the program's execution. This is needed because Python cannot
allocate enough memory for all of the data we need to process, so we
need to "chunk" the processing, as described below.

Python's memory management process makes this more complicated, since
once memory is allocated for Float32, it cannot be used for any other
data type, such as Int32. I'd like a solution that includes either
memory that is not yet allocated, or memory that used to be allocated
for that type, but is no longer used.

We do not want a solution that requires recompiling Python, since we
cannot expect our end users to do that.

Does anyone know how to do this?

The following describes our application context in more detail.

Our application is UrbanSim (www.urbansim.org), a micro-simulation
application for urban planning. It uses "datasets," where each dataset
may have millions of entities (e.g. households), and each entity (e.g.
household) may have dozens of attributes (e.g. number_of_cars, income,
etc.). Attributes can be any of the standard Python "base" types,
though most attributes are Float32 or Int32 values. Our models often
create a set of 2D arrays with one dimension being agents, and the
second dimension being choices from another dataset. For instance, the
agents may be households that choose a new gridcell to live in. For our
Puget Sound application, there are 1 to 2 million households, and 800K
gridcells. Each attribute of a dataset has such a 2D array. Given that
we may have dozens of attributes, they can eat up a lot of memory,
quickly.

Given the sizes of these arrays, and Python's limited address space,
Python usually cannot allocate enough memory for us to create the entire
set of 2D arrays at once. Instead, we "chunk" the model along the
agents dimension, processing a chunk of agents at a time. Some of our
models can do their work in a single chunk. Others require hundreds of
chunks. It depends upon the number of agents, the number of locations,
the number of agent attributes, and the number of location attributes
used by that particular model.
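
As a rough illustration of chunking along the agents dimension, here is a
minimal sketch in plain Python. The function name chunk_bounds and its
parameters are hypothetical, not part of UrbanSim; it only computes the
index ranges, and the model would process agents[start:stop] for each pair.

```python
def chunk_bounds(n_agents, n_chunks):
    """Yield (start, stop) index pairs splitting range(n_agents) into
    at most n_chunks contiguous, nearly equal pieces.

    Hypothetical helper for illustration only.
    """
    if n_agents < 0 or n_chunks < 1:
        raise ValueError("n_agents must be >= 0 and n_chunks >= 1")
    # The first `extra` chunks get one additional agent each.
    base, extra = divmod(n_agents, n_chunks)
    start = 0
    for i in range(n_chunks):
        stop = start + base + (1 if i < extra else 0)
        if stop > start:  # skip empty chunks when n_chunks > n_agents
            yield start, stop
        start = stop


if __name__ == "__main__":
    for start, stop in chunk_bounds(10, 3):
        print(start, stop)
```

The open question in this post is how to pick n_chunks automatically
instead of by hand.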

What we would like is for the code to be able to automatically determine
the number of agents that can be in a single chunk. This requires we
solve two sub-problems.

First, we need to know how many attributes of each type (Float32, Int32,
etc.) will be used by this model. We can do that.
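
To make the first sub-problem concrete, here is a sketch of the arithmetic,
assuming each attribute needs one agents-by-choices 2D array and that we
somehow already know a byte budget. The names ITEMSIZE, bytes_per_agent and
agents_per_chunk are made up for this example, and the attribute counts in
the usage lines are invented; only the 800K gridcells figure comes from our
Puget Sound application.

```python
# Bytes per element for the types mentioned in this post.
ITEMSIZE = {"Float32": 4, "Int32": 4, "Int8": 1}


def bytes_per_agent(attr_counts, n_choices):
    """Bytes needed for one agent's row across all 2D attribute arrays.

    attr_counts maps a type name to the number of attributes of that
    type, e.g. {"Float32": 10, "Int32": 5} (hypothetical counts).
    """
    per_cell = sum(ITEMSIZE[t] * count for t, count in attr_counts.items())
    return n_choices * per_cell


def agents_per_chunk(budget_bytes, attr_counts, n_choices):
    """Largest number of agents whose rows fit in budget_bytes."""
    per_agent = bytes_per_agent(attr_counts, n_choices)
    if per_agent <= 0:
        raise ValueError("need at least one attribute and one choice")
    return budget_bytes // per_agent


if __name__ == "__main__":
    counts = {"Float32": 10, "Int32": 5}  # invented for illustration
    print(bytes_per_agent(counts, 800_000))
    print(agents_per_chunk(2**30, counts, 800_000))
```

With these invented counts, one agent's rows take 48,000,000 bytes, so a
1 GiB budget holds only 22 agents per chunk. This ignores temporaries that
the model creates while computing, so a real budget would need headroom.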

Second, we need to know how much space is available for an array of a
particular type of values, e.g. for Float32 values. Is there a way to
get this information for Python?

Cheers,

David Socha
Center for Urban Simulation and Policy Analysis
University of Washington
www.urbansim.org
 
