More efficient array processing

John [H2O]

Hello,

I'm trying to do the following:

datagrid = numpy.zeros(360,180,3,73,20)

But I get an error saying that the dimensions are too large? Is there a
memory issue here?

So, my workaround is this:

numpoint = 73

datagrid = numpy.zeros(360,180,3,73,1)

for np in range(numpoint):
    datagrid[:,:,:,np,0] = datagrid[:,:,:,np,0] + concgrid[:,:,:,np,0]

But this is SLOW. What can I do to increase efficiency here? Is there a way
to create the larger array? The program actually loops through several days,
filling the 5th dimension. Eventually I just sum over the 5th dimension anyway
(as done in the loop of the workaround).

Thanks!
john


--
Configuration
``````````````````````````
Plone 2.5.3-final,
CMF-1.6.4,
Zope (Zope 2.9.7-final, python 2.4.4, linux2),
Five 1.4.1,
Python 2.4.4 (#1, Jul 3 2007, 22:58:17) [GCC 4.1.1 20070105 (Red Hat
4.1.1-51)],
PIL 1.1.6
Mailman 2.1.9
Postfix 2.4.5
Procmail v3.22 2001/09/10
 
Marc 'BlackJack' Rintsch

I'm trying to do the following:

datagrid = numpy.zeros(360,180,3,73,20)

But I get an error saying that the dimensions are too large? Is there a
memory issue here?

Let's see:

You have: 360 * 180 * 3 * 73 * 20 * 8 bytes
You want: GiB
* 2.1146536
/ 0.47289069

Do you have a 32 bit system? Then 2 GiB is too much for a process.

Ciao,
Marc 'BlackJack' Rintsch
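
The same arithmetic can be checked in Python without allocating anything. This is only a minimal sketch: the helper name `array_gib` is made up for illustration, and it assumes 8-byte elements (C doubles), as in the calculation above.

```python
import math

def array_gib(shape, itemsize=8):
    """Return the size in GiB of an array with this shape and bytes per element."""
    n_elements = math.prod(shape)  # total number of elements
    return n_elements * itemsize / 2**30

shape = (360, 180, 3, 73, 20)
size_gib = array_gib(shape)  # 8-byte doubles
print(round(size_gib, 4))  # 2.1147
```

With 283,824,000 elements, the array alone is just over the 2 GiB mark before any other memory the process needs.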
 
John [H2O]

Thanks for the clarification.

What is strange, though, is that I have several Fortran programs that create
the exact same array structure... wouldn't they be restricted to the 2 GB
limit as well?

Thoughts on a more efficient work around?
 
Marc 'BlackJack' Rintsch

What is strange, though, is that I have several Fortran programs that
create the exact same array structure... wouldn't they be restricted to
the 2 GB limit as well?

They should be. What about the data type of the elements? Any chance
they are just 4-byte floats in your Fortran code, i.e. C floats instead of
C doubles, which are the default in `numpy`?

Ciao,
Marc 'BlackJack' Rintsch
 
John [H2O]

I'm using zeros with type np.float, is there a way to define the data type to
be 4 byte floats?
 
Marc 'BlackJack' Rintsch

I'm using zeros with type np.float, is there a way to define the data
type to be 4 byte floats?

Yes:

In [13]: numpy.zeros(5, numpy.float32)
Out[13]: array([ 0., 0., 0., 0., 0.], dtype=float32)

Ciao,
Marc 'BlackJack' Rintsch
 
Robert Kern

John said:
I'm using zeros with type np.float, is there a way to define the data type to
be 4 byte floats?

np.float32. np.float is not part of the numpy API. It's just Python's builtin
float type which corresponds to C doubles.
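
One way to confirm these element sizes is to inspect `itemsize` on a small array. This is just a sketch; the shape `(4, 3)` is arbitrary and chosen only to keep the example tiny.

```python
import numpy

# With no dtype argument, zeros() produces C doubles (float64, 8 bytes each).
a64 = numpy.zeros((4, 3))
# Requesting float32 gives 4-byte elements, halving the memory footprint.
a32 = numpy.zeros((4, 3), dtype=numpy.float32)

print(a64.dtype, a64.itemsize, a64.nbytes)  # float64 8 96
print(a32.dtype, a32.itemsize, a32.nbytes)  # float32 4 48

# Python's builtin float maps to the same 8-byte type.
print(numpy.dtype(float).itemsize)  # 8
```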

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
 
Ivan Reborin

Thanks for the clarification.

What is strange, though, is that I have several Fortran programs that create
the exact same array structure... wouldn't they be restricted to the 2 GB
limit as well?

Depends on a lot of things, as Marc has hinted.
But why are you rewriting Fortran subroutines in Python? Especially for
this kind of array processing.

And, if it's no secret, would you mind telling us what you need an
array of that size for?
I cannot think of many uses that would require an array of that size
and that many dimensions and that couldn't be rewritten more
efficiently as several smaller arrays.
 
John [H2O]

No secret at all...

As you might have guessed, it is global model fields that I am working with:

360x180 (lon,lat)

I have three 'z' levels.
(360,180,3)

Then I have different 'fields', usually on the order of ~50-80
(360,180,3,60)

Lastly, I have output for several timesteps. Those timesteps are
ultimately summed for each field, but in the interim I need
(360,180,3,60,x)

I am currently working on an f2py solution, but it has caused me some
problems as well...

The format of the data is not something I can change, though I may be able
to read it in in smaller 'chunks', say creating separate arrays for each
field, or day.
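
Since the timesteps are only ever summed, one option along these lines is to never store the time dimension at all: keep a single 4-D accumulator and add each timestep's grid as it is read. The following is a sketch under assumptions, not the actual program: `read_timestep` is a hypothetical stand-in for whatever reads one timestep of data, and the grid is shrunk so the example runs quickly.

```python
import numpy

# Small stand-in shape (lon, lat, z, field); the real grid would be (360, 180, 3, 73).
shape = (36, 18, 3, 7)
n_timesteps = 20

def read_timestep(t):
    """Hypothetical reader: returns one timestep of data as a 4-D array."""
    return numpy.full(shape, float(t), dtype=numpy.float32)

# One accumulator for the running sum; the 5th (time) dimension is never allocated.
total = numpy.zeros(shape, dtype=numpy.float32)
for t in range(n_timesteps):
    # In-place add over the whole grid; no Python loop over fields is needed.
    total += read_timestep(t)

print(total[0, 0, 0, 0])  # 190.0
```

At full size a (360, 180, 3, 73) accumulator holds 14,191,200 elements, which is roughly 108 MiB even with 8-byte doubles, so the accumulator could stay in double precision if summing in single precision is a concern.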
 
sturlamolden

datagrid = numpy.zeros(360,180,3,73,20)

On a 32 bit system, try this instead:

datagrid = numpy.zeros((360,180,3,73,20), dtype=numpy.float32)

(if you can use single precision that is.)
 
